* [PATCH 0/2] x86/VMX: fix for vmentry failure with TSX bits in LBR
@ 2017-01-25 17:26 Sergey Dyasli
  2017-01-25 17:26 ` [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr() Sergey Dyasli
  2017-01-25 17:26 ` [PATCH 2/2] x86/VMX: fix vmentry failure with TSX bits in LBR Sergey Dyasli
  0 siblings, 2 replies; 13+ messages in thread
From: Sergey Dyasli @ 2017-01-25 17:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Kevin Tian, Jan Beulich, Jun Nakajima, Sergey Dyasli

The first patch is a general optimization that is nice to have in place
before the second patch, which contains the actual fix.

A similar bug was fixed in Linux's perf subsystem in Jun 2016:

    commit 19fc9ddd61e059cc45464bdf6e8fa304bb94080f
    ("perf/x86/intel: Fix MSR_LAST_BRANCH_FROM_x bug when no TSX")

However, the issue affects virtualized systems more severely, since all
LBR entries have to be fixed up during vmentry.

Sergey Dyasli (2):
  x86/VMX: introduce vmx_find_guest_msr()
  x86/VMX: fix vmentry failure with TSX bits in LBR

 xen/arch/x86/hvm/vmx/vmcs.c        | 80 +++++++++++++++++++++++++++++---------
 xen/arch/x86/hvm/vmx/vmx.c         | 68 ++++++++++++++++++++++++++++++++
 xen/include/asm-x86/cpufeature.h   |  3 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  3 ++
 4 files changed, 136 insertions(+), 18 deletions(-)

-- 
2.7.4



* [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr()
  2017-01-25 17:26 [PATCH 0/2] x86/VMX: fix for vmentry failure with TSX bits in LBR Sergey Dyasli
@ 2017-01-25 17:26 ` Sergey Dyasli
  2017-01-31 11:29   ` Jan Beulich
  2017-01-25 17:26 ` [PATCH 2/2] x86/VMX: fix vmentry failure with TSX bits in LBR Sergey Dyasli
  1 sibling, 1 reply; 13+ messages in thread
From: Sergey Dyasli @ 2017-01-25 17:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Kevin Tian, Jan Beulich, Jun Nakajima, Sergey Dyasli

Currently there can be up to 256 entries in a guest's MSR array and all
entries are stored in the order they were added.  Such a design requires
a linear scan of the whole array in order to find the MSR with the
required index, which can be a costly operation.

To avoid that, reuse the existing code for heap sort and binary search
and optimize the existing functions that deal with guest MSR arrays.
This way the time complexity of searching for the required MSR is
reduced from linear to logarithmic.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 80 +++++++++++++++++++++++++++++---------
 xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
 2 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 59ef199..d04de8d 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -24,6 +24,7 @@
 #include <xen/event.h>
 #include <xen/kernel.h>
 #include <xen/keyhandler.h>
+#include <xen/sort.h>
 #include <xen/vm_event.h>
 #include <asm/current.h>
 #include <asm/cpufeature.h>
@@ -1283,19 +1284,36 @@ static int construct_vmcs(struct vcpu *v)
     return 0;
 }
 
-int vmx_read_guest_msr(u32 msr, u64 *val)
+static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
 {
-    struct vcpu *curr = current;
-    unsigned int i, msr_count = curr->arch.hvm_vmx.msr_count;
+    const u32 *msr = key;
+    const struct vmx_msr_entry *entry = elt;
+
+    if ( *msr > entry->index )
+        return 1;
+    if ( *msr < entry->index )
+        return -1;
+    return 0;
+}
+
+struct vmx_msr_entry *vmx_find_guest_msr(const u32 msr)
+{
+    const struct vcpu *curr = current;
+    const unsigned int msr_count = curr->arch.hvm_vmx.msr_count;
     const struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
 
-    for ( i = 0; i < msr_count; i++ )
+    return bsearch(&msr, msr_area, msr_count, sizeof(struct vmx_msr_entry),
+                   vmx_msr_entry_key_cmp);
+}
+
+int vmx_read_guest_msr(u32 msr, u64 *val)
+{
+    const struct vmx_msr_entry *ent;
+
+    if ( (ent = vmx_find_guest_msr(msr)) != NULL )
     {
-        if ( msr_area[i].index == msr )
-        {
-            *val = msr_area[i].data;
+            *val = ent->data;
             return 0;
-        }
     }
 
     return -ESRCH;
@@ -1303,22 +1321,37 @@ int vmx_read_guest_msr(u32 msr, u64 *val)
 
 int vmx_write_guest_msr(u32 msr, u64 val)
 {
-    struct vcpu *curr = current;
-    unsigned int i, msr_count = curr->arch.hvm_vmx.msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
+    struct vmx_msr_entry *ent;
 
-    for ( i = 0; i < msr_count; i++ )
+    if ( (ent = vmx_find_guest_msr(msr)) != NULL )
     {
-        if ( msr_area[i].index == msr )
-        {
-            msr_area[i].data = val;
+            ent->data = val;
             return 0;
-        }
     }
 
     return -ESRCH;
 }
 
+static void vmx_msr_entry_swap(void *a, void *b, int size)
+{
+    struct vmx_msr_entry *l = a, *r = b, tmp;
+
+    tmp = *l;
+    *l = *r;
+    *r = tmp;
+}
+
+static int vmx_msr_entry_cmp(const void *a, const void *b)
+{
+    const struct vmx_msr_entry *l = a, *r = b;
+
+    if ( l->index > r->index )
+        return 1;
+    if ( l->index < r->index )
+        return -1;
+    return 0;
+}
+
 int vmx_add_msr(u32 msr, int type)
 {
     struct vcpu *curr = current;
@@ -1351,9 +1384,17 @@ int vmx_add_msr(u32 msr, int type)
             __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
     }
 
-    for ( idx = 0; idx < *msr_count; idx++ )
-        if ( (*msr_area)[idx].index == msr )
+    if ( type == VMX_GUEST_MSR )
+    {
+        if ( vmx_find_guest_msr(msr) != NULL )
             return 0;
+    }
+    else
+    {
+        for ( idx = 0; idx < *msr_count; idx++ )
+            if ( (*msr_area)[idx].index == msr )
+                return 0;
+    }
 
     if ( *msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
         return -ENOSPC;
@@ -1369,6 +1410,9 @@ int vmx_add_msr(u32 msr, int type)
         msr_area_elem->data = 0;
         __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
         __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
+
+        sort(*msr_area, *msr_count, sizeof(struct vmx_msr_entry),
+             vmx_msr_entry_cmp, vmx_msr_entry_swap);
     }
     else
     {
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6c3d7ba..d01099e 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -589,6 +589,7 @@ enum vmx_insn_errno
 
 void vmx_disable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
 void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
+struct vmx_msr_entry *vmx_find_guest_msr(const u32 msr);
 int vmx_read_guest_msr(u32 msr, u64 *val);
 int vmx_write_guest_msr(u32 msr, u64 val);
 int vmx_add_msr(u32 msr, int type);
-- 
2.7.4



* [PATCH 2/2] x86/VMX: fix vmentry failure with TSX bits in LBR
  2017-01-25 17:26 [PATCH 0/2] x86/VMX: fix for vmentry failure with TSX bits in LBR Sergey Dyasli
  2017-01-25 17:26 ` [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr() Sergey Dyasli
@ 2017-01-25 17:26 ` Sergey Dyasli
  2017-01-31 11:52   ` Jan Beulich
  1 sibling, 1 reply; 13+ messages in thread
From: Sergey Dyasli @ 2017-01-25 17:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Kevin Tian, Jan Beulich, Jun Nakajima, Sergey Dyasli

During VM entry, H/W will automatically load the guest's MSRs from the
MSR-load area in the same way as they would be written by WRMSR.

However, under the following conditions:

    1. LBR (Last Branch Record) MSRs were placed in the MSR-load area
    2. The LBR address format includes TSX bits 61:62
    3. The CPU has TSX support disabled

VM entry will fail with a message in the log similar to:

    (XEN) [   97.239514] d1v0 vmentry failure (reason 0x80000022): MSR loading (entry 3)
    (XEN) [   97.239516]   msr 00000680 val 1fff800000102e60 (mbz 0)

This happens because of the following behaviour:

    - When capturing branches, LBR H/W will always clear bits 61:62
      regardless of the sign extension
    - For WRMSR, bits 61:62 are considered part of the sign extension

This bug affects only certain pCPUs (e.g. Haswell) with vCPUs that
use LBR.  Fix it by sign-extending TSX bits in all LBR entries during
VM entry in affected cases.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
 xen/arch/x86/hvm/vmx/vmx.c         | 68 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/cpufeature.h   |  3 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  2 ++
 3 files changed, 73 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a5e5ffd..586aaca 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2241,6 +2241,25 @@ static void pi_notification_interrupt(struct cpu_user_regs *regs)
     raise_softirq(VCPU_KICK_SOFTIRQ);
 }
 
+static bool __read_mostly vmx_lbr_tsx_fixup_needed;
+
+static void __init vmx_lbr_tsx_fixup_check(void)
+{
+    bool tsx_support = cpu_has_hle || cpu_has_rtm;
+    u64 caps;
+    u32 lbr_format;
+
+    if ( !cpu_has_pdcm )
+        return;
+
+    rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
+    /* Lower 6 bits define the format of the address in the LBR stack */
+    lbr_format = caps & 0x3f;
+
+    /* 000100B -- TSX info is reported in the upper bits of 'FROM' registers */
+    vmx_lbr_tsx_fixup_needed = !tsx_support && lbr_format == 0x4;
+}
+
 const struct hvm_function_table * __init start_vmx(void)
 {
     set_in_cr4(X86_CR4_VMXE);
@@ -2310,6 +2329,8 @@ const struct hvm_function_table * __init start_vmx(void)
 
     setup_vmcs_dump();
 
+    vmx_lbr_tsx_fixup_check();
+
     return &vmx_function_table;
 }
 
@@ -2833,7 +2854,11 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
             for ( ; (rc == 0) && lbr->count; lbr++ )
                 for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
                     if ( (rc = vmx_add_guest_msr(lbr->base + i)) == 0 )
+                    {
                         vmx_disable_intercept_for_msr(v, lbr->base + i, MSR_TYPE_R | MSR_TYPE_W);
+                        if ( vmx_lbr_tsx_fixup_needed )
+                            v->arch.hvm_vmx.lbr_tsx_fixup_enabled = true;
+                    }
         }
 
         if ( (rc < 0) ||
@@ -3854,6 +3879,46 @@ out:
     }
 }
 
+static void vmx_lbr_tsx_fixup(void)
+{
+    static const u64 LBR_FROM_SIGNEXT_2MSB = (1ULL << 59) | (1ULL << 60);
+    static u32 lbr_from_start = 0, lbr_from_end = 0, last_in_from_ip = 0;
+
+    const struct lbr_info *lbr;
+    const struct vcpu *curr = current;
+    const unsigned int msr_count = curr->arch.hvm_vmx.msr_count;
+    const struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
+    struct vmx_msr_entry *msr;
+
+    if ( lbr_from_start == ~0U )
+        return;
+
+    if ( unlikely(lbr_from_start == 0) )
+    {
+        lbr = last_branch_msr_get();
+        if ( lbr == NULL )
+        {
+            lbr_from_start = ~0U;
+            return;
+        }
+
+        /* Warning: this depends on struct lbr_info[] layout! */
+        last_in_from_ip = lbr[0].base;
+        lbr_from_start = lbr[3].base;
+        lbr_from_end = lbr_from_start + lbr[3].count;
+    }
+
+    if ( (msr = vmx_find_guest_msr(lbr_from_start)) != NULL )
+    {
+        /* Sign extend into bits 61:62 while preserving bit 63 */
+        for ( ; msr < msr_area + msr_count && msr->index < lbr_from_end; msr++ )
+            msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
+    }
+
+    if ( (msr = vmx_find_guest_msr(last_in_from_ip)) != NULL )
+        msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);
+}
+
 void vmx_vmenter_helper(const struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
@@ -3910,6 +3975,9 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
     }
 
  out:
+    if ( unlikely(curr->arch.hvm_vmx.lbr_tsx_fixup_enabled) )
+        vmx_lbr_tsx_fixup();
+
     HVMTRACE_ND(VMENTRY, 0, 1/*cycles*/, 0, 0, 0, 0, 0, 0, 0);
 
     __vmwrite(GUEST_RIP,    regs->rip);
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 39ad582..8e14728 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -78,6 +78,9 @@
 #define cpu_has_cmp_legacy	boot_cpu_has(X86_FEATURE_CMP_LEGACY)
 #define cpu_has_tbm		boot_cpu_has(X86_FEATURE_TBM)
 #define cpu_has_itsc		boot_cpu_has(X86_FEATURE_ITSC)
+#define cpu_has_hle             boot_cpu_has(X86_FEATURE_HLE)
+#define cpu_has_rtm             boot_cpu_has(X86_FEATURE_RTM)
+#define cpu_has_pdcm            boot_cpu_has(X86_FEATURE_PDCM)
 
 enum _cache_type {
     CACHE_TYPE_NULL = 0,
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index d01099e..8d9db36 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -231,6 +231,8 @@ struct arch_vmx_struct {
      * pCPU and wakeup the related vCPU.
      */
     struct pi_blocking_vcpu pi_blocking;
+
+    bool lbr_tsx_fixup_enabled;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
-- 
2.7.4



* Re: [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr()
  2017-01-25 17:26 ` [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr() Sergey Dyasli
@ 2017-01-31 11:29   ` Jan Beulich
  2017-01-31 11:49     ` Andrew Cooper
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2017-01-31 11:29 UTC (permalink / raw)
  To: Sergey Dyasli; +Cc: Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

>>> On 25.01.17 at 18:26, <sergey.dyasli@citrix.com> wrote:
> @@ -1283,19 +1284,36 @@ static int construct_vmcs(struct vcpu *v)
>      return 0;
>  }
>  
> -int vmx_read_guest_msr(u32 msr, u64 *val)
> +static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
>  {
> -    struct vcpu *curr = current;
> -    unsigned int i, msr_count = curr->arch.hvm_vmx.msr_count;
> +    const u32 *msr = key;
> +    const struct vmx_msr_entry *entry = elt;
> +
> +    if ( *msr > entry->index )
> +        return 1;
> +    if ( *msr < entry->index )
> +        return -1;
> +    return 0;
> +}
> +
> +struct vmx_msr_entry *vmx_find_guest_msr(const u32 msr)
> +{
> +    const struct vcpu *curr = current;
> +    const unsigned int msr_count = curr->arch.hvm_vmx.msr_count;
>      const struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;

I'm not convinced the latter two local variables are particularly
useful. I'm also not convinced constifying non-pointed-to types
is all that useful (which also applies to the function parameter).
But I'll leave the decision whether to accept this to the VM
maintainers of course.

> +static int vmx_msr_entry_cmp(const void *a, const void *b)
> +{
> +    const struct vmx_msr_entry *l = a, *r = b;
> +
> +    if ( l->index > r->index )
> +        return 1;
> +    if ( l->index < r->index )
> +        return -1;
> +    return 0;
> +}

I'd prefer if there was just a single compare function, even if that
requires widening the key for the bsearch() invocation. Alternatively
...

> @@ -1369,6 +1410,9 @@ int vmx_add_msr(u32 msr, int type)
>          msr_area_elem->data = 0;
>          __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
>          __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
> +
> +        sort(*msr_area, *msr_count, sizeof(struct vmx_msr_entry),
> +             vmx_msr_entry_cmp, vmx_msr_entry_swap);

... how about avoiding the sort() here altogether, by simply
going through the list linearly (which, being O(n), is still faster
than sort())? The more that there is a linear scan already
anyway. At which point it may then be beneficial to also keep
the host MSR array sorted.
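
A minimal sketch of the single-compare-function variant mentioned further
up (illustrative only, widening the bsearch() key to a whole
struct vmx_msr_entry so that vmx_msr_entry_cmp() serves both callers):

    struct vmx_msr_entry *vmx_find_guest_msr(u32 msr)
    {
        const struct vcpu *curr = current;
        struct vmx_msr_entry key = { .index = msr };

        return bsearch(&key, curr->arch.hvm_vmx.msr_area,
                       curr->arch.hvm_vmx.msr_count,
                       sizeof(key), vmx_msr_entry_cmp);
    }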

Jan



* Re: [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr()
  2017-01-31 11:29   ` Jan Beulich
@ 2017-01-31 11:49     ` Andrew Cooper
  2017-01-31 11:54       ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2017-01-31 11:49 UTC (permalink / raw)
  To: Jan Beulich, Sergey Dyasli; +Cc: Kevin Tian, Jun Nakajima, xen-devel

On 31/01/17 11:29, Jan Beulich wrote:
>>>> On 25.01.17 at 18:26, <sergey.dyasli@citrix.com> wrote:
>>>>
>> @@ -1369,6 +1410,9 @@ int vmx_add_msr(u32 msr, int type)
>>          msr_area_elem->data = 0;
>>          __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
>>          __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
>> +
>> +        sort(*msr_area, *msr_count, sizeof(struct vmx_msr_entry),
>> +             vmx_msr_entry_cmp, vmx_msr_entry_swap);
> ... how about avoiding the sort() here altogether, by simply
> going through the list linearly (which, being O(n), is still faster
> than sort())? The more that there is a linear scan already
> anyway. At which point it may then be beneficial to also keep
> the host MSR array sorted.

The entire point of sorting this list is to trade an O(n) search for
O(log(n)) in every vmentry when fixing up the LBR MSR values.

There should be no O(n) searches across the list after this patch.

~Andrew


* Re: [PATCH 2/2] x86/VMX: fix vmentry failure with TSX bits in LBR
  2017-01-25 17:26 ` [PATCH 2/2] x86/VMX: fix vmentry failure with TSX bits in LBR Sergey Dyasli
@ 2017-01-31 11:52   ` Jan Beulich
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2017-01-31 11:52 UTC (permalink / raw)
  To: Sergey Dyasli; +Cc: Andrew Cooper, Kevin Tian, Jun Nakajima, xen-devel

>>> On 25.01.17 at 18:26, <sergey.dyasli@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2241,6 +2241,25 @@ static void pi_notification_interrupt(struct cpu_user_regs *regs)
>      raise_softirq(VCPU_KICK_SOFTIRQ);
>  }
>  
> +static bool __read_mostly vmx_lbr_tsx_fixup_needed;
> +
> +static void __init vmx_lbr_tsx_fixup_check(void)
> +{
> +    bool tsx_support = cpu_has_hle || cpu_has_rtm;
> +    u64 caps;
> +    u32 lbr_format;
> +
> +    if ( !cpu_has_pdcm )
> +        return;
> +
> +    rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
> +    /* Lower 6 bits define the format of the address in the LBR stack */
> +    lbr_format = caps & 0x3f;

Please add a #define for the constant here and ...

> +    /* 000100B -- TSX info is reported in the upper bits of 'FROM' registers */
> +    vmx_lbr_tsx_fixup_needed = !tsx_support && lbr_format == 0x4;

... an enum for the known values (to be used here).
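
As a sketch of how that could look (the constant names here are purely
illustrative, not taken from any existing header):

    #define MSR_IA32_PERF_CAP_LBR_FORMAT   0x3f

    enum {
        LBR_FORMAT_EIP_FLAGS_TSX = 0x4,  /* TSX info in 'FROM' upper bits */
    };

    static void __init vmx_lbr_tsx_fixup_check(void)
    {
        bool tsx_support = cpu_has_hle || cpu_has_rtm;
        u64 caps;

        if ( !cpu_has_pdcm )
            return;

        rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);

        vmx_lbr_tsx_fixup_needed =
            !tsx_support &&
            (caps & MSR_IA32_PERF_CAP_LBR_FORMAT) == LBR_FORMAT_EIP_FLAGS_TSX;
    }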

> @@ -2833,7 +2854,11 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
>              for ( ; (rc == 0) && lbr->count; lbr++ )
>                  for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
>                      if ( (rc = vmx_add_guest_msr(lbr->base + i)) == 0 )
> +                    {
>                          vmx_disable_intercept_for_msr(v, lbr->base + i, MSR_TYPE_R | MSR_TYPE_W);
> +                        if ( vmx_lbr_tsx_fixup_needed )
> +                            v->arch.hvm_vmx.lbr_tsx_fixup_enabled = true;

Why not simply

                        v->arch.hvm_vmx.lbr_tsx_fixup_enabled = vmx_lbr_tsx_fixup_needed;

?

> @@ -3854,6 +3879,46 @@ out:
>      }
>  }
>  
> +static void vmx_lbr_tsx_fixup(void)
> +{
> +    static const u64 LBR_FROM_SIGNEXT_2MSB = (1ULL << 59) | (1ULL << 60);

3ULL << 59 would likely be a little less strange while reading, without
having reached the point of use yet. I'm not convinced the use of a
static const variable here is warranted anyway, the more that the
identifier is a #define one anyway (by being all upper case). Yet if
you really want to keep it, uint64_t please (and likewise elsewhere).

> +    static u32 lbr_from_start = 0, lbr_from_end = 0, last_in_from_ip = 0;
> +

No blank line in the middle of declarations please. And did you
perhaps mean "last_int_from_ip"?

> +    const struct lbr_info *lbr;
> +    const struct vcpu *curr = current;
> +    const unsigned int msr_count = curr->arch.hvm_vmx.msr_count;
> +    const struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
> +    struct vmx_msr_entry *msr;
> +
> +    if ( lbr_from_start == ~0U )
> +        return;
> +
> +    if ( unlikely(lbr_from_start == 0) )
> +    {
> +        lbr = last_branch_msr_get();
> +        if ( lbr == NULL )
> +        {
> +            lbr_from_start = ~0U;
> +            return;
> +        }
> +
> +        /* Warning: this depends on struct lbr_info[] layout! */

Please make suitable #define-s for them then, placed next to the
array definition.

> +        last_in_from_ip = lbr[0].base;
> +        lbr_from_start = lbr[3].base;
> +        lbr_from_end = lbr_from_start + lbr[3].count;

There's a race here: lbr_from_start needs to be written strictly
last, for the static variable latching to work in all cases.
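
One possible shape for both points above (a sketch only; the macro names
are invented, and the use of smp_wmb() to order the stores is an
assumption, not something taken from the patch):

    /* Illustrative names for the lbr_info[] layout dependency. */
    #define LBR_INFO_LASTINT_FROM_IDX  0  /* MSR_IA32_LASTINTFROMIP entry */
    #define LBR_INFO_FROM_IDX          3  /* MSR_P4_LASTBRANCH_0_FROM_LIP entry */

        last_in_from_ip = lbr[LBR_INFO_LASTINT_FROM_IDX].base;
        lbr_from_end    = lbr[LBR_INFO_FROM_IDX].base +
                          lbr[LBR_INFO_FROM_IDX].count;
        smp_wmb();                        /* make lbr_from_start visible last */
        lbr_from_start  = lbr[LBR_INFO_FROM_IDX].base;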

> +    }
> +
> +    if ( (msr = vmx_find_guest_msr(lbr_from_start)) != NULL )
> +    {
> +        /* Sign extend into bits 61:62 while preserving bit 63 */
> +        for ( ; msr < msr_area + msr_count && msr->index < lbr_from_end; msr++ )
> +            msr->data |= ((LBR_FROM_SIGNEXT_2MSB & msr->data) << 2);

Please also clarify in the comment that this depends on the array
being sorted at all times.

> --- a/xen/include/asm-x86/cpufeature.h
> +++ b/xen/include/asm-x86/cpufeature.h
> @@ -78,6 +78,9 @@
>  #define cpu_has_cmp_legacy	boot_cpu_has(X86_FEATURE_CMP_LEGACY)
>  #define cpu_has_tbm		boot_cpu_has(X86_FEATURE_TBM)
>  #define cpu_has_itsc		boot_cpu_has(X86_FEATURE_ITSC)
> +#define cpu_has_hle             boot_cpu_has(X86_FEATURE_HLE)
> +#define cpu_has_rtm             boot_cpu_has(X86_FEATURE_RTM)
> +#define cpu_has_pdcm            boot_cpu_has(X86_FEATURE_PDCM)

Hard tabs are to be used here, as suggested by the neighboring
entries.

> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -231,6 +231,8 @@ struct arch_vmx_struct {
>       * pCPU and wakeup the related vCPU.
>       */
>      struct pi_blocking_vcpu pi_blocking;
> +
> +    bool lbr_tsx_fixup_enabled;

Please fit this into a currently unused byte.

Having reached here - migration doesn't need any similar
adjustment simply because we don't migrate these MSR values
(yet)? This may be worth stating in the commit message.

Jan


* Re: [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr()
  2017-01-31 11:49     ` Andrew Cooper
@ 2017-01-31 11:54       ` Jan Beulich
  2017-01-31 12:06         ` Andrew Cooper
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2017-01-31 11:54 UTC (permalink / raw)
  To: Andrew Cooper, Sergey Dyasli; +Cc: Kevin Tian, Jun Nakajima, xen-devel

>>> On 31.01.17 at 12:49, <andrew.cooper3@citrix.com> wrote:
> On 31/01/17 11:29, Jan Beulich wrote:
>>>>> On 25.01.17 at 18:26, <sergey.dyasli@citrix.com> wrote:
>>>>>
>>> @@ -1369,6 +1410,9 @@ int vmx_add_msr(u32 msr, int type)
>>>          msr_area_elem->data = 0;
>>>          __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
>>>          __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
>>> +
>>> +        sort(*msr_area, *msr_count, sizeof(struct vmx_msr_entry),
>>> +             vmx_msr_entry_cmp, vmx_msr_entry_swap);
>> ... how about avoiding the sort() here altogether, by simply
>> going through the list linearly (which, being O(n), is still faster
>> than sort())? The more that there is a linear scan already
>> anyway. At which point it may then be beneficial to also keep
>> the host MSR array sorted.
> 
> The entire point of sorting this list is to trade an O(n) search for
> O(log(n)) in every vmentry when fixing up the LBR MSR values.
> 
> There should be no O(n) searches across the list after this patch.

And that's indeed not the case. But the sort() is O(n * log(n)).

Jan



* Re: [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr()
  2017-01-31 11:54       ` Jan Beulich
@ 2017-01-31 12:06         ` Andrew Cooper
  2017-01-31 12:43           ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2017-01-31 12:06 UTC (permalink / raw)
  To: Jan Beulich, Sergey Dyasli; +Cc: Kevin Tian, Jun Nakajima, xen-devel

On 31/01/17 11:54, Jan Beulich wrote:
>>>> On 31.01.17 at 12:49, <andrew.cooper3@citrix.com> wrote:
>> On 31/01/17 11:29, Jan Beulich wrote:
>>>>>> On 25.01.17 at 18:26, <sergey.dyasli@citrix.com> wrote:
>>>>>>
>>>> @@ -1369,6 +1410,9 @@ int vmx_add_msr(u32 msr, int type)
>>>>          msr_area_elem->data = 0;
>>>>          __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
>>>>          __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
>>>> +
>>>> +        sort(*msr_area, *msr_count, sizeof(struct vmx_msr_entry),
>>>> +             vmx_msr_entry_cmp, vmx_msr_entry_swap);
>>> ... how about avoiding the sort() here altogether, by simply
>>> going through the list linearly (which, being O(n), is still faster
>>> than sort())? The more that there is a linear scan already
>>> anyway. At which point it may then be beneficial to also keep
>>> the host MSR array sorted.
>> The entire point of sorting this list is to trade an O(n) search for
>> O(log(n)) in every vmentry when fixing up the LBR MSR values.
>>
>> There should be no O(n) searches across the list after this patch.
> And that's indeed not the case. But the sort() is O(n * log(n)).

I don't understand what point you are trying to make.

Adding MSRs to the list (turns out we have no remove yet) is a rare
occurrence, and in practice, this LBR addition is the only one which
happens at runtime rather than domain creation.

However, you cannot have an efficient fixup on vmenter if the list isn't
sorted, and it is not possible to sort a list in less than O(n * log(n))
in the general case.

~Andrew


* Re: [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr()
  2017-01-31 12:06         ` Andrew Cooper
@ 2017-01-31 12:43           ` Jan Beulich
  2017-02-01  9:38             ` Sergey Dyasli
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2017-01-31 12:43 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Sergey Dyasli, Kevin Tian, Jun Nakajima, xen-devel

>>> On 31.01.17 at 13:06, <andrew.cooper3@citrix.com> wrote:
> On 31/01/17 11:54, Jan Beulich wrote:
>>>>> On 31.01.17 at 12:49, <andrew.cooper3@citrix.com> wrote:
>>> On 31/01/17 11:29, Jan Beulich wrote:
>>>>>>> On 25.01.17 at 18:26, <sergey.dyasli@citrix.com> wrote:
>>>>>>>
>>>>> @@ -1369,6 +1410,9 @@ int vmx_add_msr(u32 msr, int type)
>>>>>          msr_area_elem->data = 0;
>>>>>          __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
>>>>>          __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
>>>>> +
>>>>> +        sort(*msr_area, *msr_count, sizeof(struct vmx_msr_entry),
>>>>> +             vmx_msr_entry_cmp, vmx_msr_entry_swap);
>>>> ... how about avoiding the sort() here altogether, by simply
>>>> going through the list linearly (which, being O(n), is still faster
>>>> than sort())? The more that there is a linear scan already
>>>> anyway. At which point it may then be beneficial to also keep
>>>> the host MSR array sorted.
>>> The entire point of sorting this list is to trade an O(n) search for
>>> O(log(n)) in every vmentry when fixing up the LBR MSR values.
>>>
>>> There should be no O(n) searches across the list after this patch.
>> And that's indeed not the case. But the sort() is O(n * log(n)).
> 
> I don't understand what point you are trying to make.
> 
> Adding MSRs to the list (turns out we have no remove yet) is a rare
> occurrence, and in practice, this LBR addition is the only one which
> happens at runtime rather than domain creation.
> 
> However, you cannot have an efficient fixup on vmenter if the list isn't
> sorted, and it is not possible to sort a list in less than O(n * log(n))
> in the general case.

True, but we're adding incrementally, i.e. the list is already sorted,
and it is already being walked linearly a few lines up from where the
sort() invocation is being added. Hence the addition can as well be
done without sort(), and then in O(n).

Jan



* Re: [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr()
  2017-01-31 12:43           ` Jan Beulich
@ 2017-02-01  9:38             ` Sergey Dyasli
  2017-02-01 10:19               ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Dyasli @ 2017-02-01  9:38 UTC (permalink / raw)
  To: JBeulich; +Cc: Andrew Cooper, Kevin Tian, jun.nakajima, xen-devel

On Tue, 2017-01-31 at 05:43 -0700, Jan Beulich wrote:
> > > > On 31.01.17 at 13:06, <andrew.cooper3@citrix.com> wrote:
> > On 31/01/17 11:54, Jan Beulich wrote:
> > > > > > On 31.01.17 at 12:49, <andrew.cooper3@citrix.com> wrote:
> > > > On 31/01/17 11:29, Jan Beulich wrote:
> > > > > > > > On 25.01.17 at 18:26, <sergey.dyasli@citrix.com> wrote:
> > > > > > @@ -1369,6 +1410,9 @@ int vmx_add_msr(u32 msr, int type)
> > > > > >          msr_area_elem->data = 0;
> > > > > >          __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
> > > > > >          __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
> > > > > > +
> > > > > > +        sort(*msr_area, *msr_count, sizeof(struct vmx_msr_entry),
> > > > > > +             vmx_msr_entry_cmp, vmx_msr_entry_swap);
> > > > > ... how about avoiding the sort() here altogether, by simply
> > > > > going through the list linearly (which, being O(n), is still faster
> > > > > than sort())? The more that there is a linear scan already
> > > > > anyway. At which point it may then be beneficial to also keep
> > > > > the host MSR array sorted.
> > > > The entire point of sorting this list is to trade an O(n) search for
> > > > O(log(n)) in every vmentry when fixing up the LBR MSR values.
> > > > 
> > > > There should be no O(n) searches across the list after this patch.
> > > And that's indeed not the case. But the sort() is O(n * log(n)).
> > I don't understand what point you are trying to make.
> > 
> > Adding MSRs to the list (turns out we have no remove yet) is a rare
> > occurrence, and in practice, this LBR addition is the only one which
> > happens at runtime rather than domain creation.
> > 
> > However, you cannot have an efficient fixup on vmenter if the list isn't
> > sorted, and it is not possible to sort a list in less than O(n * log(n))
> > in the general case.
> True, but we're adding incrementally, i.e. the list is already sorted,
> and it is already being walked linearly a few lines up from where the
> sort() invocation is being added. Hence the addition can as well be
> done without sort(), and then in O(n).

1. Guest's MSR list is not sorted currently, which can be seen from
lbr_info:

    MSR_IA32_LASTINTFROMIP          0x000001dd
    MSR_IA32_LASTINTTOIP            0x000001de
    MSR_C2_LASTBRANCH_TOS           0x000001c9
    MSR_P4_LASTBRANCH_0_FROM_LIP    0x00000680

2. In the future there might be more MSRs in the list and a sorted list
is a prerequisite for fast lookups. Time complexity of vmx_add_msr()
is irrelevant since it's a "one shot" operation.

Thanks,
Sergey

* Re: [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr()
  2017-02-01  9:38             ` Sergey Dyasli
@ 2017-02-01 10:19               ` Jan Beulich
  2017-02-01 10:43                 ` Sergey Dyasli
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2017-02-01 10:19 UTC (permalink / raw)
  To: Sergey Dyasli; +Cc: Andrew Cooper, Kevin Tian, jun.nakajima, xen-devel

>>> On 01.02.17 at 10:38, <sergey.dyasli@citrix.com> wrote:
> On Tue, 2017-01-31 at 05:43 -0700, Jan Beulich wrote:
>> > > > On 31.01.17 at 13:06, <andrew.cooper3@citrix.com> wrote:
>> > On 31/01/17 11:54, Jan Beulich wrote:
>> > > > > > On 31.01.17 at 12:49, <andrew.cooper3@citrix.com> wrote:
>> > > > On 31/01/17 11:29, Jan Beulich wrote:
>> > > > > > > > On 25.01.17 at 18:26, <sergey.dyasli@citrix.com> wrote:
>> > > > > > @@ -1369,6 +1410,9 @@ int vmx_add_msr(u32 msr, int type)
>> > > > > >          msr_area_elem->data = 0;
>> > > > > >          __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
>> > > > > >          __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
>> > > > > > +
>> > > > > > +        sort(*msr_area, *msr_count, sizeof(struct vmx_msr_entry),
>> > > > > > +             vmx_msr_entry_cmp, vmx_msr_entry_swap);
>> > > > > ... how about avoiding the sort() here altogether, by simply
>> > > > > going through the list linearly (which, being O(n), is still faster
>> > > > > than sort())? The more that there is a linear scan already
>> > > > > anyway. At which point it may then be beneficial to also keep
>> > > > > the host MSR array sorted.
>> > > > The entire point of sorting this list is to trade an O(n) search for
>> > > > O(log(n)) in every vmentry when fixing up the LBR MSR values.
>> > > > 
>> > > > There should be no O(n) searches across the list after this patch.
>> > > And that's indeed not the case. But the sort() is O(n * log(n)).
>> > I don't understand what point you are trying to make.
>> > 
>> > Adding MSRs to the list (turns out we have no remove yet) is a rare
>> > occurrence, and in practice, this LBR addition is the only one which
>> > happens at runtime rather than domain creation.
>> > 
>> > However, you cannot have an efficient fixup on vmenter if the list isn't
>> > sorted, and it is not possible to sort a list in less than O(n * log(n))
>> > in the general case.
>> True, but we're adding incrementally, i.e. the list is already sorted,
>> and it is already being walked linearly a few lines up from where the
>> sort() invocation is being added. Hence the addition can as well be
>> done without sort(), and then in O(n).
> 
> 1. Guest's MSR list is not sorted currently, which can be seen from
> lbr_info:
> 
>     MSR_IA32_LASTINTFROMIP          0x000001dd
>     MSR_IA32_LASTINTTOIP            0x000001de
>     MSR_C2_LASTBRANCH_TOS           0x000001c9
>     MSR_P4_LASTBRANCH_0_FROM_LIP    0x00000680

I don't understand: Your patch arranges for the list to be sorted,
doesn't it? All I'm questioning is the approach of how the sorting
is being done - what I'm trying to say is that I think you can do
without any sort() invocation, leveraging the fact that the list
you want to add to is already sorted (inductively, starting from a
zero length list, by always inserting at the right spot, the list will
always be sorted).

> 2. In the future there might be more MSRs in the list and a sorted list
> is a prerequisite for fast lookups. Time complexity of vmx_add_msr()
> is irrelevant since it's a "one shot" operation.

I've never said I'm against sorting.

Jan



* Re: [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr()
  2017-02-01 10:19               ` Jan Beulich
@ 2017-02-01 10:43                 ` Sergey Dyasli
  2017-02-01 10:58                   ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Dyasli @ 2017-02-01 10:43 UTC (permalink / raw)
  To: JBeulich
  Cc: Sergey Dyasli, Kevin Tian, Andrew Cooper, jun.nakajima, xen-devel

On Wed, 2017-02-01 at 03:19 -0700, Jan Beulich wrote:
> > > > On 01.02.17 at 10:38, <sergey.dyasli@citrix.com> wrote:
> > On Tue, 2017-01-31 at 05:43 -0700, Jan Beulich wrote:
> > > > > > On 31.01.17 at 13:06, <andrew.cooper3@citrix.com> wrote:
> > > > On 31/01/17 11:54, Jan Beulich wrote:
> > > > > > > > On 31.01.17 at 12:49, <andrew.cooper3@citrix.com> wrote:
> > > > > > On 31/01/17 11:29, Jan Beulich wrote:
> > > > > > > > > > On 25.01.17 at 18:26, <sergey.dyasli@citrix.com> wrote:
> > > > > > > > @@ -1369,6 +1410,9 @@ int vmx_add_msr(u32 msr, int type)
> > > > > > > >          msr_area_elem->data = 0;
> > > > > > > >          __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
> > > > > > > >          __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
> > > > > > > > +
> > > > > > > > +        sort(*msr_area, *msr_count, sizeof(struct vmx_msr_entry),
> > > > > > > > +             vmx_msr_entry_cmp, vmx_msr_entry_swap);
> > > > > > > ... how about avoiding the sort() here altogether, by simply
> > > > > > > going through the list linearly (which, being O(n), is still faster
> > > > > > > than sort())? The more that there is a linear scan already
> > > > > > > anyway. At which point it may then be beneficial to also keep
> > > > > > > the host MSR array sorted.
> > > > > > The entire point of sorting this list is to trade an O(n) search for
> > > > > > O(log(n)) in every vmentry when fixing up the LBR MSR values.
> > > > > > 
> > > > > > There should be no O(n) searches across the list after this patch.
> > > > > And that's indeed not the case. But the sort() is O(n * log(n)).
> > > > I don't understand what point you are trying to make.
> > > > 
> > > > Adding MSRs to the list (turns out we have no remove yet) is a rare
> > > > occurrence, and in practice, this LBR addition is the only one which
> > > > happens at runtime rather than domain creation.
> > > > 
> > > > However, you cannot have an efficient fixup on vmenter if the list isn't
> > > > sorted, and it is not possible to sort a list in less than O(n * log(n))
> > > > in the general case.
> > > True, but we're adding incrementally, i.e. the list is already sorted,
> > > and it is already being walked linearly a few lines up from where the
> > > sort() invocation is being added. Hence the addition can as well be
> > > done without sort(), and then in O(n).
> > 1. Guest's MSR list is not sorted currently, which can be seen from
> > lbr_info:
> > 
> >     MSR_IA32_LASTINTFROMIP          0x000001dd
> >     MSR_IA32_LASTINTTOIP            0x000001de
> >     MSR_C2_LASTBRANCH_TOS           0x000001c9
> >     MSR_P4_LASTBRANCH_0_FROM_LIP    0x00000680
> I don't understand: Your patch arranges for the list to be sorted,
> doesn't it? All I'm questioning is the approach of how the sorting
> is being done - what I'm trying to say is that I think you can do
> without any sort() invocation, leveraging the fact that the list
> you want to add to is already sorted (inductively, starting from a
> zero length list, by always inserting at the right spot, the list will
> always be sorted).

You are right that the most effective way would be to use insertion sort.
However, this would require writing some new code for it.  Since I'm not
convinced that performance is really critical here, the heap sort code is
simply reused.

> > > 
> > 2. In the future there might be more MSRs in the list and a sorted list
> > is a prerequisite for fast lookups. Time complexity of vmx_add_msr()
> > is irrelevant since it's a "one shot" operation.
> I've never said I'm against sorting.
> > Jan

Thanks,
Sergey

* Re: [PATCH 1/2] x86/VMX: introduce vmx_find_guest_msr()
  2017-02-01 10:43                 ` Sergey Dyasli
@ 2017-02-01 10:58                   ` Jan Beulich
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2017-02-01 10:58 UTC (permalink / raw)
  To: Sergey Dyasli; +Cc: AndrewCooper, Kevin Tian, jun.nakajima, xen-devel

>>> On 01.02.17 at 11:43, <sergey.dyasli@citrix.com> wrote:
> On Wed, 2017-02-01 at 03:19 -0700, Jan Beulich wrote:
>> > > > On 01.02.17 at 10:38, <sergey.dyasli@citrix.com> wrote:
>> > On Tue, 2017-01-31 at 05:43 -0700, Jan Beulich wrote:
>> > > > > > On 31.01.17 at 13:06, <andrew.cooper3@citrix.com> wrote:
>> > > > On 31/01/17 11:54, Jan Beulich wrote:
>> > > > > > > > On 31.01.17 at 12:49, <andrew.cooper3@citrix.com> wrote:
>> > > > > > On 31/01/17 11:29, Jan Beulich wrote:
>> > > > > > > > > > On 25.01.17 at 18:26, <sergey.dyasli@citrix.com> wrote:
>> > > > > > > > @@ -1369,6 +1410,9 @@ int vmx_add_msr(u32 msr, int type)
>> > > > > > > >          msr_area_elem->data = 0;
>> > > > > > > >          __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
>> > > > > > > >          __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
>> > > > > > > > +
>> > > > > > > > +        sort(*msr_area, *msr_count, sizeof(struct vmx_msr_entry),
>> > > > > > > > +             vmx_msr_entry_cmp, vmx_msr_entry_swap);
>> > > > > > > ... how about avoiding the sort() here altogether, by simply
>> > > > > > > going through the list linearly (which, being O(n), is still faster
>> > > > > > > than sort())? The more that there is a linear scan already
>> > > > > > > anyway. At which point it may then be beneficial to also keep
>> > > > > > > the host MSR array sorted.
>> > > > > > The entire point of sorting this list is to trade an O(n) search for
>> > > > > > O(log(n)) in every vmentry when fixing up the LBR MSR values.
>> > > > > > 
>> > > > > > There should be no O(n) searches across the list after this patch.
>> > > > > And that's indeed not the case. But the sort() is O(n * log(n)).
>> > > > I don't understand what point you are trying to make.
>> > > > 
>> > > > Adding MSRs to the list (turns out we have no remove yet) is a rare
>> > > > occurrence, and in practice, this LBR addition is the only one which
>> > > > happens at runtime rather than domain creation.
>> > > > 
>> > > > However, you cannot have an efficient fixup on vmenter if the list isn't
>> > > > sorted, and it is not possible to sort a list in less than O(n * log(n))
>> > > > in the general case.
>> > > True, but we're adding incrementally, i.e. the list is already sorted,
>> > > and it is already being walked linearly a few lines up from where the
>> > > sort() invocation is being added. Hence the addition can as well be
>> > > done without sort(), and then in O(n).
>> > 1. Guest's MSR list is not sorted currently, which can be seen from
>> > lbr_info:
>> > 
>> >     MSR_IA32_LASTINTFROMIP         0x000001dd
>> >     MSR_IA32_LASTINTTOIP            0x000001de
>> >     MSR_C2_LASTBRANCH_TOS           0x000001c9
>> >     MSR_P4_LASTBRANCH_0_FROM_LIP    0x00000680
>> I don't understand: Your patch arranges for the list to be sorted,
>> doesn't it? All I'm questioning is the approach of how the sorting
>> is being done - what I'm trying to say is that I think you can do
>> without any sort() invocation, leveraging the fact that the list
>> you want to add to is already sorted (inductively, starting from a
>> zero length list, by always inserting at the right spot, the list will
>> always be sorted).
> 
> You are right that the most effective way would be to use insertion sort.
> However, this would require writing some new code for it.  Since I'm not
> convinced that performance is really critical here, the heap sort code is
> simply reused.

Which is - I think - more new code than simply leveraging the
existing linear scan over the array, recording the insertion
point, perhaps (since it's sorted already) bailing once an entry
with a higher numbered MSR is found, and memmove()ing any
higher numbered entries. Performance isn't the main aspect
here, but I don't think we should write slow code (even if that
code is being used rarely) when - with about the same amount
of effort/code - we can have a faster variant.
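
A rough sketch of that shape (illustrative only; it assumes the guest MSR
list is kept sorted from the very first insertion and reuses the names from
the existing vmx_add_msr()):

    /* Find the insertion point during the existing duplicate scan. */
    for ( idx = 0; idx < *msr_count; idx++ )
    {
        if ( (*msr_area)[idx].index == msr )
            return 0;                      /* already present */
        if ( (*msr_area)[idx].index > msr )
            break;                         /* sorted: insert here */
    }

    if ( *msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
        return -ENOSPC;

    /* Move higher-numbered entries up one slot to make room at idx. */
    memmove(*msr_area + idx + 1, *msr_area + idx,
            (*msr_count - idx) * sizeof(struct vmx_msr_entry));

    (*msr_area)[idx].index = msr;
    (*msr_area)[idx].data  = 0;
    (*msr_count)++;
    /* ... followed by the existing __vmwrite() of the updated counts. */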

Jan

