* [PATCH v3] x86/Intel: virtualize support for cpuid faulting
From: Kyle Huey @ 2016-10-14 19:47 UTC
  To: xen-devel
  Cc: Andrew Cooper, Kevin Tian, Jun Nakajima, Jan Beulich,
	Robert O'Callahan

rr (http://rr-project.org/), a Linux userspace record-and-replay reverse-
execution debugger, would like to trap and emulate the CPUID instruction.
This would allow us to a) mask away certain hardware features that rr does
not support (e.g. RDRAND) and b) enable trace portability across machines
by providing constant results. Patches for support in the Linux kernel are in
flight, and we'd like to be able to use this feature on virtualized Linux
instances as well.
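
As a sketch of the tracer-side usage this enables (written against the
ARCH_SET_CPUID arch_prctl() interface the in-flight Linux patches
eventually added, merged in Linux 4.12; not part of this series, and the
constant values below are the ones Linux merged):

#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef ARCH_SET_CPUID
#define ARCH_GET_CPUID 0x1011
#define ARCH_SET_CPUID 0x1012
#endif

/* With faulting armed, a user-mode CPUID delivers SIGSEGV; a real tracer
 * would emulate the instruction here instead of exiting. */
static void on_cpuid_fault(int sig)
{
  _exit(0);
}

int main(void)
{
  unsigned int eax = 0;

  signal(SIGSEGV, on_cpuid_fault);

  /* Fails unless the kernel, and underneath it the hypervisor, support
   * CPUID faulting. */
  if (syscall(SYS_arch_prctl, ARCH_SET_CPUID, 0) != 0)
    return 1;

  __asm__ volatile ("cpuid" : "+a" (eax) :: "ebx", "ecx", "edx");
  return 1; /* not reached: the CPUID above faults */
}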

Changes since v2:
Patch 1:
- s/bool_t/bool
- Added missing Signed-off-by.

Patch 2:
- Style nits.
- Made comments in traps.c more descriptive.
- Don't advance IP past the CPUID when faulting from vmx_do_cpuid.
- Do advance IP past the undefined prefix in emulate_forced_invalid_op.
- Deliver a #GP when faulting in emulate_forced_invalid_op (instead of #UD).
- Rearrange cpuid_fault within arch_vcpu.



* [PATCH v3 1/2] x86/Intel: Expose cpuid_faulting_enabled so it can be used elsewhere
From: Kyle Huey @ 2016-10-14 19:47 UTC
  To: xen-devel
  Cc: Andrew Cooper, Kevin Tian, Jun Nakajima, Jan Beulich,
	Robert O'Callahan

While we're here, use bool instead of bool_t.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 xen/arch/x86/cpu/intel.c    | 9 +++++----
 xen/include/asm-x86/cpuid.h | 3 +++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 7b60aaa..2e11662 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -13,34 +13,35 @@
 #include <mach_apic.h>
 #include <asm/hvm/support.h>
 #include <asm/setup.h>
 
 #include "cpu.h"
 
 #define select_idle_routine(x) ((void)0)
 
-static bool_t __init probe_intel_cpuid_faulting(void)
+static bool __init probe_intel_cpuid_faulting(void)
 {
 	uint64_t x;
 
 	if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, x) ||
 	    !(x & MSR_PLATFORM_INFO_CPUID_FAULTING))
 		return 0;
 
 	expected_levelling_cap |= LCAP_faulting;
 	levelling_caps |=  LCAP_faulting;
 	__set_bit(X86_FEATURE_CPUID_FAULTING, boot_cpu_data.x86_capability);
 	return 1;
 }
 
-static void set_cpuid_faulting(bool_t enable)
+DEFINE_PER_CPU(bool, cpuid_faulting_enabled);
+
+static void set_cpuid_faulting(bool enable)
 {
-	static DEFINE_PER_CPU(bool_t, cpuid_faulting_enabled);
-	bool_t *this_enabled = &this_cpu(cpuid_faulting_enabled);
+	bool *this_enabled = &this_cpu(cpuid_faulting_enabled);
 	uint32_t hi, lo;
 
 	ASSERT(cpu_has_cpuid_faulting);
 
 	if (*this_enabled == enable)
 		return;
 
 	rdmsr(MSR_INTEL_MISC_FEATURES_ENABLES, lo, hi);
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index 8e3f639..2372474 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -59,16 +59,19 @@ struct cpuidmasks
 };
 
 /* Per CPU shadows of masking MSR values, for lazy context switching. */
 DECLARE_PER_CPU(struct cpuidmasks, cpuidmasks);
 
 /* Default masking MSR values, calculated at boot. */
 extern struct cpuidmasks cpuidmask_defaults;
 
+/* Whether or not cpuid faulting is available for the current domain. */
+DECLARE_PER_CPU(bool, cpuid_faulting_enabled);
+
 #endif /* __ASSEMBLY__ */
 #endif /* !__X86_CPUID_H__ */
 
 /*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
  * c-basic-offset: 4

base-commit: 71b8b46111219a2f83f4f9ae06ac5409744ea86e
-- 
2.10.1



* [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Kyle Huey @ 2016-10-14 19:47 UTC
  To: xen-devel
  Cc: Andrew Cooper, Kevin Tian, Jun Nakajima, Jan Beulich,
	Robert O'Callahan

On HVM guests, the cpuid triggers a vm exit, so we can check the emulated
faulting state in vmx_do_cpuid and inject a GP(0) if CPL > 0. Notably no
hardware support for faulting on cpuid is necessary to emulate support with an
HVM guest.

On PV guests, hardware support is required so that userspace cpuid will trap
to xen. Xen already enables cpuid faulting on supported CPUs for pv guests (that
aren't the control domain, see the comment in intel_ctxt_switch_levelling).
Every PV guest cpuid will trap via a GP(0) to emulate_privileged_op (via
do_general_protection). Once there we simply decline to emulate cpuid if the
CPL > 0 and faulting is enabled, leaving the GP(0) for the guest kernel to
handle.
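
The guest-kernel side of this protocol, for reference, looks roughly like
the sketch below. This is not part of the patch; it assumes Linux-style
rdmsrl()/wrmsrl() helpers and kernel types, uses the architectural MSR
numbers (PLATFORM_INFO is MSR 0xce, bit 31 advertises faulting support;
MISC_FEATURES_ENABLES is MSR 0x140, bit 0 enables it), and omits error
handling:

#define MSR_PLATFORM_INFO            0x000000ce
#define PLATFORM_INFO_CPUID_FAULT    (1ULL << 31)
#define MSR_MISC_FEATURES_ENABLES    0x00000140
#define MISC_FEATURES_CPUID_FAULT    (1ULL << 0)

static bool guest_enable_cpuid_faulting(void)
{
    u64 val;

    /* Probe: does the CPU (or, with this patch, Xen) offer faulting? */
    rdmsrl(MSR_PLATFORM_INFO, val);
    if (!(val & PLATFORM_INFO_CPUID_FAULT))
        return false;

    /* Enable: set the opt-in bit, preserving the rest of the MSR. */
    rdmsrl(MSR_MISC_FEATURES_ENABLES, val);
    wrmsrl(MSR_MISC_FEATURES_ENABLES, val | MISC_FEATURES_CPUID_FAULT);

    /* From here on, CPUID at CPL > 0 raises #GP(0) for this kernel. */
    return true;
}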

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 xen/arch/x86/hvm/vmx/vmx.c   | 24 ++++++++++++++++++++++--
 xen/arch/x86/traps.c         | 34 ++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h |  3 +++
 3 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b9102ce..c038393 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2427,16 +2427,25 @@ static void vmx_cpuid_intercept(
 
     HVMTRACE_5D (CPUID, input, *eax, *ebx, *ecx, *edx);
 }
 
 static int vmx_do_cpuid(struct cpu_user_regs *regs)
 {
     unsigned int eax, ebx, ecx, edx;
     unsigned int leaf, subleaf;
+    struct segment_register sreg;
+    struct vcpu *v = current;
+
+    hvm_get_segment_register(v, x86_seg_ss, &sreg);
+    if ( v->arch.cpuid_fault && sreg.attr.fields.dpl > 0 )
+    {
+        hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        return 1; /* Don't advance the guest IP! */
+    }
 
     eax = regs->eax;
     ebx = regs->ebx;
     ecx = regs->ecx;
     edx = regs->edx;
 
     leaf = regs->eax;
     subleaf = regs->ecx;
@@ -2694,19 +2703,23 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
     case MSR_IA32_PEBS_ENABLE:
     case MSR_IA32_DS_AREA:
         if ( vpmu_do_rdmsr(msr, msr_content) )
             goto gp_fault;
         break;
 
     case MSR_INTEL_PLATFORM_INFO:
-        if ( rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *msr_content) )
-            goto gp_fault;
+        *msr_content = MSR_PLATFORM_INFO_CPUID_FAULTING;
+        break;
+
+    case MSR_INTEL_MISC_FEATURES_ENABLES:
         *msr_content = 0;
+        if ( current->arch.cpuid_fault )
+            *msr_content |= MSR_MISC_FEATURES_CPUID_FAULTING;
         break;
 
     default:
         if ( passive_domain_do_rdmsr(msr, msr_content) )
             goto done;
         switch ( long_mode_do_msr_read(msr, msr_content) )
         {
             case HNDL_unhandled:
@@ -2925,16 +2938,23 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         break;
 
     case MSR_INTEL_PLATFORM_INFO:
         if ( msr_content ||
              rdmsr_safe(MSR_INTEL_PLATFORM_INFO, msr_content) )
             goto gp_fault;
         break;
 
+    case MSR_INTEL_MISC_FEATURES_ENABLES:
+        if ( msr_content & ~MSR_MISC_FEATURES_CPUID_FAULTING )
+            goto gp_fault;
+        v->arch.cpuid_fault =
+            !!(msr_content & MSR_MISC_FEATURES_CPUID_FAULTING);
+        break;
+
     default:
         if ( passive_domain_do_wrmsr(msr, msr_content) )
             return X86EMUL_OKAY;
 
         if ( wrmsr_viridian_regs(msr, msr_content) ) 
             break;
 
         switch ( long_mode_do_msr_write(msr, msr_content) )
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 293ff8d..12322bd 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1315,16 +1315,24 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     /* We only emulate CPUID. */
     if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
     {
         propagate_page_fault(eip + sizeof(instr) - rc, 0);
         return EXCRET_fault_fixed;
     }
     if ( memcmp(instr, "\xf\xa2", sizeof(instr)) )
         return 0;
+
+    /* If cpuid faulting is enabled and CPL>0 inject a #GP in place of #UD. */
+    if ( current->arch.cpuid_fault && !guest_kernel_mode(current, regs) ) {
+        regs->eip = eip;
+        do_guest_trap(TRAP_gp_fault, regs);
+        return EXCRET_fault_fixed;
+    }
+
     eip += sizeof(instr);
 
     pv_cpuid(regs);
 
     instruction_done(regs, eip, 0);
 
     trace_trap_one_addr(TRC_PV_FORCED_INVALID_OP, regs->eip);
 
@@ -2474,16 +2482,27 @@ static int priv_op_read_msr(unsigned int reg, uint64_t *val,
         *val = 0;
         return X86EMUL_OKAY;
 
     case MSR_INTEL_PLATFORM_INFO:
         if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
              rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *val) )
             break;
         *val = 0;
+        if ( this_cpu(cpuid_faulting_enabled) )
+            *val |= MSR_PLATFORM_INFO_CPUID_FAULTING;
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_MISC_FEATURES_ENABLES:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, *val) )
+            break;
+        *val = 0;
+        if ( curr->arch.cpuid_fault )
+            *val |= MSR_MISC_FEATURES_CPUID_FAULTING;
         return X86EMUL_OKAY;
 
     case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
     case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
     case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
     case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
         {
@@ -2677,16 +2696,27 @@ static int priv_op_write_msr(unsigned int reg, uint64_t val,
         return X86EMUL_OKAY;
 
     case MSR_INTEL_PLATFORM_INFO:
         if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
              val || rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) )
             break;
         return X86EMUL_OKAY;
 
+    case MSR_INTEL_MISC_FEATURES_ENABLES:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             (val & ~MSR_MISC_FEATURES_CPUID_FAULTING) ||
+             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, temp) )
+            break;
+        if ( (val & MSR_MISC_FEATURES_CPUID_FAULTING) &&
+             !this_cpu(cpuid_faulting_enabled) )
+            break;
+        curr->arch.cpuid_fault = !!(val & MSR_MISC_FEATURES_CPUID_FAULTING);
+        return X86EMUL_OKAY;
+
     case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
     case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
     case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
     case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
         {
             vpmu_msr = true;
     case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
@@ -3186,16 +3216,20 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         if ( priv_op_read_msr(regs->_ecx, &val, NULL) != X86EMUL_OKAY )
             goto fail;
  rdmsr_writeback:
         regs->eax = (uint32_t)val;
         regs->edx = (uint32_t)(val >> 32);
         break;
 
     case 0xa2: /* CPUID */
+        /* If cpuid faulting is enabled and CPL>0 leave the #GP untouched. */
+        if ( v->arch.cpuid_fault && !guest_kernel_mode(v, regs) )
+            goto fail;
+
         pv_cpuid(regs);
         break;
 
     default:
         goto fail;
     }
 
 #undef wr_ad
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 5807a1f..27c20cc 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -552,16 +552,19 @@ struct arch_vcpu
      * However, processor should not be able to touch eXtended states before
      * it explicitly enables it via xcr0.
      */
     uint64_t xcr0_accum;
     /* This variable determines whether nonlazy extended state has been used,
      * and thus should be saved/restored. */
     bool_t nonlazy_xstate_used;
 
+    /* Has the guest enabled CPUID faulting? */
+    bool cpuid_fault;
+
     /*
      * The SMAP check policy when updating runstate_guest(v) and the
      * secondary system time.
      */
     smap_check_policy_t smap_check_policy;
 
     struct vmce vmce;
 
-- 
2.10.1



* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Wei Liu @ 2016-10-17 12:32 UTC
  To: Kyle Huey
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Andrew Cooper, xen-devel,
	Jan Beulich, Robert O'Callahan

On Fri, Oct 14, 2016 at 12:47:36PM -0700, Kyle Huey wrote:
> On HVM guests, the cpuid triggers a vm exit, so we can check the emulated
> faulting state in vmx_do_cpuid and inject a GP(0) if CPL > 0. Notably no
> hardware support for faulting on cpuid is necessary to emulate support with an
> HVM guest.
> 
> On PV guests, hardware support is required so that userspace cpuid will trap
> to xen. Xen already enables cpuid faulting on supported CPUs for pv guests (that
> aren't the control domain, see the comment in intel_ctxt_switch_levelling).
> Every PV guest cpuid will trap via a GP(0) to emulate_privileged_op (via
> do_general_protection). Once there we simply decline to emulate cpuid if the
> CPL > 0 and faulting is enabled, leaving the GP(0) for the guest kernel to
> handle.
> 
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>

Andrew expressed a desire to take this patch into 4.8. After reading
the description and code in detail, I think this patch falls into the
"nice-to-have" category.

The main risk here is that this patch may not have architecturally correct
behaviour. I would like to see an ack or review from VT maintainers to
make this patch eligible for acceptance.

Another thing to consider is timing. We plan to cut RC3 before Friday
this week, so if this patch can be acked and becomes part of RC3 I'm
fine with applying it. If not, we shall revisit the situation when it is
acked.

Wei.


* Re: [PATCH v3 1/2] x86/Intel: Expose cpuid_faulting_enabled so it can be used elsewhere
From: Andrew Cooper @ 2016-10-17 12:35 UTC
  To: Kyle Huey, xen-devel
  Cc: Kevin Tian, Jun Nakajima, Jan Beulich, Robert O'Callahan

On 14/10/16 20:47, Kyle Huey wrote:
> While we're here, use bool instead of bool_t.
>
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v3 1/2] x86/Intel: Expose cpuid_faulting_enabled so it can be used elsewhere
From: Wei Liu @ 2016-10-17 12:43 UTC
  To: Kyle Huey
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Andrew Cooper, xen-devel,
	Jan Beulich, Robert O'Callahan

On Fri, Oct 14, 2016 at 12:47:35PM -0700, Kyle Huey wrote:
> While we're here, use bool instead of bool_t.
> 
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>

FWIW:
Reviewed-by: Wei Liu <wei.liu2@citrix.com>

(since I happened to read this series per Andrew's request)


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Andrew Cooper @ 2016-10-17 12:49 UTC
  To: Kyle Huey, xen-devel
  Cc: Kevin Tian, Jun Nakajima, Jan Beulich, Robert O'Callahan

On 14/10/16 20:47, Kyle Huey wrote:
> On HVM guests, the cpuid triggers a vm exit, so we can check the emulated
> faulting state in vmx_do_cpuid and inject a GP(0) if CPL > 0. Notably no
> hardware support for faulting on cpuid is necessary to emulate support with an
> HVM guest.
>
> On PV guests, hardware support is required so that userspace cpuid will trap
> to xen. Xen already enables cpuid faulting on supported CPUs for pv guests (that

to Xen.

> aren't the control domain, see the comment in intel_ctxt_switch_levelling).
> Every PV guest cpuid will trap via a GP(0) to emulate_privileged_op (via
> do_general_protection). Once there we simply decline to emulate cpuid if the
> CPL > 0 and faulting is enabled, leaving the GP(0) for the guest kernel to
> handle.
>
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c   | 24 ++++++++++++++++++++++--
>  xen/arch/x86/traps.c         | 34 ++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/domain.h |  3 +++
>  3 files changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index b9102ce..c038393 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2427,16 +2427,25 @@ static void vmx_cpuid_intercept(
>  
>      HVMTRACE_5D (CPUID, input, *eax, *ebx, *ecx, *edx);
>  }
>  
>  static int vmx_do_cpuid(struct cpu_user_regs *regs)
>  {
>      unsigned int eax, ebx, ecx, edx;
>      unsigned int leaf, subleaf;
> +    struct segment_register sreg;
> +    struct vcpu *v = current;
> +
> +    hvm_get_segment_register(v, x86_seg_ss, &sreg);
> +    if ( v->arch.cpuid_fault && sreg.attr.fields.dpl > 0 )
> +    {
> +        hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +        return 1; /* Don't advance the guest IP! */
> +    }

Thinking about it, the segment register query can be skipped in the
likely case that faulting isn't enabled.  Could this be re-arranged to:

if ( v->arch.cpuid_fault )
{
    struct segment_register sreg;

    hvm_get_segment_register(v, x86_seg_ss, &sreg);
    if ( sreg.attr.fields.dpl > 0 )
    {
        hvm_inject_hw_exception(TRAP_gp_fault, 0);
        return 1; /* Don't advance the guest IP! */
    }
}

With these two minor issues taken care of, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Kyle Huey @ 2016-10-20  5:10 UTC
  To: Wei Liu
  Cc: Kevin Tian, Jun Nakajima, Andrew Cooper, xen-devel, Jan Beulich,
	Robert O'Callahan

On Mon, Oct 17, 2016 at 5:32 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Fri, Oct 14, 2016 at 12:47:36PM -0700, Kyle Huey wrote:
>> On HVM guests, the cpuid triggers a vm exit, so we can check the emulated
>> faulting state in vmx_do_cpuid and inject a GP(0) if CPL > 0. Notably no
>> hardware support for faulting on cpuid is necessary to emulate support with an
>> HVM guest.
>>
>> On PV guests, hardware support is required so that userspace cpuid will trap
>> to xen. Xen already enables cpuid faulting on supported CPUs for pv guests (that
>> aren't the control domain, see the comment in intel_ctxt_switch_levelling).
>> Every PV guest cpuid will trap via a GP(0) to emulate_privileged_op (via
>> do_general_protection). Once there we simply decline to emulate cpuid if the
>> CPL > 0 and faulting is enabled, leaving the GP(0) for the guest kernel to
>> handle.
>>
>> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
>
> Andrew expressed the desire of taking this patch into 4.8. After reading
> the description and code in detail, I think this patch falls into the
> "nice-to-have" category.
>
> The main risk here is this patch doesn't have architecturally correct
> behaviour. I would like to see an ack or review from VT maintainers to
> make this patch eligible for acceptance.
>
> Another thing to consider is timing. We plan to cut RC3 before Friday
> this week, so if this patch can be acked and becomes part of RC3 I'm
> fine with applying it. If not, we shall revisit the situation when it is
> acked.

Kevin Tian reviewed the patch yesterday, so I think we're just waiting
for a final review from Andrew here.

That said, rr currently does not work in Xen guests due to some PMU
issues that we haven't tracked down yet.  So for us it's not a big
deal if this feature does not make it into 4.8.  I won't be
disappointed if you cut it from 4.8 to reduce technical risk.

- Kyle


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Andrew Cooper @ 2016-10-20  7:56 UTC
  To: Kyle Huey, Wei Liu
  Cc: Robert O'Callahan, Kevin Tian, Jan Beulich, Jun Nakajima, xen-devel

On 20/10/2016 06:10, Kyle Huey wrote:
> On Mon, Oct 17, 2016 at 5:32 AM, Wei Liu <wei.liu2@citrix.com> wrote:
>> On Fri, Oct 14, 2016 at 12:47:36PM -0700, Kyle Huey wrote:
>>> On HVM guests, the cpuid triggers a vm exit, so we can check the emulated
>>> faulting state in vmx_do_cpuid and inject a GP(0) if CPL > 0. Notably no
>>> hardware support for faulting on cpuid is necessary to emulate support with an
>>> HVM guest.
>>>
>>> On PV guests, hardware support is required so that userspace cpuid will trap
>>> to xen. Xen already enables cpuid faulting on supported CPUs for pv guests (that
>>> aren't the control domain, see the comment in intel_ctxt_switch_levelling).
>>> Every PV guest cpuid will trap via a GP(0) to emulate_privileged_op (via
>>> do_general_protection). Once there we simply decline to emulate cpuid if the
>>> CPL > 0 and faulting is enabled, leaving the GP(0) for the guest kernel to
>>> handle.
>>>
>>> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
>> Andrew expressed the desire of taking this patch into 4.8. After reading
>> the description and code in detail, I think this patch falls into the
>> "nice-to-have" category.
>>
>> The main risk here is this patch doesn't have architecturally correct
>> behaviour. I would like to see an ack or review from VT maintainers to
>> make this patch eligible for acceptance.
>>
>> Another thing to consider is timing. We plan to cut RC3 before Friday
>> this week, so if this patch can be acked and becomes part of RC3 I'm
>> fine with applying it. If not, we shall revisit the situation when it is
>> acked.
> Kevin Tian reviewed the patch yesterday, so I think we're just waiting
> for a final review from Andrew here.

Ah - I am just waiting for your final respin with the comments so far
addressed.

>
> That said, rr currently does not work in Xen guests due to some PMU
> issues that we haven't tracked down yet.

Is this RR trying to use vPMU and it not functioning, or not
specifically trying to use PMU facilities and getting stuck anyway?

>   So for us it's not a big
> deal if this feature does not make it into 4.8.  I won't be
> disappointed if you cut it from 4.8 to reduce technical risk.

From my point of view, its a small feature with working code and a
comprehensive test case ready to go straight into regression testing. 
This makes it the least risky feature going.

~Andrew


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Kyle Huey @ 2016-10-20 13:55 UTC
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, xen-devel, Jan Beulich,
	Robert O'Callahan

On Thu, Oct 20, 2016 at 12:56 AM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 20/10/2016 06:10, Kyle Huey wrote:
>> On Mon, Oct 17, 2016 at 5:32 AM, Wei Liu <wei.liu2@citrix.com> wrote:
>>> On Fri, Oct 14, 2016 at 12:47:36PM -0700, Kyle Huey wrote:
>>>> On HVM guests, the cpuid triggers a vm exit, so we can check the emulated
>>>> faulting state in vmx_do_cpuid and inject a GP(0) if CPL > 0. Notably no
>>>> hardware support for faulting on cpuid is necessary to emulate support with an
>>>> HVM guest.
>>>>
>>>> On PV guests, hardware support is required so that userspace cpuid will trap
>>>> to xen. Xen already enables cpuid faulting on supported CPUs for pv guests (that
>>>> aren't the control domain, see the comment in intel_ctxt_switch_levelling).
>>>> Every PV guest cpuid will trap via a GP(0) to emulate_privileged_op (via
>>>> do_general_protection). Once there we simply decline to emulate cpuid if the
>>>> CPL > 0 and faulting is enabled, leaving the GP(0) for the guest kernel to
>>>> handle.
>>>>
>>>> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
>>> Andrew expressed the desire of taking this patch into 4.8. After reading
>>> the description and code in detail, I think this patch falls into the
>>> "nice-to-have" category.
>>>
>>> The main risk here is this patch doesn't have architecturally correct
>>> behaviour. I would like to see an ack or review from VT maintainers to
>>> make this patch eligible for acceptance.
>>>
>>> Another thing to consider is timing. We plan to cut RC3 before Friday
>>> this week, so if this patch can be acked and becomes part of RC3 I'm
>>> fine with applying it. If not, we shall revisit the situation when it is
>>> acked.
>> Kevin Tian reviewed the patch yesterday, so I think we're just waiting
>> for a final review from Andrew here.
>
> Ah - I am just waiting for your final respin with the comments so far
> addressed.

Oh.  I thought I already did that ... though apparently I didn't.  I
must have forgotten to remove --dry-run or something.  Anyways, it's
sent now.

>> That said, rr currently does not work in Xen guests due to some PMU
>> issues that we haven't tracked down yet.
>
> Is this RR trying to use vPMU and it not functioning, or not
> specifically trying to use PMU facilities and getting stuck anyway?

The latter.  rr relies on the values returned by the PMU (the retired
conditional branches counter in particular) being exactly the same
during the recording and replay phases.  This is true when running on
bare metal, and when running inside a KVM guest, but when running in a
Xen HVM guest we see values that are off by a branch or two on a small
fraction of our tests.  Since it works in KVM I suspect this is some
sort of issue with how Xen multiplexes the real PMU and events are
"leaking" between guests (or perhaps from Xen itself, though I don't
think the Xen kernel executes any ring 3 code).  Even if that's
correct we're a long way from tracking it down and patching it though.

>>   So for us it's not a big
>> deal if this feature does not make it into 4.8.  I won't be
>> disappointed if you cut it from 4.8 to reduce technical risk.
>
> From my point of view, its a small feature with working code and a
> comprehensive test case ready to go straight into regression testing.
> This makes it the least risky feature going.

- Kyle


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Andrew Cooper @ 2016-10-20 14:11 UTC
  To: Kyle Huey
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, xen-devel, Jan Beulich,
	Boris Ostrovsky, Robert O'Callahan

On 20/10/16 14:55, Kyle Huey wrote:
>
>>> That said, rr currently does not work in Xen guests due to some PMU
>>> issues that we haven't tracked down yet.
>> Is this RR trying to use vPMU and it not functioning, or not
>> specifically trying to use PMU facilities and getting stuck anyway?
> The latter.  rr relies on the values returned by the PMU (the retired
> conditional branches counter in particular) being exactly the same
> during the recording and replay phases.  This is true when running on
> bare metal, and when running inside a KVM guest, but when running in a
> Xen HVM guest we see values that are off by a branch or two on a small
> fraction of our tests.  Since it works in KVM I suspect this is some
> sort of issue with how Xen multiplexes the real PMU and events are
> "leaking" between guests (or perhaps from Xen itself, though I don't
> think the Xen kernel executes any ring 3 code).  Even if that's
> correct we're a long way from tracking it down and patching it though.

Hmm.  That is unfortunate, and does point towards a bug in Xen.  Are
these tests which notice the problem easy to run?

Boris (CC'd) is the maintainer of that code.  It has undergone quite a
few changes recently.

~Andrew


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Boris Ostrovsky @ 2016-10-20 14:40 UTC
  To: Andrew Cooper, Kyle Huey
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, xen-devel, Jan Beulich,
	Robert O'Callahan

On 10/20/2016 10:11 AM, Andrew Cooper wrote:
> On 20/10/16 14:55, Kyle Huey wrote:
>>>> That said, rr currently does not work in Xen guests due to some PMU
>>>> issues that we haven't tracked down yet.
>>> Is this RR trying to use vPMU and it not functioning, or not
>>> specifically trying to use PMU facilities and getting stuck anyway?
>> The latter.  rr relies on the values returned by the PMU (the retired
>> conditional branches counter in particular) being exactly the same
>> during the recording and replay phases.  This is true when running on
>> bare metal, and when running inside a KVM guest, but when running in a
>> Xen HVM guest we see values that are off by a branch or two on a small
>> fraction of our tests.  Since it works in KVM I suspect this is some
>> sort of issue with how Xen multiplexes the real PMU and events are
>> "leaking" between guests (or perhaps from Xen itself, though I don't
>> think the Xen kernel executes any ring 3 code).  Even if that's
>> correct we're a long way from tracking it down and patching it though.
> Hmm.  That is unfortunate, and does point towards a bug in Xen.  Are
> these tests which notice the problem easy to run?
>
> Boris (CC'd) is the maintainer of that code.  It has undergone quite a
> few changes recently.

I am actually not the maintainer, I just break this code more often than
others.

But yes, having a test case would make it much easier to understand what
is not working and why.

Would something like

    wrmsr(PERFCTR,0);
    wrmsr(EVNTSEL, XXX); //enable counter
    // do something simple, with branches
    wrmsr(EVNTSEL, YYY); // disable counter
   
demonstrate the problem? (I assume we are talking about HVM guest)

-boris


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Kyle Huey @ 2016-10-21 15:52 UTC
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Andrew Cooper, xen-devel,
	Jan Beulich, Robert O'Callahan

On Thu, Oct 20, 2016 at 7:40 AM, Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
> On 10/20/2016 10:11 AM, Andrew Cooper wrote:
>> On 20/10/16 14:55, Kyle Huey wrote:
>>>>> That said, rr currently does not work in Xen guests due to some PMU
>>>>> issues that we haven't tracked down yet.
>>>> Is this RR trying to use vPMU and it not functioning, or not
>>>> specifically trying to use PMU facilities and getting stuck anyway?
>>> The latter.  rr relies on the values returned by the PMU (the retired
>>> conditional branches counter in particular) being exactly the same
>>> during the recording and replay phases.  This is true when running on
>>> bare metal, and when running inside a KVM guest, but when running in a
>>> Xen HVM guest we see values that are off by a branch or two on a small
>>> fraction of our tests.  Since it works in KVM I suspect this is some
>>> sort of issue with how Xen multiplexes the real PMU and events are
>>> "leaking" between guests (or perhaps from Xen itself, though I don't
>>> think the Xen kernel executes any ring 3 code).  Even if that's
>>> correct we're a long way from tracking it down and patching it though.
>> Hmm.  That is unfortunate, and does point towards a bug in Xen.  Are
>> these tests which notice the problem easy to run?
>>
>> Boris (CC'd) is the maintainer of that code.  It has undergone quite a
>> few changes recently.
>
> I am actually not the maintainer, I just break this code more often than
> others.
>
> But yes, having a test case would make it much easier to understand what
> and why is not working.
>
> Would something like
>
>     wrmsr(PERFCTR,0);
>     wrmsr(EVNTSEL, XXX); //enable counter
>     // do something simple, with branches
>     wrmsr(EVTSEL,YYY); // disable counter
>
> demonstrate the problem? (I assume we are talking about HVM guest)
>
> -boris
>

That is a good question.  I'll see if I can reduce the problem down
from "run Linux and run our tests inside it".

- Kyle


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Kyle Huey @ 2016-10-24  4:18 UTC
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Andrew Cooper, xen-devel,
	Jan Beulich, Robert O'Callahan

On Fri, Oct 21, 2016 at 8:52 AM, Kyle Huey <me@kylehuey.com> wrote:
> On Thu, Oct 20, 2016 at 7:40 AM, Boris Ostrovsky
> <boris.ostrovsky@oracle.com> wrote:
>> On 10/20/2016 10:11 AM, Andrew Cooper wrote:
>>> On 20/10/16 14:55, Kyle Huey wrote:
>>>>>> That said, rr currently does not work in Xen guests due to some PMU
>>>>>> issues that we haven't tracked down yet.
>>>>> Is this RR trying to use vPMU and it not functioning, or not
>>>>> specifically trying to use PMU facilities and getting stuck anyway?
>>>> The latter.  rr relies on the values returned by the PMU (the retired
>>>> conditional branches counter in particular) being exactly the same
>>>> during the recording and replay phases.  This is true when running on
>>>> bare metal, and when running inside a KVM guest, but when running in a
>>>> Xen HVM guest we see values that are off by a branch or two on a small
>>>> fraction of our tests.  Since it works in KVM I suspect this is some
>>>> sort of issue with how Xen multiplexes the real PMU and events are
>>>> "leaking" between guests (or perhaps from Xen itself, though I don't
>>>> think the Xen kernel executes any ring 3 code).  Even if that's
>>>> correct we're a long way from tracking it down and patching it though.
>>> Hmm.  That is unfortunate, and does point towards a bug in Xen.  Are
>>> these tests which notice the problem easy to run?
>>>
>>> Boris (CC'd) is the maintainer of that code.  It has undergone quite a
>>> few changes recently.
>>
>> I am actually not the maintainer, I just break this code more often than
>> others.
>>
>> But yes, having a test case would make it much easier to understand what
>> and why is not working.
>>
>> Would something like
>>
>>     wrmsr(PERFCTR,0);
>>     wrmsr(EVNTSEL, XXX); //enable counter
>>     // do something simple, with branches
>>     wrmsr(EVTSEL,YYY); // disable counter
>>
>> demonstrate the problem? (I assume we are talking about HVM guest)
>>
>> -boris
>>
>
> That is a good question.  I'll see if I can reduce the problem down
> from "run Linux and run our tests inside it".

The anomalies we see appear to be related to, or at least triggerable
by, the performance monitoring interrupt.  The following program runs
a loop of roughly 2^25 conditional branches.  It takes one argument,
the number of conditional branches to program the PMI to trigger on.
The default is 50,000, and if you run the program with that it'll
produce the same value every time.  If you drop it to 5000 or so
you'll probably see occasional off-by-one discrepancies.  If you drop
it to 500 the performance counter values fluctuate wildly.

I'm not yet sure if this is specifically related to the PMI, or if it
can be caused by any interrupt and it's only how frequently the
interrupts occur that matters.

- Kyle

#define _GNU_SOURCE 1

#include <assert.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static struct perf_event_attr rcb_attr;
static uint64_t period;
static int fd;

void counter_on(uint64_t ticks)
{
  int ret = ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  assert(!ret);
  ret = ioctl(fd, PERF_EVENT_IOC_PERIOD, &ticks);
  assert(!ret);
  ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, 1);
  assert(!ret);
}

void counter_off()
{
  int ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  assert(!ret);
}

int64_t read_counter()
{
  int64_t val;
  ssize_t nread = read(fd, &val, sizeof(val));
  assert(nread == sizeof(val));
  return val;
}

void do_test()
{
  int64_t counts;
  int i, dummy;

  counter_on(period);
  for (i = 0; i < (1 << 25); i++) {
    dummy += i % (1 << 10);
    dummy += i % (79 * (1 << 10));
  }

  counter_off();
  counts = read_counter();
  printf("Counted %ld conditional branches\n", counts);
}

int main(int argc, const char* argv[])
{
  memset(&rcb_attr, 0, sizeof(rcb_attr));
  rcb_attr.size = sizeof(rcb_attr);
  rcb_attr.type = PERF_TYPE_RAW;
  /* Intel retired conditional branches counter, ring 3 only */
  rcb_attr.config = 0x5101c4;
  rcb_attr.exclude_kernel = 1;
  rcb_attr.exclude_guest = 1;
  /* We'll change this later */
  rcb_attr.sample_period = 0xffffffff;

  /* start the counter */
  fd = syscall(__NR_perf_event_open, &rcb_attr, 0, -1, -1, 0);
  if (fd < 0) {
    printf("Failed to initialize counter\n");
    return -1;
  }

  signal(SIGALRM, SIG_IGN);

  if (fcntl(fd, F_SETFL, O_ASYNC) || fcntl(fd, F_SETSIG, SIGALRM)) {
    printf("Failed to make counter async\n");
    return -1;
  }

  counter_off();

  period = 50000;
  if (argc > 1) {
    sscanf(argv[1], "%ld", &period);
  }

  printf("Period is %ld\n", period);

  do_test();

  return 0;
}


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Boris Ostrovsky @ 2016-10-24 15:05 UTC
  To: Kyle Huey
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Andrew Cooper, xen-devel,
	Jan Beulich, Robert O'Callahan

On 10/24/2016 12:18 AM, Kyle Huey wrote:
>
> The anomalies we see appear to be related to, or at least triggerable
> by, the performance monitoring interrupt.  The following program runs
> a loop of roughly 2^25 conditional branches.  It takes one argument,
> the number of conditional branches to program the PMI to trigger on.
> The default is 50,000, and if you run the program with that it'll
> produce the same value every time.  If you drop it to 5000 or so
> you'll probably see occasional off-by-one discrepancies.  If you drop
> it to 500 the performance counter values fluctuate wildly.

Yes, it does change but I also see the difference on baremetal (although
not as big as it is in an HVM guest):
ostr@workbase> ./pmu 500
Period is 500
Counted 5950003 conditional branches
ostr@workbase> ./pmu 500
Period is 500
Counted 5850003 conditional branches
ostr@workbase> ./pmu 500
Period is 500
Counted 7530107 conditional branches
ostr@workbase>



>
> I'm not yet sure if this is specifically related to the PMI, or if it
> can be caused by any interrupt and it's only how frequently the
> interrupts occur that matters.

I have never used file interface to performance counters, but what are
we reporting here (in read_counter()) --- total number of events or
number of events since last sample? It is also curious to me that the
counter is non-zero after PERF_EVENT_IOC_RESET (but again, I don't have
any experience with these interfaces).

Also, exclude_guest doesn't appear to make any difference, I don't know
if there are any bits in Intel counters that allow you to distinguish
guest from host (unlike AMD, where there is a bit for that).


-boris


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Kyle Huey @ 2016-10-24 19:22 UTC
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Andrew Cooper, xen-devel,
	Jan Beulich, Robert O'Callahan

On Mon, Oct 24, 2016 at 8:05 AM, Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
> On 10/24/2016 12:18 AM, Kyle Huey wrote:
>>
>> The anomalies we see appear to be related to, or at least triggerable
>> by, the performance monitoring interrupt.  The following program runs
>> a loop of roughly 2^25 conditional branches.  It takes one argument,
>> the number of conditional branches to program the PMI to trigger on.
>> The default is 50,000, and if you run the program with that it'll
>> produce the same value every time.  If you drop it to 5000 or so
>> you'll probably see occasional off-by-one discrepancies.  If you drop
>> it to 500 the performance counter values fluctuate wildly.
>
> Yes, it does change but I also see the difference on baremetal (although
> not as big as it is in an HVM guest):
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 5950003 conditional branches
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 5850003 conditional branches
> ostr@workbase> ./pmu 500
> Period is 500
> Counted 7530107 conditional branches
> ostr@workbase>

Yeah, you're right.  I simplified the testcase too far.  I have
included a better one.  This testcase is stable on bare metal (down to
an interrupt every 10 branches, I didn't try below that) and more
accurately represents what our software actually does.  rr acts as a
ptrace supervisor to the process being recorded, and it seems that
context switching between the supervisor and tracee processes
stabilizes the performance counter values somehow.

>> I'm not yet sure if this is specifically related to the PMI, or if it
>> can be caused by any interrupt and it's only how frequently the
>> interrupts occur that matters.
>
> I have never used file interface to performance counters, but what are
> we reporting here (in read_counter()) --- total number of events or
> number of events since last sample? It is also curious to me that the
> counter in non-zero after  PERF_EVENT_IOC_RESET (but again, I don't have
> any experience with these interfaces).

It should be number of events since the last time the counter was
reset (or overflowed, I guess).  On my machine the counter value is
zero both before and after the PERF_EVENT_IOC_RESET ioctl.

> Also, exclude_guest doesn't appear to make any difference, I don't know
> if there are any bits in Intel counters that allow you to distinguish
> guest from host (unlike AMD, where there is a bit for that).

exclude_guest is a Linux specific thing for excluding KVM guests.
There is no hardware support involved; it's handled entirely in the
perf events infrastructure in the kernel.

- Kyle

#define _GNU_SOURCE 1

#include <assert.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static struct perf_event_attr rcb_attr;
static uint64_t period;
static int fd;

void counter_on(uint64_t ticks)
{
  int ret = ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  assert(!ret);
  ret = ioctl(fd, PERF_EVENT_IOC_PERIOD, &ticks);
  assert(!ret);
  ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, 1);
  assert(!ret);
}

void counter_off()
{
  int ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  assert(!ret);
}

int64_t read_counter()
{
  int64_t val;
  ssize_t nread = read(fd, &val, sizeof(val));
  assert(nread == sizeof(val));
  return val;
}

void do_test()
{
  int i, dummy;

  for (i = 0; i < (1 << 25); i++) {
    dummy += i % (1 << 10);
    dummy += i % (79 * (1 << 10));
  }
}

int main(int argc, const char* argv[])
{
  int pid;
  memset(&rcb_attr, 0, sizeof(rcb_attr));
  rcb_attr.size = sizeof(rcb_attr);
  rcb_attr.type = PERF_TYPE_RAW;
  /* Intel retired conditional branches counter, ring 3 only */
  rcb_attr.config = 0x5101c4;
  rcb_attr.exclude_kernel = 1;
  rcb_attr.exclude_guest = 1;
  /* We'll change this later */
  rcb_attr.sample_period = 0xffffffff;

  signal(SIGALRM, SIG_IGN);
  pid = fork();
  if (pid == 0) {
    /* Wait for the parent */
    kill(getpid(), SIGSTOP);
    do_test();
    return 0;
  }

  /* start the counter */
  fd = syscall(__NR_perf_event_open, &rcb_attr, pid, -1, -1, 0);
  if (fd < 0) {
    printf("Failed to initialize counter\n");
    return -1;
  }

  counter_off();

  struct f_owner_ex own;
  own.type = F_OWNER_PID;
  own.pid = pid;
  if (fcntl(fd, F_SETOWN_EX, &own) ||
      fcntl(fd, F_SETFL, O_ASYNC) ||
      fcntl(fd, F_SETSIG, SIGALRM)) {
    printf("Failed to make counter async\n");
    return -1;
  }

  period = 50000;
  if (argc > 1) {
    sscanf(argv[1], "%ld", &period);
  }

  printf("Period is %ld\n", period);

  counter_on(period);
  ptrace(PTRACE_SEIZE, pid, NULL, 0);
  ptrace(PTRACE_CONT, pid, NULL, SIGCONT);

  int status = 0;
  while (1) {
    waitpid(pid, &status, 0);
    if (WIFEXITED(status)) {
      break;
    }
    if (WIFSIGNALED(status)) {
      assert(0);
      continue;
    }
    if (WIFSTOPPED(status)) {
      if (WSTOPSIG(status) == SIGALRM ||
          WSTOPSIG(status) == SIGSTOP) {
        ptrace(PTRACE_CONT, pid, NULL, WSTOPSIG(status));
        continue;
      }
    }
    assert(0 && "unhandled ptrace event!");
  }

  counter_off();
  int64_t counts = read_counter();
  printf("Counted %ld conditional branches\n", counts);

  return 0;
}


* Re: [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting
From: Boris Ostrovsky @ 2016-10-24 21:15 UTC
  To: Kyle Huey
  Cc: Kevin Tian, Wei Liu, Jun Nakajima, Andrew Cooper, xen-devel,
	Jan Beulich, Robert O'Callahan

On 10/24/2016 03:22 PM, Kyle Huey wrote:
> On Mon, Oct 24, 2016 at 8:05 AM, Boris Ostrovsky
> <boris.ostrovsky@oracle.com> wrote:
>> On 10/24/2016 12:18 AM, Kyle Huey wrote:
>>> The anomalies we see appear to be related to, or at least triggerable
>>> by, the performance monitoring interrupt.  The following program runs
>>> a loop of roughly 2^25 conditional branches.  It takes one argument,
>>> the number of conditional branches to program the PMI to trigger on.
>>> The default is 50,000, and if you run the program with that it'll
>>> produce the same value every time.  If you drop it to 5000 or so
>>> you'll probably see occasional off-by-one discrepancies.  If you drop
>>> it to 500 the performance counter values fluctuate wildly.
>> Yes, it does change but I also see the difference on baremetal (although
>> not as big as it is in an HVM guest):
>> ostr@workbase> ./pmu 500
>> Period is 500
>> Counted 5950003 conditional branches
>> ostr@workbase> ./pmu 500
>> Period is 500
>> Counted 5850003 conditional branches
>> ostr@workbase> ./pmu 500
>> Period is 500
>> Counted 7530107 conditional branches
>> ostr@workbase>
> Yeah, you're right.  I simplified the testcase too far.  I have
> included a better one.  This testcase is stable on bare metal (down to
> an interrupt every 10 branches, I didn't try below that) and more
> accurately represents what our software actually does. 

When I run this program in a loop the first iteration is always off:

ostr@workbase> while [ true ]; do taskset -c 0 ./pmu 500|grep -v Period
; done
Counted 33554556 conditional branches
Counted 33554729 conditional branches
Counted 33554729 conditional branches
...

but then it indeed is stable. Could it be "priming" the branch
predictor? Does this counter count mis-predicted branches? Probably not
since the first number is smaller than the rest.


>  rr acts as a
> ptrace supervisor to the process being recorded, and it seems that
> context switching between the supervisor and tracee processes
> stabilizes the performance counter values somehow.

Not sure I understand what you mean by this. The supervising thread is
presumably sitting in kernel (in waitpid()) so nothing should be counted
for it.

-boris



