* [PATCH 1/3] x86/levelling: Restrict non-architectural OSXSAVE handling to emulated CPUID
From: Andrew Cooper @ 2016-08-23 17:26 UTC
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

There is no need to extend the workaround to the faulted CPUID view, as
Linux's dependence on the workaround is strictly via the emulated view.

This causes a guest kernel's faulted CPUID view to observe architectural
behaviour with respect to its CR4.OSXSAVE setting.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
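
For illustration (not part of the patch): PV Linux reaches the emulated
view by prefixing CPUID with Xen's forced-emulation sequence, which raises
a #UD that Xen services, while a bare CPUID on faulting-capable hardware
arrives via #GP instead.  A minimal sketch of such a wrapper, assuming GCC
inline assembly and the standard prefix encoding:

    #include <stdint.h>

    /*
     * Illustrative sketch only.  The ud2a + ".ascii xen" prefix makes
     * the following CPUID trap with #UD, so the request reaches
     * pv_cpuid() with regs->entry_vector == TRAP_invalid_op, i.e. the
     * emulated view which retains the leaked OSXSAVE for the kernel.
     */
    static inline void emulated_cpuid(uint32_t leaf, uint32_t *eax,
                                      uint32_t *ebx, uint32_t *ecx,
                                      uint32_t *edx)
    {
        asm volatile ( "ud2a ; .ascii \"xen\" ; cpuid"
                       : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                       : "0" (leaf), "2" (0) );
    }
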
 xen/arch/x86/traps.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 3df0295..c95fadb 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1003,6 +1003,8 @@ void pv_cpuid(struct cpu_user_regs *regs)
              *
              * Therefore, the leaking of Xen's OSXSAVE setting has become a
              * defacto part of the PV ABI and can't reasonably be corrected.
+             * It can however be restricted to only the enlightened CPUID
+             * view, as seen by the guest kernel.
              *
              * The following situations and logic now applies:
              *
@@ -1016,14 +1018,18 @@ void pv_cpuid(struct cpu_user_regs *regs)
              *
              * - Enlightened CPUID or CPUID faulting available:
              *    Xen can fully control what is seen here.  Guest kernels need
-             *    to see the leaked OSXSAVE, but guest userspace is given
-             *    architectural behaviour, to reflect the guest kernels
-             *    intentions.
+             *    to see the leaked OSXSAVE via the enlightened path, but
+             *    guest userspace and the native view are given
+             *    architectural behaviour.
+             *
+             *    Emulated vs Faulted CPUID is distinguished based on whether a
+             *    #UD or #GP is currently being serviced.
              */
             /* OSXSAVE cleared by pv_featureset.  Fast-forward CR4 back in. */
-            if ( (guest_kernel_mode(curr, regs) &&
-                  (read_cr4() & X86_CR4_OSXSAVE)) ||
-                 (curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE) )
+            if ( (curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE) ||
+                 (regs->entry_vector == TRAP_invalid_op &&
+                  guest_kernel_mode(curr, regs) &&
+                  (read_cr4() & X86_CR4_OSXSAVE)) )
                 c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
 
             /*
-- 
2.1.4



* [PATCH 2/3] x86/levelling: Pass a vcpu rather than a domain to ctxt_switch_levelling()
From: Andrew Cooper @ 2016-08-23 17:26 UTC
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

A subsequent change needs to special-case OSXSAVE handling, which is per-vcpu
rather than per-domain.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
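
As a reference for the pattern used below (a sketch, not part of the
patch): each implementation recovers the domain from the vcpu, tolerating
the NULL vcpu which the cpu bringup and crash paths pass to level back to
the default host state.

    #include <stddef.h>

    struct domain;
    struct vcpu { struct domain *domain; /* ... */ };

    /* Sketch of the NULL-tolerant domain lookup added in this patch. */
    static const struct domain *levelling_domain(const struct vcpu *next)
    {
        return next ? next->domain : NULL;
    }
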
 xen/arch/x86/cpu/amd.c          | 3 ++-
 xen/arch/x86/cpu/common.c       | 4 ++--
 xen/arch/x86/cpu/intel.c        | 3 ++-
 xen/arch/x86/domain.c           | 2 +-
 xen/include/asm-x86/processor.h | 2 +-
 5 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 9c298f8..784fa40 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -203,9 +203,10 @@ static void __init noinline probe_masking_msrs(void)
  * used to context switch to the default host state (by the cpu bringup-code,
  * crash path, etc).
  */
-static void amd_ctxt_switch_levelling(const struct domain *nextd)
+static void amd_ctxt_switch_levelling(const struct vcpu *next)
 {
 	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
+	const struct domain *nextd = next ? next->domain : NULL;
 	const struct cpuidmasks *masks =
 		(nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
 		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 577a01f..a5cfe52 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -90,11 +90,11 @@ static const struct cpu_dev default_cpu = {
 };
 static const struct cpu_dev *this_cpu = &default_cpu;
 
-static void default_ctxt_switch_levelling(const struct domain *nextd)
+static void default_ctxt_switch_levelling(const struct vcpu *next)
 {
 	/* Nop */
 }
-void (* __read_mostly ctxt_switch_levelling)(const struct domain *nextd) =
+void (* __read_mostly ctxt_switch_levelling)(const struct vcpu *next) =
 	default_ctxt_switch_levelling;
 
 bool_t opt_cpu_info;
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index fe4736e..3491638 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -151,9 +151,10 @@ static void __init probe_masking_msrs(void)
  * used to context switch to the default host state (by the cpu bringup-code,
  * crash path, etc).
  */
-static void intel_ctxt_switch_levelling(const struct domain *nextd)
+static void intel_ctxt_switch_levelling(const struct vcpu *next)
 {
 	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
+	const struct domain *nextd = next ? next->domain : NULL;
 	const struct cpuidmasks *masks;
 
 	if (cpu_has_cpuid_faulting) {
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 1133ea2..7ca1b66 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2130,7 +2130,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
             load_segments(next);
         }
 
-        ctxt_switch_levelling(nextd);
+        ctxt_switch_levelling(next);
     }
 
     context_saved(prev);
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 487ae28..3e6e355 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -217,7 +217,7 @@ extern struct cpuinfo_x86 boot_cpu_data;
 extern struct cpuinfo_x86 cpu_data[];
 #define current_cpu_data cpu_data[smp_processor_id()]
 
-extern void (*ctxt_switch_levelling)(const struct domain *nextd);
+extern void (*ctxt_switch_levelling)(const struct vcpu *next);
 
 extern u64 host_pat;
 extern bool_t opt_cpu_info;
-- 
2.1.4



* [PATCH 3/3] x86/levelling: Provide architectural OSXSAVE handling to masked native CPUID
From: Andrew Cooper @ 2016-08-23 17:26 UTC
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Contrary to c/s b2507fe7 "x86/domctl: Update PV domain cpumasks when setting
cpuid policy", Intel CPUID masks are applied after fast forwarding hardware
state, rather than before.  (All behaviour in this regard appears completely
undocumented by both Intel and AMD).

Therefore, a set bit in the MSR causes hardware to be fast-forwarded, while a
clear bit forces the guest's view to 0, even if CR4.OSXSAVE is actually set.

This allows Xen to provide an architectural view of a guest kernel's
CR4.OSXSAVE setting to any CPUID instruction issued by guest kernel or
userspace.

The masking value defaults to 1 (if the guest has XSAVE available) to cause
fast-forwarding to occur for the HVM and idle vcpus.

When setting the MSRs, a PV guest kernel's choice of OSXSAVE is taken into
account, and clobbered from the MSR if not set.  This causes the
fast-forwarding of Xen's CR4 state not to happen.

As a side effect however, levelling may need updating on all PV CR4 changes.

Repored-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
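
To make the claimed ordering concrete, a standalone model (an assumption
inferred from experiment; the real microcode behaviour is undocumented by
both vendors, so this is not an architectural statement):

    #include <stdbool.h>
    #include <stdint.h>

    #define OSXSAVE_MASK (1u << 27)        /* CPUID.1:ECX[27] */

    /*
     * Hypothetical model, not Xen code: OSXSAVE is fast-forwarded from
     * the current CR4 state first, and the AND-mask MSR is applied
     * afterwards.  A clear mask bit therefore forces the guest's view
     * to 0 even when CR4.OSXSAVE is set; a set mask bit lets the
     * fast-forwarded value through.
     */
    static uint32_t levelled_leaf1_ecx(uint32_t raw_ecx, uint32_t mask_ecx,
                                       bool cr4_osxsave)
    {
        uint32_t ecx = raw_ecx & ~OSXSAVE_MASK;

        if ( cr4_osxsave )
            ecx |= OSXSAVE_MASK;           /* fast-forward first... */

        return ecx & mask_ecx;             /* ...then apply the mask */
    }
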
 xen/arch/x86/cpu/amd.c   | 19 ++++++++++++++++++-
 xen/arch/x86/cpu/intel.c | 24 +++++++++++++++++++++++-
 xen/arch/x86/domctl.c    | 10 +++++++++-
 xen/arch/x86/traps.c     |  1 +
 4 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 784fa40..d1ef827 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -211,6 +211,24 @@ static void amd_ctxt_switch_levelling(const struct vcpu *next)
 		(nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
 		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
 
+	if ((levelling_caps & LCAP_1cd) == LCAP_1cd) {
+		uint64_t val = masks->_1cd;
+
+		/*
+		 * OSXSAVE defaults to 1, which causes fast-forwarding of
+		 * Xen's real setting.	Clobber it if disabled by the guest
+		 * kernel.
+		 */
+		if (next && is_pv_vcpu(next) &&
+		    !(next->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE))
+			val &= ~((uint64_t)cpufeat_mask(X86_FEATURE_OSXSAVE) << 32);
+
+		if (unlikely(these_masks->_1cd != val)) {
+			wrmsr_amd(MSR_K8_FEATURE_MASK, val);
+			these_masks->_1cd = val;
+		}
+	}
+
 #define LAZY(cap, msr, field)						\
 	({								\
 		if (unlikely(these_masks->field != masks->field) &&	\
@@ -221,7 +239,6 @@ static void amd_ctxt_switch_levelling(const struct vcpu *next)
 		}							\
 	})
 
-	LAZY(LCAP_1cd,  MSR_K8_FEATURE_MASK,       _1cd);
 	LAZY(LCAP_e1cd, MSR_K8_EXT_FEATURE_MASK,   e1cd);
 	LAZY(LCAP_7ab0, MSR_AMD_L7S0_FEATURE_MASK, _7ab0);
 	LAZY(LCAP_6c,   MSR_AMD_THRM_FEATURE_MASK, _6c);
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 3491638..bf4f15d 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -182,6 +182,24 @@ static void intel_ctxt_switch_levelling(const struct vcpu *next)
 	masks = (nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
 		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
 
+        if (msr_basic) {
+		uint64_t val = masks->_1cd;
+
+		/*
+		 * OSXSAVE defaults to 1, which causes fast-forwarding of
+		 * Xen's real setting.  Clobber it if disabled by the guest
+		 * kernel.
+		 */
+		if (next && is_pv_vcpu(next) &&
+		    !(next->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE))
+			val &= ~cpufeat_mask(X86_FEATURE_OSXSAVE);
+
+		if (unlikely(these_masks->_1cd != val)) {
+			wrmsrl(msr_basic, val);
+			these_masks->_1cd = val;
+		}
+        }
+
 #define LAZY(msr, field)						\
 	({								\
 		if (unlikely(these_masks->field != masks->field) &&	\
@@ -192,7 +210,6 @@ static void intel_ctxt_switch_levelling(const struct vcpu *next)
 		}							\
 	})
 
-	LAZY(msr_basic, _1cd);
 	LAZY(msr_ext,   e1cd);
 	LAZY(msr_xsave, Da1);
 
@@ -218,6 +235,11 @@ static void __init noinline intel_init_levelling(void)
 		ecx &= opt_cpuid_mask_ecx;
 		edx &= opt_cpuid_mask_edx;
 
+		/* Fast-forward bits - Must be set. */
+		if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
+			ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+		edx |= cpufeat_mask(X86_FEATURE_APIC);
+
 		cpuidmask_defaults._1cd &= ((u64)edx << 32) | ecx;
 	}
 
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index bed70aa..a904fd6 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -110,10 +110,18 @@ static void update_domain_cpuid_info(struct domain *d,
             case X86_VENDOR_INTEL:
                 /*
                  * Intel masking MSRs are documented as AND masks.
-                 * Experimentally, they are applied before OSXSAVE and APIC
+                 * Experimentally, they are applied after OSXSAVE and APIC
                  * are fast-forwarded from real hardware state.
                  */
                 mask &= ((uint64_t)edx << 32) | ecx;
+
+                if ( ecx & cpufeat_mask(X86_FEATURE_XSAVE) )
+                    ecx = cpufeat_mask(X86_FEATURE_OSXSAVE);
+                else
+                    ecx = 0;
+                edx = cpufeat_mask(X86_FEATURE_APIC);
+
+                mask |= ((uint64_t)edx << 32) | ecx;
                 break;
 
             case X86_VENDOR_AMD:
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index c95fadb..b6e56b8 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2739,6 +2739,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case 4: /* Write CR4 */
             v->arch.pv_vcpu.ctrlreg[4] = pv_guest_cr4_fixup(v, *reg);
             write_cr4(pv_guest_cr4_to_real_cr4(v));
+            ctxt_switch_levelling(v);
             break;
 
         default:
-- 
2.1.4



* Re: [PATCH 1/3] x86/levelling: Restrict non-architectural OSXSAVE handling to emulated CPUID
From: Jan Beulich @ 2016-08-24  8:14 UTC
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 23.08.16 at 19:26, <andrew.cooper3@citrix.com> wrote:
> There is no need to extend the workaround to the faulted CPUID view, as
> Linux's dependence on the workaround is strictly via the emulated view.
> 
> This causes a guest kernel's faulted CPUID view to observe architectural
> behaviour with respect to its CR4.OSXSAVE setting.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



* Re: [PATCH 2/3] x86/levelling: Pass a vcpu rather than a domain to ctxt_switch_levelling()
From: Jan Beulich @ 2016-08-24  8:16 UTC
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 23.08.16 at 19:26, <andrew.cooper3@citrix.com> wrote:
> A subsequent change needs to special-case OSXSAVE handling, which is per-vcpu
> rather than per-domain.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



* Re: [PATCH 3/3] x86/levelling: Provide architectural OSXSAVE handling to masked native CPUID
From: Jan Beulich @ 2016-08-24  8:41 UTC
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 23.08.16 at 19:26, <andrew.cooper3@citrix.com> wrote:
> Contrary to c/s b2507fe7 "x86/domctl: Update PV domain cpumasks when setting
> cpuid policy", Intel CPUID masks are applied after fast forwarding hardware
> state, rather than before.  (All behaviour in this regard appears completely
> undocumented by both Intel and AMD).
> 
> Therefore, a set bit in the MSR causes hardware to be fast-forwarded, while a
> clear bit forces the guest's view to 0, even if CR4.OSXSAVE is actually set.
> 
> This allows Xen to provide an architectural view of a guest kernel's
> CR4.OSXSAVE setting to any CPUID instruction issued by guest kernel or
> userspace.

Perhaps worth saying "non-PV CPUID instruction issued"?

> The masking value defaults to 1 (if the guest has XSAVE available) to cause
> fast-forwarding to occur for the HVM and idle vcpus.

Is the latter part actually true? I don't think idle vCPU-s get through
update_domain_cpuid_info(), as that gets called solely from a domctl
handler.

> Repored-by: Jan Beulich <JBeulich@suse.com>

Reported-...

> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -211,6 +211,24 @@ static void amd_ctxt_switch_levelling(const struct vcpu *next)
>  		(nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
>  		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
>  
> +	if ((levelling_caps & LCAP_1cd) == LCAP_1cd) {
> +		uint64_t val = masks->_1cd;
> +
> +		/*
> +		 * OSXSAVE defaults to 1, which causes fast-forwarding of
> +		 * Xen's real setting.	Clobber it if disabled by the guest

Tab inside the comment?

> +		 * kernel.
> +		 */
> +		if (next && is_pv_vcpu(next) &&
> +		    !(next->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE))
> +			val &= ~((uint64_t)cpufeat_mask(X86_FEATURE_OSXSAVE) << 32);

I don't think idle vCPU-s would ever have their ctrlreg[4] updated, so,
contrary to what the description says, you hide the flag here for them. Since
that shouldn't matter much except for the number of cases the
WRMSR below gets executed because the masks didn't match, I'm
not sure whether it's better to fix it here or to alter the description.
One might guess that fixing it would be better, as likely going forward
most (all?) guests will enable xsave, and hence we might save the
WRMSR in most of the cases if not needed for any other reason.

> @@ -218,6 +235,11 @@ static void __init noinline intel_init_levelling(void)
>  		ecx &= opt_cpuid_mask_ecx;
>  		edx &= opt_cpuid_mask_edx;
>  
> +		/* Fast-forward bits - Must be set. */
> +		if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
> +			ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
> +		edx |= cpufeat_mask(X86_FEATURE_APIC);
> +
>  		cpuidmask_defaults._1cd &= ((u64)edx << 32) | ecx;
>  	}

Do we really want to rely on the mask MSRs to have all interesting
bits (and namely the two ones relevant here) set by the firmware?
I.e. don't you want to instead OR in the two bits after the &=?

Jan



* Re: [PATCH 3/3] x86/levelling: Provide architectural OSXSAVE handling to masked native CPUID
From: Andrew Cooper @ 2016-08-31 18:29 UTC
  To: Jan Beulich; +Cc: Xen-devel

On 24/08/16 09:41, Jan Beulich wrote:
>>>> On 23.08.16 at 19:26, <andrew.cooper3@citrix.com> wrote:
>> Contrary to c/s b2507fe7 "x86/domctl: Update PV domain cpumasks when setting
>> cpuid policy", Intel CPUID masks are applied after fast forwarding hardware
>> state, rather than before.  (All behaviour in this regard appears completely
>> undocumented by both Intel and AMD).
>>
>> Therefore, a set bit in the MSR causes hardware to be fast-forwarded, while a
>> clear bit forces the guest's view to 0, even if CR4.OSXSAVE is actually set.
>>
>> This allows Xen to provide an architectural view of a guest kernel's
>> CR4.OSXSAVE setting to any CPUID instruction issued by guest kernel or
>> userspace.
> Perhaps worth saying "non-PV CPUID instruction issued"?

I have gone with "native CPUID", and an "even when masking is used" suffix.

>
>> The masking value defaults to 1 (if the guest has XSAVE available) to cause
>> fast-forwarding to occur for the HVM and idle vcpus.
> Is the latter part actually true? I don't think idle vCPU-s get through
> update_domain_cpuid_info(), as that gets called solely from a domctl
> handler.

Urgh - this got complicated when I fixed dom0.

I was previously using these_masks != &cpuidmask_defaults, but this test
fails for dom0, as nothing allocates suitable per-domain masks.

I probably do need an extra "&& !is_idle_vcpu(v)" in the condition. 
That will cause Idle and HVM guests to always have the bits set, as they
are set in the default masks.

>
>> Repored-by: Jan Beulich <JBeulich@suse.com>
> Reported-...

Oops yes.

>
>> --- a/xen/arch/x86/cpu/amd.c
>> +++ b/xen/arch/x86/cpu/amd.c
>> @@ -211,6 +211,24 @@ static void amd_ctxt_switch_levelling(const struct vcpu *next)
>>  		(nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
>>  		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
>>  
>> +	if ((levelling_caps & LCAP_1cd) == LCAP_1cd) {
>> +		uint64_t val = masks->_1cd;
>> +
>> +		/*
>> +		 * OSXSAVE defaults to 1, which causes fast-forwarding of
>> +		 * Xen's real setting.	Clobber it if disabled by the guest
> Tab inside the comment?

Oh yes - I know how that will have happened.  Sorry for not noticing.

>
>> +		 * kernel.
>> +		 */
>> +		if (next && is_pv_vcpu(next) &&
>> +		    !(next->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE))
>> +			val &= ~((uint64_t)cpufeat_mask(X86_FEATURE_OSXSAVE) << 32);
> I don't think idle vCPU-s would ever have their ctrlreg[4] updated, so,
> contrary to what the description says, you hide the flag here for them.

I said that I wouldn't clobber the bit for idle vcpus.

But yes, they won't have X86_CR4_OSXSAVE set, because of
real_cr4_to_pv_guest_cr4()

>  Since
> that shouldn't matter much except for the number of cases the
> WRMSR below gets executed because the masks didn't match, I'm
> not sure whether it's better to fix it here or to alter the description.
> One might guess that fixing it would be better, as likely going forward
> most (all?) guests will enable xsave, and hence we might save the
> WRMSR in most of the cases if not needed for any other reason.

We want all idle vcpus and hvm vcpus running at the defaults, to reduce
the number of times we lazily switch context.

>
>> @@ -218,6 +235,11 @@ static void __init noinline intel_init_levelling(void)
>>  		ecx &= opt_cpuid_mask_ecx;
>>  		edx &= opt_cpuid_mask_edx;
>>  
>> +		/* Fast-forward bits - Must be set. */
>> +		if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
>> +			ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
>> +		edx |= cpufeat_mask(X86_FEATURE_APIC);
>> +
>>  		cpuidmask_defaults._1cd &= ((u64)edx << 32) | ecx;
>>  	}
> Do we really want to rely on the mask MSRs to have all interesting
> bits (and namely the two ones relevant here) set by the firmware?
> I.e. don't you want to instead OR in the two bits after the &=?

I think so, yes.  If the BIOS got it wrong to start with, Xen would fail
to set the features up in the first place, and this logic wouldn't apply.

If the bits are not set, I don't want to try and second-guess what might
be going on.  If anyone encounters such a situation, then we can see
about re-evaluating.

~Andrew


* [PATCH v2 3/3] x86/levelling: Provide architectural OSXSAVE handling to masked native CPUID
From: Andrew Cooper @ 2016-09-01 10:25 UTC
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Contrary to c/s b2507fe7 "x86/domctl: Update PV domain cpumasks when setting
cpuid policy", Intel CPUID masks are applied after fast forwarding hardware
state, rather than before.  (All behaviour in this regard appears completely
undocumented by both Intel and AMD).

Therefore, a set bit in the MSR causes hardware to be fast-forwarded, while a
clear bit forces the guest's view to 0, even if Xen's CR4.OSXSAVE is actually
set.

This allows Xen to provide an architectural view of a guest kernel's
CR4.OSXSAVE setting to any native CPUID instruction issued by guest kernel
or userspace, even when masking is used.

The masking value defaults to 1 (if the guest has XSAVE available) to cause
fast-forwarding to occur for the HVM and idle vcpus.

When setting the MSRs, a PV guest kernel's choice of OSXSAVE is taken into
account, and clobbered from the MSR if not set.  This causes the
fast-forwarding of Xen's CR4 state not to happen.

As a side effect however, levelling potentially needs updating on all PV CR4
changes.

Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

v2:
 * Spelling/grammar corrections
 * Actually leave fast forwarding enabled for idle vcpus
---
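
The resulting v2 condition in both vendor paths can be summarised as
follows (a sketch with a hypothetical pv_cr4_osxsave() helper; idle vcpus
never set CR4.OSXSAVE in ctrlreg[4], so without the is_idle_vcpu()
exclusion every switch to idle would needlessly rewrite the mask MSR):

    #include <stdbool.h>

    struct vcpu;
    extern bool is_pv_vcpu(const struct vcpu *v);
    extern bool is_idle_vcpu(const struct vcpu *v);
    extern bool pv_cr4_osxsave(const struct vcpu *v); /* hypothetical */

    /*
     * Sketch: clobber the OSXSAVE fast-forward bit only for a real PV
     * vcpu whose kernel currently has CR4.OSXSAVE clear.
     */
    static bool clobber_osxsave(const struct vcpu *next)
    {
        return next && is_pv_vcpu(next) && !is_idle_vcpu(next) &&
               !pv_cr4_osxsave(next);
    }
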
 xen/arch/x86/cpu/amd.c   | 19 ++++++++++++++++++-
 xen/arch/x86/cpu/intel.c | 24 +++++++++++++++++++++++-
 xen/arch/x86/domctl.c    | 10 +++++++++-
 xen/arch/x86/traps.c     |  1 +
 4 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 784fa40..2317546 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -211,6 +211,24 @@ static void amd_ctxt_switch_levelling(const struct vcpu *next)
 		(nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
 		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
 
+	if ((levelling_caps & LCAP_1cd) == LCAP_1cd) {
+		uint64_t val = masks->_1cd;
+
+		/*
+		 * OSXSAVE defaults to 1, which causes fast-forwarding of
+		 * Xen's real setting.  Clobber it if disabled by the guest
+		 * kernel.
+		 */
+		if (next && is_pv_vcpu(next) && !is_idle_vcpu(next) &&
+		    !(next->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE))
+			val &= ~((uint64_t)cpufeat_mask(X86_FEATURE_OSXSAVE) << 32);
+
+		if (unlikely(these_masks->_1cd != val)) {
+			wrmsr_amd(MSR_K8_FEATURE_MASK, val);
+			these_masks->_1cd = val;
+		}
+	}
+
 #define LAZY(cap, msr, field)						\
 	({								\
 		if (unlikely(these_masks->field != masks->field) &&	\
@@ -221,7 +239,6 @@ static void amd_ctxt_switch_levelling(const struct vcpu *next)
 		}							\
 	})
 
-	LAZY(LCAP_1cd,  MSR_K8_FEATURE_MASK,       _1cd);
 	LAZY(LCAP_e1cd, MSR_K8_EXT_FEATURE_MASK,   e1cd);
 	LAZY(LCAP_7ab0, MSR_AMD_L7S0_FEATURE_MASK, _7ab0);
 	LAZY(LCAP_6c,   MSR_AMD_THRM_FEATURE_MASK, _6c);
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 3491638..a9355cbf 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -182,6 +182,24 @@ static void intel_ctxt_switch_levelling(const struct vcpu *next)
 	masks = (nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
 		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
 
+        if (msr_basic) {
+		uint64_t val = masks->_1cd;
+
+		/*
+		 * OSXSAVE defaults to 1, which causes fast-forwarding of
+		 * Xen's real setting.  Clobber it if disabled by the guest
+		 * kernel.
+		 */
+		if (next && is_pv_vcpu(next) && !is_idle_vcpu(next) &&
+		    !(next->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE))
+			val &= ~cpufeat_mask(X86_FEATURE_OSXSAVE);
+
+		if (unlikely(these_masks->_1cd != val)) {
+			wrmsrl(msr_basic, val);
+			these_masks->_1cd = val;
+		}
+        }
+
 #define LAZY(msr, field)						\
 	({								\
 		if (unlikely(these_masks->field != masks->field) &&	\
@@ -192,7 +210,6 @@ static void intel_ctxt_switch_levelling(const struct vcpu *next)
 		}							\
 	})
 
-	LAZY(msr_basic, _1cd);
 	LAZY(msr_ext,   e1cd);
 	LAZY(msr_xsave, Da1);
 
@@ -218,6 +235,11 @@ static void __init noinline intel_init_levelling(void)
 		ecx &= opt_cpuid_mask_ecx;
 		edx &= opt_cpuid_mask_edx;
 
+		/* Fast-forward bits - Must be set. */
+		if (ecx & cpufeat_mask(X86_FEATURE_XSAVE))
+			ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+		edx |= cpufeat_mask(X86_FEATURE_APIC);
+
 		cpuidmask_defaults._1cd &= ((u64)edx << 32) | ecx;
 	}
 
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index bed70aa..a904fd6 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -110,10 +110,18 @@ static void update_domain_cpuid_info(struct domain *d,
             case X86_VENDOR_INTEL:
                 /*
                  * Intel masking MSRs are documented as AND masks.
-                 * Experimentally, they are applied before OSXSAVE and APIC
+                 * Experimentally, they are applied after OSXSAVE and APIC
                  * are fast-forwarded from real hardware state.
                  */
                 mask &= ((uint64_t)edx << 32) | ecx;
+
+                if ( ecx & cpufeat_mask(X86_FEATURE_XSAVE) )
+                    ecx = cpufeat_mask(X86_FEATURE_OSXSAVE);
+                else
+                    ecx = 0;
+                edx = cpufeat_mask(X86_FEATURE_APIC);
+
+                mask |= ((uint64_t)edx << 32) | ecx;
                 break;
 
             case X86_VENDOR_AMD:
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index ce924d8..bab374d 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2737,6 +2737,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case 4: /* Write CR4 */
             v->arch.pv_vcpu.ctrlreg[4] = pv_guest_cr4_fixup(v, *reg);
             write_cr4(pv_guest_cr4_to_real_cr4(v));
+            ctxt_switch_levelling(v);
             break;
 
         default:
-- 
2.1.4



* Re: [PATCH v2 3/3] x86/levelling: Provide architectural OSXSAVE handling to masked native CPUID
From: Jan Beulich @ 2016-09-01 10:34 UTC
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 01.09.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
> Contrary to c/s b2507fe7 "x86/domctl: Update PV domain cpumasks when setting
> cpuid policy", Intel CPUID masks are applied after fast forwarding hardware
> state, rather than before.  (All behaviour in this regard appears completely
> undocumented by both Intel and AMD).
> 
> Therefore, a set bit in the MSR causes hardware to be fast-forwarded, while a
> clear bit forces the guest's view to 0, even if Xen's CR4.OSXSAVE is actually
> set.
> 
> This allows Xen to provide an architectural view of a guest kernel's
> CR4.OSXSAVE setting to any native CPUID instruction issued by guest kernel
> or userspace, even when masking is used.
> 
> The masking value defaults to 1 (if the guest has XSAVE available) to cause
> fast-forwarding to occur for the HVM and idle vcpus.
> 
> When setting the MSRs, a PV guest kernel's choice of OSXSAVE is taken into
> account, and clobbered from the MSR if not set.  This causes the
> fast-forwarding of Xen's CR4 state not to happen.
> 
> As a side effect however, levelling potentially needs updating on all PV CR4
> changes.
> 
> Reported-by: Jan Beulich <JBeulich@suse.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


