* [PATCH v2 0/3] Add KVM support for Intel local MCE @ 2016-06-16 6:05 Haozhong Zhang 2016-06-16 6:05 ` [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx Haozhong Zhang ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Haozhong Zhang @ 2016-06-16 6:05 UTC (permalink / raw) To: kvm Cc: Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj, Haozhong Zhang Changes in v2: * v1 Patch 1 becomes v2 Patch 3. * Fix SOB chain in Patch 3. (Boris Petkov) * (Patch 1) Move msr_ia32_feature_control from nested_vmx to vcpu_vmx, because it no longer depends only on nested after this patch series. (Radim Krčmář) * (Patch 2) Add a valid bitmask for MSR_IA32_FEATURE_CONTROL to allow checking individual bits of MSR_IA32_FEATURE_CONTROL according to enabled features. (Radim Krčmář) * Move the common check in handling MSR_IA32_MCG_EXT_CTL to function vmx_mcg_ext_ctl_msr_present. (Radim Krčmář) Changes in v1: * Change macro KVM_MCE_CAP_SUPPORTED to variable kvm_mce_cap_supported. * Include LMCE capability in kvm_mce_cap_supported only on Intel CPUs, i.e. LMCE can be enabled only on Intel CPUs. * Check if LMCE is enabled in guest MSR_IA32_FEATURE_CONTROL when handling guest access to MSR_IA32_MCG_EXT_CTL. This patch series, along with the corresponding QEMU patch series (sent via another email with title "[PATCH v4 0/3] Add QEMU support for Intel local MCE"), enables the Intel local MCE feature for guests. This KVM patch series handles guest access to the LMCE-related MSRs (MSR_IA32_MCG_EXT_CTL and MSR_IA32_FEATURE_CONTROL). 
Ashok Raj (1): KVM: VMX: enable guest access to LMCE related MSRs Haozhong Zhang (2): KVM: VMX: move msr_ia32_feature_control to vcpu_vmx KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL arch/x86/include/asm/kvm_host.h | 5 +++ arch/x86/kvm/vmx.c | 97 ++++++++++++++++++++++++++++++++++++++--- arch/x86/kvm/x86.c | 15 ++++--- 3 files changed, 104 insertions(+), 13 deletions(-) -- 2.9.0 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx 2016-06-16 6:05 [PATCH v2 0/3] Add KVM support for Intel local MCE Haozhong Zhang @ 2016-06-16 6:05 ` Haozhong Zhang 2016-06-16 11:49 ` Borislav Petkov 2016-06-16 6:05 ` [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang 2016-06-16 6:05 ` [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs Haozhong Zhang 2 siblings, 1 reply; 20+ messages in thread From: Haozhong Zhang @ 2016-06-16 6:05 UTC (permalink / raw) To: kvm Cc: Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj, Haozhong Zhang msr_ia32_feature_control will be used for LMCE and not depend only on nested anymore, so move it from struct nested_vmx to struct vcpu_vmx. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> --- arch/x86/kvm/vmx.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57ec6a4..6b63f2d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -421,7 +421,6 @@ struct nested_vmx { struct pi_desc *pi_desc; bool pi_pending; u16 posted_intr_nv; - u64 msr_ia32_feature_control; struct hrtimer preemption_timer; bool preemption_timer_expired; @@ -602,6 +601,8 @@ struct vcpu_vmx { bool guest_pkru_valid; u32 guest_pkru; u32 host_pkru; + + u64 msr_ia32_feature_control; }; enum segment_cache_field { @@ -2907,7 +2908,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_FEATURE_CONTROL: if (!nested_vmx_allowed(vcpu)) return 1; - msr_info->data = to_vmx(vcpu)->nested.msr_ia32_feature_control; + msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control; break; case MSR_IA32_VMX_BASIC ... 
MSR_IA32_VMX_VMFUNC: if (!nested_vmx_allowed(vcpu)) @@ -2999,10 +3000,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) break; case MSR_IA32_FEATURE_CONTROL: if (!nested_vmx_allowed(vcpu) || - (to_vmx(vcpu)->nested.msr_ia32_feature_control & + (to_vmx(vcpu)->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED && !msr_info->host_initiated)) return 1; - vmx->nested.msr_ia32_feature_control = data; + vmx->msr_ia32_feature_control = data; if (msr_info->host_initiated && data == 0) vmx_leave_nested(vcpu); break; @@ -6859,7 +6860,7 @@ static int handle_vmon(struct kvm_vcpu *vcpu) return 1; } - if ((vmx->nested.msr_ia32_feature_control & VMXON_NEEDED_FEATURES) + if ((vmx->msr_ia32_feature_control & VMXON_NEEDED_FEATURES) != VMXON_NEEDED_FEATURES) { kvm_inject_gp(vcpu, 0); return 1; -- 2.9.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx 2016-06-16 6:05 ` [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx Haozhong Zhang @ 2016-06-16 11:49 ` Borislav Petkov 2016-06-16 11:57 ` Paolo Bonzini 2016-06-16 11:57 ` Haozhong Zhang 0 siblings, 2 replies; 20+ messages in thread From: Borislav Petkov @ 2016-06-16 11:49 UTC (permalink / raw) To: Haozhong Zhang Cc: kvm, Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Tony Luck, Andi Kleen, Ashok Raj On Thu, Jun 16, 2016 at 02:05:29PM +0800, Haozhong Zhang wrote: > msr_ia32_feature_control will be used for LMCE and not depend only on > nested anymore, so move it from struct nested_vmx to struct vcpu_vmx. > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> > --- > arch/x86/kvm/vmx.c | 11 ++++++----- > 1 file changed, 6 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 57ec6a4..6b63f2d 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -421,7 +421,6 @@ struct nested_vmx { > struct pi_desc *pi_desc; > bool pi_pending; > u16 posted_intr_nv; > - u64 msr_ia32_feature_control; > > struct hrtimer preemption_timer; > bool preemption_timer_expired; > @@ -602,6 +601,8 @@ struct vcpu_vmx { > bool guest_pkru_valid; > u32 guest_pkru; > u32 host_pkru; > + > + u64 msr_ia32_feature_control; > }; > > enum segment_cache_field { > @@ -2907,7 +2908,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > case MSR_IA32_FEATURE_CONTROL: > if (!nested_vmx_allowed(vcpu)) > return 1; > - msr_info->data = to_vmx(vcpu)->nested.msr_ia32_feature_control; > + msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control; Since this moves out of struct nested_vmx, that check above it: if (!nested_vmx_allowed(vcpu)) should not influence it anymore, no? Ditto for the rest. -- Regards/Gruss, Boris. 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) -- ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx 2016-06-16 11:49 ` Borislav Petkov @ 2016-06-16 11:57 ` Paolo Bonzini 2016-06-16 11:57 ` Haozhong Zhang 1 sibling, 0 replies; 20+ messages in thread From: Paolo Bonzini @ 2016-06-16 11:57 UTC (permalink / raw) To: Borislav Petkov, Haozhong Zhang Cc: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Tony Luck, Andi Kleen, Ashok Raj On 16/06/2016 13:49, Borislav Petkov wrote: >> > enum segment_cache_field { >> > @@ -2907,7 +2908,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) >> > case MSR_IA32_FEATURE_CONTROL: >> > if (!nested_vmx_allowed(vcpu)) >> > return 1; >> > - msr_info->data = to_vmx(vcpu)->nested.msr_ia32_feature_control; >> > + msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control; > Since this moves out of struct nested_vmx, that check above it: > > if (!nested_vmx_allowed(vcpu)) > > should not influence it anymore, no? For get, yes, this "if" should go. For set, you need to check that the guest doesn't write to reserved bits. So as of this patch the "if" remains tied to VMX, but the next patch changes it to be generic. Paolo > Ditto for the rest. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx 2016-06-16 11:49 ` Borislav Petkov 2016-06-16 11:57 ` Paolo Bonzini @ 2016-06-16 11:57 ` Haozhong Zhang 1 sibling, 0 replies; 20+ messages in thread From: Haozhong Zhang @ 2016-06-16 11:57 UTC (permalink / raw) To: Borislav Petkov Cc: kvm, Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Tony Luck, Andi Kleen, Ashok Raj On 06/16/16 13:49, Borislav Petkov wrote: > On Thu, Jun 16, 2016 at 02:05:29PM +0800, Haozhong Zhang wrote: > > msr_ia32_feature_control will be used for LMCE and not depend only on > > nested anymore, so move it from struct nested_vmx to struct vcpu_vmx. > > > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> > > --- > > arch/x86/kvm/vmx.c | 11 ++++++----- > > 1 file changed, 6 insertions(+), 5 deletions(-) > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > > index 57ec6a4..6b63f2d 100644 > > --- a/arch/x86/kvm/vmx.c > > +++ b/arch/x86/kvm/vmx.c > > @@ -421,7 +421,6 @@ struct nested_vmx { > > struct pi_desc *pi_desc; > > bool pi_pending; > > u16 posted_intr_nv; > > - u64 msr_ia32_feature_control; > > > > struct hrtimer preemption_timer; > > bool preemption_timer_expired; > > @@ -602,6 +601,8 @@ struct vcpu_vmx { > > bool guest_pkru_valid; > > u32 guest_pkru; > > u32 host_pkru; > > + > > + u64 msr_ia32_feature_control; > > }; > > > > enum segment_cache_field { > > @@ -2907,7 +2908,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > > case MSR_IA32_FEATURE_CONTROL: > > if (!nested_vmx_allowed(vcpu)) > > return 1; > > - msr_info->data = to_vmx(vcpu)->nested.msr_ia32_feature_control; > > + msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control; > > Since this moves out of struct nested_vmx, that check above it: > > if (!nested_vmx_allowed(vcpu)) > > should not influence it anymore, no? > > Ditto for the rest. > The same check in the set case still makes sense. 
I can remove the check here and leave it in the set case. Thanks, Haozhong ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL 2016-06-16 6:05 [PATCH v2 0/3] Add KVM support for Intel local MCE Haozhong Zhang 2016-06-16 6:05 ` [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx Haozhong Zhang @ 2016-06-16 6:05 ` Haozhong Zhang 2016-06-16 9:55 ` Paolo Bonzini 2016-06-16 10:01 ` Paolo Bonzini 2016-06-16 6:05 ` [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs Haozhong Zhang 2 siblings, 2 replies; 20+ messages in thread From: Haozhong Zhang @ 2016-06-16 6:05 UTC (permalink / raw) To: kvm Cc: Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj, Haozhong Zhang KVM currently does not check the value written to guest MSR_IA32_FEATURE_CONTROL, so bits corresponding to disabled features may be set. This patch makes KVM validate individual bits written to guest MSR_IA32_FEATURE_CONTROL according to enabled features. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> --- arch/x86/kvm/vmx.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 50 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6b63f2d..1dc89c5 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -602,7 +602,16 @@ struct vcpu_vmx { u32 guest_pkru; u32 host_pkru; + /* + * Only bits masked by msr_ia32_feature_control_valid_bits can be set in + * msr_ia32_feature_control. + * + * msr_ia32_feature_control_valid_bits should be modified by + * feature_control_valid_bits_add/del(), and only bits masked by + * FEATURE_CONTROL_MAX_VALID_BITS can be modified. 
+ */ u64 msr_ia32_feature_control; + u64 msr_ia32_feature_control_valid_bits; }; enum segment_cache_field { @@ -624,6 +633,30 @@ static struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) return &(to_vmx(vcpu)->pi_desc); } +/* + * FEATURE_CONTROL_LOCKED is added/removed automatically by + * feature_control_valid_bits_add/del(), so it's not included here. + */ +#define FEATURE_CONTROL_MAX_VALID_BITS \ + FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX + +static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits) +{ + ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS)); + to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |= + bits | FEATURE_CONTROL_LOCKED; +} + +static void feature_control_valid_bits_del(struct kvm_vcpu *vcpu, uint64_t bits) +{ + uint64_t *valid_bits = + &to_vmx(vcpu)->msr_ia32_feature_control_valid_bits; + ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS)); + *valid_bits &= ~bits; + if (!(*valid_bits & ~FEATURE_CONTROL_LOCKED)) + *valid_bits = 0; +} + #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x) #define FIELD(number, name) [number] = VMCS12_OFFSET(name) #define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \ @@ -2864,6 +2897,14 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) return 0; } +static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, + uint64_t val) +{ + uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits; + + return valid_bits && !(val & ~valid_bits); +} + /* * Reads an msr value (of 'msr_index') into 'pdata'. * Returns 0 on success, non-0 otherwise. 
@@ -2906,7 +2947,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = vmcs_read64(GUEST_BNDCFGS); break; case MSR_IA32_FEATURE_CONTROL: - if (!nested_vmx_allowed(vcpu)) + if (!vmx_feature_control_msr_valid(vcpu, 0)) return 1; msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control; break; @@ -2999,7 +3040,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) ret = kvm_set_msr_common(vcpu, msr_info); break; case MSR_IA32_FEATURE_CONTROL: - if (!nested_vmx_allowed(vcpu) || + if (!vmx_feature_control_msr_valid(vcpu, data) || (to_vmx(vcpu)->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED && !msr_info->host_initiated)) return 1; @@ -9095,6 +9136,13 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu) vmx->nested.nested_vmx_secondary_ctls_high &= ~SECONDARY_EXEC_PCOMMIT; } + + if (nested_vmx_allowed(vcpu)) + feature_control_valid_bits_add + (vcpu, FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX); + else + feature_control_valid_bits_del + (vcpu, FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX); } static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry) -- 2.9.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
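For readers following along, the valid-bits scheme introduced by this patch can be modeled in stand-alone user-space C. This is only an illustration: `struct vcpu_model` is a hypothetical stand-in for the kernel's `struct vcpu_vmx`, the bit values follow the Intel SDM layout of IA32_FEATURE_CONTROL, and plain `assert()` replaces the kernel's ASSERT/WARN_ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_FEATURE_CONTROL bit layout per the Intel SDM. */
#define FEATURE_CONTROL_LOCKED                    (1ULL << 0)
#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1ULL << 2)

/* FEATURE_CONTROL_LOCKED is handled automatically by the add/del
 * helpers below, so it is not part of the maximum valid set. */
#define FEATURE_CONTROL_MAX_VALID_BITS \
	FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX

/* Hypothetical stand-in for the relevant fields of struct vcpu_vmx. */
struct vcpu_model {
	uint64_t feature_control;
	uint64_t feature_control_valid_bits;
};

/* Adding any feature bit implicitly makes the lock bit valid too. */
static void valid_bits_add(struct vcpu_model *v, uint64_t bits)
{
	assert(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS));
	v->feature_control_valid_bits |= bits | FEATURE_CONTROL_LOCKED;
}

/* Removing the last feature bit zeroes the whole mask, so the MSR
 * becomes inaccessible again (mask == 0 means "MSR not present"). */
static void valid_bits_del(struct vcpu_model *v, uint64_t bits)
{
	assert(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS));
	v->feature_control_valid_bits &= ~bits;
	if (!(v->feature_control_valid_bits & ~FEATURE_CONTROL_LOCKED))
		v->feature_control_valid_bits = 0;
}

/* Mirrors vmx_feature_control_msr_valid(): the MSR exists only if the
 * mask is non-zero, and a value may only touch bits in the mask. */
static bool msr_access_valid(const struct vcpu_model *v, uint64_t val)
{
	return v->feature_control_valid_bits &&
	       !(val & ~v->feature_control_valid_bits);
}
```

With this model, `msr_access_valid(v, 0)` reproduces the "get" check in the patch (MSR present at all?), while `msr_access_valid(v, data)` reproduces the reserved-bits check on "set".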
* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL 2016-06-16 6:05 ` [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang @ 2016-06-16 9:55 ` Paolo Bonzini 2016-06-16 11:21 ` Haozhong Zhang 2016-06-16 10:01 ` Paolo Bonzini 1 sibling, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2016-06-16 9:55 UTC (permalink / raw) To: Haozhong Zhang, kvm Cc: rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj On 16/06/2016 08:05, Haozhong Zhang wrote: > + /* > + * Only bits masked by msr_ia32_feature_control_valid_bits can be set in > + * msr_ia32_feature_control. > + * > + * msr_ia32_feature_control_valid_bits should be modified by > + * feature_control_valid_bits_add/del(), and only bits masked by > + * FEATURE_CONTROL_MAX_VALID_BITS can be modified. > + */ > u64 msr_ia32_feature_control; > + u64 msr_ia32_feature_control_valid_bits; I noticed that the fw_cfg patch used an uint32_t. It probably should use uint64_t; what you did here is correct. Paolo > }; ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL 2016-06-16 9:55 ` Paolo Bonzini @ 2016-06-16 11:21 ` Haozhong Zhang 0 siblings, 0 replies; 20+ messages in thread From: Haozhong Zhang @ 2016-06-16 11:21 UTC (permalink / raw) To: Paolo Bonzini Cc: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj On 06/16/16 11:55, Paolo Bonzini wrote: > > > On 16/06/2016 08:05, Haozhong Zhang wrote: > > + /* > > + * Only bits masked by msr_ia32_feature_control_valid_bits can be set in > > + * msr_ia32_feature_control. > > + * > > + * msr_ia32_feature_control_valid_bits should be modified by > > + * feature_control_valid_bits_add/del(), and only bits masked by > > + * FEATURE_CONTROL_MAX_VALID_BITS can be modified. > > + */ > > u64 msr_ia32_feature_control; > > + u64 msr_ia32_feature_control_valid_bits; > > I noticed that the fw_cfg patch used an uint32_t. It probably should > use uint64_t; what you did here is correct. > I'll fix there. Thanks, Haozhong ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL 2016-06-16 6:05 ` [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang 2016-06-16 9:55 ` Paolo Bonzini @ 2016-06-16 10:01 ` Paolo Bonzini 2016-06-16 11:16 ` Haozhong Zhang 1 sibling, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2016-06-16 10:01 UTC (permalink / raw) To: Haozhong Zhang, kvm Cc: rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj On 16/06/2016 08:05, Haozhong Zhang wrote: > +/* > + * FEATURE_CONTROL_LOCKED is added/removed automatically by > + * feature_control_valid_bits_add/del(), so it's not included here. > + */ > +#define FEATURE_CONTROL_MAX_VALID_BITS \ > + FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX > + > +static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits) > +{ > + ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS)); The KVM-specific ASSERT macro should go. You can use WARN_ON. However, I think FEATURE_CONTROL_LOCKED should always be writable. If you change that, it's simpler to just do |= and &= in the caller. 
> + to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |= > + bits | FEATURE_CONTROL_LOCKED; > +} > + > +static void feature_control_valid_bits_del(struct kvm_vcpu *vcpu, uint64_t bits) > +{ > + uint64_t *valid_bits = > + &to_vmx(vcpu)->msr_ia32_feature_control_valid_bits; > + ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS)); > + *valid_bits &= ~bits; > + if (!(*valid_bits & ~FEATURE_CONTROL_LOCKED)) > + *valid_bits = 0; > +} > + > #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x) > #define FIELD(number, name) [number] = VMCS12_OFFSET(name) > #define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \ > @@ -2864,6 +2897,14 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) > return 0; > } > > +static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, > + uint64_t val) > +{ > + uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits; > + > + return valid_bits && !(val & ~valid_bits); > +} > /* > * Reads an msr value (of 'msr_index') into 'pdata'. > * Returns 0 on success, non-0 otherwise. > @@ -2906,7 +2947,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > msr_info->data = vmcs_read64(GUEST_BNDCFGS); > break; > case MSR_IA32_FEATURE_CONTROL: > - if (!nested_vmx_allowed(vcpu)) > + if (!vmx_feature_control_msr_valid(vcpu, 0)) You can remove this if completely in patch 1. It's okay to make the MSR always available. 
Thanks, Paolo > return 1; > msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control; > break; > @@ -2999,7 +3040,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > ret = kvm_set_msr_common(vcpu, msr_info); > break; > case MSR_IA32_FEATURE_CONTROL: > - if (!nested_vmx_allowed(vcpu) || > + if (!vmx_feature_control_msr_valid(vcpu, data) || > (to_vmx(vcpu)->msr_ia32_feature_control & > FEATURE_CONTROL_LOCKED && !msr_info->host_initiated)) > return 1; > @@ -9095,6 +9136,13 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu) > vmx->nested.nested_vmx_secondary_ctls_high &= > ~SECONDARY_EXEC_PCOMMIT; > } > + > + if (nested_vmx_allowed(vcpu)) > + feature_control_valid_bits_add > + (vcpu, FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX); > + else > + feature_control_valid_bits_del > + (vcpu, FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX); > } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL 2016-06-16 10:01 ` Paolo Bonzini @ 2016-06-16 11:16 ` Haozhong Zhang 2016-06-16 11:19 ` Paolo Bonzini 0 siblings, 1 reply; 20+ messages in thread From: Haozhong Zhang @ 2016-06-16 11:16 UTC (permalink / raw) To: Paolo Bonzini Cc: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj On 06/16/16 12:01, Paolo Bonzini wrote: > > > On 16/06/2016 08:05, Haozhong Zhang wrote: > > +/* > > + * FEATURE_CONTROL_LOCKED is added/removed automatically by > > + * feature_control_valid_bits_add/del(), so it's not included here. > > + */ > > +#define FEATURE_CONTROL_MAX_VALID_BITS \ > > + FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX > > + > > +static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits) > > +{ > > + ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS)); > > The KVM-specific ASSERT macro should go. You can use WARN_ON. > will change to WARN_ON. > However, I think FEATURE_CONTROL_LOCKED should always be writable. If > you change that, it's simpler to just do |= and &= in the caller. > These two functions (add/del) are to prevent callers from forgetting to set/clear FEATURE_CONTROL_LOCKED in msr_ia32_feature_control_valid_bits: it should be set if any feature bit is set, and be cleared if all feature bits are cleared. The second rule could be relaxed, as we can always present MSR_IA32_FEATURE_CONTROL to the guest. I'm okay to let callers take care of the locked bit. 
> > + to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |= > > + bits | FEATURE_CONTROL_LOCKED; > > +} > > + > > +static void feature_control_valid_bits_del(struct kvm_vcpu *vcpu, uint64_t bits) > > +{ > > + uint64_t *valid_bits = > > + &to_vmx(vcpu)->msr_ia32_feature_control_valid_bits; > > + ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS)); > > + *valid_bits &= ~bits; > > + if (!(*valid_bits & ~FEATURE_CONTROL_LOCKED)) > > + *valid_bits = 0; > > +} > > + > > #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x) > > #define FIELD(number, name) [number] = VMCS12_OFFSET(name) > > #define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \ > > @@ -2864,6 +2897,14 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) > > return 0; > > } > > > > +static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, > > + uint64_t val) > > +{ > > + uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits; > > + > > + return valid_bits && !(val & ~valid_bits); > > +} > > /* > > * Reads an msr value (of 'msr_index') into 'pdata'. > > * Returns 0 on success, non-0 otherwise. > > @@ -2906,7 +2947,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > > msr_info->data = vmcs_read64(GUEST_BNDCFGS); > > break; > > case MSR_IA32_FEATURE_CONTROL: > > - if (!nested_vmx_allowed(vcpu)) > > + if (!vmx_feature_control_msr_valid(vcpu, 0)) > > You can remove this if completely in patch 1. It's okay to make the MSR > always available. > But then it also allows all bits to be set by guests, even though some features are not available. Though this problem already exists before Patch 1, I prefer to leave it as was in Patch 1 and fix it here. Haozhong ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL 2016-06-16 11:16 ` Haozhong Zhang @ 2016-06-16 11:19 ` Paolo Bonzini 2016-06-16 11:29 ` Haozhong Zhang 0 siblings, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2016-06-16 11:19 UTC (permalink / raw) To: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj On 16/06/2016 13:16, Haozhong Zhang wrote: >> However, I think FEATURE_CONTROL_LOCKED should always be writable. If >> you change that, it's simpler to just do |= and &= in the caller. > > These two functions (add/del) are to prevent callers from forgetting > setting/clearing FEATURE_CONTROL_LOCKED in > msr_ia32_feature_control_valid_bits: it should be set if any feature > bit is set, and be cleared if all feature bits are cleared. The second > rule could relaxed as we can always present MSR_IA32_FEATURE_CONTROL > to guest. Yes, this means that FEATURE_CONTROL_LOCKED effectively is always valid. So you end up with just &= to clear and |= to set. > I'm okey to let callers take care for the locked bit. 
> >>> + to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |= >>> + bits | FEATURE_CONTROL_LOCKED; >>> +} >>> + >>> +static void feature_control_valid_bits_del(struct kvm_vcpu *vcpu, uint64_t bits) >>> +{ >>> + uint64_t *valid_bits = >>> + &to_vmx(vcpu)->msr_ia32_feature_control_valid_bits; >>> + ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS)); >>> + *valid_bits &= ~bits; >>> + if (!(*valid_bits & ~FEATURE_CONTROL_LOCKED)) >>> + *valid_bits = 0; >>> +} >>> + >>> #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x) >>> #define FIELD(number, name) [number] = VMCS12_OFFSET(name) >>> #define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \ >>> @@ -2864,6 +2897,14 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) >>> return 0; >>> } >>> >>> +static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, >>> + uint64_t val) >>> +{ >>> + uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits; >>> + >>> + return valid_bits && !(val & ~valid_bits); >>> +} >>> /* >>> * Reads an msr value (of 'msr_index') into 'pdata'. >>> * Returns 0 on success, non-0 otherwise. >>> @@ -2906,7 +2947,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) >>> msr_info->data = vmcs_read64(GUEST_BNDCFGS); >>> break; >>> case MSR_IA32_FEATURE_CONTROL: >>> - if (!nested_vmx_allowed(vcpu)) >>> + if (!vmx_feature_control_msr_valid(vcpu, 0)) >> >> You can remove this if completely in patch 1. It's okay to make the MSR >> always available. >> > > But then it also allows all bits to be set by guests, even though some > features are not available. Note that this is "get". Of course the "if" must stay in vmx_set_msr. Paolo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL 2016-06-16 11:19 ` Paolo Bonzini @ 2016-06-16 11:29 ` Haozhong Zhang 0 siblings, 0 replies; 20+ messages in thread From: Haozhong Zhang @ 2016-06-16 11:29 UTC (permalink / raw) To: Paolo Bonzini Cc: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj On 06/16/16 13:19, Paolo Bonzini wrote: > > > On 16/06/2016 13:16, Haozhong Zhang wrote: > >> However, I think FEATURE_CONTROL_LOCKED should always be writable. If > >> you change that, it's simpler to just do |= and &= in the caller. > > > > These two functions (add/del) are to prevent callers from forgetting > > setting/clearing FEATURE_CONTROL_LOCKED in > > msr_ia32_feature_control_valid_bits: it should be set if any feature > > bit is set, and be cleared if all feature bits are cleared. The second > > rule could relaxed as we can always present MSR_IA32_FEATURE_CONTROL > > to guest. > > Yes, this means that FEATURE_CONTROL_LOCKED effectively is always valid. > So you end up with just &= to clear and |= to set. > > > I'm okey to let callers take care for the locked bit. 
> > > >>> + to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |= > >>> + bits | FEATURE_CONTROL_LOCKED; > >>> +} > >>> + > >>> +static void feature_control_valid_bits_del(struct kvm_vcpu *vcpu, uint64_t bits) > >>> +{ > >>> + uint64_t *valid_bits = > >>> + &to_vmx(vcpu)->msr_ia32_feature_control_valid_bits; > >>> + ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS)); > >>> + *valid_bits &= ~bits; > >>> + if (!(*valid_bits & ~FEATURE_CONTROL_LOCKED)) > >>> + *valid_bits = 0; > >>> +} > >>> + > >>> #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x) > >>> #define FIELD(number, name) [number] = VMCS12_OFFSET(name) > >>> #define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \ > >>> @@ -2864,6 +2897,14 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) > >>> return 0; > >>> } > >>> > >>> +static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, > >>> + uint64_t val) > >>> +{ > >>> + uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits; > >>> + > >>> + return valid_bits && !(val & ~valid_bits); > >>> +} > >>> /* > >>> * Reads an msr value (of 'msr_index') into 'pdata'. > >>> * Returns 0 on success, non-0 otherwise. > >>> @@ -2906,7 +2947,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > >>> msr_info->data = vmcs_read64(GUEST_BNDCFGS); > >>> break; > >>> case MSR_IA32_FEATURE_CONTROL: > >>> - if (!nested_vmx_allowed(vcpu)) > >>> + if (!vmx_feature_control_msr_valid(vcpu, 0)) > >> > >> You can remove this if completely in patch 1. It's okay to make the MSR > >> always available. > >> > > > > But then it also allows all bits to be set by guests, even though some > > features are not available. > > Note that this is "get". Of course the "if" must stay in vmx_set_msr. > My mistake. I'll remove it in patch 1. Thanks, Haozhong ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs 2016-06-16 6:05 [PATCH v2 0/3] Add KVM support for Intel local MCE Haozhong Zhang 2016-06-16 6:05 ` [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx Haozhong Zhang 2016-06-16 6:05 ` [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang @ 2016-06-16 6:05 ` Haozhong Zhang 2016-06-16 10:04 ` Paolo Bonzini 2 siblings, 1 reply; 20+ messages in thread From: Haozhong Zhang @ 2016-06-16 6:05 UTC (permalink / raw) To: kvm Cc: Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj, Haozhong Zhang From: Ashok Raj <ashok.raj@intel.com> On Intel platforms, this patch adds LMCE to KVM MCE supported capabilities and handles guest access to LMCE related MSRs. Signed-off-by: Ashok Raj <ashok.raj@intel.com> [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported Only enable LMCE on Intel platform Check MSR_IA32_FEATURE_CONTROL when handling guest access to MSR_IA32_MCG_EXT_CTL] Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> --- arch/x86/include/asm/kvm_host.h | 5 +++++ arch/x86/kvm/vmx.c | 36 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 15 +++++++++------ 3 files changed, 49 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e0fbe7e..75defa6 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -598,6 +598,7 @@ struct kvm_vcpu_arch { u64 mcg_cap; u64 mcg_status; u64 mcg_ctl; + u64 mcg_ext_ctl; u64 *mce_banks; /* Cache MMIO info */ @@ -1005,6 +1006,8 @@ struct kvm_x86_ops { int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq, bool set); void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu); + + void (*setup_mce)(struct kvm_vcpu *vcpu); }; struct kvm_arch_async_pf { @@ -1077,6 +1080,8 @@ extern u8 
kvm_tsc_scaling_ratio_frac_bits; /* maximum allowed value of TSC scaling ratio */ extern u64 kvm_max_tsc_scaling_ratio; +extern u64 kvm_mce_cap_supported; + enum emulation_result { EMULATE_DONE, /* no further processing */ EMULATE_USER_EXIT, /* kvm_run ready for userspace exit */ diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1dc89c5..42db42e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -638,7 +638,7 @@ static struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) * feature_control_valid_bits_add/del(), so it's not included here. */ #define FEATURE_CONTROL_MAX_VALID_BITS \ - FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX + (FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX | FEATURE_CONTROL_LMCE) static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits) { @@ -2905,6 +2905,15 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, return valid_bits && !(val & ~valid_bits); } +static inline bool vmx_mcg_ext_ctrl_msr_present(struct kvm_vcpu *vcpu, + bool host_initiated) +{ + return (vcpu->arch.mcg_cap & MCG_LMCE_P) && + (host_initiated || + (to_vmx(vcpu)->msr_ia32_feature_control & + FEATURE_CONTROL_LMCE)); +} + /* * Reads an msr value (of 'msr_index') into 'pdata'. * Returns 0 on success, non-0 otherwise. 
@@ -2946,6 +2955,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return 1; msr_info->data = vmcs_read64(GUEST_BNDCFGS); break; + case MSR_IA32_MCG_EXT_CTL: + if (!vmx_mcg_ext_ctrl_msr_present(vcpu, + msr_info->host_initiated)) + return 1; + msr_info->data = vcpu->arch.mcg_ext_ctl; + break; case MSR_IA32_FEATURE_CONTROL: if (!vmx_feature_control_msr_valid(vcpu, 0)) return 1; @@ -3039,6 +3054,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_TSC_ADJUST: ret = kvm_set_msr_common(vcpu, msr_info); break; + case MSR_IA32_MCG_EXT_CTL: + if (!vmx_mcg_ext_ctrl_msr_present(vcpu, + msr_info->host_initiated) || + (data & ~MCG_EXT_CTL_LMCE_EN)) + return 1; + vcpu->arch.mcg_ext_ctl = data; + break; case MSR_IA32_FEATURE_CONTROL: if (!vmx_feature_control_msr_valid(vcpu, data) || (to_vmx(vcpu)->msr_ia32_feature_control & @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void) kvm_set_posted_intr_wakeup_handler(wakeup_handler); + kvm_mce_cap_supported |= MCG_LMCE_P; + return alloc_kvm_area(); out8: @@ -10950,6 +10974,14 @@ out: return ret; } +static void vmx_setup_mce(struct kvm_vcpu *vcpu) +{ + if (vcpu->arch.mcg_cap & MCG_LMCE_P) + feature_control_valid_bits_add(vcpu, FEATURE_CONTROL_LMCE); + else + feature_control_valid_bits_del(vcpu, FEATURE_CONTROL_LMCE); +} + static struct kvm_x86_ops vmx_x86_ops = { .cpu_has_kvm_support = cpu_has_kvm_support, .disabled_by_bios = vmx_disabled_by_bios, @@ -11074,6 +11106,8 @@ static struct kvm_x86_ops vmx_x86_ops = { .pmu_ops = &intel_pmu_ops, .update_pi_irte = vmx_update_pi_irte, + + .setup_mce = vmx_setup_mce, }; static int __init vmx_init(void) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bf22721..5bf76ab 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -70,7 +70,8 @@ #define MAX_IO_MSRS 256 #define KVM_MAX_MCE_BANKS 32 -#define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P) +u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P; 
+EXPORT_SYMBOL_GPL(kvm_mce_cap_supported); #define emul_to_vcpu(ctxt) \ container_of(ctxt, struct kvm_vcpu, arch.emulate_ctxt) @@ -983,6 +984,7 @@ static u32 emulated_msrs[] = { MSR_IA32_MISC_ENABLE, MSR_IA32_MCG_STATUS, MSR_IA32_MCG_CTL, + MSR_IA32_MCG_EXT_CTL, MSR_IA32_SMBASE, }; @@ -2684,11 +2686,9 @@ long kvm_arch_dev_ioctl(struct file *filp, break; } case KVM_X86_GET_MCE_CAP_SUPPORTED: { - u64 mce_cap; - - mce_cap = KVM_MCE_CAP_SUPPORTED; r = -EFAULT; - if (copy_to_user(argp, &mce_cap, sizeof mce_cap)) + if (copy_to_user(argp, &kvm_mce_cap_supported, + sizeof(kvm_mce_cap_supported))) goto out; r = 0; break; @@ -2866,7 +2866,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, r = -EINVAL; if (!bank_num || bank_num >= KVM_MAX_MCE_BANKS) goto out; - if (mcg_cap & ~(KVM_MCE_CAP_SUPPORTED | 0xff | 0xff0000)) + if (mcg_cap & ~(kvm_mce_cap_supported | 0xff | 0xff0000)) goto out; r = 0; vcpu->arch.mcg_cap = mcg_cap; @@ -2876,6 +2876,9 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, /* Init IA32_MCi_CTL to all 1s */ for (bank = 0; bank < bank_num; bank++) vcpu->arch.mce_banks[bank*4] = ~(u64)0; + + if (kvm_x86_ops->setup_mce) + kvm_x86_ops->setup_mce(vcpu); out: return r; } -- 2.9.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs 2016-06-16 6:05 ` [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs Haozhong Zhang @ 2016-06-16 10:04 ` Paolo Bonzini 2016-06-16 10:49 ` Haozhong Zhang 2016-06-16 14:55 ` Eduardo Habkost 0 siblings, 2 replies; 20+ messages in thread From: Paolo Bonzini @ 2016-06-16 10:04 UTC (permalink / raw) To: Haozhong Zhang, kvm Cc: rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj, Eduardo Habkost On 16/06/2016 08:05, Haozhong Zhang wrote: > From: Ashok Raj <ashok.raj@intel.com> > > On Intel platforms, this patch adds LMCE to KVM MCE supported > capabilities and handles guest access to LMCE related MSRs. > > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported > Only enable LMCE on Intel platform > Check MSR_IA32_FEATURE_CONTROL when handling guest > access to MSR_IA32_MCG_EXT_CTL] > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> > --- > arch/x86/include/asm/kvm_host.h | 5 +++++ > arch/x86/kvm/vmx.c | 36 +++++++++++++++++++++++++++++++++++- > arch/x86/kvm/x86.c | 15 +++++++++------ > 3 files changed, 49 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index e0fbe7e..75defa6 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -598,6 +598,7 @@ struct kvm_vcpu_arch { > u64 mcg_cap; > u64 mcg_status; > u64 mcg_ctl; > + u64 mcg_ext_ctl; > u64 *mce_banks; > > /* Cache MMIO info */ > @@ -1005,6 +1006,8 @@ struct kvm_x86_ops { > int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq, > uint32_t guest_irq, bool set); > void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu); > + > + void (*setup_mce)(struct kvm_vcpu *vcpu); > }; > > struct kvm_arch_async_pf { > @@ -1077,6 +1080,8 @@ extern u8 kvm_tsc_scaling_ratio_frac_bits; > 
/* maximum allowed value of TSC scaling ratio */ > extern u64 kvm_max_tsc_scaling_ratio; > > +extern u64 kvm_mce_cap_supported; > + > enum emulation_result { > EMULATE_DONE, /* no further processing */ > EMULATE_USER_EXIT, /* kvm_run ready for userspace exit */ > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 1dc89c5..42db42e 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -638,7 +638,7 @@ static struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) > * feature_control_valid_bits_add/del(), so it's not included here. > */ > #define FEATURE_CONTROL_MAX_VALID_BITS \ > - FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX > + (FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX | FEATURE_CONTROL_LMCE) > > static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits) > { > @@ -2905,6 +2905,15 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, > return valid_bits && !(val & ~valid_bits); > } > > +static inline bool vmx_mcg_ext_ctrl_msr_present(struct kvm_vcpu *vcpu, > + bool host_initiated) > +{ > + return (vcpu->arch.mcg_cap & MCG_LMCE_P) && Checking MCG_LMCE_P is unnecessary, because you cannot set FEATURE_CONTROL_LMCE unless MCG_LMCE_P is present. You can just inline this function in the callers, it's simpler. > + (host_initiated || > + (to_vmx(vcpu)->msr_ia32_feature_control & > + FEATURE_CONTROL_LMCE)); > +} > + > /* > * Reads an msr value (of 'msr_index') into 'pdata'. > * Returns 0 on success, non-0 otherwise. 
> @@ -2946,6 +2955,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > return 1; > msr_info->data = vmcs_read64(GUEST_BNDCFGS); > break; > + case MSR_IA32_MCG_EXT_CTL: > + if (!vmx_mcg_ext_ctrl_msr_present(vcpu, > + msr_info->host_initiated)) > + return 1; > + msr_info->data = vcpu->arch.mcg_ext_ctl; > + break; > case MSR_IA32_FEATURE_CONTROL: > if (!vmx_feature_control_msr_valid(vcpu, 0)) > return 1; > @@ -3039,6 +3054,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > case MSR_IA32_TSC_ADJUST: > ret = kvm_set_msr_common(vcpu, msr_info); > break; > + case MSR_IA32_MCG_EXT_CTL: > + if (!vmx_mcg_ext_ctrl_msr_present(vcpu, > + msr_info->host_initiated) || > + (data & ~MCG_EXT_CTL_LMCE_EN)) > + return 1; > + vcpu->arch.mcg_ext_ctl = data; > + break; > case MSR_IA32_FEATURE_CONTROL: > if (!vmx_feature_control_msr_valid(vcpu, data) || > (to_vmx(vcpu)->msr_ia32_feature_control & > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void) > > kvm_set_posted_intr_wakeup_handler(wakeup_handler); > > + kvm_mce_cap_supported |= MCG_LMCE_P; Ah, so virtual LMCE is available on all processors! This is interesting, but it also makes it more complicated to handle in QEMU; a new QEMU generally doesn't require a new kernel. Eduardo, any ideas? 
Thanks, Paolo > return alloc_kvm_area(); > > out8: > @@ -10950,6 +10974,14 @@ out: > return ret; > } > > +static void vmx_setup_mce(struct kvm_vcpu *vcpu) > +{ > + if (vcpu->arch.mcg_cap & MCG_LMCE_P) > + feature_control_valid_bits_add(vcpu, FEATURE_CONTROL_LMCE); > + else > + feature_control_valid_bits_del(vcpu, FEATURE_CONTROL_LMCE); > +} > + > static struct kvm_x86_ops vmx_x86_ops = { > .cpu_has_kvm_support = cpu_has_kvm_support, > .disabled_by_bios = vmx_disabled_by_bios, > @@ -11074,6 +11106,8 @@ static struct kvm_x86_ops vmx_x86_ops = { > .pmu_ops = &intel_pmu_ops, > > .update_pi_irte = vmx_update_pi_irte, > + > + .setup_mce = vmx_setup_mce, > }; > > static int __init vmx_init(void) > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index bf22721..5bf76ab 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -70,7 +70,8 @@ > > #define MAX_IO_MSRS 256 > #define KVM_MAX_MCE_BANKS 32 > -#define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P) > +u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P; > +EXPORT_SYMBOL_GPL(kvm_mce_cap_supported); > > #define emul_to_vcpu(ctxt) \ > container_of(ctxt, struct kvm_vcpu, arch.emulate_ctxt) > @@ -983,6 +984,7 @@ static u32 emulated_msrs[] = { > MSR_IA32_MISC_ENABLE, > MSR_IA32_MCG_STATUS, > MSR_IA32_MCG_CTL, > + MSR_IA32_MCG_EXT_CTL, > MSR_IA32_SMBASE, > }; > > @@ -2684,11 +2686,9 @@ long kvm_arch_dev_ioctl(struct file *filp, > break; > } > case KVM_X86_GET_MCE_CAP_SUPPORTED: { > - u64 mce_cap; > - > - mce_cap = KVM_MCE_CAP_SUPPORTED; > r = -EFAULT; > - if (copy_to_user(argp, &mce_cap, sizeof mce_cap)) > + if (copy_to_user(argp, &kvm_mce_cap_supported, > + sizeof(kvm_mce_cap_supported))) > goto out; > r = 0; > break; > @@ -2866,7 +2866,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, > r = -EINVAL; > if (!bank_num || bank_num >= KVM_MAX_MCE_BANKS) > goto out; > - if (mcg_cap & ~(KVM_MCE_CAP_SUPPORTED | 0xff | 0xff0000)) > + if (mcg_cap & ~(kvm_mce_cap_supported | 0xff | 
0xff0000)) > goto out; > r = 0; > vcpu->arch.mcg_cap = mcg_cap; > @@ -2876,6 +2876,9 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, > /* Init IA32_MCi_CTL to all 1s */ > for (bank = 0; bank < bank_num; bank++) > vcpu->arch.mce_banks[bank*4] = ~(u64)0; > + > + if (kvm_x86_ops->setup_mce) > + kvm_x86_ops->setup_mce(vcpu); > out: > return r; > } > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs 2016-06-16 10:04 ` Paolo Bonzini @ 2016-06-16 10:49 ` Haozhong Zhang 2016-06-16 14:55 ` Eduardo Habkost 1 sibling, 0 replies; 20+ messages in thread From: Haozhong Zhang @ 2016-06-16 10:49 UTC (permalink / raw) To: Paolo Bonzini Cc: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj, Eduardo Habkost On 06/16/16 12:04, Paolo Bonzini wrote: > > > On 16/06/2016 08:05, Haozhong Zhang wrote: > > From: Ashok Raj <ashok.raj@intel.com> > > > > On Intel platforms, this patch adds LMCE to KVM MCE supported > > capabilities and handles guest access to LMCE related MSRs. > > > > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported > > Only enable LMCE on Intel platform > > Check MSR_IA32_FEATURE_CONTROL when handling guest > > access to MSR_IA32_MCG_EXT_CTL] > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> > > --- > > arch/x86/include/asm/kvm_host.h | 5 +++++ > > arch/x86/kvm/vmx.c | 36 +++++++++++++++++++++++++++++++++++- > > arch/x86/kvm/x86.c | 15 +++++++++------ > > 3 files changed, 49 insertions(+), 7 deletions(-) > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > > index e0fbe7e..75defa6 100644 > > --- a/arch/x86/include/asm/kvm_host.h > > +++ b/arch/x86/include/asm/kvm_host.h > > @@ -598,6 +598,7 @@ struct kvm_vcpu_arch { > > u64 mcg_cap; > > u64 mcg_status; > > u64 mcg_ctl; > > + u64 mcg_ext_ctl; > > u64 *mce_banks; > > > > /* Cache MMIO info */ > > @@ -1005,6 +1006,8 @@ struct kvm_x86_ops { > > int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq, > > uint32_t guest_irq, bool set); > > void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu); > > + > > + void (*setup_mce)(struct kvm_vcpu *vcpu); > > }; > > > > struct kvm_arch_async_pf { > > @@ -1077,6 +1080,8 @@ extern u8 
kvm_tsc_scaling_ratio_frac_bits; > > /* maximum allowed value of TSC scaling ratio */ > > extern u64 kvm_max_tsc_scaling_ratio; > > > > +extern u64 kvm_mce_cap_supported; > > + > > enum emulation_result { > > EMULATE_DONE, /* no further processing */ > > EMULATE_USER_EXIT, /* kvm_run ready for userspace exit */ > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > > index 1dc89c5..42db42e 100644 > > --- a/arch/x86/kvm/vmx.c > > +++ b/arch/x86/kvm/vmx.c > > @@ -638,7 +638,7 @@ static struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu) > > * feature_control_valid_bits_add/del(), so it's not included here. > > */ > > #define FEATURE_CONTROL_MAX_VALID_BITS \ > > - FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX > > + (FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX | FEATURE_CONTROL_LMCE) > > > > static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits) > > { > > @@ -2905,6 +2905,15 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, > > return valid_bits && !(val & ~valid_bits); > > } > > > > +static inline bool vmx_mcg_ext_ctrl_msr_present(struct kvm_vcpu *vcpu, > > + bool host_initiated) > > +{ > > + return (vcpu->arch.mcg_cap & MCG_LMCE_P) && > > Checking MCG_LMCE_P is unnecessary, because you cannot set > FEATURE_CONTROL_LMCE unless MCG_LMCE_P is present. > > You can just inline this function in the callers, it's simpler. I'll remove the first line check and inline the other parts. > > > + (host_initiated || > > + (to_vmx(vcpu)->msr_ia32_feature_control & > > + FEATURE_CONTROL_LMCE)); > > +} > > + > > /* > > * Reads an msr value (of 'msr_index') into 'pdata'. > > * Returns 0 on success, non-0 otherwise. 
> > @@ -2946,6 +2955,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > > return 1; > > msr_info->data = vmcs_read64(GUEST_BNDCFGS); > > break; > > + case MSR_IA32_MCG_EXT_CTL: > > + if (!vmx_mcg_ext_ctrl_msr_present(vcpu, > > + msr_info->host_initiated)) > > + return 1; > > + msr_info->data = vcpu->arch.mcg_ext_ctl; > > + break; > > case MSR_IA32_FEATURE_CONTROL: > > if (!vmx_feature_control_msr_valid(vcpu, 0)) > > return 1; > > @@ -3039,6 +3054,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > > case MSR_IA32_TSC_ADJUST: > > ret = kvm_set_msr_common(vcpu, msr_info); > > break; > > + case MSR_IA32_MCG_EXT_CTL: > > + if (!vmx_mcg_ext_ctrl_msr_present(vcpu, > > + msr_info->host_initiated) || > > + (data & ~MCG_EXT_CTL_LMCE_EN)) > > + return 1; > > + vcpu->arch.mcg_ext_ctl = data; > > + break; > > case MSR_IA32_FEATURE_CONTROL: > > if (!vmx_feature_control_msr_valid(vcpu, data) || > > (to_vmx(vcpu)->msr_ia32_feature_control & > > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void) > > > > kvm_set_posted_intr_wakeup_handler(wakeup_handler); > > > > + kvm_mce_cap_supported |= MCG_LMCE_P; > > Ah, so virtual LMCE is available on all processors! This is > interesting, but it also makes it more complicated to handle in QEMU; a > new QEMU generally doesn't require a new kernel. > My QEMU patch checks KVM MCE capabilities before enabling LMCE (see lmce_supported() and mce_init() in QEMU Patch 1), so either new QEMU on an old kernel or old QEMU on a new kernel will work. Haozhong > Eduardo, any ideas?
> > Thanks, > > Paolo > > > return alloc_kvm_area(); > > > > out8: > > @@ -10950,6 +10974,14 @@ out: > > return ret; > > } > > > > +static void vmx_setup_mce(struct kvm_vcpu *vcpu) > > +{ > > + if (vcpu->arch.mcg_cap & MCG_LMCE_P) > > + feature_control_valid_bits_add(vcpu, FEATURE_CONTROL_LMCE); > > + else > > + feature_control_valid_bits_del(vcpu, FEATURE_CONTROL_LMCE); > > +} > > + > > static struct kvm_x86_ops vmx_x86_ops = { > > .cpu_has_kvm_support = cpu_has_kvm_support, > > .disabled_by_bios = vmx_disabled_by_bios, > > @@ -11074,6 +11106,8 @@ static struct kvm_x86_ops vmx_x86_ops = { > > .pmu_ops = &intel_pmu_ops, > > > > .update_pi_irte = vmx_update_pi_irte, > > + > > + .setup_mce = vmx_setup_mce, > > }; > > > > static int __init vmx_init(void) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index bf22721..5bf76ab 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -70,7 +70,8 @@ > > > > #define MAX_IO_MSRS 256 > > #define KVM_MAX_MCE_BANKS 32 > > -#define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P) > > +u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P; > > +EXPORT_SYMBOL_GPL(kvm_mce_cap_supported); > > > > #define emul_to_vcpu(ctxt) \ > > container_of(ctxt, struct kvm_vcpu, arch.emulate_ctxt) > > @@ -983,6 +984,7 @@ static u32 emulated_msrs[] = { > > MSR_IA32_MISC_ENABLE, > > MSR_IA32_MCG_STATUS, > > MSR_IA32_MCG_CTL, > > + MSR_IA32_MCG_EXT_CTL, > > MSR_IA32_SMBASE, > > }; > > > > @@ -2684,11 +2686,9 @@ long kvm_arch_dev_ioctl(struct file *filp, > > break; > > } > > case KVM_X86_GET_MCE_CAP_SUPPORTED: { > > - u64 mce_cap; > > - > > - mce_cap = KVM_MCE_CAP_SUPPORTED; > > r = -EFAULT; > > - if (copy_to_user(argp, &mce_cap, sizeof mce_cap)) > > + if (copy_to_user(argp, &kvm_mce_cap_supported, > > + sizeof(kvm_mce_cap_supported))) > > goto out; > > r = 0; > > break; > > @@ -2866,7 +2866,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, > > r = -EINVAL; > > if (!bank_num || bank_num >= 
KVM_MAX_MCE_BANKS) > > goto out; > > - if (mcg_cap & ~(KVM_MCE_CAP_SUPPORTED | 0xff | 0xff0000)) > > + if (mcg_cap & ~(kvm_mce_cap_supported | 0xff | 0xff0000)) > > goto out; > > r = 0; > > vcpu->arch.mcg_cap = mcg_cap; > > @@ -2876,6 +2876,9 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu, > > /* Init IA32_MCi_CTL to all 1s */ > > for (bank = 0; bank < bank_num; bank++) > > vcpu->arch.mce_banks[bank*4] = ~(u64)0; > > + > > + if (kvm_x86_ops->setup_mce) > > + kvm_x86_ops->setup_mce(vcpu); > > out: > > return r; > > } > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs 2016-06-16 10:04 ` Paolo Bonzini 2016-06-16 10:49 ` Haozhong Zhang @ 2016-06-16 14:55 ` Eduardo Habkost 2016-06-17 1:11 ` Haozhong Zhang 1 sibling, 1 reply; 20+ messages in thread From: Eduardo Habkost @ 2016-06-16 14:55 UTC (permalink / raw) To: Paolo Bonzini Cc: Haozhong Zhang, kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj, libvir-list, Jiri Denemark On Thu, Jun 16, 2016 at 12:04:50PM +0200, Paolo Bonzini wrote: > On 16/06/2016 08:05, Haozhong Zhang wrote: > > From: Ashok Raj <ashok.raj@intel.com> > > > > On Intel platforms, this patch adds LMCE to KVM MCE supported > > capabilities and handles guest access to LMCE related MSRs. > > > > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported > > Only enable LMCE on Intel platform > > Check MSR_IA32_FEATURE_CONTROL when handling guest > > access to MSR_IA32_MCG_EXT_CTL] > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> [...] > > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void) > > > > kvm_set_posted_intr_wakeup_handler(wakeup_handler); > > > > + kvm_mce_cap_supported |= MCG_LMCE_P; > > Ah, so virtual LMCE is available on all processors! This is > interesting, but it also makes it more complicated to handle in QEMU; a > new QEMU generally doesn't require a new kernel. > > Eduardo, any ideas? (CCing libvirt list) As we shouldn't make machine-type changes introduce new host requirements, it looks like we need to either add a new set of CPU models (unreasonable), or expect management software to explicitly enable LMCE after ensuring the host supports it. Or we could wait for a reasonable time after the feature is available in the kernel, and declare that QEMU as a whole requires a newer kernel. But how much time would be reasonable for that? 
Long term, I believe we should think of a better solution. I don't think it is reasonable to require new libvirt code to be written for every single low-level feature that requires a newer kernel or newer host hardware. Maybe new introspection interfaces that would allow us to drop the "no new requirements on machine-type changes" rule? -- Eduardo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs 2016-06-16 14:55 ` Eduardo Habkost @ 2016-06-17 1:11 ` Haozhong Zhang 2016-06-17 17:15 ` Eduardo Habkost 0 siblings, 1 reply; 20+ messages in thread From: Haozhong Zhang @ 2016-06-17 1:11 UTC (permalink / raw) To: Eduardo Habkost Cc: Paolo Bonzini, kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj, libvir-list, Jiri Denemark On 06/16/16 11:55, Eduardo Habkost wrote: > On Thu, Jun 16, 2016 at 12:04:50PM +0200, Paolo Bonzini wrote: > > On 16/06/2016 08:05, Haozhong Zhang wrote: > > > From: Ashok Raj <ashok.raj@intel.com> > > > > > > On Intel platforms, this patch adds LMCE to KVM MCE supported > > > capabilities and handles guest access to LMCE related MSRs. > > > > > > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > > > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported > > > Only enable LMCE on Intel platform > > > Check MSR_IA32_FEATURE_CONTROL when handling guest > > > access to MSR_IA32_MCG_EXT_CTL] > > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> > [...] > > > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void) > > > > > > kvm_set_posted_intr_wakeup_handler(wakeup_handler); > > > > > > + kvm_mce_cap_supported |= MCG_LMCE_P; > > > > Ah, so virtual LMCE is available on all processors! This is > > interesting, but it also makes it more complicated to handle in QEMU; a > > new QEMU generally doesn't require a new kernel. > > > > Eduardo, any ideas? > > (CCing libvirt list) > > As we shouldn't make machine-type changes introduce new host > requirements, it looks like we need to either add a new set of > CPU models (unreasonable), or expect management software to > explicitly enable LMCE after ensuring the host supports it. 
> > Or we could wait for a reasonable time after the feature is > available in the kernel, and declare that QEMU as a whole > requires a newer kernel. But how much time would be reasonable > for that? > > Long term, I believe we should think of a better solution. I > don't think it is reasonable to require new libvirt code to be > written for every single low-level feature that requires a newer > kernel or newer host hardware. Maybe new introspection interfaces > that would allow us to drop the "no new requirements on > machine-type changes" rule? > Because new MSR (MSR_IA32_MCG_EXT_CTL) and new MSR bit (FEATURE_CONTROL_LMCE in MSR_IA32_FEATURE_CONTROL) are introduced by LMCE, QEMU requires new KVM which can handle those changes. I'm not familiar with libvirt. Does the requirement of new KVM capability bring any troubles to libvirt? Thanks, Haozhong ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs 2016-06-17 1:11 ` Haozhong Zhang @ 2016-06-17 17:15 ` Eduardo Habkost 2016-06-20 1:49 ` Haozhong Zhang 0 siblings, 1 reply; 20+ messages in thread From: Eduardo Habkost @ 2016-06-17 17:15 UTC (permalink / raw) To: Paolo Bonzini, kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj, libvir-list, Jiri Denemark On Fri, Jun 17, 2016 at 09:11:16AM +0800, Haozhong Zhang wrote: > On 06/16/16 11:55, Eduardo Habkost wrote: > > On Thu, Jun 16, 2016 at 12:04:50PM +0200, Paolo Bonzini wrote: > > > On 16/06/2016 08:05, Haozhong Zhang wrote: > > > > From: Ashok Raj <ashok.raj@intel.com> > > > > > > > > On Intel platforms, this patch adds LMCE to KVM MCE supported > > > > capabilities and handles guest access to LMCE related MSRs. > > > > > > > > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > > > > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported > > > > Only enable LMCE on Intel platform > > > > Check MSR_IA32_FEATURE_CONTROL when handling guest > > > > access to MSR_IA32_MCG_EXT_CTL] > > > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> > > [...] > > > > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void) > > > > > > > > kvm_set_posted_intr_wakeup_handler(wakeup_handler); > > > > > > > > + kvm_mce_cap_supported |= MCG_LMCE_P; > > > > > > Ah, so virtual LMCE is available on all processors! This is > > > interesting, but it also makes it more complicated to handle in QEMU; a > > > new QEMU generally doesn't require a new kernel. > > > > > > Eduardo, any ideas? > > > > (CCing libvirt list) > > > > As we shouldn't make machine-type changes introduce new host > > requirements, it looks like we need to either add a new set of > > CPU models (unreasonable), or expect management software to > > explicitly enable LMCE after ensuring the host supports it. 
> > > > Or we could wait for a reasonable time after the feature is > > available in the kernel, and declare that QEMU as a whole > > requires a newer kernel. But how much time would be reasonable > > for that? > > > > Long term, I believe we should think of a better solution. I > > don't think it is reasonable to require new libvirt code to be > > written for every single low-level feature that requires a newer > > kernel or newer host hardware. Maybe new introspection interfaces > > that would allow us to drop the "no new requirements on > > machine-type changes" rule? > > > > Because new MSR (MSR_IA32_MCG_EXT_CTL) and new MSR bit > (FEATURE_CONTROL_LMCE in MSR_IA32_FEATURE_CONTROL) are introduced by > LMCE, QEMU requires new KVM which can handle those changes. If I understood correctly, you are describing the second option above (declaring that QEMU as a whole requires a newer kernel). > > I'm not familiar with libvirt. Does the requirement of new KVM > capability bring any troubles to libvirt? It does, assuming that we still support running QEMU under an older kernel where KVM doesn't LMCE. In this case, the pc-2.6 machine-type will run, but the pc-2.7 machine-type won't. The requirement of new KVM capabilities based on the machine-type is a problem for livirt. libvirt have some host-capabilities APIs to allow software to check if the VM can be run on (or migrated to) a host, but none of them are based on machine-type. This is not necessarily specific to libvirt: people may have their own configuration or scripts that use the default "pc" alias, and a QEMU upgrade shouldn't break their configuration. -- Eduardo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs 2016-06-17 17:15 ` Eduardo Habkost @ 2016-06-20 1:49 ` Haozhong Zhang 2016-06-20 6:55 ` Paolo Bonzini 0 siblings, 1 reply; 20+ messages in thread From: Haozhong Zhang @ 2016-06-20 1:49 UTC (permalink / raw) To: Eduardo Habkost Cc: Paolo Bonzini, kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen, Ashok Raj, libvir-list, Jiri Denemark On 06/17/16 14:15, Eduardo Habkost wrote: > On Fri, Jun 17, 2016 at 09:11:16AM +0800, Haozhong Zhang wrote: > > On 06/16/16 11:55, Eduardo Habkost wrote: > > > On Thu, Jun 16, 2016 at 12:04:50PM +0200, Paolo Bonzini wrote: > > > > On 16/06/2016 08:05, Haozhong Zhang wrote: > > > > > From: Ashok Raj <ashok.raj@intel.com> > > > > > > > > > > On Intel platforms, this patch adds LMCE to KVM MCE supported > > > > > capabilities and handles guest access to LMCE related MSRs. > > > > > > > > > > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > > > > > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported > > > > > Only enable LMCE on Intel platform > > > > > Check MSR_IA32_FEATURE_CONTROL when handling guest > > > > > access to MSR_IA32_MCG_EXT_CTL] > > > > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> > > > [...] > > > > > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void) > > > > > > > > > > kvm_set_posted_intr_wakeup_handler(wakeup_handler); > > > > > > > > > > + kvm_mce_cap_supported |= MCG_LMCE_P; > > > > > > > > Ah, so virtual LMCE is available on all processors! This is > > > > interesting, but it also makes it more complicated to handle in QEMU; a > > > > new QEMU generally doesn't require a new kernel. > > > > > > > > Eduardo, any ideas? 
> > > > > > (CCing libvirt list) > > > > > > As we shouldn't make machine-type changes introduce new host > > > requirements, it looks like we need to either add a new set of > > > CPU models (unreasonable), or expect management software to > > > explicitly enable LMCE after ensuring the host supports it. > > > > > > Or we could wait for a reasonable time after the feature is > > > available in the kernel, and declare that QEMU as a whole > > > requires a newer kernel. But how much time would be reasonable > > > for that? > > > > > > Long term, I believe we should think of a better solution. I > > > don't think it is reasonable to require new libvirt code to be > > > written for every single low-level feature that requires a newer > > > kernel or newer host hardware. Maybe new introspection interfaces > > > that would allow us to drop the "no new requirements on > > > machine-type changes" rule? > > > > > > > Because new MSR (MSR_IA32_MCG_EXT_CTL) and new MSR bit > > (FEATURE_CONTROL_LMCE in MSR_IA32_FEATURE_CONTROL) are introduced by > > LMCE, QEMU requires new KVM which can handle those changes. > > If I understood correctly, you are describing the second option > above (declaring that QEMU as a whole requires a newer kernel). > > > > > I'm not familiar with libvirt. Does the requirement of new KVM > > capability bring any troubles to libvirt? > > It does, assuming that we still support running QEMU under an > older kernel where KVM doesn't LMCE. In this case, the pc-2.6 > machine-type will run, but the pc-2.7 machine-type won't. > > The requirement of new KVM capabilities based on the machine-type > is a problem for livirt. libvirt have some host-capabilities APIs > to allow software to check if the VM can be run on (or migrated > to) a host, but none of them are based on machine-type. 
>
> This is not necessarily specific to libvirt: people may have
> their own configuration or scripts that use the default "pc"
> alias, and a QEMU upgrade shouldn't break their configuration.
>

Thanks for the explanation!

If we disable LMCE in QEMU by default (even for -cpu host), will it
still be a problem? That is,

- pc-2.7 can continue to run on old kernels unless users explicitly
  require LMCE

- existing libvirt VM configurations can continue to work on pc-2.7
  because LMCE is not specified in those configurations and is
  disabled by default (i.e. no requirement for new kernels)

- existing QEMU configurations/scripts using the pc alias can continue
  to work on pc-2.7 for the same reason above.

Thanks,
Haozhong
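[Editor's note: the MSR handling discussed in this thread can be sketched as a standalone illustration. This is not the actual arch/x86/kvm/vmx.c code; the struct and function names (`vcpu_state`, `wrmsr_feature_control`, `mcg_ext_ctl_msr_present`) are invented for the sketch, while the bit positions follow the Intel SDM. It shows the two checks the series adds: writes to MSR_IA32_FEATURE_CONTROL are validated against a per-feature valid-bits mask and rejected once the lock bit is set, and MSR_IA32_MCG_EXT_CTL is only exposed after the guest has enabled and locked LMCE.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout from the Intel SDM (IA32_FEATURE_CONTROL, MSR 0x3a). */
#define FEATURE_CONTROL_LOCKED	(1ULL << 0)
#define FEATURE_CONTROL_LMCE	(1ULL << 20)

/* Simplified stand-in for the per-vCPU state kept in vcpu_vmx. */
struct vcpu_state {
	uint64_t msr_ia32_feature_control;
	uint64_t msr_ia32_feature_control_valid_bits;
};

/*
 * Reject writes that set bits outside the valid mask (bits whose
 * features are not exposed to the guest), and reject any further
 * write once the guest has set the lock bit, as real hardware does.
 */
static bool wrmsr_feature_control(struct vcpu_state *v, uint64_t data)
{
	if (data & ~v->msr_ia32_feature_control_valid_bits)
		return false;
	if (v->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED)
		return false;
	v->msr_ia32_feature_control = data;
	return true;
}

/*
 * MSR_IA32_MCG_EXT_CTL is only accessible once the guest firmware
 * has both enabled LMCE and locked MSR_IA32_FEATURE_CONTROL.
 */
static bool mcg_ext_ctl_msr_present(const struct vcpu_state *v)
{
	uint64_t required = FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE;

	return (v->msr_ia32_feature_control & required) == required;
}
```

Under this scheme, a guest that never enables LMCE simply faults on MSR_IA32_MCG_EXT_CTL accesses, which is why exposing the capability does not break guests running on hosts without it.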
* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs
  2016-06-20  1:49               ` Haozhong Zhang
@ 2016-06-20  6:55                 ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-06-20  6:55 UTC (permalink / raw)
  To: Eduardo Habkost, kvm, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov,
	Tony Luck, Andi Kleen, Ashok Raj, libvir-list, Jiri Denemark

On 20/06/2016 03:49, Haozhong Zhang wrote:
> Thanks for the explanation!
>
> If we disable LMCE in QEMU by default (even for -cpu host), will it
> still be a problem? That is,
>
> - pc-2.7 can continue to run on old kernels unless users explicitly
>   require LMCE
>
> - existing libvirt VM configurations can continue to work on pc-2.7
>   because LMCE is not specified in those configurations and is
>   disabled by default (i.e. no requirement for new kernels)
>
> - existing QEMU configurations/scripts using the pc alias can continue
>   to work on pc-2.7 for the same reason above.

Yes, that would be fine. "-cpu host" can leave LMCE enabled if
supported by the kernel.

Paolo
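[Editor's note: the QEMU-side policy agreed on above can be condensed into a small decision function. This is a hypothetical distillation, not actual QEMU code: `lmce_requested` stands for an explicit user request (e.g. a CPU property), `cpu_host` for -cpu host, and `kernel_has_lmce` for a KVM capability probe of the running kernel.]

```c
#include <assert.h>
#include <stdbool.h>

enum lmce_result { LMCE_OFF, LMCE_ON, LMCE_ERROR };

/*
 * LMCE is off by default, so old kernels keep working with new
 * machine-types.  An explicit request is a hard requirement and
 * fails on kernels without the capability; -cpu host enables it
 * opportunistically when the kernel supports it.
 */
static enum lmce_result resolve_lmce(bool lmce_requested, bool cpu_host,
				     bool kernel_has_lmce)
{
	if (lmce_requested)
		return kernel_has_lmce ? LMCE_ON : LMCE_ERROR;
	if (cpu_host && kernel_has_lmce)
		return LMCE_ON;
	return LMCE_OFF;
}
```

The key property is that every path that can fail requires the user to have opted in, which is what keeps pc-2.7 and the "pc" alias runnable on pre-LMCE kernels.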
end of thread, other threads:[~2016-06-20  7:13 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-16  6:05 [PATCH v2 0/3] Add KVM support for Intel local MCE Haozhong Zhang
2016-06-16  6:05 ` [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx Haozhong Zhang
2016-06-16 11:49   ` Borislav Petkov
2016-06-16 11:57     ` Paolo Bonzini
2016-06-16 11:57     ` Haozhong Zhang
2016-06-16  6:05 ` [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang
2016-06-16  9:55   ` Paolo Bonzini
2016-06-16 11:21     ` Haozhong Zhang
2016-06-16 10:01   ` Paolo Bonzini
2016-06-16 11:16     ` Haozhong Zhang
2016-06-16 11:19       ` Paolo Bonzini
2016-06-16 11:29         ` Haozhong Zhang
2016-06-16  6:05 ` [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs Haozhong Zhang
2016-06-16 10:04   ` Paolo Bonzini
2016-06-16 10:49     ` Haozhong Zhang
2016-06-16 14:55     ` Eduardo Habkost
2016-06-17  1:11       ` Haozhong Zhang
2016-06-17 17:15         ` Eduardo Habkost
2016-06-20  1:49           ` Haozhong Zhang
2016-06-20  6:55             ` Paolo Bonzini