linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/3] Add KVM support for Intel local MCE
@ 2016-06-16  6:05 Haozhong Zhang
  2016-06-16  6:05 ` [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx Haozhong Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-16  6:05 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov,
	Tony Luck, Andi Kleen, Ashok Raj, Haozhong Zhang

Changes in v2:
 * v1 Patch 1 becomes v2 Patch 3.
 * Fix SOB (Signed-off-by) chain in Patch 3. (Boris Petkov)
 * (Patch 1) Move msr_ia32_feature_control from nested_vmx to
   vcpu_vmx, because it no longer depends only on nested VMX after
   this patch series. (Radim Krčmář)
 * (Patch 2) Add a valid bitmask for MSR_IA32_FEATURE_CONTROL to allow
   checking individual bits of MSR_IA32_FEATURE_CONTROL according to
   enabled features. (Radim Krčmář)
 * Move the common check in handling MSR_IA32_MCG_EXT_CTL to function
   vmx_mcg_ext_ctl_msr_present. (Radim Krčmář)

Changes in v1:
 * Change macro KVM_MCE_CAP_SUPPORTED to variable kvm_mce_cap_supported.
 * Include LMCE capability in kvm_mce_cap_supported only on Intel CPUs,
   i.e. LMCE can be enabled only on Intel CPUs.
 * Check if LMCE is enabled in guest MSR_IA32_FEATURE_CONTROL when
   handling guest access to MSR_IA32_MCG_EXT_CTL.

This patch series, along with the corresponding QEMU patch series (sent
via another email with title "[PATCH v4 0/3] Add QEMU support for
Intel local MCE"), enables the Intel local MCE (LMCE) feature for
guests. This KVM series handles guest access to the LMCE-related MSRs
(MSR_IA32_MCG_EXT_CTL and MSR_IA32_FEATURE_CONTROL).

Ashok Raj (1):
  KVM: VMX: enable guest access to LMCE related MSRs

Haozhong Zhang (2):
  KVM: VMX: move msr_ia32_feature_control to vcpu_vmx
  KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL

 arch/x86/include/asm/kvm_host.h |  5 +++
 arch/x86/kvm/vmx.c              | 97 ++++++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/x86.c              | 15 ++++---
 3 files changed, 104 insertions(+), 13 deletions(-)

-- 
2.9.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx
  2016-06-16  6:05 [PATCH v2 0/3] Add KVM support for Intel local MCE Haozhong Zhang
@ 2016-06-16  6:05 ` Haozhong Zhang
  2016-06-16 11:49   ` Borislav Petkov
  2016-06-16  6:05 ` [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang
  2016-06-16  6:05 ` [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs Haozhong Zhang
  2 siblings, 1 reply; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-16  6:05 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov,
	Tony Luck, Andi Kleen, Ashok Raj, Haozhong Zhang

msr_ia32_feature_control will also be used for LMCE and will no longer
depend only on nested VMX, so move it from struct nested_vmx to struct
vcpu_vmx.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 arch/x86/kvm/vmx.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 57ec6a4..6b63f2d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -421,7 +421,6 @@ struct nested_vmx {
 	struct pi_desc *pi_desc;
 	bool pi_pending;
 	u16 posted_intr_nv;
-	u64 msr_ia32_feature_control;
 
 	struct hrtimer preemption_timer;
 	bool preemption_timer_expired;
@@ -602,6 +601,8 @@ struct vcpu_vmx {
 	bool guest_pkru_valid;
 	u32 guest_pkru;
 	u32 host_pkru;
+
+	u64 msr_ia32_feature_control;
 };
 
 enum segment_cache_field {
@@ -2907,7 +2908,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_FEATURE_CONTROL:
 		if (!nested_vmx_allowed(vcpu))
 			return 1;
-		msr_info->data = to_vmx(vcpu)->nested.msr_ia32_feature_control;
+		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
 		break;
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		if (!nested_vmx_allowed(vcpu))
@@ -2999,10 +3000,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_FEATURE_CONTROL:
 		if (!nested_vmx_allowed(vcpu) ||
-		    (to_vmx(vcpu)->nested.msr_ia32_feature_control &
+		    (to_vmx(vcpu)->msr_ia32_feature_control &
 		     FEATURE_CONTROL_LOCKED && !msr_info->host_initiated))
 			return 1;
-		vmx->nested.msr_ia32_feature_control = data;
+		vmx->msr_ia32_feature_control = data;
 		if (msr_info->host_initiated && data == 0)
 			vmx_leave_nested(vcpu);
 		break;
@@ -6859,7 +6860,7 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 		return 1;
 	}
 
-	if ((vmx->nested.msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
+	if ((vmx->msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
 			!= VMXON_NEEDED_FEATURES) {
 		kvm_inject_gp(vcpu, 0);
 		return 1;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL
  2016-06-16  6:05 [PATCH v2 0/3] Add KVM support for Intel local MCE Haozhong Zhang
  2016-06-16  6:05 ` [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx Haozhong Zhang
@ 2016-06-16  6:05 ` Haozhong Zhang
  2016-06-16  9:55   ` Paolo Bonzini
  2016-06-16 10:01   ` Paolo Bonzini
  2016-06-16  6:05 ` [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs Haozhong Zhang
  2 siblings, 2 replies; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-16  6:05 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov,
	Tony Luck, Andi Kleen, Ashok Raj, Haozhong Zhang

KVM currently does not check the value written to the guest
MSR_IA32_FEATURE_CONTROL, so bits corresponding to disabled features
may be set. This patch makes KVM validate the individual bits written
to the guest MSR_IA32_FEATURE_CONTROL according to the enabled
features.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 arch/x86/kvm/vmx.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6b63f2d..1dc89c5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -602,7 +602,16 @@ struct vcpu_vmx {
 	u32 guest_pkru;
 	u32 host_pkru;
 
+	/*
+	 * Only bits masked by msr_ia32_feature_control_valid_bits can be set in
+	 * msr_ia32_feature_control.
+	 *
+	 * msr_ia32_feature_control_valid_bits should be modified by
+	 * feature_control_valid_bits_add/del(), and only bits masked by
+	 * FEATURE_CONTROL_MAX_VALID_BITS can be modified.
+	 */
 	u64 msr_ia32_feature_control;
+	u64 msr_ia32_feature_control_valid_bits;
 };
 
 enum segment_cache_field {
@@ -624,6 +633,30 @@ static struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
 	return &(to_vmx(vcpu)->pi_desc);
 }
 
+/*
+ * FEATURE_CONTROL_LOCKED is added/removed automatically by
+ * feature_control_valid_bits_add/del(), so it's not included here.
+ */
+#define FEATURE_CONTROL_MAX_VALID_BITS \
+	FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX
+
+static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits)
+{
+	ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS));
+	to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
+		bits | FEATURE_CONTROL_LOCKED;
+}
+
+static void feature_control_valid_bits_del(struct kvm_vcpu *vcpu, uint64_t bits)
+{
+	uint64_t *valid_bits =
+		&to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
+	ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS));
+	*valid_bits &= ~bits;
+	if (!(*valid_bits & ~FEATURE_CONTROL_LOCKED))
+		*valid_bits = 0;
+}
+
 #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
 #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
 #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
@@ -2864,6 +2897,14 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 	return 0;
 }
 
+static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
+						 uint64_t val)
+{
+	uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
+
+	return valid_bits && !(val & ~valid_bits);
+}
+
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -2906,7 +2947,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
 		break;
 	case MSR_IA32_FEATURE_CONTROL:
-		if (!nested_vmx_allowed(vcpu))
+		if (!vmx_feature_control_msr_valid(vcpu, 0))
 			return 1;
 		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
 		break;
@@ -2999,7 +3040,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
 	case MSR_IA32_FEATURE_CONTROL:
-		if (!nested_vmx_allowed(vcpu) ||
+		if (!vmx_feature_control_msr_valid(vcpu, data) ||
 		    (to_vmx(vcpu)->msr_ia32_feature_control &
 		     FEATURE_CONTROL_LOCKED && !msr_info->host_initiated))
 			return 1;
@@ -9095,6 +9136,13 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 			vmx->nested.nested_vmx_secondary_ctls_high &=
 				~SECONDARY_EXEC_PCOMMIT;
 	}
+
+	if (nested_vmx_allowed(vcpu))
+		feature_control_valid_bits_add
+			(vcpu, FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX);
+	else
+		feature_control_valid_bits_del
+			(vcpu, FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX);
 }
 
 static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs
  2016-06-16  6:05 [PATCH v2 0/3] Add KVM support for Intel local MCE Haozhong Zhang
  2016-06-16  6:05 ` [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx Haozhong Zhang
  2016-06-16  6:05 ` [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang
@ 2016-06-16  6:05 ` Haozhong Zhang
  2016-06-16 10:04   ` Paolo Bonzini
  2 siblings, 1 reply; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-16  6:05 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov,
	Tony Luck, Andi Kleen, Ashok Raj, Haozhong Zhang

From: Ashok Raj <ashok.raj@intel.com>

On Intel platforms, this patch adds LMCE to KVM's supported MCE
capabilities and handles guest access to the LMCE-related MSRs.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported
           Only enable LMCE on Intel platform
	   Check MSR_IA32_FEATURE_CONTROL when handling guest
	     access to MSR_IA32_MCG_EXT_CTL]
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +++++
 arch/x86/kvm/vmx.c              | 36 +++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              | 15 +++++++++------
 3 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e0fbe7e..75defa6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -598,6 +598,7 @@ struct kvm_vcpu_arch {
 	u64 mcg_cap;
 	u64 mcg_status;
 	u64 mcg_ctl;
+	u64 mcg_ext_ctl;
 	u64 *mce_banks;
 
 	/* Cache MMIO info */
@@ -1005,6 +1006,8 @@ struct kvm_x86_ops {
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
 	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
+
+	void (*setup_mce)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
@@ -1077,6 +1080,8 @@ extern u8   kvm_tsc_scaling_ratio_frac_bits;
 /* maximum allowed value of TSC scaling ratio */
 extern u64  kvm_max_tsc_scaling_ratio;
 
+extern u64 kvm_mce_cap_supported;
+
 enum emulation_result {
 	EMULATE_DONE,         /* no further processing */
 	EMULATE_USER_EXIT,    /* kvm_run ready for userspace exit */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1dc89c5..42db42e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -638,7 +638,7 @@ static struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
  * feature_control_valid_bits_add/del(), so it's not included here.
  */
 #define FEATURE_CONTROL_MAX_VALID_BITS \
-	FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX
+	(FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX | FEATURE_CONTROL_LMCE)
 
 static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits)
 {
@@ -2905,6 +2905,15 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
 	return valid_bits && !(val & ~valid_bits);
 }
 
+static inline bool vmx_mcg_ext_ctrl_msr_present(struct kvm_vcpu *vcpu,
+						bool host_initiated)
+{
+	return (vcpu->arch.mcg_cap & MCG_LMCE_P) &&
+		(host_initiated ||
+		 (to_vmx(vcpu)->msr_ia32_feature_control &
+		  FEATURE_CONTROL_LMCE));
+}
+
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -2946,6 +2955,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
 		break;
+	case MSR_IA32_MCG_EXT_CTL:
+		if (!vmx_mcg_ext_ctrl_msr_present(vcpu,
+						  msr_info->host_initiated))
+			return 1;
+		msr_info->data = vcpu->arch.mcg_ext_ctl;
+		break;
 	case MSR_IA32_FEATURE_CONTROL:
 		if (!vmx_feature_control_msr_valid(vcpu, 0))
 			return 1;
@@ -3039,6 +3054,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC_ADJUST:
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
+	case MSR_IA32_MCG_EXT_CTL:
+		if (!vmx_mcg_ext_ctrl_msr_present(vcpu,
+						  msr_info->host_initiated) ||
+		    (data & ~MCG_EXT_CTL_LMCE_EN))
+			return 1;
+		vcpu->arch.mcg_ext_ctl = data;
+		break;
 	case MSR_IA32_FEATURE_CONTROL:
 		if (!vmx_feature_control_msr_valid(vcpu, data) ||
 		    (to_vmx(vcpu)->msr_ia32_feature_control &
@@ -6433,6 +6455,8 @@ static __init int hardware_setup(void)
 
 	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
 
+	kvm_mce_cap_supported |= MCG_LMCE_P;
+
 	return alloc_kvm_area();
 
 out8:
@@ -10950,6 +10974,14 @@ out:
 	return ret;
 }
 
+static void vmx_setup_mce(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.mcg_cap & MCG_LMCE_P)
+		feature_control_valid_bits_add(vcpu, FEATURE_CONTROL_LMCE);
+	else
+		feature_control_valid_bits_del(vcpu, FEATURE_CONTROL_LMCE);
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -11074,6 +11106,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.pmu_ops = &intel_pmu_ops,
 
 	.update_pi_irte = vmx_update_pi_irte,
+
+	.setup_mce = vmx_setup_mce,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf22721..5bf76ab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -70,7 +70,8 @@
 
 #define MAX_IO_MSRS 256
 #define KVM_MAX_MCE_BANKS 32
-#define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P)
+u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P;
+EXPORT_SYMBOL_GPL(kvm_mce_cap_supported);
 
 #define emul_to_vcpu(ctxt) \
 	container_of(ctxt, struct kvm_vcpu, arch.emulate_ctxt)
@@ -983,6 +984,7 @@ static u32 emulated_msrs[] = {
 	MSR_IA32_MISC_ENABLE,
 	MSR_IA32_MCG_STATUS,
 	MSR_IA32_MCG_CTL,
+	MSR_IA32_MCG_EXT_CTL,
 	MSR_IA32_SMBASE,
 };
 
@@ -2684,11 +2686,9 @@ long kvm_arch_dev_ioctl(struct file *filp,
 		break;
 	}
 	case KVM_X86_GET_MCE_CAP_SUPPORTED: {
-		u64 mce_cap;
-
-		mce_cap = KVM_MCE_CAP_SUPPORTED;
 		r = -EFAULT;
-		if (copy_to_user(argp, &mce_cap, sizeof mce_cap))
+		if (copy_to_user(argp, &kvm_mce_cap_supported,
+				 sizeof(kvm_mce_cap_supported)))
 			goto out;
 		r = 0;
 		break;
@@ -2866,7 +2866,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
 	r = -EINVAL;
 	if (!bank_num || bank_num >= KVM_MAX_MCE_BANKS)
 		goto out;
-	if (mcg_cap & ~(KVM_MCE_CAP_SUPPORTED | 0xff | 0xff0000))
+	if (mcg_cap & ~(kvm_mce_cap_supported | 0xff | 0xff0000))
 		goto out;
 	r = 0;
 	vcpu->arch.mcg_cap = mcg_cap;
@@ -2876,6 +2876,9 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
 	/* Init IA32_MCi_CTL to all 1s */
 	for (bank = 0; bank < bank_num; bank++)
 		vcpu->arch.mce_banks[bank*4] = ~(u64)0;
+
+	if (kvm_x86_ops->setup_mce)
+		kvm_x86_ops->setup_mce(vcpu);
 out:
 	return r;
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL
  2016-06-16  6:05 ` [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang
@ 2016-06-16  9:55   ` Paolo Bonzini
  2016-06-16 11:21     ` Haozhong Zhang
  2016-06-16 10:01   ` Paolo Bonzini
  1 sibling, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-06-16  9:55 UTC (permalink / raw)
  To: Haozhong Zhang, kvm
  Cc: rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen,
	Ashok Raj



On 16/06/2016 08:05, Haozhong Zhang wrote:
> +	/*
> +	 * Only bits masked by msr_ia32_feature_control_valid_bits can be set in
> +	 * msr_ia32_feature_control.
> +	 *
> +	 * msr_ia32_feature_control_valid_bits should be modified by
> +	 * feature_control_valid_bits_add/del(), and only bits masked by
> +	 * FEATURE_CONTROL_MAX_VALID_BITS can be modified.
> +	 */
>  	u64 msr_ia32_feature_control;
> +	u64 msr_ia32_feature_control_valid_bits;

I noticed that the fw_cfg patch used a uint32_t.  It probably should
use uint64_t; what you did here is correct.

Paolo

>  };

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL
  2016-06-16  6:05 ` [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang
  2016-06-16  9:55   ` Paolo Bonzini
@ 2016-06-16 10:01   ` Paolo Bonzini
  2016-06-16 11:16     ` Haozhong Zhang
  1 sibling, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-06-16 10:01 UTC (permalink / raw)
  To: Haozhong Zhang, kvm
  Cc: rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen,
	Ashok Raj



On 16/06/2016 08:05, Haozhong Zhang wrote:
> +/*
> + * FEATURE_CONTROL_LOCKED is added/removed automatically by
> + * feature_control_valid_bits_add/del(), so it's not included here.
> + */
> +#define FEATURE_CONTROL_MAX_VALID_BITS \
> +	FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX
> +
> +static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits)
> +{
> +	ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS));

The KVM-specific ASSERT macro should go.  You can use WARN_ON.

However, I think FEATURE_CONTROL_LOCKED should always be writable.  If
you change that, it's simpler to just do |= and &= in the caller.

> +	to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
> +		bits | FEATURE_CONTROL_LOCKED;
> +}
> +
> +static void feature_control_valid_bits_del(struct kvm_vcpu *vcpu, uint64_t bits)
> +{
> +	uint64_t *valid_bits =
> +		&to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
> +	ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS));
> +	*valid_bits &= ~bits;
> +	if (!(*valid_bits & ~FEATURE_CONTROL_LOCKED))
> +		*valid_bits = 0;
> +}
> +
>  #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
>  #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
>  #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
> @@ -2864,6 +2897,14 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>  	return 0;
>  }
>  
> +static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> +						 uint64_t val)
> +{
> +	uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
> +
> +	return valid_bits && !(val & ~valid_bits);
> +}
>  /*
>   * Reads an msr value (of 'msr_index') into 'pdata'.
>   * Returns 0 on success, non-0 otherwise.
> @@ -2906,7 +2947,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
>  		break;
>  	case MSR_IA32_FEATURE_CONTROL:
> -		if (!nested_vmx_allowed(vcpu))
> +		if (!vmx_feature_control_msr_valid(vcpu, 0))

You can remove this if completely in patch 1.  It's okay to make the MSR
always available.

Thanks,

Paolo

>  			return 1;
>  		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
>  		break;
> @@ -2999,7 +3040,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		ret = kvm_set_msr_common(vcpu, msr_info);
>  		break;
>  	case MSR_IA32_FEATURE_CONTROL:
> -		if (!nested_vmx_allowed(vcpu) ||
> +		if (!vmx_feature_control_msr_valid(vcpu, data) ||
>  		    (to_vmx(vcpu)->msr_ia32_feature_control &
>  		     FEATURE_CONTROL_LOCKED && !msr_info->host_initiated))
>  			return 1;
> @@ -9095,6 +9136,13 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
>  			vmx->nested.nested_vmx_secondary_ctls_high &=
>  				~SECONDARY_EXEC_PCOMMIT;
>  	}
> +
> +	if (nested_vmx_allowed(vcpu))
> +		feature_control_valid_bits_add
> +			(vcpu, FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX);
> +	else
> +		feature_control_valid_bits_del
> +			(vcpu, FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX);
>  }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs
  2016-06-16  6:05 ` [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs Haozhong Zhang
@ 2016-06-16 10:04   ` Paolo Bonzini
  2016-06-16 10:49     ` Haozhong Zhang
  2016-06-16 14:55     ` Eduardo Habkost
  0 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-06-16 10:04 UTC (permalink / raw)
  To: Haozhong Zhang, kvm
  Cc: rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen,
	Ashok Raj, Eduardo Habkost



On 16/06/2016 08:05, Haozhong Zhang wrote:
> From: Ashok Raj <ashok.raj@intel.com>
> 
> On Intel platforms, this patch adds LMCE to KVM's supported MCE
> capabilities and handles guest access to the LMCE-related MSRs.
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported
>            Only enable LMCE on Intel platform
> 	   Check MSR_IA32_FEATURE_CONTROL when handling guest
> 	     access to MSR_IA32_MCG_EXT_CTL]
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  5 +++++
>  arch/x86/kvm/vmx.c              | 36 +++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c              | 15 +++++++++------
>  3 files changed, 49 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e0fbe7e..75defa6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -598,6 +598,7 @@ struct kvm_vcpu_arch {
>  	u64 mcg_cap;
>  	u64 mcg_status;
>  	u64 mcg_ctl;
> +	u64 mcg_ext_ctl;
>  	u64 *mce_banks;
>  
>  	/* Cache MMIO info */
> @@ -1005,6 +1006,8 @@ struct kvm_x86_ops {
>  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
>  			      uint32_t guest_irq, bool set);
>  	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
> +
> +	void (*setup_mce)(struct kvm_vcpu *vcpu);
>  };
>  
>  struct kvm_arch_async_pf {
> @@ -1077,6 +1080,8 @@ extern u8   kvm_tsc_scaling_ratio_frac_bits;
>  /* maximum allowed value of TSC scaling ratio */
>  extern u64  kvm_max_tsc_scaling_ratio;
>  
> +extern u64 kvm_mce_cap_supported;
> +
>  enum emulation_result {
>  	EMULATE_DONE,         /* no further processing */
>  	EMULATE_USER_EXIT,    /* kvm_run ready for userspace exit */
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1dc89c5..42db42e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -638,7 +638,7 @@ static struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
>   * feature_control_valid_bits_add/del(), so it's not included here.
>   */
>  #define FEATURE_CONTROL_MAX_VALID_BITS \
> -	FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX
> +	(FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX | FEATURE_CONTROL_LMCE)
>  
>  static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits)
>  {
> @@ -2905,6 +2905,15 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
>  	return valid_bits && !(val & ~valid_bits);
>  }
>  
> +static inline bool vmx_mcg_ext_ctrl_msr_present(struct kvm_vcpu *vcpu,
> +						bool host_initiated)
> +{
> +	return (vcpu->arch.mcg_cap & MCG_LMCE_P) &&

Checking MCG_LMCE_P is unnecessary, because you cannot set
FEATURE_CONTROL_LMCE unless MCG_LMCE_P is present.

You can just inline this function in the callers, it's simpler.

> +		(host_initiated ||
> +		 (to_vmx(vcpu)->msr_ia32_feature_control &
> +		  FEATURE_CONTROL_LMCE));
> +}
> +
>  /*
>   * Reads an msr value (of 'msr_index') into 'pdata'.
>   * Returns 0 on success, non-0 otherwise.
> @@ -2946,6 +2955,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			return 1;
>  		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
>  		break;
> +	case MSR_IA32_MCG_EXT_CTL:
> +		if (!vmx_mcg_ext_ctrl_msr_present(vcpu,
> +						  msr_info->host_initiated))
> +			return 1;
> +		msr_info->data = vcpu->arch.mcg_ext_ctl;
> +		break;
>  	case MSR_IA32_FEATURE_CONTROL:
>  		if (!vmx_feature_control_msr_valid(vcpu, 0))
>  			return 1;
> @@ -3039,6 +3054,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_TSC_ADJUST:
>  		ret = kvm_set_msr_common(vcpu, msr_info);
>  		break;
> +	case MSR_IA32_MCG_EXT_CTL:
> +		if (!vmx_mcg_ext_ctrl_msr_present(vcpu,
> +						  msr_info->host_initiated) ||
> +		    (data & ~MCG_EXT_CTL_LMCE_EN))
> +			return 1;
> +		vcpu->arch.mcg_ext_ctl = data;
> +		break;
>  	case MSR_IA32_FEATURE_CONTROL:
>  		if (!vmx_feature_control_msr_valid(vcpu, data) ||
>  		    (to_vmx(vcpu)->msr_ia32_feature_control &
> @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void)
>  
>  	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
>  
> +	kvm_mce_cap_supported |= MCG_LMCE_P;

Ah, so virtual LMCE is available on all processors!  This is
interesting, but it also makes it more complicated to handle in QEMU; a
new QEMU generally doesn't require a new kernel.

Eduardo, any ideas?

Thanks,

Paolo

>  	return alloc_kvm_area();
>  
>  out8:
> @@ -10950,6 +10974,14 @@ out:
>  	return ret;
>  }
>  
> +static void vmx_setup_mce(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.mcg_cap & MCG_LMCE_P)
> +		feature_control_valid_bits_add(vcpu, FEATURE_CONTROL_LMCE);
> +	else
> +		feature_control_valid_bits_del(vcpu, FEATURE_CONTROL_LMCE);
> +}
> +
>  static struct kvm_x86_ops vmx_x86_ops = {
>  	.cpu_has_kvm_support = cpu_has_kvm_support,
>  	.disabled_by_bios = vmx_disabled_by_bios,
> @@ -11074,6 +11106,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.pmu_ops = &intel_pmu_ops,
>  
>  	.update_pi_irte = vmx_update_pi_irte,
> +
> +	.setup_mce = vmx_setup_mce,
>  };
>  
>  static int __init vmx_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bf22721..5bf76ab 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -70,7 +70,8 @@
>  
>  #define MAX_IO_MSRS 256
>  #define KVM_MAX_MCE_BANKS 32
> -#define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P)
> +u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P;
> +EXPORT_SYMBOL_GPL(kvm_mce_cap_supported);
>  
>  #define emul_to_vcpu(ctxt) \
>  	container_of(ctxt, struct kvm_vcpu, arch.emulate_ctxt)
> @@ -983,6 +984,7 @@ static u32 emulated_msrs[] = {
>  	MSR_IA32_MISC_ENABLE,
>  	MSR_IA32_MCG_STATUS,
>  	MSR_IA32_MCG_CTL,
> +	MSR_IA32_MCG_EXT_CTL,
>  	MSR_IA32_SMBASE,
>  };
>  
> @@ -2684,11 +2686,9 @@ long kvm_arch_dev_ioctl(struct file *filp,
>  		break;
>  	}
>  	case KVM_X86_GET_MCE_CAP_SUPPORTED: {
> -		u64 mce_cap;
> -
> -		mce_cap = KVM_MCE_CAP_SUPPORTED;
>  		r = -EFAULT;
> -		if (copy_to_user(argp, &mce_cap, sizeof mce_cap))
> +		if (copy_to_user(argp, &kvm_mce_cap_supported,
> +				 sizeof(kvm_mce_cap_supported)))
>  			goto out;
>  		r = 0;
>  		break;
> @@ -2866,7 +2866,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
>  	r = -EINVAL;
>  	if (!bank_num || bank_num >= KVM_MAX_MCE_BANKS)
>  		goto out;
> -	if (mcg_cap & ~(KVM_MCE_CAP_SUPPORTED | 0xff | 0xff0000))
> +	if (mcg_cap & ~(kvm_mce_cap_supported | 0xff | 0xff0000))
>  		goto out;
>  	r = 0;
>  	vcpu->arch.mcg_cap = mcg_cap;
> @@ -2876,6 +2876,9 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
>  	/* Init IA32_MCi_CTL to all 1s */
>  	for (bank = 0; bank < bank_num; bank++)
>  		vcpu->arch.mce_banks[bank*4] = ~(u64)0;
> +
> +	if (kvm_x86_ops->setup_mce)
> +		kvm_x86_ops->setup_mce(vcpu);
>  out:
>  	return r;
>  }
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs
  2016-06-16 10:04   ` Paolo Bonzini
@ 2016-06-16 10:49     ` Haozhong Zhang
  2016-06-16 14:55     ` Eduardo Habkost
  1 sibling, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-16 10:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen,
	Ashok Raj, Eduardo Habkost

On 06/16/16 12:04, Paolo Bonzini wrote:
> 
> 
> On 16/06/2016 08:05, Haozhong Zhang wrote:
> > From: Ashok Raj <ashok.raj@intel.com>
> > 
> > On Intel platforms, this patch adds LMCE to KVM's supported MCE
> > capabilities and handles guest access to the LMCE-related MSRs.
> > 
> > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported
> >            Only enable LMCE on Intel platform
> > 	   Check MSR_IA32_FEATURE_CONTROL when handling guest
> > 	     access to MSR_IA32_MCG_EXT_CTL]
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  5 +++++
> >  arch/x86/kvm/vmx.c              | 36 +++++++++++++++++++++++++++++++++++-
> >  arch/x86/kvm/x86.c              | 15 +++++++++------
> >  3 files changed, 49 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index e0fbe7e..75defa6 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -598,6 +598,7 @@ struct kvm_vcpu_arch {
> >  	u64 mcg_cap;
> >  	u64 mcg_status;
> >  	u64 mcg_ctl;
> > +	u64 mcg_ext_ctl;
> >  	u64 *mce_banks;
> >  
> >  	/* Cache MMIO info */
> > @@ -1005,6 +1006,8 @@ struct kvm_x86_ops {
> >  	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
> >  			      uint32_t guest_irq, bool set);
> >  	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
> > +
> > +	void (*setup_mce)(struct kvm_vcpu *vcpu);
> >  };
> >  
> >  struct kvm_arch_async_pf {
> > @@ -1077,6 +1080,8 @@ extern u8   kvm_tsc_scaling_ratio_frac_bits;
> >  /* maximum allowed value of TSC scaling ratio */
> >  extern u64  kvm_max_tsc_scaling_ratio;
> >  
> > +extern u64 kvm_mce_cap_supported;
> > +
> >  enum emulation_result {
> >  	EMULATE_DONE,         /* no further processing */
> >  	EMULATE_USER_EXIT,    /* kvm_run ready for userspace exit */
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 1dc89c5..42db42e 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -638,7 +638,7 @@ static struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
> >   * feature_control_valid_bits_add/del(), so it's not included here.
> >   */
> >  #define FEATURE_CONTROL_MAX_VALID_BITS \
> > -	FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX
> > +	(FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX | FEATURE_CONTROL_LMCE)
> >  
> >  static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits)
> >  {
> > @@ -2905,6 +2905,15 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> >  	return valid_bits && !(val & ~valid_bits);
> >  }
> >  
> > +static inline bool vmx_mcg_ext_ctrl_msr_present(struct kvm_vcpu *vcpu,
> > +						bool host_initiated)
> > +{
> > +	return (vcpu->arch.mcg_cap & MCG_LMCE_P) &&
> 
> Checking MCG_LMCE_P is unnecessary, because you cannot set
> FEATURE_CONTROL_LMCE unless MCG_LMCE_P is present.
> 
> You can just inline this function in the callers, it's simpler.

I'll remove the first line check and inline the other parts.
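
For reference, with the MCG_LMCE_P check dropped and the helper inlined, the condition in both MSR handlers reduces to the shape below. This is a standalone sketch with stand-in types (bit values taken from the patch series and the SDM), not the actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_CONTROL_LMCE    (1ULL << 20)
#define MCG_EXT_CTL_LMCE_EN     (1ULL << 0)

/* Stand-in for the relevant per-vcpu state. */
struct vcpu_state {
	uint64_t msr_ia32_feature_control;
	uint64_t mcg_ext_ctl;
};

/*
 * MSR_IA32_MCG_EXT_CTL is accessible when the access is host-initiated
 * (e.g. via the KVM_SET_MSRS ioctl) or when the guest's firmware has
 * set FEATURE_CONTROL_LMCE; the MCG_LMCE_P test is dropped as suggested.
 */
static bool mcg_ext_ctl_accessible(const struct vcpu_state *v,
				   bool host_initiated)
{
	return host_initiated ||
	       (v->msr_ia32_feature_control & FEATURE_CONTROL_LMCE);
}

/* set_msr path: additionally reject writes to reserved bits. */
static int set_mcg_ext_ctl(struct vcpu_state *v, uint64_t data,
			   bool host_initiated)
{
	if (!mcg_ext_ctl_accessible(v, host_initiated) ||
	    (data & ~MCG_EXT_CTL_LMCE_EN))
		return 1; /* injects #GP into the guest */
	v->mcg_ext_ctl = data;
	return 0;
}
```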

> 
> > +		(host_initiated ||
> > +		 (to_vmx(vcpu)->msr_ia32_feature_control &
> > +		  FEATURE_CONTROL_LMCE));
> > +}
> > +
> >  /*
> >   * Reads an msr value (of 'msr_index') into 'pdata'.
> >   * Returns 0 on success, non-0 otherwise.
> > @@ -2946,6 +2955,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  			return 1;
> >  		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
> >  		break;
> > +	case MSR_IA32_MCG_EXT_CTL:
> > +		if (!vmx_mcg_ext_ctrl_msr_present(vcpu,
> > +						  msr_info->host_initiated))
> > +			return 1;
> > +		msr_info->data = vcpu->arch.mcg_ext_ctl;
> > +		break;
> >  	case MSR_IA32_FEATURE_CONTROL:
> >  		if (!vmx_feature_control_msr_valid(vcpu, 0))
> >  			return 1;
> > @@ -3039,6 +3054,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  	case MSR_IA32_TSC_ADJUST:
> >  		ret = kvm_set_msr_common(vcpu, msr_info);
> >  		break;
> > +	case MSR_IA32_MCG_EXT_CTL:
> > +		if (!vmx_mcg_ext_ctrl_msr_present(vcpu,
> > +						  msr_info->host_initiated) ||
> > +		    (data & ~MCG_EXT_CTL_LMCE_EN))
> > +			return 1;
> > +		vcpu->arch.mcg_ext_ctl = data;
> > +		break;
> >  	case MSR_IA32_FEATURE_CONTROL:
> >  		if (!vmx_feature_control_msr_valid(vcpu, data) ||
> >  		    (to_vmx(vcpu)->msr_ia32_feature_control &
> > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void)
> >  
> >  	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> >  
> > +	kvm_mce_cap_supported |= MCG_LMCE_P;
> 
> Ah, so virtual LMCE is available on all processors!  This is
> interesting, but it also makes it more complicated to handle in QEMU; a
> new QEMU generally doesn't require a new kernel.
>

My QEMU patch checks KVM MCE capabilities before enabling LMCE (see
lmce_supported() and mce_init() in QEMU Patch 1), so both a new QEMU
on an old kernel and an old QEMU on a new kernel will work.
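
The userspace side of that probe boils down to asking KVM for its supported MCE capabilities (via the KVM_X86_GET_MCE_CAP_SUPPORTED ioctl shown in the x86.c hunk above) and only advertising LMCE to the guest if MCG_LMCE_P is present. A minimal illustrative sketch of the masking logic, with the ioctl step described in comments and bit values assumed from the SDM:

```c
#include <assert.h>
#include <stdint.h>

/* MCE capability bits (values per the Intel SDM; assumed here). */
#define MCG_CTL_P   (1ULL << 8)
#define MCG_SER_P   (1ULL << 24)
#define MCG_LMCE_P  (1ULL << 27)

/*
 * Userspace first obtains kvm_supported with the
 * KVM_X86_GET_MCE_CAP_SUPPORTED ioctl on the /dev/kvm fd, then sets
 * MCG_LMCE_P in the guest's MCG_CAP only if the kernel advertised it.
 */
static uint64_t guest_mcg_cap(uint64_t kvm_supported, int want_lmce)
{
	uint64_t cap = kvm_supported & (MCG_CTL_P | MCG_SER_P);

	if (want_lmce && (kvm_supported & MCG_LMCE_P))
		cap |= MCG_LMCE_P;
	return cap;
}
```

On an old kernel the MCG_LMCE_P bit simply never makes it into the guest's MCG_CAP, so no new kernel is strictly required.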

Haozhong

> Eduardo, any ideas?
> 
> Thanks,
> 
> Paolo
> 
> >  	return alloc_kvm_area();
> >  
> >  out8:
> > @@ -10950,6 +10974,14 @@ out:
> >  	return ret;
> >  }
> >  
> > +static void vmx_setup_mce(struct kvm_vcpu *vcpu)
> > +{
> > +	if (vcpu->arch.mcg_cap & MCG_LMCE_P)
> > +		feature_control_valid_bits_add(vcpu, FEATURE_CONTROL_LMCE);
> > +	else
> > +		feature_control_valid_bits_del(vcpu, FEATURE_CONTROL_LMCE);
> > +}
> > +
> >  static struct kvm_x86_ops vmx_x86_ops = {
> >  	.cpu_has_kvm_support = cpu_has_kvm_support,
> >  	.disabled_by_bios = vmx_disabled_by_bios,
> > @@ -11074,6 +11106,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
> >  	.pmu_ops = &intel_pmu_ops,
> >  
> >  	.update_pi_irte = vmx_update_pi_irte,
> > +
> > +	.setup_mce = vmx_setup_mce,
> >  };
> >  
> >  static int __init vmx_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index bf22721..5bf76ab 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -70,7 +70,8 @@
> >  
> >  #define MAX_IO_MSRS 256
> >  #define KVM_MAX_MCE_BANKS 32
> > -#define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P)
> > +u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P;
> > +EXPORT_SYMBOL_GPL(kvm_mce_cap_supported);
> >  
> >  #define emul_to_vcpu(ctxt) \
> >  	container_of(ctxt, struct kvm_vcpu, arch.emulate_ctxt)
> > @@ -983,6 +984,7 @@ static u32 emulated_msrs[] = {
> >  	MSR_IA32_MISC_ENABLE,
> >  	MSR_IA32_MCG_STATUS,
> >  	MSR_IA32_MCG_CTL,
> > +	MSR_IA32_MCG_EXT_CTL,
> >  	MSR_IA32_SMBASE,
> >  };
> >  
> > @@ -2684,11 +2686,9 @@ long kvm_arch_dev_ioctl(struct file *filp,
> >  		break;
> >  	}
> >  	case KVM_X86_GET_MCE_CAP_SUPPORTED: {
> > -		u64 mce_cap;
> > -
> > -		mce_cap = KVM_MCE_CAP_SUPPORTED;
> >  		r = -EFAULT;
> > -		if (copy_to_user(argp, &mce_cap, sizeof mce_cap))
> > +		if (copy_to_user(argp, &kvm_mce_cap_supported,
> > +				 sizeof(kvm_mce_cap_supported)))
> >  			goto out;
> >  		r = 0;
> >  		break;
> > @@ -2866,7 +2866,7 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
> >  	r = -EINVAL;
> >  	if (!bank_num || bank_num >= KVM_MAX_MCE_BANKS)
> >  		goto out;
> > -	if (mcg_cap & ~(KVM_MCE_CAP_SUPPORTED | 0xff | 0xff0000))
> > +	if (mcg_cap & ~(kvm_mce_cap_supported | 0xff | 0xff0000))
> >  		goto out;
> >  	r = 0;
> >  	vcpu->arch.mcg_cap = mcg_cap;
> > @@ -2876,6 +2876,9 @@ static int kvm_vcpu_ioctl_x86_setup_mce(struct kvm_vcpu *vcpu,
> >  	/* Init IA32_MCi_CTL to all 1s */
> >  	for (bank = 0; bank < bank_num; bank++)
> >  		vcpu->arch.mce_banks[bank*4] = ~(u64)0;
> > +
> > +	if (kvm_x86_ops->setup_mce)
> > +		kvm_x86_ops->setup_mce(vcpu);
> >  out:
> >  	return r;
> >  }
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL
  2016-06-16 10:01   ` Paolo Bonzini
@ 2016-06-16 11:16     ` Haozhong Zhang
  2016-06-16 11:19       ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-16 11:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen,
	Ashok Raj

On 06/16/16 12:01, Paolo Bonzini wrote:
> 
> 
> On 16/06/2016 08:05, Haozhong Zhang wrote:
> > +/*
> > + * FEATURE_CONTROL_LOCKED is added/removed automatically by
> > + * feature_control_valid_bits_add/del(), so it's not included here.
> > + */
> > +#define FEATURE_CONTROL_MAX_VALID_BITS \
> > +	FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX
> > +
> > +static void feature_control_valid_bits_add(struct kvm_vcpu *vcpu, uint64_t bits)
> > +{
> > +	ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS));
> 
> The KVM-specific ASSERT macro should go.  You can use WARN_ON.
>

will change to WARN_ON.

> However, I think FEATURE_CONTROL_LOCKED should always be writable.  If
> you change that, it's simpler to just do |= and &= in the caller.
>

These two functions (add/del) are to prevent callers from forgetting
to set/clear FEATURE_CONTROL_LOCKED in
msr_ia32_feature_control_valid_bits: it should be set if any feature
bit is set, and be cleared if all feature bits are cleared. The second
rule could be relaxed, as we can always present MSR_IA32_FEATURE_CONTROL
to the guest.

I'm okay to let callers take care of the locked bit.
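
Paolo's simplification (treating FEATURE_CONTROL_LOCKED as always valid, so callers just use |= and &= directly) would look roughly like the sketch below. This is illustrative only; names mirror the patch but the state is modeled as a plain global:

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCKED                     (1ULL << 0)
#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX  (1ULL << 2)
#define FEATURE_CONTROL_LMCE                       (1ULL << 20)

/* Valid-bits mask, normally per vcpu; LOCKED is unconditionally valid. */
static uint64_t feature_control_valid_bits = FEATURE_CONTROL_LOCKED;

/* Callers toggle feature bits directly, no add/del helpers needed. */
static void enable_lmce(void)
{
	feature_control_valid_bits |= FEATURE_CONTROL_LMCE;
}

static void disable_lmce(void)
{
	feature_control_valid_bits &= ~FEATURE_CONTROL_LMCE;
}

/* A write is valid iff it touches only currently-valid bits. */
static int feature_control_msr_valid(uint64_t val)
{
	return !(val & ~feature_control_valid_bits);
}
```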

> > +	to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
> > +		bits | FEATURE_CONTROL_LOCKED;
> > +}
> > +
> > +static void feature_control_valid_bits_del(struct kvm_vcpu *vcpu, uint64_t bits)
> > +{
> > +	uint64_t *valid_bits =
> > +		&to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
> > +	ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS));
> > +	*valid_bits &= ~bits;
> > +	if (!(*valid_bits & ~FEATURE_CONTROL_LOCKED))
> > +		*valid_bits = 0;
> > +}
> > +
> >  #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
> >  #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
> >  #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
> > @@ -2864,6 +2897,14 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> >  	return 0;
> >  }
> >  
> > +static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > +						 uint64_t val)
> > +{
> > +	uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
> > +
> > +	return valid_bits && !(val & ~valid_bits);
> > +}
> >  /*
> >   * Reads an msr value (of 'msr_index') into 'pdata'.
> >   * Returns 0 on success, non-0 otherwise.
> > @@ -2906,7 +2947,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
> >  		break;
> >  	case MSR_IA32_FEATURE_CONTROL:
> > -		if (!nested_vmx_allowed(vcpu))
> > +		if (!vmx_feature_control_msr_valid(vcpu, 0))
> 
> You can remove this if completely in patch 1.  It's okay to make the MSR
> always available.
>

But then it also allows all bits to be set by guests, even though some
features are not available. Though this problem already exists before
Patch 1, I prefer to leave it as was in Patch 1 and fix it here.

Haozhong


* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL
  2016-06-16 11:16     ` Haozhong Zhang
@ 2016-06-16 11:19       ` Paolo Bonzini
  2016-06-16 11:29         ` Haozhong Zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2016-06-16 11:19 UTC (permalink / raw)
  To: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen,
	Ashok Raj



On 16/06/2016 13:16, Haozhong Zhang wrote:
>> However, I think FEATURE_CONTROL_LOCKED should always be writable.  If
>> you change that, it's simpler to just do |= and &= in the caller.
> 
> These two functions (add/del) are to prevent callers from forgetting
> setting/clearing FEATURE_CONTROL_LOCKED in
> msr_ia32_feature_control_valid_bits: it should be set if any feature
> bit is set, and be cleared if all feature bits are cleared. The second
> rule could relaxed as we can always present MSR_IA32_FEATURE_CONTROL
> to guest.

Yes, this means that FEATURE_CONTROL_LOCKED effectively is always
valid.  So you end up with just &= to clear and |= to set.

> I'm okey to let callers take care for the locked bit.
> 
>>> +	to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
>>> +		bits | FEATURE_CONTROL_LOCKED;
>>> +}
>>> +
>>> +static void feature_control_valid_bits_del(struct kvm_vcpu *vcpu, uint64_t bits)
>>> +{
>>> +	uint64_t *valid_bits =
>>> +		&to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
>>> +	ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS));
>>> +	*valid_bits &= ~bits;
>>> +	if (!(*valid_bits & ~FEATURE_CONTROL_LOCKED))
>>> +		*valid_bits = 0;
>>> +}
>>> +
>>>  #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
>>>  #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
>>>  #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
>>> @@ -2864,6 +2897,14 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
>>>  	return 0;
>>>  }
>>>  
>>> +static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
>>> +						 uint64_t val)
>>> +{
>>> +	uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
>>> +
>>> +	return valid_bits && !(val & ~valid_bits);
>>> +}
>>>  /*
>>>   * Reads an msr value (of 'msr_index') into 'pdata'.
>>>   * Returns 0 on success, non-0 otherwise.
>>> @@ -2906,7 +2947,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>  		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
>>>  		break;
>>>  	case MSR_IA32_FEATURE_CONTROL:
>>> -		if (!nested_vmx_allowed(vcpu))
>>> +		if (!vmx_feature_control_msr_valid(vcpu, 0))
>>
>> You can remove this if completely in patch 1.  It's okay to make the MSR
>> always available.
>>
> 
> But then it also allows all bits to be set by guests, even though some
> features are not available.

Note that this is "get".  Of course the "if" must stay in vmx_set_msr.
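
The get/set asymmetry described here can be sketched as follows (stand-in state, illustrative rather than the actual kernel code): reads always succeed, while writes are still validated against the valid-bits mask and rejected once the guest has set the LOCKED bit.

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCKED  (1ULL << 0)

struct vmx_state {
	uint64_t msr_ia32_feature_control;
	uint64_t valid_bits;
};

/* get path: the MSR is always readable, no capability check needed. */
static uint64_t get_feature_control(const struct vmx_state *v)
{
	return v->msr_ia32_feature_control;
}

/* set path: reject invalid bits, and any write once LOCKED is set. */
static int set_feature_control(struct vmx_state *v, uint64_t data)
{
	if ((data & ~v->valid_bits) ||
	    (v->msr_ia32_feature_control & FEATURE_CONTROL_LOCKED))
		return 1;
	v->msr_ia32_feature_control = data;
	return 0;
}
```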

Paolo


* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL
  2016-06-16  9:55   ` Paolo Bonzini
@ 2016-06-16 11:21     ` Haozhong Zhang
  0 siblings, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-16 11:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen,
	Ashok Raj

On 06/16/16 11:55, Paolo Bonzini wrote:
> 
> 
> On 16/06/2016 08:05, Haozhong Zhang wrote:
> > +	/*
> > +	 * Only bits masked by msr_ia32_feature_control_valid_bits can be set in
> > +	 * msr_ia32_feature_control.
> > +	 *
> > +	 * msr_ia32_feature_control_valid_bits should be modified by
> > +	 * feature_control_valid_bits_add/del(), and only bits masked by
> > +	 * FEATURE_CONTROL_MAX_VALID_BITS can be modified.
> > +	 */
> >  	u64 msr_ia32_feature_control;
> > +	u64 msr_ia32_feature_control_valid_bits;
> 
> I noticed that the fw_cfg patch used an uint32_t.  It probably should
> use uint64_t; what you did here is correct.
>

I'll fix there.

Thanks,
Haozhong


* Re: [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL
  2016-06-16 11:19       ` Paolo Bonzini
@ 2016-06-16 11:29         ` Haozhong Zhang
  0 siblings, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-16 11:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Gleb Natapov, Boris Petkov, Tony Luck, Andi Kleen,
	Ashok Raj

On 06/16/16 13:19, Paolo Bonzini wrote:
> 
> 
> On 16/06/2016 13:16, Haozhong Zhang wrote:
> >> However, I think FEATURE_CONTROL_LOCKED should always be writable.  If
> >> you change that, it's simpler to just do |= and &= in the caller.
> > 
> > These two functions (add/del) are to prevent callers from forgetting
> > setting/clearing FEATURE_CONTROL_LOCKED in
> > msr_ia32_feature_control_valid_bits: it should be set if any feature
> > bit is set, and be cleared if all feature bits are cleared. The second
> > rule could relaxed as we can always present MSR_IA32_FEATURE_CONTROL
> > to guest.
> 
> Yes, this means that FEATURE_CONTROL_LOCKED effectively is always valid.
>  So you end up with just &= to clear and |= to set.
> 
> > I'm okey to let callers take care for the locked bit.
> > 
> >>> +	to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
> >>> +		bits | FEATURE_CONTROL_LOCKED;
> >>> +}
> >>> +
> >>> +static void feature_control_valid_bits_del(struct kvm_vcpu *vcpu, uint64_t bits)
> >>> +{
> >>> +	uint64_t *valid_bits =
> >>> +		&to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
> >>> +	ASSERT(!(bits & ~FEATURE_CONTROL_MAX_VALID_BITS));
> >>> +	*valid_bits &= ~bits;
> >>> +	if (!(*valid_bits & ~FEATURE_CONTROL_LOCKED))
> >>> +		*valid_bits = 0;
> >>> +}
> >>> +
> >>>  #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
> >>>  #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
> >>>  #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
> >>> @@ -2864,6 +2897,14 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
> >>>  	return 0;
> >>>  }
> >>>  
> >>> +static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> >>> +						 uint64_t val)
> >>> +{
> >>> +	uint64_t valid_bits = to_vmx(vcpu)->msr_ia32_feature_control_valid_bits;
> >>> +
> >>> +	return valid_bits && !(val & ~valid_bits);
> >>> +}
> >>>  /*
> >>>   * Reads an msr value (of 'msr_index') into 'pdata'.
> >>>   * Returns 0 on success, non-0 otherwise.
> >>> @@ -2906,7 +2947,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >>>  		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
> >>>  		break;
> >>>  	case MSR_IA32_FEATURE_CONTROL:
> >>> -		if (!nested_vmx_allowed(vcpu))
> >>> +		if (!vmx_feature_control_msr_valid(vcpu, 0))
> >>
> >> You can remove this if completely in patch 1.  It's okay to make the MSR
> >> always available.
> >>
> > 
> > But then it also allows all bits to be set by guests, even though some
> > features are not available.
> 
> Note that this is "get".  Of course the "if" must stay in vmx_set_msr.
>

My mistake. I'll remove it in patch 1.

Thanks,
Haozhong


* Re: [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx
  2016-06-16  6:05 ` [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx Haozhong Zhang
@ 2016-06-16 11:49   ` Borislav Petkov
  2016-06-16 11:57     ` Paolo Bonzini
  2016-06-16 11:57     ` Haozhong Zhang
  0 siblings, 2 replies; 20+ messages in thread
From: Borislav Petkov @ 2016-06-16 11:49 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: kvm, Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Tony Luck,
	Andi Kleen, Ashok Raj

On Thu, Jun 16, 2016 at 02:05:29PM +0800, Haozhong Zhang wrote:
> msr_ia32_feature_control will be used for LMCE and not depend only on
> nested anymore, so move it from struct nested_vmx to struct vcpu_vmx.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>  arch/x86/kvm/vmx.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 57ec6a4..6b63f2d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -421,7 +421,6 @@ struct nested_vmx {
>  	struct pi_desc *pi_desc;
>  	bool pi_pending;
>  	u16 posted_intr_nv;
> -	u64 msr_ia32_feature_control;
>  
>  	struct hrtimer preemption_timer;
>  	bool preemption_timer_expired;
> @@ -602,6 +601,8 @@ struct vcpu_vmx {
>  	bool guest_pkru_valid;
>  	u32 guest_pkru;
>  	u32 host_pkru;
> +
> +	u64 msr_ia32_feature_control;
>  };
>  
>  enum segment_cache_field {
> @@ -2907,7 +2908,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_FEATURE_CONTROL:
>  		if (!nested_vmx_allowed(vcpu))
>  			return 1;
> -		msr_info->data = to_vmx(vcpu)->nested.msr_ia32_feature_control;
> +		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;

Since this moves out of struct nested_vmx, that check above it:

	if (!nested_vmx_allowed(vcpu))

should not influence it anymore, no?

Ditto for the rest.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx
  2016-06-16 11:49   ` Borislav Petkov
@ 2016-06-16 11:57     ` Paolo Bonzini
  2016-06-16 11:57     ` Haozhong Zhang
  1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-06-16 11:57 UTC (permalink / raw)
  To: Borislav Petkov, Haozhong Zhang
  Cc: kvm, rkrcmar, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, Gleb Natapov, Tony Luck, Andi Kleen, Ashok Raj



On 16/06/2016 13:49, Borislav Petkov wrote:
>> >  enum segment_cache_field {
>> > @@ -2907,7 +2908,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> >  	case MSR_IA32_FEATURE_CONTROL:
>> >  		if (!nested_vmx_allowed(vcpu))
>> >  			return 1;
>> > -		msr_info->data = to_vmx(vcpu)->nested.msr_ia32_feature_control;
>> > +		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
> Since this moves out of struct nested_vmx, that check above it:
> 
> 	if (!nested_vmx_allowed(vcpu))
> 
> should not influence it anymore, no?

For get, yes, this "if" should go.

For set, you need to check that the guest doesn't write to reserved
bits.  So as of this patch the "if" remains tied to VMX, but the next
patch changes it to be generic.

Paolo

> Ditto for the rest.


* Re: [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx
  2016-06-16 11:49   ` Borislav Petkov
  2016-06-16 11:57     ` Paolo Bonzini
@ 2016-06-16 11:57     ` Haozhong Zhang
  1 sibling, 0 replies; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-16 11:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kvm, Paolo Bonzini, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Tony Luck,
	Andi Kleen, Ashok Raj

On 06/16/16 13:49, Borislav Petkov wrote:
> On Thu, Jun 16, 2016 at 02:05:29PM +0800, Haozhong Zhang wrote:
> > msr_ia32_feature_control will be used for LMCE and not depend only on
> > nested anymore, so move it from struct nested_vmx to struct vcpu_vmx.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> >  arch/x86/kvm/vmx.c | 11 ++++++-----
> >  1 file changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 57ec6a4..6b63f2d 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -421,7 +421,6 @@ struct nested_vmx {
> >  	struct pi_desc *pi_desc;
> >  	bool pi_pending;
> >  	u16 posted_intr_nv;
> > -	u64 msr_ia32_feature_control;
> >  
> >  	struct hrtimer preemption_timer;
> >  	bool preemption_timer_expired;
> > @@ -602,6 +601,8 @@ struct vcpu_vmx {
> >  	bool guest_pkru_valid;
> >  	u32 guest_pkru;
> >  	u32 host_pkru;
> > +
> > +	u64 msr_ia32_feature_control;
> >  };
> >  
> >  enum segment_cache_field {
> > @@ -2907,7 +2908,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  	case MSR_IA32_FEATURE_CONTROL:
> >  		if (!nested_vmx_allowed(vcpu))
> >  			return 1;
> > -		msr_info->data = to_vmx(vcpu)->nested.msr_ia32_feature_control;
> > +		msr_info->data = to_vmx(vcpu)->msr_ia32_feature_control;
> 
> Since this moves out of struct nested_vmx, that check above it:
> 
> 	if (!nested_vmx_allowed(vcpu))
> 
> should not influence it anymore, no?
> 
> Ditto for the rest.
> 

The same check in the set case still makes sense. I can remove the
check here and leave it in the set case.

Thanks,
Haozhong


* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs
  2016-06-16 10:04   ` Paolo Bonzini
  2016-06-16 10:49     ` Haozhong Zhang
@ 2016-06-16 14:55     ` Eduardo Habkost
  2016-06-17  1:11       ` Haozhong Zhang
  1 sibling, 1 reply; 20+ messages in thread
From: Eduardo Habkost @ 2016-06-16 14:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Haozhong Zhang, kvm, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov,
	Tony Luck, Andi Kleen, Ashok Raj, libvir-list, Jiri Denemark

On Thu, Jun 16, 2016 at 12:04:50PM +0200, Paolo Bonzini wrote:
> On 16/06/2016 08:05, Haozhong Zhang wrote:
> > From: Ashok Raj <ashok.raj@intel.com>
> > 
> > On Intel platforms, this patch adds LMCE to KVM MCE supported
> > capabilities and handles guest access to LMCE related MSRs.
> > 
> > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported
> >            Only enable LMCE on Intel platform
> > 	   Check MSR_IA32_FEATURE_CONTROL when handling guest
> > 	     access to MSR_IA32_MCG_EXT_CTL]
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
[...]
> > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void)
> >  
> >  	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> >  
> > +	kvm_mce_cap_supported |= MCG_LMCE_P;
> 
> Ah, so virtual LMCE is available on all processors!  This is
> interesting, but it also makes it more complicated to handle in QEMU; a
> new QEMU generally doesn't require a new kernel.
> 
> Eduardo, any ideas?

(CCing libvirt list)

As we shouldn't make machine-type changes introduce new host
requirements, it looks like we need to either add a new set of
CPU models (unreasonable), or expect management software to
explicitly enable LMCE after ensuring the host supports it.

Or we could wait for a reasonable time after the feature is
available in the kernel, and declare that QEMU as a whole
requires a newer kernel. But how much time would be reasonable
for that?

Long term, I believe we should think of a better solution. I
don't think it is reasonable to require new libvirt code to be
written for every single low-level feature that requires a newer
kernel or newer host hardware. Maybe new introspection interfaces
that would allow us to drop the "no new requirements on
machine-type changes" rule?

-- 
Eduardo


* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs
  2016-06-16 14:55     ` Eduardo Habkost
@ 2016-06-17  1:11       ` Haozhong Zhang
  2016-06-17 17:15         ` Eduardo Habkost
  0 siblings, 1 reply; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-17  1:11 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Paolo Bonzini, kvm, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov,
	Tony Luck, Andi Kleen, Ashok Raj, libvir-list, Jiri Denemark

On 06/16/16 11:55, Eduardo Habkost wrote:
> On Thu, Jun 16, 2016 at 12:04:50PM +0200, Paolo Bonzini wrote:
> > On 16/06/2016 08:05, Haozhong Zhang wrote:
> > > From: Ashok Raj <ashok.raj@intel.com>
> > > 
> > > On Intel platforms, this patch adds LMCE to KVM MCE supported
> > > capabilities and handles guest access to LMCE related MSRs.
> > > 
> > > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported
> > >            Only enable LMCE on Intel platform
> > > 	   Check MSR_IA32_FEATURE_CONTROL when handling guest
> > > 	     access to MSR_IA32_MCG_EXT_CTL]
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> [...]
> > > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void)
> > >  
> > >  	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> > >  
> > > +	kvm_mce_cap_supported |= MCG_LMCE_P;
> > 
> > Ah, so virtual LMCE is available on all processors!  This is
> > interesting, but it also makes it more complicated to handle in QEMU; a
> > new QEMU generally doesn't require a new kernel.
> > 
> > Eduardo, any ideas?
> 
> (CCing libvirt list)
> 
> As we shouldn't make machine-type changes introduce new host
> requirements, it looks like we need to either add a new set of
> CPU models (unreasonable), or expect management software to
> explicitly enable LMCE after ensuring the host supports it.
>
> Or we could wait for a reasonable time after the feature is
> available in the kernel, and declare that QEMU as a whole
> requires a newer kernel. But how much time would be reasonable
> for that?
> 
> Long term, I believe we should think of a better solution. I
> don't think it is reasonable to require new libvirt code to be
> written for every single low-level feature that requires a newer
> kernel or newer host hardware. Maybe new introspection interfaces
> that would allow us to drop the "no new requirements on
> machine-type changes" rule?
>

Because a new MSR (MSR_IA32_MCG_EXT_CTL) and a new MSR bit
(FEATURE_CONTROL_LMCE in MSR_IA32_FEATURE_CONTROL) are introduced by
LMCE, QEMU requires a new KVM that can handle those changes.

I'm not familiar with libvirt. Does the requirement of a new KVM
capability bring any trouble to libvirt?

Thanks,
Haozhong


* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs
  2016-06-17  1:11       ` Haozhong Zhang
@ 2016-06-17 17:15         ` Eduardo Habkost
  2016-06-20  1:49           ` Haozhong Zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Eduardo Habkost @ 2016-06-17 17:15 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov,
	Tony Luck, Andi Kleen, Ashok Raj, libvir-list, Jiri Denemark

On Fri, Jun 17, 2016 at 09:11:16AM +0800, Haozhong Zhang wrote:
> On 06/16/16 11:55, Eduardo Habkost wrote:
> > On Thu, Jun 16, 2016 at 12:04:50PM +0200, Paolo Bonzini wrote:
> > > On 16/06/2016 08:05, Haozhong Zhang wrote:
> > > > From: Ashok Raj <ashok.raj@intel.com>
> > > > 
> > > > On Intel platforms, this patch adds LMCE to KVM MCE supported
> > > > capabilities and handles guest access to LMCE related MSRs.
> > > > 
> > > > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > > > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported
> > > >            Only enable LMCE on Intel platform
> > > > 	   Check MSR_IA32_FEATURE_CONTROL when handling guest
> > > > 	     access to MSR_IA32_MCG_EXT_CTL]
> > > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > [...]
> > > > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void)
> > > >  
> > > >  	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> > > >  
> > > > +	kvm_mce_cap_supported |= MCG_LMCE_P;
> > > 
> > > Ah, so virtual LMCE is available on all processors!  This is
> > > interesting, but it also makes it more complicated to handle in QEMU; a
> > > new QEMU generally doesn't require a new kernel.
> > > 
> > > Eduardo, any ideas?
> > 
> > (CCing libvirt list)
> > 
> > As we shouldn't make machine-type changes introduce new host
> > requirements, it looks like we need to either add a new set of
> > CPU models (unreasonable), or expect management software to
> > explicitly enable LMCE after ensuring the host supports it.
> >
> > Or we could wait for a reasonable time after the feature is
> > available in the kernel, and declare that QEMU as a whole
> > requires a newer kernel. But how much time would be reasonable
> > for that?
> > 
> > Long term, I believe we should think of a better solution. I
> > don't think it is reasonable to require new libvirt code to be
> > written for every single low-level feature that requires a newer
> > kernel or newer host hardware. Maybe new introspection interfaces
> > that would allow us to drop the "no new requirements on
> > machine-type changes" rule?
> >
> 
> Because new MSR (MSR_IA32_MCG_EXT_CTL) and new MSR bit
> (FEATURE_CONTROL_LMCE in MSR_IA32_FEATURE_CONTROL) are introduced by
> LMCE, QEMU requires new KVM which can handle those changes.

If I understood correctly, you are describing the second option
above (declaring that QEMU as a whole requires a newer kernel).

> 
> I'm not familiar with libvirt. Does the requirement of new KVM
> capability bring any troubles to libvirt?

It does, assuming that we still support running QEMU under an
older kernel where KVM doesn't support LMCE. In this case, the pc-2.6
machine-type will run, but the pc-2.7 machine-type won't.

The requirement of new KVM capabilities based on the machine-type
is a problem for libvirt. libvirt has some host-capability APIs
to allow software to check if the VM can be run on (or migrated
to) a host, but none of them are based on machine-type.

This is not necessarily specific to libvirt: people may have
their own configuration or scripts that use the default "pc"
alias, and a QEMU upgrade shouldn't break their configuration.

-- 
Eduardo


* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs
  2016-06-17 17:15         ` Eduardo Habkost
@ 2016-06-20  1:49           ` Haozhong Zhang
  2016-06-20  6:55             ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Haozhong Zhang @ 2016-06-20  1:49 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Paolo Bonzini, kvm, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov,
	Tony Luck, Andi Kleen, Ashok Raj, libvir-list, Jiri Denemark

On 06/17/16 14:15, Eduardo Habkost wrote:
> On Fri, Jun 17, 2016 at 09:11:16AM +0800, Haozhong Zhang wrote:
> > On 06/16/16 11:55, Eduardo Habkost wrote:
> > > On Thu, Jun 16, 2016 at 12:04:50PM +0200, Paolo Bonzini wrote:
> > > > On 16/06/2016 08:05, Haozhong Zhang wrote:
> > > > > From: Ashok Raj <ashok.raj@intel.com>
> > > > > 
> > > > > On Intel platforms, this patch adds LMCE to KVM MCE supported
> > > > > capabilities and handles guest access to LMCE related MSRs.
> > > > > 
> > > > > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > > > > [Haozhong: macro KVM_MCE_CAP_SUPPORTED => variable kvm_mce_cap_supported
> > > > >            Only enable LMCE on Intel platform
> > > > > 	   Check MSR_IA32_FEATURE_CONTROL when handling guest
> > > > > 	     access to MSR_IA32_MCG_EXT_CTL]
> > > > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > [...]
> > > > > @@ -6433,6 +6455,8 @@ static __init int hardware_setup(void)
> > > > >  
> > > > >  	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
> > > > >  
> > > > > +	kvm_mce_cap_supported |= MCG_LMCE_P;
> > > > 
> > > > Ah, so virtual LMCE is available on all processors!  This is
> > > > interesting, but it also makes it more complicated to handle in QEMU; a
> > > > new QEMU generally doesn't require a new kernel.
> > > > 
> > > > Eduardo, any ideas?
> > > 
> > > (CCing libvirt list)
> > > 
> > > As we shouldn't make machine-type changes introduce new host
> > > requirements, it looks like we need to either add a new set of
> > > CPU models (unreasonable), or expect management software to
> > > explicitly enable LMCE after ensuring the host supports it.
> > >
> > > Or we could wait for a reasonable time after the feature is
> > > available in the kernel, and declare that QEMU as a whole
> > > requires a newer kernel. But how much time would be reasonable
> > > for that?
> > > 
> > > Long term, I believe we should think of a better solution. I
> > > don't think it is reasonable to require new libvirt code to be
> > > written for every single low-level feature that requires a newer
> > > kernel or newer host hardware. Maybe new introspection interfaces
> > > that would allow us to drop the "no new requirements on
> > > machine-type changes" rule?
> > >
> > 
> > Because new MSR (MSR_IA32_MCG_EXT_CTL) and new MSR bit
> > (FEATURE_CONTROL_LMCE in MSR_IA32_FEATURE_CONTROL) are introduced by
> > LMCE, QEMU requires new KVM which can handle those changes.
> 
> If I understood correctly, you are describing the second option
> above (declaring that QEMU as a whole requires a newer kernel).
> 
> > 
> > I'm not familiar with libvirt. Does the requirement of new KVM
> > capability bring any troubles to libvirt?
>
> It does, assuming that we still support running QEMU under an
> older kernel where KVM doesn't support LMCE. In this case, the
> pc-2.6 machine-type will run, but the pc-2.7 machine-type won't.
>
> The requirement of new KVM capabilities based on the machine-type
> is a problem for libvirt. libvirt has some host-capabilities APIs
> to allow software to check if the VM can be run on (or migrated
> to) a host, but none of them are based on machine-type.
> 
> This is not necessarily specific to libvirt: people may have
> their own configuration or scripts that use the default "pc"
> alias, and a QEMU upgrade shouldn't break their configuration.
>

Thanks for the explanation!

If we disable LMCE in QEMU by default (even for -cpu host), will it
still be a problem? That is,

- pc-2.7 can continue to run on old kernels unless users explicitly
  require LMCE

- existing libvirt VM configurations can continue to work on pc-2.7
  because LMCE is not specified in those configurations and is
  disabled by default (i.e. no requirement for new kernels)

- existing QEMU configurations/scripts using the pc alias can
  continue to work on pc-2.7 for the same reason as above.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs
  2016-06-20  1:49           ` Haozhong Zhang
@ 2016-06-20  6:55             ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2016-06-20  6:55 UTC (permalink / raw)
  To: Eduardo Habkost, kvm, rkrcmar, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, x86, linux-kernel, Gleb Natapov, Boris Petkov,
	Tony Luck, Andi Kleen, Ashok Raj, libvir-list, Jiri Denemark



On 20/06/2016 03:49, Haozhong Zhang wrote:
> Thanks for the explanation!
> 
> If we disable LMCE in QEMU by default (even for -cpu host), will it
> still be a problem? That is,
> 
> - pc-2.7 can continue to run on old kernels unless users explicitly
>   require LMCE
> 
> - existing libvirt VM configurations can continue to work on pc-2.7
>   because LMCE is not specified in those configurations and is
>   disabled by default (i.e. no requirement for new kernels)
> 
> - existing QEMU configurations/scripts using the pc alias can
>   continue to work on pc-2.7 for the same reason as above.

Yes, that would be fine.  "-cpu host" can leave LMCE enabled if
supported by the kernel.

Paolo

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-06-20  7:13 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-16  6:05 [PATCH v2 0/3] Add KVM support for Intel local MCE Haozhong Zhang
2016-06-16  6:05 ` [PATCH v2 1/3] KVM: VMX: move msr_ia32_feature_control to vcpu_vmx Haozhong Zhang
2016-06-16 11:49   ` Borislav Petkov
2016-06-16 11:57     ` Paolo Bonzini
2016-06-16 11:57     ` Haozhong Zhang
2016-06-16  6:05 ` [PATCH v2 2/3] KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL Haozhong Zhang
2016-06-16  9:55   ` Paolo Bonzini
2016-06-16 11:21     ` Haozhong Zhang
2016-06-16 10:01   ` Paolo Bonzini
2016-06-16 11:16     ` Haozhong Zhang
2016-06-16 11:19       ` Paolo Bonzini
2016-06-16 11:29         ` Haozhong Zhang
2016-06-16  6:05 ` [PATCH v2 3/3] KVM: VMX: enable guest access to LMCE related MSRs Haozhong Zhang
2016-06-16 10:04   ` Paolo Bonzini
2016-06-16 10:49     ` Haozhong Zhang
2016-06-16 14:55     ` Eduardo Habkost
2016-06-17  1:11       ` Haozhong Zhang
2016-06-17 17:15         ` Eduardo Habkost
2016-06-20  1:49           ` Haozhong Zhang
2016-06-20  6:55             ` Paolo Bonzini
