* [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes
@ 2021-08-27  7:02 Xiaoyao Li
  2021-08-27  7:02 ` [PATCH v2 1/7] KVM: VMX: Restore host's MSR_IA32_RTIT_CTL when it's not zero Xiaoyao Li
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Xiaoyao Li @ 2021-08-27  7:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel

Patches 1-3 are optimizations and cleanups.

Patches 4-7 are fixes for PT. Patches 4 and 5 fix the virtualization of PT to
provide architecturally consistent behavior for the guest. Patch 6 fixes the
case where malicious userspace can exploit PT to cause VM-entry failure or #GP
in KVM. Patch 7 fixes a potential #GP on MSR access if some PT MSRs are not
available on hardware.

Patch 3 and patch 7 are added in v2.

Xiaoyao Li (7):
  KVM: VMX: Restore host's MSR_IA32_RTIT_CTL when it's not zero
  KVM: VMX: Use precomputed vmx->pt_desc.addr_range
  KVM: VMX: Rename pt_desc.addr_range to pt_desc.nr_addr_range
  KVM: VMX: RTIT_CTL_BRANCH_EN has no dependency on other CPUID bit
  KVM: VMX: Disallow PT MSRs accessing if PT is not exposed to guest
  KVM: VMX: Check Intel PT related CPUID leaves
  KVM: VMX: Only context switch some PT MSRs when they exist

 arch/x86/kvm/cpuid.c   |  25 ++++++++++
 arch/x86/kvm/vmx/vmx.c | 110 ++++++++++++++++++++++++++---------------
 arch/x86/kvm/vmx/vmx.h |   2 +-
 3 files changed, 95 insertions(+), 42 deletions(-)

-- 
2.27.0



* [PATCH v2 1/7] KVM: VMX: Restore host's MSR_IA32_RTIT_CTL when it's not zero
  2021-08-27  7:02 [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes Xiaoyao Li
@ 2021-08-27  7:02 ` Xiaoyao Li
  2021-08-27  7:02 ` [PATCH v2 2/7] KVM: VMX: Use precomputed vmx->pt_desc.addr_range Xiaoyao Li
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Xiaoyao Li @ 2021-08-27  7:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel

A minor optimization: only WRMSR MSR_IA32_RTIT_CTL when necessary, i.e. when
the host's value is non-zero.

Opportunistically refine the comment to call out that KVM requires
VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v2:
 - Refine comments
---
 arch/x86/kvm/vmx/vmx.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fada1055f325..5535a86aea37 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1074,8 +1074,12 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
 		pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
 	}
 
-	/* Reload host state (IA32_RTIT_CTL will be cleared on VM exit). */
-	wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
+	/*
+	 * KVM requires VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest,
+	 * i.e. RTIT_CTL is always cleared on VM-Exit.  Restore it if necessary.
+	 */
+	if (vmx->pt_desc.host.ctl)
+		wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
 }
 
 void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
-- 
2.27.0



* [PATCH v2 2/7] KVM: VMX: Use precomputed vmx->pt_desc.addr_range
  2021-08-27  7:02 [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes Xiaoyao Li
  2021-08-27  7:02 ` [PATCH v2 1/7] KVM: VMX: Restore host's MSR_IA32_RTIT_CTL when it's not zero Xiaoyao Li
@ 2021-08-27  7:02 ` Xiaoyao Li
  2021-08-27  7:02 ` [PATCH v2 3/7] KVM: VMX: Rename pt_desc.addr_range to pt_desc.nr_addr_range Xiaoyao Li
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Xiaoyao Li @ 2021-08-27  7:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel

The number of the guest's valid PT ADDR MSRs is precomputed in
vmx->pt_desc.addr_range. Use it instead of recomputing it from the PT
capabilities every time.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v2:
 - Refine comment to use 'precomputed' instead of 'cached'
 - Use precomputed value in vmx_get_msr() as well
---
 arch/x86/kvm/vmx/vmx.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5535a86aea37..96a2df65678f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1889,8 +1889,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
 		if (!vmx_pt_mode_is_host_guest() ||
-			(index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps,
-					PT_CAP_num_address_ranges)))
+		    (index >= 2 * vmx->pt_desc.addr_range))
 			return 1;
 		if (index % 2)
 			msr_info->data = vmx->pt_desc.guest.addr_b[index / 2];
@@ -2205,8 +2204,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!pt_can_write_msr(vmx))
 			return 1;
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
-		if (index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps,
-						       PT_CAP_num_address_ranges))
+		if (index >= 2 * vmx->pt_desc.addr_range)
 			return 1;
 		if (is_noncanonical_address(data, vcpu))
 			return 1;
-- 
2.27.0



* [PATCH v2 3/7] KVM: VMX: Rename pt_desc.addr_range to pt_desc.nr_addr_range
  2021-08-27  7:02 [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes Xiaoyao Li
  2021-08-27  7:02 ` [PATCH v2 1/7] KVM: VMX: Restore host's MSR_IA32_RTIT_CTL when it's not zero Xiaoyao Li
  2021-08-27  7:02 ` [PATCH v2 2/7] KVM: VMX: Use precomputed vmx->pt_desc.addr_range Xiaoyao Li
@ 2021-08-27  7:02 ` Xiaoyao Li
  2021-10-18 12:41   ` Paolo Bonzini
  2021-08-27  7:02 ` [PATCH v2 4/7] KVM: VMX: RTIT_CTL_BRANCH_EN has no dependency on other CPUID bit Xiaoyao Li
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Xiaoyao Li @ 2021-08-27  7:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel

Rename it so the meaning of this field is self-explanatory.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 26 +++++++++++++-------------
 arch/x86/kvm/vmx/vmx.h |  2 +-
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 96a2df65678f..c54b99cec0e6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1059,8 +1059,8 @@ static void pt_guest_enter(struct vcpu_vmx *vmx)
 	rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
 	if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
 		wrmsrl(MSR_IA32_RTIT_CTL, 0);
-		pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
-		pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
+		pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.nr_addr_ranges);
+		pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.nr_addr_ranges);
 	}
 }
 
@@ -1070,8 +1070,8 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
 		return;
 
 	if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
-		pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
-		pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
+		pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.nr_addr_ranges);
+		pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.nr_addr_ranges);
 	}
 
 	/*
@@ -1460,16 +1460,16 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 	 * cause a #GP fault.
 	 */
 	value = (data & RTIT_CTL_ADDR0) >> RTIT_CTL_ADDR0_OFFSET;
-	if ((value && (vmx->pt_desc.addr_range < 1)) || (value > 2))
+	if ((value && (vmx->pt_desc.nr_addr_ranges < 1)) || (value > 2))
 		return 1;
 	value = (data & RTIT_CTL_ADDR1) >> RTIT_CTL_ADDR1_OFFSET;
-	if ((value && (vmx->pt_desc.addr_range < 2)) || (value > 2))
+	if ((value && (vmx->pt_desc.nr_addr_ranges < 2)) || (value > 2))
 		return 1;
 	value = (data & RTIT_CTL_ADDR2) >> RTIT_CTL_ADDR2_OFFSET;
-	if ((value && (vmx->pt_desc.addr_range < 3)) || (value > 2))
+	if ((value && (vmx->pt_desc.nr_addr_ranges < 3)) || (value > 2))
 		return 1;
 	value = (data & RTIT_CTL_ADDR3) >> RTIT_CTL_ADDR3_OFFSET;
-	if ((value && (vmx->pt_desc.addr_range < 4)) || (value > 2))
+	if ((value && (vmx->pt_desc.nr_addr_ranges < 4)) || (value > 2))
 		return 1;
 
 	return 0;
@@ -1889,7 +1889,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
 		if (!vmx_pt_mode_is_host_guest() ||
-		    (index >= 2 * vmx->pt_desc.addr_range))
+		    (index >= 2 * vmx->pt_desc.nr_addr_ranges))
 			return 1;
 		if (index % 2)
 			msr_info->data = vmx->pt_desc.guest.addr_b[index / 2];
@@ -2204,7 +2204,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!pt_can_write_msr(vmx))
 			return 1;
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
-		if (index >= 2 * vmx->pt_desc.addr_range)
+		if (index >= 2 * vmx->pt_desc.nr_addr_ranges)
 			return 1;
 		if (is_noncanonical_address(data, vcpu))
 			return 1;
@@ -3880,7 +3880,7 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_OUTPUT_BASE, MSR_TYPE_RW, flag);
 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_OUTPUT_MASK, MSR_TYPE_RW, flag);
 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_CR3_MATCH, MSR_TYPE_RW, flag);
-	for (i = 0; i < vmx->pt_desc.addr_range; i++) {
+	for (i = 0; i < vmx->pt_desc.nr_addr_ranges; i++) {
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_ADDR0_A + i * 2, MSR_TYPE_RW, flag);
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_ADDR0_B + i * 2, MSR_TYPE_RW, flag);
 	}
@@ -7113,7 +7113,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 	}
 
 	/* Get the number of configurable Address Ranges for filtering */
-	vmx->pt_desc.addr_range = intel_pt_validate_cap(vmx->pt_desc.caps,
+	vmx->pt_desc.nr_addr_ranges = intel_pt_validate_cap(vmx->pt_desc.caps,
 						PT_CAP_num_address_ranges);
 
 	/* Initialize and clear the no dependency bits */
@@ -7161,7 +7161,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;
 
 	/* unmask address range configure area */
-	for (i = 0; i < vmx->pt_desc.addr_range; i++)
+	for (i = 0; i < vmx->pt_desc.nr_addr_ranges; i++)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 4858c5fd95f2..f48eafbbed0e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -62,7 +62,7 @@ struct pt_ctx {
 
 struct pt_desc {
 	u64 ctl_bitmask;
-	u32 addr_range;
+	u32 nr_addr_ranges;
 	u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
 	struct pt_ctx host;
 	struct pt_ctx guest;
-- 
2.27.0



* [PATCH v2 4/7] KVM: VMX: RTIT_CTL_BRANCH_EN has no dependency on other CPUID bit
  2021-08-27  7:02 [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes Xiaoyao Li
                   ` (2 preceding siblings ...)
  2021-08-27  7:02 ` [PATCH v2 3/7] KVM: VMX: Rename pt_desc.addr_range to pt_desc.nr_addr_range Xiaoyao Li
@ 2021-08-27  7:02 ` Xiaoyao Li
  2021-08-27  7:02 ` [PATCH v2 5/7] KVM: VMX: Disallow PT MSRs accessing if PT is not exposed to guest Xiaoyao Li
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Xiaoyao Li @ 2021-08-27  7:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel

Per the Intel SDM, the RTIT_CTL_BRANCH_EN bit has no dependency on any bit of
CPUID leaf 0x14.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c54b99cec0e6..b9d640029c40 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7118,7 +7118,8 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 
 	/* Initialize and clear the no dependency bits */
 	vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
-			RTIT_CTL_USR | RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC);
+			RTIT_CTL_USR | RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC |
+			RTIT_CTL_BRANCH_EN);
 
 	/*
 	 * If CPUID.(EAX=14H,ECX=0):EBX[0]=1 CR3Filter can be set otherwise
@@ -7136,12 +7137,11 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 				RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ);
 
 	/*
-	 * If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn BranchEn and
-	 * MTCFreq can be set
+	 * If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn and MTCFreq can be set
 	 */
 	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc))
 		vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_MTC_EN |
-				RTIT_CTL_BRANCH_EN | RTIT_CTL_MTC_RANGE);
+					      RTIT_CTL_MTC_RANGE);
 
 	/* If CPUID.(EAX=14H,ECX=0):EBX[4]=1 FUPonPTW and PTWEn can be set */
 	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_ptwrite))
-- 
2.27.0



* [PATCH v2 5/7] KVM: VMX: Disallow PT MSRs accessing if PT is not exposed to guest
  2021-08-27  7:02 [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes Xiaoyao Li
                   ` (3 preceding siblings ...)
  2021-08-27  7:02 ` [PATCH v2 4/7] KVM: VMX: RTIT_CTL_BRANCH_EN has no dependency on other CPUID bit Xiaoyao Li
@ 2021-08-27  7:02 ` Xiaoyao Li
  2021-10-18 12:46   ` Paolo Bonzini
  2021-08-27  7:02 ` [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves Xiaoyao Li
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Xiaoyao Li @ 2021-08-27  7:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel

Per the SDM, accessing any of the PT MSRs triggers #GP if
X86_FEATURE_INTEL_PT is not available.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes in v2:
 - allow userspace/host access regardless of the PT bit (Sean)
---
 arch/x86/kvm/vmx/vmx.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b9d640029c40..394ef4732838 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1007,10 +1007,21 @@ static unsigned long segment_base(u16 selector)
 }
 #endif
 
-static inline bool pt_can_write_msr(struct vcpu_vmx *vmx)
+static inline bool pt_can_write_msr(struct vcpu_vmx *vmx,
+				    struct msr_data *msr_info)
 {
 	return vmx_pt_mode_is_host_guest() &&
-	       !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN);
+	       !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) &&
+	       (msr_info->host_initiated ||
+		guest_cpuid_has(&vmx->vcpu, X86_FEATURE_INTEL_PT));
+}
+
+static inline bool pt_can_read_msr(struct kvm_vcpu *vcpu,
+				   struct msr_data *msr_info)
+{
+	return vmx_pt_mode_is_host_guest() &&
+	       (msr_info->host_initiated ||
+		guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT));
 }
 
 static inline bool pt_output_base_valid(struct kvm_vcpu *vcpu, u64 base)
@@ -1852,24 +1863,24 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 							&msr_info->data);
 		break;
 	case MSR_IA32_RTIT_CTL:
-		if (!vmx_pt_mode_is_host_guest())
+		if (!pt_can_read_msr(vcpu, msr_info))
 			return 1;
 		msr_info->data = vmx->pt_desc.guest.ctl;
 		break;
 	case MSR_IA32_RTIT_STATUS:
-		if (!vmx_pt_mode_is_host_guest())
+		if (!pt_can_read_msr(vcpu, msr_info))
 			return 1;
 		msr_info->data = vmx->pt_desc.guest.status;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
-		if (!vmx_pt_mode_is_host_guest() ||
+		if (!pt_can_read_msr(vcpu, msr_info) ||
 			!intel_pt_validate_cap(vmx->pt_desc.caps,
 						PT_CAP_cr3_filtering))
 			return 1;
 		msr_info->data = vmx->pt_desc.guest.cr3_match;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
-		if (!vmx_pt_mode_is_host_guest() ||
+		if (!pt_can_read_msr(vcpu, msr_info) ||
 			(!intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_topa_output) &&
 			 !intel_pt_validate_cap(vmx->pt_desc.caps,
@@ -1878,7 +1889,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vmx->pt_desc.guest.output_base;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_MASK:
-		if (!vmx_pt_mode_is_host_guest() ||
+		if (!pt_can_read_msr(vcpu, msr_info) ||
 			(!intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_topa_output) &&
 			 !intel_pt_validate_cap(vmx->pt_desc.caps,
@@ -1888,7 +1899,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
-		if (!vmx_pt_mode_is_host_guest() ||
+		if (!pt_can_read_msr(vcpu, msr_info) ||
 		    (index >= 2 * vmx->pt_desc.nr_addr_ranges))
 			return 1;
 		if (index % 2)
@@ -2156,6 +2167,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		return vmx_set_vmx_msr(vcpu, msr_index, data);
 	case MSR_IA32_RTIT_CTL:
 		if (!vmx_pt_mode_is_host_guest() ||
+		    !guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT) ||
 			vmx_rtit_ctl_check(vcpu, data) ||
 			vmx->nested.vmxon)
 			return 1;
@@ -2164,14 +2176,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		pt_update_intercept_for_msr(vcpu);
 		break;
 	case MSR_IA32_RTIT_STATUS:
-		if (!pt_can_write_msr(vmx))
+		if (!pt_can_write_msr(vmx, msr_info))
 			return 1;
 		if (data & MSR_IA32_RTIT_STATUS_MASK)
 			return 1;
 		vmx->pt_desc.guest.status = data;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
-		if (!pt_can_write_msr(vmx))
+		if (!pt_can_write_msr(vmx, msr_info))
 			return 1;
 		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_cr3_filtering))
@@ -2179,7 +2191,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx->pt_desc.guest.cr3_match = data;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
-		if (!pt_can_write_msr(vmx))
+		if (!pt_can_write_msr(vmx, msr_info))
 			return 1;
 		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_topa_output) &&
@@ -2191,7 +2203,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx->pt_desc.guest.output_base = data;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_MASK:
-		if (!pt_can_write_msr(vmx))
+		if (!pt_can_write_msr(vmx, msr_info))
 			return 1;
 		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_topa_output) &&
@@ -2201,7 +2213,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx->pt_desc.guest.output_mask = data;
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
-		if (!pt_can_write_msr(vmx))
+		if (!pt_can_write_msr(vmx, msr_info))
 			return 1;
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
 		if (index >= 2 * vmx->pt_desc.nr_addr_ranges)
-- 
2.27.0



* [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves
  2021-08-27  7:02 [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes Xiaoyao Li
                   ` (4 preceding siblings ...)
  2021-08-27  7:02 ` [PATCH v2 5/7] KVM: VMX: Disallow PT MSRs accessing if PT is not exposed to guest Xiaoyao Li
@ 2021-08-27  7:02 ` Xiaoyao Li
  2021-09-09 21:41   ` Sean Christopherson
  2021-08-27  7:02 ` [PATCH v2 7/7] KVM: VMX: Only context switch some PT MSRs when they exist Xiaoyao Li
  2021-10-19 16:52 ` [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes Paolo Bonzini
  7 siblings, 1 reply; 21+ messages in thread
From: Xiaoyao Li @ 2021-08-27  7:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel

CPUID 0xD leaves reports the capabilities of Intel PT, e.g. it decides
which bits are valid to be set in MSR_IA32_RTIT_CTL, and reports the
number of PT ADDR ranges.

KVM needs to check that guest CPUID values set by userspace doesn't
enable any bit which is not supported by bare metal. Otherwise,
1. it will trigger vm-entry failure if hardware unsupported bit is
   exposed to guest and set by guest.
2. it triggers #GP when context switch PT MSRs if exposing more
   RTIT_ADDR* MSRs than hardware capacity.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
There is bit 31 of CPUID(0xD, 0).ECX that doesn't restrict any bit in
MSR_IA32_RTIT_CTL. If guest has different value than host, it won't
cause any vm-entry failure, but guest will parse the PT packet with
wrong format.

I also check it to be same as host to ensure the virtualization correctness.

Changes in v2:
- Call out that if configuring more PT ADDR MSRs than hardware, it can
  cause #GP when context switch.
---
 arch/x86/kvm/cpuid.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 739be5da3bca..0c8e06a24156 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -76,6 +76,7 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
 static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
 {
 	struct kvm_cpuid_entry2 *best;
+	u32 eax, ebx, ecx, edx;
 
 	/*
 	 * The existing code assumes virtual address is 48-bit or 57-bit in the
@@ -89,6 +90,30 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
 			return -EINVAL;
 	}
 
+	/*
+	 * CPUID 0xD leaves tell Intel PT capabilities, which decides
+	 * pt_desc.ctl_bitmask in later update_intel_pt_cfg().
+	 *
+	 * pt_desc.ctl_bitmask decides the legal value for guest
+	 * MSR_IA32_RTIT_CTL. KVM cannot support PT capabilities beyond native,
+	 * otherwise it will trigger vm-entry failure if guest sets native
+	 * unsupported bits in MSR_IA32_RTIT_CTL.
+	 */
+	best = cpuid_entry2_find(entries, nent, 0xD, 0);
+	if (best) {
+		cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
+		if (best->ebx & ~ebx || best->ecx & ~ecx)
+			return -EINVAL;
+	}
+	best = cpuid_entry2_find(entries, nent, 0xD, 1);
+	if (best) {
+		cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
+		if (((best->eax & 0x7) > (eax & 0x7)) ||
+		    ((best->eax & ~eax) >> 16) ||
+		    (best->ebx & ~ebx))
+			return -EINVAL;
+	}
+
 	return 0;
 }
 
-- 
2.27.0



* [PATCH v2 7/7] KVM: VMX: Only context switch some PT MSRs when they exist
  2021-08-27  7:02 [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes Xiaoyao Li
                   ` (5 preceding siblings ...)
  2021-08-27  7:02 ` [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves Xiaoyao Li
@ 2021-08-27  7:02 ` Xiaoyao Li
  2021-10-18 13:08   ` Paolo Bonzini
  2021-10-19 16:52 ` [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes Paolo Bonzini
  7 siblings, 1 reply; 21+ messages in thread
From: Xiaoyao Li @ 2021-08-27  7:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel

The enumeration of Intel PT feature doesn't guarantee the existence of
MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK and
MSR_IA32_RTIT_CR3_MATCH. They need to be detected from CPUID 0x14 PT
leaves.

Detect their existence in hardware_setup() and only context switch them when
they exist. Otherwise accessing them will cause a #GP.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 394ef4732838..6819fc470072 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -204,6 +204,9 @@ module_param(ple_window_max, uint, 0444);
 int __read_mostly pt_mode = PT_MODE_SYSTEM;
 module_param(pt_mode, int, S_IRUGO);
 
+static bool has_msr_rtit_cr3_match;
+static bool has_msr_rtit_output_x;
+
 static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
 static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_cond);
 static DEFINE_MUTEX(vmx_l1d_flush_mutex);
@@ -1035,9 +1038,12 @@ static inline void pt_load_msr(struct pt_ctx *ctx, u32 addr_range)
 	u32 i;
 
 	wrmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
-	wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
-	wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
-	wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
+	if (has_msr_rtit_output_x) {
+		wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
+		wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
+	}
+	if (has_msr_rtit_cr3_match)
+		wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
 	for (i = 0; i < addr_range; i++) {
 		wrmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
 		wrmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
@@ -1049,9 +1055,12 @@ static inline void pt_save_msr(struct pt_ctx *ctx, u32 addr_range)
 	u32 i;
 
 	rdmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
-	rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
-	rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
-	rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
+	if (has_msr_rtit_output_x) {
+		rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
+		rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
+	}
+	if (has_msr_rtit_cr3_match)
+		rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
 	for (i = 0; i < addr_range; i++) {
 		rdmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
 		rdmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
@@ -7883,8 +7892,13 @@ static __init int hardware_setup(void)
 
 	if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
 		return -EINVAL;
-	if (!enable_ept || !cpu_has_vmx_intel_pt())
+	if (!enable_ept || !cpu_has_vmx_intel_pt()) {
 		pt_mode = PT_MODE_SYSTEM;
+	} else if (boot_cpu_has(X86_FEATURE_INTEL_PT)) {
+		has_msr_rtit_cr3_match = intel_pt_validate_hw_cap(PT_CAP_cr3_filtering);
+		has_msr_rtit_output_x = intel_pt_validate_hw_cap(PT_CAP_topa_output) ||
+					intel_pt_validate_hw_cap(PT_CAP_single_range_output);
+	}
 
 	setup_default_sgx_lepubkeyhash();
 
-- 
2.27.0



* Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves
  2021-08-27  7:02 ` [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves Xiaoyao Li
@ 2021-09-09 21:41   ` Sean Christopherson
  2021-09-10  1:59     ` Xiaoyao Li
  0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2021-09-09 21:41 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On Fri, Aug 27, 2021, Xiaoyao Li wrote:
> CPUID 0xD leaves reports the capabilities of Intel PT, e.g. it decides
> which bits are valid to be set in MSR_IA32_RTIT_CTL, and reports the
> number of PT ADDR ranges.
> 
> KVM needs to check that guest CPUID values set by userspace doesn't
> enable any bit which is not supported by bare metal. Otherwise,
> 1. it will trigger vm-entry failure if hardware unsupported bit is
>    exposed to guest and set by guest.
> 2. it triggers #GP when context switch PT MSRs if exposing more
>    RTIT_ADDR* MSRs than hardware capacity.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> There is bit 31 of CPUID(0xD, 0).ECX that doesn't restrict any bit in
> MSR_IA32_RTIT_CTL. If guest has different value than host, it won't
> cause any vm-entry failure, but guest will parse the PT packet with
> wrong format.
> 
> I also check it to be same as host to ensure the virtualization correctness.
> 
> Changes in v2:
> - Call out that if configuring more PT ADDR MSRs than hardware, it can
>   cause #GP when context switch.
> ---
>  arch/x86/kvm/cpuid.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 739be5da3bca..0c8e06a24156 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -76,6 +76,7 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
>  static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
>  {
>  	struct kvm_cpuid_entry2 *best;
> +	u32 eax, ebx, ecx, edx;
>  
>  	/*
>  	 * The existing code assumes virtual address is 48-bit or 57-bit in the
> @@ -89,6 +90,30 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
>  			return -EINVAL;
>  	}
>  
> +	/*
> +	 * CPUID 0xD leaves tell Intel PT capabilities, which decides

CPUID.0xD is XSAVE state, CPUID.0x14 is Intel PT.  This series needs tests...

> +	 * pt_desc.ctl_bitmask in later update_intel_pt_cfg().
> +	 *
> +	 * pt_desc.ctl_bitmask decides the legal value for guest
> +	 * MSR_IA32_RTIT_CTL. KVM cannot support PT capabilities beyond native,
> +	 * otherwise it will trigger vm-entry failure if guest sets native
> +	 * unsupported bits in MSR_IA32_RTIT_CTL.
> +	 */
> +	best = cpuid_entry2_find(entries, nent, 0xD, 0);
> +	if (best) {
> +		cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
> +		if (best->ebx & ~ebx || best->ecx & ~ecx)
> +			return -EINVAL;
> +	}
> +	best = cpuid_entry2_find(entries, nent, 0xD, 1);
> +	if (best) {
> +		cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
> +		if (((best->eax & 0x7) > (eax & 0x7)) ||

Ugh, looking at the rest of the code, even this isn't sufficient because
pt_desc.guest.addr_{a,b} are hardcoded at 4 entries, i.e. running KVM on hardware
with >4 entries will lead to buffer overflows.

One option would be to bump that to the theoretical max of 15, which doesn't seem
too horrible, especially if pt_desc as a whole is allocated on-demand, which it
probably should be since it isn't exactly tiny (nor ubiquitous)

A different option would be to let userspace define whatever it wants for guest
CPUID, and instead cap nr_addr_ranges at min(host.cpuid, guest.cpuid, RTIT_ADDR_RANGE).
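
Roughly, as an untested sketch (not code from this series; the helper name is
made up, and RTIT_ADDR_RANGE is assumed to be the bound of the hardcoded
addr_a/addr_b arrays), the capping could look like:

static void pt_cap_nr_addr_ranges(struct vcpu_vmx *vmx)
{
	/* Whatever userspace put in guest CPUID.0x14. */
	u32 guest_ranges = intel_pt_validate_cap(vmx->pt_desc.caps,
						 PT_CAP_num_address_ranges);
	/* What the hardware actually implements. */
	u32 host_ranges = intel_pt_validate_hw_cap(PT_CAP_num_address_ranges);

	/* Never index past the hardcoded arrays. */
	vmx->pt_desc.nr_addr_ranges = min3(guest_ranges, host_ranges,
					   (u32)RTIT_ADDR_RANGE);
}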

Letting userspace generate a bad MSR_IA32_RTIT_CTL is not problematic, there are
plenty of ways userspace can deliberately trigger VM-Entry failure due to invalid
guest state (even if this is a VM-Fail condition, it's not a danger to KVM).

> +		    ((best->eax & ~eax) >> 16) ||
> +		    (best->ebx & ~ebx))
> +			return -EINVAL;
> +	}
> +
>  	return 0;
>  }
>  
> -- 
> 2.27.0
> 


* Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves
  2021-09-09 21:41   ` Sean Christopherson
@ 2021-09-10  1:59     ` Xiaoyao Li
  2021-10-18  7:01       ` Xiaoyao Li
  2021-10-18 12:47       ` Paolo Bonzini
  0 siblings, 2 replies; 21+ messages in thread
From: Xiaoyao Li @ 2021-09-10  1:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 9/10/2021 5:41 AM, Sean Christopherson wrote:
> On Fri, Aug 27, 2021, Xiaoyao Li wrote:
>> CPUID 0xD leaves reports the capabilities of Intel PT, e.g. it decides
>> which bits are valid to be set in MSR_IA32_RTIT_CTL, and reports the
>> number of PT ADDR ranges.
>>
>> KVM needs to check that guest CPUID values set by userspace doesn't
>> enable any bit which is not supported by bare metal. Otherwise,
>> 1. it will trigger vm-entry failure if hardware unsupported bit is
>>     exposed to guest and set by guest.
>> 2. it triggers #GP when context switch PT MSRs if exposing more
>>     RTIT_ADDR* MSRs than hardware capacity.
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>> There is bit 31 of CPUID(0xD, 0).ECX that doesn't restrict any bit in
>> MSR_IA32_RTIT_CTL. If guest has different value than host, it won't
>> cause any vm-entry failure, but guest will parse the PT packet with
>> wrong format.
>>
>> I also check it to be same as host to ensure the virtualization correctness.
>>
>> Changes in v2:
>> - Call out that if configuring more PT ADDR MSRs than hardware, it can
>>    cause #GP when context switch.
>> ---
>>   arch/x86/kvm/cpuid.c | 25 +++++++++++++++++++++++++
>>   1 file changed, 25 insertions(+)
>>
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index 739be5da3bca..0c8e06a24156 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -76,6 +76,7 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
>>   static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
>>   {
>>   	struct kvm_cpuid_entry2 *best;
>> +	u32 eax, ebx, ecx, edx;
>>   
>>   	/*
>>   	 * The existing code assumes virtual address is 48-bit or 57-bit in the
>> @@ -89,6 +90,30 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
>>   			return -EINVAL;
>>   	}
>>   
>> +	/*
>> +	 * CPUID 0xD leaves tell Intel PT capabilities, which decides
> 
> CPUID.0xD is XSAVE state, CPUID.0x14 is Intel PT.  This series needs tests...

My apologies.

>> +	 * pt_desc.ctl_bitmask in later update_intel_pt_cfg().
>> +	 *
>> +	 * pt_desc.ctl_bitmask decides the legal value for guest
>> +	 * MSR_IA32_RTIT_CTL. KVM cannot support PT capabilities beyond native,
>> +	 * otherwise it will trigger vm-entry failure if guest sets native
>> +	 * unsupported bits in MSR_IA32_RTIT_CTL.
>> +	 */
>> +	best = cpuid_entry2_find(entries, nent, 0xD, 0);
>> +	if (best) {
>> +		cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
>> +		if (best->ebx & ~ebx || best->ecx & ~ecx)
>> +			return -EINVAL;
>> +	}
>> +	best = cpuid_entry2_find(entries, nent, 0xD, 1);
>> +	if (best) {
>> +		cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
>> +		if (((best->eax & 0x7) > (eax & 0x7)) ||
> 
> Ugh, looking at the rest of the code, even this isn't sufficient because
> pt_desc.guest.addr_{a,b} are hardcoded at 4 entries, i.e. running KVM on hardware
> with >4 entries will lead to buffer overflows.

it's hardcoded to 4 because there is a note of "no processors support 
more than 4 address ranges" in SDM vol.3 Chapter 31.3.1, table 31-11

> One option would be to bump that to the theoretical max of 15, which doesn't seem
> too horrible, especially if pt_desc as a whole is allocated on-demand, which it
> probably should be since it isn't exactly tiny (nor ubiquitous)
> 
> A different option would be to let userspace define whatever it wants for guest
> CPUID, and instead cap nr_addr_ranges at min(host.cpuid, guest.cpuid, RTIT_ADDR_RANGE).
> 
> Letting userspace generate a bad MSR_IA32_RTIT_CTL is not problematic, there are
> plenty of ways userspace can deliberately trigger VM-Entry failure due to invalid
> guest state (even if this is a VM-Fail condition, it's not a danger to KVM).

I'm fine to only safe guard the nr_addr_range if VM-Entry failure 
doesn't matter.

> 
>> +		    ((best->eax & ~eax) >> 16) ||
>> +		    (best->ebx & ~ebx))
>> +			return -EINVAL;
>> +	}
>> +
>>   	return 0;
>>   }
>>   
>> -- 
>> 2.27.0
>>



* Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves
  2021-09-10  1:59     ` Xiaoyao Li
@ 2021-10-18  7:01       ` Xiaoyao Li
  2021-10-18 12:47       ` Paolo Bonzini
  1 sibling, 0 replies; 21+ messages in thread
From: Xiaoyao Li @ 2021-10-18  7:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 9/10/2021 9:59 AM, Xiaoyao Li wrote:
> On 9/10/2021 5:41 AM, Sean Christopherson wrote:
>> On Fri, Aug 27, 2021, Xiaoyao Li wrote:
>>> CPUID 0xD leaves reports the capabilities of Intel PT, e.g. it decides
>>> which bits are valid to be set in MSR_IA32_RTIT_CTL, and reports the
>>> number of PT ADDR ranges.
>>>
>>> KVM needs to check that guest CPUID values set by userspace doesn't
>>> enable any bit which is not supported by bare metal. Otherwise,
>>> 1. it will trigger vm-entry failure if hardware unsupported bit is
>>>     exposed to guest and set by guest.
>>> 2. it triggers #GP when context switch PT MSRs if exposing more
>>>     RTIT_ADDR* MSRs than hardware capacity.
>>>
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
..
> 
>>> +     * pt_desc.ctl_bitmask in later update_intel_pt_cfg().
>>> +     *
>>> +     * pt_desc.ctl_bitmask decides the legal value for guest
>>> +     * MSR_IA32_RTIT_CTL. KVM cannot support PT capabilities beyond 
>>> native,
>>> +     * otherwise it will trigger vm-entry failure if guest sets native
>>> +     * unsupported bits in MSR_IA32_RTIT_CTL.
>>> +     */
>>> +    best = cpuid_entry2_find(entries, nent, 0xD, 0);
>>> +    if (best) {
>>> +        cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
>>> +        if (best->ebx & ~ebx || best->ecx & ~ecx)
>>> +            return -EINVAL;
>>> +    }
>>> +    best = cpuid_entry2_find(entries, nent, 0xD, 1);
>>> +    if (best) {
>>> +        cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx);
>>> +        if (((best->eax & 0x7) > (eax & 0x7)) ||
>>
>> Ugh, looking at the rest of the code, even this isn't sufficient because
>> pt_desc.guest.addr_{a,b} are hardcoded at 4 entries, i.e. running KVM 
>> on hardware
>> with >4 entries will lead to buffer overflows.
> 
> it's hardcoded to 4 because there is a note of "no processors support 
> more than 4 address ranges" in SDM vol.3 Chapter 31.3.1, table 31-11
> 
>> One option would be to bump that to the theoretical max of 15, which 
>> doesn't seem
>> too horrible, especially if pt_desc as a whole is allocated on-demand, 
>> which it
>> probably should be since it isn't exactly tiny (nor ubiquitous)
>>
>> A different option would be to let userspace define whatever it wants 
>> for guest
>> CPUID, and instead cap nr_addr_ranges at min(host.cpuid, guest.cpuid, 
>> RTIT_ADDR_RANGE).
>>
>> Letting userspace generate a bad MSR_IA32_RTIT_CTL is not problematic, 
>> there are
>> plenty of ways userspace can deliberately trigger VM-Entry failure due 
>> to invalid
>> guest state (even if this is a VM-Fail condition, it's not a danger to 
>> KVM).
> 
> I'm fine to only safe guard the nr_addr_range if VM-Entry failure 
> doesn't matter.

Hi Sean.

It seems I misread your comment. All of the above is about the check on 
nr_addr_ranges. Did you mean that the check is not necessary if it only 
serves to avoid VM-entry failure?

The problem is that 1) the check on nr_addr_ranges is to avoid a #GP on MSR 
reads; though the kernel will fix up the #GP, it still prints a warning 
message.

2) The other checks in this patch on guest CPUID 0x14 are to avoid VM-entry 
failure.

So I want to ask: do you think both 1) and 2) are unnecessary, or only 2)?



* Re: [PATCH v2 3/7] KVM: VMX: Rename pt_desc.addr_range to pt_desc.nr_addr_range
  2021-08-27  7:02 ` [PATCH v2 3/7] KVM: VMX: Rename pt_desc.addr_range to pt_desc.nr_addr_range Xiaoyao Li
@ 2021-10-18 12:41   ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2021-10-18 12:41 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 27/08/21 09:02, Xiaoyao Li wrote:
> Rename it so the meaning of this field is self-explanatory.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Let's use num_addr_ranges to map the PT_CAP constant.

Paolo

>   arch/x86/kvm/vmx/vmx.c | 26 +++++++++++++-------------
>   arch/x86/kvm/vmx/vmx.h |  2 +-
>   2 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 96a2df65678f..c54b99cec0e6 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1059,8 +1059,8 @@ static void pt_guest_enter(struct vcpu_vmx *vmx)
>   	rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
>   	if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
>   		wrmsrl(MSR_IA32_RTIT_CTL, 0);
> -		pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
> -		pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
> +		pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.nr_addr_ranges);
> +		pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.nr_addr_ranges);
>   	}
>   }
>   
> @@ -1070,8 +1070,8 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
>   		return;
>   
>   	if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
> -		pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
> -		pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
> +		pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.nr_addr_ranges);
> +		pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.nr_addr_ranges);
>   	}
>   
>   	/*
> @@ -1460,16 +1460,16 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
>   	 * cause a #GP fault.
>   	 */
>   	value = (data & RTIT_CTL_ADDR0) >> RTIT_CTL_ADDR0_OFFSET;
> -	if ((value && (vmx->pt_desc.addr_range < 1)) || (value > 2))
> +	if ((value && (vmx->pt_desc.nr_addr_ranges < 1)) || (value > 2))
>   		return 1;
>   	value = (data & RTIT_CTL_ADDR1) >> RTIT_CTL_ADDR1_OFFSET;
> -	if ((value && (vmx->pt_desc.addr_range < 2)) || (value > 2))
> +	if ((value && (vmx->pt_desc.nr_addr_ranges < 2)) || (value > 2))
>   		return 1;
>   	value = (data & RTIT_CTL_ADDR2) >> RTIT_CTL_ADDR2_OFFSET;
> -	if ((value && (vmx->pt_desc.addr_range < 3)) || (value > 2))
> +	if ((value && (vmx->pt_desc.nr_addr_ranges < 3)) || (value > 2))
>   		return 1;
>   	value = (data & RTIT_CTL_ADDR3) >> RTIT_CTL_ADDR3_OFFSET;
> -	if ((value && (vmx->pt_desc.addr_range < 4)) || (value > 2))
> +	if ((value && (vmx->pt_desc.nr_addr_ranges < 4)) || (value > 2))
>   		return 1;
>   
>   	return 0;
> @@ -1889,7 +1889,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
>   		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
>   		if (!vmx_pt_mode_is_host_guest() ||
> -		    (index >= 2 * vmx->pt_desc.addr_range))
> +		    (index >= 2 * vmx->pt_desc.nr_addr_ranges))
>   			return 1;
>   		if (index % 2)
>   			msr_info->data = vmx->pt_desc.guest.addr_b[index / 2];
> @@ -2204,7 +2204,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		if (!pt_can_write_msr(vmx))
>   			return 1;
>   		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
> -		if (index >= 2 * vmx->pt_desc.addr_range)
> +		if (index >= 2 * vmx->pt_desc.nr_addr_ranges)
>   			return 1;
>   		if (is_noncanonical_address(data, vcpu))
>   			return 1;
> @@ -3880,7 +3880,7 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
>   	vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_OUTPUT_BASE, MSR_TYPE_RW, flag);
>   	vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_OUTPUT_MASK, MSR_TYPE_RW, flag);
>   	vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_CR3_MATCH, MSR_TYPE_RW, flag);
> -	for (i = 0; i < vmx->pt_desc.addr_range; i++) {
> +	for (i = 0; i < vmx->pt_desc.nr_addr_ranges; i++) {
>   		vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_ADDR0_A + i * 2, MSR_TYPE_RW, flag);
>   		vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_ADDR0_B + i * 2, MSR_TYPE_RW, flag);
>   	}
> @@ -7113,7 +7113,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
>   	}
>   
>   	/* Get the number of configurable Address Ranges for filtering */
> -	vmx->pt_desc.addr_range = intel_pt_validate_cap(vmx->pt_desc.caps,
> +	vmx->pt_desc.nr_addr_ranges = intel_pt_validate_cap(vmx->pt_desc.caps,
>   						PT_CAP_num_address_ranges);
>   
>   	/* Initialize and clear the no dependency bits */
> @@ -7161,7 +7161,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
>   		vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;
>   
>   	/* unmask address range configure area */
> -	for (i = 0; i < vmx->pt_desc.addr_range; i++)
> +	for (i = 0; i < vmx->pt_desc.nr_addr_ranges; i++)
>   		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
>   }
>   
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 4858c5fd95f2..f48eafbbed0e 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -62,7 +62,7 @@ struct pt_ctx {
>   
>   struct pt_desc {
>   	u64 ctl_bitmask;
> -	u32 addr_range;
> +	u32 nr_addr_ranges;
>   	u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
>   	struct pt_ctx host;
>   	struct pt_ctx guest;
> 




* Re: [PATCH v2 5/7] KVM: VMX: Disallow PT MSRs accessing if PT is not exposed to guest
  2021-08-27  7:02 ` [PATCH v2 5/7] KVM: VMX: Disallow PT MSRs accessing if PT is not exposed to guest Xiaoyao Li
@ 2021-10-18 12:46   ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2021-10-18 12:46 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 27/08/21 09:02, Xiaoyao Li wrote:
> Per the SDM, accessing any of the PT MSRs triggers #GP if
> X86_FEATURE_INTEL_PT is not available.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> Changes in v2:
>   - allow userspace/host access regardless of the PT bit (Sean)
> ---
>   arch/x86/kvm/vmx/vmx.c | 38 +++++++++++++++++++++++++-------------
>   1 file changed, 25 insertions(+), 13 deletions(-)

Let's cache this in vmx->pt_desc.  More precisely:

- always call update_intel_pt_cfg from vmx_vcpu_after_set_cpuid

- add a field vmx->pt_desc.available matching guest_cpuid_has(vcpu, 
X86_FEATURE_INTEL_PT)

- if it is false, clear _all_ of vmx->pt_desc (with memcpy) and return 
early from update_intel_pt_cfg
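
Something along these lines, as a rough sketch (the "available" field is the
new one from the list above, memset stands in for the clearing; names are
only for illustration):

static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (!guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT)) {
		/* PT not exposed to the guest: wipe the whole descriptor. */
		memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc));
		return;
	}

	vmx->pt_desc.available = true;

	/* ... existing caps/ctl_bitmask/nr_addr_ranges setup follows ... */
}

static inline bool pt_can_read_msr(struct kvm_vcpu *vcpu,
				   struct msr_data *msr_info)
{
	/* The cached bit replaces the per-access guest_cpuid_has() lookup. */
	return vmx_pt_mode_is_host_guest() &&
	       (msr_info->host_initiated || to_vmx(vcpu)->pt_desc.available);
}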

Thanks,

Paolo

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b9d640029c40..394ef4732838 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1007,10 +1007,21 @@ static unsigned long segment_base(u16 selector)
>   }
>   #endif
>   
> -static inline bool pt_can_write_msr(struct vcpu_vmx *vmx)
> +static inline bool pt_can_write_msr(struct vcpu_vmx *vmx,
> +				    struct msr_data *msr_info)
>   {
>   	return vmx_pt_mode_is_host_guest() &&
> -	       !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN);
> +	       !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) &&
> +	       (msr_info->host_initiated ||
> +		guest_cpuid_has(&vmx->vcpu, X86_FEATURE_INTEL_PT));
> +}
> +
> +static inline bool pt_can_read_msr(struct kvm_vcpu *vcpu,
> +				   struct msr_data *msr_info)
> +{
> +	return vmx_pt_mode_is_host_guest() &&
> +	       (msr_info->host_initiated ||
> +		guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT));
>   }
>   
>   static inline bool pt_output_base_valid(struct kvm_vcpu *vcpu, u64 base)
> @@ -1852,24 +1863,24 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   							&msr_info->data);
>   		break;
>   	case MSR_IA32_RTIT_CTL:
> -		if (!vmx_pt_mode_is_host_guest())
> +		if (!pt_can_read_msr(vcpu, msr_info))
>   			return 1;
>   		msr_info->data = vmx->pt_desc.guest.ctl;
>   		break;
>   	case MSR_IA32_RTIT_STATUS:
> -		if (!vmx_pt_mode_is_host_guest())
> +		if (!pt_can_read_msr(vcpu, msr_info))
>   			return 1;
>   		msr_info->data = vmx->pt_desc.guest.status;
>   		break;
>   	case MSR_IA32_RTIT_CR3_MATCH:
> -		if (!vmx_pt_mode_is_host_guest() ||
> +		if (!pt_can_read_msr(vcpu, msr_info) ||
>   			!intel_pt_validate_cap(vmx->pt_desc.caps,
>   						PT_CAP_cr3_filtering))
>   			return 1;
>   		msr_info->data = vmx->pt_desc.guest.cr3_match;
>   		break;
>   	case MSR_IA32_RTIT_OUTPUT_BASE:
> -		if (!vmx_pt_mode_is_host_guest() ||
> +		if (!pt_can_read_msr(vcpu, msr_info) ||
>   			(!intel_pt_validate_cap(vmx->pt_desc.caps,
>   					PT_CAP_topa_output) &&
>   			 !intel_pt_validate_cap(vmx->pt_desc.caps,
> @@ -1878,7 +1889,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		msr_info->data = vmx->pt_desc.guest.output_base;
>   		break;
>   	case MSR_IA32_RTIT_OUTPUT_MASK:
> -		if (!vmx_pt_mode_is_host_guest() ||
> +		if (!pt_can_read_msr(vcpu, msr_info) ||
>   			(!intel_pt_validate_cap(vmx->pt_desc.caps,
>   					PT_CAP_topa_output) &&
>   			 !intel_pt_validate_cap(vmx->pt_desc.caps,
> @@ -1888,7 +1899,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		break;
>   	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
>   		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
> -		if (!vmx_pt_mode_is_host_guest() ||
> +		if (!pt_can_read_msr(vcpu, msr_info) ||
>   		    (index >= 2 * vmx->pt_desc.nr_addr_ranges))
>   			return 1;
>   		if (index % 2)
> @@ -2156,6 +2167,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		return vmx_set_vmx_msr(vcpu, msr_index, data);
>   	case MSR_IA32_RTIT_CTL:
>   		if (!vmx_pt_mode_is_host_guest() ||
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT) ||
>   			vmx_rtit_ctl_check(vcpu, data) ||
>   			vmx->nested.vmxon)
>   			return 1;
> @@ -2164,14 +2176,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		pt_update_intercept_for_msr(vcpu);
>   		break;
>   	case MSR_IA32_RTIT_STATUS:
> -		if (!pt_can_write_msr(vmx))
> +		if (!pt_can_write_msr(vmx, msr_info))
>   			return 1;
>   		if (data & MSR_IA32_RTIT_STATUS_MASK)
>   			return 1;
>   		vmx->pt_desc.guest.status = data;
>   		break;
>   	case MSR_IA32_RTIT_CR3_MATCH:
> -		if (!pt_can_write_msr(vmx))
> +		if (!pt_can_write_msr(vmx, msr_info))
>   			return 1;
>   		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
>   					   PT_CAP_cr3_filtering))
> @@ -2179,7 +2191,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		vmx->pt_desc.guest.cr3_match = data;
>   		break;
>   	case MSR_IA32_RTIT_OUTPUT_BASE:
> -		if (!pt_can_write_msr(vmx))
> +		if (!pt_can_write_msr(vmx, msr_info))
>   			return 1;
>   		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
>   					   PT_CAP_topa_output) &&
> @@ -2191,7 +2203,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		vmx->pt_desc.guest.output_base = data;
>   		break;
>   	case MSR_IA32_RTIT_OUTPUT_MASK:
> -		if (!pt_can_write_msr(vmx))
> +		if (!pt_can_write_msr(vmx, msr_info))
>   			return 1;
>   		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
>   					   PT_CAP_topa_output) &&
> @@ -2201,7 +2213,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		vmx->pt_desc.guest.output_mask = data;
>   		break;
>   	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
> -		if (!pt_can_write_msr(vmx))
> +		if (!pt_can_write_msr(vmx, msr_info))
>   			return 1;
>   		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
>   		if (index >= 2 * vmx->pt_desc.nr_addr_ranges)
> 



* Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves
  2021-09-10  1:59     ` Xiaoyao Li
  2021-10-18  7:01       ` Xiaoyao Li
@ 2021-10-18 12:47       ` Paolo Bonzini
  2021-10-18 13:56         ` Xiaoyao Li
  1 sibling, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2021-10-18 12:47 UTC (permalink / raw)
  To: Xiaoyao Li, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On 10/09/21 03:59, Xiaoyao Li wrote:
>> 
>> Ugh, looking at the rest of the code, even this isn't sufficient
>> because pt_desc.guest.addr_{a,b} are hardcoded at 4 entries, i.e.
>> running KVM on hardware with >4 entries will lead to buffer
>> overflows.
> 
> it's hardcoded to 4 because there is a note of "no processors support
>  more than 4 address ranges" in SDM vol.3 Chapter 31.3.1, table
> 31-11

True, but I agree with Sean that it's not pretty.

>> One option would be to bump that to the theoretical max of 15,
>> which doesn't seem too horrible, especially if pt_desc as a whole
>> is allocated on-demand, which it probably should be since it isn't
>> exactly tiny (nor ubiquitous)
>> 
>> A different option would be to let userspace define whatever it
>> wants for guest CPUID, and instead cap nr_addr_ranges at
>> min(host.cpuid, guest.cpuid, RTIT_ADDR_RANGE).

This is the safest option.

Paolo



* Re: [PATCH v2 7/7] KVM: VMX: Only context switch some PT MSRs when they exist
  2021-08-27  7:02 ` [PATCH v2 7/7] KVM: VMX: Only context switch some PT MSRs when they exist Xiaoyao Li
@ 2021-10-18 13:08   ` Paolo Bonzini
  2021-10-18 14:04     ` Xiaoyao Li
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2021-10-18 13:08 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 27/08/21 09:02, Xiaoyao Li wrote:
> The enumeration of Intel PT feature doesn't guarantee the existence of
> MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK and
> MSR_IA32_RTIT_CR3_MATCH. They need to be detected from CPUID 0x14 PT
> leaves.
> 
> Detect their existence in hardware_setup() and only context switch them when
> they exist. Otherwise accessing them will cause a #GP.

If intel_pt_validate_hw_cap is not fast enough, it should be optimized.
Something like this:

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 7f406c14715f..9c7167dcb719 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -41,13 +41,15 @@ static struct pt_pmu pt_pmu;
   * permitted values for certain bit fields).
   */
  #define PT_CAP(_n, _l, _r, _m)						\
-	[PT_CAP_ ## _n] = { .name = __stringify(_n), .leaf = _l,	\
-			    .reg = _r, .mask = _m }
+	[PT_CAP_ ## _n] = { .name = __stringify(_n),			\
+			    .index = _l * PT_CPUID_REGS_NUM + _r,	\
+			    .shift = __CONSTANT_FFS_32(_m),		\
+			    .mask = _m }
  
  static struct pt_cap_desc {
  	const char	*name;
-	u32		leaf;
-	u8		reg;
+	u16		index;
+	u8		shift;
  	u32		mask;
  } pt_caps[] = {
  	PT_CAP(max_subleaf,		0, CPUID_EAX, 0xffffffff),
@@ -71,10 +73,8 @@ static struct pt_cap_desc {
  u32 intel_pt_validate_cap(u32 *caps, enum pt_capabilities capability)
  {
  	struct pt_cap_desc *cd = &pt_caps[capability];
-	u32 c = caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
-	unsigned int shift = __ffs(cd->mask);
  
-	return (c & cd->mask) >> shift;
+	return (caps[cd->index] & cd->mask) >> cd->shift;
  }
  EXPORT_SYMBOL_GPL(intel_pt_validate_cap);
  
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5e62e2383b7f..b4ee28d91b77 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -211,6 +211,17 @@ static inline int get_count_order_long(unsigned long l)
  	return (int)fls_long(--l);
  }
  
+#define __CONSTANT_FFS_2(w) \
+	(((w) & 1) == 0)
+#define __CONSTANT_FFS_4(w) \
+	(((w) & 0x3) == 0 ? 2 + __CONSTANT_FFS_2((w) >> 2) : __CONSTANT_FFS_2((w)))
+#define __CONSTANT_FFS_8(w) \
+	(((w) & 0xf) == 0 ? 4 + __CONSTANT_FFS_4((w) >> 4) : __CONSTANT_FFS_4((w)))
+#define __CONSTANT_FFS_16(w) \
+	(((w) & 0xff) == 0 ? 8 + __CONSTANT_FFS_8((w) >> 8) : __CONSTANT_FFS_8((w)))
+#define __CONSTANT_FFS_32(w) \
+	(((w) & 0xffff) == 0 ? 16 + __CONSTANT_FFS_16((w) >> 16) : __CONSTANT_FFS_16((w)))
+
  /**
   * __ffs64 - find first set bit in a 64 bit word
   * @word: The 64 bit word


Paolo

> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>   arch/x86/kvm/vmx/vmx.c | 28 +++++++++++++++++++++-------
>   1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 394ef4732838..6819fc470072 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -204,6 +204,9 @@ module_param(ple_window_max, uint, 0444);
>   int __read_mostly pt_mode = PT_MODE_SYSTEM;
>   module_param(pt_mode, int, S_IRUGO);
>   
> +static bool has_msr_rtit_cr3_match;
> +static bool has_msr_rtit_output_x;
> +
>   static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
>   static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_cond);
>   static DEFINE_MUTEX(vmx_l1d_flush_mutex);
> @@ -1035,9 +1038,12 @@ static inline void pt_load_msr(struct pt_ctx *ctx, u32 addr_range)
>   	u32 i;
>   
>   	wrmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
> -	wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
> -	wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
> -	wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
> +	if (has_msr_rtit_output_x) {
> +		wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
> +		wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
> +	}
> +	if (has_msr_rtit_cr3_match)
> +		wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
>   	for (i = 0; i < addr_range; i++) {
>   		wrmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
>   		wrmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
> @@ -1049,9 +1055,12 @@ static inline void pt_save_msr(struct pt_ctx *ctx, u32 addr_range)
>   	u32 i;
>   
>   	rdmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
> -	rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
> -	rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
> -	rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
> +	if (has_msr_rtit_output_x) {
> +		rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
> +		rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
> +	}
> +	if (has_msr_rtit_cr3_match)
> +		rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
>   	for (i = 0; i < addr_range; i++) {
>   		rdmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
>   		rdmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
> @@ -7883,8 +7892,13 @@ static __init int hardware_setup(void)
>   
>   	if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
>   		return -EINVAL;
> -	if (!enable_ept || !cpu_has_vmx_intel_pt())
> +	if (!enable_ept || !cpu_has_vmx_intel_pt()) {
>   		pt_mode = PT_MODE_SYSTEM;
> +	} else if (boot_cpu_has(X86_FEATURE_INTEL_PT)) {
> +		has_msr_rtit_cr3_match = intel_pt_validate_hw_cap(PT_CAP_cr3_filtering);
> +		has_msr_rtit_output_x = intel_pt_validate_hw_cap(PT_CAP_topa_output) ||
> +					intel_pt_validate_hw_cap(PT_CAP_single_range_output);
> +	}
>   
>   	setup_default_sgx_lepubkeyhash();
>   
> 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves
  2021-10-18 12:47       ` Paolo Bonzini
@ 2021-10-18 13:56         ` Xiaoyao Li
  2021-10-18 17:26           ` Sean Christopherson
  0 siblings, 1 reply; 21+ messages in thread
From: Xiaoyao Li @ 2021-10-18 13:56 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On 10/18/2021 8:47 PM, Paolo Bonzini wrote:
> On 10/09/21 03:59, Xiaoyao Li wrote:
>>>
>>> Ugh, looking at the rest of the code, even this isn't sufficient
>>> because pt_desc.guest.addr_{a,b} are hardcoded at 4 entries, i.e.
>>> running KVM on hardware with >4 entries will lead to buffer
>>> overflows.
>>
>> it's hardcoded to 4 because there is a note of "no processors support
>>  more than 4 address ranges" in SDM vol.3 Chapter 31.3.1, table
>> 31-11
> 
> True, but I agree with Sean that it's not pretty.

Yes. We can add a check that the hardware does not report more than 4 
address ranges, to cover future processors. Intel folks are supposed to 
send new patches before silicon with more than 4 PT address ranges is 
delivered.
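
Just to illustrate (untested sketch; the exact policy when the limit is
exceeded, warn vs. disable guest PT, is still open):

	/* In hardware_setup(), once PT is known to be usable for guests: */
	if (intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) > RTIT_ADDR_RANGE) {
		pr_warn_once("Intel PT: more than %d address ranges reported, "
			     "capping guest PT at %d\n",
			     RTIT_ADDR_RANGE, RTIT_ADDR_RANGE);
		/* or: pt_mode = PT_MODE_SYSTEM; to not expose PT to guests */
	}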

>>> One option would be to bump that to the theoretical max of 15,
>>> which doesn't seem too horrible, especially if pt_desc as a whole
>>> is allocated on-demand, which it probably should be since it isn't
>>> exactly tiny (nor ubiquitous)
>>>
>>> A different option would be to let userspace define whatever it
>>> wants for guest CPUID, and instead cap nr_addr_ranges at
>>> min(host.cpuid, guest.cpuid, RTIT_ADDR_RANGE).
> 
> This is the safest option.

My concern was that changing userspace's input silently is not good. If 
you prefer this, we certainly need to extend userspace to query what 
value is finally accepted and set by KVM.

> Paolo
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] KVM: VMX: Only context switch some PT MSRs when they exist
  2021-10-18 13:08   ` Paolo Bonzini
@ 2021-10-18 14:04     ` Xiaoyao Li
  2021-10-18 15:20       ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Xiaoyao Li @ 2021-10-18 14:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 10/18/2021 9:08 PM, Paolo Bonzini wrote:
> On 27/08/21 09:02, Xiaoyao Li wrote:
>> The enumeration of Intel PT feature doesn't guarantee the existence of
>> MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK and
>> MSR_IA32_RTIT_CR3_MATCH. They need to be detected from CPUID 0x14 PT
>> leaves.
>>
>> Detect the existence of them in hardware_setup() and only context switch
>> them when they exist. Otherwise it will cause a #GP when accessing them.
> 
> If intel_pt_validate_hw_cap is not fast enough, it should be optimized.
> Something like this:

If I understand correctly, you mean we don't need to cache the existence 
with two variables, right?
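
i.e. something like the rough sketch below for pt_load_msr() (pt_save_msr()
would change the same way); just to confirm I get your point, not tested:

	static inline void pt_load_msr(struct pt_ctx *ctx, u32 addr_range)
	{
		u32 i;

		wrmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
		/* Query the (now cheap) capability helper directly, no cached bools. */
		if (intel_pt_validate_hw_cap(PT_CAP_topa_output) ||
		    intel_pt_validate_hw_cap(PT_CAP_single_range_output)) {
			wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
			wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
		}
		if (intel_pt_validate_hw_cap(PT_CAP_cr3_filtering))
			wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
		for (i = 0; i < addr_range; i++) {
			wrmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
			wrmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
		}
	}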

> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index 7f406c14715f..9c7167dcb719 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -41,13 +41,15 @@ static struct pt_pmu pt_pmu;
>    * permitted values for certain bit fields).
>    */
>   #define PT_CAP(_n, _l, _r, _m)                        \
> -    [PT_CAP_ ## _n] = { .name = __stringify(_n), .leaf = _l,    \
> -                .reg = _r, .mask = _m }
> +    [PT_CAP_ ## _n] = { .name = __stringify(_n),            \
> +                .index = _l * PT_CPUID_REGS_NUM + _r,    \
> +                .shift = __CONSTANT_FFS_32(_m),        \
> +                .mask = _m }
> 
>   static struct pt_cap_desc {
>       const char    *name;
> -    u32        leaf;
> -    u8        reg;
> +    u16        index;
> +    u8        shift;
>       u32        mask;
>   } pt_caps[] = {
>       PT_CAP(max_subleaf,        0, CPUID_EAX, 0xffffffff),
> @@ -71,10 +73,8 @@ static struct pt_cap_desc {
>   u32 intel_pt_validate_cap(u32 *caps, enum pt_capabilities capability)
>   {
>       struct pt_cap_desc *cd = &pt_caps[capability];
> -    u32 c = caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
> -    unsigned int shift = __ffs(cd->mask);
> 
> -    return (c & cd->mask) >> shift;
> +    return (caps[cd->index] & cd->mask) >> cd->shift;
>   }
>   EXPORT_SYMBOL_GPL(intel_pt_validate_cap);
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 5e62e2383b7f..b4ee28d91b77 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -211,6 +211,17 @@ static inline int get_count_order_long(unsigned long l)
>       return (int)fls_long(--l);
>   }
> 
> +#define __CONSTANT_FFS_2(w) \
> +    (((w) & 1) == 0)
> +#define __CONSTANT_FFS_4(w) \
> +    (((w) & 0x3) == 0 ? 2 + __CONSTANT_FFS_2((w) >> 2) : __CONSTANT_FFS_2((w)))
> +#define __CONSTANT_FFS_8(w) \
> +    (((w) & 0xf) == 0 ? 4 + __CONSTANT_FFS_4((w) >> 4) : __CONSTANT_FFS_4((w)))
> +#define __CONSTANT_FFS_16(w) \
> +    (((w) & 0xff) == 0 ? 8 + __CONSTANT_FFS_8((w) >> 8) : __CONSTANT_FFS_8((w)))
> +#define __CONSTANT_FFS_32(w) \
> +    (((w) & 0xffff) == 0 ? 16 + __CONSTANT_FFS_16((w) >> 16) : __CONSTANT_FFS_16((w)))
> +
>   /**
>    * __ffs64 - find first set bit in a 64 bit word
>    * @word: The 64 bit word
> 
> 
> Paolo
> 
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   arch/x86/kvm/vmx/vmx.c | 28 +++++++++++++++++++++-------
>>   1 file changed, 21 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 394ef4732838..6819fc470072 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -204,6 +204,9 @@ module_param(ple_window_max, uint, 0444);
>>   int __read_mostly pt_mode = PT_MODE_SYSTEM;
>>   module_param(pt_mode, int, S_IRUGO);
>> +static bool has_msr_rtit_cr3_match;
>> +static bool has_msr_rtit_output_x;
>> +
>>   static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
>>   static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_cond);
>>   static DEFINE_MUTEX(vmx_l1d_flush_mutex);
>> @@ -1035,9 +1038,12 @@ static inline void pt_load_msr(struct pt_ctx *ctx, u32 addr_range)
>>       u32 i;
>>       wrmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
>> -    wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
>> -    wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
>> -    wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
>> +    if (has_msr_rtit_output_x) {
>> +        wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
>> +        wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
>> +    }
>> +    if (has_msr_rtit_cr3_match)
>> +        wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
>>       for (i = 0; i < addr_range; i++) {
>>           wrmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
>>           wrmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
>> @@ -1049,9 +1055,12 @@ static inline void pt_save_msr(struct pt_ctx *ctx, u32 addr_range)
>>       u32 i;
>>       rdmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
>> -    rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
>> -    rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
>> -    rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
>> +    if (has_msr_rtit_output_x) {
>> +        rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
>> +        rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
>> +    }
>> +    if (has_msr_rtit_cr3_match)
>> +        rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
>>       for (i = 0; i < addr_range; i++) {
>>           rdmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
>>           rdmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
>> @@ -7883,8 +7892,13 @@ static __init int hardware_setup(void)
>>       if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
>>           return -EINVAL;
>> -    if (!enable_ept || !cpu_has_vmx_intel_pt())
>> +    if (!enable_ept || !cpu_has_vmx_intel_pt()) {
>>           pt_mode = PT_MODE_SYSTEM;
>> +    } else if (boot_cpu_has(X86_FEATURE_INTEL_PT)) {
>> +        has_msr_rtit_cr3_match = intel_pt_validate_hw_cap(PT_CAP_cr3_filtering);
>> +        has_msr_rtit_output_x = intel_pt_validate_hw_cap(PT_CAP_topa_output) ||
>> +                                intel_pt_validate_hw_cap(PT_CAP_single_range_output);
>> +    }
>>       setup_default_sgx_lepubkeyhash();
>>
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 7/7] KVM: VMX: Only context switch some PT MSRs when they exist
  2021-10-18 14:04     ` Xiaoyao Li
@ 2021-10-18 15:20       ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2021-10-18 15:20 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 18/10/21 16:04, Xiaoyao Li wrote:
>>
>> If intel_pt_validate_hw_cap is not fast enough, it should be optimized.
>> Something like this:
> 
> If I understand correctly, you mean we don't need to cache the existence 
> with two variables, right?
> 

Yes, exactly.

Paolo


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves
  2021-10-18 13:56         ` Xiaoyao Li
@ 2021-10-18 17:26           ` Sean Christopherson
  2021-10-19  1:46             ` Xiaoyao Li
  0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2021-10-18 17:26 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On Mon, Oct 18, 2021, Xiaoyao Li wrote:
> On 10/18/2021 8:47 PM, Paolo Bonzini wrote:
> > > > One option would be to bump that to the theoretical max of 15,
> > > > which doesn't seem too horrible, especially if pt_desc as a whole
> > > > is allocated on-demand, which it probably should be since it isn't
> > > > exactly tiny (nor ubiquitous)
> > > > 
> > > > A different option would be to let userspace define whatever it
> > > > wants for guest CPUID, and instead cap nr_addr_ranges at
> > > > min(host.cpuid, guest.cpuid, RTIT_ADDR_RANGE).
> > 
> > This is the safest option.
> 
> My concern was that changing userspace's input silently is not good.

Technically KVM isn't changing userspace's input, as KVM will still enumerate
CPUID as defined by userspace.  What KVM is doing is refusing to emulate/virtualize
a bogus vCPU model, e.g. by injecting #GP on an MSR access that userspace
incorrectly told the guest was legal.  That is standard operating procedure for
KVM, e.g. there are any number of instructions that will fault if userspace lies
about the vCPU model.
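
Concretely, for the RTIT_ADDRx MSRs that boils down to a simple bounds check
(sketch only; the helper name and where the capped count lives are made up):

	/*
	 * 'nr_ranges' stands for min(host CPUID, guest CPUID, RTIT_ADDR_RANGE),
	 * however it ends up being cached in pt_desc.  Each range has an A/B
	 * MSR pair.
	 */
	static bool pt_addr_msr_in_range(u32 msr, u32 nr_ranges)
	{
		u32 index = msr - MSR_IA32_RTIT_ADDR0_A;

		return index < 2 * nr_ranges;
	}

vmx_get_msr()/vmx_set_msr() returning 1 when this check fails is what turns
into the #GP for the guest.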

> prefer this, we certainly need to extend userspace to query what value
> is finally accepted and set by KVM.

That would be __do_cpuid_func()'s responsibility to cap leaf 0x14 output with
RTIT_ADDR_RANGE.  I.e. enumerate the supported ranges in KVM_GET_SUPPORTED_CPUID,
after that it's userspace's responsibility to not mess up.
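
Roughly like this (a sketch; the helper name and how it is wired into
__do_cpuid_func() are made up, and RTIT_ADDR_RANGE here is the 4-entry limit
from vmx.h):

	/*
	 * Cap the advertised number of address ranges, CPUID.(EAX=14H,
	 * ECX=1):EAX[2:0], at what KVM can actually context switch.
	 */
	static void cap_pt_num_address_ranges(struct kvm_cpuid_entry2 *entry)
	{
		if (entry->function != 0x14 || entry->index != 1)
			return;

		if ((entry->eax & 0x7) > RTIT_ADDR_RANGE)
			entry->eax = (entry->eax & ~0x7u) | RTIT_ADDR_RANGE;
	}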

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 6/7] KVM: VMX: Check Intel PT related CPUID leaves
  2021-10-18 17:26           ` Sean Christopherson
@ 2021-10-19  1:46             ` Xiaoyao Li
  0 siblings, 0 replies; 21+ messages in thread
From: Xiaoyao Li @ 2021-10-19  1:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 10/19/2021 1:26 AM, Sean Christopherson wrote:
> On Mon, Oct 18, 2021, Xiaoyao Li wrote:
>> On 10/18/2021 8:47 PM, Paolo Bonzini wrote:
>>>>> One option would be to bump that to the theoretical max of 15,
>>>>> which doesn't seem too horrible, especially if pt_desc as a whole
>>>>> is allocated on-demand, which it probably should be since it isn't
>>>>> exactly tiny (nor ubiquitous)
>>>>>
>>>>> A different option would be to let userspace define whatever it
>>>>> wants for guest CPUID, and instead cap nr_addr_ranges at
>>>>> min(host.cpuid, guest.cpuid, RTIT_ADDR_RANGE).
>>>
>>> This is the safest option.

I think I misunderstood it, sigh...

It's not architecturally consistent that the guest sees a certain number of 
address ranges in its CPUID but may get a #GP when it accesses a higher index.

OK, you mean it's userspace's fault and KVM shouldn't get blamed for it. 
It seems reasonable to me now.

>> My concern was that changing userspace's input silently is not good.
> 
> Technically KVM isn't changing userspace's input, as KVM will still enumerate
> CPUID as defined by userspace.  What KVM is doing is refusing to emulate/virtualize
> a bogus vCPU model, e.g. by injecting #GP on an MSR access that userspace
> incorrectly told the guest was legal.  That is standard operating procedure for
> KVM, e.g. there are any number of instructions that will fault if userspace lies
> about the vCPU model.
> 
>> prefer this, we certainly need to extend userspace to query what value
>> is finally accepted and set by KVM.
> 
> That would be __do_cpuid_func()'s responsibility to cap leaf 0x14 output with
> RTIT_ADDR_RANGE.  I.e. enumerate the supported ranges in KVM_GET_SUPPORTED_CPUID,
> after that it's userspace's responsibility to not mess up.
> 




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes
  2021-08-27  7:02 [PATCH v2 0/7] KVM: VMX: PT (processor trace) optimization cleanup and fixes Xiaoyao Li
                   ` (6 preceding siblings ...)
  2021-08-27  7:02 ` [PATCH v2 7/7] KVM: VMX: Only context switch some PT MSRs when they exist Xiaoyao Li
@ 2021-10-19 16:52 ` Paolo Bonzini
  7 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2021-10-19 16:52 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

On 27/08/21 09:02, Xiaoyao Li wrote:
> Patch 1-3 are optimization and cleanup.
> 
> Patch 4-7 are fixes for PT. Patch 4 and 5 fix the virtulazation of PT to
> provide architectual consistent behavior for guest. Patch 6 fix the case
> that malicious userspace can exploit PT to cause vm-entry failure or #GP
> in KVM. Patch 7 fix the potential MSR access #GP if some PT MSRs not
> available on hardware.
> 
> Patch 3 and patch 7 are added in v2.

Queued patches 1-4, thanks.

Paolo


^ permalink raw reply	[flat|nested] 21+ messages in thread

