linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit
@ 2019-05-16  8:25 Luwei Kang
  2019-05-16  8:25 ` [PATCH v1 1/6] x86/fpu: Introduce new fpu state for Intel processor trace Luwei Kang
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Luwei Kang @ 2019-05-16  8:25 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: tglx, mingo, bp, hpa, x86, pbonzini, rkrcmar, Luwei Kang

This patch set reduces the overhead of switching the Intel PT
configuration context on VM-Entry/Exit by using the XSAVES/XRSTORS
instructions.

I measured the cycle count of the context switch for both the manual
and the XSAVES/XRSTORS approach using rdtsc; the data is below:

Manual save(rdmsr):     ~334  cycles
Manual restore(wrmsr):  ~1668 cycles

XSAVES instruction:     ~124  cycles
XRSTORS instruction:    ~378  cycles

Manual: Switch the configuration with the rdmsr and wrmsr instructions;
        8 registers need to be saved or restored. They are
        IA32_RTIT_OUTPUT_BASE, *_OUTPUT_MASK_PTRS,
        *_STATUS, *_CR3_MATCH, *_ADDR0_A, *_ADDR0_B,
        *_ADDR1_A, *_ADDR1_B.
XSAVES/XRSTORS: Switch the configuration context with the XSAVES/XRSTORS
        instructions. This patch set allocates separate
        "struct fpu" structures for the host and guest PT state.
        Only a small portion of each structure is used, because
        we only save/restore the PT state (not AVX, AVX-512, MPX,
        PKRU and so on).
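
Below is a minimal user-space sketch of this kind of rdtsc cycle
measurement, purely for illustration: the real save/restore paths run
in the kernel at VM-Entry/Exit, so the timed operation is a stub here,
and the names measure_cycles() and pt_context_switch_stub() are made up
for the example rather than taken from this series.

#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>		/* __rdtsc(), _mm_lfence() */

/*
 * Stand-in for the operation being timed, e.g. the rdmsr/wrmsr loop or
 * the XSAVES/XRSTORS pair (kernel-only in the real measurement).
 */
static void pt_context_switch_stub(void)
{
}

static uint64_t measure_cycles(void (*op)(void))
{
	uint64_t start, end;

	_mm_lfence();		/* serialize before reading the TSC */
	start = __rdtsc();
	op();
	_mm_lfence();		/* make sure op() is done before the second read */
	end = __rdtsc();

	return end - start;
}

int main(void)
{
	printf("~%llu cycles\n",
	       (unsigned long long)measure_cycles(pt_context_switch_stub));
	return 0;
}

In practice such a figure is averaged over many iterations with the
first (cold) run discarded, so the numbers above should be read as
rough magnitudes rather than exact costs.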

This patch set also does some code cleanup, e.g. patch 2 reuses
the fpu pt_state structure to save the PT configuration context and
patch 3 dynamically allocates the Intel PT configuration state.

Luwei Kang (6):
  x86/fpu: Introduce new fpu state for Intel processor trace
  KVM: VMX: Reuse the pt_state structure for PT context
  KVM: VMX: Dymamic allocate Intel PT configuration state
  KVM: VMX: Allocate XSAVE area for Intel PT configuration
  KVM: VMX: Intel PT configration context switch using XSAVES/XRSTORS
  KVM: VMX: Get PT state from xsave area to variables

 arch/x86/include/asm/fpu/types.h |  13 ++
 arch/x86/kvm/vmx/nested.c        |   2 +-
 arch/x86/kvm/vmx/vmx.c           | 338 ++++++++++++++++++++++++++-------------
 arch/x86/kvm/vmx/vmx.h           |  21 +--
 4 files changed, 243 insertions(+), 131 deletions(-)

-- 
1.8.3.1



* [PATCH v1 1/6] x86/fpu: Introduce new fpu state for Intel processor trace
  2019-05-16  8:25 [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit Luwei Kang
@ 2019-05-16  8:25 ` Luwei Kang
  2019-05-16  8:25 ` [PATCH v1 2/6] KVM: VMX: Reuse the pt_state structure for PT context Luwei Kang
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Luwei Kang @ 2019-05-16  8:25 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: tglx, mingo, bp, hpa, x86, pbonzini, rkrcmar, Luwei Kang

Introduce a new fpu state structure, pt_state, to hold the Intel
processor trace configuration. The upcoming use of XSAVES/XRSTORS
to switch the Intel PT configuration on VM-Entry/Exit will use
this structure.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
---
 arch/x86/include/asm/fpu/types.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 2e32e17..8cbb42e 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -221,6 +221,19 @@ struct avx_512_hi16_state {
 } __packed;
 
 /*
+ * State component 8 holds the 64-bit configuration registers
+ * of Intel Processor Trace.
+ */
+struct pt_state {
+	u64 rtit_ctl;
+	u64 rtit_output_base;
+	u64 rtit_output_mask;
+	u64 rtit_status;
+	u64 rtit_cr3_match;
+	u64 rtit_addrx_ab[0];
+} __packed;
+
+/*
  * State component 9: 32-bit PKRU register.  The state is
  * 8 bytes long but only 4 bytes is used currently.
  */
-- 
1.8.3.1



* [PATCH v1 2/6] KVM: VMX: Reuse the pt_state structure for PT context
  2019-05-16  8:25 [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit Luwei Kang
  2019-05-16  8:25 ` [PATCH v1 1/6] x86/fpu: Introduce new fpu state for Intel processor trace Luwei Kang
@ 2019-05-16  8:25 ` Luwei Kang
  2019-05-16  8:25 ` [PATCH v1 3/6] KVM: VMX: Dymamic allocate Intel PT configuration state Luwei Kang
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Luwei Kang @ 2019-05-16  8:25 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: tglx, mingo, bp, hpa, x86, pbonzini, rkrcmar, Luwei Kang

Remove the previous pt_ctx structure and use pt_state to save
the PT configuration, because they hold the same things.
Add a *_ctx suffix to distinguish these fields from the upcoming
host and guest fpu pointers for the PT state.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
---
 arch/x86/kvm/vmx/nested.c |  2 +-
 arch/x86/kvm/vmx/vmx.c    | 96 +++++++++++++++++++++--------------------------
 arch/x86/kvm/vmx/vmx.h    | 16 +-------
 3 files changed, 46 insertions(+), 68 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f4b1ae4..e8d5c61 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4201,7 +4201,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
 	vmx->nested.vmxon = true;
 
 	if (pt_mode == PT_MODE_HOST_GUEST) {
-		vmx->pt_desc.guest.ctl = 0;
+		vmx->pt_desc.guest_ctx.rtit_ctl = 0;
 		pt_update_intercept_for_msr(vmx);
 	}
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0db7ded..4234e40e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -976,32 +976,28 @@ static unsigned long segment_base(u16 selector)
 }
 #endif
 
-static inline void pt_load_msr(struct pt_ctx *ctx, u32 addr_range)
+static inline void pt_load_msr(struct pt_state *ctx, u32 addr_range)
 {
 	u32 i;
 
-	wrmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
-	wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
-	wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
-	wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
-	for (i = 0; i < addr_range; i++) {
-		wrmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
-		wrmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
-	}
+	wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->rtit_output_base);
+	wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->rtit_output_mask);
+	wrmsrl(MSR_IA32_RTIT_STATUS, ctx->rtit_status);
+	wrmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->rtit_cr3_match);
+	for (i = 0; i < addr_range * 2; i++)
+		wrmsrl(MSR_IA32_RTIT_ADDR0_A + i, ctx->rtit_addrx_ab[i]);
 }
 
-static inline void pt_save_msr(struct pt_ctx *ctx, u32 addr_range)
+static inline void pt_save_msr(struct pt_state *ctx, u32 addr_range)
 {
 	u32 i;
 
-	rdmsrl(MSR_IA32_RTIT_STATUS, ctx->status);
-	rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->output_base);
-	rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->output_mask);
-	rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->cr3_match);
-	for (i = 0; i < addr_range; i++) {
-		rdmsrl(MSR_IA32_RTIT_ADDR0_A + i * 2, ctx->addr_a[i]);
-		rdmsrl(MSR_IA32_RTIT_ADDR0_B + i * 2, ctx->addr_b[i]);
-	}
+	rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, ctx->rtit_output_base);
+	rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, ctx->rtit_output_mask);
+	rdmsrl(MSR_IA32_RTIT_STATUS, ctx->rtit_status);
+	rdmsrl(MSR_IA32_RTIT_CR3_MATCH, ctx->rtit_cr3_match);
+	for (i = 0; i < addr_range; i++)
+		rdmsrl(MSR_IA32_RTIT_ADDR0_A + i, ctx->rtit_addrx_ab[i]);
 }
 
 static void pt_guest_enter(struct vcpu_vmx *vmx)
@@ -1013,11 +1009,11 @@ static void pt_guest_enter(struct vcpu_vmx *vmx)
 	 * GUEST_IA32_RTIT_CTL is already set in the VMCS.
 	 * Save host state before VM entry.
 	 */
-	rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
-	if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
+	rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host_ctx.rtit_ctl);
+	if (vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) {
 		wrmsrl(MSR_IA32_RTIT_CTL, 0);
-		pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
-		pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
+		pt_save_msr(&vmx->pt_desc.host_ctx, vmx->pt_desc.addr_range);
+		pt_load_msr(&vmx->pt_desc.guest_ctx, vmx->pt_desc.addr_range);
 	}
 }
 
@@ -1026,13 +1022,13 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
 	if (pt_mode == PT_MODE_SYSTEM)
 		return;
 
-	if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) {
-		pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range);
-		pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range);
+	if (vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) {
+		pt_save_msr(&vmx->pt_desc.guest_ctx, vmx->pt_desc.addr_range);
+		pt_load_msr(&vmx->pt_desc.host_ctx, vmx->pt_desc.addr_range);
 	}
 
 	/* Reload host state (IA32_RTIT_CTL will be cleared on VM exit). */
-	wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
+	wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host_ctx.rtit_ctl);
 }
 
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
@@ -1402,8 +1398,8 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 	 * Any attempt to modify IA32_RTIT_CTL while TraceEn is set will
 	 * result in a #GP unless the same write also clears TraceEn.
 	 */
-	if ((vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) &&
-		((vmx->pt_desc.guest.ctl ^ data) & ~RTIT_CTL_TRACEEN))
+	if ((vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) &&
+		((vmx->pt_desc.guest_ctx.rtit_ctl ^ data) & ~RTIT_CTL_TRACEEN))
 		return 1;
 
 	/*
@@ -1725,19 +1721,19 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_RTIT_CTL:
 		if (pt_mode != PT_MODE_HOST_GUEST)
 			return 1;
-		msr_info->data = vmx->pt_desc.guest.ctl;
+		msr_info->data = vmx->pt_desc.guest_ctx.rtit_ctl;
 		break;
 	case MSR_IA32_RTIT_STATUS:
 		if (pt_mode != PT_MODE_HOST_GUEST)
 			return 1;
-		msr_info->data = vmx->pt_desc.guest.status;
+		msr_info->data = vmx->pt_desc.guest_ctx.rtit_status;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
 			!intel_pt_validate_cap(vmx->pt_desc.caps,
 						PT_CAP_cr3_filtering))
 			return 1;
-		msr_info->data = vmx->pt_desc.guest.cr3_match;
+		msr_info->data = vmx->pt_desc.guest_ctx.rtit_cr3_match;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
@@ -1746,7 +1742,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			 !intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_single_range_output)))
 			return 1;
-		msr_info->data = vmx->pt_desc.guest.output_base;
+		msr_info->data = vmx->pt_desc.guest_ctx.rtit_output_base;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_MASK:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
@@ -1755,7 +1751,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			 !intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_single_range_output)))
 			return 1;
-		msr_info->data = vmx->pt_desc.guest.output_mask;
+		msr_info->data = vmx->pt_desc.guest_ctx.rtit_output_mask;
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
@@ -1763,10 +1759,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			(index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_num_address_ranges)))
 			return 1;
-		if (index % 2)
-			msr_info->data = vmx->pt_desc.guest.addr_b[index / 2];
-		else
-			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
+		msr_info->data = vmx->pt_desc.guest_ctx.rtit_addrx_ab[index];
 		break;
 	case MSR_TSC_AUX:
 		if (!msr_info->host_initiated &&
@@ -1953,56 +1946,53 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			vmx->nested.vmxon)
 			return 1;
 		vmcs_write64(GUEST_IA32_RTIT_CTL, data);
-		vmx->pt_desc.guest.ctl = data;
+		vmx->pt_desc.guest_ctx.rtit_ctl = data;
 		pt_update_intercept_for_msr(vmx);
 		break;
 	case MSR_IA32_RTIT_STATUS:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
+			(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) ||
 			(data & MSR_IA32_RTIT_STATUS_MASK))
 			return 1;
-		vmx->pt_desc.guest.status = data;
+		vmx->pt_desc.guest_ctx.rtit_status = data;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
+			(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) ||
 			!intel_pt_validate_cap(vmx->pt_desc.caps,
 						PT_CAP_cr3_filtering))
 			return 1;
-		vmx->pt_desc.guest.cr3_match = data;
+		vmx->pt_desc.guest_ctx.rtit_cr3_match = data;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
+			(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) ||
 			(!intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_topa_output) &&
 			 !intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_single_range_output)) ||
 			(data & MSR_IA32_RTIT_OUTPUT_BASE_MASK))
 			return 1;
-		vmx->pt_desc.guest.output_base = data;
+		vmx->pt_desc.guest_ctx.rtit_output_base = data;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_MASK:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
+			(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) ||
 			(!intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_topa_output) &&
 			 !intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_single_range_output)))
 			return 1;
-		vmx->pt_desc.guest.output_mask = data;
+		vmx->pt_desc.guest_ctx.rtit_output_mask = data;
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) ||
+			(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) ||
 			(index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_num_address_ranges)))
 			return 1;
-		if (index % 2)
-			vmx->pt_desc.guest.addr_b[index / 2] = data;
-		else
-			vmx->pt_desc.guest.addr_a[index / 2] = data;
+		vmx->pt_desc.guest_ctx.rtit_addrx_ab[index] = data;
 		break;
 	case MSR_TSC_AUX:
 		if (!msr_info->host_initiated &&
@@ -3591,7 +3581,7 @@ void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
 void pt_update_intercept_for_msr(struct vcpu_vmx *vmx)
 {
 	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
-	bool flag = !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN);
+	bool flag = !(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN);
 	u32 i;
 
 	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_STATUS,
@@ -4105,7 +4095,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 	if (pt_mode == PT_MODE_HOST_GUEST) {
 		memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc));
 		/* Bit[6~0] are forced to 1, writes are ignored. */
-		vmx->pt_desc.guest.output_mask = 0x7F;
+		vmx->pt_desc.guest_ctx.rtit_output_mask = 0x7F;
 		vmcs_write64(GUEST_IA32_RTIT_CTL, 0);
 	}
 }
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 63d37cc..11ad856 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -65,24 +65,12 @@ struct pi_desc {
 	u32 rsvd[6];
 } __aligned(64);
 
-#define RTIT_ADDR_RANGE		4
-
-struct pt_ctx {
-	u64 ctl;
-	u64 status;
-	u64 output_base;
-	u64 output_mask;
-	u64 cr3_match;
-	u64 addr_a[RTIT_ADDR_RANGE];
-	u64 addr_b[RTIT_ADDR_RANGE];
-};
-
 struct pt_desc {
 	u64 ctl_bitmask;
 	u32 addr_range;
 	u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
-	struct pt_ctx host;
-	struct pt_ctx guest;
+	struct pt_state host_ctx;
+	struct pt_state guest_ctx;
 };
 
 /*
-- 
1.8.3.1



* [PATCH v1 3/6] KVM: VMX: Dymamic allocate Intel PT configuration state
  2019-05-16  8:25 [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit Luwei Kang
  2019-05-16  8:25 ` [PATCH v1 1/6] x86/fpu: Introduce new fpu state for Intel processor trace Luwei Kang
  2019-05-16  8:25 ` [PATCH v1 2/6] KVM: VMX: Reuse the pt_state structure for PT context Luwei Kang
@ 2019-05-16  8:25 ` Luwei Kang
  2019-05-16  8:25 ` [PATCH v1 4/6] KVM: VMX: Allocate XSAVE area for Intel PT configuration Luwei Kang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Luwei Kang @ 2019-05-16  8:25 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: tglx, mingo, bp, hpa, x86, pbonzini, rkrcmar, Luwei Kang

This patch changes the Intel PT configuration state to a
structure pointer so that the state buffer only needs to be
allocated when Intel PT is working in HOST_GUEST mode.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
---
 arch/x86/kvm/vmx/nested.c |   2 +-
 arch/x86/kvm/vmx/vmx.c    | 202 +++++++++++++++++++++++++++-------------------
 arch/x86/kvm/vmx/vmx.h    |   6 +-
 3 files changed, 121 insertions(+), 89 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e8d5c61..349be88 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4201,7 +4201,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
 	vmx->nested.vmxon = true;
 
 	if (pt_mode == PT_MODE_HOST_GUEST) {
-		vmx->pt_desc.guest_ctx.rtit_ctl = 0;
+		vmx->pt_desc->guest_ctx->rtit_ctl = 0;
 		pt_update_intercept_for_msr(vmx);
 	}
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4234e40e..4595230 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1009,11 +1009,11 @@ static void pt_guest_enter(struct vcpu_vmx *vmx)
 	 * GUEST_IA32_RTIT_CTL is already set in the VMCS.
 	 * Save host state before VM entry.
 	 */
-	rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host_ctx.rtit_ctl);
-	if (vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) {
+	rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc->host_ctx->rtit_ctl);
+	if (vmx->pt_desc->guest_ctx->rtit_ctl & RTIT_CTL_TRACEEN) {
 		wrmsrl(MSR_IA32_RTIT_CTL, 0);
-		pt_save_msr(&vmx->pt_desc.host_ctx, vmx->pt_desc.addr_range);
-		pt_load_msr(&vmx->pt_desc.guest_ctx, vmx->pt_desc.addr_range);
+		pt_save_msr(vmx->pt_desc->host_ctx, vmx->pt_desc->addr_range);
+		pt_load_msr(vmx->pt_desc->guest_ctx, vmx->pt_desc->addr_range);
 	}
 }
 
@@ -1022,13 +1022,35 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
 	if (pt_mode == PT_MODE_SYSTEM)
 		return;
 
-	if (vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) {
-		pt_save_msr(&vmx->pt_desc.guest_ctx, vmx->pt_desc.addr_range);
-		pt_load_msr(&vmx->pt_desc.host_ctx, vmx->pt_desc.addr_range);
+	if (vmx->pt_desc->guest_ctx->rtit_ctl & RTIT_CTL_TRACEEN) {
+		pt_save_msr(vmx->pt_desc->guest_ctx, vmx->pt_desc->addr_range);
+		pt_load_msr(vmx->pt_desc->host_ctx, vmx->pt_desc->addr_range);
 	}
 
 	/* Reload host state (IA32_RTIT_CTL will be cleared on VM exit). */
-	wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host_ctx.rtit_ctl);
+	wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc->host_ctx->rtit_ctl);
+}
+
+static int pt_init(struct vcpu_vmx *vmx)
+{
+	u32 pt_state_sz = sizeof(struct pt_state) + sizeof(u64) *
+		intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2;
+
+	vmx->pt_desc = kzalloc(sizeof(struct pt_desc) + pt_state_sz * 2,
+		GFP_KERNEL_ACCOUNT);
+	if (!vmx->pt_desc)
+		return -ENOMEM;
+
+	vmx->pt_desc->host_ctx = (struct pt_state *)(vmx->pt_desc + 1);
+	vmx->pt_desc->guest_ctx = (void *)vmx->pt_desc->host_ctx + pt_state_sz;
+
+	return 0;
+}
+
+static void pt_uninit(struct vcpu_vmx *vmx)
+{
+	if (pt_mode == PT_MODE_HOST_GUEST)
+		kfree(vmx->pt_desc);
 }
 
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
@@ -1391,15 +1413,16 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 	 * Any MSR write that attempts to change bits marked reserved will
 	 * case a #GP fault.
 	 */
-	if (data & vmx->pt_desc.ctl_bitmask)
+	if (data & vmx->pt_desc->ctl_bitmask)
 		return 1;
 
 	/*
 	 * Any attempt to modify IA32_RTIT_CTL while TraceEn is set will
 	 * result in a #GP unless the same write also clears TraceEn.
 	 */
-	if ((vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) &&
-		((vmx->pt_desc.guest_ctx.rtit_ctl ^ data) & ~RTIT_CTL_TRACEEN))
+	if ((vmx->pt_desc->guest_ctx->rtit_ctl & RTIT_CTL_TRACEEN) &&
+		((vmx->pt_desc->guest_ctx->rtit_ctl ^ data) &
+						~RTIT_CTL_TRACEEN))
 		return 1;
 
 	/*
@@ -1409,7 +1432,7 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 	 */
 	if ((data & RTIT_CTL_TRACEEN) && !(data & RTIT_CTL_TOPA) &&
 		!(data & RTIT_CTL_FABRIC_EN) &&
-		!intel_pt_validate_cap(vmx->pt_desc.caps,
+		!intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_single_range_output))
 		return 1;
 
@@ -1417,19 +1440,19 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 	 * MTCFreq, CycThresh and PSBFreq encodings check, any MSR write that
 	 * utilize encodings marked reserved will casue a #GP fault.
 	 */
-	value = intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc_periods);
-	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc) &&
+	value = intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_mtc_periods);
+	if (intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_mtc) &&
 			!test_bit((data & RTIT_CTL_MTC_RANGE) >>
 			RTIT_CTL_MTC_RANGE_OFFSET, &value))
 		return 1;
-	value = intel_pt_validate_cap(vmx->pt_desc.caps,
+	value = intel_pt_validate_cap(vmx->pt_desc->caps,
 						PT_CAP_cycle_thresholds);
-	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_cyc) &&
+	if (intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_psb_cyc) &&
 			!test_bit((data & RTIT_CTL_CYC_THRESH) >>
 			RTIT_CTL_CYC_THRESH_OFFSET, &value))
 		return 1;
-	value = intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_periods);
-	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_cyc) &&
+	value = intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_psb_periods);
+	if (intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_psb_cyc) &&
 			!test_bit((data & RTIT_CTL_PSB_FREQ) >>
 			RTIT_CTL_PSB_FREQ_OFFSET, &value))
 		return 1;
@@ -1439,16 +1462,16 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, u64 data)
 	 * cause a #GP fault.
 	 */
 	value = (data & RTIT_CTL_ADDR0) >> RTIT_CTL_ADDR0_OFFSET;
-	if ((value && (vmx->pt_desc.addr_range < 1)) || (value > 2))
+	if ((value && (vmx->pt_desc->addr_range < 1)) || (value > 2))
 		return 1;
 	value = (data & RTIT_CTL_ADDR1) >> RTIT_CTL_ADDR1_OFFSET;
-	if ((value && (vmx->pt_desc.addr_range < 2)) || (value > 2))
+	if ((value && (vmx->pt_desc->addr_range < 2)) || (value > 2))
 		return 1;
 	value = (data & RTIT_CTL_ADDR2) >> RTIT_CTL_ADDR2_OFFSET;
-	if ((value && (vmx->pt_desc.addr_range < 3)) || (value > 2))
+	if ((value && (vmx->pt_desc->addr_range < 3)) || (value > 2))
 		return 1;
 	value = (data & RTIT_CTL_ADDR3) >> RTIT_CTL_ADDR3_OFFSET;
-	if ((value && (vmx->pt_desc.addr_range < 4)) || (value > 2))
+	if ((value && (vmx->pt_desc->addr_range < 4)) || (value > 2))
 		return 1;
 
 	return 0;
@@ -1721,45 +1744,46 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_RTIT_CTL:
 		if (pt_mode != PT_MODE_HOST_GUEST)
 			return 1;
-		msr_info->data = vmx->pt_desc.guest_ctx.rtit_ctl;
+		msr_info->data = vmx->pt_desc->guest_ctx->rtit_ctl;
 		break;
 	case MSR_IA32_RTIT_STATUS:
 		if (pt_mode != PT_MODE_HOST_GUEST)
 			return 1;
-		msr_info->data = vmx->pt_desc.guest_ctx.rtit_status;
+		msr_info->data = vmx->pt_desc->guest_ctx->rtit_status;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			!intel_pt_validate_cap(vmx->pt_desc.caps,
+			!intel_pt_validate_cap(vmx->pt_desc->caps,
 						PT_CAP_cr3_filtering))
 			return 1;
-		msr_info->data = vmx->pt_desc.guest_ctx.rtit_cr3_match;
+		msr_info->data = vmx->pt_desc->guest_ctx->rtit_cr3_match;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(!intel_pt_validate_cap(vmx->pt_desc.caps,
+			(!intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_topa_output) &&
-			 !intel_pt_validate_cap(vmx->pt_desc.caps,
+			 !intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_single_range_output)))
 			return 1;
-		msr_info->data = vmx->pt_desc.guest_ctx.rtit_output_base;
+		msr_info->data = vmx->pt_desc->guest_ctx->rtit_output_base;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_MASK:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(!intel_pt_validate_cap(vmx->pt_desc.caps,
+			(!intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_topa_output) &&
-			 !intel_pt_validate_cap(vmx->pt_desc.caps,
+			 !intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_single_range_output)))
 			return 1;
-		msr_info->data = vmx->pt_desc.guest_ctx.rtit_output_mask;
+		msr_info->data =
+			vmx->pt_desc->guest_ctx->rtit_output_mask | 0x7f;
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps,
+			(index >= 2 * intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_num_address_ranges)))
 			return 1;
-		msr_info->data = vmx->pt_desc.guest_ctx.rtit_addrx_ab[index];
+		msr_info->data = vmx->pt_desc->guest_ctx->rtit_addrx_ab[index];
 		break;
 	case MSR_TSC_AUX:
 		if (!msr_info->host_initiated &&
@@ -1946,53 +1970,58 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			vmx->nested.vmxon)
 			return 1;
 		vmcs_write64(GUEST_IA32_RTIT_CTL, data);
-		vmx->pt_desc.guest_ctx.rtit_ctl = data;
+		vmx->pt_desc->guest_ctx->rtit_ctl = data;
 		pt_update_intercept_for_msr(vmx);
 		break;
 	case MSR_IA32_RTIT_STATUS:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) ||
+			(vmx->pt_desc->guest_ctx->rtit_ctl &
+					RTIT_CTL_TRACEEN) ||
 			(data & MSR_IA32_RTIT_STATUS_MASK))
 			return 1;
-		vmx->pt_desc.guest_ctx.rtit_status = data;
+		vmx->pt_desc->guest_ctx->rtit_status = data;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) ||
-			!intel_pt_validate_cap(vmx->pt_desc.caps,
+			(vmx->pt_desc->guest_ctx->rtit_ctl &
+						RTIT_CTL_TRACEEN) ||
+			!intel_pt_validate_cap(vmx->pt_desc->caps,
 						PT_CAP_cr3_filtering))
 			return 1;
-		vmx->pt_desc.guest_ctx.rtit_cr3_match = data;
+		vmx->pt_desc->guest_ctx->rtit_cr3_match = data;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) ||
-			(!intel_pt_validate_cap(vmx->pt_desc.caps,
+			(vmx->pt_desc->guest_ctx->rtit_ctl &
+						RTIT_CTL_TRACEEN) ||
+			(!intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_topa_output) &&
-			 !intel_pt_validate_cap(vmx->pt_desc.caps,
+			 !intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_single_range_output)) ||
 			(data & MSR_IA32_RTIT_OUTPUT_BASE_MASK))
 			return 1;
-		vmx->pt_desc.guest_ctx.rtit_output_base = data;
+		vmx->pt_desc->guest_ctx->rtit_output_base = data;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_MASK:
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) ||
-			(!intel_pt_validate_cap(vmx->pt_desc.caps,
+			(vmx->pt_desc->guest_ctx->rtit_ctl &
+						RTIT_CTL_TRACEEN) ||
+			(!intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_topa_output) &&
-			 !intel_pt_validate_cap(vmx->pt_desc.caps,
+			 !intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_single_range_output)))
 			return 1;
-		vmx->pt_desc.guest_ctx.rtit_output_mask = data;
+		vmx->pt_desc->guest_ctx->rtit_output_mask = data;
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
 		if ((pt_mode != PT_MODE_HOST_GUEST) ||
-			(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN) ||
-			(index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps,
+			(vmx->pt_desc->guest_ctx->rtit_ctl &
+						RTIT_CTL_TRACEEN) ||
+			(index >= 2 * intel_pt_validate_cap(vmx->pt_desc->caps,
 					PT_CAP_num_address_ranges)))
 			return 1;
-		vmx->pt_desc.guest_ctx.rtit_addrx_ab[index] = data;
+		vmx->pt_desc->guest_ctx->rtit_addrx_ab[index] = data;
 		break;
 	case MSR_TSC_AUX:
 		if (!msr_info->host_initiated &&
@@ -3581,7 +3610,7 @@ void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
 void pt_update_intercept_for_msr(struct vcpu_vmx *vmx)
 {
 	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
-	bool flag = !(vmx->pt_desc.guest_ctx.rtit_ctl & RTIT_CTL_TRACEEN);
+	bool flag = !(vmx->pt_desc->guest_ctx->rtit_ctl & RTIT_CTL_TRACEEN);
 	u32 i;
 
 	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_STATUS,
@@ -3592,12 +3621,9 @@ void pt_update_intercept_for_msr(struct vcpu_vmx *vmx)
 							MSR_TYPE_RW, flag);
 	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_CR3_MATCH,
 							MSR_TYPE_RW, flag);
-	for (i = 0; i < vmx->pt_desc.addr_range; i++) {
-		vmx_set_intercept_for_msr(msr_bitmap,
-			MSR_IA32_RTIT_ADDR0_A + i * 2, MSR_TYPE_RW, flag);
+	for (i = 0; i < vmx->pt_desc->addr_range * 2; i++)
 		vmx_set_intercept_for_msr(msr_bitmap,
-			MSR_IA32_RTIT_ADDR0_B + i * 2, MSR_TYPE_RW, flag);
-	}
+			MSR_IA32_RTIT_ADDR0_A + i, MSR_TYPE_RW, flag);
 }
 
 static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu)
@@ -4092,12 +4118,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
 	if (cpu_has_vmx_encls_vmexit())
 		vmcs_write64(ENCLS_EXITING_BITMAP, -1ull);
 
-	if (pt_mode == PT_MODE_HOST_GUEST) {
-		memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc));
-		/* Bit[6~0] are forced to 1, writes are ignored. */
-		vmx->pt_desc.guest_ctx.rtit_output_mask = 0x7F;
+	if (pt_mode == PT_MODE_HOST_GUEST)
 		vmcs_write64(GUEST_IA32_RTIT_CTL, 0);
-	}
 }
 
 static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -6544,6 +6566,8 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 
 	if (enable_pml)
 		vmx_destroy_pml_buffer(vmx);
+	if (pt_mode == PT_MODE_HOST_GUEST)
+		pt_uninit(vmx);
 	free_vpid(vmx->vpid);
 	nested_vmx_free_vcpu(vcpu);
 	free_loaded_vmcs(vmx->loaded_vmcs);
@@ -6592,12 +6616,18 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 			goto uninit_vcpu;
 	}
 
+	if (pt_mode == PT_MODE_HOST_GUEST) {
+		err = pt_init(vmx);
+		if (err)
+			goto free_pml;
+	}
+
 	vmx->guest_msrs = kmalloc(PAGE_SIZE, GFP_KERNEL_ACCOUNT);
 	BUILD_BUG_ON(ARRAY_SIZE(vmx_msr_index) * sizeof(vmx->guest_msrs[0])
 		     > PAGE_SIZE);
 
 	if (!vmx->guest_msrs)
-		goto free_pml;
+		goto free_pt;
 
 	err = alloc_loaded_vmcs(&vmx->vmcs01);
 	if (err < 0)
@@ -6659,6 +6689,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	free_loaded_vmcs(vmx->loaded_vmcs);
 free_msrs:
 	kfree(vmx->guest_msrs);
+free_pt:
+	pt_uninit(vmx);
 free_pml:
 	vmx_destroy_pml_buffer(vmx);
 uninit_vcpu:
@@ -6866,63 +6898,63 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		best = kvm_find_cpuid_entry(vcpu, 0x14, i);
 		if (!best)
 			return;
-		vmx->pt_desc.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM] = best->eax;
-		vmx->pt_desc.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM] = best->ebx;
-		vmx->pt_desc.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM] = best->ecx;
-		vmx->pt_desc.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM] = best->edx;
+		vmx->pt_desc->caps[CPUID_EAX + i*PT_CPUID_REGS_NUM] = best->eax;
+		vmx->pt_desc->caps[CPUID_EBX + i*PT_CPUID_REGS_NUM] = best->ebx;
+		vmx->pt_desc->caps[CPUID_ECX + i*PT_CPUID_REGS_NUM] = best->ecx;
+		vmx->pt_desc->caps[CPUID_EDX + i*PT_CPUID_REGS_NUM] = best->edx;
 	}
 
 	/* Get the number of configurable Address Ranges for filtering */
-	vmx->pt_desc.addr_range = intel_pt_validate_cap(vmx->pt_desc.caps,
+	vmx->pt_desc->addr_range = intel_pt_validate_cap(vmx->pt_desc->caps,
 						PT_CAP_num_address_ranges);
 
 	/* Initialize and clear the no dependency bits */
-	vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
+	vmx->pt_desc->ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS |
 			RTIT_CTL_USR | RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC);
 
 	/*
 	 * If CPUID.(EAX=14H,ECX=0):EBX[0]=1 CR3Filter can be set otherwise
 	 * will inject an #GP
 	 */
-	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_cr3_filtering))
-		vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_CR3EN;
+	if (intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_cr3_filtering))
+		vmx->pt_desc->ctl_bitmask &= ~RTIT_CTL_CR3EN;
 
 	/*
 	 * If CPUID.(EAX=14H,ECX=0):EBX[1]=1 CYCEn, CycThresh and
 	 * PSBFreq can be set
 	 */
-	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_psb_cyc))
-		vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_CYCLEACC |
+	if (intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_psb_cyc))
+		vmx->pt_desc->ctl_bitmask &= ~(RTIT_CTL_CYCLEACC |
 				RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ);
 
 	/*
 	 * If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn BranchEn and
 	 * MTCFreq can be set
 	 */
-	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc))
-		vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_MTC_EN |
+	if (intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_mtc))
+		vmx->pt_desc->ctl_bitmask &= ~(RTIT_CTL_MTC_EN |
 				RTIT_CTL_BRANCH_EN | RTIT_CTL_MTC_RANGE);
 
 	/* If CPUID.(EAX=14H,ECX=0):EBX[4]=1 FUPonPTW and PTWEn can be set */
-	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_ptwrite))
-		vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_FUP_ON_PTW |
+	if (intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_ptwrite))
+		vmx->pt_desc->ctl_bitmask &= ~(RTIT_CTL_FUP_ON_PTW |
 							RTIT_CTL_PTW_EN);
 
 	/* If CPUID.(EAX=14H,ECX=0):EBX[5]=1 PwrEvEn can be set */
-	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_power_event_trace))
-		vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_PWR_EVT_EN;
+	if (intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_power_event_trace))
+		vmx->pt_desc->ctl_bitmask &= ~RTIT_CTL_PWR_EVT_EN;
 
 	/* If CPUID.(EAX=14H,ECX=0):ECX[0]=1 ToPA can be set */
-	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_topa_output))
-		vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_TOPA;
+	if (intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_topa_output))
+		vmx->pt_desc->ctl_bitmask &= ~RTIT_CTL_TOPA;
 
 	/* If CPUID.(EAX=14H,ECX=0):ECX[3]=1 FabircEn can be set */
-	if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_output_subsys))
-		vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;
+	if (intel_pt_validate_cap(vmx->pt_desc->caps, PT_CAP_output_subsys))
+		vmx->pt_desc->ctl_bitmask &= ~RTIT_CTL_FABRIC_EN;
 
 	/* unmask address range configure area */
-	for (i = 0; i < vmx->pt_desc.addr_range; i++)
-		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
+	for (i = 0; i < vmx->pt_desc->addr_range; i++)
+		vmx->pt_desc->ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
 static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 11ad856..283f69d 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -69,8 +69,8 @@ struct pt_desc {
 	u64 ctl_bitmask;
 	u32 addr_range;
 	u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
-	struct pt_state host_ctx;
-	struct pt_state guest_ctx;
+	struct pt_state *host_ctx;
+	struct pt_state *guest_ctx;
 };
 
 /*
@@ -259,7 +259,7 @@ struct vcpu_vmx {
 	u64 msr_ia32_feature_control_valid_bits;
 	u64 ept_pointer;
 
-	struct pt_desc pt_desc;
+	struct pt_desc *pt_desc;
 };
 
 enum ept_pointers_status {
-- 
1.8.3.1



* [PATCH v1 4/6] KVM: VMX: Allocate XSAVE area for Intel PT configuration
  2019-05-16  8:25 [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit Luwei Kang
                   ` (2 preceding siblings ...)
  2019-05-16  8:25 ` [PATCH v1 3/6] KVM: VMX: Dymamic allocate Intel PT configuration state Luwei Kang
@ 2019-05-16  8:25 ` Luwei Kang
  2019-05-16  8:25 ` [PATCH v1 5/6] KVM: VMX: Intel PT configration context switch using XSAVES/XRSTORS Luwei Kang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Luwei Kang @ 2019-05-16  8:25 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: tglx, mingo, bp, hpa, x86, pbonzini, rkrcmar, Luwei Kang

Allocate XSAVE areas for the host and guest Intel PT
configuration when Intel PT is working in HOST_GUEST mode.
The Intel PT configuration state can then be saved by the
XSAVES instruction and restored by the XRSTORS instruction.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 25 ++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h |  3 +++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4595230..4691665 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1033,6 +1033,7 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
 
 static int pt_init(struct vcpu_vmx *vmx)
 {
+	unsigned int eax, ebx, ecx, edx;
 	u32 pt_state_sz = sizeof(struct pt_state) + sizeof(u64) *
 		intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2;
 
@@ -1044,13 +1045,35 @@ static int pt_init(struct vcpu_vmx *vmx)
 	vmx->pt_desc->host_ctx = (struct pt_state *)(vmx->pt_desc + 1);
 	vmx->pt_desc->guest_ctx = (void *)vmx->pt_desc->host_ctx + pt_state_sz;
 
+	cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
+	if (ecx & XFEATURE_MASK_PT) {
+		vmx->pt_desc->host_xs = kmem_cache_zalloc(x86_fpu_cache,
+							GFP_KERNEL_ACCOUNT);
+		vmx->pt_desc->guest_xs = kmem_cache_zalloc(x86_fpu_cache,
+							GFP_KERNEL_ACCOUNT);
+		if (!vmx->pt_desc->host_xs || !vmx->pt_desc->guest_xs) {
+			if (vmx->pt_desc->host_xs)
+				kmem_cache_free(x86_fpu_cache,
+						vmx->pt_desc->host_xs);
+			if (vmx->pt_desc->guest_xs)
+				kmem_cache_free(x86_fpu_cache,
+						vmx->pt_desc->guest_xs);
+		} else
+			vmx->pt_desc->pt_xsave = true;
+	}
+
 	return 0;
 }
 
 static void pt_uninit(struct vcpu_vmx *vmx)
 {
-	if (pt_mode == PT_MODE_HOST_GUEST)
+	if (pt_mode == PT_MODE_HOST_GUEST) {
+		if (vmx->pt_desc->pt_xsave) {
+			kmem_cache_free(x86_fpu_cache, vmx->pt_desc->host_xs);
+			kmem_cache_free(x86_fpu_cache, vmx->pt_desc->guest_xs);
+		}
 		kfree(vmx->pt_desc);
+	}
 }
 
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 283f69d..e103991 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -69,8 +69,11 @@ struct pt_desc {
 	u64 ctl_bitmask;
 	u32 addr_range;
 	u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
+	bool pt_xsave;
 	struct pt_state *host_ctx;
 	struct pt_state *guest_ctx;
+	struct fpu *host_xs;
+	struct fpu *guest_xs;
 };
 
 /*
-- 
1.8.3.1



* [PATCH v1 5/6] KVM: VMX: Intel PT configration context switch using XSAVES/XRSTORS
  2019-05-16  8:25 [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit Luwei Kang
                   ` (3 preceding siblings ...)
  2019-05-16  8:25 ` [PATCH v1 4/6] KVM: VMX: Allocate XSAVE area for Intel PT configuration Luwei Kang
@ 2019-05-16  8:25 ` Luwei Kang
  2019-05-16  8:25 ` [PATCH v1 6/6] KVM: VMX: Get PT state from xsave area to variables Luwei Kang
  2019-11-11 13:28 ` [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit Paolo Bonzini
  6 siblings, 0 replies; 8+ messages in thread
From: Luwei Kang @ 2019-05-16  8:25 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: tglx, mingo, bp, hpa, x86, pbonzini, rkrcmar, Luwei Kang

This patch adds support for using XSAVES/XRSTORS to do the
Intel processor trace context switch.

Because the native driver does not set XSS[bit 8] to enable the
PT state in the xsave area, this patch only sets this bit before
executing the XSAVES/XRSTORS instructions and restores the
original value afterwards.

The "initialized" flag needs to be cleared when PT changes from
enabled to disabled. The guest may modify the PT MSRs while PT is
disabled, and those values are only kept in the variables, so we
need to reload them into the hardware manually when PT is enabled
again.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 80 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 65 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4691665..d323e6b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1002,33 +1002,83 @@ static inline void pt_save_msr(struct pt_state *ctx, u32 addr_range)
 
 static void pt_guest_enter(struct vcpu_vmx *vmx)
 {
+	struct pt_desc *desc;
+	int err;
+
 	if (pt_mode == PT_MODE_SYSTEM)
 		return;
 
-	/*
-	 * GUEST_IA32_RTIT_CTL is already set in the VMCS.
-	 * Save host state before VM entry.
-	 */
-	rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc->host_ctx->rtit_ctl);
-	if (vmx->pt_desc->guest_ctx->rtit_ctl & RTIT_CTL_TRACEEN) {
-		wrmsrl(MSR_IA32_RTIT_CTL, 0);
-		pt_save_msr(vmx->pt_desc->host_ctx, vmx->pt_desc->addr_range);
-		pt_load_msr(vmx->pt_desc->guest_ctx, vmx->pt_desc->addr_range);
+	desc = vmx->pt_desc;
+
+	rdmsrl(MSR_IA32_RTIT_CTL, desc->host_ctx->rtit_ctl);
+
+	if (desc->guest_ctx->rtit_ctl & RTIT_CTL_TRACEEN) {
+		if (likely(desc->pt_xsave)) {
+			wrmsrl(MSR_IA32_XSS, host_xss | XFEATURE_MASK_PT);
+			/*
+			 * The XSAVES instruction clears TraceEn after
+			 * saving the value of RTIT_CTL and before saving any
+			 * other PT state.
+			 */
+			XSTATE_XSAVE(&desc->host_xs->state.xsave,
+						XFEATURE_MASK_PT, 0, err);
+			/*
+			 * Still need to load the guest PT state manually if
+			 * the PT state is not populated in the xsave area.
+			 */
+			if (desc->guest_xs->initialized)
+				XSTATE_XRESTORE(&desc->guest_xs->state.xsave,
+						XFEATURE_MASK_PT, 0);
+			else
+				pt_load_msr(desc->guest_ctx, desc->addr_range);
+
+			wrmsrl(MSR_IA32_XSS, host_xss);
+		} else {
+			if (desc->host_ctx->rtit_ctl & RTIT_CTL_TRACEEN)
+				wrmsrl(MSR_IA32_RTIT_CTL, 0);
+
+			pt_save_msr(desc->host_ctx, desc->addr_range);
+			pt_load_msr(desc->guest_ctx, desc->addr_range);
+		}
 	}
 }
 
 static void pt_guest_exit(struct vcpu_vmx *vmx)
 {
+	struct pt_desc *desc;
+	int err;
+
 	if (pt_mode == PT_MODE_SYSTEM)
 		return;
 
-	if (vmx->pt_desc->guest_ctx->rtit_ctl & RTIT_CTL_TRACEEN) {
-		pt_save_msr(vmx->pt_desc->guest_ctx, vmx->pt_desc->addr_range);
-		pt_load_msr(vmx->pt_desc->host_ctx, vmx->pt_desc->addr_range);
-	}
+	desc = vmx->pt_desc;
+
+	if (desc->guest_ctx->rtit_ctl & RTIT_CTL_TRACEEN) {
+		if (likely(desc->pt_xsave)) {
+			wrmsrl(MSR_IA32_XSS, host_xss | XFEATURE_MASK_PT);
+			/*
+			 * Save guest state. TraceEn is 0 before and after
+			 * XSAVES instruction because RTIT_CTL will be cleared
+			 * on VM-exit (VM Exit control bit25).
+			 */
+			XSTATE_XSAVE(&desc->guest_xs->state.xsave,
+						XFEATURE_MASK_PT, 0, err);
+			desc->guest_xs->initialized = 1;
+			/*
+			 * Restore host PT state; PT may be enabled after this
+			 * instruction if host PT was enabled before VM-entry.
+			 */
+			XSTATE_XRESTORE(&desc->host_xs->state.xsave,
+						XFEATURE_MASK_PT, 0);
+			wrmsrl(MSR_IA32_XSS, host_xss);
+		} else {
+			pt_save_msr(desc->guest_ctx, desc->addr_range);
+			pt_load_msr(desc->host_ctx, desc->addr_range);
 
-	/* Reload host state (IA32_RTIT_CTL will be cleared on VM exit). */
-	wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc->host_ctx->rtit_ctl);
+			wrmsrl(MSR_IA32_RTIT_CTL, desc->host_ctx->rtit_ctl);
+		}
+	} else if (desc->host_ctx->rtit_ctl & RTIT_CTL_TRACEEN)
+		wrmsrl(MSR_IA32_RTIT_CTL, desc->host_ctx->rtit_ctl);
 }
 
 static int pt_init(struct vcpu_vmx *vmx)
-- 
1.8.3.1



* [PATCH v1 6/6] KVM: VMX: Get PT state from xsave area to variables
  2019-05-16  8:25 [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit Luwei Kang
                   ` (4 preceding siblings ...)
  2019-05-16  8:25 ` [PATCH v1 5/6] KVM: VMX: Intel PT configration context switch using XSAVES/XRSTORS Luwei Kang
@ 2019-05-16  8:25 ` Luwei Kang
  2019-11-11 13:28 ` [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit Paolo Bonzini
  6 siblings, 0 replies; 8+ messages in thread
From: Luwei Kang @ 2019-05-16  8:25 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: tglx, mingo, bp, hpa, x86, pbonzini, rkrcmar, Luwei Kang

This patch copies the Intel PT state from the xsave area to the
variables when PT changes from enabled to disabled, because the
PT state is saved/restored to/from the xsave area by the
XSAVES/XRSTORS instructions while Intel PT is enabled. The KVM
guest may read these MSRs while PT is disabled, but the real
values are then held in the xsave area, not in the variables.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d323e6b..d3e2569 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1000,6 +1000,16 @@ static inline void pt_save_msr(struct pt_state *ctx, u32 addr_range)
 		rdmsrl(MSR_IA32_RTIT_ADDR0_A + i, ctx->rtit_addrx_ab[i]);
 }
 
+static void pt_state_get(struct pt_state *ctx, struct fpu *fpu, u32 addr_range)
+{
+	char *buff = fpu->state.xsave.extended_state_area;
+
+	/* Skip the rtit_ctl register. */
+	memcpy(&ctx->rtit_output_base, buff + sizeof(u64),
+			sizeof(struct pt_state) - sizeof(u64) +
+			sizeof(u64) * addr_range * 2);
+}
+
 static void pt_guest_enter(struct vcpu_vmx *vmx)
 {
 	struct pt_desc *desc;
@@ -1040,6 +1050,9 @@ static void pt_guest_enter(struct vcpu_vmx *vmx)
 			pt_save_msr(desc->host_ctx, desc->addr_range);
 			pt_load_msr(desc->guest_ctx, desc->addr_range);
 		}
+	} else if (desc->pt_xsave && desc->guest_xs->initialized) {
+		pt_state_get(desc->guest_ctx, desc->guest_xs, desc->addr_range);
+		desc->guest_xs->initialized = 0;
 	}
 }
 
-- 
1.8.3.1



* Re: [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit
  2019-05-16  8:25 [PATCH v1 0/6] KVM: VMX: Intel PT configuration switch using XSAVES/XRSTORS on VM-Entry/Exit Luwei Kang
                   ` (5 preceding siblings ...)
  2019-05-16  8:25 ` [PATCH v1 6/6] KVM: VMX: Get PT state from xsave area to variables Luwei Kang
@ 2019-11-11 13:28 ` Paolo Bonzini
  6 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2019-11-11 13:28 UTC (permalink / raw)
  To: Luwei Kang, linux-kernel, kvm; +Cc: tglx, mingo, bp, hpa, x86, rkrcmar

On 16/05/19 10:25, Luwei Kang wrote:
> This patch set reduces the overhead of switching the Intel PT
> configuration context on VM-Entry/Exit by using the XSAVES/XRSTORS
> instructions.
>
> I measured the cycle count of the context switch for both the manual
> and the XSAVES/XRSTORS approach using rdtsc; the data is below:
>
> Manual save(rdmsr):     ~334  cycles
> Manual restore(wrmsr):  ~1668 cycles
>
> XSAVES instruction:     ~124  cycles
> XRSTORS instruction:    ~378  cycles
>
> Manual: Switch the configuration with the rdmsr and wrmsr instructions;
>         8 registers need to be saved or restored. They are
>         IA32_RTIT_OUTPUT_BASE, *_OUTPUT_MASK_PTRS,
>         *_STATUS, *_CR3_MATCH, *_ADDR0_A, *_ADDR0_B,
>         *_ADDR1_A, *_ADDR1_B.
> XSAVES/XRSTORS: Switch the configuration context with the XSAVES/XRSTORS
>         instructions. This patch set allocates separate
>         "struct fpu" structures for the host and guest PT state.
>         Only a small portion of each structure is used, because
>         we only save/restore the PT state (not AVX, AVX-512, MPX,
>         PKRU and so on).
>
> This patch set also does some code cleanup, e.g. patch 2 reuses
> the fpu pt_state structure to save the PT configuration context and
> patch 3 dynamically allocates the Intel PT configuration state.
> 
> Luwei Kang (6):
>   x86/fpu: Introduce new fpu state for Intel processor trace
>   KVM: VMX: Reuse the pt_state structure for PT context
>   KVM: VMX: Dymamic allocate Intel PT configuration state
>   KVM: VMX: Allocate XSAVE area for Intel PT configuration
>   KVM: VMX: Intel PT configration context switch using XSAVES/XRSTORS
>   KVM: VMX: Get PT state from xsave area to variables
> 
>  arch/x86/include/asm/fpu/types.h |  13 ++
>  arch/x86/kvm/vmx/nested.c        |   2 +-
>  arch/x86/kvm/vmx/vmx.c           | 338 ++++++++++++++++++++++++++-------------
>  arch/x86/kvm/vmx/vmx.h           |  21 +--
>  4 files changed, 243 insertions(+), 131 deletions(-)
> 

Luwei, I found I had missed this series.  Can you check whether it needs
a rebase, since I don't have hardware that supports it?

Paolo

