linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] INVPCID support for the AMD guests
@ 2020-06-11 21:48 Babu Moger
  2020-06-11 21:48 ` [PATCH 1/3] KVM: X86: Move handling of INVPCID types to x86 Babu Moger
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Babu Moger @ 2020-06-11 21:48 UTC (permalink / raw)
  To: wanpengli, joro, x86, sean.j.christopherson, mingo, bp, hpa,
	pbonzini, vkuznets, tglx, jmattson
  Cc: linux-kernel, kvm

The following series adds support for PCID/INVPCID on AMD guests.

For guests with nested page table (NPT) support, the INVPCID
instruction works the same as it does natively, so KVM does not need
any special handling in this case.

KVM interception is required in the following cases.
1. The guest's CPUID does not advertise the feature even though the
   underlying hardware supports it. In this case the hypervisor needs
   to inject #UD.

2. The guest is running with shadow page tables enabled. In this case
   the hypervisor needs to handle the TLB flush based on the INVPCID
   type.
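The two interception conditions above can be summarized as a single
predicate. This is an illustrative standalone sketch, not KVM code,
and the function name is made up:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch (not KVM code; the name is hypothetical): decide
 * whether the hypervisor must intercept INVPCID for an SVM guest.
 */
static bool need_invpcid_intercept(bool host_has_invpcid,
                                   bool guest_cpuid_has_invpcid,
                                   bool npt_enabled)
{
	if (!host_has_invpcid)
		return false;            /* nothing to intercept */
	if (!guest_cpuid_has_invpcid)
		return true;             /* case 1: intercept to inject #UD */
	return !npt_enabled;             /* case 2: shadow-paging TLB flush */
}
```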

AMD documentation for the INVPCID feature is available in "AMD64
Architecture Programmer’s Manual Volume 2: System Programming,
Pub. 24593 Rev. 3.34 (or later)".

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---

Babu Moger (3):
      KVM: X86: Move handling of INVPCID types to x86
      KVM:SVM: Add extended intercept support
      KVM:SVM: Enable INVPCID feature on AMD


 arch/x86/include/asm/svm.h      |    7 +++
 arch/x86/include/uapi/asm/svm.h |    2 +
 arch/x86/kvm/svm/nested.c       |    6 ++-
 arch/x86/kvm/svm/svm.c          |   43 +++++++++++++++++++
 arch/x86/kvm/svm/svm.h          |   18 ++++++++
 arch/x86/kvm/trace.h            |   12 ++++-
 arch/x86/kvm/vmx/vmx.c          |   78 ----------------------------------
 arch/x86/kvm/x86.c              |   89 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h              |    2 +
 9 files changed, 174 insertions(+), 83 deletions(-)

--

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] KVM: X86: Move handling of INVPCID types to x86
  2020-06-11 21:48 [PATCH 0/3] INVPCID support for the AMD guests Babu Moger
@ 2020-06-11 21:48 ` Babu Moger
  2020-06-12 18:02   ` Jim Mattson
  2020-06-11 21:48 ` [PATCH 2/3] KVM:SVM: Add extended intercept support Babu Moger
  2020-06-11 21:48 ` [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD Babu Moger
  2 siblings, 1 reply; 14+ messages in thread
From: Babu Moger @ 2020-06-11 21:48 UTC (permalink / raw)
  To: wanpengli, joro, x86, sean.j.christopherson, mingo, bp, hpa,
	pbonzini, vkuznets, tglx, jmattson
  Cc: linux-kernel, kvm

INVPCID instruction handling is mostly the same across VMX and SVM,
so move the code to the common x86.c.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kvm/vmx/vmx.c |   78 +-----------------------------------------
 arch/x86/kvm/x86.c     |   89 ++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h     |    2 +
 3 files changed, 92 insertions(+), 77 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 170cc76a581f..d9c35f337da6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5477,29 +5477,15 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 {
 	u32 vmx_instruction_info;
 	unsigned long type;
-	bool pcid_enabled;
 	gva_t gva;
-	struct x86_exception e;
-	unsigned i;
-	unsigned long roots_to_free = 0;
 	struct {
 		u64 pcid;
 		u64 gla;
 	} operand;
 
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
-		kvm_queue_exception(vcpu, UD_VECTOR);
-		return 1;
-	}
-
 	vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
 	type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
 
-	if (type > 3) {
-		kvm_inject_gp(vcpu, 0);
-		return 1;
-	}
-
 	/* According to the Intel instruction reference, the memory operand
 	 * is read even if it isn't needed (e.g., for type==all)
 	 */
@@ -5508,69 +5494,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 				sizeof(operand), &gva))
 		return 1;
 
-	if (kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e)) {
-		kvm_inject_emulated_page_fault(vcpu, &e);
-		return 1;
-	}
-
-	if (operand.pcid >> 12 != 0) {
-		kvm_inject_gp(vcpu, 0);
-		return 1;
-	}
-
-	pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
-
-	switch (type) {
-	case INVPCID_TYPE_INDIV_ADDR:
-		if ((!pcid_enabled && (operand.pcid != 0)) ||
-		    is_noncanonical_address(operand.gla, vcpu)) {
-			kvm_inject_gp(vcpu, 0);
-			return 1;
-		}
-		kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
-		return kvm_skip_emulated_instruction(vcpu);
-
-	case INVPCID_TYPE_SINGLE_CTXT:
-		if (!pcid_enabled && (operand.pcid != 0)) {
-			kvm_inject_gp(vcpu, 0);
-			return 1;
-		}
-
-		if (kvm_get_active_pcid(vcpu) == operand.pcid) {
-			kvm_mmu_sync_roots(vcpu);
-			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-		}
-
-		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
-			if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
-			    == operand.pcid)
-				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
-
-		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
-		/*
-		 * If neither the current cr3 nor any of the prev_roots use the
-		 * given PCID, then nothing needs to be done here because a
-		 * resync will happen anyway before switching to any other CR3.
-		 */
-
-		return kvm_skip_emulated_instruction(vcpu);
-
-	case INVPCID_TYPE_ALL_NON_GLOBAL:
-		/*
-		 * Currently, KVM doesn't mark global entries in the shadow
-		 * page tables, so a non-global flush just degenerates to a
-		 * global flush. If needed, we could optimize this later by
-		 * keeping track of global entries in shadow page tables.
-		 */
-
-		/* fall-through */
-	case INVPCID_TYPE_ALL_INCL_GLOBAL:
-		kvm_mmu_unload(vcpu);
-		return kvm_skip_emulated_instruction(vcpu);
-
-	default:
-		BUG(); /* We have already checked above that type <= 3 */
-	}
+	return kvm_handle_invpcid_types(vcpu, gva, type);
 }
 
 static int handle_pml_full(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9e41b5135340..13373359608c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -72,6 +72,7 @@
 #include <asm/hypervisor.h>
 #include <asm/intel_pt.h>
 #include <asm/emulate_prefix.h>
+#include <asm/tlbflush.h>
 #include <clocksource/hyperv_timer.h>
 
 #define CREATE_TRACE_POINTS
@@ -10714,6 +10715,94 @@ u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_spec_ctrl_valid_bits);
 
+int kvm_handle_invpcid_types(struct kvm_vcpu *vcpu, gva_t gva,
+			     unsigned long type)
+{
+	unsigned long roots_to_free = 0;
+	struct x86_exception e;
+	bool pcid_enabled;
+	unsigned i;
+	struct {
+		u64 pcid;
+		u64 gla;
+	} operand;
+
+	if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (type > 3) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	if (kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e)) {
+		kvm_inject_emulated_page_fault(vcpu, &e);
+		return 1;
+	}
+
+	if (operand.pcid >> 12 != 0) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
+
+	switch (type) {
+	case INVPCID_TYPE_INDIV_ADDR:
+		if ((!pcid_enabled && (operand.pcid != 0)) ||
+		    is_noncanonical_address(operand.gla, vcpu)) {
+			kvm_inject_gp(vcpu, 0);
+			return 1;
+		}
+		kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
+		return kvm_skip_emulated_instruction(vcpu);
+
+	case INVPCID_TYPE_SINGLE_CTXT:
+		if (!pcid_enabled && (operand.pcid != 0)) {
+			kvm_inject_gp(vcpu, 0);
+			return 1;
+		}
+
+		if (kvm_get_active_pcid(vcpu) == operand.pcid) {
+			kvm_mmu_sync_roots(vcpu);
+			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+		}
+
+		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
+			if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
+			    == operand.pcid)
+				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+
+		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
+		/*
+		 * If neither the current cr3 nor any of the prev_roots use the
+		 * given PCID, then nothing needs to be done here because a
+		 * resync will happen anyway before switching to any other CR3.
+		 */
+
+		return kvm_skip_emulated_instruction(vcpu);
+
+	case INVPCID_TYPE_ALL_NON_GLOBAL:
+		/*
+		 * Currently, KVM doesn't mark global entries in the shadow
+		 * page tables, so a non-global flush just degenerates to a
+		 * global flush. If needed, we could optimize this later by
+		 * keeping track of global entries in shadow page tables.
+		 */
+
+		/* fall-through */
+	case INVPCID_TYPE_ALL_INCL_GLOBAL:
+		kvm_mmu_unload(vcpu);
+		return kvm_skip_emulated_instruction(vcpu);
+
+	default:
+		BUG(); /* We have already checked above that type <= 3 */
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_handle_invpcid_types);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6eb62e97e59f..8e23f2705344 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -365,5 +365,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
 u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu);
 bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu);
+int kvm_handle_invpcid_types(struct kvm_vcpu *vcpu, gva_t gva,
+			     unsigned long type);
 
 #endif

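As a reference for the common handler's inputs, the four architectural
INVPCID types and the 16-byte memory operand can be mocked standalone.
These are illustrative definitions mirroring the patch, not the kernel
headers:

```c
#include <assert.h>
#include <stdint.h>

/* The four architectural INVPCID types handled by the switch above. */
enum invpcid_type {
	INVPCID_TYPE_INDIV_ADDR      = 0,  /* invalidate one (PCID, GLA) pair */
	INVPCID_TYPE_SINGLE_CTXT     = 1,  /* invalidate one PCID context     */
	INVPCID_TYPE_ALL_INCL_GLOBAL = 2,  /* flush all, including globals    */
	INVPCID_TYPE_ALL_NON_GLOBAL  = 3,  /* flush all non-global entries    */
};

/* 16-byte descriptor read from guest memory at the gva operand. */
struct invpcid_desc {
	uint64_t pcid;  /* only bits 11:0 may be set, else #GP(0) */
	uint64_t gla;   /* linear address, used only by type 0 */
};
```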


* [PATCH 2/3] KVM:SVM: Add extended intercept support
  2020-06-11 21:48 [PATCH 0/3] INVPCID support for the AMD guests Babu Moger
  2020-06-11 21:48 ` [PATCH 1/3] KVM: X86: Move handling of INVPCID types to x86 Babu Moger
@ 2020-06-11 21:48 ` Babu Moger
  2020-06-11 21:48 ` [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD Babu Moger
  2 siblings, 0 replies; 14+ messages in thread
From: Babu Moger @ 2020-06-11 21:48 UTC (permalink / raw)
  To: wanpengli, joro, x86, sean.j.christopherson, mingo, bp, hpa,
	pbonzini, vkuznets, tglx, jmattson
  Cc: linux-kernel, kvm

New intercept bits have been added to the VMCB control area to
support interception of the INVPCID instruction.

The following bit is added to the VMCB layout control area
to control intercept of INVPCID:

Byte Offset     Bit(s)          Function
14h             2               intercept INVPCID

Add the interfaces to support this extended interception, and update
the tracing for the extended intercepts.
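The byte-offset table above can be sanity-checked with a standalone
packed mock of the head of the control area (illustrative only; not
the kernel struct):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed mock of the start of the VMCB control area (illustrative). */
struct __attribute__((__packed__)) vmcb_control_head {
	uint32_t intercept_cr;          /* 00h */
	uint32_t intercept_dr;          /* 04h */
	uint32_t intercept_exceptions;  /* 08h */
	uint64_t intercept;             /* 0Ch */
	uint32_t intercept_extended;    /* 14h: bit 2 = intercept INVPCID */
	uint8_t  reserved_1[36];        /* shrunk from 40 bytes by the patch */
};
```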

AMD documentation for the INVPCID feature is available in "AMD64
Architecture Programmer’s Manual Volume 2: System Programming,
Pub. 24593 Rev. 3.34 (or later)".

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/svm.h |    3 ++-
 arch/x86/kvm/svm/nested.c  |    6 +++++-
 arch/x86/kvm/svm/svm.c     |    1 +
 arch/x86/kvm/svm/svm.h     |   18 ++++++++++++++++++
 arch/x86/kvm/trace.h       |   12 ++++++++----
 5 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 8a1f5382a4ea..62649fba8908 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -61,7 +61,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 intercept_dr;
 	u32 intercept_exceptions;
 	u64 intercept;
-	u8 reserved_1[40];
+	u32 intercept_extended;
+	u8 reserved_1[36];
 	u16 pause_filter_thresh;
 	u16 pause_filter_count;
 	u64 iopm_base_pa;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 8a6db11dcb43..7f6d0f2533e2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -121,6 +121,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	c->intercept_dr = h->intercept_dr;
 	c->intercept_exceptions = h->intercept_exceptions;
 	c->intercept = h->intercept;
+	c->intercept_extended = h->intercept_extended;
 
 	if (g->int_ctl & V_INTR_MASKING_MASK) {
 		/* We only want the cr8 intercept bits of L1 */
@@ -142,6 +143,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	c->intercept_dr |= g->intercept_dr;
 	c->intercept_exceptions |= g->intercept_exceptions;
 	c->intercept |= g->intercept;
+	c->intercept_extended |= g->intercept_extended;
 }
 
 static void copy_vmcb_control_area(struct vmcb_control_area *dst,
@@ -151,6 +153,7 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
 	dst->intercept_dr         = from->intercept_dr;
 	dst->intercept_exceptions = from->intercept_exceptions;
 	dst->intercept            = from->intercept;
+	dst->intercept_extended   = from->intercept_extended;
 	dst->iopm_base_pa         = from->iopm_base_pa;
 	dst->msrpm_base_pa        = from->msrpm_base_pa;
 	dst->tsc_offset           = from->tsc_offset;
@@ -433,7 +436,8 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 	trace_kvm_nested_intercepts(nested_vmcb->control.intercept_cr & 0xffff,
 				    nested_vmcb->control.intercept_cr >> 16,
 				    nested_vmcb->control.intercept_exceptions,
-				    nested_vmcb->control.intercept);
+				    nested_vmcb->control.intercept,
+				    nested_vmcb->control.intercept_extended);
 
 	/* Clear internal status */
 	kvm_clear_exception_queue(&svm->vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9e333b91ff78..285e5e1ff518 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2801,6 +2801,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%04x\n", "dr_write:", control->intercept_dr >> 16);
 	pr_err("%-20s%08x\n", "exceptions:", control->intercept_exceptions);
 	pr_err("%-20s%016llx\n", "intercepts:", control->intercept);
+	pr_err("%-20s%08x\n", "intercepts (extended):", control->intercept_extended);
 	pr_err("%-20s%d\n", "pause filter count:", control->pause_filter_count);
 	pr_err("%-20s%d\n", "pause filter threshold:",
 	       control->pause_filter_thresh);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6ac4c00a5d82..935d08fac03d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -311,6 +311,24 @@ static inline void clr_intercept(struct vcpu_svm *svm, int bit)
 	recalc_intercepts(svm);
 }
 
+static inline void set_extended_intercept(struct vcpu_svm *svm, int bit)
+{
+	struct vmcb *vmcb = get_host_vmcb(svm);
+
+	vmcb->control.intercept_extended |= (1U << bit);
+
+	recalc_intercepts(svm);
+}
+
+static inline void clr_extended_intercept(struct vcpu_svm *svm, int bit)
+{
+	struct vmcb *vmcb = get_host_vmcb(svm);
+
+	vmcb->control.intercept_extended &= ~(1U << bit);
+
+	recalc_intercepts(svm);
+}
+
 static inline bool is_intercept(struct vcpu_svm *svm, int bit)
 {
 	return (svm->vmcb->control.intercept & (1ULL << bit)) != 0;
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b66432b015d2..5c841c42b33d 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -544,14 +544,16 @@ TRACE_EVENT(kvm_nested_vmrun,
 );
 
 TRACE_EVENT(kvm_nested_intercepts,
-	    TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions, __u64 intercept),
-	    TP_ARGS(cr_read, cr_write, exceptions, intercept),
+	    TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions, __u64 intercept,
+		     __u32 extended),
+	    TP_ARGS(cr_read, cr_write, exceptions, intercept, extended),
 
 	TP_STRUCT__entry(
 		__field(	__u16,		cr_read		)
 		__field(	__u16,		cr_write	)
 		__field(	__u32,		exceptions	)
 		__field(	__u64,		intercept	)
+		__field(	__u32,		extended	)
 	),
 
 	TP_fast_assign(
@@ -559,11 +561,13 @@ TRACE_EVENT(kvm_nested_intercepts,
 		__entry->cr_write	= cr_write;
 		__entry->exceptions	= exceptions;
 		__entry->intercept	= intercept;
+		__entry->extended	= extended;
 	),
 
-	TP_printk("cr_read: %04x cr_write: %04x excp: %08x intercept: %016llx",
+	TP_printk("cr_read: %04x cr_write: %04x excp: %08x intercept: %016llx "
+		  "intercept (extended): %08x",
 		__entry->cr_read, __entry->cr_write, __entry->exceptions,
-		__entry->intercept)
+		__entry->intercept, __entry->extended)
 );
 /*
  * Tracepoint for #VMEXIT while nested



* [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
  2020-06-11 21:48 [PATCH 0/3] INVPCID support for the AMD guests Babu Moger
  2020-06-11 21:48 ` [PATCH 1/3] KVM: X86: Move handling of INVPCID types to x86 Babu Moger
  2020-06-11 21:48 ` [PATCH 2/3] KVM:SVM: Add extended intercept support Babu Moger
@ 2020-06-11 21:48 ` Babu Moger
  2020-06-11 23:50   ` Jim Mattson
  2 siblings, 1 reply; 14+ messages in thread
From: Babu Moger @ 2020-06-11 21:48 UTC (permalink / raw)
  To: wanpengli, joro, x86, sean.j.christopherson, mingo, bp, hpa,
	pbonzini, vkuznets, tglx, jmattson
  Cc: linux-kernel, kvm

The following intercept is added for the INVPCID instruction:
Code    Name            Cause
A2h     VMEXIT_INVPCID  INVPCID instruction

The following bit is added to the VMCB layout control area
to control intercept of INVPCID:
Byte Offset     Bit(s)    Function
14h             2         intercept INVPCID

For guests with nested page table (NPT) support, the INVPCID
instruction works the same as it does natively, so KVM does not need
any special handling in this case.

Interceptions are required in the following cases.
1. The guest's CPUID does not advertise the feature even though the
underlying hardware supports it. In this case the hypervisor needs to
inject #UD.
2. The guest is running with shadow page tables enabled. In this case
the hypervisor needs to handle the TLB flush based on the INVPCID type.
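The operand checks that patch 1's common handler performs before
flushing (type range, reserved PCID bits, CR4.PCIDE, canonicality) can
be summarized in a standalone predicate. This is a simplified sketch
with a hypothetical name, not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified sketch of the #GP(0) checks on an INVPCID operand. */
static bool invpcid_operand_valid(unsigned long type, uint64_t pcid,
                                  bool pcid_enabled, bool gla_canonical)
{
	if (type > 3)
		return false;          /* undefined type */
	if (pcid >> 12)
		return false;          /* reserved PCID bits set */
	if (type <= 1 && !pcid_enabled && pcid != 0)
		return false;          /* PCID operand without CR4.PCIDE */
	if (type == 0 && !gla_canonical)
		return false;          /* non-canonical linear address */
	return true;
}
```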

AMD documentation for the INVPCID feature is available in "AMD64
Architecture Programmer’s Manual Volume 2: System Programming,
Pub. 24593 Rev. 3.34 (or later)".

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/svm.h      |    4 ++++
 arch/x86/include/uapi/asm/svm.h |    2 ++
 arch/x86/kvm/svm/svm.c          |   42 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 62649fba8908..6488094f67fa 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -55,6 +55,10 @@ enum {
 	INTERCEPT_RDPRU,
 };
 
+/* Extended Intercept bits */
+enum {
+	INTERCEPT_INVPCID = 2,
+};
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 intercept_cr;
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 2e8a30f06c74..522d42dfc28c 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -76,6 +76,7 @@
 #define SVM_EXIT_MWAIT_COND    0x08c
 #define SVM_EXIT_XSETBV        0x08d
 #define SVM_EXIT_RDPRU         0x08e
+#define SVM_EXIT_INVPCID       0x0a2
 #define SVM_EXIT_NPF           0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -171,6 +172,7 @@
 	{ SVM_EXIT_MONITOR,     "monitor" }, \
 	{ SVM_EXIT_MWAIT,       "mwait" }, \
 	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
+	{ SVM_EXIT_INVPCID,     "invpcid" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 285e5e1ff518..82d974338f68 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void)
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
 		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
+
+	/* Enable INVPCID if both PCID and INVPCID enabled */
+	if (boot_cpu_has(X86_FEATURE_PCID) &&
+	    boot_cpu_has(X86_FEATURE_INVPCID))
+		kvm_cpu_cap_set(X86_FEATURE_INVPCID);
 }
 
 static __init int svm_hardware_setup(void)
@@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm)
 		clr_intercept(svm, INTERCEPT_PAUSE);
 	}
 
+	/*
+	 * Intercept INVPCID instruction only if shadow page table is
+	 * enabled. Interception is not required with nested page table.
+	 */
+	if (boot_cpu_has(X86_FEATURE_INVPCID)) {
+		if (!npt_enabled)
+			set_extended_intercept(svm, INTERCEPT_INVPCID);
+		else
+			clr_extended_intercept(svm, INTERCEPT_INVPCID);
+	}
+
 	if (kvm_vcpu_apicv_active(&svm->vcpu))
 		avic_init_vmcb(svm);
 
@@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm *svm)
 	return nop_interception(svm);
 }
 
+static int invpcid_interception(struct vcpu_svm *svm)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	unsigned long type;
+	gva_t gva;
+
+	/*
+	 * For an INVPCID intercept:
+	 * EXITINFO1 provides the linear address of the memory operand.
+	 * EXITINFO2 provides the contents of the register operand.
+	 */
+	type = svm->vmcb->control.exit_info_2;
+	gva = svm->vmcb->control.exit_info_1;
+
+	return kvm_handle_invpcid_types(vcpu, gva, type);
+}
+
 static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
@@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_MWAIT]			= mwait_interception,
 	[SVM_EXIT_XSETBV]			= xsetbv_interception,
 	[SVM_EXIT_RDPRU]			= rdpru_interception,
+	[SVM_EXIT_INVPCID]                      = invpcid_interception,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
@@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 	svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
 			     guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS);
 
+	/*
+	 * Intercept the INVPCID instruction if the bare-metal hardware
+	 * supports it but the guest does not claim the feature.
+	 */
+	if (boot_cpu_has(X86_FEATURE_INVPCID) &&
+	    !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID))
+		set_extended_intercept(svm, INTERCEPT_INVPCID);
+
 	if (!kvm_vcpu_apicv_active(vcpu))
 		return;
 



* Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
  2020-06-11 21:48 ` [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD Babu Moger
@ 2020-06-11 23:50   ` Jim Mattson
  2020-06-12 19:35     ` Babu Moger
  0 siblings, 1 reply; 14+ messages in thread
From: Jim Mattson @ 2020-06-11 23:50 UTC (permalink / raw)
  To: Babu Moger
  Cc: Wanpeng Li, Joerg Roedel, the arch/x86 maintainers,
	Sean Christopherson, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Paolo Bonzini, Vitaly Kuznetsov,
	Thomas Gleixner, LKML, kvm list

On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote:
>
> The following intercept is added for INVPCID instruction:
> Code    Name            Cause
> A2h     VMEXIT_INVPCID  INVPCID instruction
>
> The following bit is added to the VMCB layout control area
> to control intercept of INVPCID:
> Byte Offset     Bit(s)    Function
> 14h             2         intercept INVPCID
>
> For the guests with nested page table (NPT) support, the INVPCID
> feature works as running it natively. KVM does not need to do any
> special handling in this case.
>
> Interceptions are required in the following cases.
> 1. The guest's CPUID does not advertise the feature even though the
> underlying hardware supports it. In this case the hypervisor needs to
> inject #UD.

Per the AMD documentation, attempts to use INVPCID at CPL>0 will
result in a #GP, regardless of the intercept bit. If the guest CPUID
doesn't enumerate the feature, shouldn't the instruction raise #UD
regardless of CPL? This seems to imply that we should intercept #GP
and decode the instruction to see if we should synthesize #UD instead.

> 2. The guest is running with shadow page tables enabled. In this case
> the hypervisor needs to handle the TLB flush based on the INVPCID
> type.
>
> AMD documentation for INVPCID feature is available at "AMD64
> Architecture Programmer’s Manual Volume 2: System Programming,
> Pub. 24593 Rev. 3.34(or later)"
>
> The documentation can be obtained at the links below:
> Link: https://www.amd.com/system/files/TechDocs/24593.pdf
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/include/asm/svm.h      |    4 ++++
>  arch/x86/include/uapi/asm/svm.h |    2 ++
>  arch/x86/kvm/svm/svm.c          |   42 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 48 insertions(+)
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 62649fba8908..6488094f67fa 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -55,6 +55,10 @@ enum {
>         INTERCEPT_RDPRU,
>  };
>
> +/* Extended Intercept bits */
> +enum {
> +       INTERCEPT_INVPCID = 2,
> +};
>
>  struct __attribute__ ((__packed__)) vmcb_control_area {
>         u32 intercept_cr;
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index 2e8a30f06c74..522d42dfc28c 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -76,6 +76,7 @@
>  #define SVM_EXIT_MWAIT_COND    0x08c
>  #define SVM_EXIT_XSETBV        0x08d
>  #define SVM_EXIT_RDPRU         0x08e
> +#define SVM_EXIT_INVPCID       0x0a2
>  #define SVM_EXIT_NPF           0x400
>  #define SVM_EXIT_AVIC_INCOMPLETE_IPI           0x401
>  #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS     0x402
> @@ -171,6 +172,7 @@
>         { SVM_EXIT_MONITOR,     "monitor" }, \
>         { SVM_EXIT_MWAIT,       "mwait" }, \
>         { SVM_EXIT_XSETBV,      "xsetbv" }, \
> +       { SVM_EXIT_INVPCID,     "invpcid" }, \
>         { SVM_EXIT_NPF,         "npf" }, \
>         { SVM_EXIT_AVIC_INCOMPLETE_IPI,         "avic_incomplete_ipi" }, \
>         { SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 285e5e1ff518..82d974338f68 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void)
>         if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
>             boot_cpu_has(X86_FEATURE_AMD_SSBD))
>                 kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
> +
> +       /* Enable INVPCID if both PCID and INVPCID enabled */
> +       if (boot_cpu_has(X86_FEATURE_PCID) &&
> +           boot_cpu_has(X86_FEATURE_INVPCID))
> +               kvm_cpu_cap_set(X86_FEATURE_INVPCID);
>  }
>
>  static __init int svm_hardware_setup(void)
> @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm)
>                 clr_intercept(svm, INTERCEPT_PAUSE);
>         }
>
> +       /*
> +        * Intercept INVPCID instruction only if shadow page table is
> +        * enabled. Interception is not required with nested page table.
> +        */
> +       if (boot_cpu_has(X86_FEATURE_INVPCID)) {
> +               if (!npt_enabled)
> +                       set_extended_intercept(svm, INTERCEPT_INVPCID);
> +               else
> +                       clr_extended_intercept(svm, INTERCEPT_INVPCID);
> +       }
> +
>         if (kvm_vcpu_apicv_active(&svm->vcpu))
>                 avic_init_vmcb(svm);
>
> @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm *svm)
>         return nop_interception(svm);
>  }
>
> +static int invpcid_interception(struct vcpu_svm *svm)
> +{
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       unsigned long type;
> +       gva_t gva;
> +
> +       /*
> +        * For an INVPCID intercept:
> +        * EXITINFO1 provides the linear address of the memory operand.
> +        * EXITINFO2 provides the contents of the register operand.
> +        */
> +       type = svm->vmcb->control.exit_info_2;
> +       gva = svm->vmcb->control.exit_info_1;
> +
> +       return kvm_handle_invpcid_types(vcpu,  gva, type);
> +}
> +
>  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
>         [SVM_EXIT_READ_CR0]                     = cr_interception,
>         [SVM_EXIT_READ_CR3]                     = cr_interception,
> @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
>         [SVM_EXIT_MWAIT]                        = mwait_interception,
>         [SVM_EXIT_XSETBV]                       = xsetbv_interception,
>         [SVM_EXIT_RDPRU]                        = rdpru_interception,
> +       [SVM_EXIT_INVPCID]                      = invpcid_interception,
>         [SVM_EXIT_NPF]                          = npf_interception,
>         [SVM_EXIT_RSM]                          = rsm_interception,
>         [SVM_EXIT_AVIC_INCOMPLETE_IPI]          = avic_incomplete_ipi_interception,
> @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
>         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
>                              guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS);
>
> +       /*
> +        * Intercept INVPCID instruction if the baremetal has the support
> +        * but the guest doesn't claim the feature.
> +        */
> +       if (boot_cpu_has(X86_FEATURE_INVPCID) &&
> +           !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID))
> +               set_extended_intercept(svm, INTERCEPT_INVPCID);
> +

What if INVPCID is enabled in the guest CPUID later? Shouldn't we then
clear this intercept bit?

>         if (!kvm_vcpu_apicv_active(vcpu))
>                 return;
>
>


* Re: [PATCH 1/3] KVM: X86: Move handling of INVPCID types to x86
  2020-06-11 21:48 ` [PATCH 1/3] KVM: X86: Move handling of INVPCID types to x86 Babu Moger
@ 2020-06-12 18:02   ` Jim Mattson
  2020-06-12 20:06     ` Babu Moger
  0 siblings, 1 reply; 14+ messages in thread
From: Jim Mattson @ 2020-06-12 18:02 UTC (permalink / raw)
  To: Babu Moger
  Cc: Wanpeng Li, Joerg Roedel, the arch/x86 maintainers,
	Sean Christopherson, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Paolo Bonzini, Vitaly Kuznetsov,
	Thomas Gleixner, LKML, kvm list

On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote:
>
> INVPCID instruction handling is mostly same across both VMX and
> SVM. So, move the code to common x86.c.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kvm/vmx/vmx.c |   78 +-----------------------------------------
>  arch/x86/kvm/x86.c     |   89 ++++++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.h     |    2 +
>  3 files changed, 92 insertions(+), 77 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 170cc76a581f..d9c35f337da6 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5477,29 +5477,15 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
>  {
>         u32 vmx_instruction_info;
>         unsigned long type;
> -       bool pcid_enabled;
>         gva_t gva;
> -       struct x86_exception e;
> -       unsigned i;
> -       unsigned long roots_to_free = 0;
>         struct {
>                 u64 pcid;
>                 u64 gla;
>         } operand;
>
> -       if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
> -               kvm_queue_exception(vcpu, UD_VECTOR);
> -               return 1;
> -       }
> -
>         vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>         type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
>
> -       if (type > 3) {
> -               kvm_inject_gp(vcpu, 0);
> -               return 1;
> -       }
> -

You've introduced some fault priority inversions by sinking the above
tests for #UD and #GP below the call to get_vmx_mem_address(), which
may raise #UD, #GP, or #SS.

>         /* According to the Intel instruction reference, the memory operand
>          * is read even if it isn't needed (e.g., for type==all)
>          */
> @@ -5508,69 +5494,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
>                                 sizeof(operand), &gva))
>                 return 1;
>
> -       if (kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e)) {
> -               kvm_inject_emulated_page_fault(vcpu, &e);
> -               return 1;
> -       }
> -
> -       if (operand.pcid >> 12 != 0) {
> -               kvm_inject_gp(vcpu, 0);
> -               return 1;
> -       }
> -
> -       pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> -
> -       switch (type) {
> -       case INVPCID_TYPE_INDIV_ADDR:
> -               if ((!pcid_enabled && (operand.pcid != 0)) ||
> -                   is_noncanonical_address(operand.gla, vcpu)) {
> -                       kvm_inject_gp(vcpu, 0);
> -                       return 1;
> -               }
> -               kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
> -               return kvm_skip_emulated_instruction(vcpu);
> -
> -       case INVPCID_TYPE_SINGLE_CTXT:
> -               if (!pcid_enabled && (operand.pcid != 0)) {
> -                       kvm_inject_gp(vcpu, 0);
> -                       return 1;
> -               }
> -
> -               if (kvm_get_active_pcid(vcpu) == operand.pcid) {
> -                       kvm_mmu_sync_roots(vcpu);
> -                       kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> -               }
> -
> -               for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
> -                       if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
> -                           == operand.pcid)
> -                               roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
> -
> -               kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
> -               /*
> -                * If neither the current cr3 nor any of the prev_roots use the
> -                * given PCID, then nothing needs to be done here because a
> -                * resync will happen anyway before switching to any other CR3.
> -                */
> -
> -               return kvm_skip_emulated_instruction(vcpu);
> -
> -       case INVPCID_TYPE_ALL_NON_GLOBAL:
> -               /*
> -                * Currently, KVM doesn't mark global entries in the shadow
> -                * page tables, so a non-global flush just degenerates to a
> -                * global flush. If needed, we could optimize this later by
> -                * keeping track of global entries in shadow page tables.
> -                */
> -
> -               /* fall-through */
> -       case INVPCID_TYPE_ALL_INCL_GLOBAL:
> -               kvm_mmu_unload(vcpu);
> -               return kvm_skip_emulated_instruction(vcpu);
> -
> -       default:
> -               BUG(); /* We have already checked above that type <= 3 */
> -       }
> +       return kvm_handle_invpcid_types(vcpu,  gva, type);
>  }
>
>  static int handle_pml_full(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9e41b5135340..13373359608c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -72,6 +72,7 @@
>  #include <asm/hypervisor.h>
>  #include <asm/intel_pt.h>
>  #include <asm/emulate_prefix.h>
> +#include <asm/tlbflush.h>
>  #include <clocksource/hyperv_timer.h>
>
>  #define CREATE_TRACE_POINTS
> @@ -10714,6 +10715,94 @@ u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_spec_ctrl_valid_bits);
>
> +int kvm_handle_invpcid_types(struct kvm_vcpu *vcpu, gva_t gva,
> +                            unsigned long type)
> +{
> +       unsigned long roots_to_free = 0;
> +       struct x86_exception e;
> +       bool pcid_enabled;
> +       unsigned i;
> +       struct {
> +               u64 pcid;
> +               u64 gla;
> +       } operand;
> +
> +       if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
> +               kvm_queue_exception(vcpu, UD_VECTOR);
> +               return 1;
> +       }
> +
> +       if (type > 3) {
> +               kvm_inject_gp(vcpu, 0);
> +               return 1;
> +       }
> +
> +       if (kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e)) {
> +               kvm_inject_emulated_page_fault(vcpu, &e);
> +               return 1;
> +       }
> +
> +       if (operand.pcid >> 12 != 0) {
> +               kvm_inject_gp(vcpu, 0);
> +               return 1;
> +       }
> +
> +       pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
> +
> +       switch (type) {
> +       case INVPCID_TYPE_INDIV_ADDR:
> +               if ((!pcid_enabled && (operand.pcid != 0)) ||
> +                   is_noncanonical_address(operand.gla, vcpu)) {
> +                       kvm_inject_gp(vcpu, 0);
> +                       return 1;
> +               }
> +               kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
> +               return kvm_skip_emulated_instruction(vcpu);
> +
> +       case INVPCID_TYPE_SINGLE_CTXT:
> +               if (!pcid_enabled && (operand.pcid != 0)) {
> +                       kvm_inject_gp(vcpu, 0);
> +                       return 1;
> +               }
> +
> +               if (kvm_get_active_pcid(vcpu) == operand.pcid) {
> +                       kvm_mmu_sync_roots(vcpu);
> +                       kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> +               }
> +
> +               for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
> +                       if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
> +                           == operand.pcid)
> +                               roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
> +
> +               kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
> +               /*
> +                * If neither the current cr3 nor any of the prev_roots use the
> +                * given PCID, then nothing needs to be done here because a
> +                * resync will happen anyway before switching to any other CR3.
> +                */
> +
> +               return kvm_skip_emulated_instruction(vcpu);
> +
> +       case INVPCID_TYPE_ALL_NON_GLOBAL:
> +               /*
> +                * Currently, KVM doesn't mark global entries in the shadow
> +                * page tables, so a non-global flush just degenerates to a
> +                * global flush. If needed, we could optimize this later by
> +                * keeping track of global entries in shadow page tables.
> +                */
> +
> +               /* fall-through */
> +       case INVPCID_TYPE_ALL_INCL_GLOBAL:
> +               kvm_mmu_unload(vcpu);
> +               return kvm_skip_emulated_instruction(vcpu);
> +
> +       default:
> +               BUG(); /* We have already checked above that type <= 3 */
> +       }
> +}
> +EXPORT_SYMBOL_GPL(kvm_handle_invpcid_types);
> +
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 6eb62e97e59f..8e23f2705344 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -365,5 +365,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
>  void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
>  u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu);
>  bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu);
> +int kvm_handle_invpcid_types(struct kvm_vcpu *vcpu, gva_t gva,
> +                            unsigned long type);
>
>  #endif
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
  2020-06-11 23:50   ` Jim Mattson
@ 2020-06-12 19:35     ` Babu Moger
  2020-06-12 20:10       ` Jim Mattson
  0 siblings, 1 reply; 14+ messages in thread
From: Babu Moger @ 2020-06-12 19:35 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Wanpeng Li, Joerg Roedel, the arch/x86 maintainers,
	Sean Christopherson, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Paolo Bonzini, Vitaly Kuznetsov,
	Thomas Gleixner, LKML, kvm list



> -----Original Message-----
> From: Jim Mattson <jmattson@google.com>
> Sent: Thursday, June 11, 2020 6:51 PM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>;
> the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson
> <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>;
> Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo
> Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov <vkuznets@redhat.com>;
> Thomas Gleixner <tglx@linutronix.de>; LKML <linux-kernel@vger.kernel.org>;
> kvm list <kvm@vger.kernel.org>
> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
> 
> On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote:
> >
> > The following intercept is added for INVPCID instruction:
> > Code    Name            Cause
> > A2h     VMEXIT_INVPCID  INVPCID instruction
> >
> > The following bit is added to the VMCB layout control area
> > to control intercept of INVPCID:
> > Byte Offset     Bit(s)    Function
> > 14h             2         intercept INVPCID
> >
> > For the guests with nested page table (NPT) support, the INVPCID
> > feature works as running it natively. KVM does not need to do any
> > special handling in this case.
> >
> > Interceptions are required in the following cases.
> > 1. If the guest tries to disable the feature when the underlying
> > hardware supports it. In this case hypervisor needs to report #UD.
> 
> Per the AMD documentation, attempts to use INVPCID at CPL>0 will
> result in a #GP, regardless of the intercept bit. If the guest CPUID
> doesn't enumerate the feature, shouldn't the instruction raise #UD
> regardless of CPL? This seems to imply that we should intercept #GP
> and decode the instruction to see if we should synthesize #UD instead.

The purpose here is to report #UD when the guest CPUID doesn't enumerate the
INVPCID feature while bare-metal supports it. It seems to work fine for
that purpose. You are right that the #GP for CPL>0 takes precedence over
interception. No, I am not planning to intercept #GP.

I will change the text. How about this?

Interceptions are required in the following cases.
1. If the guest CPUID doesn't enumerate the INVPCID feature when the
underlying hardware supports it, the hypervisor needs to report #UD. However,
the #GP for CPL>0 takes precedence over interception.

> > 2. When the guest is running with shadow page table enabled, in
> > this case the hypervisor needs to handle the tlbflush based on the
> > type of invpcid instruction type.
> >
> > AMD documentation for INVPCID feature is available at "AMD64
> > Architecture Programmer’s Manual Volume 2: System Programming,
> > Pub. 24593 Rev. 3.34(or later)"
> >
> > The documentation can be obtained at the links below:
> > Link: https://www.amd.com/system/files/TechDocs/24593.pdf
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/include/asm/svm.h      |    4 ++++
> >  arch/x86/include/uapi/asm/svm.h |    2 ++
> >  arch/x86/kvm/svm/svm.c          |   42
> +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 48 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> > index 62649fba8908..6488094f67fa 100644
> > --- a/arch/x86/include/asm/svm.h
> > +++ b/arch/x86/include/asm/svm.h
> > @@ -55,6 +55,10 @@ enum {
> >         INTERCEPT_RDPRU,
> >  };
> >
> > +/* Extended Intercept bits */
> > +enum {
> > +       INTERCEPT_INVPCID = 2,
> > +};
> >
> >  struct __attribute__ ((__packed__)) vmcb_control_area {
> >         u32 intercept_cr;
> > diff --git a/arch/x86/include/uapi/asm/svm.h
> b/arch/x86/include/uapi/asm/svm.h
> > index 2e8a30f06c74..522d42dfc28c 100644
> > --- a/arch/x86/include/uapi/asm/svm.h
> > +++ b/arch/x86/include/uapi/asm/svm.h
> > @@ -76,6 +76,7 @@
> >  #define SVM_EXIT_MWAIT_COND    0x08c
> >  #define SVM_EXIT_XSETBV        0x08d
> >  #define SVM_EXIT_RDPRU         0x08e
> > +#define SVM_EXIT_INVPCID       0x0a2
> >  #define SVM_EXIT_NPF           0x400
> >  #define SVM_EXIT_AVIC_INCOMPLETE_IPI           0x401
> >  #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS     0x402
> > @@ -171,6 +172,7 @@
> >         { SVM_EXIT_MONITOR,     "monitor" }, \
> >         { SVM_EXIT_MWAIT,       "mwait" }, \
> >         { SVM_EXIT_XSETBV,      "xsetbv" }, \
> > +       { SVM_EXIT_INVPCID,     "invpcid" }, \
> >         { SVM_EXIT_NPF,         "npf" }, \
> >         { SVM_EXIT_AVIC_INCOMPLETE_IPI,         "avic_incomplete_ipi" }, \
> >         { SVM_EXIT_AVIC_UNACCELERATED_ACCESS,
> "avic_unaccelerated_access" }, \
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 285e5e1ff518..82d974338f68 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void)
> >         if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
> >             boot_cpu_has(X86_FEATURE_AMD_SSBD))
> >                 kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
> > +
> > +       /* Enable INVPCID if both PCID and INVPCID enabled */
> > +       if (boot_cpu_has(X86_FEATURE_PCID) &&
> > +           boot_cpu_has(X86_FEATURE_INVPCID))
> > +               kvm_cpu_cap_set(X86_FEATURE_INVPCID);
> >  }
> >
> >  static __init int svm_hardware_setup(void)
> > @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm)
> >                 clr_intercept(svm, INTERCEPT_PAUSE);
> >         }
> >
> > +       /*
> > +        * Intercept INVPCID instruction only if shadow page table is
> > +        * enabled. Interception is not required with nested page table.
> > +        */
> > +       if (boot_cpu_has(X86_FEATURE_INVPCID)) {
> > +               if (!npt_enabled)
> > +                       set_extended_intercept(svm, INTERCEPT_INVPCID);
> > +               else
> > +                       clr_extended_intercept(svm, INTERCEPT_INVPCID);
> > +       }
> > +
> >         if (kvm_vcpu_apicv_active(&svm->vcpu))
> >                 avic_init_vmcb(svm);
> >
> > @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm
> *svm)
> >         return nop_interception(svm);
> >  }
> >
> > +static int invpcid_interception(struct vcpu_svm *svm)
> > +{
> > +       struct kvm_vcpu *vcpu = &svm->vcpu;
> > +       unsigned long type;
> > +       gva_t gva;
> > +
> > +       /*
> > +        * For an INVPCID intercept:
> > +        * EXITINFO1 provides the linear address of the memory operand.
> > +        * EXITINFO2 provides the contents of the register operand.
> > +        */
> > +       type = svm->vmcb->control.exit_info_2;
> > +       gva = svm->vmcb->control.exit_info_1;
> > +
> > +       return kvm_handle_invpcid_types(vcpu,  gva, type);
> > +}
> > +
> >  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> >         [SVM_EXIT_READ_CR0]                     = cr_interception,
> >         [SVM_EXIT_READ_CR3]                     = cr_interception,
> > @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct
> vcpu_svm *svm) = {
> >         [SVM_EXIT_MWAIT]                        = mwait_interception,
> >         [SVM_EXIT_XSETBV]                       = xsetbv_interception,
> >         [SVM_EXIT_RDPRU]                        = rdpru_interception,
> > +       [SVM_EXIT_INVPCID]                      = invpcid_interception,
> >         [SVM_EXIT_NPF]                          = npf_interception,
> >         [SVM_EXIT_RSM]                          = rsm_interception,
> >         [SVM_EXIT_AVIC_INCOMPLETE_IPI]          =
> avic_incomplete_ipi_interception,
> > @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu
> *vcpu)
> >         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> >                              guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS);
> >
> > +       /*
> > +        * Intercept INVPCID instruction if the baremetal has the support
> > +        * but the guest doesn't claim the feature.
> > +        */
> > +       if (boot_cpu_has(X86_FEATURE_INVPCID) &&
> > +           !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID))
> > +               set_extended_intercept(svm, INTERCEPT_INVPCID);
> > +
> 
> What if INVPCID is enabled in the guest CPUID later? Shouldn't we then
> clear this intercept bit?

I assume the feature enable comes through the same code path as this. I can
add an if/else check here if that is what you are suggesting.

> 
> >         if (!kvm_vcpu_apicv_active(vcpu))
> >                 return;
> >
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] KVM: X86: Move handling of INVPCID types to x86
  2020-06-12 18:02   ` Jim Mattson
@ 2020-06-12 20:06     ` Babu Moger
  0 siblings, 0 replies; 14+ messages in thread
From: Babu Moger @ 2020-06-12 20:06 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Wanpeng Li, Joerg Roedel, the arch/x86 maintainers,
	Sean Christopherson, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Paolo Bonzini, Vitaly Kuznetsov,
	Thomas Gleixner, LKML, kvm list



On 6/12/20 1:02 PM, Jim Mattson wrote:
> On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote:
>>
>> INVPCID instruction handling is mostly same across both VMX and
>> SVM. So, move the code to common x86.c.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>>  arch/x86/kvm/vmx/vmx.c |   78 +-----------------------------------------
>>  arch/x86/kvm/x86.c     |   89 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  arch/x86/kvm/x86.h     |    2 +
>>  3 files changed, 92 insertions(+), 77 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 170cc76a581f..d9c35f337da6 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -5477,29 +5477,15 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
>>  {
>>         u32 vmx_instruction_info;
>>         unsigned long type;
>> -       bool pcid_enabled;
>>         gva_t gva;
>> -       struct x86_exception e;
>> -       unsigned i;
>> -       unsigned long roots_to_free = 0;
>>         struct {
>>                 u64 pcid;
>>                 u64 gla;
>>         } operand;
>>
>> -       if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
>> -               kvm_queue_exception(vcpu, UD_VECTOR);
>> -               return 1;
>> -       }
>> -
>>         vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>>         type = kvm_register_readl(vcpu, (vmx_instruction_info >> 28) & 0xf);
>>
>> -       if (type > 3) {
>> -               kvm_inject_gp(vcpu, 0);
>> -               return 1;
>> -       }
>> -
> 
> You've introduced some fault priority inversions by sinking the above
> tests for #UD and #GP below the call to get_vmx_mem_address(), which
> may raise #UD, #GP, or #SS.

Oh, OK. I will restore the original ordering. Thanks for spotting it.

> 
>>         /* According to the Intel instruction reference, the memory operand
>>          * is read even if it isn't needed (e.g., for type==all)
>>          */
>> @@ -5508,69 +5494,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
>>                                 sizeof(operand), &gva))
>>                 return 1;
>>
>> -       if (kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e)) {
>> -               kvm_inject_emulated_page_fault(vcpu, &e);
>> -               return 1;
>> -       }
>> -
>> -       if (operand.pcid >> 12 != 0) {
>> -               kvm_inject_gp(vcpu, 0);
>> -               return 1;
>> -       }
>> -
>> -       pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>> -
>> -       switch (type) {
>> -       case INVPCID_TYPE_INDIV_ADDR:
>> -               if ((!pcid_enabled && (operand.pcid != 0)) ||
>> -                   is_noncanonical_address(operand.gla, vcpu)) {
>> -                       kvm_inject_gp(vcpu, 0);
>> -                       return 1;
>> -               }
>> -               kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
>> -               return kvm_skip_emulated_instruction(vcpu);
>> -
>> -       case INVPCID_TYPE_SINGLE_CTXT:
>> -               if (!pcid_enabled && (operand.pcid != 0)) {
>> -                       kvm_inject_gp(vcpu, 0);
>> -                       return 1;
>> -               }
>> -
>> -               if (kvm_get_active_pcid(vcpu) == operand.pcid) {
>> -                       kvm_mmu_sync_roots(vcpu);
>> -                       kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>> -               }
>> -
>> -               for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
>> -                       if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
>> -                           == operand.pcid)
>> -                               roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
>> -
>> -               kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
>> -               /*
>> -                * If neither the current cr3 nor any of the prev_roots use the
>> -                * given PCID, then nothing needs to be done here because a
>> -                * resync will happen anyway before switching to any other CR3.
>> -                */
>> -
>> -               return kvm_skip_emulated_instruction(vcpu);
>> -
>> -       case INVPCID_TYPE_ALL_NON_GLOBAL:
>> -               /*
>> -                * Currently, KVM doesn't mark global entries in the shadow
>> -                * page tables, so a non-global flush just degenerates to a
>> -                * global flush. If needed, we could optimize this later by
>> -                * keeping track of global entries in shadow page tables.
>> -                */
>> -
>> -               /* fall-through */
>> -       case INVPCID_TYPE_ALL_INCL_GLOBAL:
>> -               kvm_mmu_unload(vcpu);
>> -               return kvm_skip_emulated_instruction(vcpu);
>> -
>> -       default:
>> -               BUG(); /* We have already checked above that type <= 3 */
>> -       }
>> +       return kvm_handle_invpcid_types(vcpu,  gva, type);
>>  }
>>
>>  static int handle_pml_full(struct kvm_vcpu *vcpu)
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 9e41b5135340..13373359608c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -72,6 +72,7 @@
>>  #include <asm/hypervisor.h>
>>  #include <asm/intel_pt.h>
>>  #include <asm/emulate_prefix.h>
>> +#include <asm/tlbflush.h>
>>  #include <clocksource/hyperv_timer.h>
>>
>>  #define CREATE_TRACE_POINTS
>> @@ -10714,6 +10715,94 @@ u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu)
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_spec_ctrl_valid_bits);
>>
>> +int kvm_handle_invpcid_types(struct kvm_vcpu *vcpu, gva_t gva,
>> +                            unsigned long type)
>> +{
>> +       unsigned long roots_to_free = 0;
>> +       struct x86_exception e;
>> +       bool pcid_enabled;
>> +       unsigned i;
>> +       struct {
>> +               u64 pcid;
>> +               u64 gla;
>> +       } operand;
>> +
>> +       if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
>> +               kvm_queue_exception(vcpu, UD_VECTOR);
>> +               return 1;
>> +       }
>> +
>> +       if (type > 3) {
>> +               kvm_inject_gp(vcpu, 0);
>> +               return 1;
>> +       }
>> +
>> +       if (kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e)) {
>> +               kvm_inject_emulated_page_fault(vcpu, &e);
>> +               return 1;
>> +       }
>> +
>> +       if (operand.pcid >> 12 != 0) {
>> +               kvm_inject_gp(vcpu, 0);
>> +               return 1;
>> +       }
>> +
>> +       pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
>> +
>> +       switch (type) {
>> +       case INVPCID_TYPE_INDIV_ADDR:
>> +               if ((!pcid_enabled && (operand.pcid != 0)) ||
>> +                   is_noncanonical_address(operand.gla, vcpu)) {
>> +                       kvm_inject_gp(vcpu, 0);
>> +                       return 1;
>> +               }
>> +               kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
>> +               return kvm_skip_emulated_instruction(vcpu);
>> +
>> +       case INVPCID_TYPE_SINGLE_CTXT:
>> +               if (!pcid_enabled && (operand.pcid != 0)) {
>> +                       kvm_inject_gp(vcpu, 0);
>> +                       return 1;
>> +               }
>> +
>> +               if (kvm_get_active_pcid(vcpu) == operand.pcid) {
>> +                       kvm_mmu_sync_roots(vcpu);
>> +                       kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>> +               }
>> +
>> +               for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
>> +                       if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
>> +                           == operand.pcid)
>> +                               roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
>> +
>> +               kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
>> +               /*
>> +                * If neither the current cr3 nor any of the prev_roots use the
>> +                * given PCID, then nothing needs to be done here because a
>> +                * resync will happen anyway before switching to any other CR3.
>> +                */
>> +
>> +               return kvm_skip_emulated_instruction(vcpu);
>> +
>> +       case INVPCID_TYPE_ALL_NON_GLOBAL:
>> +               /*
>> +                * Currently, KVM doesn't mark global entries in the shadow
>> +                * page tables, so a non-global flush just degenerates to a
>> +                * global flush. If needed, we could optimize this later by
>> +                * keeping track of global entries in shadow page tables.
>> +                */
>> +
>> +               /* fall-through */
>> +       case INVPCID_TYPE_ALL_INCL_GLOBAL:
>> +               kvm_mmu_unload(vcpu);
>> +               return kvm_skip_emulated_instruction(vcpu);
>> +
>> +       default:
>> +               BUG(); /* We have already checked above that type <= 3 */
>> +       }
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_handle_invpcid_types);
>> +
>>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
>>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
>>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index 6eb62e97e59f..8e23f2705344 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -365,5 +365,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
>>  void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
>>  u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu);
>>  bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu);
>> +int kvm_handle_invpcid_types(struct kvm_vcpu *vcpu, gva_t gva,
>> +                            unsigned long type);
>>
>>  #endif
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
  2020-06-12 19:35     ` Babu Moger
@ 2020-06-12 20:10       ` Jim Mattson
  2020-06-12 21:47         ` Babu Moger
  0 siblings, 1 reply; 14+ messages in thread
From: Jim Mattson @ 2020-06-12 20:10 UTC (permalink / raw)
  To: Babu Moger
  Cc: Wanpeng Li, Joerg Roedel, the arch/x86 maintainers,
	Sean Christopherson, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Paolo Bonzini, Vitaly Kuznetsov,
	Thomas Gleixner, LKML, kvm list

On Fri, Jun 12, 2020 at 12:35 PM Babu Moger <babu.moger@amd.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jim Mattson <jmattson@google.com>
> > Sent: Thursday, June 11, 2020 6:51 PM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>;
> > the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson
> > <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>;
> > Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo
> > Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov <vkuznets@redhat.com>;
> > Thomas Gleixner <tglx@linutronix.de>; LKML <linux-kernel@vger.kernel.org>;
> > kvm list <kvm@vger.kernel.org>
> > Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
> >
> > On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote:
> > >
> > > The following intercept is added for INVPCID instruction:
> > > Code    Name            Cause
> > > A2h     VMEXIT_INVPCID  INVPCID instruction
> > >
> > > The following bit is added to the VMCB layout control area
> > > to control intercept of INVPCID:
> > > Byte Offset     Bit(s)    Function
> > > 14h             2         intercept INVPCID
> > >
> > > For the guests with nested page table (NPT) support, the INVPCID
> > > feature works as running it natively. KVM does not need to do any
> > > special handling in this case.
> > >
> > > Interceptions are required in the following cases.
> > > 1. If the guest tries to disable the feature when the underlying
> > > hardware supports it. In this case hypervisor needs to report #UD.
> >
> > Per the AMD documentation, attempts to use INVPCID at CPL>0 will
> > result in a #GP, regardless of the intercept bit. If the guest CPUID
> > doesn't enumerate the feature, shouldn't the instruction raise #UD
> > regardless of CPL? This seems to imply that we should intercept #GP
> > and decode the instruction to see if we should synthesize #UD instead.
>
> The purpose here is to report #UD when the guest CPUID doesn't enumerate the
> INVPCID feature while bare-metal supports it. It seems to work fine for
> that purpose. You are right that the #GP for CPL>0 takes precedence over
> interception. No, I am not planning to intercept #GP.

Without intercepting #GP, you fail to achieve your stated purpose.

> I will change the text. How about this?
>
> Interceptions are required in the following cases.
> 1. If the guest CPUID doesn't enumerate the INVPCID feature when the
> underlying hardware supports it, the hypervisor needs to report #UD. However,
> the #GP for CPL>0 takes precedence over interception.

This text is not internally consistent. In one sentence, you say that
"hypervisor needs to report #UD." In the next sentence, you are
essentially saying that the hypervisor doesn't need to report #UD.
Which is it?

> > > 2. When the guest is running with shadow page table enabled, in
> > > this case the hypervisor needs to handle the tlbflush based on the
> > > type of invpcid instruction type.
> > >
> > > AMD documentation for INVPCID feature is available at "AMD64
> > > Architecture Programmer’s Manual Volume 2: System Programming,
> > > Pub. 24593 Rev. 3.34(or later)"
> > >
> > > The documentation can be obtained at the links below:
> > > Link: https://www.amd.com/system/files/TechDocs/24593.pdf
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> > >
> > > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > > ---
> > >  arch/x86/include/asm/svm.h      |    4 ++++
> > >  arch/x86/include/uapi/asm/svm.h |    2 ++
> > >  arch/x86/kvm/svm/svm.c          |   42
> > +++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 48 insertions(+)
> > >
> > > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> > > index 62649fba8908..6488094f67fa 100644
> > > --- a/arch/x86/include/asm/svm.h
> > > +++ b/arch/x86/include/asm/svm.h
> > > @@ -55,6 +55,10 @@ enum {
> > >         INTERCEPT_RDPRU,
> > >  };
> > >
> > > +/* Extended Intercept bits */
> > > +enum {
> > > +       INTERCEPT_INVPCID = 2,
> > > +};
> > >
> > >  struct __attribute__ ((__packed__)) vmcb_control_area {
> > >         u32 intercept_cr;
> > > diff --git a/arch/x86/include/uapi/asm/svm.h
> > b/arch/x86/include/uapi/asm/svm.h
> > > index 2e8a30f06c74..522d42dfc28c 100644
> > > --- a/arch/x86/include/uapi/asm/svm.h
> > > +++ b/arch/x86/include/uapi/asm/svm.h
> > > @@ -76,6 +76,7 @@
> > >  #define SVM_EXIT_MWAIT_COND    0x08c
> > >  #define SVM_EXIT_XSETBV        0x08d
> > >  #define SVM_EXIT_RDPRU         0x08e
> > > +#define SVM_EXIT_INVPCID       0x0a2
> > >  #define SVM_EXIT_NPF           0x400
> > >  #define SVM_EXIT_AVIC_INCOMPLETE_IPI           0x401
> > >  #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS     0x402
> > > @@ -171,6 +172,7 @@
> > >         { SVM_EXIT_MONITOR,     "monitor" }, \
> > >         { SVM_EXIT_MWAIT,       "mwait" }, \
> > >         { SVM_EXIT_XSETBV,      "xsetbv" }, \
> > > +       { SVM_EXIT_INVPCID,     "invpcid" }, \
> > >         { SVM_EXIT_NPF,         "npf" }, \
> > >         { SVM_EXIT_AVIC_INCOMPLETE_IPI,         "avic_incomplete_ipi" }, \
> > >         { SVM_EXIT_AVIC_UNACCELERATED_ACCESS,
> > "avic_unaccelerated_access" }, \
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 285e5e1ff518..82d974338f68 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void)
> > >         if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
> > >             boot_cpu_has(X86_FEATURE_AMD_SSBD))
> > >                 kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
> > > +
> > > +       /* Enable INVPCID if both PCID and INVPCID enabled */
> > > +       if (boot_cpu_has(X86_FEATURE_PCID) &&
> > > +           boot_cpu_has(X86_FEATURE_INVPCID))
> > > +               kvm_cpu_cap_set(X86_FEATURE_INVPCID);
> > >  }
> > >
> > >  static __init int svm_hardware_setup(void)
> > > @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm)
> > >                 clr_intercept(svm, INTERCEPT_PAUSE);
> > >         }
> > >
> > > +       /*
> > > +        * Intercept INVPCID instruction only if shadow page table is
> > > +        * enabled. Interception is not required with nested page table.
> > > +        */
> > > +       if (boot_cpu_has(X86_FEATURE_INVPCID)) {
> > > +               if (!npt_enabled)
> > > +                       set_extended_intercept(svm, INTERCEPT_INVPCID);
> > > +               else
> > > +                       clr_extended_intercept(svm, INTERCEPT_INVPCID);
> > > +       }
> > > +
> > >         if (kvm_vcpu_apicv_active(&svm->vcpu))
> > >                 avic_init_vmcb(svm);
> > >
> > > @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm
> > *svm)
> > >         return nop_interception(svm);
> > >  }
> > >
> > > +static int invpcid_interception(struct vcpu_svm *svm)
> > > +{
> > > +       struct kvm_vcpu *vcpu = &svm->vcpu;
> > > +       unsigned long type;
> > > +       gva_t gva;
> > > +
> > > +       /*
> > > +        * For an INVPCID intercept:
> > > +        * EXITINFO1 provides the linear address of the memory operand.
> > > +        * EXITINFO2 provides the contents of the register operand.
> > > +        */
> > > +       type = svm->vmcb->control.exit_info_2;
> > > +       gva = svm->vmcb->control.exit_info_1;
> > > +
> > > +       return kvm_handle_invpcid_types(vcpu,  gva, type);
> > > +}
> > > +
> > >  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> > >         [SVM_EXIT_READ_CR0]                     = cr_interception,
> > >         [SVM_EXIT_READ_CR3]                     = cr_interception,
> > > @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct
> > vcpu_svm *svm) = {
> > >         [SVM_EXIT_MWAIT]                        = mwait_interception,
> > >         [SVM_EXIT_XSETBV]                       = xsetbv_interception,
> > >         [SVM_EXIT_RDPRU]                        = rdpru_interception,
> > > +       [SVM_EXIT_INVPCID]                      = invpcid_interception,
> > >         [SVM_EXIT_NPF]                          = npf_interception,
> > >         [SVM_EXIT_RSM]                          = rsm_interception,
> > >         [SVM_EXIT_AVIC_INCOMPLETE_IPI]          =
> > avic_incomplete_ipi_interception,
> > > @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu
> > *vcpu)
> > >         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> > >                              guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS);
> > >
> > > +       /*
> > > +        * Intercept INVPCID instruction if the baremetal has the support
> > > +        * but the guest doesn't claim the feature.
> > > +        */
> > > +       if (boot_cpu_has(X86_FEATURE_INVPCID) &&
> > > +           !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID))
> > > +               set_extended_intercept(svm, INTERCEPT_INVPCID);
> > > +
> >
> > What if INVPCID is enabled in the guest CPUID later? Shouldn't we then
> > clear this intercept bit?
>
> I assume the feature enable comes in the same code path as this. I can add
> "if else" check here if that is what you are suggesting.

Yes, that's what I'm suggesting.
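The suggested "if else" fix can be modeled in a few lines. This is a minimal user-space sketch, not the actual KVM code: the struct, helper names, and update function are simplified stand-ins for the VMCB extended-intercept word and the `set_extended_intercept()`/`clr_extended_intercept()` helpers introduced in patch 2 of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the extended intercept bit from patch 2. */
#define INTERCEPT_INVPCID 2

/* Simplified model of the VMCB extended intercept word. */
struct vmcb_model {
	uint32_t intercept_ext;
};

static void set_extended_intercept(struct vmcb_model *v, int bit)
{
	v->intercept_ext |= 1u << bit;
}

static void clr_extended_intercept(struct vmcb_model *v, int bit)
{
	v->intercept_ext &= ~(1u << bit);
}

/*
 * Model of the suggested svm_cpuid_update() change: intercept INVPCID
 * only while the guest CPUID does not enumerate it, and clear the
 * intercept again if the guest CPUID is later updated to enable it.
 */
static void update_invpcid_intercept(struct vmcb_model *v,
				     bool host_has_invpcid,
				     bool guest_has_invpcid)
{
	if (!host_has_invpcid)
		return;

	if (!guest_has_invpcid)
		set_extended_intercept(v, INTERCEPT_INVPCID);
	else
		clr_extended_intercept(v, INTERCEPT_INVPCID);
}
```

The else branch is the piece being asked for: without it, an intercept set while the feature bit was absent from the guest CPUID would stay set even after a later CPUID update enables INVPCID.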

> >
> > >         if (!kvm_vcpu_apicv_active(vcpu))
> > >                 return;
> > >
> > >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
  2020-06-12 20:10       ` Jim Mattson
@ 2020-06-12 21:47         ` Babu Moger
  2020-06-13  0:04           ` Jim Mattson
  0 siblings, 1 reply; 14+ messages in thread
From: Babu Moger @ 2020-06-12 21:47 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Wanpeng Li, Joerg Roedel, the arch/x86 maintainers,
	Sean Christopherson, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Paolo Bonzini, Vitaly Kuznetsov,
	Thomas Gleixner, LKML, kvm list



On 6/12/20 3:10 PM, Jim Mattson wrote:
> On Fri, Jun 12, 2020 at 12:35 PM Babu Moger <babu.moger@amd.com> wrote:
>>
>>
>>
>>> -----Original Message-----
>>> From: Jim Mattson <jmattson@google.com>
>>> Sent: Thursday, June 11, 2020 6:51 PM
>>> To: Moger, Babu <Babu.Moger@amd.com>
>>> Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>;
>>> the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson
>>> <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>;
>>> Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo
>>> Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov <vkuznets@redhat.com>;
>>> Thomas Gleixner <tglx@linutronix.de>; LKML <linux-kernel@vger.kernel.org>;
>>> kvm list <kvm@vger.kernel.org>
>>> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
>>>
>>> On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>
>>>> The following intercept is added for INVPCID instruction:
>>>> Code    Name            Cause
>>>> A2h     VMEXIT_INVPCID  INVPCID instruction
>>>>
>>>> The following bit is added to the VMCB layout control area
>>>> to control intercept of INVPCID:
>>>> Byte Offset     Bit(s)    Function
>>>> 14h             2         intercept INVPCID
>>>>
>>>> For the guests with nested page table (NPT) support, the INVPCID
>>>> feature works as running it natively. KVM does not need to do any
>>>> special handling in this case.
>>>>
>>>> Interceptions are required in the following cases.
>>>> 1. If the guest tries to disable the feature when the underlying
>>>> hardware supports it. In this case hypervisor needs to report #UD.
>>>
>>> Per the AMD documentation, attempts to use INVPCID at CPL>0 will
>>> result in a #GP, regardless of the intercept bit. If the guest CPUID
>>> doesn't enumerate the feature, shouldn't the instruction raise #UD
>>> regardless of CPL? This seems to imply that we should intercept #GP
>>> and decode the instruction to see if we should synthesize #UD instead.
>>
>> Purpose here is to report #UD when the guest CPUID doesn't enumerate the
>> INVPCID feature when bare metal supports it. It seems to work fine for
>> that purpose. You are right. The #GP for CPL>0 takes precedence over
>> interception. No. I am not planning to intercept #GP.
> 
> Without intercepting #GP, you fail to achieve your stated purpose.

I think I have misunderstood this part. I was not intending to change the
#GP behaviour. I will remove this part. My intention with this series is to
handle INVPCID in shadow page mode. I have verified that part. Hope I did
not miss anything else.

> 
>> I will change the text. How about this?
>>
>> Interceptions are required in the following cases.
>> 1. If the guest CPUID doesn't enumerate the INVPCID feature when the
>> underlying hardware supports it,  hypervisor needs to report UD. However,
>> #GP for CPL>0 takes precedence over interception.
> 
> This text is not internally consistent. In one sentence, you say that
> "hypervisor needs to report #UD." In the next sentence, you are
> essentially saying that the hypervisor doesn't need to report #UD.
> Which is it?
> 
>>>> 2. When the guest is running with shadow page table enabled, in
>>>> this case the hypervisor needs to handle the tlbflush based on the
>>>> type of invpcid instruction type.
>>>>
>>>> AMD documentation for INVPCID feature is available at "AMD64
>>>> Architecture Programmer’s Manual Volume 2: System Programming,
>>>> Pub. 24593 Rev. 3.34(or later)"
>>>>
>>>> The documentation can be obtained at the links below:
>>>> Link: https://www.amd.com/system/files/TechDocs/24593.pdf
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>>
>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>>> ---
>>>>  arch/x86/include/asm/svm.h      |    4 ++++
>>>>  arch/x86/include/uapi/asm/svm.h |    2 ++
>>>>  arch/x86/kvm/svm/svm.c          |   42
>>> +++++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 48 insertions(+)
>>>>
>>>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
>>>> index 62649fba8908..6488094f67fa 100644
>>>> --- a/arch/x86/include/asm/svm.h
>>>> +++ b/arch/x86/include/asm/svm.h
>>>> @@ -55,6 +55,10 @@ enum {
>>>>         INTERCEPT_RDPRU,
>>>>  };
>>>>
>>>> +/* Extended Intercept bits */
>>>> +enum {
>>>> +       INTERCEPT_INVPCID = 2,
>>>> +};
>>>>
>>>>  struct __attribute__ ((__packed__)) vmcb_control_area {
>>>>         u32 intercept_cr;
>>>> diff --git a/arch/x86/include/uapi/asm/svm.h
>>> b/arch/x86/include/uapi/asm/svm.h
>>>> index 2e8a30f06c74..522d42dfc28c 100644
>>>> --- a/arch/x86/include/uapi/asm/svm.h
>>>> +++ b/arch/x86/include/uapi/asm/svm.h
>>>> @@ -76,6 +76,7 @@
>>>>  #define SVM_EXIT_MWAIT_COND    0x08c
>>>>  #define SVM_EXIT_XSETBV        0x08d
>>>>  #define SVM_EXIT_RDPRU         0x08e
>>>> +#define SVM_EXIT_INVPCID       0x0a2
>>>>  #define SVM_EXIT_NPF           0x400
>>>>  #define SVM_EXIT_AVIC_INCOMPLETE_IPI           0x401
>>>>  #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS     0x402
>>>> @@ -171,6 +172,7 @@
>>>>         { SVM_EXIT_MONITOR,     "monitor" }, \
>>>>         { SVM_EXIT_MWAIT,       "mwait" }, \
>>>>         { SVM_EXIT_XSETBV,      "xsetbv" }, \
>>>> +       { SVM_EXIT_INVPCID,     "invpcid" }, \
>>>>         { SVM_EXIT_NPF,         "npf" }, \
>>>>         { SVM_EXIT_AVIC_INCOMPLETE_IPI,         "avic_incomplete_ipi" }, \
>>>>         { SVM_EXIT_AVIC_UNACCELERATED_ACCESS,
>>> "avic_unaccelerated_access" }, \
>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>> index 285e5e1ff518..82d974338f68 100644
>>>> --- a/arch/x86/kvm/svm/svm.c
>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>> @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void)
>>>>         if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
>>>>             boot_cpu_has(X86_FEATURE_AMD_SSBD))
>>>>                 kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
>>>> +
>>>> +       /* Enable INVPCID if both PCID and INVPCID enabled */
>>>> +       if (boot_cpu_has(X86_FEATURE_PCID) &&
>>>> +           boot_cpu_has(X86_FEATURE_INVPCID))
>>>> +               kvm_cpu_cap_set(X86_FEATURE_INVPCID);
>>>>  }
>>>>
>>>>  static __init int svm_hardware_setup(void)
>>>> @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>>                 clr_intercept(svm, INTERCEPT_PAUSE);
>>>>         }
>>>>
>>>> +       /*
>>>> +        * Intercept INVPCID instruction only if shadow page table is
>>>> +        * enabled. Interception is not required with nested page table.
>>>> +        */
>>>> +       if (boot_cpu_has(X86_FEATURE_INVPCID)) {
>>>> +               if (!npt_enabled)
>>>> +                       set_extended_intercept(svm, INTERCEPT_INVPCID);
>>>> +               else
>>>> +                       clr_extended_intercept(svm, INTERCEPT_INVPCID);
>>>> +       }
>>>> +
>>>>         if (kvm_vcpu_apicv_active(&svm->vcpu))
>>>>                 avic_init_vmcb(svm);
>>>>
>>>> @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm
>>> *svm)
>>>>         return nop_interception(svm);
>>>>  }
>>>>
>>>> +static int invpcid_interception(struct vcpu_svm *svm)
>>>> +{
>>>> +       struct kvm_vcpu *vcpu = &svm->vcpu;
>>>> +       unsigned long type;
>>>> +       gva_t gva;
>>>> +
>>>> +       /*
>>>> +        * For an INVPCID intercept:
>>>> +        * EXITINFO1 provides the linear address of the memory operand.
>>>> +        * EXITINFO2 provides the contents of the register operand.
>>>> +        */
>>>> +       type = svm->vmcb->control.exit_info_2;
>>>> +       gva = svm->vmcb->control.exit_info_1;
>>>> +
>>>> +       return kvm_handle_invpcid_types(vcpu,  gva, type);
>>>> +}
>>>> +
>>>>  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
>>>>         [SVM_EXIT_READ_CR0]                     = cr_interception,
>>>>         [SVM_EXIT_READ_CR3]                     = cr_interception,
>>>> @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct
>>> vcpu_svm *svm) = {
>>>>         [SVM_EXIT_MWAIT]                        = mwait_interception,
>>>>         [SVM_EXIT_XSETBV]                       = xsetbv_interception,
>>>>         [SVM_EXIT_RDPRU]                        = rdpru_interception,
>>>> +       [SVM_EXIT_INVPCID]                      = invpcid_interception,
>>>>         [SVM_EXIT_NPF]                          = npf_interception,
>>>>         [SVM_EXIT_RSM]                          = rsm_interception,
>>>>         [SVM_EXIT_AVIC_INCOMPLETE_IPI]          =
>>> avic_incomplete_ipi_interception,
>>>> @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu
>>> *vcpu)
>>>>         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
>>>>                              guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS);
>>>>
>>>> +       /*
>>>> +        * Intercept INVPCID instruction if the baremetal has the support
>>>> +        * but the guest doesn't claim the feature.
>>>> +        */
>>>> +       if (boot_cpu_has(X86_FEATURE_INVPCID) &&
>>>> +           !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID))
>>>> +               set_extended_intercept(svm, INTERCEPT_INVPCID);
>>>> +
>>>
>>> What if INVPCID is enabled in the guest CPUID later? Shouldn't we then
>>> clear this intercept bit?
>>
>> I assume the feature enable comes in the same code path as this. I can add
>> "if else" check here if that is what you are suggesting.
> 
> Yes, that's what I'm suggesting.
> 
>>>
>>>>         if (!kvm_vcpu_apicv_active(vcpu))
>>>>                 return;
>>>>
>>>>


* Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
  2020-06-12 21:47         ` Babu Moger
@ 2020-06-13  0:04           ` Jim Mattson
  2020-06-15 13:46             ` Paolo Bonzini
  2020-06-15 13:47             ` Babu Moger
  0 siblings, 2 replies; 14+ messages in thread
From: Jim Mattson @ 2020-06-13  0:04 UTC (permalink / raw)
  To: Babu Moger
  Cc: Wanpeng Li, Joerg Roedel, the arch/x86 maintainers,
	Sean Christopherson, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Paolo Bonzini, Vitaly Kuznetsov,
	Thomas Gleixner, LKML, kvm list

On Fri, Jun 12, 2020 at 2:47 PM Babu Moger <babu.moger@amd.com> wrote:
>
>
>
> On 6/12/20 3:10 PM, Jim Mattson wrote:
> > On Fri, Jun 12, 2020 at 12:35 PM Babu Moger <babu.moger@amd.com> wrote:
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Jim Mattson <jmattson@google.com>
> >>> Sent: Thursday, June 11, 2020 6:51 PM
> >>> To: Moger, Babu <Babu.Moger@amd.com>
> >>> Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>;
> >>> the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson
> >>> <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>;
> >>> Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo
> >>> Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov <vkuznets@redhat.com>;
> >>> Thomas Gleixner <tglx@linutronix.de>; LKML <linux-kernel@vger.kernel.org>;
> >>> kvm list <kvm@vger.kernel.org>
> >>> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
> >>>
> >>> On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com> wrote:
> >>>>
> >>>> The following intercept is added for INVPCID instruction:
> >>>> Code    Name            Cause
> >>>> A2h     VMEXIT_INVPCID  INVPCID instruction
> >>>>
> >>>> The following bit is added to the VMCB layout control area
> >>>> to control intercept of INVPCID:
> >>>> Byte Offset     Bit(s)    Function
> >>>> 14h             2         intercept INVPCID
> >>>>
> >>>> For the guests with nested page table (NPT) support, the INVPCID
> >>>> feature works as running it natively. KVM does not need to do any
> >>>> special handling in this case.
> >>>>
> >>>> Interceptions are required in the following cases.
> >>>> 1. If the guest tries to disable the feature when the underlying
> >>>> hardware supports it. In this case hypervisor needs to report #UD.
> >>>
> >>> Per the AMD documentation, attempts to use INVPCID at CPL>0 will
> >>> result in a #GP, regardless of the intercept bit. If the guest CPUID
> >>> doesn't enumerate the feature, shouldn't the instruction raise #UD
> >>> regardless of CPL? This seems to imply that we should intercept #GP
> >>> and decode the instruction to see if we should synthesize #UD instead.
> >>
> >> Purpose here is to report #UD when the guest CPUID doesn't enumerate the
> >> INVPCID feature when bare metal supports it. It seems to work fine for
> >> that purpose. You are right. The #GP for CPL>0 takes precedence over
> >> interception. No. I am not planning to intercept #GP.
> >
> > Without intercepting #GP, you fail to achieve your stated purpose.
>
> I think I have misunderstood this part. I was not intending to change the
> #GP behaviour. I will remove this part. My intention with this series is to
> handle INVPCID in shadow page mode. I have verified that part. Hope I did
> not miss anything else.

You don't really have to intercept INVPCID when tdp is in use, right?
There are certainly plenty of operations for which kvm does not
properly raise #UD when they aren't enumerated in the guest CPUID.

> >> I will change the text. How about this?
> >>
> >> Interceptions are required in the following cases.
> >> 1. If the guest CPUID doesn't enumerate the INVPCID feature when the
> >> underlying hardware supports it,  hypervisor needs to report UD. However,
> >> #GP for CPL>0 takes precedence over interception.
> >
> > This text is not internally consistent. In one sentence, you say that
> > "hypervisor needs to report #UD." In the next sentence, you are
> > essentially saying that the hypervisor doesn't need to report #UD.
> > Which is it?
> >
> >>>> 2. When the guest is running with shadow page table enabled, in
> >>>> this case the hypervisor needs to handle the tlbflush based on the
> >>>> type of invpcid instruction type.
> >>>>
> >>>> AMD documentation for INVPCID feature is available at "AMD64
> >>>> Architecture Programmer’s Manual Volume 2: System Programming,
> >>>> Pub. 24593 Rev. 3.34(or later)"
> >>>>
> >>>> The documentation can be obtained at the links below:
> >>>> Link: https://www.amd.com/system/files/TechDocs/24593.pdf
> >>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> >>>>
> >>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> >>>> ---
> >>>>  arch/x86/include/asm/svm.h      |    4 ++++
> >>>>  arch/x86/include/uapi/asm/svm.h |    2 ++
> >>>>  arch/x86/kvm/svm/svm.c          |   42
> >>> +++++++++++++++++++++++++++++++++++++++
> >>>>  3 files changed, 48 insertions(+)
> >>>>
> >>>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> >>>> index 62649fba8908..6488094f67fa 100644
> >>>> --- a/arch/x86/include/asm/svm.h
> >>>> +++ b/arch/x86/include/asm/svm.h
> >>>> @@ -55,6 +55,10 @@ enum {
> >>>>         INTERCEPT_RDPRU,
> >>>>  };
> >>>>
> >>>> +/* Extended Intercept bits */
> >>>> +enum {
> >>>> +       INTERCEPT_INVPCID = 2,
> >>>> +};
> >>>>
> >>>>  struct __attribute__ ((__packed__)) vmcb_control_area {
> >>>>         u32 intercept_cr;
> >>>> diff --git a/arch/x86/include/uapi/asm/svm.h
> >>> b/arch/x86/include/uapi/asm/svm.h
> >>>> index 2e8a30f06c74..522d42dfc28c 100644
> >>>> --- a/arch/x86/include/uapi/asm/svm.h
> >>>> +++ b/arch/x86/include/uapi/asm/svm.h
> >>>> @@ -76,6 +76,7 @@
> >>>>  #define SVM_EXIT_MWAIT_COND    0x08c
> >>>>  #define SVM_EXIT_XSETBV        0x08d
> >>>>  #define SVM_EXIT_RDPRU         0x08e
> >>>> +#define SVM_EXIT_INVPCID       0x0a2
> >>>>  #define SVM_EXIT_NPF           0x400
> >>>>  #define SVM_EXIT_AVIC_INCOMPLETE_IPI           0x401
> >>>>  #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS     0x402
> >>>> @@ -171,6 +172,7 @@
> >>>>         { SVM_EXIT_MONITOR,     "monitor" }, \
> >>>>         { SVM_EXIT_MWAIT,       "mwait" }, \
> >>>>         { SVM_EXIT_XSETBV,      "xsetbv" }, \
> >>>> +       { SVM_EXIT_INVPCID,     "invpcid" }, \
> >>>>         { SVM_EXIT_NPF,         "npf" }, \
> >>>>         { SVM_EXIT_AVIC_INCOMPLETE_IPI,         "avic_incomplete_ipi" }, \
> >>>>         { SVM_EXIT_AVIC_UNACCELERATED_ACCESS,
> >>> "avic_unaccelerated_access" }, \
> >>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >>>> index 285e5e1ff518..82d974338f68 100644
> >>>> --- a/arch/x86/kvm/svm/svm.c
> >>>> +++ b/arch/x86/kvm/svm/svm.c
> >>>> @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void)
> >>>>         if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
> >>>>             boot_cpu_has(X86_FEATURE_AMD_SSBD))
> >>>>                 kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
> >>>> +
> >>>> +       /* Enable INVPCID if both PCID and INVPCID enabled */
> >>>> +       if (boot_cpu_has(X86_FEATURE_PCID) &&
> >>>> +           boot_cpu_has(X86_FEATURE_INVPCID))
> >>>> +               kvm_cpu_cap_set(X86_FEATURE_INVPCID);
> >>>>  }
> >>>>
> >>>>  static __init int svm_hardware_setup(void)
> >>>> @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm)
> >>>>                 clr_intercept(svm, INTERCEPT_PAUSE);
> >>>>         }
> >>>>
> >>>> +       /*
> >>>> +        * Intercept INVPCID instruction only if shadow page table is
> >>>> +        * enabled. Interception is not required with nested page table.
> >>>> +        */
> >>>> +       if (boot_cpu_has(X86_FEATURE_INVPCID)) {
> >>>> +               if (!npt_enabled)
> >>>> +                       set_extended_intercept(svm, INTERCEPT_INVPCID);
> >>>> +               else
> >>>> +                       clr_extended_intercept(svm, INTERCEPT_INVPCID);
> >>>> +       }
> >>>> +
> >>>>         if (kvm_vcpu_apicv_active(&svm->vcpu))
> >>>>                 avic_init_vmcb(svm);
> >>>>
> >>>> @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm
> >>> *svm)
> >>>>         return nop_interception(svm);
> >>>>  }
> >>>>
> >>>> +static int invpcid_interception(struct vcpu_svm *svm)
> >>>> +{
> >>>> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> >>>> +       unsigned long type;
> >>>> +       gva_t gva;
> >>>> +
> >>>> +       /*
> >>>> +        * For an INVPCID intercept:
> >>>> +        * EXITINFO1 provides the linear address of the memory operand.
> >>>> +        * EXITINFO2 provides the contents of the register operand.
> >>>> +        */
> >>>> +       type = svm->vmcb->control.exit_info_2;
> >>>> +       gva = svm->vmcb->control.exit_info_1;
> >>>> +
> >>>> +       return kvm_handle_invpcid_types(vcpu,  gva, type);
> >>>> +}
> >>>> +
> >>>>  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> >>>>         [SVM_EXIT_READ_CR0]                     = cr_interception,
> >>>>         [SVM_EXIT_READ_CR3]                     = cr_interception,
> >>>> @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct
> >>> vcpu_svm *svm) = {
> >>>>         [SVM_EXIT_MWAIT]                        = mwait_interception,
> >>>>         [SVM_EXIT_XSETBV]                       = xsetbv_interception,
> >>>>         [SVM_EXIT_RDPRU]                        = rdpru_interception,
> >>>> +       [SVM_EXIT_INVPCID]                      = invpcid_interception,
> >>>>         [SVM_EXIT_NPF]                          = npf_interception,
> >>>>         [SVM_EXIT_RSM]                          = rsm_interception,
> >>>>         [SVM_EXIT_AVIC_INCOMPLETE_IPI]          =
> >>> avic_incomplete_ipi_interception,
> >>>> @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu
> >>> *vcpu)
> >>>>         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> >>>>                              guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS);
> >>>>
> >>>> +       /*
> >>>> +        * Intercept INVPCID instruction if the baremetal has the support
> >>>> +        * but the guest doesn't claim the feature.
> >>>> +        */
> >>>> +       if (boot_cpu_has(X86_FEATURE_INVPCID) &&
> >>>> +           !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID))
> >>>> +               set_extended_intercept(svm, INTERCEPT_INVPCID);
> >>>> +
> >>>
> >>> What if INVPCID is enabled in the guest CPUID later? Shouldn't we then
> >>> clear this intercept bit?
> >>
> >> I assume the feature enable comes in the same code path as this. I can add
> >> "if else" check here if that is what you are suggesting.
> >
> > Yes, that's what I'm suggesting.
> >
> >>>
> >>>>         if (!kvm_vcpu_apicv_active(vcpu))
> >>>>                 return;
> >>>>
> >>>>


* Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
  2020-06-13  0:04           ` Jim Mattson
@ 2020-06-15 13:46             ` Paolo Bonzini
  2020-06-15 14:59               ` Babu Moger
  2020-06-15 13:47             ` Babu Moger
  1 sibling, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2020-06-15 13:46 UTC (permalink / raw)
  To: Jim Mattson, Babu Moger
  Cc: Wanpeng Li, Joerg Roedel, the arch/x86 maintainers,
	Sean Christopherson, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Vitaly Kuznetsov, Thomas Gleixner, LKML,
	kvm list

On 13/06/20 02:04, Jim Mattson wrote:
>> I think I have misunderstood this part. I was not intending to change the
>> #GP behaviour. I will remove this part. My intention with this series is to
>> handle INVPCID in shadow page mode. I have verified that part. Hope I did
>> not miss anything else.
> You don't really have to intercept INVPCID when tdp is in use, right?
> There are certainly plenty of operations for which kvm does not
> properly raise #UD when they aren't enumerated in the guest CPUID.
> 

Indeed; for RDRAND and RDSEED it makes sense to do so because the
instructions may introduce undesirable nondeterminism (or block all the
cores in your package, as they have been doing for a few weeks).
Therefore on Intel we trap them and raise #UD; on AMD this is not
possible because RDRAND has no intercept.

In general, however, we do not try too hard to raise #UD, and that is
usually not even possible.

Paolo



* RE: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
  2020-06-13  0:04           ` Jim Mattson
  2020-06-15 13:46             ` Paolo Bonzini
@ 2020-06-15 13:47             ` Babu Moger
  1 sibling, 0 replies; 14+ messages in thread
From: Babu Moger @ 2020-06-15 13:47 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Wanpeng Li, Joerg Roedel, the arch/x86 maintainers,
	Sean Christopherson, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Paolo Bonzini, Vitaly Kuznetsov,
	Thomas Gleixner, LKML, kvm list



> -----Original Message-----
> From: Jim Mattson <jmattson@google.com>
> Sent: Friday, June 12, 2020 7:04 PM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>;
> the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson
> <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>;
> Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo
> Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov <vkuznets@redhat.com>;
> Thomas Gleixner <tglx@linutronix.de>; LKML <linux-kernel@vger.kernel.org>;
> kvm list <kvm@vger.kernel.org>
> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
> 
> On Fri, Jun 12, 2020 at 2:47 PM Babu Moger <babu.moger@amd.com> wrote:
> >
> >
> >
> > On 6/12/20 3:10 PM, Jim Mattson wrote:
> > > On Fri, Jun 12, 2020 at 12:35 PM Babu Moger <babu.moger@amd.com>
> wrote:
> > >>
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: Jim Mattson <jmattson@google.com>
> > >>> Sent: Thursday, June 11, 2020 6:51 PM
> > >>> To: Moger, Babu <Babu.Moger@amd.com>
> > >>> Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel
> <joro@8bytes.org>;
> > >>> the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson
> > >>> <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>;
> > >>> Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Paolo
> > >>> Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov
> <vkuznets@redhat.com>;
> > >>> Thomas Gleixner <tglx@linutronix.de>; LKML <linux-
> kernel@vger.kernel.org>;
> > >>> kvm list <kvm@vger.kernel.org>
> > >>> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
> > >>>
> > >>> On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@amd.com>
> wrote:
> > >>>>
> > >>>> The following intercept is added for INVPCID instruction:
> > >>>> Code    Name            Cause
> > >>>> A2h     VMEXIT_INVPCID  INVPCID instruction
> > >>>>
> > >>>> The following bit is added to the VMCB layout control area
> > >>>> to control intercept of INVPCID:
> > >>>> Byte Offset     Bit(s)    Function
> > >>>> 14h             2         intercept INVPCID
> > >>>>
> > >>>> For guests with nested page table (NPT) support, the INVPCID
> > >>>> instruction works the same as it does natively, so KVM does not
> > >>>> need any special handling in this case.
> > >>>>
> > >>>> Interceptions are required in the following cases.
> > >>>> 1. If the guest tries to disable the feature when the underlying
> > >>>> hardware supports it. In this case, the hypervisor needs to report #UD.
> > >>>
> > >>> Per the AMD documentation, attempts to use INVPCID at CPL>0 will
> > >>> result in a #GP, regardless of the intercept bit. If the guest CPUID
> > >>> doesn't enumerate the feature, shouldn't the instruction raise #UD
> > >>> regardless of CPL? This seems to imply that we should intercept #GP
> > >>> and decode the instruction to see if we should synthesize #UD instead.
> > >>
> > >> The purpose here is to report #UD when the guest CPUID doesn't enumerate
> > >> the INVPCID feature while the bare metal supports it. It seems to work
> > >> fine for that purpose. You are right: the #GP for CPL>0 takes precedence
> > >> over interception. No, I am not planning to intercept #GP.
> > >
> > > Without intercepting #GP, you fail to achieve your stated purpose.
> >
> > I think I misunderstood this part. I was not intending to change the
> > #GP behaviour, so I will remove this part. My intention with this series
> > is to handle INVPCID in shadow paging mode. I have verified that part.
> > I hope I did not miss anything else.
> 
> You don't really have to intercept INVPCID when tdp is in use, right?

That is correct. The intercept is added only when tdp is off.

> There are certainly plenty of operations for which kvm does not
> properly raise #UD when they aren't enumerated in the guest CPUID.
> 
> > >> I will change the text. How about this?
> > >>
> > >> Interceptions are required in the following cases.
> > >> 1. If the guest CPUID doesn't enumerate the INVPCID feature when the
> > >> underlying hardware supports it, the hypervisor needs to report #UD.
> > >> However, #GP for CPL>0 takes precedence over interception.
> > >
> > > This text is not internally consistent. In one sentence, you say that
> > > "hypervisor needs to report #UD." In the next sentence, you are
> > > essentially saying that the hypervisor doesn't need to report #UD.
> > > Which is it?
> > >
> > >>>> 2. When the guest is running with shadow page tables enabled, the
> > >>>> hypervisor needs to handle the TLB flush based on the type of the
> > >>>> INVPCID instruction.
> > >>>>
> > >>>> AMD documentation for INVPCID feature is available at "AMD64
> > >>>> Architecture Programmer’s Manual Volume 2: System Programming,
> > >>>> Pub. 24593 Rev. 3.34(or later)"
> > >>>>
> > >>>> The documentation can be obtained at the links below:
> > >>>> Link: https://www.amd.com/system/files/TechDocs/24593.pdf
> > >>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> > >>>>
> > >>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> > >>>> ---
> > >>>>  arch/x86/include/asm/svm.h      |    4 ++++
> > >>>>  arch/x86/include/uapi/asm/svm.h |    2 ++
> > >>>>  arch/x86/kvm/svm/svm.c          |   42 +++++++++++++++++++++++++++++++++++++++
> > >>>>  3 files changed, 48 insertions(+)
> > >>>>
> > >>>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> > >>>> index 62649fba8908..6488094f67fa 100644
> > >>>> --- a/arch/x86/include/asm/svm.h
> > >>>> +++ b/arch/x86/include/asm/svm.h
> > >>>> @@ -55,6 +55,10 @@ enum {
> > >>>>         INTERCEPT_RDPRU,
> > >>>>  };
> > >>>>
> > >>>> +/* Extended Intercept bits */
> > >>>> +enum {
> > >>>> +       INTERCEPT_INVPCID = 2,
> > >>>> +};
> > >>>>
> > >>>>  struct __attribute__ ((__packed__)) vmcb_control_area {
> > >>>>         u32 intercept_cr;
> > >>>> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> > >>>> index 2e8a30f06c74..522d42dfc28c 100644
> > >>>> --- a/arch/x86/include/uapi/asm/svm.h
> > >>>> +++ b/arch/x86/include/uapi/asm/svm.h
> > >>>> @@ -76,6 +76,7 @@
> > >>>>  #define SVM_EXIT_MWAIT_COND    0x08c
> > >>>>  #define SVM_EXIT_XSETBV        0x08d
> > >>>>  #define SVM_EXIT_RDPRU         0x08e
> > >>>> +#define SVM_EXIT_INVPCID       0x0a2
> > >>>>  #define SVM_EXIT_NPF           0x400
> > >>>>  #define SVM_EXIT_AVIC_INCOMPLETE_IPI           0x401
> > >>>>  #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS     0x402
> > >>>> @@ -171,6 +172,7 @@
> > >>>>         { SVM_EXIT_MONITOR,     "monitor" }, \
> > >>>>         { SVM_EXIT_MWAIT,       "mwait" }, \
> > >>>>         { SVM_EXIT_XSETBV,      "xsetbv" }, \
> > >>>> +       { SVM_EXIT_INVPCID,     "invpcid" }, \
> > >>>>         { SVM_EXIT_NPF,         "npf" }, \
> > >>>>         { SVM_EXIT_AVIC_INCOMPLETE_IPI,         "avic_incomplete_ipi" }, \
> > >>>>         { SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
> > >>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > >>>> index 285e5e1ff518..82d974338f68 100644
> > >>>> --- a/arch/x86/kvm/svm/svm.c
> > >>>> +++ b/arch/x86/kvm/svm/svm.c
> > >>>> @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void)
> > >>>>         if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
> > >>>>             boot_cpu_has(X86_FEATURE_AMD_SSBD))
> > >>>>                 kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
> > >>>> +
> > >>>> +       /* Enable INVPCID if both PCID and INVPCID enabled */
> > >>>> +       if (boot_cpu_has(X86_FEATURE_PCID) &&
> > >>>> +           boot_cpu_has(X86_FEATURE_INVPCID))
> > >>>> +               kvm_cpu_cap_set(X86_FEATURE_INVPCID);
> > >>>>  }
> > >>>>
> > >>>>  static __init int svm_hardware_setup(void)
> > >>>> @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm)
> > >>>>                 clr_intercept(svm, INTERCEPT_PAUSE);
> > >>>>         }
> > >>>>
> > >>>> +       /*
> > >>>> +        * Intercept INVPCID instruction only if shadow page table is
> > >>>> +        * enabled. Interception is not required with nested page table.
> > >>>> +        */
> > >>>> +       if (boot_cpu_has(X86_FEATURE_INVPCID)) {
> > >>>> +               if (!npt_enabled)
> > >>>> +                       set_extended_intercept(svm, INTERCEPT_INVPCID);
> > >>>> +               else
> > >>>> +                       clr_extended_intercept(svm, INTERCEPT_INVPCID);
> > >>>> +       }
> > >>>> +
> > >>>>         if (kvm_vcpu_apicv_active(&svm->vcpu))
> > >>>>                 avic_init_vmcb(svm);
> > >>>>
> > >>>> @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm *svm)
> > >>>>         return nop_interception(svm);
> > >>>>  }
> > >>>>
> > >>>> +static int invpcid_interception(struct vcpu_svm *svm)
> > >>>> +{
> > >>>> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> > >>>> +       unsigned long type;
> > >>>> +       gva_t gva;
> > >>>> +
> > >>>> +       /*
> > >>>> +        * For an INVPCID intercept:
> > >>>> +        * EXITINFO1 provides the linear address of the memory operand.
> > >>>> +        * EXITINFO2 provides the contents of the register operand.
> > >>>> +        */
> > >>>> +       type = svm->vmcb->control.exit_info_2;
> > >>>> +       gva = svm->vmcb->control.exit_info_1;
> > >>>> +
> > >>>> +       return kvm_handle_invpcid_types(vcpu,  gva, type);
> > >>>> +}
> > >>>> +
> > >>>>  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> > >>>>         [SVM_EXIT_READ_CR0]                     = cr_interception,
> > >>>>         [SVM_EXIT_READ_CR3]                     = cr_interception,
> > >>>> @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> > >>>>         [SVM_EXIT_MWAIT]                        = mwait_interception,
> > >>>>         [SVM_EXIT_XSETBV]                       = xsetbv_interception,
> > >>>>         [SVM_EXIT_RDPRU]                        = rdpru_interception,
> > >>>> +       [SVM_EXIT_INVPCID]                      = invpcid_interception,
> > >>>>         [SVM_EXIT_NPF]                          = npf_interception,
> > >>>>         [SVM_EXIT_RSM]                          = rsm_interception,
> > >>>>         [SVM_EXIT_AVIC_INCOMPLETE_IPI]          =
> > >>> avic_incomplete_ipi_interception,
> > >>>> @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
> > >>>>         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> > >>>>                              guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS);
> > >>>>
> > >>>> +       /*
> > >>>> +        * Intercept the INVPCID instruction if the bare metal supports it
> > >>>> +        * but the guest doesn't claim the feature.
> > >>>> +        */
> > >>>> +       if (boot_cpu_has(X86_FEATURE_INVPCID) &&
> > >>>> +           !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID))
> > >>>> +               set_extended_intercept(svm, INTERCEPT_INVPCID);
> > >>>> +
> > >>>
> > >>> What if INVPCID is enabled in the guest CPUID later? Shouldn't we then
> > >>> clear this intercept bit?
> > >>
> > >> I assume the feature enable comes through the same code path as this. I
> > >> can add an "if/else" check here if that is what you are suggesting.
> > >
> > > Yes, that's what I'm suggesting.
> > >
> > >>>
> > >>>>         if (!kvm_vcpu_apicv_active(vcpu))
> > >>>>                 return;
> > >>>>
> > >>>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
  2020-06-15 13:46             ` Paolo Bonzini
@ 2020-06-15 14:59               ` Babu Moger
  0 siblings, 0 replies; 14+ messages in thread
From: Babu Moger @ 2020-06-15 14:59 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Mattson
  Cc: Wanpeng Li, Joerg Roedel, the arch/x86 maintainers,
	Sean Christopherson, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Vitaly Kuznetsov, Thomas Gleixner, LKML,
	kvm list



> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: Monday, June 15, 2020 8:47 AM
> To: Jim Mattson <jmattson@google.com>; Moger, Babu
> <Babu.Moger@amd.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>; Joerg Roedel <joro@8bytes.org>;
> the arch/x86 maintainers <x86@kernel.org>; Sean Christopherson
> <sean.j.christopherson@intel.com>; Ingo Molnar <mingo@redhat.com>;
> Borislav Petkov <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Vitaly
> Kuznetsov <vkuznets@redhat.com>; Thomas Gleixner <tglx@linutronix.de>;
> LKML <linux-kernel@vger.kernel.org>; kvm list <kvm@vger.kernel.org>
> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
> 
> On 13/06/20 02:04, Jim Mattson wrote:
> >> I think I misunderstood this part. I was not intending to change the
> >> #GP behaviour, so I will remove this part. My intention with this series
> >> is to handle INVPCID in shadow paging mode. I have verified that part.
> >> I hope I did not miss anything else.
> > You don't really have to intercept INVPCID when tdp is in use, right?
> > There are certainly plenty of operations for which kvm does not
> > properly raise #UD when they aren't enumerated in the guest CPUID.
> >
> 
> Indeed; for RDRAND and RDSEED it makes sense to do so because the
> instructions may introduce undesirable nondeterminism (or block all the
> cores in your package, as they have been doing for a few weeks).
> Therefore on Intel we trap them and raise #UD; on AMD this is not
> possible because RDRAND has no intercept.
> 
> In general, however, we do not try too hard to raise #UD, and that is
> usually not even possible.

Yes. Sure. Thanks.
> 
> Paolo


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-06-15 14:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11 21:48 [PATCH 0/3] INVPCID support for the AMD guests Babu Moger
2020-06-11 21:48 ` [PATCH 1/3] KVM: X86: Move handling of INVPCID types to x86 Babu Moger
2020-06-12 18:02   ` Jim Mattson
2020-06-12 20:06     ` Babu Moger
2020-06-11 21:48 ` [PATCH 2/3] KVM:SVM: Add extended intercept support Babu Moger
2020-06-11 21:48 ` [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD Babu Moger
2020-06-11 23:50   ` Jim Mattson
2020-06-12 19:35     ` Babu Moger
2020-06-12 20:10       ` Jim Mattson
2020-06-12 21:47         ` Babu Moger
2020-06-13  0:04           ` Jim Mattson
2020-06-15 13:46             ` Paolo Bonzini
2020-06-15 14:59               ` Babu Moger
2020-06-15 13:47             ` Babu Moger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).