kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 00/12] SVM cleanup and INVPCID feature support
@ 2020-09-11 19:27 Babu Moger
  2020-09-11 19:27 ` [PATCH v6 01/12] KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept) Babu Moger
                   ` (12 more replies)
  0 siblings, 13 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:27 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

The following series adds the support for PCID/INVPCID on AMD guests.
While doing it re-structured the vmcb_control_area data structure to
combine all the intercept vectors into one 32 bit array. Makes it easy
for future additions. Re-arranged few pcid related code to make it common
between SVM and VMX.

INVPCID interceptions are added only when the guest is running with shadow
page table enabled. In this case the hypervisor needs to handle the tlbflush
based on the type of invpcid instruction.

For the guests with nested page table (NPT) support, the INVPCID feature
works as running it natively. KVM does not need to do any special handling.

AMD documentation for INVPCID feature is available at "AMD64 Architecture
Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---

v6:
 One minor change in patch #04. Otherwise same as v5.
 Updated all the patches by Reviewed-by.

v5:
 https://lore.kernel.org/lkml/159846887637.18873.14677728679411578606.stgit@bmoger-ubuntu/
 All the changes are related to rebase.
 Aplies cleanly on mainline and kvm(master) tree. 
 Resending it to get some attention.

v4:
 https://lore.kernel.org/lkml/159676101387.12805.18038347880482984693.stgit@bmoger-ubuntu/
 1. Changed the functions __set_intercept/__clr_intercept/__is_intercept to
    to vmcb_set_intercept/vmcb_clr_intercept/vmcb_is_intercept by passing
    vmcb_control_area structure(Suggested by Paolo).
 2. Rearranged the commit 7a35e515a7055 ("KVM: VMX: Properly handle kvm_read/write_guest_virt*())
    to make it common across both SVM/VMX(Suggested by Jim Mattson).
 3. Took care of few other comments from Jim Mattson. Dropped "Reviewed-by"
    on few patches which I have changed since v3.

v3:
 https://lore.kernel.org/lkml/159597929496.12744.14654593948763926416.stgit@bmoger-ubuntu/
 1. Addressing the comments from Jim Mattson. Follow the v2 link below
    for the context.
 2. Introduced the generic __set_intercept, __clr_intercept and is_intercept
    using native __set_bit, clear_bit and test_bit.
 3. Combined all the intercepts vectors into single 32 bit array.
 4. Removed set_intercept_cr, clr_intercept_cr, set_exception_intercepts,
    clr_exception_intercept etc. Used the generic set_intercept and
    clr_intercept where applicable.
 5. Tested both L1 guest and l2 nested guests. 

v2:
  https://lore.kernel.org/lkml/159234483706.6230.13753828995249423191.stgit@bmoger-ubuntu/
  - Taken care of few comments from Jim Mattson.
  - KVM interceptions added only when tdp is off. No interceptions
    when tdp is on.
  - Reverted the fault priority to original order in VMX. 
  
v1:
  https://lore.kernel.org/lkml/159191202523.31436.11959784252237488867.stgit@bmoger-ubuntu/

Babu Moger (12):
      KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)
      KVM: SVM: Change intercept_cr to generic intercepts
      KVM: SVM: Change intercept_dr to generic intercepts
      KVM: SVM: Modify intercept_exceptions to generic intercepts
      KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors
      KVM: SVM: Add new intercept vector in vmcb_control_area
      KVM: nSVM: Cleanup nested_state data structure
      KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept
      KVM: SVM: Remove set_exception_intercept and clr_exception_intercept
      KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c
      KVM: X86: Move handling of INVPCID types to x86
      KVM:SVM: Enable INVPCID feature on AMD


 arch/x86/include/asm/svm.h      |  117 +++++++++++++++++++++++++----------
 arch/x86/include/uapi/asm/svm.h |    2 +
 arch/x86/kvm/svm/nested.c       |   66 +++++++++-----------
 arch/x86/kvm/svm/svm.c          |  131 ++++++++++++++++++++++++++-------------
 arch/x86/kvm/svm/svm.h          |   87 +++++++++-----------------
 arch/x86/kvm/trace.h            |   21 ++++--
 arch/x86/kvm/vmx/nested.c       |   12 ++--
 arch/x86/kvm/vmx/vmx.c          |   95 ----------------------------
 arch/x86/kvm/vmx/vmx.h          |    2 -
 arch/x86/kvm/x86.c              |  106 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h              |    3 +
 11 files changed, 364 insertions(+), 278 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v6 01/12] KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
@ 2020-09-11 19:27 ` Babu Moger
  2020-09-11 19:28 ` [PATCH v6 02/12] KVM: SVM: Change intercept_cr to generic intercepts Babu Moger
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:27 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

This is in preparation for the future intercept vector additions.

Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
using kernel APIs __set_bit, __clear_bit and test_bit espectively.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/svm.h |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a798e1731709..1cff7644e70b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -214,6 +214,21 @@ static inline struct vmcb *get_host_vmcb(struct vcpu_svm *svm)
 		return svm->vmcb;
 }
 
+static inline void vmcb_set_intercept(struct vmcb_control_area *control, int bit)
+{
+	__set_bit(bit, (unsigned long *)&control->intercept_cr);
+}
+
+static inline void vmcb_clr_intercept(struct vmcb_control_area *control, int bit)
+{
+	__clear_bit(bit, (unsigned long *)&control->intercept_cr);
+}
+
+static inline bool vmcb_is_intercept(struct vmcb_control_area *control, int bit)
+{
+	return test_bit(bit, (unsigned long *)&control->intercept_cr);
+}
+
 static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 02/12] KVM: SVM: Change intercept_cr to generic intercepts
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
  2020-09-11 19:27 ` [PATCH v6 01/12] KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept) Babu Moger
@ 2020-09-11 19:28 ` Babu Moger
  2020-09-11 19:28 ` [PATCH v6 03/12] KVM: SVM: Change intercept_dr " Babu Moger
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:28 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

Change intercept_cr to generic intercepts in vmcb_control_area.
Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
where applicable.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/asm/svm.h |   42 ++++++++++++++++++++++++++++++++----------
 arch/x86/kvm/svm/nested.c  |   26 +++++++++++++++++---------
 arch/x86/kvm/svm/svm.c     |    4 ++--
 arch/x86/kvm/svm/svm.h     |   12 ++++++------
 4 files changed, 57 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 8a1f5382a4ea..d4739f4eae63 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -4,6 +4,37 @@
 
 #include <uapi/asm/svm.h>
 
+/*
+ * VMCB Control Area intercept bits starting
+ * at Byte offset 000h (Vector 0).
+ */
+
+enum vector_offset {
+	CR_VECTOR = 0,
+	MAX_VECTORS,
+};
+
+enum {
+	/* Byte offset 000h (Vector 0) */
+	INTERCEPT_CR0_READ = 0,
+	INTERCEPT_CR1_READ,
+	INTERCEPT_CR2_READ,
+	INTERCEPT_CR3_READ,
+	INTERCEPT_CR4_READ,
+	INTERCEPT_CR5_READ,
+	INTERCEPT_CR6_READ,
+	INTERCEPT_CR7_READ,
+	INTERCEPT_CR8_READ,
+	INTERCEPT_CR0_WRITE = 16,
+	INTERCEPT_CR1_WRITE,
+	INTERCEPT_CR2_WRITE,
+	INTERCEPT_CR3_WRITE,
+	INTERCEPT_CR4_WRITE,
+	INTERCEPT_CR5_WRITE,
+	INTERCEPT_CR6_WRITE,
+	INTERCEPT_CR7_WRITE,
+	INTERCEPT_CR8_WRITE,
+};
 
 enum {
 	INTERCEPT_INTR,
@@ -57,7 +88,7 @@ enum {
 
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
-	u32 intercept_cr;
+	u32 intercepts[MAX_VECTORS];
 	u32 intercept_dr;
 	u32 intercept_exceptions;
 	u64 intercept;
@@ -240,15 +271,6 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_SELECTOR_READ_MASK SVM_SELECTOR_WRITE_MASK
 #define SVM_SELECTOR_CODE_MASK (1 << 3)
 
-#define INTERCEPT_CR0_READ	0
-#define INTERCEPT_CR3_READ	3
-#define INTERCEPT_CR4_READ	4
-#define INTERCEPT_CR8_READ	8
-#define INTERCEPT_CR0_WRITE	(16 + 0)
-#define INTERCEPT_CR3_WRITE	(16 + 3)
-#define INTERCEPT_CR4_WRITE	(16 + 4)
-#define INTERCEPT_CR8_WRITE	(16 + 8)
-
 #define INTERCEPT_DR0_READ	0
 #define INTERCEPT_DR1_READ	1
 #define INTERCEPT_DR2_READ	2
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fb68467e6049..5f65b759abcb 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -98,6 +98,7 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
 void recalc_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *c, *h, *g;
+	unsigned int i;
 
 	vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 
@@ -110,15 +111,17 @@ void recalc_intercepts(struct vcpu_svm *svm)
 
 	svm->nested.host_intercept_exceptions = h->intercept_exceptions;
 
-	c->intercept_cr = h->intercept_cr;
+	for (i = 0; i < MAX_VECTORS; i++)
+		c->intercepts[i] = h->intercepts[i];
+
 	c->intercept_dr = h->intercept_dr;
 	c->intercept_exceptions = h->intercept_exceptions;
 	c->intercept = h->intercept;
 
 	if (g->int_ctl & V_INTR_MASKING_MASK) {
 		/* We only want the cr8 intercept bits of L1 */
-		c->intercept_cr &= ~(1U << INTERCEPT_CR8_READ);
-		c->intercept_cr &= ~(1U << INTERCEPT_CR8_WRITE);
+		vmcb_clr_intercept(c, INTERCEPT_CR8_READ);
+		vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
 
 		/*
 		 * Once running L2 with HF_VINTR_MASK, EFLAGS.IF does not
@@ -131,7 +134,9 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	/* We don't want to see VMMCALLs from a nested guest */
 	c->intercept &= ~(1ULL << INTERCEPT_VMMCALL);
 
-	c->intercept_cr |= g->intercept_cr;
+	for (i = 0; i < MAX_VECTORS; i++)
+		c->intercepts[i] |= g->intercepts[i];
+
 	c->intercept_dr |= g->intercept_dr;
 	c->intercept_exceptions |= g->intercept_exceptions;
 	c->intercept |= g->intercept;
@@ -140,7 +145,11 @@ void recalc_intercepts(struct vcpu_svm *svm)
 static void copy_vmcb_control_area(struct vmcb_control_area *dst,
 				   struct vmcb_control_area *from)
 {
-	dst->intercept_cr         = from->intercept_cr;
+	unsigned int i;
+
+	for (i = 0; i < MAX_VECTORS; i++)
+		dst->intercepts[i] = from->intercepts[i];
+
 	dst->intercept_dr         = from->intercept_dr;
 	dst->intercept_exceptions = from->intercept_exceptions;
 	dst->intercept            = from->intercept;
@@ -487,8 +496,8 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 			       nested_vmcb->control.event_inj,
 			       nested_vmcb->control.nested_ctl);
 
-	trace_kvm_nested_intercepts(nested_vmcb->control.intercept_cr & 0xffff,
-				    nested_vmcb->control.intercept_cr >> 16,
+	trace_kvm_nested_intercepts(nested_vmcb->control.intercepts[CR_VECTOR] & 0xffff,
+				    nested_vmcb->control.intercepts[CR_VECTOR] >> 16,
 				    nested_vmcb->control.intercept_exceptions,
 				    nested_vmcb->control.intercept);
 
@@ -765,8 +774,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 		vmexit = nested_svm_intercept_ioio(svm);
 		break;
 	case SVM_EXIT_READ_CR0 ... SVM_EXIT_WRITE_CR8: {
-		u32 bit = 1U << (exit_code - SVM_EXIT_READ_CR0);
-		if (svm->nested.ctl.intercept_cr & bit)
+		if (vmcb_is_intercept(&svm->nested.ctl, exit_code))
 			vmexit = NESTED_EXIT_DONE;
 		break;
 	}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 03dd7bac8034..523936b80dda 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2813,8 +2813,8 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	}
 
 	pr_err("VMCB Control Area:\n");
-	pr_err("%-20s%04x\n", "cr_read:", control->intercept_cr & 0xffff);
-	pr_err("%-20s%04x\n", "cr_write:", control->intercept_cr >> 16);
+	pr_err("%-20s%04x\n", "cr_read:", control->intercepts[CR_VECTOR] & 0xffff);
+	pr_err("%-20s%04x\n", "cr_write:", control->intercepts[CR_VECTOR] >> 16);
 	pr_err("%-20s%04x\n", "dr_read:", control->intercept_dr & 0xffff);
 	pr_err("%-20s%04x\n", "dr_write:", control->intercept_dr >> 16);
 	pr_err("%-20s%08x\n", "exceptions:", control->intercept_exceptions);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1cff7644e70b..e775c502a074 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -216,24 +216,24 @@ static inline struct vmcb *get_host_vmcb(struct vcpu_svm *svm)
 
 static inline void vmcb_set_intercept(struct vmcb_control_area *control, int bit)
 {
-	__set_bit(bit, (unsigned long *)&control->intercept_cr);
+	__set_bit(bit, (unsigned long *)&control->intercepts);
 }
 
 static inline void vmcb_clr_intercept(struct vmcb_control_area *control, int bit)
 {
-	__clear_bit(bit, (unsigned long *)&control->intercept_cr);
+	__clear_bit(bit, (unsigned long *)&control->intercepts);
 }
 
 static inline bool vmcb_is_intercept(struct vmcb_control_area *control, int bit)
 {
-	return test_bit(bit, (unsigned long *)&control->intercept_cr);
+	return test_bit(bit, (unsigned long *)&control->intercepts);
 }
 
 static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	vmcb->control.intercept_cr |= (1U << bit);
+	vmcb_set_intercept(&vmcb->control, bit);
 
 	recalc_intercepts(svm);
 }
@@ -242,7 +242,7 @@ static inline void clr_cr_intercept(struct vcpu_svm *svm, int bit)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	vmcb->control.intercept_cr &= ~(1U << bit);
+	vmcb_clr_intercept(&vmcb->control, bit);
 
 	recalc_intercepts(svm);
 }
@@ -251,7 +251,7 @@ static inline bool is_cr_intercept(struct vcpu_svm *svm, int bit)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	return vmcb->control.intercept_cr & (1U << bit);
+	return vmcb_is_intercept(&vmcb->control, bit);
 }
 
 static inline void set_dr_intercepts(struct vcpu_svm *svm)


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 03/12] KVM: SVM: Change intercept_dr to generic intercepts
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
  2020-09-11 19:27 ` [PATCH v6 01/12] KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept) Babu Moger
  2020-09-11 19:28 ` [PATCH v6 02/12] KVM: SVM: Change intercept_cr to generic intercepts Babu Moger
@ 2020-09-11 19:28 ` Babu Moger
  2020-09-11 19:28 ` [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions " Babu Moger
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:28 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

Modify intercept_dr to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept
to set/clear/test the intercept_dr bits.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/asm/svm.h |   36 ++++++++++++++++++------------------
 arch/x86/kvm/svm/nested.c  |    6 +-----
 arch/x86/kvm/svm/svm.c     |    4 ++--
 arch/x86/kvm/svm/svm.h     |   34 +++++++++++++++++-----------------
 4 files changed, 38 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index d4739f4eae63..ffc89d8e4fcb 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -11,6 +11,7 @@
 
 enum vector_offset {
 	CR_VECTOR = 0,
+	DR_VECTOR,
 	MAX_VECTORS,
 };
 
@@ -34,6 +35,23 @@ enum {
 	INTERCEPT_CR6_WRITE,
 	INTERCEPT_CR7_WRITE,
 	INTERCEPT_CR8_WRITE,
+	/* Byte offset 004h (Vector 1) */
+	INTERCEPT_DR0_READ = 32,
+	INTERCEPT_DR1_READ,
+	INTERCEPT_DR2_READ,
+	INTERCEPT_DR3_READ,
+	INTERCEPT_DR4_READ,
+	INTERCEPT_DR5_READ,
+	INTERCEPT_DR6_READ,
+	INTERCEPT_DR7_READ,
+	INTERCEPT_DR0_WRITE = 48,
+	INTERCEPT_DR1_WRITE,
+	INTERCEPT_DR2_WRITE,
+	INTERCEPT_DR3_WRITE,
+	INTERCEPT_DR4_WRITE,
+	INTERCEPT_DR5_WRITE,
+	INTERCEPT_DR6_WRITE,
+	INTERCEPT_DR7_WRITE,
 };
 
 enum {
@@ -89,7 +107,6 @@ enum {
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 intercepts[MAX_VECTORS];
-	u32 intercept_dr;
 	u32 intercept_exceptions;
 	u64 intercept;
 	u8 reserved_1[40];
@@ -271,23 +288,6 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_SELECTOR_READ_MASK SVM_SELECTOR_WRITE_MASK
 #define SVM_SELECTOR_CODE_MASK (1 << 3)
 
-#define INTERCEPT_DR0_READ	0
-#define INTERCEPT_DR1_READ	1
-#define INTERCEPT_DR2_READ	2
-#define INTERCEPT_DR3_READ	3
-#define INTERCEPT_DR4_READ	4
-#define INTERCEPT_DR5_READ	5
-#define INTERCEPT_DR6_READ	6
-#define INTERCEPT_DR7_READ	7
-#define INTERCEPT_DR0_WRITE	(16 + 0)
-#define INTERCEPT_DR1_WRITE	(16 + 1)
-#define INTERCEPT_DR2_WRITE	(16 + 2)
-#define INTERCEPT_DR3_WRITE	(16 + 3)
-#define INTERCEPT_DR4_WRITE	(16 + 4)
-#define INTERCEPT_DR5_WRITE	(16 + 5)
-#define INTERCEPT_DR6_WRITE	(16 + 6)
-#define INTERCEPT_DR7_WRITE	(16 + 7)
-
 #define SVM_EVTINJ_VEC_MASK 0xff
 
 #define SVM_EVTINJ_TYPE_SHIFT 8
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5f65b759abcb..ba11fc3bf843 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -114,7 +114,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	for (i = 0; i < MAX_VECTORS; i++)
 		c->intercepts[i] = h->intercepts[i];
 
-	c->intercept_dr = h->intercept_dr;
 	c->intercept_exceptions = h->intercept_exceptions;
 	c->intercept = h->intercept;
 
@@ -137,7 +136,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	for (i = 0; i < MAX_VECTORS; i++)
 		c->intercepts[i] |= g->intercepts[i];
 
-	c->intercept_dr |= g->intercept_dr;
 	c->intercept_exceptions |= g->intercept_exceptions;
 	c->intercept |= g->intercept;
 }
@@ -150,7 +148,6 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
 	for (i = 0; i < MAX_VECTORS; i++)
 		dst->intercepts[i] = from->intercepts[i];
 
-	dst->intercept_dr         = from->intercept_dr;
 	dst->intercept_exceptions = from->intercept_exceptions;
 	dst->intercept            = from->intercept;
 	dst->iopm_base_pa         = from->iopm_base_pa;
@@ -779,8 +776,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 		break;
 	}
 	case SVM_EXIT_READ_DR0 ... SVM_EXIT_WRITE_DR7: {
-		u32 bit = 1U << (exit_code - SVM_EXIT_READ_DR0);
-		if (svm->nested.ctl.intercept_dr & bit)
+		if (vmcb_is_intercept(&svm->nested.ctl, exit_code))
 			vmexit = NESTED_EXIT_DONE;
 		break;
 	}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 523936b80dda..1a5f3908b388 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2815,8 +2815,8 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("VMCB Control Area:\n");
 	pr_err("%-20s%04x\n", "cr_read:", control->intercepts[CR_VECTOR] & 0xffff);
 	pr_err("%-20s%04x\n", "cr_write:", control->intercepts[CR_VECTOR] >> 16);
-	pr_err("%-20s%04x\n", "dr_read:", control->intercept_dr & 0xffff);
-	pr_err("%-20s%04x\n", "dr_write:", control->intercept_dr >> 16);
+	pr_err("%-20s%04x\n", "dr_read:", control->intercepts[DR_VECTOR] & 0xffff);
+	pr_err("%-20s%04x\n", "dr_write:", control->intercepts[DR_VECTOR] >> 16);
 	pr_err("%-20s%08x\n", "exceptions:", control->intercept_exceptions);
 	pr_err("%-20s%016llx\n", "intercepts:", control->intercept);
 	pr_err("%-20s%d\n", "pause filter count:", control->pause_filter_count);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e775c502a074..d3b34e0276c5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -258,22 +258,22 @@ static inline void set_dr_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	vmcb->control.intercept_dr = (1 << INTERCEPT_DR0_READ)
-		| (1 << INTERCEPT_DR1_READ)
-		| (1 << INTERCEPT_DR2_READ)
-		| (1 << INTERCEPT_DR3_READ)
-		| (1 << INTERCEPT_DR4_READ)
-		| (1 << INTERCEPT_DR5_READ)
-		| (1 << INTERCEPT_DR6_READ)
-		| (1 << INTERCEPT_DR7_READ)
-		| (1 << INTERCEPT_DR0_WRITE)
-		| (1 << INTERCEPT_DR1_WRITE)
-		| (1 << INTERCEPT_DR2_WRITE)
-		| (1 << INTERCEPT_DR3_WRITE)
-		| (1 << INTERCEPT_DR4_WRITE)
-		| (1 << INTERCEPT_DR5_WRITE)
-		| (1 << INTERCEPT_DR6_WRITE)
-		| (1 << INTERCEPT_DR7_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
 
 	recalc_intercepts(svm);
 }
@@ -282,7 +282,7 @@ static inline void clr_dr_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	vmcb->control.intercept_dr = 0;
+	vmcb->control.intercepts[DR_VECTOR] = 0;
 
 	recalc_intercepts(svm);
 }


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
                   ` (2 preceding siblings ...)
  2020-09-11 19:28 ` [PATCH v6 03/12] KVM: SVM: Change intercept_dr " Babu Moger
@ 2020-09-11 19:28 ` Babu Moger
  2020-09-12 16:52   ` Paolo Bonzini
  2020-09-11 19:28 ` [PATCH v6 05/12] KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors Babu Moger
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:28 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use
the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to
set/clear/test the intercept_exceptions bits.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/asm/svm.h |   22 +++++++++++++++++++++-
 arch/x86/kvm/svm/nested.c  |   12 +++++-------
 arch/x86/kvm/svm/svm.c     |   22 +++++++++++-----------
 arch/x86/kvm/svm/svm.h     |    4 ++--
 4 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ffc89d8e4fcb..51833a611eba 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -3,6 +3,7 @@
 #define __SVM_H
 
 #include <uapi/asm/svm.h>
+#include <uapi/asm/kvm.h>
 
 /*
  * VMCB Control Area intercept bits starting
@@ -12,6 +13,7 @@
 enum vector_offset {
 	CR_VECTOR = 0,
 	DR_VECTOR,
+	EXCEPTION_VECTOR,
 	MAX_VECTORS,
 };
 
@@ -52,6 +54,25 @@ enum {
 	INTERCEPT_DR5_WRITE,
 	INTERCEPT_DR6_WRITE,
 	INTERCEPT_DR7_WRITE,
+	/* Byte offset 008h (Vector 2) */
+	INTERCEPT_DE_VECTOR = 64 + DE_VECTOR,
+	INTERCEPT_DB_VECTOR = 64 + DB_VECTOR,
+	INTERCEPT_BP_VECTOR = 64 + BP_VECTOR,
+	INTERCEPT_OF_VECTOR = 64 + OF_VECTOR,
+	INTERCEPT_BR_VECTOR = 64 + BR_VECTOR,
+	INTERCEPT_UD_VECTOR = 64 + UD_VECTOR,
+	INTERCEPT_NM_VECTOR = 64 + NM_VECTOR,
+	INTERCEPT_DF_VECTOR = 64 + DF_VECTOR,
+	INTERCEPT_TS_VECTOR = 64 + TS_VECTOR,
+	INTERCEPT_NP_VECTOR = 64 + NP_VECTOR,
+	INTERCEPT_SS_VECTOR = 64 + SS_VECTOR,
+	INTERCEPT_GP_VECTOR = 64 + GP_VECTOR,
+	INTERCEPT_PF_VECTOR = 64 + PF_VECTOR,
+	INTERCEPT_MF_VECTOR = 64 + MF_VECTOR,
+	INTERCEPT_AC_VECTOR = 64 + AC_VECTOR,
+	INTERCEPT_MC_VECTOR = 64 + MC_VECTOR,
+	INTERCEPT_XM_VECTOR = 64 + XM_VECTOR,
+	INTERCEPT_VE_VECTOR = 64 + VE_VECTOR,
 };
 
 enum {
@@ -107,7 +128,6 @@ enum {
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 intercepts[MAX_VECTORS];
-	u32 intercept_exceptions;
 	u64 intercept;
 	u8 reserved_1[40];
 	u16 pause_filter_thresh;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ba11fc3bf843..c161bd38f401 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -109,12 +109,11 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	h = &svm->nested.hsave->control;
 	g = &svm->nested.ctl;
 
-	svm->nested.host_intercept_exceptions = h->intercept_exceptions;
+	svm->nested.host_intercept_exceptions = h->intercepts[EXCEPTION_VECTOR];
 
 	for (i = 0; i < MAX_VECTORS; i++)
 		c->intercepts[i] = h->intercepts[i];
 
-	c->intercept_exceptions = h->intercept_exceptions;
 	c->intercept = h->intercept;
 
 	if (g->int_ctl & V_INTR_MASKING_MASK) {
@@ -136,7 +135,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	for (i = 0; i < MAX_VECTORS; i++)
 		c->intercepts[i] |= g->intercepts[i];
 
-	c->intercept_exceptions |= g->intercept_exceptions;
 	c->intercept |= g->intercept;
 }
 
@@ -148,7 +146,6 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
 	for (i = 0; i < MAX_VECTORS; i++)
 		dst->intercepts[i] = from->intercepts[i];
 
-	dst->intercept_exceptions = from->intercept_exceptions;
 	dst->intercept            = from->intercept;
 	dst->iopm_base_pa         = from->iopm_base_pa;
 	dst->msrpm_base_pa        = from->msrpm_base_pa;
@@ -495,7 +492,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 
 	trace_kvm_nested_intercepts(nested_vmcb->control.intercepts[CR_VECTOR] & 0xffff,
 				    nested_vmcb->control.intercepts[CR_VECTOR] >> 16,
-				    nested_vmcb->control.intercept_exceptions,
+				    nested_vmcb->control.intercepts[EXCEPTION_VECTOR],
 				    nested_vmcb->control.intercept);
 
 	/* Clear internal status */
@@ -835,7 +832,7 @@ static bool nested_exit_on_exception(struct vcpu_svm *svm)
 {
 	unsigned int nr = svm->vcpu.arch.exception.nr;
 
-	return (svm->nested.ctl.intercept_exceptions & (1 << nr));
+	return (svm->nested.ctl.intercepts[EXCEPTION_VECTOR] & BIT(nr));
 }
 
 static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)
@@ -984,7 +981,8 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
 	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
 		u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
 
-		if (get_host_vmcb(svm)->control.intercept_exceptions & excp_bits)
+		if (get_host_vmcb(svm)->control.intercepts[EXCEPTION_VECTOR] &
+				excp_bits)
 			return NESTED_EXIT_HOST;
 		else if (exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR &&
 			 svm->vcpu.arch.apf.host_apf_flags)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1a5f3908b388..11892e86cb39 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1003,11 +1003,11 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	set_dr_intercepts(svm);
 
-	set_exception_intercept(svm, PF_VECTOR);
-	set_exception_intercept(svm, UD_VECTOR);
-	set_exception_intercept(svm, MC_VECTOR);
-	set_exception_intercept(svm, AC_VECTOR);
-	set_exception_intercept(svm, DB_VECTOR);
+	set_exception_intercept(svm, INTERCEPT_PF_VECTOR);
+	set_exception_intercept(svm, INTERCEPT_UD_VECTOR);
+	set_exception_intercept(svm, INTERCEPT_MC_VECTOR);
+	set_exception_intercept(svm, INTERCEPT_AC_VECTOR);
+	set_exception_intercept(svm, INTERCEPT_DB_VECTOR);
 	/*
 	 * Guest access to VMware backdoor ports could legitimately
 	 * trigger #GP because of TSS I/O permission bitmap.
@@ -1015,7 +1015,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	 * as VMware does.
 	 */
 	if (enable_vmware_backdoor)
-		set_exception_intercept(svm, GP_VECTOR);
+		set_exception_intercept(svm, INTERCEPT_GP_VECTOR);
 
 	svm_set_intercept(svm, INTERCEPT_INTR);
 	svm_set_intercept(svm, INTERCEPT_NMI);
@@ -1093,7 +1093,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 		/* Setup VMCB for Nested Paging */
 		control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
 		svm_clr_intercept(svm, INTERCEPT_INVLPG);
-		clr_exception_intercept(svm, PF_VECTOR);
+		clr_exception_intercept(svm, INTERCEPT_PF_VECTOR);
 		clr_cr_intercept(svm, INTERCEPT_CR3_READ);
 		clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
 		save->g_pat = svm->vcpu.arch.pat;
@@ -1135,7 +1135,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	if (sev_guest(svm->vcpu.kvm)) {
 		svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
-		clr_exception_intercept(svm, UD_VECTOR);
+		clr_exception_intercept(svm, INTERCEPT_UD_VECTOR);
 	}
 
 	vmcb_mark_all_dirty(svm->vmcb);
@@ -1646,11 +1646,11 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	clr_exception_intercept(svm, BP_VECTOR);
+	clr_exception_intercept(svm, INTERCEPT_BP_VECTOR);
 
 	if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
 		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
-			set_exception_intercept(svm, BP_VECTOR);
+			set_exception_intercept(svm, INTERCEPT_BP_VECTOR);
 	}
 }
 
@@ -2817,7 +2817,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%04x\n", "cr_write:", control->intercepts[CR_VECTOR] >> 16);
 	pr_err("%-20s%04x\n", "dr_read:", control->intercepts[DR_VECTOR] & 0xffff);
 	pr_err("%-20s%04x\n", "dr_write:", control->intercepts[DR_VECTOR] >> 16);
-	pr_err("%-20s%08x\n", "exceptions:", control->intercept_exceptions);
+	pr_err("%-20s%08x\n", "exceptions:", control->intercepts[EXCEPTION_VECTOR]);
 	pr_err("%-20s%016llx\n", "intercepts:", control->intercept);
 	pr_err("%-20s%d\n", "pause filter count:", control->pause_filter_count);
 	pr_err("%-20s%d\n", "pause filter threshold:",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d3b34e0276c5..2fc305f647a3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -291,7 +291,7 @@ static inline void set_exception_intercept(struct vcpu_svm *svm, int bit)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	vmcb->control.intercept_exceptions |= (1U << bit);
+	vmcb_set_intercept(&vmcb->control, bit);
 
 	recalc_intercepts(svm);
 }
@@ -300,7 +300,7 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, int bit)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	vmcb->control.intercept_exceptions &= ~(1U << bit);
+	vmcb_clr_intercept(&vmcb->control, bit);
 
 	recalc_intercepts(svm);
 }


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 05/12] KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
                   ` (3 preceding siblings ...)
  2020-09-11 19:28 ` [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions " Babu Moger
@ 2020-09-11 19:28 ` Babu Moger
  2020-09-11 19:28 ` [PATCH v6 06/12] KVM: SVM: Add new intercept vector in vmcb_control_area Babu Moger
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:28 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

Convert all the intercepts to one array of 32 bit vectors in
vmcb_control_area. This makes it easy for future intercept vector
additions. Also update trace functions.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/asm/svm.h |   14 +++++++-------
 arch/x86/kvm/svm/nested.c  |   25 ++++++++++---------------
 arch/x86/kvm/svm/svm.c     |   16 ++++++----------
 arch/x86/kvm/svm/svm.h     |   12 ++++++------
 arch/x86/kvm/trace.h       |   18 +++++++++++-------
 5 files changed, 40 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 51833a611eba..9f0fa02fc838 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -14,6 +14,8 @@ enum vector_offset {
 	CR_VECTOR = 0,
 	DR_VECTOR,
 	EXCEPTION_VECTOR,
+	INTERCEPT_VECTOR_3,
+	INTERCEPT_VECTOR_4,
 	MAX_VECTORS,
 };
 
@@ -73,10 +75,8 @@ enum {
 	INTERCEPT_MC_VECTOR = 64 + MC_VECTOR,
 	INTERCEPT_XM_VECTOR = 64 + XM_VECTOR,
 	INTERCEPT_VE_VECTOR = 64 + VE_VECTOR,
-};
-
-enum {
-	INTERCEPT_INTR,
+	/* Byte offset 00Ch (Vector 3) */
+	INTERCEPT_INTR = 96,
 	INTERCEPT_NMI,
 	INTERCEPT_SMI,
 	INTERCEPT_INIT,
@@ -108,7 +108,8 @@ enum {
 	INTERCEPT_TASK_SWITCH,
 	INTERCEPT_FERR_FREEZE,
 	INTERCEPT_SHUTDOWN,
-	INTERCEPT_VMRUN,
+	/* Byte offset 010h (Vector 4) */
+	INTERCEPT_VMRUN = 128,
 	INTERCEPT_VMMCALL,
 	INTERCEPT_VMLOAD,
 	INTERCEPT_VMSAVE,
@@ -128,8 +129,7 @@ enum {
 
 struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 intercepts[MAX_VECTORS];
-	u64 intercept;
-	u8 reserved_1[40];
+	u32 reserved_1[15 - MAX_VECTORS];
 	u16 pause_filter_thresh;
 	u16 pause_filter_count;
 	u64 iopm_base_pa;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c161bd38f401..353d550a5bb7 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -114,8 +114,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	for (i = 0; i < MAX_VECTORS; i++)
 		c->intercepts[i] = h->intercepts[i];
 
-	c->intercept = h->intercept;
-
 	if (g->int_ctl & V_INTR_MASKING_MASK) {
 		/* We only want the cr8 intercept bits of L1 */
 		vmcb_clr_intercept(c, INTERCEPT_CR8_READ);
@@ -126,16 +124,14 @@ void recalc_intercepts(struct vcpu_svm *svm)
 		 * affect any interrupt we may want to inject; therefore,
 		 * interrupt window vmexits are irrelevant to L0.
 		 */
-		c->intercept &= ~(1ULL << INTERCEPT_VINTR);
+		vmcb_clr_intercept(c, INTERCEPT_VINTR);
 	}
 
 	/* We don't want to see VMMCALLs from a nested guest */
-	c->intercept &= ~(1ULL << INTERCEPT_VMMCALL);
+	vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
 
 	for (i = 0; i < MAX_VECTORS; i++)
 		c->intercepts[i] |= g->intercepts[i];
-
-	c->intercept |= g->intercept;
 }
 
 static void copy_vmcb_control_area(struct vmcb_control_area *dst,
@@ -146,7 +142,6 @@ static void copy_vmcb_control_area(struct vmcb_control_area *dst,
 	for (i = 0; i < MAX_VECTORS; i++)
 		dst->intercepts[i] = from->intercepts[i];
 
-	dst->intercept            = from->intercept;
 	dst->iopm_base_pa         = from->iopm_base_pa;
 	dst->msrpm_base_pa        = from->msrpm_base_pa;
 	dst->tsc_offset           = from->tsc_offset;
@@ -179,7 +174,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 	 */
 	int i;
 
-	if (!(svm->nested.ctl.intercept & (1ULL << INTERCEPT_MSR_PROT)))
+	if (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
 		return true;
 
 	for (i = 0; i < MSRPM_OFFSETS; i++) {
@@ -205,7 +200,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 
 static bool nested_vmcb_check_controls(struct vmcb_control_area *control)
 {
-	if ((control->intercept & (1ULL << INTERCEPT_VMRUN)) == 0)
+	if ((vmcb_is_intercept(control, INTERCEPT_VMRUN)) == 0)
 		return false;
 
 	if (control->asid == 0)
@@ -493,7 +488,8 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 	trace_kvm_nested_intercepts(nested_vmcb->control.intercepts[CR_VECTOR] & 0xffff,
 				    nested_vmcb->control.intercepts[CR_VECTOR] >> 16,
 				    nested_vmcb->control.intercepts[EXCEPTION_VECTOR],
-				    nested_vmcb->control.intercept);
+				    nested_vmcb->control.intercepts[INTERCEPT_VECTOR_3],
+				    nested_vmcb->control.intercepts[INTERCEPT_VECTOR_4]);
 
 	/* Clear internal status */
 	kvm_clear_exception_queue(&svm->vcpu);
@@ -710,7 +706,7 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
 	u32 offset, msr, value;
 	int write, mask;
 
-	if (!(svm->nested.ctl.intercept & (1ULL << INTERCEPT_MSR_PROT)))
+	if (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
 		return NESTED_EXIT_HOST;
 
 	msr    = svm->vcpu.arch.regs[VCPU_REGS_RCX];
@@ -737,7 +733,7 @@ static int nested_svm_intercept_ioio(struct vcpu_svm *svm)
 	u8 start_bit;
 	u64 gpa;
 
-	if (!(svm->nested.ctl.intercept & (1ULL << INTERCEPT_IOIO_PROT)))
+	if (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_IOIO_PROT)))
 		return NESTED_EXIT_HOST;
 
 	port = svm->vmcb->control.exit_info_1 >> 16;
@@ -791,8 +787,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 		break;
 	}
 	default: {
-		u64 exit_bits = 1ULL << (exit_code - SVM_EXIT_INTR);
-		if (svm->nested.ctl.intercept & exit_bits)
+		if (vmcb_is_intercept(&svm->nested.ctl, exit_code))
 			vmexit = NESTED_EXIT_DONE;
 	}
 	}
@@ -900,7 +895,7 @@ static void nested_svm_intr(struct vcpu_svm *svm)
 
 static inline bool nested_exit_on_init(struct vcpu_svm *svm)
 {
-	return (svm->nested.ctl.intercept & (1ULL << INTERCEPT_INIT));
+	return vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_INIT);
 }
 
 static void nested_svm_init(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 11892e86cb39..17bfa34033ac 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2218,12 +2218,9 @@ static bool check_selective_cr0_intercepted(struct vcpu_svm *svm,
 {
 	unsigned long cr0 = svm->vcpu.arch.cr0;
 	bool ret = false;
-	u64 intercept;
-
-	intercept = svm->nested.ctl.intercept;
 
 	if (!is_guest_mode(&svm->vcpu) ||
-	    (!(intercept & (1ULL << INTERCEPT_SELECTIVE_CR0))))
+	    (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_SELECTIVE_CR0))))
 		return false;
 
 	cr0 &= ~SVM_CR0_SELECTIVE_MASK;
@@ -2818,7 +2815,8 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%04x\n", "dr_read:", control->intercepts[DR_VECTOR] & 0xffff);
 	pr_err("%-20s%04x\n", "dr_write:", control->intercepts[DR_VECTOR] >> 16);
 	pr_err("%-20s%08x\n", "exceptions:", control->intercepts[EXCEPTION_VECTOR]);
-	pr_err("%-20s%016llx\n", "intercepts:", control->intercept);
+	pr_err("%-20s%08x\n", "intercept1:", control->intercepts[INTERCEPT_VECTOR_3]);
+	pr_err("%-20s%08x\n", "intercept2:", control->intercepts[INTERCEPT_VECTOR_4]);
 	pr_err("%-20s%d\n", "pause filter count:", control->pause_filter_count);
 	pr_err("%-20s%d\n", "pause filter threshold:",
 	       control->pause_filter_thresh);
@@ -3738,7 +3736,6 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
 		break;
 	case SVM_EXIT_WRITE_CR0: {
 		unsigned long cr0, val;
-		u64 intercept;
 
 		if (info->intercept == x86_intercept_cr_write)
 			icpt_info.exit_code += info->modrm_reg;
@@ -3747,9 +3744,8 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
 		    info->intercept == x86_intercept_clts)
 			break;
 
-		intercept = svm->nested.ctl.intercept;
-
-		if (!(intercept & (1ULL << INTERCEPT_SELECTIVE_CR0)))
+		if (!(vmcb_is_intercept(&svm->nested.ctl,
+					INTERCEPT_SELECTIVE_CR0)))
 			break;
 
 		cr0 = vcpu->arch.cr0 & ~SVM_CR0_SELECTIVE_MASK;
@@ -4010,7 +4006,7 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
 	 * if an INIT signal is pending.
 	 */
 	return !gif_set(svm) ||
-		   (svm->vmcb->control.intercept & (1ULL << INTERCEPT_INIT));
+		   (vmcb_is_intercept(&svm->vmcb->control, INTERCEPT_INIT));
 }
 
 static void svm_vm_destroy(struct kvm *kvm)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2fc305f647a3..2cde5091775a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -309,7 +309,7 @@ static inline void svm_set_intercept(struct vcpu_svm *svm, int bit)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	vmcb->control.intercept |= (1ULL << bit);
+	vmcb_set_intercept(&vmcb->control, bit);
 
 	recalc_intercepts(svm);
 }
@@ -318,14 +318,14 @@ static inline void svm_clr_intercept(struct vcpu_svm *svm, int bit)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	vmcb->control.intercept &= ~(1ULL << bit);
+	vmcb_clr_intercept(&vmcb->control, bit);
 
 	recalc_intercepts(svm);
 }
 
 static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit)
 {
-	return (svm->vmcb->control.intercept & (1ULL << bit)) != 0;
+	return vmcb_is_intercept(&svm->vmcb->control, bit);
 }
 
 static inline bool vgif_enabled(struct vcpu_svm *svm)
@@ -389,17 +389,17 @@ static inline bool nested_svm_virtualize_tpr(struct kvm_vcpu *vcpu)
 
 static inline bool nested_exit_on_smi(struct vcpu_svm *svm)
 {
-	return (svm->nested.ctl.intercept & (1ULL << INTERCEPT_SMI));
+	return vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_SMI);
 }
 
 static inline bool nested_exit_on_intr(struct vcpu_svm *svm)
 {
-	return (svm->nested.ctl.intercept & (1ULL << INTERCEPT_INTR));
+	return vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_INTR);
 }
 
 static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
 {
-	return (svm->nested.ctl.intercept & (1ULL << INTERCEPT_NMI));
+	return vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_NMI);
 }
 
 int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b66432b015d2..6e7262229e6a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -544,26 +544,30 @@ TRACE_EVENT(kvm_nested_vmrun,
 );
 
 TRACE_EVENT(kvm_nested_intercepts,
-	    TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions, __u64 intercept),
-	    TP_ARGS(cr_read, cr_write, exceptions, intercept),
+	    TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions, __u32 intercept1,
+		     __u32 intercept2),
+	    TP_ARGS(cr_read, cr_write, exceptions, intercept1, intercept2),
 
 	TP_STRUCT__entry(
 		__field(	__u16,		cr_read		)
 		__field(	__u16,		cr_write	)
 		__field(	__u32,		exceptions	)
-		__field(	__u64,		intercept	)
+		__field(	__u32,		intercept1	)
+		__field(	__u32,		intercept2	)
 	),
 
 	TP_fast_assign(
 		__entry->cr_read	= cr_read;
 		__entry->cr_write	= cr_write;
 		__entry->exceptions	= exceptions;
-		__entry->intercept	= intercept;
+		__entry->intercept1	= intercept1;
+		__entry->intercept2	= intercept2;
 	),
 
-	TP_printk("cr_read: %04x cr_write: %04x excp: %08x intercept: %016llx",
-		__entry->cr_read, __entry->cr_write, __entry->exceptions,
-		__entry->intercept)
+	TP_printk("cr_read: %04x cr_write: %04x excp: %08x "
+		  "intercept1: %08x intercept2: %08x",
+		  __entry->cr_read, __entry->cr_write, __entry->exceptions,
+		  __entry->intercept1, __entry->intercept2)
 );
 /*
  * Tracepoint for #VMEXIT while nested


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 06/12] KVM: SVM: Add new intercept vector in vmcb_control_area
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
                   ` (4 preceding siblings ...)
  2020-09-11 19:28 ` [PATCH v6 05/12] KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors Babu Moger
@ 2020-09-11 19:28 ` Babu Moger
  2020-09-11 19:28 ` [PATCH v6 07/12] KVM: nSVM: Cleanup nested_state data structure Babu Moger
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:28 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

The new intercept bits have been added in vmcb control area to support
few more interceptions. Here are the some of them.
 - INTERCEPT_INVLPGB,
 - INTERCEPT_INVLPGB_ILLEGAL,
 - INTERCEPT_INVPCID,
 - INTERCEPT_MCOMMIT,
 - INTERCEPT_TLBSYNC,

Add new intercept vector in vmcb_control_area to support these instructions.
Also update kvm_nested_vmrun trace function to support the new addition.

AMD documentation for these instructions is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593
Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/asm/svm.h |    7 +++++++
 arch/x86/kvm/svm/nested.c  |    3 ++-
 arch/x86/kvm/trace.h       |   13 ++++++++-----
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 9f0fa02fc838..623c392a55ac 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -16,6 +16,7 @@ enum vector_offset {
 	EXCEPTION_VECTOR,
 	INTERCEPT_VECTOR_3,
 	INTERCEPT_VECTOR_4,
+	INTERCEPT_VECTOR_5,
 	MAX_VECTORS,
 };
 
@@ -124,6 +125,12 @@ enum {
 	INTERCEPT_MWAIT_COND,
 	INTERCEPT_XSETBV,
 	INTERCEPT_RDPRU,
+	/* Byte offset 014h (Vector 5) */
+	INTERCEPT_INVLPGB = 160,
+	INTERCEPT_INVLPGB_ILLEGAL,
+	INTERCEPT_INVPCID,
+	INTERCEPT_MCOMMIT,
+	INTERCEPT_TLBSYNC,
 };
 
 
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 353d550a5bb7..c833f6265b59 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -489,7 +489,8 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 				    nested_vmcb->control.intercepts[CR_VECTOR] >> 16,
 				    nested_vmcb->control.intercepts[EXCEPTION_VECTOR],
 				    nested_vmcb->control.intercepts[INTERCEPT_VECTOR_3],
-				    nested_vmcb->control.intercepts[INTERCEPT_VECTOR_4]);
+				    nested_vmcb->control.intercepts[INTERCEPT_VECTOR_4],
+				    nested_vmcb->control.intercepts[INTERCEPT_VECTOR_5]);
 
 	/* Clear internal status */
 	kvm_clear_exception_queue(&svm->vcpu);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 6e7262229e6a..11046171b5d9 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -544,9 +544,10 @@ TRACE_EVENT(kvm_nested_vmrun,
 );
 
 TRACE_EVENT(kvm_nested_intercepts,
-	    TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions, __u32 intercept1,
-		     __u32 intercept2),
-	    TP_ARGS(cr_read, cr_write, exceptions, intercept1, intercept2),
+	    TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
+		     __u32 intercept1, __u32 intercept2, __u32 intercept3),
+	    TP_ARGS(cr_read, cr_write, exceptions, intercept1,
+		    intercept2, intercept3),
 
 	TP_STRUCT__entry(
 		__field(	__u16,		cr_read		)
@@ -554,6 +555,7 @@ TRACE_EVENT(kvm_nested_intercepts,
 		__field(	__u32,		exceptions	)
 		__field(	__u32,		intercept1	)
 		__field(	__u32,		intercept2	)
+		__field(	__u32,		intercept3	)
 	),
 
 	TP_fast_assign(
@@ -562,12 +564,13 @@ TRACE_EVENT(kvm_nested_intercepts,
 		__entry->exceptions	= exceptions;
 		__entry->intercept1	= intercept1;
 		__entry->intercept2	= intercept2;
+		__entry->intercept3	= intercept3;
 	),
 
 	TP_printk("cr_read: %04x cr_write: %04x excp: %08x "
-		  "intercept1: %08x intercept2: %08x",
+		  "intercept1: %08x intercept2: %08x  intercept3: %08x",
 		  __entry->cr_read, __entry->cr_write, __entry->exceptions,
-		  __entry->intercept1, __entry->intercept2)
+		  __entry->intercept1, __entry->intercept2, __entry->intercept3)
 );
 /*
  * Tracepoint for #VMEXIT while nested


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 07/12] KVM: nSVM: Cleanup nested_state data structure
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
                   ` (5 preceding siblings ...)
  2020-09-11 19:28 ` [PATCH v6 06/12] KVM: SVM: Add new intercept vector in vmcb_control_area Babu Moger
@ 2020-09-11 19:28 ` Babu Moger
  2020-09-11 19:28 ` [PATCH v6 08/12] KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept Babu Moger
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:28 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

host_intercept_exceptions is not used anywhere. Clean it up.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/nested.c |    2 --
 arch/x86/kvm/svm/svm.h    |    1 -
 2 files changed, 3 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c833f6265b59..7121756685ee 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -109,8 +109,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	h = &svm->nested.hsave->control;
 	g = &svm->nested.ctl;
 
-	svm->nested.host_intercept_exceptions = h->intercepts[EXCEPTION_VECTOR];
-
 	for (i = 0; i < MAX_VECTORS; i++)
 		c->intercepts[i] = h->intercepts[i];
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2cde5091775a..ffb35a83048f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -86,7 +86,6 @@ struct svm_nested_state {
 	u64 hsave_msr;
 	u64 vm_cr_msr;
 	u64 vmcb;
-	u32 host_intercept_exceptions;
 
 	/* These are the merged vectors */
 	u32 *msrpm;


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 08/12] KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
                   ` (6 preceding siblings ...)
  2020-09-11 19:28 ` [PATCH v6 07/12] KVM: nSVM: Cleanup nested_state data structure Babu Moger
@ 2020-09-11 19:28 ` Babu Moger
  2020-09-11 19:28 ` [PATCH v6 09/12] KVM: SVM: Remove set_exception_intercept and clr_exception_intercept Babu Moger
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:28 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead
call generic svm_set_intercept, svm_clr_intercept an dsvm_is_intercep
tfor all cr intercepts.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/svm.c |   34 +++++++++++++++++-----------------
 arch/x86/kvm/svm/svm.h |   25 -------------------------
 2 files changed, 17 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 17bfa34033ac..0d7397f4a4f7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -992,14 +992,14 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	svm->vcpu.arch.hflags = 0;
 
-	set_cr_intercept(svm, INTERCEPT_CR0_READ);
-	set_cr_intercept(svm, INTERCEPT_CR3_READ);
-	set_cr_intercept(svm, INTERCEPT_CR4_READ);
-	set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
-	set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
-	set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
+	svm_set_intercept(svm, INTERCEPT_CR0_READ);
+	svm_set_intercept(svm, INTERCEPT_CR3_READ);
+	svm_set_intercept(svm, INTERCEPT_CR4_READ);
+	svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
+	svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
+	svm_set_intercept(svm, INTERCEPT_CR4_WRITE);
 	if (!kvm_vcpu_apicv_active(&svm->vcpu))
-		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 
 	set_dr_intercepts(svm);
 
@@ -1094,8 +1094,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 		control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
 		svm_clr_intercept(svm, INTERCEPT_INVLPG);
 		clr_exception_intercept(svm, INTERCEPT_PF_VECTOR);
-		clr_cr_intercept(svm, INTERCEPT_CR3_READ);
-		clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
+		svm_clr_intercept(svm, INTERCEPT_CR3_READ);
+		svm_clr_intercept(svm, INTERCEPT_CR3_WRITE);
 		save->g_pat = svm->vcpu.arch.pat;
 		save->cr3 = 0;
 		save->cr4 = 0;
@@ -1549,11 +1549,11 @@ static void update_cr0_intercept(struct vcpu_svm *svm)
 	vmcb_mark_dirty(svm->vmcb, VMCB_CR);
 
 	if (gcr0 == *hcr0) {
-		clr_cr_intercept(svm, INTERCEPT_CR0_READ);
-		clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
+		svm_clr_intercept(svm, INTERCEPT_CR0_READ);
+		svm_clr_intercept(svm, INTERCEPT_CR0_WRITE);
 	} else {
-		set_cr_intercept(svm, INTERCEPT_CR0_READ);
-		set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
+		svm_set_intercept(svm, INTERCEPT_CR0_READ);
+		svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
 	}
 }
 
@@ -2931,7 +2931,7 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 
 	trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
 
-	if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
+	if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
 		vcpu->arch.cr0 = svm->vmcb->save.cr0;
 	if (npt_enabled)
 		vcpu->arch.cr3 = svm->vmcb->save.cr3;
@@ -3056,13 +3056,13 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 	if (nested_svm_virtualize_tpr(vcpu))
 		return;
 
-	clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+	svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
 
 	if (irr == -1)
 		return;
 
 	if (tpr >= irr)
-		set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 }
 
 bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
@@ -3250,7 +3250,7 @@ static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
 	if (nested_svm_virtualize_tpr(vcpu))
 		return;
 
-	if (!is_cr_intercept(svm, INTERCEPT_CR8_WRITE)) {
+	if (!svm_is_intercept(svm, INTERCEPT_CR8_WRITE)) {
 		int cr8 = svm->vmcb->control.int_ctl & V_TPR_MASK;
 		kvm_set_cr8(vcpu, cr8);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ffb35a83048f..8128bac75fa2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -228,31 +228,6 @@ static inline bool vmcb_is_intercept(struct vmcb_control_area *control, int bit)
 	return test_bit(bit, (unsigned long *)&control->intercepts);
 }
 
-static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
-{
-	struct vmcb *vmcb = get_host_vmcb(svm);
-
-	vmcb_set_intercept(&vmcb->control, bit);
-
-	recalc_intercepts(svm);
-}
-
-static inline void clr_cr_intercept(struct vcpu_svm *svm, int bit)
-{
-	struct vmcb *vmcb = get_host_vmcb(svm);
-
-	vmcb_clr_intercept(&vmcb->control, bit);
-
-	recalc_intercepts(svm);
-}
-
-static inline bool is_cr_intercept(struct vcpu_svm *svm, int bit)
-{
-	struct vmcb *vmcb = get_host_vmcb(svm);
-
-	return vmcb_is_intercept(&vmcb->control, bit);
-}
-
 static inline void set_dr_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 09/12] KVM: SVM: Remove set_exception_intercept and clr_exception_intercept
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
                   ` (7 preceding siblings ...)
  2020-09-11 19:28 ` [PATCH v6 08/12] KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept Babu Moger
@ 2020-09-11 19:28 ` Babu Moger
  2020-09-11 19:29 ` [PATCH v6 10/12] KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c Babu Moger
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:28 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

Remove set_exception_intercept and clr_exception_intercept.
Replace with generic set_intercept and clr_intercept for these calls.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/svm/svm.c |   20 ++++++++++----------
 arch/x86/kvm/svm/svm.h |   18 ------------------
 2 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0d7397f4a4f7..96617b61e531 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1003,11 +1003,11 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	set_dr_intercepts(svm);
 
-	set_exception_intercept(svm, INTERCEPT_PF_VECTOR);
-	set_exception_intercept(svm, INTERCEPT_UD_VECTOR);
-	set_exception_intercept(svm, INTERCEPT_MC_VECTOR);
-	set_exception_intercept(svm, INTERCEPT_AC_VECTOR);
-	set_exception_intercept(svm, INTERCEPT_DB_VECTOR);
+	svm_set_intercept(svm, INTERCEPT_PF_VECTOR);
+	svm_set_intercept(svm, INTERCEPT_UD_VECTOR);
+	svm_set_intercept(svm, INTERCEPT_MC_VECTOR);
+	svm_set_intercept(svm, INTERCEPT_AC_VECTOR);
+	svm_set_intercept(svm, INTERCEPT_DB_VECTOR);
 	/*
 	 * Guest access to VMware backdoor ports could legitimately
 	 * trigger #GP because of TSS I/O permission bitmap.
@@ -1015,7 +1015,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	 * as VMware does.
 	 */
 	if (enable_vmware_backdoor)
-		set_exception_intercept(svm, INTERCEPT_GP_VECTOR);
+		svm_set_intercept(svm, INTERCEPT_GP_VECTOR);
 
 	svm_set_intercept(svm, INTERCEPT_INTR);
 	svm_set_intercept(svm, INTERCEPT_NMI);
@@ -1093,7 +1093,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 		/* Setup VMCB for Nested Paging */
 		control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
 		svm_clr_intercept(svm, INTERCEPT_INVLPG);
-		clr_exception_intercept(svm, INTERCEPT_PF_VECTOR);
+		svm_clr_intercept(svm, INTERCEPT_PF_VECTOR);
 		svm_clr_intercept(svm, INTERCEPT_CR3_READ);
 		svm_clr_intercept(svm, INTERCEPT_CR3_WRITE);
 		save->g_pat = svm->vcpu.arch.pat;
@@ -1135,7 +1135,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 
 	if (sev_guest(svm->vcpu.kvm)) {
 		svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
-		clr_exception_intercept(svm, INTERCEPT_UD_VECTOR);
+		svm_clr_intercept(svm, INTERCEPT_UD_VECTOR);
 	}
 
 	vmcb_mark_all_dirty(svm->vmcb);
@@ -1646,11 +1646,11 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	clr_exception_intercept(svm, INTERCEPT_BP_VECTOR);
+	svm_clr_intercept(svm, INTERCEPT_BP_VECTOR);
 
 	if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
 		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
-			set_exception_intercept(svm, INTERCEPT_BP_VECTOR);
+			svm_set_intercept(svm, INTERCEPT_BP_VECTOR);
 	}
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8128bac75fa2..fc4bfea3f555 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -261,24 +261,6 @@ static inline void clr_dr_intercepts(struct vcpu_svm *svm)
 	recalc_intercepts(svm);
 }
 
-static inline void set_exception_intercept(struct vcpu_svm *svm, int bit)
-{
-	struct vmcb *vmcb = get_host_vmcb(svm);
-
-	vmcb_set_intercept(&vmcb->control, bit);
-
-	recalc_intercepts(svm);
-}
-
-static inline void clr_exception_intercept(struct vcpu_svm *svm, int bit)
-{
-	struct vmcb *vmcb = get_host_vmcb(svm);
-
-	vmcb_clr_intercept(&vmcb->control, bit);
-
-	recalc_intercepts(svm);
-}
-
 static inline void svm_set_intercept(struct vcpu_svm *svm, int bit)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 10/12] KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
                   ` (8 preceding siblings ...)
  2020-09-11 19:28 ` [PATCH v6 09/12] KVM: SVM: Remove set_exception_intercept and clr_exception_intercept Babu Moger
@ 2020-09-11 19:29 ` Babu Moger
  2020-09-11 19:29 ` [PATCH v6 11/12] KVM: X86: Move handling of INVPCID types to x86 Babu Moger
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:29 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

Handling of kvm_read/write_guest_virt*() errors can be moved to common
code. The same code can be used by both VMX and SVM.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/vmx/nested.c |   12 ++++++------
 arch/x86/kvm/vmx/vmx.c    |   29 +----------------------------
 arch/x86/kvm/vmx/vmx.h    |    2 --
 arch/x86/kvm/x86.c        |   28 ++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h        |    2 ++
 5 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 23b58c28a1c9..28becd22d9d9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4688,7 +4688,7 @@ static int nested_vmx_get_vmptr(struct kvm_vcpu *vcpu, gpa_t *vmpointer,
 
 	r = kvm_read_guest_virt(vcpu, gva, vmpointer, sizeof(*vmpointer), &e);
 	if (r != X86EMUL_CONTINUE) {
-		*ret = vmx_handle_memory_failure(vcpu, r, &e);
+		*ret = kvm_handle_memory_failure(vcpu, r, &e);
 		return -EINVAL;
 	}
 
@@ -4995,7 +4995,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
 		/* _system ok, nested_vmx_check_permission has verified cpl=0 */
 		r = kvm_write_guest_virt_system(vcpu, gva, &value, len, &e);
 		if (r != X86EMUL_CONTINUE)
-			return vmx_handle_memory_failure(vcpu, r, &e);
+			return kvm_handle_memory_failure(vcpu, r, &e);
 	}
 
 	return nested_vmx_succeed(vcpu);
@@ -5068,7 +5068,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
 			return 1;
 		r = kvm_read_guest_virt(vcpu, gva, &value, len, &e);
 		if (r != X86EMUL_CONTINUE)
-			return vmx_handle_memory_failure(vcpu, r, &e);
+			return kvm_handle_memory_failure(vcpu, r, &e);
 	}
 
 	field = kvm_register_readl(vcpu, (((instr_info) >> 28) & 0xf));
@@ -5230,7 +5230,7 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
 	r = kvm_write_guest_virt_system(vcpu, gva, (void *)&current_vmptr,
 					sizeof(gpa_t), &e);
 	if (r != X86EMUL_CONTINUE)
-		return vmx_handle_memory_failure(vcpu, r, &e);
+		return kvm_handle_memory_failure(vcpu, r, &e);
 
 	return nested_vmx_succeed(vcpu);
 }
@@ -5283,7 +5283,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 		return 1;
 	r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
 	if (r != X86EMUL_CONTINUE)
-		return vmx_handle_memory_failure(vcpu, r, &e);
+		return kvm_handle_memory_failure(vcpu, r, &e);
 
 	/*
 	 * Nested EPT roots are always held through guest_mmu,
@@ -5365,7 +5365,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 		return 1;
 	r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
 	if (r != X86EMUL_CONTINUE)
-		return vmx_handle_memory_failure(vcpu, r, &e);
+		return kvm_handle_memory_failure(vcpu, r, &e);
 
 	if (operand.vpid >> 16)
 		return nested_vmx_fail(vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46ba2e03a892..b15b4c6e3b46 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1598,33 +1598,6 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
-/*
- * Handles kvm_read/write_guest_virt*() result and either injects #PF or returns
- * KVM_EXIT_INTERNAL_ERROR for cases not currently handled by KVM. Return value
- * indicates whether exit to userspace is needed.
- */
-int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
-			      struct x86_exception *e)
-{
-	if (r == X86EMUL_PROPAGATE_FAULT) {
-		kvm_inject_emulated_page_fault(vcpu, e);
-		return 1;
-	}
-
-	/*
-	 * In case kvm_read/write_guest_virt*() failed with X86EMUL_IO_NEEDED
-	 * while handling a VMX instruction KVM could've handled the request
-	 * correctly by exiting to userspace and performing I/O but there
-	 * doesn't seem to be a real use-case behind such requests, just return
-	 * KVM_EXIT_INTERNAL_ERROR for now.
-	 */
-	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
-	vcpu->run->internal.ndata = 0;
-
-	return 0;
-}
-
 /*
  * Recognizes a pending MTF VM-exit and records the nested state for later
  * delivery.
@@ -5558,7 +5531,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 
 	r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
 	if (r != X86EMUL_CONTINUE)
-		return vmx_handle_memory_failure(vcpu, r, &e);
+		return kvm_handle_memory_failure(vcpu, r, &e);
 
 	if (operand.pcid >> 12 != 0) {
 		kvm_inject_gp(vcpu, 0);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 26175a4759fa..7c578564a8fc 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -354,8 +354,6 @@ struct shared_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr);
 void pt_update_intercept_for_msr(struct vcpu_vmx *vmx);
 void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
 int vmx_find_msr_index(struct vmx_msrs *m, u32 msr);
-int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
-			      struct x86_exception *e);
 
 #define POSTED_INTR_ON  0
 #define POSTED_INTR_SN  1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 539ea1cd6020..5d7930ecdddc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10763,6 +10763,34 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_c
 }
 EXPORT_SYMBOL_GPL(kvm_fixup_and_inject_pf_error);
 
+/*
+ * Handles kvm_read/write_guest_virt*() result and either injects #PF or returns
+ * KVM_EXIT_INTERNAL_ERROR for cases not currently handled by KVM. Return value
+ * indicates whether exit to userspace is needed.
+ */
+int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
+			      struct x86_exception *e)
+{
+	if (r == X86EMUL_PROPAGATE_FAULT) {
+		kvm_inject_emulated_page_fault(vcpu, e);
+		return 1;
+	}
+
+	/*
+	 * In case kvm_read/write_guest_virt*() failed with X86EMUL_IO_NEEDED
+	 * while handling a VMX instruction KVM could've handled the request
+	 * correctly by exiting to userspace and performing I/O but there
+	 * doesn't seem to be a real use-case behind such requests, just return
+	 * KVM_EXIT_INTERNAL_ERROR for now.
+	 */
+	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+	vcpu->run->internal.ndata = 0;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_handle_memory_failure);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 995ab696dcf0..d3a41144eb30 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -372,6 +372,8 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
 int kvm_spec_ctrl_test_value(u64 value);
 int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu);
+int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
+			      struct x86_exception *e);
 
 #define  KVM_MSR_RET_INVALID  2
 


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 11/12] KVM: X86: Move handling of INVPCID types to x86
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
                   ` (9 preceding siblings ...)
  2020-09-11 19:29 ` [PATCH v6 10/12] KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c Babu Moger
@ 2020-09-11 19:29 ` Babu Moger
  2020-09-11 19:29 ` [PATCH v6 12/12] KVM:SVM: Enable INVPCID feature on AMD Babu Moger
  2020-09-12 17:08 ` [PATCH v6 00/12] SVM cleanup and INVPCID feature support Paolo Bonzini
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:29 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

INVPCID instruction handling is mostly same across both VMX and
SVM. So, move the code to common x86.c.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/vmx/vmx.c |   68 +-----------------------------------------
 arch/x86/kvm/x86.c     |   78 ++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h     |    1 +
 3 files changed, 80 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b15b4c6e3b46..ff42d27f641f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5497,16 +5497,11 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 {
 	u32 vmx_instruction_info;
 	unsigned long type;
-	bool pcid_enabled;
 	gva_t gva;
-	struct x86_exception e;
-	unsigned i;
-	unsigned long roots_to_free = 0;
 	struct {
 		u64 pcid;
 		u64 gla;
 	} operand;
-	int r;
 
 	if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
 		kvm_queue_exception(vcpu, UD_VECTOR);
@@ -5529,68 +5524,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 				sizeof(operand), &gva))
 		return 1;
 
-	r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
-	if (r != X86EMUL_CONTINUE)
-		return kvm_handle_memory_failure(vcpu, r, &e);
-
-	if (operand.pcid >> 12 != 0) {
-		kvm_inject_gp(vcpu, 0);
-		return 1;
-	}
-
-	pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
-
-	switch (type) {
-	case INVPCID_TYPE_INDIV_ADDR:
-		if ((!pcid_enabled && (operand.pcid != 0)) ||
-		    is_noncanonical_address(operand.gla, vcpu)) {
-			kvm_inject_gp(vcpu, 0);
-			return 1;
-		}
-		kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
-		return kvm_skip_emulated_instruction(vcpu);
-
-	case INVPCID_TYPE_SINGLE_CTXT:
-		if (!pcid_enabled && (operand.pcid != 0)) {
-			kvm_inject_gp(vcpu, 0);
-			return 1;
-		}
-
-		if (kvm_get_active_pcid(vcpu) == operand.pcid) {
-			kvm_mmu_sync_roots(vcpu);
-			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-		}
-
-		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
-			if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
-			    == operand.pcid)
-				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
-
-		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
-		/*
-		 * If neither the current cr3 nor any of the prev_roots use the
-		 * given PCID, then nothing needs to be done here because a
-		 * resync will happen anyway before switching to any other CR3.
-		 */
-
-		return kvm_skip_emulated_instruction(vcpu);
-
-	case INVPCID_TYPE_ALL_NON_GLOBAL:
-		/*
-		 * Currently, KVM doesn't mark global entries in the shadow
-		 * page tables, so a non-global flush just degenerates to a
-		 * global flush. If needed, we could optimize this later by
-		 * keeping track of global entries in shadow page tables.
-		 */
-
-		/* fall-through */
-	case INVPCID_TYPE_ALL_INCL_GLOBAL:
-		kvm_mmu_unload(vcpu);
-		return kvm_skip_emulated_instruction(vcpu);
-
-	default:
-		BUG(); /* We have already checked above that type <= 3 */
-	}
+	return kvm_handle_invpcid(vcpu, type, gva);
 }
 
 static int handle_pml_full(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d7930ecdddc..39ca22e0f8b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -71,6 +71,7 @@
 #include <asm/irq_remapping.h>
 #include <asm/mshyperv.h>
 #include <asm/hypervisor.h>
+#include <asm/tlbflush.h>
 #include <asm/intel_pt.h>
 #include <asm/emulate_prefix.h>
 #include <clocksource/hyperv_timer.h>
@@ -10791,6 +10792,83 @@ int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 }
 EXPORT_SYMBOL_GPL(kvm_handle_memory_failure);
 
+int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
+{
+	bool pcid_enabled;
+	struct x86_exception e;
+	unsigned i;
+	unsigned long roots_to_free = 0;
+	struct {
+		u64 pcid;
+		u64 gla;
+	} operand;
+	int r;
+
+	r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
+	if (r != X86EMUL_CONTINUE)
+		return kvm_handle_memory_failure(vcpu, r, &e);
+
+	if (operand.pcid >> 12 != 0) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	pcid_enabled = kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
+
+	switch (type) {
+	case INVPCID_TYPE_INDIV_ADDR:
+		if ((!pcid_enabled && (operand.pcid != 0)) ||
+		    is_noncanonical_address(operand.gla, vcpu)) {
+			kvm_inject_gp(vcpu, 0);
+			return 1;
+		}
+		kvm_mmu_invpcid_gva(vcpu, operand.gla, operand.pcid);
+		return kvm_skip_emulated_instruction(vcpu);
+
+	case INVPCID_TYPE_SINGLE_CTXT:
+		if (!pcid_enabled && (operand.pcid != 0)) {
+			kvm_inject_gp(vcpu, 0);
+			return 1;
+		}
+
+		if (kvm_get_active_pcid(vcpu) == operand.pcid) {
+			kvm_mmu_sync_roots(vcpu);
+			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+		}
+
+		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
+			if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
+			    == operand.pcid)
+				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+
+		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
+		/*
+		 * If neither the current cr3 nor any of the prev_roots use the
+		 * given PCID, then nothing needs to be done here because a
+		 * resync will happen anyway before switching to any other CR3.
+		 */
+
+		return kvm_skip_emulated_instruction(vcpu);
+
+	case INVPCID_TYPE_ALL_NON_GLOBAL:
+		/*
+		 * Currently, KVM doesn't mark global entries in the shadow
+		 * page tables, so a non-global flush just degenerates to a
+		 * global flush. If needed, we could optimize this later by
+		 * keeping track of global entries in shadow page tables.
+		 */
+
+		/* fall-through */
+	case INVPCID_TYPE_ALL_INCL_GLOBAL:
+		kvm_mmu_unload(vcpu);
+		return kvm_skip_emulated_instruction(vcpu);
+
+	default:
+		BUG(); /* We have already checked above that type <= 3 */
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_handle_invpcid);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d3a41144eb30..6781fd660a29 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -374,6 +374,7 @@ int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu);
 int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 			      struct x86_exception *e);
+int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
 
 #define  KVM_MSR_RET_INVALID  2
 


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v6 12/12] KVM:SVM: Enable INVPCID feature on AMD
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
                   ` (10 preceding siblings ...)
  2020-09-11 19:29 ` [PATCH v6 11/12] KVM: X86: Move handling of INVPCID types to x86 Babu Moger
@ 2020-09-11 19:29 ` Babu Moger
  2020-09-12 17:08 ` [PATCH v6 00/12] SVM cleanup and INVPCID feature support Paolo Bonzini
  12 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-11 19:29 UTC (permalink / raw)
  To: pbonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, babu.moger, mingo, bp,
	hpa, tglx

The following intercept bit has been added to support VMEXIT
for INVPCID instruction:
Code    Name            Cause
A2h     VMEXIT_INVPCID  INVPCID instruction

The following bit has been added to the VMCB layout control area
to control intercept of INVPCID:
Byte Offset     Bit(s)    Function
14h             2         intercept INVPCID

Enable the interceptions when the the guest is running with shadow
page table enabled and handle the tlbflush based on the invpcid
instruction type.

For the guests with nested page table (NPT) support, the INVPCID
feature works as running it natively. KVM does not need to do any
special handling in this case.

AMD documentation for INVPCID feature is available at "AMD64
Architecture Programmer’s Manual Volume 2: System Programming,
Pub. 24593 Rev. 3.34(or later)"

The documentation can be obtained at the links below:
Link: https://www.amd.com/system/files/TechDocs/24593.pdf
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/uapi/asm/svm.h |    2 ++
 arch/x86/kvm/svm/svm.c          |   51 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 2e8a30f06c74..522d42dfc28c 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -76,6 +76,7 @@
 #define SVM_EXIT_MWAIT_COND    0x08c
 #define SVM_EXIT_XSETBV        0x08d
 #define SVM_EXIT_RDPRU         0x08e
+#define SVM_EXIT_INVPCID       0x0a2
 #define SVM_EXIT_NPF           0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -171,6 +172,7 @@
 	{ SVM_EXIT_MONITOR,     "monitor" }, \
 	{ SVM_EXIT_MWAIT,       "mwait" }, \
 	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
+	{ SVM_EXIT_INVPCID,     "invpcid" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 96617b61e531..5c6b8d0f7628 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -813,6 +813,9 @@ static __init void svm_set_cpu_caps(void)
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
 		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
+
+	/* Enable INVPCID feature */
+	kvm_cpu_cap_check_and_set(X86_FEATURE_INVPCID);
 }
 
 static __init int svm_hardware_setup(void)
@@ -985,6 +988,21 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	return svm->vmcb->control.tsc_offset;
 }
 
+static void svm_check_invpcid(struct vcpu_svm *svm)
+{
+	/*
+	 * Intercept INVPCID instruction only if shadow page table is
+	 * enabled. Interception is not required with nested page table
+	 * enabled.
+	 */
+	if (kvm_cpu_cap_has(X86_FEATURE_INVPCID)) {
+		if (!npt_enabled)
+			svm_set_intercept(svm, INTERCEPT_INVPCID);
+		else
+			svm_clr_intercept(svm, INTERCEPT_INVPCID);
+	}
+}
+
 static void init_vmcb(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -1114,6 +1132,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 		svm_clr_intercept(svm, INTERCEPT_PAUSE);
 	}
 
+	svm_check_invpcid(svm);
+
 	if (kvm_vcpu_apicv_active(&svm->vcpu))
 		avic_init_vmcb(svm);
 
@@ -2730,6 +2750,33 @@ static int mwait_interception(struct vcpu_svm *svm)
 	return nop_interception(svm);
 }
 
+static int invpcid_interception(struct vcpu_svm *svm)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	unsigned long type;
+	gva_t gva;
+
+	if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	/*
+	 * For an INVPCID intercept:
+	 * EXITINFO1 provides the linear address of the memory operand.
+	 * EXITINFO2 provides the contents of the register operand.
+	 */
+	type = svm->vmcb->control.exit_info_2;
+	gva = svm->vmcb->control.exit_info_1;
+
+	if (type > 3) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	return kvm_handle_invpcid(vcpu, type, gva);
+}
+
 static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
@@ -2792,6 +2839,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_MWAIT]			= mwait_interception,
 	[SVM_EXIT_XSETBV]			= xsetbv_interception,
 	[SVM_EXIT_RDPRU]			= rdpru_interception,
+	[SVM_EXIT_INVPCID]                      = invpcid_interception,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
@@ -3622,6 +3670,9 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
 			     guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS);
 
+	/* Check again if INVPCID interception if required */
+	svm_check_invpcid(svm);
+
 	if (!kvm_vcpu_apicv_active(vcpu))
 		return;
 


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts
  2020-09-11 19:28 ` [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions " Babu Moger
@ 2020-09-12 16:52   ` Paolo Bonzini
  2020-09-14 15:06     ` Sean Christopherson
  0 siblings, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2020-09-12 16:52 UTC (permalink / raw)
  To: Babu Moger, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, mingo, bp, hpa, tglx

On 11/09/20 21:28, Babu Moger wrote:
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 1a5f3908b388..11892e86cb39 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1003,11 +1003,11 @@ static void init_vmcb(struct vcpu_svm *svm)
>  
>  	set_dr_intercepts(svm);
>  
> -	set_exception_intercept(svm, PF_VECTOR);
> -	set_exception_intercept(svm, UD_VECTOR);
> -	set_exception_intercept(svm, MC_VECTOR);
> -	set_exception_intercept(svm, AC_VECTOR);
> -	set_exception_intercept(svm, DB_VECTOR);
> +	set_exception_intercept(svm, INTERCEPT_PF_VECTOR);
> +	set_exception_intercept(svm, INTERCEPT_UD_VECTOR);
> +	set_exception_intercept(svm, INTERCEPT_MC_VECTOR);
> +	set_exception_intercept(svm, INTERCEPT_AC_VECTOR);
> +	set_exception_intercept(svm, INTERCEPT_DB_VECTOR);

I think these should take a vector instead, and add 64 in the functions.

Paolo


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
                   ` (11 preceding siblings ...)
  2020-09-11 19:29 ` [PATCH v6 12/12] KVM:SVM: Enable INVPCID feature on AMD Babu Moger
@ 2020-09-12 17:08 ` Paolo Bonzini
  2020-09-14 15:05   ` Sean Christopherson
  2020-09-14 18:33   ` Babu Moger
  12 siblings, 2 replies; 62+ messages in thread
From: Paolo Bonzini @ 2020-09-12 17:08 UTC (permalink / raw)
  To: Babu Moger, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, mingo, bp, hpa, tglx

On 11/09/20 21:27, Babu Moger wrote:
> The following series adds the support for PCID/INVPCID on AMD guests.
> While doing it re-structured the vmcb_control_area data structure to
> combine all the intercept vectors into one 32 bit array. Makes it easy
> for future additions. Re-arranged few pcid related code to make it common
> between SVM and VMX.
> 
> INVPCID interceptions are added only when the guest is running with shadow
> page table enabled. In this case the hypervisor needs to handle the tlbflush
> based on the type of invpcid instruction.
> 
> For the guests with nested page table (NPT) support, the INVPCID feature
> works as running it natively. KVM does not need to do any special handling.
> 
> AMD documentation for INVPCID feature is available at "AMD64 Architecture
> Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)"
> 
> The documentation can be obtained at the links below:
> Link: https://www.amd.com/system/files/TechDocs/24593.pdf
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> 
> v6:
>  One minor change in patch #04. Otherwise same as v5.
>  Updated all the patches by Reviewed-by.
> 
> v5:
>  https://lore.kernel.org/lkml/159846887637.18873.14677728679411578606.stgit@bmoger-ubuntu/
>  All the changes are related to rebase.
>  Aplies cleanly on mainline and kvm(master) tree. 
>  Resending it to get some attention.
> 
> v4:
>  https://lore.kernel.org/lkml/159676101387.12805.18038347880482984693.stgit@bmoger-ubuntu/
>  1. Changed the functions __set_intercept/__clr_intercept/__is_intercept to
>     to vmcb_set_intercept/vmcb_clr_intercept/vmcb_is_intercept by passing
>     vmcb_control_area structure(Suggested by Paolo).
>  2. Rearranged the commit 7a35e515a7055 ("KVM: VMX: Properly handle kvm_read/write_guest_virt*())
>     to make it common across both SVM/VMX(Suggested by Jim Mattson).
>  3. Took care of few other comments from Jim Mattson. Dropped "Reviewed-by"
>     on few patches which I have changed since v3.
> 
> v3:
>  https://lore.kernel.org/lkml/159597929496.12744.14654593948763926416.stgit@bmoger-ubuntu/
>  1. Addressing the comments from Jim Mattson. Follow the v2 link below
>     for the context.
>  2. Introduced the generic __set_intercept, __clr_intercept and is_intercept
>     using native __set_bit, clear_bit and test_bit.
>  3. Combined all the intercepts vectors into single 32 bit array.
>  4. Removed set_intercept_cr, clr_intercept_cr, set_exception_intercepts,
>     clr_exception_intercept etc. Used the generic set_intercept and
>     clr_intercept where applicable.
>  5. Tested both L1 guest and l2 nested guests. 
> 
> v2:
>   https://lore.kernel.org/lkml/159234483706.6230.13753828995249423191.stgit@bmoger-ubuntu/
>   - Taken care of few comments from Jim Mattson.
>   - KVM interceptions added only when tdp is off. No interceptions
>     when tdp is on.
>   - Reverted the fault priority to original order in VMX. 
>   
> v1:
>   https://lore.kernel.org/lkml/159191202523.31436.11959784252237488867.stgit@bmoger-ubuntu/
> 
> Babu Moger (12):
>       KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)
>       KVM: SVM: Change intercept_cr to generic intercepts
>       KVM: SVM: Change intercept_dr to generic intercepts
>       KVM: SVM: Modify intercept_exceptions to generic intercepts
>       KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors
>       KVM: SVM: Add new intercept vector in vmcb_control_area
>       KVM: nSVM: Cleanup nested_state data structure
>       KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept
>       KVM: SVM: Remove set_exception_intercept and clr_exception_intercept
>       KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c
>       KVM: X86: Move handling of INVPCID types to x86
>       KVM:SVM: Enable INVPCID feature on AMD
> 
> 
>  arch/x86/include/asm/svm.h      |  117 +++++++++++++++++++++++++----------
>  arch/x86/include/uapi/asm/svm.h |    2 +
>  arch/x86/kvm/svm/nested.c       |   66 +++++++++-----------
>  arch/x86/kvm/svm/svm.c          |  131 ++++++++++++++++++++++++++-------------
>  arch/x86/kvm/svm/svm.h          |   87 +++++++++-----------------
>  arch/x86/kvm/trace.h            |   21 ++++--
>  arch/x86/kvm/vmx/nested.c       |   12 ++--
>  arch/x86/kvm/vmx/vmx.c          |   95 ----------------------------
>  arch/x86/kvm/vmx/vmx.h          |    2 -
>  arch/x86/kvm/x86.c              |  106 ++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.h              |    3 +
>  11 files changed, 364 insertions(+), 278 deletions(-)
> 
> --
> Signature
> 

Queued except for patch 9 with only some changes to the names (mostly
replacing "vector" with "word").  It should get to kvm.git on Monday or
Tuesday, please give it a shot.

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2020-09-12 17:08 ` [PATCH v6 00/12] SVM cleanup and INVPCID feature support Paolo Bonzini
@ 2020-09-14 15:05   ` Sean Christopherson
  2020-09-14 18:33   ` Babu Moger
  1 sibling, 0 replies; 62+ messages in thread
From: Sean Christopherson @ 2020-09-14 15:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Babu Moger, vkuznets, jmattson, wanpengli, kvm, joro, x86,
	linux-kernel, mingo, bp, hpa, tglx

On Sat, Sep 12, 2020 at 07:08:05PM +0200, Paolo Bonzini wrote:
> Queued except for patch 9 with only some changes to the names (mostly
> replacing "vector" with "word").  It should get to kvm.git on Monday or
> Tuesday, please give it a shot.

Belated vote for s/vector/word, I found EXCEPTION_VECTOR quite confusing.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts
  2020-09-12 16:52   ` Paolo Bonzini
@ 2020-09-14 15:06     ` Sean Christopherson
  2020-09-22 13:39       ` Paolo Bonzini
  0 siblings, 1 reply; 62+ messages in thread
From: Sean Christopherson @ 2020-09-14 15:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Babu Moger, vkuznets, jmattson, wanpengli, kvm, joro, x86,
	linux-kernel, mingo, bp, hpa, tglx

On Sat, Sep 12, 2020 at 06:52:20PM +0200, Paolo Bonzini wrote:
> On 11/09/20 21:28, Babu Moger wrote:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 1a5f3908b388..11892e86cb39 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1003,11 +1003,11 @@ static void init_vmcb(struct vcpu_svm *svm)
> >  
> >  	set_dr_intercepts(svm);
> >  
> > -	set_exception_intercept(svm, PF_VECTOR);
> > -	set_exception_intercept(svm, UD_VECTOR);
> > -	set_exception_intercept(svm, MC_VECTOR);
> > -	set_exception_intercept(svm, AC_VECTOR);
> > -	set_exception_intercept(svm, DB_VECTOR);
> > +	set_exception_intercept(svm, INTERCEPT_PF_VECTOR);
> > +	set_exception_intercept(svm, INTERCEPT_UD_VECTOR);
> > +	set_exception_intercept(svm, INTERCEPT_MC_VECTOR);
> > +	set_exception_intercept(svm, INTERCEPT_AC_VECTOR);
> > +	set_exception_intercept(svm, INTERCEPT_DB_VECTOR);
> 
> I think these should take a vector instead, and add 64 in the functions.

And "s/int bit/u32 vector" + BUILD_BUG_ON(vector > 32)?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2020-09-12 17:08 ` [PATCH v6 00/12] SVM cleanup and INVPCID feature support Paolo Bonzini
  2020-09-14 15:05   ` Sean Christopherson
@ 2020-09-14 18:33   ` Babu Moger
  2021-01-19 23:01     ` Jim Mattson
  1 sibling, 1 reply; 62+ messages in thread
From: Babu Moger @ 2020-09-14 18:33 UTC (permalink / raw)
  To: Paolo Bonzini, vkuznets, sean.j.christopherson, jmattson
  Cc: wanpengli, kvm, joro, x86, linux-kernel, mingo, bp, hpa, tglx



On 9/12/20 12:08 PM, Paolo Bonzini wrote:
> On 11/09/20 21:27, Babu Moger wrote:
>> The following series adds the support for PCID/INVPCID on AMD guests.
>> While doing it re-structured the vmcb_control_area data structure to
>> combine all the intercept vectors into one 32 bit array. Makes it easy
>> for future additions. Re-arranged few pcid related code to make it common
>> between SVM and VMX.
>>
>> INVPCID interceptions are added only when the guest is running with shadow
>> page table enabled. In this case the hypervisor needs to handle the tlbflush
>> based on the type of invpcid instruction.
>>
>> For the guests with nested page table (NPT) support, the INVPCID feature
>> works as running it natively. KVM does not need to do any special handling.
>>
>> AMD documentation for INVPCID feature is available at "AMD64 Architecture
>> Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)"
>>
>> The documentation can be obtained at the links below:
>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&amp;data=02%7C01%7Cbabu.moger%40amd.com%7Cd2bca7c6209743a7fe0e08d8573e70fd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637355274033139116&amp;sdata=C3EGywJcz3rAPmjckWGKbm7GkHR1Xyrl%2BIL9sEijhcQ%3D&amp;reserved=0
>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D206537&amp;data=02%7C01%7Cbabu.moger%40amd.com%7Cd2bca7c6209743a7fe0e08d8573e70fd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637355274033139116&amp;sdata=29n8WNNpcUgVQRUyxbiSPcWJGTL5uV%2FaHgHXU1b9BjI%3D&amp;reserved=0
>> ---
>>
>> v6:
>>  One minor change in patch #04. Otherwise same as v5.
>>  Updated all the patches by Reviewed-by.
>>
>> v5:
>>  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F159846887637.18873.14677728679411578606.stgit%40bmoger-ubuntu%2F&amp;data=02%7C01%7Cbabu.moger%40amd.com%7Cd2bca7c6209743a7fe0e08d8573e70fd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637355274033139116&amp;sdata=D7HvBj6OArmpKsiaZj0Qk3mIHWYOOUN23f53ajhQpOY%3D&amp;reserved=0
>>  All the changes are related to rebase.
>>  Aplies cleanly on mainline and kvm(master) tree. 
>>  Resending it to get some attention.
>>
>> v4:
>>  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F159676101387.12805.18038347880482984693.stgit%40bmoger-ubuntu%2F&amp;data=02%7C01%7Cbabu.moger%40amd.com%7Cd2bca7c6209743a7fe0e08d8573e70fd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637355274033139116&amp;sdata=7og620g0qsxee7Wd60emz5YdbA44Al4tiUJX5n46MhE%3D&amp;reserved=0
>>  1. Changed the functions __set_intercept/__clr_intercept/__is_intercept to
>>     to vmcb_set_intercept/vmcb_clr_intercept/vmcb_is_intercept by passing
>>     vmcb_control_area structure(Suggested by Paolo).
>>  2. Rearranged the commit 7a35e515a7055 ("KVM: VMX: Properly handle kvm_read/write_guest_virt*())
>>     to make it common across both SVM/VMX(Suggested by Jim Mattson).
>>  3. Took care of few other comments from Jim Mattson. Dropped "Reviewed-by"
>>     on few patches which I have changed since v3.
>>
>> v3:
>>  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F159597929496.12744.14654593948763926416.stgit%40bmoger-ubuntu%2F&amp;data=02%7C01%7Cbabu.moger%40amd.com%7Cd2bca7c6209743a7fe0e08d8573e70fd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637355274033139116&amp;sdata=hvPNH827bmo1VL%2F%2FIv%2F%2ByQdVBygOpI1tkgQ6ASf5Wt8%3D&amp;reserved=0
>>  1. Addressing the comments from Jim Mattson. Follow the v2 link below
>>     for the context.
>>  2. Introduced the generic __set_intercept, __clr_intercept and is_intercept
>>     using native __set_bit, clear_bit and test_bit.
>>  3. Combined all the intercepts vectors into single 32 bit array.
>>  4. Removed set_intercept_cr, clr_intercept_cr, set_exception_intercepts,
>>     clr_exception_intercept etc. Used the generic set_intercept and
>>     clr_intercept where applicable.
>>  5. Tested both L1 guest and l2 nested guests. 
>>
>> v2:
>>   https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F159234483706.6230.13753828995249423191.stgit%40bmoger-ubuntu%2F&amp;data=02%7C01%7Cbabu.moger%40amd.com%7Cd2bca7c6209743a7fe0e08d8573e70fd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637355274033139116&amp;sdata=rP%2BlRJ91tk1VXS3YX8TdP2L9vORiIj8gN3ZZLKIXfeY%3D&amp;reserved=0
>>   - Taken care of few comments from Jim Mattson.
>>   - KVM interceptions added only when tdp is off. No interceptions
>>     when tdp is on.
>>   - Reverted the fault priority to original order in VMX. 
>>   
>> v1:
>>   https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F159191202523.31436.11959784252237488867.stgit%40bmoger-ubuntu%2F&amp;data=02%7C01%7Cbabu.moger%40amd.com%7Cd2bca7c6209743a7fe0e08d8573e70fd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637355274033139116&amp;sdata=IGmv%2BLF60dmGVSCwcTU6sTDMvW1%2BEWUqTA5K%2FAowuxM%3D&amp;reserved=0
>>
>> Babu Moger (12):
>>       KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept)
>>       KVM: SVM: Change intercept_cr to generic intercepts
>>       KVM: SVM: Change intercept_dr to generic intercepts
>>       KVM: SVM: Modify intercept_exceptions to generic intercepts
>>       KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors
>>       KVM: SVM: Add new intercept vector in vmcb_control_area
>>       KVM: nSVM: Cleanup nested_state data structure
>>       KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept
>>       KVM: SVM: Remove set_exception_intercept and clr_exception_intercept
>>       KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c
>>       KVM: X86: Move handling of INVPCID types to x86
>>       KVM:SVM: Enable INVPCID feature on AMD
>>
>>
>>  arch/x86/include/asm/svm.h      |  117 +++++++++++++++++++++++++----------
>>  arch/x86/include/uapi/asm/svm.h |    2 +
>>  arch/x86/kvm/svm/nested.c       |   66 +++++++++-----------
>>  arch/x86/kvm/svm/svm.c          |  131 ++++++++++++++++++++++++++-------------
>>  arch/x86/kvm/svm/svm.h          |   87 +++++++++-----------------
>>  arch/x86/kvm/trace.h            |   21 ++++--
>>  arch/x86/kvm/vmx/nested.c       |   12 ++--
>>  arch/x86/kvm/vmx/vmx.c          |   95 ----------------------------
>>  arch/x86/kvm/vmx/vmx.h          |    2 -
>>  arch/x86/kvm/x86.c              |  106 ++++++++++++++++++++++++++++++++
>>  arch/x86/kvm/x86.h              |    3 +
>>  11 files changed, 364 insertions(+), 278 deletions(-)
>>
>> --
>> Signature
>>
> 
> Queued except for patch 9 with only some changes to the names (mostly
> replacing "vector" with "word").  It should get to kvm.git on Monday or
> Tuesday, please give it a shot.

Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
as expected.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts
  2020-09-14 15:06     ` Sean Christopherson
@ 2020-09-22 13:39       ` Paolo Bonzini
  2020-09-22 19:11         ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2020-09-22 13:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Babu Moger, vkuznets, jmattson, wanpengli, kvm, joro, x86,
	linux-kernel, mingo, bp, hpa, tglx

On 14/09/20 17:06, Sean Christopherson wrote:
>> I think these should take a vector instead, and add 64 in the functions.
> 
> And "s/int bit/u32 vector" + BUILD_BUG_ON(vector > 32)?

Not sure if we can assume it to be constant, but WARN_ON_ONCE is good
enough as far as performance is concerned.  The same int->u32 +
WARN_ON_ONCE should be done in patch 1.

Thanks for the review!

Paolo


^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts
  2020-09-22 13:39       ` Paolo Bonzini
@ 2020-09-22 19:11         ` Babu Moger
  2020-09-23  2:43           ` Paolo Bonzini
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2020-09-22 19:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: vkuznets, jmattson, wanpengli, kvm, joro, x86, linux-kernel,
	mingo, bp, hpa, tglx



> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: Tuesday, September 22, 2020 8:39 AM
> To: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Moger, Babu <Babu.Moger@amd.com>; vkuznets@redhat.com;
> jmattson@google.com; wanpengli@tencent.com; kvm@vger.kernel.org;
> joro@8bytes.org; x86@kernel.org; linux-kernel@vger.kernel.org;
> mingo@redhat.com; bp@alien8.de; hpa@zytor.com; tglx@linutronix.de
> Subject: Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to
> generic intercepts
> 
> On 14/09/20 17:06, Sean Christopherson wrote:
> >> I think these should take a vector instead, and add 64 in the functions.
> >
> > And "s/int bit/u32 vector" + BUILD_BUG_ON(vector > 32)?
> 
> Not sure if we can assume it to be constant, but WARN_ON_ONCE is good
> enough as far as performance is concerned.  The same int->u32 +
> WARN_ON_ONCE should be done in patch 1.

Paolo, Ok sure. Will change "int bit" to "u32 vector". I will send a new
patch to address this. This needs to be addressed in all these functions,
vmcb_set_intercept, vmcb_clr_intercept, vmcb_is_intercept,
set_exception_intercept, clr_exception_intercept, svm_set_intercept,
svm_clr_intercept, svm_is_intercept.

Also will add WARN_ON_ONCE(vector > 32); on set_exception_intercept,
clr_exception_intercept.  Does that sound good?

> 
> Thanks for the review!
> 
> Paolo


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts
  2020-09-22 19:11         ` Babu Moger
@ 2020-09-23  2:43           ` Paolo Bonzini
  2020-09-23 13:35             ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2020-09-23  2:43 UTC (permalink / raw)
  To: Babu Moger, Sean Christopherson
  Cc: vkuznets, jmattson, wanpengli, kvm, joro, x86, linux-kernel,
	mingo, bp, hpa, tglx

On 22/09/20 21:11, Babu Moger wrote:
> 
> 
>> -----Original Message-----
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Sent: Tuesday, September 22, 2020 8:39 AM
>> To: Sean Christopherson <sean.j.christopherson@intel.com>
>> Cc: Moger, Babu <Babu.Moger@amd.com>; vkuznets@redhat.com;
>> jmattson@google.com; wanpengli@tencent.com; kvm@vger.kernel.org;
>> joro@8bytes.org; x86@kernel.org; linux-kernel@vger.kernel.org;
>> mingo@redhat.com; bp@alien8.de; hpa@zytor.com; tglx@linutronix.de
>> Subject: Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to
>> generic intercepts
>>
>> On 14/09/20 17:06, Sean Christopherson wrote:
>>>> I think these should take a vector instead, and add 64 in the functions.
>>>
>>> And "s/int bit/u32 vector" + BUILD_BUG_ON(vector > 32)?
>>
>> Not sure if we can assume it to be constant, but WARN_ON_ONCE is good
>> enough as far as performance is concerned.  The same int->u32 +
>> WARN_ON_ONCE should be done in patch 1.
> 
> Paolo, Ok sure. Will change "int bit" to "u32 vector". I will send a new
> patch to address this. This needs to be addressed in all these functions,
> vmcb_set_intercept, vmcb_clr_intercept, vmcb_is_intercept,
> set_exception_intercept, clr_exception_intercept, svm_set_intercept,
> svm_clr_intercept, svm_is_intercept.
> 
> Also will add WARN_ON_ONCE(vector > 32); on set_exception_intercept,
> clr_exception_intercept.  Does that sound good?

I can do the fixes myself, no worries.  It should get to kvm/next this week.

Paolo


^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts
  2020-09-23  2:43           ` Paolo Bonzini
@ 2020-09-23 13:35             ` Babu Moger
  0 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2020-09-23 13:35 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: vkuznets, jmattson, wanpengli, kvm, joro, x86, linux-kernel,
	mingo, bp, hpa, tglx



> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: Tuesday, September 22, 2020 9:44 PM
> To: Moger, Babu <Babu.Moger@amd.com>; Sean Christopherson
> <sean.j.christopherson@intel.com>
> Cc: vkuznets@redhat.com; jmattson@google.com; wanpengli@tencent.com;
> kvm@vger.kernel.org; joro@8bytes.org; x86@kernel.org; linux-
> kernel@vger.kernel.org; mingo@redhat.com; bp@alien8.de; hpa@zytor.com;
> tglx@linutronix.de
> Subject: Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to
> generic intercepts
> 
> On 22/09/20 21:11, Babu Moger wrote:
> >
> >
> >> -----Original Message-----
> >> From: Paolo Bonzini <pbonzini@redhat.com>
> >> Sent: Tuesday, September 22, 2020 8:39 AM
> >> To: Sean Christopherson <sean.j.christopherson@intel.com>
> >> Cc: Moger, Babu <Babu.Moger@amd.com>; vkuznets@redhat.com;
> >> jmattson@google.com; wanpengli@tencent.com; kvm@vger.kernel.org;
> >> joro@8bytes.org; x86@kernel.org; linux-kernel@vger.kernel.org;
> >> mingo@redhat.com; bp@alien8.de; hpa@zytor.com; tglx@linutronix.de
> >> Subject: Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions
> >> to generic intercepts
> >>
> >> On 14/09/20 17:06, Sean Christopherson wrote:
> >>>> I think these should take a vector instead, and add 64 in the functions.
> >>>
> >>> And "s/int bit/u32 vector" + BUILD_BUG_ON(vector > 32)?
> >>
> >> Not sure if we can assume it to be constant, but WARN_ON_ONCE is good
> >> enough as far as performance is concerned.  The same int->u32 +
> >> WARN_ON_ONCE should be done in patch 1.
> >
> > Paolo, Ok sure. Will change "int bit" to "u32 vector". I will send a
> > new patch to address this. This needs to be addressed in all these
> > functions, vmcb_set_intercept, vmcb_clr_intercept, vmcb_is_intercept,
> > set_exception_intercept, clr_exception_intercept, svm_set_intercept,
> > svm_clr_intercept, svm_is_intercept.
> >
> > Also will add WARN_ON_ONCE(vector > 32); on set_exception_intercept,
> > clr_exception_intercept.  Does that sound good?
> 
> I can do the fixes myself, no worries.  It should get to kvm/next this week.
Ok. Thanks
Babu

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2020-09-14 18:33   ` Babu Moger
@ 2021-01-19 23:01     ` Jim Mattson
  2021-01-19 23:45       ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Jim Mattson @ 2021-01-19 23:01 UTC (permalink / raw)
  To: Babu Moger
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Sean Christopherson, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare

[-- Attachment #1: Type: text/plain, Size: 399 bytes --]

On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:

> Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
> as expected.

Debian 9 does not like this patch set. As a kvm guest, it panics on a
Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
please see the attached kernel log snippet. Debian 10 is fine, so I
assume this is a guest bug.

[-- Attachment #2: debian9.panic --]
[-- Type: application/octet-stream, Size: 3420 bytes --]

[    1.235726] ------------[ cut here ]------------
[    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
[    1.240926] invalid opcode: 0000 [#1] SMP
[    1.243301] Modules linked in:
[    1.244585] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
[    1.247657] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[    1.251249] task: ffff909363e94040 task.stack: ffffa41bc0194000
[    1.253519] RIP: 0010:[<ffffffff8fa2e40c>]  [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
[    1.256593] RSP: 0018:ffffa41bc0197d90  EFLAGS: 00010096
[    1.258657] RAX: 000000000000000f RBX: 0000000001020800 RCX: 00000000feda3203
[    1.261388] RDX: 00000000178bfbff RSI: 0000000000000000 RDI: ffffffffff57a000
[    1.264168] RBP: ffffffff8fbd3eca R08: 0000000000000000 R09: 0000000000000003
[    1.266983] R10: 0000000000000003 R11: 0000000000000112 R12: 0000000000000001
[    1.269702] R13: ffffa41bc0197dcf R14: 0000000000000286 R15: ffffed1c40407500
[    1.272572] FS:  0000000000000000(0000) GS:ffff909366300000(0000) knlGS:0000000000000000
[    1.275791] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.278032] CR2: 0000000000000000 CR3: 0000000010c08000 CR4: 00000000003606f0
[    1.280815] Stack:
[    1.281630]  ffffffff8fbd3eca 0000000000000005 ffffa41bc0197e03 ffffffff8fbd3ecb
[    1.284660]  0000000000000000 0000000000000000 ffffffff8fa2e835 ccffffff8fad4326
[    1.287729]  1ccd0231874d55d3 ffffffff8fbd3eca ffffa41bc0197e03 ffffffff90203844
[    1.290852] Call Trace:
[    1.291782]  [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
[    1.294900]  [<ffffffff8fbd3ecb>] ? swap_entry_free+0x12b/0x300
[    1.297267]  [<ffffffff8fa2e835>] ? text_poke_bp+0x55/0xe0
[    1.299473]  [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
[    1.301896]  [<ffffffff8fa2b64c>] ? arch_jump_label_transform+0x9c/0x120
[    1.304557]  [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
[    1.306790]  [<ffffffff8fb81d92>] ? __jump_label_update+0x72/0x80
[    1.309255]  [<ffffffff8fb8206f>] ? static_key_slow_inc+0x8f/0xa0
[    1.311680]  [<ffffffff8fbd7a57>] ? frontswap_register_ops+0x107/0x1d0
[    1.314281]  [<ffffffff9077078c>] ? init_zswap+0x282/0x3f6
[    1.316547]  [<ffffffff9077050a>] ? init_frontswap+0x8c/0x8c
[    1.318784]  [<ffffffff8fa0223e>] ? do_one_initcall+0x4e/0x180
[    1.321067]  [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
[    1.323366]  [<ffffffff9073f08d>] ? kernel_init_freeable+0x16b/0x1ec
[    1.325873]  [<ffffffff90011d50>] ? rest_init+0x80/0x80
[    1.327989]  [<ffffffff90011d5a>] ? kernel_init+0xa/0x100
[    1.330092]  [<ffffffff9001f424>] ? ret_from_fork+0x44/0x70
[    1.332311] Code: 00 0f a2 4d 85 e4 74 4a 0f b6 45 00 41 38 45 00 75 19 31 c0 83 c0 01 48 63 d0 49 39 d4 76 33 41 0f b6 4c 15 00 38 4c 15 00 74 e9 <0f> 0b 48 89 ef e8 da d6 19 00 48 8d bd 00 10 00 00 48 89 c3 e8 
[    1.342818] RIP  [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
[    1.345859]  RSP <ffffa41bc0197d90>
[    1.347285] ---[ end trace 0a1c5ab5eb16de89 ]---
[    1.349169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.349169] 
[    1.352885] Kernel Offset: 0xea00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    1.357039] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.357039] 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-01-19 23:01     ` Jim Mattson
@ 2021-01-19 23:45       ` Babu Moger
  2021-01-20 21:14         ` Jim Mattson
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-01-19 23:45 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Sean Christopherson, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare



On 1/19/21 5:01 PM, Jim Mattson wrote:
> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:
> 
>> Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
>> as expected.
> 
> Debian 9 does not like this patch set. As a kvm guest, it panics on a
> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
> please see the attached kernel log snippet. Debian 10 is fine, so I
> assume this is a guest bug.
> 

We had an issue with PCID feature earlier. This was showing only with SEV
guests. It is resolved recently. Do you think it is not related that?
Here are the patch set.
https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-01-19 23:45       ` Babu Moger
@ 2021-01-20 21:14         ` Jim Mattson
  2021-01-20 21:45           ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Jim Mattson @ 2021-01-20 21:14 UTC (permalink / raw)
  To: Babu Moger
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson

On Tue, Jan 19, 2021 at 3:45 PM Babu Moger <babu.moger@amd.com> wrote:
>
>
>
> On 1/19/21 5:01 PM, Jim Mattson wrote:
> > On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:
> >
> >> Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
> >> as expected.
> >
> > Debian 9 does not like this patch set. As a kvm guest, it panics on a
> > Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
> > please see the attached kernel log snippet. Debian 10 is fine, so I
> > assume this is a guest bug.
> >
>
> We had an issue with PCID feature earlier. This was showing only with SEV
> guests. It is resolved recently. Do you think it is not related that?
> Here are the patch set.
> https://lore.kernel.org/kvm/160521930597.32054.4906933314022910996.stgit@bmoger-ubuntu/

The Debian 9 release we tested is not an SEV guest.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-01-20 21:14         ` Jim Mattson
@ 2021-01-20 21:45           ` Babu Moger
  2021-01-21  3:10             ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-01-20 21:45 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson



On 1/20/21 3:14 PM, Jim Mattson wrote:
> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger <babu.moger@amd.com> wrote:
>>
>>
>>
>> On 1/19/21 5:01 PM, Jim Mattson wrote:
>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:
>>>
>>>> Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
>>>> as expected.
>>>
>>> Debian 9 does not like this patch set. As a kvm guest, it panics on a
>>> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
>>> please see the attached kernel log snippet. Debian 10 is fine, so I
>>> assume this is a guest bug.
>>>
>>
>> We had an issue with PCID feature earlier. This was showing only with SEV
>> guests. It is resolved recently. Do you think it is not related that?
>> Here are the patch set.
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fkvm%2F160521930597.32054.4906933314022910996.stgit%40bmoger-ubuntu%2F&amp;data=04%7C01%7Cbabu.moger%40amd.com%7C507e52200cc5478e3b9308d8bd8860bc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637467740754159704%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Nxhg4Atzr6wZ1L7egyxQVZ%2FmVCE473%2F%2F5Fi0savgUfk%3D&amp;reserved=0
> 
> The Debian 9 release we tested is not an SEV guest.
ok. I have not tested Debian 9 before. I will try now. Will let you know
how it goes. thanks

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-01-20 21:45           ` Babu Moger
@ 2021-01-21  3:10             ` Babu Moger
  2021-01-21 23:51               ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-01-21  3:10 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson



On 1/20/21 3:45 PM, Babu Moger wrote:
> 
> 
> On 1/20/21 3:14 PM, Jim Mattson wrote:
>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger <babu.moger@amd.com> wrote:
>>>
>>>
>>>
>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:
>>>>
>>>>> Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
>>>>> as expected.
>>>>
>>>> Debian 9 does not like this patch set. As a kvm guest, it panics on a
>>>> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
>>>> please see the attached kernel log snippet. Debian 10 is fine, so I
>>>> assume this is a guest bug.
>>>>
>>>
>>> We had an issue with PCID feature earlier. This was showing only with SEV
>>> guests. It is resolved recently. Do you think it is not related that?
>>> Here are the patch set.
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fkvm%2F160521930597.32054.4906933314022910996.stgit%40bmoger-ubuntu%2F&amp;data=04%7C01%7CBabu.Moger%40amd.com%7C562d8b8ea61c41a61fe608d8bda0ae3b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637467845105800757%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=l%2FhF%2FlDAqFN10SzDQ1L05FH1joXrLiuMwHAibBGHOqw%3D&amp;reserved=0
>>
>> The Debian 9 release we tested is not an SEV guest.
> ok. I have not tested Debian 9 before. I will try now. Will let you know
> how it goes. thanks
> 

I have reproduced the issue locally. Will investigate. thanks

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-01-21  3:10             ` Babu Moger
@ 2021-01-21 23:51               ` Babu Moger
  2021-01-23  1:52                 ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-01-21 23:51 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson



On 1/20/21 9:10 PM, Babu Moger wrote:
> 
> 
> On 1/20/21 3:45 PM, Babu Moger wrote:
>>
>>
>> On 1/20/21 3:14 PM, Jim Mattson wrote:
>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:
>>>>>
>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
>>>>>> as expected.
>>>>>
>>>>> Debian 9 does not like this patch set. As a kvm guest, it panics on a
>>>>> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
>>>>> please see the attached kernel log snippet. Debian 10 is fine, so I
>>>>> assume this is a guest bug.
>>>>>
>>>>
>>>> We had an issue with PCID feature earlier. This was showing only with SEV
>>>> guests. It is resolved recently. Do you think it is not related that?
>>>> Here are the patch set.
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fkvm%2F160521930597.32054.4906933314022910996.stgit%40bmoger-ubuntu%2F&amp;data=04%7C01%7CBabu.Moger%40amd.com%7C3009e5f7f32b4dbd4aee08d8bdc045c9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637467980841376327%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=%2Bva7em372XD7uaCrSy3UBH6a9n8xaTTXWCAlA3gJX78%3D&amp;reserved=0
>>>
>>> The Debian 9 release we tested is not an SEV guest.
>> ok. I have not tested Debian 9 before. I will try now. Will let you know
>> how it goes. thanks
>>
> 
> I have reproduced the issue locally. Will investigate. thanks
> 
Few updates.
1. Like Jim mentioned earlier, this appears to be guest kernel issue.
Debian 9 runs the base kernel 4.9.0-14. Problem can be seen consistently
with this kernel.

2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.

3. System comes up fine when invpcid feature is disabled with the boot
parameter "noinvpcid" and also with "nopcid". nopcid disables both pcid
and invpcid.

4. Upgraded the guest kernel to v5.0 and system comes up fine.

5. Also system comes up fine with latest guest kernel 5.11.0-rc4.

I did not bisect further yet.
Babu
Thanks

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-01-21 23:51               ` Babu Moger
@ 2021-01-23  1:52                 ` Babu Moger
  2021-02-24  0:13                   ` Jim Mattson
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-01-23  1:52 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson



On 1/21/21 5:51 PM, Babu Moger wrote:
> 
> 
> On 1/20/21 9:10 PM, Babu Moger wrote:
>>
>>
>> On 1/20/21 3:45 PM, Babu Moger wrote:
>>>
>>>
>>> On 1/20/21 3:14 PM, Jim Mattson wrote:
>>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger <babu.moger@amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
>>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:
>>>>>>
>>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
>>>>>>> as expected.
>>>>>>
>>>>>> Debian 9 does not like this patch set. As a kvm guest, it panics on a
>>>>>> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
>>>>>> please see the attached kernel log snippet. Debian 10 is fine, so I
>>>>>> assume this is a guest bug.
>>>>>>
>>>>>
>>>>> We had an issue with PCID feature earlier. This was showing only with SEV
>>>>> guests. It is resolved recently. Do you think it is not related that?
>>>>> Here are the patch set.
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fkvm%2F160521930597.32054.4906933314022910996.stgit%40bmoger-ubuntu%2F&amp;data=04%7C01%7CBabu.Moger%40amd.com%7C3009e5f7f32b4dbd4aee08d8bdc045c9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637467980841376327%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=%2Bva7em372XD7uaCrSy3UBH6a9n8xaTTXWCAlA3gJX78%3D&amp;reserved=0
>>>>
>>>> The Debian 9 release we tested is not an SEV guest.
>>> ok. I have not tested Debian 9 before. I will try now. Will let you know
>>> how it goes. thanks
>>>
>>
>> I have reproduced the issue locally. Will investigate. thanks
>>
> Few updates.
> 1. Like Jim mentioned earlier, this appears to be guest kernel issue.
> Debian 9 runs the base kernel 4.9.0-14. Problem can be seen consistently
> with this kernel.
> 
> 2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.
> 
> 3. System comes up fine when invpcid feature is disabled with the boot
> parameter "noinvpcid" and also with "nopcid". nopcid disables both pcid
> and invpcid.
> 
> 4. Upgraded the guest kernel to v5.0 and system comes up fine.
> 
> 5. Also system comes up fine with latest guest kernel 5.11.0-rc4.
> 
> I did not bisect further yet.
> Babu
> Thanks


Some more update:
 System comes up fine with kernel v4.9(checked out on upstream tag v4.9).
So, I am assuming this is something specific to Debian 4.9.0-14 kernel.

Note: I couldn't go back prior versions(v4.8 or earlier) due to compile
issues.
Thanks
Babu


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-01-23  1:52                 ` Babu Moger
@ 2021-02-24  0:13                   ` Jim Mattson
  2021-02-24 22:17                     ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Jim Mattson @ 2021-02-24  0:13 UTC (permalink / raw)
  To: Babu Moger
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson

Any updates? What should we be telling customers with Debian 9 guests? :-)

On Fri, Jan 22, 2021 at 5:52 PM Babu Moger <babu.moger@amd.com> wrote:
>
>
>
> On 1/21/21 5:51 PM, Babu Moger wrote:
> >
> >
> > On 1/20/21 9:10 PM, Babu Moger wrote:
> >>
> >>
> >> On 1/20/21 3:45 PM, Babu Moger wrote:
> >>>
> >>>
> >>> On 1/20/21 3:14 PM, Jim Mattson wrote:
> >>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger <babu.moger@amd.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
> >>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger <babu.moger@amd.com> wrote:
> >>>>>>
> >>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm units tests. Everything works
> >>>>>>> as expected.
> >>>>>>
> >>>>>> Debian 9 does not like this patch set. As a kvm guest, it panics on a
> >>>>>> Milan CPU unless booted with 'nopcid'. Gmail mangles long lines, so
> >>>>>> please see the attached kernel log snippet. Debian 10 is fine, so I
> >>>>>> assume this is a guest bug.
> >>>>>>
> >>>>>
> >>>>> We had an issue with PCID feature earlier. This was showing only with SEV
> >>>>> guests. It is resolved recently. Do you think it is not related that?
> >>>>> Here are the patch set.
> >>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fkvm%2F160521930597.32054.4906933314022910996.stgit%40bmoger-ubuntu%2F&amp;data=04%7C01%7CBabu.Moger%40amd.com%7C3009e5f7f32b4dbd4aee08d8bdc045c9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637467980841376327%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=%2Bva7em372XD7uaCrSy3UBH6a9n8xaTTXWCAlA3gJX78%3D&amp;reserved=0
> >>>>
> >>>> The Debian 9 release we tested is not an SEV guest.
> >>> ok. I have not tested Debian 9 before. I will try now. Will let you know
> >>> how it goes. thanks
> >>>
> >>
> >> I have reproduced the issue locally. Will investigate. thanks
> >>
> > Few updates.
> > 1. Like Jim mentioned earlier, this appears to be guest kernel issue.
> > Debian 9 runs the base kernel 4.9.0-14. Problem can be seen consistently
> > with this kernel.
> >
> > 2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.
> >
> > 3. System comes up fine when invpcid feature is disabled with the boot
> > parameter "noinvpcid" and also with "nopcid". nopcid disables both pcid
> > and invpcid.
> >
> > 4. Upgraded the guest kernel to v5.0 and system comes up fine.
> >
> > 5. Also system comes up fine with latest guest kernel 5.11.0-rc4.
> >
> > I did not bisect further yet.
> > Babu
> > Thanks
>
>
> Some more update:
>  System comes up fine with kernel v4.9(checked out on upstream tag v4.9).
> So, I am assuming this is something specific to Debian 4.9.0-14 kernel.
>
> Note: I couldn't go back prior versions(v4.8 or earlier) due to compile
> issues.
> Thanks
> Babu
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-02-24  0:13                   ` Jim Mattson
@ 2021-02-24 22:17                     ` Babu Moger
  2021-03-10  1:04                       ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-02-24 22:17 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson



> -----Original Message-----
> From: Jim Mattson <jmattson@google.com>
> Sent: Tuesday, February 23, 2021 6:14 PM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov
> <vkuznets@redhat.com>; Wanpeng Li <wanpengli@tencent.com>; kvm list
> <kvm@vger.kernel.org>; Joerg Roedel <joro@8bytes.org>; the arch/x86
> maintainers <x86@kernel.org>; LKML <linux-kernel@vger.kernel.org>; Ingo
> Molnar <mingo@redhat.com>; Borislav Petkov <bp@alien8.de>; H . Peter Anvin
> <hpa@zytor.com>; Thomas Gleixner <tglx@linutronix.de>; Makarand Sonare
> <makarandsonare@google.com>; Sean Christopherson <seanjc@google.com>
> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> Any updates? What should we be telling customers with Debian 9 guests? :-)

Found another problem with pcid feature om SVM. It is do with CR4 flags
reset during bootup. Problem was showing up with kexec loading on VM.
I am not sure if this is related to that. Will send the patch soon.

> 
> On Fri, Jan 22, 2021 at 5:52 PM Babu Moger <babu.moger@amd.com> wrote:
> >
> >
> >
> > On 1/21/21 5:51 PM, Babu Moger wrote:
> > >
> > >
> > > On 1/20/21 9:10 PM, Babu Moger wrote:
> > >>
> > >>
> > >> On 1/20/21 3:45 PM, Babu Moger wrote:
> > >>>
> > >>>
> > >>> On 1/20/21 3:14 PM, Jim Mattson wrote:
> > >>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger <babu.moger@amd.com>
> wrote:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
> > >>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger
> <babu.moger@amd.com> wrote:
> > >>>>>>
> > >>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm units tests.
> > >>>>>>> Everything works as expected.
> > >>>>>>
> > >>>>>> Debian 9 does not like this patch set. As a kvm guest, it
> > >>>>>> panics on a Milan CPU unless booted with 'nopcid'. Gmail
> > >>>>>> mangles long lines, so please see the attached kernel log
> > >>>>>> snippet. Debian 10 is fine, so I assume this is a guest bug.
> > >>>>>>
> > >>>>>
> > >>>>> We had an issue with PCID feature earlier. This was showing only
> > >>>>> with SEV guests. It is resolved recently. Do you think it is not related
> that?
> > >>>>> Here are the patch set.
> > >>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%
> > >>>>>
> 2Flore.kernel.org%2Fkvm%2F160521930597.32054.4906933314022910996
> > >>>>> .stgit%40bmoger-
> ubuntu%2F&amp;data=04%7C01%7Cbabu.moger%40amd.co
> > >>>>>
> m%7C9558672ca21c4f6c2d5308d8d85919dc%7C3dd8961fe4884e608e11a82d9
> > >>>>>
> 94e183d%7C0%7C0%7C637497224490455772%7CUnknown%7CTWFpbGZsb3d8
> eyJ
> > >>>>>
> WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%
> > >>>>>
> 7C1000&amp;sdata=4QzTNHaYllwPd1U0kumq75dpwp7Rg0ZXsSQ631jMeqs%3D
> &
> > >>>>> amp;reserved=0
> > >>>>
> > >>>> The Debian 9 release we tested is not an SEV guest.
> > >>> ok. I have not tested Debian 9 before. I will try now. Will let
> > >>> you know how it goes. thanks
> > >>>
> > >>
> > >> I have reproduced the issue locally. Will investigate. thanks
> > >>
> > > Few updates.
> > > 1. Like Jim mentioned earlier, this appears to be guest kernel issue.
> > > Debian 9 runs the base kernel 4.9.0-14. Problem can be seen
> > > consistently with this kernel.
> > >
> > > 2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.
> > >
> > > 3. System comes up fine when invpcid feature is disabled with the
> > > boot parameter "noinvpcid" and also with "nopcid". nopcid disables
> > > both pcid and invpcid.
> > >
> > > 4. Upgraded the guest kernel to v5.0 and system comes up fine.
> > >
> > > 5. Also system comes up fine with latest guest kernel 5.11.0-rc4.
> > >
> > > I did not bisect further yet.
> > > Babu
> > > Thanks
> >
> >
> > Some more update:
> >  System comes up fine with kernel v4.9(checked out on upstream tag v4.9).
> > So, I am assuming this is something specific to Debian 4.9.0-14 kernel.
> >
> > Note: I couldn't go back prior versions(v4.8 or earlier) due to
> > compile issues.
> > Thanks
> > Babu
> >

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-02-24 22:17                     ` Babu Moger
@ 2021-03-10  1:04                       ` Babu Moger
  2021-03-10  9:08                         ` Paolo Bonzini
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-03-10  1:04 UTC (permalink / raw)
  To: Babu Moger, Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson



> -----Original Message-----
> From: Babu Moger <babu.moger@amd.com>
> Sent: Wednesday, February 24, 2021 4:17 PM
> To: Jim Mattson <jmattson@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov
> <vkuznets@redhat.com>; Wanpeng Li <wanpengli@tencent.com>; kvm list
> <kvm@vger.kernel.org>; Joerg Roedel <joro@8bytes.org>; the arch/x86
> maintainers <x86@kernel.org>; LKML <linux-kernel@vger.kernel.org>; Ingo
> Molnar <mingo@redhat.com>; Borislav Petkov <bp@alien8.de>; H . Peter
> Anvin <hpa@zytor.com>; Thomas Gleixner <tglx@linutronix.de>; Makarand
> Sonare <makarandsonare@google.com>; Sean Christopherson
> <seanjc@google.com>
> Subject: RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> 
> 
> > -----Original Message-----
> > From: Jim Mattson <jmattson@google.com>
> > Sent: Tuesday, February 23, 2021 6:14 PM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>; Vitaly Kuznetsov
> > <vkuznets@redhat.com>; Wanpeng Li <wanpengli@tencent.com>; kvm list
> > <kvm@vger.kernel.org>; Joerg Roedel <joro@8bytes.org>; the arch/x86
> > maintainers <x86@kernel.org>; LKML <linux-kernel@vger.kernel.org>;
> > Ingo Molnar <mingo@redhat.com>; Borislav Petkov <bp@alien8.de>; H .
> > Peter Anvin <hpa@zytor.com>; Thomas Gleixner <tglx@linutronix.de>;
> > Makarand Sonare <makarandsonare@google.com>; Sean Christopherson
> > <seanjc@google.com>
> > Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> >
> > Any updates? What should we be telling customers with Debian 9 guests?
> > :-)
> 
> Found another problem with pcid feature om SVM. It is do with CR4 flags
> reset during bootup. Problem was showing up with kexec loading on VM.
> I am not sure if this is related to that. Will send the patch soon.

Tried to reproduce the problem on upstream kernel versions without any
success.  Tried v4.9-0 and v4.8-0. Both these upstream versions are
working fine. So "git bisect" on upstream is ruled out.

Debian kernel 4.10(tag 4.10~rc6-1~exp1) also works fine. It appears the
problem is on Debian 4.9 kernel. I am not sure how to run git bisect on
Debian kernel. Tried anyway. It is pointing to

47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
commit 47811c66356d875e76a6ca637a9d384779a659bb
Author: Ben Hutchings <benh@debian.org>
Date:   Mon Mar 8 01:17:32 2021 +0100

    Prepare to release linux (4.9.258-1).

It does not appear to be the right commit. I am out of ideas now.
hanks
Babu

> 
> >
> > On Fri, Jan 22, 2021 at 5:52 PM Babu Moger <babu.moger@amd.com>
> wrote:
> > >
> > >
> > >
> > > On 1/21/21 5:51 PM, Babu Moger wrote:
> > > >
> > > >
> > > > On 1/20/21 9:10 PM, Babu Moger wrote:
> > > >>
> > > >>
> > > >> On 1/20/21 3:45 PM, Babu Moger wrote:
> > > >>>
> > > >>>
> > > >>> On 1/20/21 3:14 PM, Jim Mattson wrote:
> > > >>>> On Tue, Jan 19, 2021 at 3:45 PM Babu Moger
> <babu.moger@amd.com>
> > wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On 1/19/21 5:01 PM, Jim Mattson wrote:
> > > >>>>>> On Mon, Sep 14, 2020 at 11:33 AM Babu Moger
> > <babu.moger@amd.com> wrote:
> > > >>>>>>
> > > >>>>>>> Thanks Paolo. Tested Guest/nested guest/kvm units tests.
> > > >>>>>>> Everything works as expected.
> > > >>>>>>
> > > >>>>>> Debian 9 does not like this patch set. As a kvm guest, it
> > > >>>>>> panics on a Milan CPU unless booted with 'nopcid'. Gmail
> > > >>>>>> mangles long lines, so please see the attached kernel log
> > > >>>>>> snippet. Debian 10 is fine, so I assume this is a guest bug.
> > > >>>>>>
> > > >>>>>
> > > >>>>> We had an issue with PCID feature earlier. This was showing
> > > >>>>> only with SEV guests. It is resolved recently. Do you think it
> > > >>>>> is not related
> > that?
> > > >>>>> Here are the patch set.
> > > >>>>>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2
> > > >>>>> F%25
> > > >>>>>
> > 2Flore.kernel.org%2Fkvm%2F160521930597.32054.4906933314022910996
> > > >>>>> .stgit%40bmoger-
> > ubuntu%2F&amp;data=04%7C01%7Cbabu.moger%40amd.co
> > > >>>>>
> >
> m%7C9558672ca21c4f6c2d5308d8d85919dc%7C3dd8961fe4884e608e11a82d9
> > > >>>>>
> >
> 94e183d%7C0%7C0%7C637497224490455772%7CUnknown%7CTWFpbGZsb3d
> 8
> > eyJ
> > > >>>>>
> >
> WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%
> > > >>>>>
> >
> 7C1000&amp;sdata=4QzTNHaYllwPd1U0kumq75dpwp7Rg0ZXsSQ631jMeqs%
> 3D
> > &
> > > >>>>> amp;reserved=0
> > > >>>>
> > > >>>> The Debian 9 release we tested is not an SEV guest.
> > > >>> ok. I have not tested Debian 9 before. I will try now. Will let
> > > >>> you know how it goes. thanks
> > > >>>
> > > >>
> > > >> I have reproduced the issue locally. Will investigate. thanks
> > > >>
> > > > Few updates.
> > > > 1. Like Jim mentioned earlier, this appears to be guest kernel issue.
> > > > Debian 9 runs the base kernel 4.9.0-14. Problem can be seen
> > > > consistently with this kernel.
> > > >
> > > > 2. This guest kernel(4.9.0-14) does not like the new feature INVPCID.
> > > >
> > > > 3. System comes up fine when invpcid feature is disabled with the
> > > > boot parameter "noinvpcid" and also with "nopcid". nopcid disables
> > > > both pcid and invpcid.
> > > >
> > > > 4. Upgraded the guest kernel to v5.0 and system comes up fine.
> > > >
> > > > 5. Also system comes up fine with latest guest kernel 5.11.0-rc4.
> > > >
> > > > I did not bisect further yet.
> > > > Babu
> > > > Thanks
> > >
> > >
> > > Some more update:
> > >  System comes up fine with kernel v4.9(checked out on upstream tag
> v4.9).
> > > So, I am assuming this is something specific to Debian 4.9.0-14 kernel.
> > >
> > > Note: I couldn't go back prior versions(v4.8 or earlier) due to
> > > compile issues.
> > > Thanks
> > > Babu
> > >

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-10  1:04                       ` Babu Moger
@ 2021-03-10  9:08                         ` Paolo Bonzini
  2021-03-10 14:55                           ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2021-03-10  9:08 UTC (permalink / raw)
  To: Babu Moger, Jim Mattson
  Cc: Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

On 10/03/21 02:04, Babu Moger wrote:
> Debian kernel 4.10(tag 4.10~rc6-1~exp1) also works fine. It appears the
> problem is on Debian 4.9 kernel. I am not sure how to run git bisect on
> Debian kernel. Tried anyway. It is pointing to
> 
> 47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
> commit 47811c66356d875e76a6ca637a9d384779a659bb
> Author: Ben Hutchings<benh@debian.org>
> Date:   Mon Mar 8 01:17:32 2021 +0100
> 
>      Prepare to release linux (4.9.258-1).
> 
> It does not appear to be the right commit. I am out of ideas now.
> hanks

Have you tried bisecting the upstream stable kernels (from 4.9.0 to 
4.9.258)?

Paolo


^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-10  9:08                         ` Paolo Bonzini
@ 2021-03-10 14:55                           ` Babu Moger
  2021-03-10 14:58                             ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-03-10 14:55 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Mattson
  Cc: Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson



> -----Original Message-----
> From: Paolo Bonzini <pbonzini@redhat.com>
> Sent: Wednesday, March 10, 2021 3:09 AM
> To: Moger, Babu <Babu.Moger@amd.com>; Jim Mattson
> <jmattson@google.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>; Wanpeng Li
> <wanpengli@tencent.com>; kvm list <kvm@vger.kernel.org>; Joerg Roedel
> <joro@8bytes.org>; the arch/x86 maintainers <x86@kernel.org>; LKML <linux-
> kernel@vger.kernel.org>; Ingo Molnar <mingo@redhat.com>; Borislav Petkov
> <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Thomas Gleixner
> <tglx@linutronix.de>; Makarand Sonare <makarandsonare@google.com>; Sean
> Christopherson <seanjc@google.com>
> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> On 10/03/21 02:04, Babu Moger wrote:
> > Debian kernel 4.10(tag 4.10~rc6-1~exp1) also works fine. It appears
> > the problem is on Debian 4.9 kernel. I am not sure how to run git
> > bisect on Debian kernel. Tried anyway. It is pointing to
> >
> > 47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
> > commit 47811c66356d875e76a6ca637a9d384779a659bb
> > Author: Ben Hutchings<benh@debian.org>
> > Date:   Mon Mar 8 01:17:32 2021 +0100
> >
> >      Prepare to release linux (4.9.258-1).
> >
> > It does not appear to be the right commit. I am out of ideas now.
> > hanks
> 
> Have you tried bisecting the upstream stable kernels (from 4.9.0 to 4.9.258)?

I couldn't reproduce the issue on any of the upstream versions. I have
tried v4.9, v4.8 and even on latest v5.11. No issues there.

Jim mentioned Debian 10 which is based of kernel version 4.19 is also
fine. Issue appears to be only affecting  Debian 9(kernel v4.9.0-14).

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-10 14:55                           ` Babu Moger
@ 2021-03-10 14:58                             ` Babu Moger
  2021-03-10 15:31                               ` Paolo Bonzini
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-03-10 14:58 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Mattson
  Cc: Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson



On 3/10/21 8:55 AM, Babu Moger wrote:
> 
> 
>> -----Original Message-----
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Sent: Wednesday, March 10, 2021 3:09 AM
>> To: Moger, Babu <Babu.Moger@amd.com>; Jim Mattson
>> <jmattson@google.com>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>; Wanpeng Li
>> <wanpengli@tencent.com>; kvm list <kvm@vger.kernel.org>; Joerg Roedel
>> <joro@8bytes.org>; the arch/x86 maintainers <x86@kernel.org>; LKML <linux-
>> kernel@vger.kernel.org>; Ingo Molnar <mingo@redhat.com>; Borislav Petkov
>> <bp@alien8.de>; H . Peter Anvin <hpa@zytor.com>; Thomas Gleixner
>> <tglx@linutronix.de>; Makarand Sonare <makarandsonare@google.com>; Sean
>> Christopherson <seanjc@google.com>
>> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
>>
>> On 10/03/21 02:04, Babu Moger wrote:
>>> Debian kernel 4.10(tag 4.10~rc6-1~exp1) also works fine. It appears
>>> the problem is on Debian 4.9 kernel. I am not sure how to run git
>>> bisect on Debian kernel. Tried anyway. It is pointing to
>>>
>>> 47811c66356d875e76a6ca637a9d384779a659bb is the first bad commit
>>> commit 47811c66356d875e76a6ca637a9d384779a659bb
>>> Author: Ben Hutchings<benh@debian.org>
>>> Date:   Mon Mar 8 01:17:32 2021 +0100
>>>
>>>      Prepare to release linux (4.9.258-1).
>>>
>>> It does not appear to be the right commit. I am out of ideas now.
>>> hanks
>>
>> Have you tried bisecting the upstream stable kernels (from 4.9.0 to 4.9.258)?

I couldn't reproduce the issue on any of the upstream versions. I have
tried v4.9, v4.8 and even on latest v5.11. No issues there. There is no
upstream version 4.9.258.

Jim mentioned Debian 10 which is based of kernel version 4.19 is also
fine. Issue appears to be only affecting  Debian 9(kernel v4.9.0-14).


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-10 14:58                             ` Babu Moger
@ 2021-03-10 15:31                               ` Paolo Bonzini
  2021-03-11  1:21                                 ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2021-03-10 15:31 UTC (permalink / raw)
  To: Babu Moger, Jim Mattson
  Cc: Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

On 10/03/21 15:58, Babu Moger wrote:
> There is no upstream version 4.9.258.

Sure there is, check out https://cdn.kernel.org/pub/linux/kernel/v4.x/

The easiest way to do it is to bisect on the linux-4.9.y branch of 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git.

paolo


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-10 15:31                               ` Paolo Bonzini
@ 2021-03-11  1:21                                 ` Babu Moger
  2021-03-11 20:07                                   ` Borislav Petkov
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-03-11  1:21 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Mattson
  Cc: Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson



On 3/10/21 9:31 AM, Paolo Bonzini wrote:
> On 10/03/21 15:58, Babu Moger wrote:
>> There is no upstream version 4.9.258.
> 
> Sure there is, check out
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcdn.kernel.org%2Fpub%2Flinux%2Fkernel%2Fv4.x%2F&amp;data=04%7C01%7Cbabu.moger%40amd.com%7Caeefc58416ed490faa7f08d8e3d99d72%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637509871127634618%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=re2Jj5P7IjN2UdmPTjTuKd1KIJLek84KlcnsXxgKYRc%3D&amp;reserved=0
> 
> 
> The easiest way to do it is to bisect on the linux-4.9.y branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git.
> 
Paolo, Thanks for pointing that out. Bisected linux-4.9.y branch.
It is pointing at

# git bisect good
59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
commit 59094faf3f618b2d2b2a45acb916437d611cede6
Author: Borislav Petkov <bp@suse.de>
Date:   Mon Dec 25 13:57:16 2017 +0100

    x86/kaiser: Move feature detection up


    ... before the first use of kaiser_enabled as otherwise funky
    things happen:

      about to get started...
      (XEN) d0v0 Unhandled page fault fault/trap [#14, ec=0000]
      (XEN) Pagetable walk from ffff88022a449090:
      (XEN)  L4[0x110] = 0000000229e0e067 0000000000001e0e
      (XEN)  L3[0x008] = 0000000000000000 ffffffffffffffff
      (XEN) domain_crash_sync called from entry.S: fault at ffff82d08033fd08
      entry.o#create_bounce_frame+0x135/0x14d
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      (XEN) ----[ Xen-4.9.1_02-3.21  x86_64  debug=n   Not tainted ]----
      (XEN) CPU:    0
      (XEN) RIP:    e033:[<ffffffff81007460>]
      (XEN) RFLAGS: 0000000000000286   EM: 1   CONTEXT: pv guest (d0v0)

    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

:040000 040000 e56bbc975c3fd1a774b6cc0d6699c0c24e66be1c
e06231dccc8589b4baa0cd5759a37899b7ec71c1 M    arch

Not sure what is going on with this commit. Still looking.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11  1:21                                 ` Babu Moger
@ 2021-03-11 20:07                                   ` Borislav Petkov
  2021-03-11 20:32                                     ` Borislav Petkov
  0 siblings, 1 reply; 62+ messages in thread
From: Borislav Petkov @ 2021-03-11 20:07 UTC (permalink / raw)
  To: Babu Moger
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

On Wed, Mar 10, 2021 at 07:21:23PM -0600, Babu Moger wrote:
> # git bisect good
> 59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
> commit 59094faf3f618b2d2b2a45acb916437d611cede6
> Author: Borislav Petkov <bp@suse.de>
> Date:   Mon Dec 25 13:57:16 2017 +0100
> 
>     x86/kaiser: Move feature detection up

What is the reproducer?

Boot latest 4.9 stable kernel in a SEV guest? Can you send guest
.config?

Upthread is talking about PCID, so I'm guessing host needs to be Zen3
with PCID. Anything else?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11 20:07                                   ` Borislav Petkov
@ 2021-03-11 20:32                                     ` Borislav Petkov
  2021-03-11 20:57                                       ` Babu Moger
  2021-03-11 21:23                                       ` [PATCH v6 00/12] SVM cleanup and INVPCID feature support Jim Mattson
  0 siblings, 2 replies; 62+ messages in thread
From: Borislav Petkov @ 2021-03-11 20:32 UTC (permalink / raw)
  To: Babu Moger
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

On Thu, Mar 11, 2021 at 09:07:55PM +0100, Borislav Petkov wrote:
> On Wed, Mar 10, 2021 at 07:21:23PM -0600, Babu Moger wrote:
> > # git bisect good
> > 59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
> > commit 59094faf3f618b2d2b2a45acb916437d611cede6
> > Author: Borislav Petkov <bp@suse.de>
> > Date:   Mon Dec 25 13:57:16 2017 +0100
> > 
> >     x86/kaiser: Move feature detection up
> 
> What is the reproducer?
> 
> Boot latest 4.9 stable kernel in a SEV guest? Can you send guest
> .config?
> 
> Upthread is talking about PCID, so I'm guessing host needs to be Zen3
> with PCID. Anything else?

That oops points to:

[    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!

which is:

        local_flush_tlb();
        sync_core();
        /* Could also do a CLFLUSH here to speed up CPU recovery; but
           that causes hangs on some VIA CPUs. */
        for (i = 0; i < len; i++)
                BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);	<---
        local_irq_restore(flags);
        return addr;

in text_poke() which basically says that the patching verification
fails. And you have a local_flush_tlb() before that. And with PCID maybe
it is not flushing properly or whatnot.

And deep down in the TLB flushing code, it does:

        if (kaiser_enabled)
                kaiser_flush_tlb_on_return_to_user();

and that uses PCID...

Anyway, needs more info.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11 20:32                                     ` Borislav Petkov
@ 2021-03-11 20:57                                       ` Babu Moger
  2021-03-11 21:40                                         ` Borislav Petkov
  2021-03-11 21:23                                       ` [PATCH v6 00/12] SVM cleanup and INVPCID feature support Jim Mattson
  1 sibling, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-03-11 20:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson



On 3/11/21 2:32 PM, Borislav Petkov wrote:
> On Thu, Mar 11, 2021 at 09:07:55PM +0100, Borislav Petkov wrote:
>> On Wed, Mar 10, 2021 at 07:21:23PM -0600, Babu Moger wrote:
>>> # git bisect good
>>> 59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
>>> commit 59094faf3f618b2d2b2a45acb916437d611cede6
>>> Author: Borislav Petkov <bp@suse.de>
>>> Date:   Mon Dec 25 13:57:16 2017 +0100
>>>
>>>     x86/kaiser: Move feature detection up
>>
>> What is the reproducer?
>>
>> Boot latest 4.9 stable kernel in a SEV guest? Can you send guest
>> .config?
>>
>> Upthread is talking about PCID, so I'm guessing host needs to be Zen3
>> with PCID. Anything else?
> 
> That oops points to:
> 
> [    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
> 
> which is:
> 
>         local_flush_tlb();
>         sync_core();
>         /* Could also do a CLFLUSH here to speed up CPU recovery; but
>            that causes hangs on some VIA CPUs. */
>         for (i = 0; i < len; i++)
>                 BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);	<---
>         local_irq_restore(flags);
>         return addr;
> 
> in text_poke() which basically says that the patching verification
> fails. And you have a local_flush_tlb() before that. And with PCID maybe
> it is not flushing properly or whatnot.
> 
> And deep down in the TLB flushing code, it does:
> 
>         if (kaiser_enabled)
>                 kaiser_flush_tlb_on_return_to_user();
> 
> and that uses PCID...
> 
> Anyway, needs more info.

Boris,
 It is related PCID and INVPCID combination. Few more details.
 1. System comes up fine with "noinvpid". So, it happens when invpcid is
enabled.
 2. Host is coming up fine. Problem is with the guest.
 3. Problem happens with Debian 9. Debian kernel version is 4.9.0-14.
 4. Debian 10 is fine.
 5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
 6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.

 Let me know if want me to try something else.
thanks
Babu



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11 20:32                                     ` Borislav Petkov
  2021-03-11 20:57                                       ` Babu Moger
@ 2021-03-11 21:23                                       ` Jim Mattson
  2021-03-11 21:36                                         ` Borislav Petkov
  1 sibling, 1 reply; 62+ messages in thread
From: Jim Mattson @ 2021-03-11 21:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Babu Moger, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

On Thu, Mar 11, 2021 at 12:32 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Mar 11, 2021 at 09:07:55PM +0100, Borislav Petkov wrote:
> > On Wed, Mar 10, 2021 at 07:21:23PM -0600, Babu Moger wrote:
> > > # git bisect good
> > > 59094faf3f618b2d2b2a45acb916437d611cede6 is the first bad commit
> > > commit 59094faf3f618b2d2b2a45acb916437d611cede6
> > > Author: Borislav Petkov <bp@suse.de>
> > > Date:   Mon Dec 25 13:57:16 2017 +0100
> > >
> > >     x86/kaiser: Move feature detection up
> >
> > What is the reproducer?
> >
> > Boot latest 4.9 stable kernel in a SEV guest? Can you send guest
> > .config?
> >
> > Upthread is talking about PCID, so I'm guessing host needs to be Zen3
> > with PCID. Anything else?
>
> That oops points to:
>
> [    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
>
> which is:
>
>         local_flush_tlb();
>         sync_core();
>         /* Could also do a CLFLUSH here to speed up CPU recovery; but
>            that causes hangs on some VIA CPUs. */
>         for (i = 0; i < len; i++)
>                 BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);       <---
>         local_irq_restore(flags);
>         return addr;
>
> in text_poke() which basically says that the patching verification
> fails. And you have a local_flush_tlb() before that. And with PCID maybe
> it is not flushing properly or whatnot.
>
> And deep down in the TLB flushing code, it does:
>
>         if (kaiser_enabled)
>                 kaiser_flush_tlb_on_return_to_user();
>
> and that uses PCID...

I would expect kaiser_enabled to be false (and PCIDs not to be used),
since AMD CPUs are not vulnerable to Meltdown.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11 21:23                                       ` [PATCH v6 00/12] SVM cleanup and INVPCID feature support Jim Mattson
@ 2021-03-11 21:36                                         ` Borislav Petkov
  2021-03-11 21:50                                           ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Borislav Petkov @ 2021-03-11 21:36 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Babu Moger, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

On Thu, Mar 11, 2021 at 01:23:47PM -0800, Jim Mattson wrote:
> I would expect kaiser_enabled to be false (and PCIDs not to be used),
> since AMD CPUs are not vulnerable to Meltdown.

Ah, of course. The guest dmesg should have

"Kernel/User page tables isolation: disabled."

Lemme see if I can reproduce.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11 20:57                                       ` Babu Moger
@ 2021-03-11 21:40                                         ` Borislav Petkov
  2021-03-11 22:04                                           ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Borislav Petkov @ 2021-03-11 21:40 UTC (permalink / raw)
  To: Babu Moger
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

On Thu, Mar 11, 2021 at 02:57:04PM -0600, Babu Moger wrote:
>  It is related PCID and INVPCID combination. Few more details.
>  1. System comes up fine with "noinvpid". So, it happens when invpcid is
> enabled.

Which system, host or guest?

>  2. Host is coming up fine. Problem is with the guest.

Aha, guest.

>  3. Problem happens with Debian 9. Debian kernel version is 4.9.0-14.
>  4. Debian 10 is fine.
>  5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
>  6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.
> 
>  Let me know if want me to try something else.

Yes, I assume host has the patches which belong to this thread?

So please describe:

1. host has these patches, cmdline params, etc.
2. guest is a 4.9 kernel, cmdline params, etc.

Please be exact and specific so that I can properly reproduce.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11 21:36                                         ` Borislav Petkov
@ 2021-03-11 21:50                                           ` Babu Moger
  0 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2021-03-11 21:50 UTC (permalink / raw)
  To: Borislav Petkov, Jim Mattson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson



On 3/11/21 3:36 PM, Borislav Petkov wrote:
> On Thu, Mar 11, 2021 at 01:23:47PM -0800, Jim Mattson wrote:
>> I would expect kaiser_enabled to be false (and PCIDs not to be used),
>> since AMD CPUs are not vulnerable to Meltdown.
> 
> Ah, of course. The guest dmesg should have
> 
> "Kernel/User page tables isolation: disabled."

yes.
 #dmesg |grep isolation
[    0.000000] Kernel/User page tables isolation: disabled

> 
> Lemme see if I can reproduce.
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11 21:40                                         ` Borislav Petkov
@ 2021-03-11 22:04                                           ` Babu Moger
  2021-03-11 22:15                                             ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-03-11 22:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson



On 3/11/21 3:40 PM, Borislav Petkov wrote:
> On Thu, Mar 11, 2021 at 02:57:04PM -0600, Babu Moger wrote:
>>  It is related PCID and INVPCID combination. Few more details.
>>  1. System comes up fine with "noinvpid". So, it happens when invpcid is
>> enabled.
> 
> Which system, host or guest?
> 
>>  2. Host is coming up fine. Problem is with the guest.
> 
> Aha, guest.
> 
>>  3. Problem happens with Debian 9. Debian kernel version is 4.9.0-14.
>>  4. Debian 10 is fine.
>>  5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
>>  6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.
>>
>>  Let me know if want me to try something else.
> 
> Yes, I assume host has the patches which belong to this thread?

Yes. Host has all these patches. Right now I am on 5.12.0-rc2. I just
updated yesterday. I was able to reproduce 5.11 also.


> 
> So please describe:
> 
> 1. host has these patches, cmdline params, etc.

# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.12.0-rc2+ root=/dev/mapper/rhel-root ro
crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root
rd.lvm.lv=rhel/swap ras=cec_disable nmi_watchdog=0 warn_ud2=on selinux=0
earlyprintk=serial,ttyS1,115200n8 console=ttyS1,115200n8


> 2. guest is a 4.9 kernel, cmdline params, etc.

I use qemu command line to bring up the guest. Make sure to use "-cpu host".

qemu-system-x86_64 -name deb9 -m 16384 -smp cores=16,threads=1,sockets=1
-hda vdisk-deb.qcow2 -enable-kvm -net nic  -net
bridge,br=virbr0,helper=/usr/libexec/qemu-bridge-helper -cpu host,+svm
-nographic


The grub command line looks like this on the guest.

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.9.0-14-amd64
root=UUID=a0069240-cd60-4795-a391-273266dbae29 ro console=ttyS0,112500n8
earlyprintk


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11 22:04                                           ` Babu Moger
@ 2021-03-11 22:15                                             ` Babu Moger
  2021-03-11 23:52                                               ` Borislav Petkov
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-03-11 22:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson



On 3/11/21 4:04 PM, Babu Moger wrote:
> 
> 
> On 3/11/21 3:40 PM, Borislav Petkov wrote:
>> On Thu, Mar 11, 2021 at 02:57:04PM -0600, Babu Moger wrote:
>>>  It is related PCID and INVPCID combination. Few more details.
>>>  1. System comes up fine with "noinvpid". So, it happens when invpcid is
>>> enabled.
>>
>> Which system, host or guest?
>>
>>>  2. Host is coming up fine. Problem is with the guest.
>>
>> Aha, guest.
>>
>>>  3. Problem happens with Debian 9. Debian kernel version is 4.9.0-14.
>>>  4. Debian 10 is fine.
>>>  5. Upstream kernels are fine. Tried on v5.11 and it is working fine.
>>>  6. Git bisect pointed to commit 47811c66356d875e76a6ca637a9d384779a659bb.
>>>
>>>  Let me know if want me to try something else.
>>
>> Yes, I assume host has the patches which belong to this thread?
> 
> Yes. Host has all these patches. Right now I am on 5.12.0-rc2. I just
> updated yesterday. I was able to reproduce 5.11 also.
> 
> 
>>
>> So please describe:
>>
>> 1. host has these patches, cmdline params, etc.

My host is
# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.3 (Ootpa)
# uname -r
5.12.0-rc2+


> 
> # cat /proc/cmdline
> BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.12.0-rc2+ root=/dev/mapper/rhel-root ro
> crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root
> rd.lvm.lv=rhel/swap ras=cec_disable nmi_watchdog=0 warn_ud2=on selinux=0
> earlyprintk=serial,ttyS1,115200n8 console=ttyS1,115200n8
> 
> 
>> 2. guest is a 4.9 kernel, cmdline params, etc.
> 
> I use qemu command line to bring up the guest. Make sure to use "-cpu host".
> 
> qemu-system-x86_64 -name deb9 -m 16384 -smp cores=16,threads=1,sockets=1
> -hda vdisk-deb.qcow2 -enable-kvm -net nic  -net
> bridge,br=virbr0,helper=/usr/libexec/qemu-bridge-helper -cpu host,+svm
> -nographic
> 
> 
> The grub command line looks like this on the guest.
> 
> cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-4.9.0-14-amd64
> root=UUID=a0069240-cd60-4795-a391-273266dbae29 ro console=ttyS0,112500n8
> earlyprintk
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11 22:15                                             ` Babu Moger
@ 2021-03-11 23:52                                               ` Borislav Petkov
  2021-03-12 16:12                                                 ` Babu Moger
  0 siblings, 1 reply; 62+ messages in thread
From: Borislav Petkov @ 2021-03-11 23:52 UTC (permalink / raw)
  To: Babu Moger
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

On Thu, Mar 11, 2021 at 04:15:37PM -0600, Babu Moger wrote:
> My host is
> # cat /etc/redhat-release
> Red Hat Enterprise Linux release 8.3 (Ootpa)
> # uname -r
> 5.12.0-rc2+

Please upload host and guest .config.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-11 23:52                                               ` Borislav Petkov
@ 2021-03-12 16:12                                                 ` Babu Moger
  2021-03-24 21:21                                                   ` Borislav Petkov
  0 siblings, 1 reply; 62+ messages in thread
From: Babu Moger @ 2021-03-12 16:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson



> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Thursday, March 11, 2021 5:52 PM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Jim Mattson
> <jmattson@google.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; Wanpeng
> Li <wanpengli@tencent.com>; kvm list <kvm@vger.kernel.org>; Joerg Roedel
> <joro@8bytes.org>; the arch/x86 maintainers <x86@kernel.org>; LKML <linux-
> kernel@vger.kernel.org>; Ingo Molnar <mingo@redhat.com>; H . Peter Anvin
> <hpa@zytor.com>; Thomas Gleixner <tglx@linutronix.de>; Makarand Sonare
> <makarandsonare@google.com>; Sean Christopherson <seanjc@google.com>
> Subject: Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
> 
> On Thu, Mar 11, 2021 at 04:15:37PM -0600, Babu Moger wrote:
> > My host is
> > # cat /etc/redhat-release
> > Red Hat Enterprise Linux release 8.3 (Ootpa) # uname -r 5.12.0-rc2+
> 
> Please upload host and guest .config.

Host config.

https://pastebin.com/wuLzEqZr

Guest config

https://pastebin.com/mvzEEq6R

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-12 16:12                                                 ` Babu Moger
@ 2021-03-24 21:21                                                   ` Borislav Petkov
  2021-03-24 21:59                                                     ` Paolo Bonzini
  2021-03-25  0:05                                                     ` Hugh Dickins
  0 siblings, 2 replies; 62+ messages in thread
From: Borislav Petkov @ 2021-03-24 21:21 UTC (permalink / raw)
  To: Babu Moger, Hugh Dickins
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

Ok,

some more experimenting Babu and I did lead us to:

---
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index f5ca15622dc9..259aa4889cad 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
 	 */
 	if (kaiser_enabled)
 		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+	else
+		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+
 	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
 }

applied on the guest kernel which fixes the issue. And let me add Hugh
who did that PCID stuff at the time. So lemme summarize for Hugh and to
ask him nicely to sanity-check me. :-)

Basically, you have an AMD host which supports PCID and INVPCID and you
boot on it a 4.9 guest. It explodes like the panic below.

What fixes it is this:

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index f5ca15622dc9..259aa4889cad 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
 	 */
 	if (kaiser_enabled)
 		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+	else
+		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+
 	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
 }

---

and the reason why it does, IMHO, is because on AMD, kaiser_enabled is
false because AMD is not affected by Meltdown, which means, there's no
user/kernel pagetables split.

And that also means, you have global TLB entries which means that if you
look at that __native_flush_tlb_single() function, it needs to flush
global TLB entries on CPUs with X86_FEATURE_INVPCID_SINGLE by doing an
INVLPG in the kaiser_enabled=0 case. Errgo, the above hunk.

But I might be completely off here thus this note...

Thoughts?

Thx.


[    1.235726] ------------[ cut here ]------------
[    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
[    1.240926] invalid opcode: 0000 [#1] SMP
[    1.243301] Modules linked in:
[    1.244585] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
[    1.247657] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[    1.251249] task: ffff909363e94040 task.stack: ffffa41bc0194000
[    1.253519] RIP: 0010:[<ffffffff8fa2e40c>]  [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
[    1.256593] RSP: 0018:ffffa41bc0197d90  EFLAGS: 00010096
[    1.258657] RAX: 000000000000000f RBX: 0000000001020800 RCX: 00000000feda3203
[    1.261388] RDX: 00000000178bfbff RSI: 0000000000000000 RDI: ffffffffff57a000
[    1.264168] RBP: ffffffff8fbd3eca R08: 0000000000000000 R09: 0000000000000003
[    1.266983] R10: 0000000000000003 R11: 0000000000000112 R12: 0000000000000001
[    1.269702] R13: ffffa41bc0197dcf R14: 0000000000000286 R15: ffffed1c40407500
[    1.272572] FS:  0000000000000000(0000) GS:ffff909366300000(0000) knlGS:0000000000000000
[    1.275791] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.278032] CR2: 0000000000000000 CR3: 0000000010c08000 CR4: 00000000003606f0
[    1.280815] Stack:
[    1.281630]  ffffffff8fbd3eca 0000000000000005 ffffa41bc0197e03 ffffffff8fbd3ecb
[    1.284660]  0000000000000000 0000000000000000 ffffffff8fa2e835 ccffffff8fad4326
[    1.287729]  1ccd0231874d55d3 ffffffff8fbd3eca ffffa41bc0197e03 ffffffff90203844
[    1.290852] Call Trace:
[    1.291782]  [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
[    1.294900]  [<ffffffff8fbd3ecb>] ? swap_entry_free+0x12b/0x300
[    1.297267]  [<ffffffff8fa2e835>] ? text_poke_bp+0x55/0xe0
[    1.299473]  [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
[    1.301896]  [<ffffffff8fa2b64c>] ? arch_jump_label_transform+0x9c/0x120
[    1.304557]  [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
[    1.306790]  [<ffffffff8fb81d92>] ? __jump_label_update+0x72/0x80
[    1.309255]  [<ffffffff8fb8206f>] ? static_key_slow_inc+0x8f/0xa0
[    1.311680]  [<ffffffff8fbd7a57>] ? frontswap_register_ops+0x107/0x1d0
[    1.314281]  [<ffffffff9077078c>] ? init_zswap+0x282/0x3f6
[    1.316547]  [<ffffffff9077050a>] ? init_frontswap+0x8c/0x8c
[    1.318784]  [<ffffffff8fa0223e>] ? do_one_initcall+0x4e/0x180
[    1.321067]  [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
[    1.323366]  [<ffffffff9073f08d>] ? kernel_init_freeable+0x16b/0x1ec
[    1.325873]  [<ffffffff90011d50>] ? rest_init+0x80/0x80
[    1.327989]  [<ffffffff90011d5a>] ? kernel_init+0xa/0x100
[    1.330092]  [<ffffffff9001f424>] ? ret_from_fork+0x44/0x70
[    1.332311] Code: 00 0f a2 4d 85 e4 74 4a 0f b6 45 00 41 38 45 00 75 19 31 c0 83 c0 01 48 63 d0 49 39 d4 76 33 41 0f b6 4c 15 00 38 4c 15 00 74 e9 <0f> 0b 48 89 ef e8 da d6 19 00 48 8d bd 00 10 00 00 48 89 c3 e8 
[    1.342818] RIP  [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
[    1.345859]  RSP <ffffa41bc0197d90>
[    1.347285] ---[ end trace 0a1c5ab5eb16de89 ]---
[    1.349169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.349169] 
[    1.352885] Kernel Offset: 0xea00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    1.357039] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.357039] 


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-24 21:21                                                   ` Borislav Petkov
@ 2021-03-24 21:59                                                     ` Paolo Bonzini
  2021-03-25  0:05                                                     ` Hugh Dickins
  1 sibling, 0 replies; 62+ messages in thread
From: Paolo Bonzini @ 2021-03-24 21:59 UTC (permalink / raw)
  To: Borislav Petkov, Babu Moger, Hugh Dickins
  Cc: Jim Mattson, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

On 24/03/21 22:21, Borislav Petkov wrote:
>  	if (kaiser_enabled)
>   		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> +	else
> +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> +
>   	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
>   }

I think the kernel ASID flush can also be moved under the "if"?

> and the reason why it does, IMHO, is because on AMD, kaiser_enabled is
> false because AMD is not affected by Meltdown, which means, there's no
> user/kernel pagetables split.
> 
> And that also means, you have global TLB entries which means that if you
> look at that __native_flush_tlb_single() function, it needs to flush
> global TLB entries on CPUs with X86_FEATURE_INVPCID_SINGLE by doing an
> INVLPG in the kaiser_enabled=0 case. Errgo, the above hunk.

Makes sense.

Paolo


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-24 21:21                                                   ` Borislav Petkov
  2021-03-24 21:59                                                     ` Paolo Bonzini
@ 2021-03-25  0:05                                                     ` Hugh Dickins
  2021-03-25  2:43                                                       ` Hugh Dickins
  1 sibling, 1 reply; 62+ messages in thread
From: Hugh Dickins @ 2021-03-25  0:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Babu Moger, Hugh Dickins, Paolo Bonzini, Jim Mattson,
	Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, H . Peter Anvin,
	Thomas Gleixner, Makarand Sonare, Sean Christopherson

On Wed, 24 Mar 2021, Borislav Petkov wrote:

> Ok,
> 
> some more experimenting Babu and I did lead us to:
> 
> ---
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index f5ca15622dc9..259aa4889cad 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
>  	 */
>  	if (kaiser_enabled)
>  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> +	else
> +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> +
>  	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
>  }
> 
> applied on the guest kernel which fixes the issue. And let me add Hugh
> who did that PCID stuff at the time. So lemme summarize for Hugh and to
> ask him nicely to sanity-check me. :-)

Just a brief interim note to assure you that I'm paying attention,
but wow, it's a long time since I gave any thought down here!
Trying to page it all back in...

I see no harm in your workaround if it works, but it's not as if
this is a previously untried path: so I'm suspicious how an issue
here with Globals could have gone unnoticed for so long, and need
to understand it better.

Hugh

> 
> Basically, you have an AMD host which supports PCID and INVPCID and you
> boot on it a 4.9 guest. It explodes like the panic below.
> 
> What fixes it is this:
> 
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index f5ca15622dc9..259aa4889cad 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
>  	 */
>  	if (kaiser_enabled)
>  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> +	else
> +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> +
>  	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
>  }
> 
> ---
> 
> and the reason why it does, IMHO, is because on AMD, kaiser_enabled is
> false because AMD is not affected by Meltdown, which means, there's no
> user/kernel pagetables split.
> 
> And that also means, you have global TLB entries which means that if you
> look at that __native_flush_tlb_single() function, it needs to flush
> global TLB entries on CPUs with X86_FEATURE_INVPCID_SINGLE by doing an
> INVLPG in the kaiser_enabled=0 case. Errgo, the above hunk.
> 
> But I might be completely off here thus this note...
> 
> Thoughts?
> 
> Thx.
> 
> 
> [    1.235726] ------------[ cut here ]------------
> [    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
> [    1.240926] invalid opcode: 0000 [#1] SMP
> [    1.243301] Modules linked in:
> [    1.244585] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
> [    1.247657] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> [    1.251249] task: ffff909363e94040 task.stack: ffffa41bc0194000
> [    1.253519] RIP: 0010:[<ffffffff8fa2e40c>]  [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
> [    1.256593] RSP: 0018:ffffa41bc0197d90  EFLAGS: 00010096
> [    1.258657] RAX: 000000000000000f RBX: 0000000001020800 RCX: 00000000feda3203
> [    1.261388] RDX: 00000000178bfbff RSI: 0000000000000000 RDI: ffffffffff57a000
> [    1.264168] RBP: ffffffff8fbd3eca R08: 0000000000000000 R09: 0000000000000003
> [    1.266983] R10: 0000000000000003 R11: 0000000000000112 R12: 0000000000000001
> [    1.269702] R13: ffffa41bc0197dcf R14: 0000000000000286 R15: ffffed1c40407500
> [    1.272572] FS:  0000000000000000(0000) GS:ffff909366300000(0000) knlGS:0000000000000000
> [    1.275791] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.278032] CR2: 0000000000000000 CR3: 0000000010c08000 CR4: 00000000003606f0
> [    1.280815] Stack:
> [    1.281630]  ffffffff8fbd3eca 0000000000000005 ffffa41bc0197e03 ffffffff8fbd3ecb
> [    1.284660]  0000000000000000 0000000000000000 ffffffff8fa2e835 ccffffff8fad4326
> [    1.287729]  1ccd0231874d55d3 ffffffff8fbd3eca ffffa41bc0197e03 ffffffff90203844
> [    1.290852] Call Trace:
> [    1.291782]  [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
> [    1.294900]  [<ffffffff8fbd3ecb>] ? swap_entry_free+0x12b/0x300
> [    1.297267]  [<ffffffff8fa2e835>] ? text_poke_bp+0x55/0xe0
> [    1.299473]  [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
> [    1.301896]  [<ffffffff8fa2b64c>] ? arch_jump_label_transform+0x9c/0x120
> [    1.304557]  [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
> [    1.306790]  [<ffffffff8fb81d92>] ? __jump_label_update+0x72/0x80
> [    1.309255]  [<ffffffff8fb8206f>] ? static_key_slow_inc+0x8f/0xa0
> [    1.311680]  [<ffffffff8fbd7a57>] ? frontswap_register_ops+0x107/0x1d0
> [    1.314281]  [<ffffffff9077078c>] ? init_zswap+0x282/0x3f6
> [    1.316547]  [<ffffffff9077050a>] ? init_frontswap+0x8c/0x8c
> [    1.318784]  [<ffffffff8fa0223e>] ? do_one_initcall+0x4e/0x180
> [    1.321067]  [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
> [    1.323366]  [<ffffffff9073f08d>] ? kernel_init_freeable+0x16b/0x1ec
> [    1.325873]  [<ffffffff90011d50>] ? rest_init+0x80/0x80
> [    1.327989]  [<ffffffff90011d5a>] ? kernel_init+0xa/0x100
> [    1.330092]  [<ffffffff9001f424>] ? ret_from_fork+0x44/0x70
> [    1.332311] Code: 00 0f a2 4d 85 e4 74 4a 0f b6 45 00 41 38 45 00 75 19 31 c0 83 c0 01 48 63 d0 49 39 d4 76 33 41 0f b6 4c 15 00 38 4c 15 00 74 e9 <0f> 0b 48 89 ef e8 da d6 19 00 48 8d bd 00 10 00 00 48 89 c3 e8 
> [    1.342818] RIP  [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
> [    1.345859]  RSP <ffffa41bc0197d90>
> [    1.347285] ---[ end trace 0a1c5ab5eb16de89 ]---
> [    1.349169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [    1.349169] 
> [    1.352885] Kernel Offset: 0xea00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [    1.357039] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [    1.357039] 
> 
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-25  0:05                                                     ` Hugh Dickins
@ 2021-03-25  2:43                                                       ` Hugh Dickins
  2021-03-25  9:56                                                         ` Borislav Petkov
  0 siblings, 1 reply; 62+ messages in thread
From: Hugh Dickins @ 2021-03-25  2:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Hugh Dickins, Babu Moger, Paolo Bonzini, Jim Mattson,
	Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, H . Peter Anvin,
	Thomas Gleixner, Makarand Sonare, Sean Christopherson

On Wed, 24 Mar 2021, Hugh Dickins wrote:
> On Wed, 24 Mar 2021, Borislav Petkov wrote:
> 
> > Ok,
> > 
> > some more experimenting Babu and I did lead us to:
> > 
> > ---
> > diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> > index f5ca15622dc9..259aa4889cad 100644
> > --- a/arch/x86/include/asm/tlbflush.h
> > +++ b/arch/x86/include/asm/tlbflush.h
> > @@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
> >  	 */
> >  	if (kaiser_enabled)
> >  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> > +	else
> > +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> > +
> >  	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> >  }
> > 
> > applied on the guest kernel which fixes the issue. And let me add Hugh
> > who did that PCID stuff at the time. So lemme summarize for Hugh and to
> > ask him nicely to sanity-check me. :-)
> 
> Just a brief interim note to assure you that I'm paying attention,
> but wow, it's a long time since I gave any thought down here!
> Trying to page it all back in...
> 
> I see no harm in your workaround if it works, but it's not as if
> this is a previously untried path: so I'm suspicious how an issue
> here with Globals could have gone unnoticed for so long, and need
> to understand it better.

Right, after looking into it more, I completely agree with you:
the Kaiser series (in both 4.4-stable and 4.9-stable) was simply
wrong to lose that invlpg - fine in the kaiser case when we don't
enable Globals at all, but plain wrong in the !kaiser_enabled case.
One way or another, we have somehow got away with it for three years.

I do agree with Paolo that the PCID_ASID_KERN flush would be better
moved under the "if (kaiser_enabled)" now. (And if this were ongoing
development, I'd want to rewrite the function altogether: but no,
these old stable trees are not the place for that.)

Boris, may I leave both -stable fixes to you?
Let me know if you'd prefer me to clean up my mess.

Thanks a lot for tracking this down,
Hugh

> > 
> > Basically, you have an AMD host which supports PCID and INVPCID and you
> > boot on it a 4.9 guest. It explodes like the panic below.
> > 
> > What fixes it is this:
> > 
> > diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> > index f5ca15622dc9..259aa4889cad 100644
> > --- a/arch/x86/include/asm/tlbflush.h
> > +++ b/arch/x86/include/asm/tlbflush.h
> > @@ -250,6 +250,9 @@ static inline void __native_flush_tlb_single(unsigned long addr)
> >  	 */
> >  	if (kaiser_enabled)
> >  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> > +	else
> > +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> > +
> >  	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> >  }
> > 
> > ---
> > 
> > and the reason why it does, IMHO, is because on AMD, kaiser_enabled is
> > false because AMD is not affected by Meltdown, which means, there's no
> > user/kernel pagetables split.
> > 
> > And that also means, you have global TLB entries which means that if you
> > look at that __native_flush_tlb_single() function, it needs to flush
> > global TLB entries on CPUs with X86_FEATURE_INVPCID_SINGLE by doing an
> > INVLPG in the kaiser_enabled=0 case. Errgo, the above hunk.
> > 
> > But I might be completely off here thus this note...
> > 
> > Thoughts?
> > 
> > Thx.
> > 
> > 
> > [    1.235726] ------------[ cut here ]------------
> > [    1.237515] kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
> > [    1.240926] invalid opcode: 0000 [#1] SMP
> > [    1.243301] Modules linked in:
> > [    1.244585] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
> > [    1.247657] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > [    1.251249] task: ffff909363e94040 task.stack: ffffa41bc0194000
> > [    1.253519] RIP: 0010:[<ffffffff8fa2e40c>]  [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
> > [    1.256593] RSP: 0018:ffffa41bc0197d90  EFLAGS: 00010096
> > [    1.258657] RAX: 000000000000000f RBX: 0000000001020800 RCX: 00000000feda3203
> > [    1.261388] RDX: 00000000178bfbff RSI: 0000000000000000 RDI: ffffffffff57a000
> > [    1.264168] RBP: ffffffff8fbd3eca R08: 0000000000000000 R09: 0000000000000003
> > [    1.266983] R10: 0000000000000003 R11: 0000000000000112 R12: 0000000000000001
> > [    1.269702] R13: ffffa41bc0197dcf R14: 0000000000000286 R15: ffffed1c40407500
> > [    1.272572] FS:  0000000000000000(0000) GS:ffff909366300000(0000) knlGS:0000000000000000
> > [    1.275791] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    1.278032] CR2: 0000000000000000 CR3: 0000000010c08000 CR4: 00000000003606f0
> > [    1.280815] Stack:
> > [    1.281630]  ffffffff8fbd3eca 0000000000000005 ffffa41bc0197e03 ffffffff8fbd3ecb
> > [    1.284660]  0000000000000000 0000000000000000 ffffffff8fa2e835 ccffffff8fad4326
> > [    1.287729]  1ccd0231874d55d3 ffffffff8fbd3eca ffffa41bc0197e03 ffffffff90203844
> > [    1.290852] Call Trace:
> > [    1.291782]  [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
> > [    1.294900]  [<ffffffff8fbd3ecb>] ? swap_entry_free+0x12b/0x300
> > [    1.297267]  [<ffffffff8fa2e835>] ? text_poke_bp+0x55/0xe0
> > [    1.299473]  [<ffffffff8fbd3eca>] ? swap_entry_free+0x12a/0x300
> > [    1.301896]  [<ffffffff8fa2b64c>] ? arch_jump_label_transform+0x9c/0x120
> > [    1.304557]  [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
> > [    1.306790]  [<ffffffff8fb81d92>] ? __jump_label_update+0x72/0x80
> > [    1.309255]  [<ffffffff8fb8206f>] ? static_key_slow_inc+0x8f/0xa0
> > [    1.311680]  [<ffffffff8fbd7a57>] ? frontswap_register_ops+0x107/0x1d0
> > [    1.314281]  [<ffffffff9077078c>] ? init_zswap+0x282/0x3f6
> > [    1.316547]  [<ffffffff9077050a>] ? init_frontswap+0x8c/0x8c
> > [    1.318784]  [<ffffffff8fa0223e>] ? do_one_initcall+0x4e/0x180
> > [    1.321067]  [<ffffffff9073e81f>] ? set_debug_rodata+0xc/0xc
> > [    1.323366]  [<ffffffff9073f08d>] ? kernel_init_freeable+0x16b/0x1ec
> > [    1.325873]  [<ffffffff90011d50>] ? rest_init+0x80/0x80
> > [    1.327989]  [<ffffffff90011d5a>] ? kernel_init+0xa/0x100
> > [    1.330092]  [<ffffffff9001f424>] ? ret_from_fork+0x44/0x70
> > [    1.332311] Code: 00 0f a2 4d 85 e4 74 4a 0f b6 45 00 41 38 45 00 75 19 31 c0 83 c0 01 48 63 d0 49 39 d4 76 33 41 0f b6 4c 15 00 38 4c 15 00 74 e9 <0f> 0b 48 89 ef e8 da d6 19 00 48 8d bd 00 10 00 00 48 89 c3 e8 
> > [    1.342818] RIP  [<ffffffff8fa2e40c>] text_poke+0x18c/0x240
> > [    1.345859]  RSP <ffffa41bc0197d90>
> > [    1.347285] ---[ end trace 0a1c5ab5eb16de89 ]---
> > [    1.349169] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [    1.349169] 
> > [    1.352885] Kernel Offset: 0xea00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [    1.357039] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [    1.357039] 
> > 
> > 
> > -- 
> > Regards/Gruss,
> >     Boris.
> > 
> > https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v6 00/12] SVM cleanup and INVPCID feature support
  2021-03-25  2:43                                                       ` Hugh Dickins
@ 2021-03-25  9:56                                                         ` Borislav Petkov
  2021-03-25 10:29                                                           ` [PATCH] x86/tlb: Flush global mappings when KAISER is disabled Borislav Petkov
  0 siblings, 1 reply; 62+ messages in thread
From: Borislav Petkov @ 2021-03-25  9:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Babu Moger, Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov,
	Wanpeng Li, kvm list, Joerg Roedel, the arch/x86 maintainers,
	LKML, Ingo Molnar, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson

On Wed, Mar 24, 2021 at 07:43:29PM -0700, Hugh Dickins wrote:
> Right, after looking into it more, I completely agree with you:
> the Kaiser series (in both 4.4-stable and 4.9-stable) was simply
> wrong to lose that invlpg - fine in the kaiser case when we don't
> enable Globals at all, but plain wrong in the !kaiser_enabled case.
> One way or another, we have somehow got away with it for three years.

Yeah, because there were no boxes with kaiser_enabled=0 *and* PCID
which would set INVPCID_SINGLE. Before those, it would INVLPG in the
!INVPCID_SINGLE case.

Oh, btw, booting with "pci=on" "fixes" the issue too. And I tried
reproducing this on an Intel box with "pti=off" but it booted fine
so I'm probably missing some other aspect or triggering it there is
harder/different due to TLB differences or whatnot.

And Babu triggered the same issue on a AMD baremetal yesterday.

> I do agree with Paolo that the PCID_ASID_KERN flush would be better
> moved under the "if (kaiser_enabled)" now.

Ok.

> (And if this were ongoing development, I'd want to rewrite the
> function altogether: but no, these old stable trees are not the place
> for that.)

Bah, it brought some very mixed memories, wading through that code
after years. And yeah, people should stop using all these dead kernels
already! So yeah, no, you don't want to clean up stuff there - let
sleeping dogs lie.

> Boris, may I leave both -stable fixes to you?
> Let me know if you'd prefer me to clean up my mess.

No worries, I'll take care of it.

> Thanks a lot for tracking this down,

Thanks for double-checking me so quickly, lemme whip up a patch.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH] x86/tlb: Flush global mappings when KAISER is disabled
  2021-03-25  9:56                                                         ` Borislav Petkov
@ 2021-03-25 10:29                                                           ` Borislav Petkov
  2021-03-25 10:52                                                             ` Paolo Bonzini
                                                                               ` (3 more replies)
  0 siblings, 4 replies; 62+ messages in thread
From: Borislav Petkov @ 2021-03-25 10:29 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Babu Moger, Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov,
	Wanpeng Li, kvm list, Joerg Roedel, the arch/x86 maintainers,
	LKML, Ingo Molnar, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson

Ok,

I tried to be as specific as possible in the commit message so that we
don't forget. Please lemme know if I've missed something.

Babu, Jim, I'd appreciate it if you ran this to confirm.

Thx.

---
From: Borislav Petkov <bp@suse.de>
Date: Thu, 25 Mar 2021 11:02:31 +0100

Jim Mattson reported that Debian 9 guests using a 4.9-stable kernel
are exploding during alternatives patching:

  kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
  invalid opcode: 0000 [#1] SMP
  Modules linked in:
  CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   swap_entry_free
   swap_entry_free
   text_poke_bp
   swap_entry_free
   arch_jump_label_transform
   set_debug_rodata
   __jump_label_update
   static_key_slow_inc
   frontswap_register_ops
   init_zswap
   init_frontswap
   do_one_initcall
   set_debug_rodata
   kernel_init_freeable
   rest_init
   kernel_init
   ret_from_fork

triggering the BUG_ON in text_poke() which verifies whether patched
instruction bytes have actually landed at the destination.

Further debugging showed that the TLB flush before that check is
insufficient because there could be global mappings left in the TLB,
leading to a stale mapping getting used.

I say "global mappings" because the hardware configuration is a new one:
machine is an AMD, which means, KAISER/PTI doesn't need to be enabled
there, which also means there's no user/kernel pagetables split and
therefore the TLB can have global mappings.

And the configuration is new one for a second reason: because that AMD
machine supports PCID and INVPCID, which leads the CPU detection code to
set the synthetic X86_FEATURE_INVPCID_SINGLE flag.

Now, __native_flush_tlb_single() does invalidate global mappings when
X86_FEATURE_INVPCID_SINGLE is *not* set and returns.

When X86_FEATURE_INVPCID_SINGLE is set, however, it invalidates the
requested address from both PCIDs in the KAISER-enabled case. But if
KAISER is not enabled and the machine has global mappings in the TLB,
then those global mappings do not get invalidated, which would lead to
the above mismatch from using a stale TLB entry.

So make sure to flush those global mappings in the KAISER disabled case.

Co-debugged by Babu Moger <babu.moger@amd.com>.

Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/CALMp9eRDSW66%2BXvbHVF4ohL7XhThoPoT0BrB0TcS0cgk=dkcBg@mail.gmail.com
---
 arch/x86/include/asm/tlbflush.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index f5ca15622dc9..2bfa4deb8cae 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -245,12 +245,15 @@ static inline void __native_flush_tlb_single(unsigned long addr)
 	 * ASID.  But, userspace flushes are probably much more
 	 * important performance-wise.
 	 *
-	 * Make sure to do only a single invpcid when KAISER is
-	 * disabled and we have only a single ASID.
+	 * In the KAISER disabled case, do an INVLPG to make sure
+	 * the mapping is flushed in case it is a global one.
 	 */
-	if (kaiser_enabled)
+	if (kaiser_enabled) {
 		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
-	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
+		invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
+	} else {
+		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+	}
 }
 
 static inline void __flush_tlb_all(void)
-- 
2.29.2

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH] x86/tlb: Flush global mappings when KAISER is disabled
  2021-03-25 10:29                                                           ` [PATCH] x86/tlb: Flush global mappings when KAISER is disabled Borislav Petkov
@ 2021-03-25 10:52                                                             ` Paolo Bonzini
  2021-03-25 15:13                                                             ` Babu Moger
                                                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 62+ messages in thread
From: Paolo Bonzini @ 2021-03-25 10:52 UTC (permalink / raw)
  To: Borislav Petkov, Hugh Dickins
  Cc: Babu Moger, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li, kvm list,
	Joerg Roedel, the arch/x86 maintainers, LKML, Ingo Molnar,
	H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson

On 25/03/21 11:29, Borislav Petkov wrote:
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index f5ca15622dc9..2bfa4deb8cae 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -245,12 +245,15 @@ static inline void __native_flush_tlb_single(unsigned long addr)
>   	 * ASID.  But, userspace flushes are probably much more
>   	 * important performance-wise.
>   	 *
> -	 * Make sure to do only a single invpcid when KAISER is
> -	 * disabled and we have only a single ASID.
> +	 * In the KAISER disabled case, do an INVLPG to make sure
> +	 * the mapping is flushed in case it is a global one.
>   	 */
> -	if (kaiser_enabled)
> +	if (kaiser_enabled) {
>   		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> -	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> +		invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> +	} else {
> +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> +	}
>   }
>   
>   static inline void __flush_tlb_all(void)
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] x86/tlb: Flush global mappings when KAISER is disabled
  2021-03-25 10:29                                                           ` [PATCH] x86/tlb: Flush global mappings when KAISER is disabled Borislav Petkov
  2021-03-25 10:52                                                             ` Paolo Bonzini
@ 2021-03-25 15:13                                                             ` Babu Moger
  2021-03-25 16:33                                                             ` Hugh Dickins
  2021-03-25 20:09                                                             ` Borislav Petkov
  3 siblings, 0 replies; 62+ messages in thread
From: Babu Moger @ 2021-03-25 15:13 UTC (permalink / raw)
  To: Borislav Petkov, Hugh Dickins
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Wanpeng Li,
	kvm list, Joerg Roedel, the arch/x86 maintainers, LKML,
	Ingo Molnar, H . Peter Anvin, Thomas Gleixner, Makarand Sonare,
	Sean Christopherson



On 3/25/21 5:29 AM, Borislav Petkov wrote:
> Ok,
> 
> I tried to be as specific as possible in the commit message so that we
> don't forget. Please lemme know if I've missed something.
> 
> Babu, Jim, I'd appreciate it if you ran this to confirm.
> 
> Thx.
> 
> ---
> From: Borislav Petkov <bp@suse.de>
> Date: Thu, 25 Mar 2021 11:02:31 +0100
> 
> Jim Mattson reported that Debian 9 guests using a 4.9-stable kernel
> are exploding during alternatives patching:
> 
>   kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
>   invalid opcode: 0000 [#1] SMP
>   Modules linked in:
>   CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>   Call Trace:
>    swap_entry_free
>    swap_entry_free
>    text_poke_bp
>    swap_entry_free
>    arch_jump_label_transform
>    set_debug_rodata
>    __jump_label_update
>    static_key_slow_inc
>    frontswap_register_ops
>    init_zswap
>    init_frontswap
>    do_one_initcall
>    set_debug_rodata
>    kernel_init_freeable
>    rest_init
>    kernel_init
>    ret_from_fork
> 
> triggering the BUG_ON in text_poke() which verifies whether patched
> instruction bytes have actually landed at the destination.
> 
> Further debugging showed that the TLB flush before that check is
> insufficient because there could be global mappings left in the TLB,
> leading to a stale mapping getting used.
> 
> I say "global mappings" because the hardware configuration is a new one:
> machine is an AMD, which means, KAISER/PTI doesn't need to be enabled
> there, which also means there's no user/kernel pagetables split and
> therefore the TLB can have global mappings.
> 
> And the configuration is new one for a second reason: because that AMD
> machine supports PCID and INVPCID, which leads the CPU detection code to
> set the synthetic X86_FEATURE_INVPCID_SINGLE flag.
> 
> Now, __native_flush_tlb_single() does invalidate global mappings when
> X86_FEATURE_INVPCID_SINGLE is *not* set and returns.
> 
> When X86_FEATURE_INVPCID_SINGLE is set, however, it invalidates the
> requested address from both PCIDs in the KAISER-enabled case. But if
> KAISER is not enabled and the machine has global mappings in the TLB,
> then those global mappings do not get invalidated, which would lead to
> the above mismatch from using a stale TLB entry.
> 
> So make sure to flush those global mappings in the KAISER disabled case.
> 
> Co-debugged by Babu Moger <babu.moger@amd.com>.
> 
> Reported-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.kernel.org%2Fr%2FCALMp9eRDSW66%252BXvbHVF4ohL7XhThoPoT0BrB0TcS0cgk%3DdkcBg%40mail.gmail.com&amp;data=04%7C01%7Cbabu.moger%40amd.com%7Cf4e0aacf81744dc8be4408d8ef78f2cf%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637522650066097649%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1c4MQ9I9KrLxWLqghGCI%2BC%2Bvs0c9vYaNC5d%2FiYL0oMA%3D&amp;reserved=0
> ---
>  arch/x86/include/asm/tlbflush.h | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index f5ca15622dc9..2bfa4deb8cae 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -245,12 +245,15 @@ static inline void __native_flush_tlb_single(unsigned long addr)
>  	 * ASID.  But, userspace flushes are probably much more
>  	 * important performance-wise.
>  	 *
> -	 * Make sure to do only a single invpcid when KAISER is
> -	 * disabled and we have only a single ASID.
> +	 * In the KAISER disabled case, do an INVLPG to make sure
> +	 * the mapping is flushed in case it is a global one.
>  	 */
> -	if (kaiser_enabled)
> +	if (kaiser_enabled) {
>  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> -	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> +		invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> +	} else {
> +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> +	}
>  }
>  
>  static inline void __flush_tlb_all(void)
> 

Thanks Boris. As you updated the patch little bit since yesterday, I
retested them again both on host and guest kernel. They are looking good.

Tested-by: Babu Moger <babu.moger@amd.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] x86/tlb: Flush global mappings when KAISER is disabled
  2021-03-25 10:29                                                           ` [PATCH] x86/tlb: Flush global mappings when KAISER is disabled Borislav Petkov
  2021-03-25 10:52                                                             ` Paolo Bonzini
  2021-03-25 15:13                                                             ` Babu Moger
@ 2021-03-25 16:33                                                             ` Hugh Dickins
  2021-03-25 19:00                                                               ` Jim Mattson
  2021-03-25 20:09                                                             ` Borislav Petkov
  3 siblings, 1 reply; 62+ messages in thread
From: Hugh Dickins @ 2021-03-25 16:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Hugh Dickins, Babu Moger, Paolo Bonzini, Jim Mattson,
	Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, H . Peter Anvin,
	Thomas Gleixner, Makarand Sonare, Sean Christopherson

On Thu, 25 Mar 2021, Borislav Petkov wrote:

> Ok,
> 
> I tried to be as specific as possible in the commit message so that we
> don't forget. Please lemme know if I've missed something.
> 
> Babu, Jim, I'd appreciate it if you ran this to confirm.
> 
> Thx.
> 
> ---
> From: Borislav Petkov <bp@suse.de>
> Date: Thu, 25 Mar 2021 11:02:31 +0100
> 
> Jim Mattson reported that Debian 9 guests using a 4.9-stable kernel
> are exploding during alternatives patching:
> 
>   kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
>   invalid opcode: 0000 [#1] SMP
>   Modules linked in:
>   CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>   Call Trace:
>    swap_entry_free
>    swap_entry_free
>    text_poke_bp
>    swap_entry_free
>    arch_jump_label_transform
>    set_debug_rodata
>    __jump_label_update
>    static_key_slow_inc
>    frontswap_register_ops
>    init_zswap
>    init_frontswap
>    do_one_initcall
>    set_debug_rodata
>    kernel_init_freeable
>    rest_init
>    kernel_init
>    ret_from_fork
> 
> triggering the BUG_ON in text_poke() which verifies whether patched
> instruction bytes have actually landed at the destination.
> 
> Further debugging showed that the TLB flush before that check is
> insufficient because there could be global mappings left in the TLB,
> leading to a stale mapping getting used.
> 
> I say "global mappings" because the hardware configuration is a new one:
> machine is an AMD, which means, KAISER/PTI doesn't need to be enabled
> there, which also means there's no user/kernel pagetables split and
> therefore the TLB can have global mappings.
> 
> And the configuration is new one for a second reason: because that AMD
> machine supports PCID and INVPCID, which leads the CPU detection code to
> set the synthetic X86_FEATURE_INVPCID_SINGLE flag.
> 
> Now, __native_flush_tlb_single() does invalidate global mappings when
> X86_FEATURE_INVPCID_SINGLE is *not* set and returns.
> 
> When X86_FEATURE_INVPCID_SINGLE is set, however, it invalidates the
> requested address from both PCIDs in the KAISER-enabled case. But if
> KAISER is not enabled and the machine has global mappings in the TLB,
> then those global mappings do not get invalidated, which would lead to
> the above mismatch from using a stale TLB entry.
> 
> So make sure to flush those global mappings in the KAISER disabled case.
> 
> Co-debugged by Babu Moger <babu.moger@amd.com>.
> 
> Reported-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Link: https://lkml.kernel.org/r/CALMp9eRDSW66%2BXvbHVF4ohL7XhThoPoT0BrB0TcS0cgk=dkcBg@mail.gmail.com

Acked-by: Hugh Dickins <hughd@google.com>

Great write-up too: many thanks.

> ---
>  arch/x86/include/asm/tlbflush.h | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index f5ca15622dc9..2bfa4deb8cae 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -245,12 +245,15 @@ static inline void __native_flush_tlb_single(unsigned long addr)
>  	 * ASID.  But, userspace flushes are probably much more
>  	 * important performance-wise.
>  	 *
> -	 * Make sure to do only a single invpcid when KAISER is
> -	 * disabled and we have only a single ASID.
> +	 * In the KAISER disabled case, do an INVLPG to make sure
> +	 * the mapping is flushed in case it is a global one.
>  	 */
> -	if (kaiser_enabled)
> +	if (kaiser_enabled) {
>  		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
> -	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> +		invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
> +	} else {
> +		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
> +	}
>  }
>  
>  static inline void __flush_tlb_all(void)
> -- 
> 2.29.2
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] x86/tlb: Flush global mappings when KAISER is disabled
  2021-03-25 16:33                                                             ` Hugh Dickins
@ 2021-03-25 19:00                                                               ` Jim Mattson
  0 siblings, 0 replies; 62+ messages in thread
From: Jim Mattson @ 2021-03-25 19:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Borislav Petkov, Babu Moger, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, kvm list, Joerg Roedel, the arch/x86 maintainers,
	LKML, Ingo Molnar, H . Peter Anvin, Thomas Gleixner,
	Makarand Sonare, Sean Christopherson

On Thu, Mar 25, 2021 at 9:33 AM Hugh Dickins <hughd@google.com> wrote:
>
> On Thu, 25 Mar 2021, Borislav Petkov wrote:
>
> > Ok,
> >
> > I tried to be as specific as possible in the commit message so that we
> > don't forget. Please lemme know if I've missed something.
> >
> > Babu, Jim, I'd appreciate it if you ran this to confirm.
Tested-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH] x86/tlb: Flush global mappings when KAISER is disabled
  2021-03-25 10:29                                                           ` [PATCH] x86/tlb: Flush global mappings when KAISER is disabled Borislav Petkov
                                                                               ` (2 preceding siblings ...)
  2021-03-25 16:33                                                             ` Hugh Dickins
@ 2021-03-25 20:09                                                             ` Borislav Petkov
  2021-03-25 20:36                                                               ` Sasha Levin
  3 siblings, 1 reply; 62+ messages in thread
From: Borislav Petkov @ 2021-03-25 20:09 UTC (permalink / raw)
  To: stable
  Cc: Hugh Dickins, Babu Moger, Paolo Bonzini, Jim Mattson,
	Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, H . Peter Anvin,
	Thomas Gleixner, Makarand Sonare, Sean Christopherson

Hi stable folks,

the patch below fixes kernels 4.4 and 4.9 booting on AMD platforms with
PCID support. It doesn't have an upstream counterpart because it patches
the KAISER code which didn't go upstream. It applies fine to both of the
aforementioned kernels - please pick it up.

Thx.

---
From: Borislav Petkov <bp@suse.de>
Date: Thu, 25 Mar 2021 11:02:31 +0100
Subject: [PATCH] x86/tlb: Flush global mappings when KAISER is disabled

Jim Mattson reported that Debian 9 guests using a 4.9-stable kernel
are exploding during alternatives patching:

  kernel BUG at /build/linux-dqnRSc/linux-4.9.228/arch/x86/kernel/alternative.c:709!
  invalid opcode: 0000 [#1] SMP
  Modules linked in:
  CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.0-13-amd64 #1 Debian 4.9.228-1
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   swap_entry_free
   swap_entry_free
   text_poke_bp
   swap_entry_free
   arch_jump_label_transform
   set_debug_rodata
   __jump_label_update
   static_key_slow_inc
   frontswap_register_ops
   init_zswap
   init_frontswap
   do_one_initcall
   set_debug_rodata
   kernel_init_freeable
   rest_init
   kernel_init
   ret_from_fork

triggering the BUG_ON in text_poke() which verifies whether patched
instruction bytes have actually landed at the destination.

Further debugging showed that the TLB flush before that check is
insufficient because there could be global mappings left in the TLB,
leading to a stale mapping getting used.

I say "global mappings" because the hardware configuration is a new one:
machine is an AMD, which means, KAISER/PTI doesn't need to be enabled
there, which also means there's no user/kernel pagetables split and
therefore the TLB can have global mappings.

And the configuration is new one for a second reason: because that AMD
machine supports PCID and INVPCID, which leads the CPU detection code to
set the synthetic X86_FEATURE_INVPCID_SINGLE flag.

Now, __native_flush_tlb_single() does invalidate global mappings when
X86_FEATURE_INVPCID_SINGLE is *not* set and returns.

When X86_FEATURE_INVPCID_SINGLE is set, however, it invalidates the
requested address from both PCIDs in the KAISER-enabled case. But if
KAISER is not enabled and the machine has global mappings in the TLB,
then those global mappings do not get invalidated, which would lead to
the above mismatch from using a stale TLB entry.

So make sure to flush those global mappings in the KAISER disabled case.

Co-debugged by Babu Moger <babu.moger@amd.com>.

Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Jim Mattson <jmattson@google.com>
Link: https://lkml.kernel.org/r/CALMp9eRDSW66%2BXvbHVF4ohL7XhThoPoT0BrB0TcS0cgk=dkcBg@mail.gmail.com
---
 arch/x86/include/asm/tlbflush.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index f5ca15622dc9..2bfa4deb8cae 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -245,12 +245,15 @@ static inline void __native_flush_tlb_single(unsigned long addr)
 	 * ASID.  But, userspace flushes are probably much more
 	 * important performance-wise.
 	 *
-	 * Make sure to do only a single invpcid when KAISER is
-	 * disabled and we have only a single ASID.
+	 * In the KAISER disabled case, do an INVLPG to make sure
+	 * the mapping is flushed in case it is a global one.
 	 */
-	if (kaiser_enabled)
+	if (kaiser_enabled) {
 		invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
-	invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
+		invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
+	} else {
+		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+	}
 }
 
 static inline void __flush_tlb_all(void)
-- 
2.29.2


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH] x86/tlb: Flush global mappings when KAISER is disabled
  2021-03-25 20:09                                                             ` Borislav Petkov
@ 2021-03-25 20:36                                                               ` Sasha Levin
  2021-03-25 23:19                                                                 ` Sasha Levin
  0 siblings, 1 reply; 62+ messages in thread
From: Sasha Levin @ 2021-03-25 20:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: stable, Hugh Dickins, Babu Moger, Paolo Bonzini, Jim Mattson,
	Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, H . Peter Anvin,
	Thomas Gleixner, Makarand Sonare, Sean Christopherson, carnil,
	ben

On Thu, Mar 25, 2021 at 09:09:42PM +0100, Borislav Petkov wrote:
>Hi stable folks,
>
>the patch below fixes kernels 4.4 and 4.9 booting on AMD platforms with
>PCID support. It doesn't have an upstream counterpart because it patches
>the KAISER code which didn't go upstream. It applies fine to both of the
>aforementioned kernels - please pick it up.

Queued up for 4.9 and 4.4, thanks!

>Jim Mattson reported that Debian 9 guests using a 4.9-stable kernel
>are exploding during alternatives patching:

(Cc Ben & Salvatore)

I'm not sure if 4.9 or Debian is still alive or not, but FYI...

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] x86/tlb: Flush global mappings when KAISER is disabled
  2021-03-25 20:36                                                               ` Sasha Levin
@ 2021-03-25 23:19                                                                 ` Sasha Levin
  2021-03-25 23:56                                                                   ` Ben Hutchings
  0 siblings, 1 reply; 62+ messages in thread
From: Sasha Levin @ 2021-03-25 23:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: stable, Hugh Dickins, Babu Moger, Paolo Bonzini, Jim Mattson,
	Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, H . Peter Anvin,
	Thomas Gleixner, Makarand Sonare, Sean Christopherson, carnil,
	ben

On Thu, Mar 25, 2021 at 04:36:55PM -0400, Sasha Levin wrote:
>On Thu, Mar 25, 2021 at 09:09:42PM +0100, Borislav Petkov wrote:
>>Hi stable folks,
>>
>>the patch below fixes kernels 4.4 and 4.9 booting on AMD platforms with
>>PCID support. It doesn't have an upstream counterpart because it patches
>>the KAISER code which didn't go upstream. It applies fine to both of the
>>aforementioned kernels - please pick it up.
>
>Queued up for 4.9 and 4.4, thanks!
>
>>Jim Mattson reported that Debian 9 guests using a 4.9-stable kernel
>>are exploding during alternatives patching:
>
>(Cc Ben & Salvatore)
>
>I'm not sure if 4.9 or Debian is still alive or not, but FYI...
		    *on

		    :)
-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH] x86/tlb: Flush global mappings when KAISER is disabled
  2021-03-25 23:19                                                                 ` Sasha Levin
@ 2021-03-25 23:56                                                                   ` Ben Hutchings
  0 siblings, 0 replies; 62+ messages in thread
From: Ben Hutchings @ 2021-03-25 23:56 UTC (permalink / raw)
  To: Sasha Levin, Borislav Petkov
  Cc: stable, Hugh Dickins, Babu Moger, Paolo Bonzini, Jim Mattson,
	Vitaly Kuznetsov, Wanpeng Li, kvm list, Joerg Roedel,
	the arch/x86 maintainers, LKML, Ingo Molnar, H . Peter Anvin,
	Thomas Gleixner, Makarand Sonare, Sean Christopherson, carnil

[-- Attachment #1: Type: text/plain, Size: 1219 bytes --]

On Thu, 2021-03-25 at 19:19 -0400, Sasha Levin wrote:
> On Thu, Mar 25, 2021 at 04:36:55PM -0400, Sasha Levin wrote:
> > On Thu, Mar 25, 2021 at 09:09:42PM +0100, Borislav Petkov wrote:
> > > Hi stable folks,
> > > 
> > > the patch below fixes kernels 4.4 and 4.9 booting on AMD platforms with
> > > PCID support. It doesn't have an upstream counterpart because it patches
> > > the KAISER code which didn't go upstream. It applies fine to both of the
> > > aforementioned kernels - please pick it up.
> > 
> > Queued up for 4.9 and 4.4, thanks!
> > 
> > > Jim Mattson reported that Debian 9 guests using a 4.9-stable kernel
> > > are exploding during alternatives patching:
> > 
> > (Cc Ben & Salvatore)
> > 
> > I'm not sure if 4.9 or Debian is still alive or not, but FYI...
>                     *on
> 
>                     :)

We're supporting both 4.9 and 4.19 in Debian 9.  The general rule is we
carry on with the same stable kernel branch for the whole 5 year
support period, but add the option of using the kernel version from the
next stable release.

Ben.

-- 
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2021-03-26  0:25 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-11 19:27 [PATCH v6 00/12] SVM cleanup and INVPCID feature support Babu Moger
2020-09-11 19:27 ` [PATCH v6 01/12] KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept) Babu Moger
2020-09-11 19:28 ` [PATCH v6 02/12] KVM: SVM: Change intercept_cr to generic intercepts Babu Moger
2020-09-11 19:28 ` [PATCH v6 03/12] KVM: SVM: Change intercept_dr " Babu Moger
2020-09-11 19:28 ` [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions " Babu Moger
2020-09-12 16:52   ` Paolo Bonzini
2020-09-14 15:06     ` Sean Christopherson
2020-09-22 13:39       ` Paolo Bonzini
2020-09-22 19:11         ` Babu Moger
2020-09-23  2:43           ` Paolo Bonzini
2020-09-23 13:35             ` Babu Moger
2020-09-11 19:28 ` [PATCH v6 05/12] KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors Babu Moger
2020-09-11 19:28 ` [PATCH v6 06/12] KVM: SVM: Add new intercept vector in vmcb_control_area Babu Moger
2020-09-11 19:28 ` [PATCH v6 07/12] KVM: nSVM: Cleanup nested_state data structure Babu Moger
2020-09-11 19:28 ` [PATCH v6 08/12] KVM: SVM: Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept Babu Moger
2020-09-11 19:28 ` [PATCH v6 09/12] KVM: SVM: Remove set_exception_intercept and clr_exception_intercept Babu Moger
2020-09-11 19:29 ` [PATCH v6 10/12] KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c Babu Moger
2020-09-11 19:29 ` [PATCH v6 11/12] KVM: X86: Move handling of INVPCID types to x86 Babu Moger
2020-09-11 19:29 ` [PATCH v6 12/12] KVM:SVM: Enable INVPCID feature on AMD Babu Moger
2020-09-12 17:08 ` [PATCH v6 00/12] SVM cleanup and INVPCID feature support Paolo Bonzini
2020-09-14 15:05   ` Sean Christopherson
2020-09-14 18:33   ` Babu Moger
2021-01-19 23:01     ` Jim Mattson
2021-01-19 23:45       ` Babu Moger
2021-01-20 21:14         ` Jim Mattson
2021-01-20 21:45           ` Babu Moger
2021-01-21  3:10             ` Babu Moger
2021-01-21 23:51               ` Babu Moger
2021-01-23  1:52                 ` Babu Moger
2021-02-24  0:13                   ` Jim Mattson
2021-02-24 22:17                     ` Babu Moger
2021-03-10  1:04                       ` Babu Moger
2021-03-10  9:08                         ` Paolo Bonzini
2021-03-10 14:55                           ` Babu Moger
2021-03-10 14:58                             ` Babu Moger
2021-03-10 15:31                               ` Paolo Bonzini
2021-03-11  1:21                                 ` Babu Moger
2021-03-11 20:07                                   ` Borislav Petkov
2021-03-11 20:32                                     ` Borislav Petkov
2021-03-11 20:57                                       ` Babu Moger
2021-03-11 21:40                                         ` Borislav Petkov
2021-03-11 22:04                                           ` Babu Moger
2021-03-11 22:15                                             ` Babu Moger
2021-03-11 23:52                                               ` Borislav Petkov
2021-03-12 16:12                                                 ` Babu Moger
2021-03-24 21:21                                                   ` Borislav Petkov
2021-03-24 21:59                                                     ` Paolo Bonzini
2021-03-25  0:05                                                     ` Hugh Dickins
2021-03-25  2:43                                                       ` Hugh Dickins
2021-03-25  9:56                                                         ` Borislav Petkov
2021-03-25 10:29                                                           ` [PATCH] x86/tlb: Flush global mappings when KAISER is disabled Borislav Petkov
2021-03-25 10:52                                                             ` Paolo Bonzini
2021-03-25 15:13                                                             ` Babu Moger
2021-03-25 16:33                                                             ` Hugh Dickins
2021-03-25 19:00                                                               ` Jim Mattson
2021-03-25 20:09                                                             ` Borislav Petkov
2021-03-25 20:36                                                               ` Sasha Levin
2021-03-25 23:19                                                                 ` Sasha Levin
2021-03-25 23:56                                                                   ` Ben Hutchings
2021-03-11 21:23                                       ` [PATCH v6 00/12] SVM cleanup and INVPCID feature support Jim Mattson
2021-03-11 21:36                                         ` Borislav Petkov
2021-03-11 21:50                                           ` Babu Moger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).