* [PATCH v8 0/7] Introduce support for guest CET feature
@ 2019-11-01  8:52 Yang Weijiang
  2019-11-01  8:52 ` [PATCH v8 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration Yang Weijiang
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Yang Weijiang @ 2019-11-01  8:52 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: jmattson, yu.c.zhang, yu-cheng.yu, Yang Weijiang

Control-flow Enforcement Technology (CET) provides protection against
Return/Jump-Oriented Programming (ROP/JOP) attacks. It includes two
sub-features: Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT).

KVM changes are required to support the guest CET feature.
This patch series implements CET related CPUID/XSAVES enumeration, MSRs
and vmentry/vmexit configuration etc. so that the guest kernel can set up
its CET runtime infrastructure based on them. Some CET MSRs and related
feature flags used here reference the definitions in the kernel patchset.

The CET kernel patches are here:
https://lkml.org/lkml/2019/8/13/1110
https://lkml.org/lkml/2019/8/13/1109

v7 -> v8:
- Addressed Jim and Sean's feedback on: 1) CPUID(0xD,i) enumeration. 2)
  sanity checks when configuring guest CET. 3) function improvements.
- Added more sanity-check functions.
- Set the host vmexit default status so that the guest won't leak CET
  state to the host on vmexit.
- Added CR0.WP vs. CR4.CET mutual constraints.

v6 -> v7:
- Rebased patch to kernel v5.3
- Changed the CPUID(0xd, n) enumeration code to align with the existing code
  per Sean's suggestion; I think it's better to make the fix an independent
  patch since the XSS MSR is widely used on x86 platforms.
- Check more host and guest status before configuring guest CET
  per Sean's feedback.
- Add error checks before the guest accesses CET MSRs per Sean's feedback.
- Other minor fixes suggested by Sean.

v5 -> v6:
- Rebase patch to kernel v5.2.
- Move CPUID(0xD, n>=1) helper to a separate patch.
- Merge the xsave size fix into another patch.
- Other minor fixes per community feedback.

v4 -> v5:
- Rebase patch to kernel v5.1.
- Wrap CPUID(0xD, n>=1) code to a helper function.
- Pass through MSR_IA32_PL1_SSP and MSR_IA32_PL2_SSP to Guest.
- Add Co-developed-by expression in patch description.
- Refine patch description.

v3 -> v4:
- Add Sean's patch for loading Guest fpu state before access XSAVES
  managed CET MSRs.
- Folded the CET bit settings into the CPUID configuration patch.
- Add VMX interface to query Host XSS.
- Check Host and Guest XSS support bits before setting Guest XSS.
- Make Guest SHSTK and IBT feature enabling independent.
- Do not report CET support to the Guest when the Host CET feature is disabled.

v2 -> v3:
- Modified patches to make Guest CET independent of Host enabling.
- Added patch 8 to provide user-space access to Guest CET MSRs.
- Modified code comments and patch description to reflect changes.

v1 -> v2:
- Re-ordered patch sequence, combined one patch.
- Added more description for CET related VMCS fields.
- Added Host CET capability check while enabling Guest CET loading bit.
- Added Host CET capability check while reporting Guest CPUID(EAX=7, ECX=0).
- Modified code reporting Guest CPUID(EAX=0xD, ECX>=1) to make it clearer.
- Added Host and Guest XSS mask check while setting bits for Guest XSS.


PATCH 1    : Fix CPUID(0xD, n) enumeration to support XSS MSR.
PATCH 2    : Define CET VMCS fields and bits.
PATCH 3    : Pass through CET MSRs to guest.
PATCH 4    : Load guest/host CET states on vmentry/vmexit.
PATCH 5    : Enable update to IA32_XSS for guest.
PATCH 6    : Load Guest FPU states for XSAVES managed MSRs.
PATCH 7    : Add user-space CET MSR access interface.



Sean Christopherson (1):
  KVM: X86: Load guest fpu state when accessing MSRs managed by XSAVES

Yang Weijiang (6):
  KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration
  KVM: VMX: Define CET VMCS fields and #CP flag
  KVM: VMX: Pass through CET related MSRs
  KVM: VMX: Load CET states on vmentry/vmexit
  KVM: X86: Enable CET bits update in IA32_XSS
  KVM: X86: Add user-space access interface for CET MSRs

 arch/x86/include/asm/kvm_host.h |   5 +-
 arch/x86/include/asm/vmx.h      |   8 +
 arch/x86/include/uapi/asm/kvm.h |   1 +
 arch/x86/kvm/cpuid.c            | 121 +++++++++-----
 arch/x86/kvm/cpuid.h            |   2 +
 arch/x86/kvm/svm.c              |   7 +
 arch/x86/kvm/vmx/capabilities.h |  10 ++
 arch/x86/kvm/vmx/vmx.c          | 272 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  25 ++-
 arch/x86/kvm/x86.h              |  10 +-
 10 files changed, 415 insertions(+), 46 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v8 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration
  2019-11-01  8:52 [PATCH v8 0/7] Introduce support for guest CET feature Yang Weijiang
@ 2019-11-01  8:52 ` Yang Weijiang
  2019-11-01  8:52 ` [PATCH v8 2/7] KVM: VMX: Define CET VMCS fields and #CP flag Yang Weijiang
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2019-11-01  8:52 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: jmattson, yu.c.zhang, yu-cheng.yu, Yang Weijiang

The control bits in the IA32_XSS MSR are being used for new features,
but the current CPUID(0xd,i) enumeration code doesn't support them, so
fix the existing code first.

The supervisor states in IA32_XSS haven't been used in public
KVM code yet, so set KVM_SUPPORTED_XSS to 0 for now; anyone developing
an IA32_XSS related feature can expand the macro to add the CPUID
support. Otherwise, CPUID(0xd,i>1) always reports 0 for the subleaf to
the guest.

Extracted the old code into a new filter and kept it in the same flavor
as the others.

This patch passed selftests on a few Intel platforms.
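
For reference, the sub-leaf this patch re-enumerates can be inspected from
user space with a small stand-alone program. This is an illustrative sketch
only (not part of the series); the field meanings follow the SDM description
of CPUID.(EAX=0DH, ECX=1):

/* Build with: gcc -O2 -o xss_cpuid xss_cpuid.c */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx)) {
		fprintf(stderr, "CPUID leaf 0xD not supported\n");
		return 1;
	}

	/* EAX: XSAVEOPT/XSAVEC/XGETBV1/XSAVES feature flags */
	printf("CPUID(0xD,1).EAX  = 0x%08x\n", eax);
	/* EBX: XSAVES area size for the currently enabled XCR0|IA32_XSS */
	printf("XSAVES area size  = %u bytes\n", ebx);
	/* EDX:ECX: bitmap of supervisor states supported in IA32_XSS */
	printf("IA32_XSS support  = 0x%08x%08x\n", edx, ecx);
	return 0;
}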

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/cpuid.c            | 105 ++++++++++++++++++++++----------
 arch/x86/kvm/svm.c              |   7 +++
 arch/x86/kvm/vmx/vmx.c          |   6 ++
 arch/x86/kvm/x86.h              |   7 +++
 5 files changed, 94 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 74e88e5edd9c..d018df8c5f32 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1209,6 +1209,7 @@ struct kvm_x86_ops {
 	uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
 
 	bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
+	u64 (*supported_xss)(void);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 22c2720cd948..dd387a785c1e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -62,6 +62,17 @@ u64 kvm_supported_xcr0(void)
 	return xcr0;
 }
 
+u64 kvm_supported_xss(void)
+{
+	u64 kvm_xss = KVM_SUPPORTED_XSS & kvm_x86_ops->supported_xss();
+
+	if (!kvm_x86_ops->xsaves_supported())
+		return 0;
+
+	return kvm_xss;
+}
+EXPORT_SYMBOL_GPL(kvm_supported_xss);
+
 #define F(x) bit(X86_FEATURE_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -414,6 +425,58 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
 	}
 }
 
+static inline bool do_cpuid_0xd_mask(struct kvm_cpuid_entry2 *entry, int index)
+{
+	unsigned int f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
+	/* cpuid 0xD.1.eax */
+	const u32 kvm_cpuid_D_1_eax_x86_features =
+		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
+	u64 u_supported = kvm_supported_xcr0();
+	u64 s_supported = kvm_supported_xss();
+	u64 supported;
+
+	switch (index) {
+	case 0:
+		entry->eax &= u_supported;
+		entry->ebx = xstate_required_size(u_supported, false);
+		entry->ecx = entry->ebx;
+		entry->edx &= u_supported >> 32;
+
+		if (!u_supported) {
+			entry->eax = 0;
+			entry->ebx = 0;
+			entry->ecx = 0;
+			entry->edx = 0;
+			return false;
+		}
+		break;
+	case 1:
+		supported = u_supported | s_supported;
+		entry->eax &= kvm_cpuid_D_1_eax_x86_features;
+		cpuid_mask(&entry->eax, CPUID_D_1_EAX);
+		entry->ebx = 0;
+		entry->edx &= s_supported >> 32;
+		entry->ecx &= s_supported;
+		if (entry->eax & (F(XSAVES) | F(XSAVEC)))
+			entry->ebx = xstate_required_size(supported, true);
+
+		break;
+	default:
+		supported = (entry->ecx & 0x1) ? s_supported : u_supported;
+		if (!(supported & (BIT_ULL(index)))) {
+			entry->eax = 0;
+			entry->ebx = 0;
+			entry->ecx = 0;
+			entry->edx = 0;
+			return false;
+		}
+		if (entry->ecx & 0x1)
+			entry->ebx = 0;
+		break;
+	}
+	return true;
+}
+
 static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 				  int *nent, int maxnent)
 {
@@ -428,7 +491,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 	unsigned f_lm = 0;
 #endif
 	unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0;
-	unsigned f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
 	unsigned f_intel_pt = kvm_x86_ops->pt_supported() ? F(INTEL_PT) : 0;
 
 	/* cpuid 1.edx */
@@ -482,10 +544,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
 		F(PMM) | F(PMM_EN);
 
-	/* cpuid 0xD.1.eax */
-	const u32 kvm_cpuid_D_1_eax_x86_features =
-		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
-
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
 
@@ -622,38 +680,21 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 		break;
 	}
 	case 0xd: {
-		int idx, i;
-		u64 supported = kvm_supported_xcr0();
+		int i, idx;
 
-		entry->eax &= supported;
-		entry->ebx = xstate_required_size(supported, false);
-		entry->ecx = entry->ebx;
-		entry->edx &= supported >> 32;
-		if (!supported)
+		if (!do_cpuid_0xd_mask(&entry[0], 0))
 			break;
-
-		for (idx = 1, i = 1; idx < 64; ++idx) {
-			u64 mask = ((u64)1 << idx);
+		for (i = 1, idx = 1; idx < 64; ++idx) {
 			if (*nent >= maxnent)
 				goto out;
-
 			do_host_cpuid(&entry[i], function, idx);
-			if (idx == 1) {
-				entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
-				cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
-				entry[i].ebx = 0;
-				if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
-					entry[i].ebx =
-						xstate_required_size(supported,
-								     true);
-			} else {
-				if (entry[i].eax == 0 || !(supported & mask))
-					continue;
-				if (WARN_ON_ONCE(entry[i].ecx & 1))
-					continue;
-			}
-			entry[i].ecx = 0;
-			entry[i].edx = 0;
+			if (entry[i].eax == 0 && entry[i].ebx == 0 &&
+			    entry[i].ecx == 0 && entry[i].edx == 0)
+				continue;
+
+			if (!do_cpuid_0xd_mask(&entry[i], idx))
+				continue;
+
 			++*nent;
 			++i;
 		}
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e0368076a1ef..be967bf9a81d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7193,6 +7193,11 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
 	return false;
 }
 
+static u64 svm_supported_xss(void)
+{
+	return 0;
+}
+
 static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -7329,6 +7334,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.nested_get_evmcs_version = NULL,
 
 	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
+
+	.supported_xss = svm_supported_xss,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c6f6b05004d9..a84198cff397 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1651,6 +1651,11 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
 	return !(val & ~valid_bits);
 }
 
+static inline u64 vmx_supported_xss(void)
+{
+	return host_xss;
+}
+
 static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
@@ -7799,6 +7804,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.nested_enable_evmcs = NULL,
 	.nested_get_evmcs_version = NULL,
 	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
+	.supported_xss = vmx_supported_xss,
 };
 
 static void vmx_cleanup_l1d_flush(void)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6594020c0691..f10c12b5197d 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -293,6 +293,13 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
 				| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU)
+
+/*
+ * Right now, no XSS states are used on x86 platform,
+ * expand the macro for new features.
+ */
+#define KVM_SUPPORTED_XSS	0
+
 extern u64 host_xcr0;
 
 extern u64 kvm_supported_xcr0(void);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 2/7] KVM: VMX: Define CET VMCS fields and #CP flag
  2019-11-01  8:52 [PATCH v8 0/7] Introduce support for guest CET feature Yang Weijiang
  2019-11-01  8:52 ` [PATCH v8 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration Yang Weijiang
@ 2019-11-01  8:52 ` Yang Weijiang
  2019-12-10 21:00   ` Sean Christopherson
  2019-11-01  8:52 ` [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs Yang Weijiang
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-11-01  8:52 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: jmattson, yu.c.zhang, yu-cheng.yu, Yang Weijiang

CET (Control-flow Enforcement Technology) is an upcoming Intel(R)
processor feature that blocks Return/Jump-Oriented Programming (ROP/JOP)
attacks. It provides the following capabilities to defend
against ROP/JOP style control-flow subversion attacks:

Shadow Stack (SHSTK):
  A second stack for the program, used exclusively for
  control transfer operations.

Indirect Branch Tracking (IBT):
  Code branching protection to defend against jump/call oriented
  programming.

Several new CET MSRs are defined in the kernel to support CET:
  MSR_IA32_{U,S}_CET: Controls the CET settings for user
                      mode and supervisor mode respectively.

  MSR_IA32_PL{0,1,2,3}_SSP: Stores shadow stack pointers for
                            CPL-0,1,2,3 level respectively.

  MSR_IA32_INT_SSP_TAB: Stores base address of shadow stack
                        pointer table.

Two XSAVES state bits are introduced for CET:
  IA32_XSS:[bit 11]: For saving/restoring user mode CET states
  IA32_XSS:[bit 12]: For saving/restoring supervisor mode CET states.

Six VMCS fields are introduced for CET:
  {HOST,GUEST}_S_CET: Stores CET settings for supervisor mode.
  {HOST,GUEST}_SSP: Stores shadow stack pointer for supervisor mode.
  {HOST,GUEST}_INTR_SSP_TABLE: Stores base address of shadow stack pointer
                               table.

If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host's CET MSRs are restored
from the VMCS fields below at VM-Exit:
  HOST_S_CET
  HOST_SSP
  HOST_INTR_SSP_TABLE

If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest's CET MSRs are loaded
from the VMCS fields below at VM-Entry:
  GUEST_S_CET
  GUEST_SSP
  GUEST_INTR_SSP_TABLE
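
As a rough illustration of how a guest might consume these controls once
they are exposed, here is a hypothetical sketch (not part of this patch).
The MSR and CR4 names are taken from the kernel CET patches referenced in
the cover letter, and the control bit values follow the public CET spec
(bit 0 = SH_STK_EN, bit 2 = ENDBR_EN):

/*
 * Hypothetical guest-side sketch: enable supervisor-mode CET.  With
 * VM_ENTRY_LOAD_GUEST_CET_STATE set, GUEST_S_CET/GUEST_SSP are the
 * VMCS fields that carry this state across VM-entries.
 */
static void guest_enable_supervisor_cet(u64 pl0_shstk_base)
{
	cr4_set_bits(X86_CR4_CET);			/* CR4 bit 23 */
	wrmsrl(MSR_IA32_PL0_SSP, pl0_shstk_base);	/* CPL-0 shadow stack */
	wrmsrl(MSR_IA32_S_CET, BIT_ULL(0) | BIT_ULL(2));/* SHSTK + IBT on */
}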

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/vmx.h      | 8 ++++++++
 arch/x86/include/uapi/asm/kvm.h | 1 +
 arch/x86/kvm/x86.c              | 1 +
 arch/x86/kvm/x86.h              | 5 +++--
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index a39136b0d509..68bca290a203 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -90,6 +90,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_LOAD_HOST_CET_STATE             0x10000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -103,6 +104,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_GUEST_CET_STATE           0x00100000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
@@ -321,6 +323,9 @@ enum vmcs_field {
 	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
 	GUEST_SYSENTER_ESP              = 0x00006824,
 	GUEST_SYSENTER_EIP              = 0x00006826,
+	GUEST_S_CET                     = 0x00006828,
+	GUEST_SSP                       = 0x0000682a,
+	GUEST_INTR_SSP_TABLE            = 0x0000682c,
 	HOST_CR0                        = 0x00006c00,
 	HOST_CR3                        = 0x00006c02,
 	HOST_CR4                        = 0x00006c04,
@@ -333,6 +338,9 @@ enum vmcs_field {
 	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
 	HOST_RSP                        = 0x00006c14,
 	HOST_RIP                        = 0x00006c16,
+	HOST_S_CET                      = 0x00006c18,
+	HOST_SSP                        = 0x00006c1a,
+	HOST_INTR_SSP_TABLE             = 0x00006c1c
 };
 
 /*
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 503d3f42da16..e68d6b448730 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -31,6 +31,7 @@
 #define MC_VECTOR 18
 #define XM_VECTOR 19
 #define VE_VECTOR 20
+#define CP_VECTOR 21
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 290c3c3efb87..540490d5385f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -378,6 +378,7 @@ static int exception_class(int vector)
 	case NP_VECTOR:
 	case SS_VECTOR:
 	case GP_VECTOR:
+	case CP_VECTOR:
 		return EXCPT_CONTRIBUTORY;
 	default:
 		break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index f10c12b5197d..7e7b5b5cc956 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -114,7 +114,7 @@ static inline bool x86_exception_has_error_code(unsigned int vector)
 {
 	static u32 exception_has_error_code = BIT(DF_VECTOR) | BIT(TS_VECTOR) |
 			BIT(NP_VECTOR) | BIT(SS_VECTOR) | BIT(GP_VECTOR) |
-			BIT(PF_VECTOR) | BIT(AC_VECTOR);
+			BIT(PF_VECTOR) | BIT(AC_VECTOR) | BIT(CP_VECTOR);
 
 	return (1U << vector) & exception_has_error_code;
 }
@@ -298,7 +298,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
  * Right now, no XSS states are used on x86 platform,
  * expand the macro for new features.
  */
-#define KVM_SUPPORTED_XSS	0
+#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER \
+				| XFEATURE_MASK_CET_KERNEL)
 
 extern u64 host_xcr0;
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs
  2019-11-01  8:52 [PATCH v8 0/7] Introduce support for guest CET feature Yang Weijiang
  2019-11-01  8:52 ` [PATCH v8 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration Yang Weijiang
  2019-11-01  8:52 ` [PATCH v8 2/7] KVM: VMX: Define CET VMCS fields and #CP flag Yang Weijiang
@ 2019-11-01  8:52 ` Yang Weijiang
  2019-12-10 21:18   ` Sean Christopherson
  2019-11-01  8:52 ` [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit Yang Weijiang
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-11-01  8:52 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: jmattson, yu.c.zhang, yu-cheng.yu, Yang Weijiang

CET MSRs are passed through to the Guest directly to enhance performance.
CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
Shadow Stack Pointers (SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP, and
the SSP table base address is stored in MSR_IA32_INT_SSP_TAB;
these MSRs are defined in the kernel and re-used here.

MSR_IA32_U_CET and MSR_IA32_PL3_SSP are used for user-mode protection,
and their contents can differ from process to process, so the kernel
needs to save/restore them during context switches. It therefore makes
sense to pass them through so that the guest kernel can use
xsaves/xrstors to operate on them efficiently. The other MSRs are used
for non-user-mode protection. See the CET spec for detailed info.

The difference between the CET VMCS state fields and the xsave components
is that the former are used for CET state storage during VMEnter/VMExit,
whereas the latter are used for state retention across Guest task/process
switches.
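
To illustrate the context-switch point, below is a hypothetical guest-kernel
sketch (not part of the patch) of the accesses that pass-through avoids
trapping on; with the MSRs passed through, each rdmsrl/wrmsrl executes
natively in the guest rather than causing a VM-exit:

/*
 * Hypothetical guest-side sketch: save/restore the user-mode CET MSRs
 * around a task switch.  In practice the guest kernel would use
 * XSAVES/XRSTORS to do this in bulk, which is exactly why the MSRs are
 * left unintercepted.
 */
struct cet_user_state {
	u64 u_cet;
	u64 pl3_ssp;
};

static void guest_save_user_cet(struct cet_user_state *s)
{
	rdmsrl(MSR_IA32_U_CET, s->u_cet);
	rdmsrl(MSR_IA32_PL3_SSP, s->pl3_ssp);
}

static void guest_restore_user_cet(const struct cet_user_state *s)
{
	wrmsrl(MSR_IA32_U_CET, s->u_cet);
	wrmsrl(MSR_IA32_PL3_SSP, s->pl3_ssp);
}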

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/cpuid.c   |  4 +--
 arch/x86/kvm/cpuid.h   |  2 ++
 arch/x86/kvm/vmx/vmx.c | 65 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index dd387a785c1e..4166c4fcad1e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -371,13 +371,13 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
 		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
-		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
+		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | F(SHSTK);
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
 		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
 		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
-		F(MD_CLEAR);
+		F(MD_CLEAR) | F(IBT);
 
 	/* cpuid 7.1.eax */
 	const u32 kvm_cpuid_7_1_eax_x86_features =
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index d78a61408243..1d77b880084d 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -27,6 +27,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 
 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
 
+u64 kvm_supported_xss(void);
+
 static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.maxphyaddr;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a84198cff397..db03d9dc1297 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2918,6 +2918,24 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	vmcs_writel(GUEST_CR3, guest_cr3);
 }
 
+static bool guest_cet_allowed(struct kvm_vcpu *vcpu, u32 feature, u32 mode)
+{
+	u64 kvm_xss = kvm_supported_xss();
+
+	/*
+	 * Sanity check for guest CET dependencies, guest_cpu_has(SHSTK|IBT) has
+	 * implied corresponding host CET status check.
+	 */
+	if (feature == X86_FEATURE_SHSTK)
+		return guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
+		       (kvm_xss & mode);
+	else if (feature == X86_FEATURE_IBT)
+		return guest_cpuid_has(vcpu, X86_FEATURE_IBT) &&
+		       (kvm_xss & mode);
+
+	return false;
+}
+
 int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7001,6 +7019,50 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
+static void vmx_pass_cet_msrs(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
+
+	/*
+	 * U_CET is required for USER CET, per CET spec., meanwhile U_CET and
+	 * PL3_SPP are a bundle for USER CET xsaves.
+	 */
+	if (guest_cet_allowed(vcpu, X86_FEATURE_SHSTK, XFEATURE_MASK_CET_USER) ||
+	    guest_cet_allowed(vcpu, X86_FEATURE_IBT, XFEATURE_MASK_CET_USER)) {
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+	} else {
+		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW, true);
+		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW, true);
+	}
+	/*
+	 * S_CET is required for KERNEL CET, meanwhile PL0_SSP ... PL2_SSP are a bundle
+	 * for CET KERNEL xsaves.
+	 */
+	if (guest_cet_allowed(vcpu, X86_FEATURE_SHSTK, XFEATURE_MASK_CET_KERNEL) ||
+	    guest_cet_allowed(vcpu, X86_FEATURE_IBT, XFEATURE_MASK_CET_KERNEL)) {
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+		/* SSP_TAB only available for KERNEL SHSTK.*/
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
+			vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
+						      MSR_TYPE_RW);
+		else
+			vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
+						  MSR_TYPE_RW, true);
+	} else {
+		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW, true);
+		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW, true);
+		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW, true);
+		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW, true);
+		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW, true);
+	}
+}
+
 static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7025,6 +7087,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
 			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
 		update_intel_pt_cfg(vcpu);
+
+	if (!is_guest_mode(vcpu))
+		vmx_pass_cet_msrs(vcpu);
 }
 
 static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit
  2019-11-01  8:52 [PATCH v8 0/7] Introduce support for guest CET feature Yang Weijiang
                   ` (2 preceding siblings ...)
  2019-11-01  8:52 ` [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs Yang Weijiang
@ 2019-11-01  8:52 ` Yang Weijiang
  2019-12-10 21:23   ` Sean Christopherson
  2019-11-01  8:52 ` [PATCH v8 5/7] KVM: X86: Enable CET bits update in IA32_XSS Yang Weijiang
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-11-01  8:52 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: jmattson, yu.c.zhang, yu-cheng.yu, Yang Weijiang

"Load {guest,host} CET state" bit controls whether guest/host
CET states will be loaded at VM entry/exit. Before doing that,
KVM needs to check if CET is both enabled on host and guest.

Note: SHSTK and IBT features share one control MSR:
MSR_IA32_{U,S}_CET, which means it's difficult to hide
one feature from another in the case of SHSTK != IBT,
after discussed in community, it's agreed to allow Guest
control two features independently as it won't introduce
security hole.
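
For context, a minimal host-side sketch (illustrative only, not part of the
patch) of how to check whether the CPU offers the new VM-entry/VM-exit
controls; the allowed-1 settings live in the upper 32 bits of the VMX
capability MSRs:

/*
 * Illustrative sketch: probe the (non-TRUE) VMX capability MSRs for the
 * controls this patch turns on.  Real code consults the TRUE variants
 * when MSR_IA32_VMX_BASIC bit 55 is set, as setup_vmcs_config() does.
 */
static bool __init cet_vmcs_controls_supported(void)
{
	u64 entry_caps, exit_caps;

	rdmsrl(MSR_IA32_VMX_ENTRY_CTLS, entry_caps);
	rdmsrl(MSR_IA32_VMX_EXIT_CTLS, exit_caps);

	return ((entry_caps >> 32) & VM_ENTRY_LOAD_GUEST_CET_STATE) &&
	       ((exit_caps >> 32) & VM_EXIT_LOAD_HOST_CET_STATE);
}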

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/vmx/capabilities.h | 10 ++++++
 arch/x86/kvm/vmx/vmx.c          | 55 +++++++++++++++++++++++++++++++--
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d018df8c5f32..f1e6cebaeb15 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -90,7 +90,8 @@
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+			  | X86_CR4_CET))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index d6664ee3d127..2720c9f4cd49 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -106,6 +106,16 @@ static inline bool vmx_mpx_supported(void)
 		(vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS);
 }
 
+static inline bool cpu_has_load_guest_cet_states_ctrl(void)
+{
+	return ((vmcs_config.vmentry_ctrl) & VM_ENTRY_LOAD_GUEST_CET_STATE);
+}
+
+static inline bool cpu_has_load_host_cet_states_ctrl(void)
+{
+	return ((vmcs_config.vmexit_ctrl) & VM_EXIT_LOAD_HOST_CET_STATE);
+}
+
 static inline bool cpu_has_vmx_tpr_shadow(void)
 {
 	return vmcs_config.cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index db03d9dc1297..e392e818e7eb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -44,6 +44,7 @@
 #include <asm/spec-ctrl.h>
 #include <asm/virtext.h>
 #include <asm/vmx.h>
+#include <asm/cet.h>
 
 #include "capabilities.h"
 #include "cpuid.h"
@@ -2336,7 +2337,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	      VM_EXIT_LOAD_IA32_EFER |
 	      VM_EXIT_CLEAR_BNDCFGS |
 	      VM_EXIT_PT_CONCEAL_PIP |
-	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
+	      VM_EXIT_CLEAR_IA32_RTIT_CTL |
+	      VM_EXIT_LOAD_HOST_CET_STATE;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
 				&_vmexit_control) < 0)
 		return -EIO;
@@ -2360,7 +2362,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	      VM_ENTRY_LOAD_IA32_EFER |
 	      VM_ENTRY_LOAD_BNDCFGS |
 	      VM_ENTRY_PT_CONCEAL_PIP |
-	      VM_ENTRY_LOAD_IA32_RTIT_CTL;
+	      VM_ENTRY_LOAD_IA32_RTIT_CTL |
+	      VM_ENTRY_LOAD_GUEST_CET_STATE;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
 				&_vmentry_control) < 0)
 		return -EIO;
@@ -2834,6 +2837,9 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long hw_cr0;
 
+	if (!(cr0 & X86_CR0_WP) && kvm_read_cr4_bits(vcpu, X86_CR4_CET))
+		cr0 |= X86_CR0_WP;
+
 	hw_cr0 = (cr0 & ~KVM_VM_CR0_ALWAYS_OFF);
 	if (enable_unrestricted_guest)
 		hw_cr0 |= KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST;
@@ -2936,6 +2942,22 @@ static bool guest_cet_allowed(struct kvm_vcpu *vcpu, u32 feature, u32 mode)
 	return false;
 }
 
+bool is_cet_bit_allowed(struct kvm_vcpu *vcpu)
+{
+	unsigned long cr0;
+	bool cet_allowed;
+
+	cr0 = kvm_read_cr0(vcpu);
+	cet_allowed = guest_cet_allowed(vcpu, X86_FEATURE_SHSTK,
+					XFEATURE_MASK_CET_USER) ||
+		      guest_cet_allowed(vcpu, X86_FEATURE_IBT,
+					XFEATURE_MASK_CET_USER);
+	if ((cr0 & X86_CR0_WP) && cet_allowed)
+		return true;
+
+	return false;
+}
+
 int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -2976,6 +2998,9 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	}
 
+	if ((cr4 & X86_CR4_CET) && !is_cet_bit_allowed(vcpu))
+		return 1;
+
 	if (vmx->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
 		return 1;
 
@@ -3839,6 +3864,12 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
 
 	if (cpu_has_load_ia32_efer())
 		vmcs_write64(HOST_IA32_EFER, host_efer);
+
+	if (cpu_has_load_host_cet_states_ctrl()) {
+		vmcs_writel(HOST_S_CET, 0);
+		vmcs_writel(HOST_INTR_SSP_TABLE, 0);
+		vmcs_writel(HOST_SSP, 0);
+	}
 }
 
 void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
@@ -6436,6 +6467,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long cr3, cr4;
+	bool cet_allowed;
 
 	/* Record the guest's net vcpu time for enforced NMI injections. */
 	if (unlikely(!enable_vnmi &&
@@ -6466,6 +6498,25 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		vmx->loaded_vmcs->host_state.cr3 = cr3;
 	}
 
+	/* To be aligned with kernel code, only user mode is supported now. */
+	cet_allowed = guest_cet_allowed(vcpu, X86_FEATURE_SHSTK,
+					XFEATURE_MASK_CET_USER) ||
+		      guest_cet_allowed(vcpu, X86_FEATURE_IBT,
+					XFEATURE_MASK_CET_USER);
+	if (cpu_has_load_guest_cet_states_ctrl() && cet_allowed)
+		vmcs_set_bits(VM_ENTRY_CONTROLS,
+			      VM_ENTRY_LOAD_GUEST_CET_STATE);
+	else
+		vmcs_clear_bits(VM_ENTRY_CONTROLS,
+				VM_ENTRY_LOAD_GUEST_CET_STATE);
+
+	if (cpu_has_load_host_cet_states_ctrl() && cet_allowed)
+		vmcs_set_bits(VM_EXIT_CONTROLS,
+			      VM_EXIT_LOAD_HOST_CET_STATE);
+	else
+		vmcs_clear_bits(VM_EXIT_CONTROLS,
+				VM_EXIT_LOAD_HOST_CET_STATE);
+
 	cr4 = cr4_read_shadow();
 	if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) {
 		vmcs_writel(HOST_CR4, cr4);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 5/7] KVM: X86: Enable CET bits update in IA32_XSS
  2019-11-01  8:52 [PATCH v8 0/7] Introduce support for guest CET feature Yang Weijiang
                   ` (3 preceding siblings ...)
  2019-11-01  8:52 ` [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit Yang Weijiang
@ 2019-11-01  8:52 ` Yang Weijiang
  2019-11-01  8:52 ` [PATCH v8 6/7] KVM: X86: Load guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2019-11-01  8:52 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: jmattson, yu.c.zhang, yu-cheng.yu, Yang Weijiang

Previously, KVM did not support setting any bits in XSS,
so the code was hardcoded to check for and inject a #GP if the Guest
attempted to write a non-zero value to XSS. Now it supports
setting the CET related bits.
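
As an illustration of what this change permits, here is a hypothetical
guest-side sketch (not part of the patch); XFEATURE_MASK_CET_USER is
IA32_XSS bit 11, as listed in patch 2:

/*
 * Hypothetical guest-side sketch: opt the user-mode CET state into
 * XSAVES/XRSTORS management.  Before this patch the WRMSR below would
 * have been rejected with an injected #GP for any non-zero value.
 */
static void guest_enable_cet_xsaves(void)
{
	u64 xss;

	rdmsrl(MSR_IA32_XSS, xss);
	wrmsrl(MSR_IA32_XSS, xss | XFEATURE_MASK_CET_USER);
}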

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            | 12 ++++++++++--
 arch/x86/kvm/vmx/vmx.c          |  6 +-----
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f1e6cebaeb15..8f97269d6d9f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -624,6 +624,7 @@ struct kvm_vcpu_arch {
 
 	u64 xcr0;
 	u64 guest_supported_xcr0;
+	u64 guest_supported_xss;
 	u32 guest_xstate_size;
 
 	struct kvm_pio_request pio;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 4166c4fcad1e..eafb66da86fe 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -125,8 +125,16 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 	}
 
 	best = kvm_find_cpuid_entry(vcpu, 0xD, 1);
-	if (best && (best->eax & (F(XSAVES) | F(XSAVEC))))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+	if (best && (best->eax & (F(XSAVES) | F(XSAVEC)))) {
+		u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;
+
+		best->ebx = xstate_required_size(xstate, true);
+		vcpu->arch.guest_supported_xss =
+			(best->ecx | ((u64)best->edx << 32)) &
+			kvm_supported_xss();
+	} else {
+		vcpu->arch.guest_supported_xss = 0;
+	}
 
 	/*
 	 * The existing code assumes virtual address is 48-bit or 57-bit in the
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e392e818e7eb..bfb1b922a9ac 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1973,11 +1973,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		     !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
 		       guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
 			return 1;
-		/*
-		 * The only supported bit as of Skylake is bit 8, but
-		 * it is not supported on KVM.
-		 */
-		if (data != 0)
+		if (data & ~vcpu->arch.guest_supported_xss)
 			return 1;
 		vcpu->arch.ia32_xss = data;
 		if (vcpu->arch.ia32_xss != host_xss)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 6/7] KVM: X86: Load guest fpu state when accessing MSRs managed by XSAVES
  2019-11-01  8:52 [PATCH v8 0/7] Introduce support for guest CET feature Yang Weijiang
                   ` (4 preceding siblings ...)
  2019-11-01  8:52 ` [PATCH v8 5/7] KVM: X86: Enable CET bits update in IA32_XSS Yang Weijiang
@ 2019-11-01  8:52 ` Yang Weijiang
  2019-12-10 21:27   ` Sean Christopherson
  2019-11-01  8:52 ` [PATCH v8 7/7] KVM: X86: Add user-space access interface for CET MSRs Yang Weijiang
  2019-12-12 16:03 ` [PATCH v8 0/7] Introduce support for guest CET feature Konrad Rzeszutek Wilk
  7 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-11-01  8:52 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: jmattson, yu.c.zhang, yu-cheng.yu, Yang Weijiang

From: Sean Christopherson <sean.j.christopherson@intel.com>

A handful of CET MSRs are not context switched through "traditional"
methods, e.g. VMCS or manual switching, but rather are passed through
to the guest and are saved and restored by XSAVES/XRSTORS, i.e. the
guest's FPU state.

Load the guest's FPU state if userspace is accessing MSRs whose values
are managed by XSAVES so that the MSR helper, e.g. vmx_{get,set}_msr(),
can simply do {RD,WR}MSR to access the guest's value.

Note that guest_cpuid_has() is not queried as host userspace is allowed
to access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.
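
For illustration, a minimal userspace sketch of the path this patch caters
to; the vcpu fd and the raw MSR index 0x6a7 (MSR_IA32_PL3_SSP per the public
CET spec) are assumptions for the example, not values taken from this patch:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Illustrative only: restore one XSAVES-managed CET MSR via
 * KVM_SET_MSRS.  KVM loads the guest FPU around the access so the
 * WRMSR lands in the guest's XSAVES-managed state.
 */
static int set_guest_pl3_ssp(int vcpu_fd, __u64 ssp)
{
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} msrs;

	memset(&msrs, 0, sizeof(msrs));
	msrs.hdr.nmsrs = 1;
	msrs.entry.index = 0x6a7;	/* MSR_IA32_PL3_SSP */
	msrs.entry.data = ssp;

	return ioctl(vcpu_fd, KVM_SET_MSRS, &msrs);
}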

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 540490d5385f..6275a75d5802 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -104,6 +104,8 @@ static void enter_smm(struct kvm_vcpu *vcpu);
 static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 static void store_regs(struct kvm_vcpu *vcpu);
 static int sync_regs(struct kvm_vcpu *vcpu);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
 
 struct kvm_x86_ops *kvm_x86_ops __read_mostly;
 EXPORT_SYMBOL_GPL(kvm_x86_ops);
@@ -3000,6 +3002,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 }
 EXPORT_SYMBOL_GPL(kvm_get_msr_common);
 
+static bool is_xsaves_msr(u32 index)
+{
+	return index == MSR_IA32_U_CET ||
+	       (index >= MSR_IA32_PL0_SSP && index <= MSR_IA32_PL3_SSP);
+}
+
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
  *
@@ -3010,11 +3018,22 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
+	const u64 cet_bits = XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL;
+	bool cet_xss = kvm_supported_xss() & cet_bits;
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		if (!fpu_loaded && cet_xss &&
+		    is_xsaves_msr(entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 
 	return i;
 }
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 7/7] KVM: X86: Add user-space access interface for CET MSRs
  2019-11-01  8:52 [PATCH v8 0/7] Introduce support for guest CET feature Yang Weijiang
                   ` (5 preceding siblings ...)
  2019-11-01  8:52 ` [PATCH v8 6/7] KVM: X86: Load guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
@ 2019-11-01  8:52 ` Yang Weijiang
  2019-12-10 21:58   ` Sean Christopherson
  2019-12-12 16:03 ` [PATCH v8 0/7] Introduce support for guest CET feature Konrad Rzeszutek Wilk
  7 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-11-01  8:52 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: jmattson, yu.c.zhang, yu-cheng.yu, Yang Weijiang

Guest CET states are stored in two different places. States
managed with XSAVES/XRSTORS, as saved/restored
in the previous patch, can be read/written directly from/to the MSRs.
Those stored in VMCS fields are accessed via vmcs_readl()/
vmcs_writel().

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 140 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c     |   3 +
 2 files changed, 143 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bfb1b922a9ac..71aba264b5d2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1671,6 +1671,98 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 	return 0;
 }
 
+#define CET_MSR_RSVD_BITS_1    0x3
+#define CET_MSR_RSVD_BITS_2   (0xF << 6)
+
+static bool cet_msr_write_allowed(struct kvm_vcpu *vcpu, struct msr_data *msr)
+{
+	u32 index = msr->index;
+	u64 data = msr->data;
+	u32 high_word = data >> 32;
+
+	if ((index == MSR_IA32_U_CET || index == MSR_IA32_S_CET) &&
+	    (data & CET_MSR_RSVD_BITS_2))
+		return false;
+
+	if (is_64_bit_mode(vcpu)) {
+		if (is_noncanonical_address(data & PAGE_MASK, vcpu))
+			return false;
+		else if ((index == MSR_IA32_PL0_SSP ||
+			  index == MSR_IA32_PL1_SSP ||
+			  index == MSR_IA32_PL2_SSP ||
+			  index == MSR_IA32_PL3_SSP) &&
+			  (data & CET_MSR_RSVD_BITS_1))
+			return false;
+	} else {
+		if (msr->index == MSR_IA32_INT_SSP_TAB)
+			return false;
+		else if ((index == MSR_IA32_U_CET ||
+			  index == MSR_IA32_S_CET ||
+			  index == MSR_IA32_PL0_SSP ||
+			  index == MSR_IA32_PL1_SSP ||
+			  index == MSR_IA32_PL2_SSP ||
+			  index == MSR_IA32_PL3_SSP) &&
+			  (high_word & ~0ul))
+			return false;
+	}
+
+	return true;
+}
+
+static bool cet_msr_access_allowed(struct kvm_vcpu *vcpu, struct msr_data *msr)
+{
+	u64 kvm_xss;
+	u32 index = msr->index;
+
+	if (is_guest_mode(vcpu))
+		return false;
+
+	kvm_xss = kvm_supported_xss();
+
+	switch (index) {
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		if (!boot_cpu_has(X86_FEATURE_SHSTK))
+			return false;
+		if (!msr->host_initiated) {
+			if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
+				return false;
+		} else {
+			if (index == MSR_IA32_PL3_SSP) {
+				if (!(kvm_xss & XFEATURE_MASK_CET_USER))
+					return false;
+			} else {
+				if (!(kvm_xss & XFEATURE_MASK_CET_KERNEL))
+					return false;
+			}
+		}
+		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!boot_cpu_has(X86_FEATURE_SHSTK) &&
+		    !boot_cpu_has(X86_FEATURE_IBT))
+			return false;
+
+		if (!msr->host_initiated) {
+			if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
+			    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
+				return false;
+		} else if (index == MSR_IA32_U_CET &&
+			   !(kvm_xss & XFEATURE_MASK_CET_USER))
+			return false;
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		if (!boot_cpu_has(X86_FEATURE_SHSTK))
+			return false;
+
+		if (!msr->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
+			return false;
+		break;
+	default:
+		return false;
+	}
+	return true;
+}
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -1788,6 +1880,26 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
 		break;
+	case MSR_IA32_S_CET:
+		if (!cet_msr_access_allowed(vcpu, msr_info))
+			return 1;
+		msr_info->data = vmcs_readl(GUEST_S_CET);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		if (!cet_msr_access_allowed(vcpu, msr_info))
+			return 1;
+		msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+		break;
+	case MSR_IA32_U_CET:
+		if (!cet_msr_access_allowed(vcpu, msr_info))
+			return 1;
+		rdmsrl(MSR_IA32_U_CET, msr_info->data);
+		break;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		if (!cet_msr_access_allowed(vcpu, msr_info))
+			return 1;
+		rdmsrl(msr_info->index, msr_info->data);
+		break;
 	case MSR_TSC_AUX:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
@@ -2039,6 +2151,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
+	case MSR_IA32_S_CET:
+		if (!cet_msr_access_allowed(vcpu, msr_info))
+			return 1;
+		if (!cet_msr_write_allowed(vcpu, msr_info))
+			return 1;
+		vmcs_writel(GUEST_S_CET, data);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		if (!cet_msr_access_allowed(vcpu, msr_info))
+			return 1;
+		if (!cet_msr_write_allowed(vcpu, msr_info))
+			return 1;
+		vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+		break;
+	case MSR_IA32_U_CET:
+		if (!cet_msr_access_allowed(vcpu, msr_info))
+			return 1;
+		if (!cet_msr_write_allowed(vcpu, msr_info))
+			return 1;
+		wrmsrl(MSR_IA32_U_CET, data);
+		break;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		if (!cet_msr_access_allowed(vcpu, msr_info))
+			return 1;
+		if (!cet_msr_write_allowed(vcpu, msr_info))
+			return 1;
+		wrmsrl(msr_info->index, data);
+		break;
 	case MSR_TSC_AUX:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6275a75d5802..1bbe4550da90 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1143,6 +1143,9 @@ static u32 msrs_to_save[] = {
 	MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
 	MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
 	MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
+	MSR_IA32_XSS, MSR_IA32_U_CET, MSR_IA32_S_CET,
+	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
 };
 
 static unsigned num_msrs_to_save;
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 2/7] KVM: VMX: Define CET VMCS fields and #CP flag
  2019-11-01  8:52 ` [PATCH v8 2/7] KVM: VMX: Define CET VMCS fields and #CP flag Yang Weijiang
@ 2019-12-10 21:00   ` Sean Christopherson
  2019-12-11  1:45     ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-12-10 21:00 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Fri, Nov 01, 2019 at 04:52:17PM +0800, Yang Weijiang wrote:
> CET(Control-flow Enforcement Technology) is an upcoming Intel(R)
> processor feature that blocks Return/Jump-Oriented Programming(ROP)
> attacks. It provides the following capabilities to defend
> against ROP/JOP style control-flow subversion attacks:
> 
> Shadow Stack (SHSTK):
>   A second stack for program which is used exclusively for
>   control transfer operations.
> 
> Indirect Branch Tracking (IBT):
>   Code branching protection to defend against jump/call oriented
>   programming.
> 
> Several new CET MSRs are defined in kernel to support CET:
>   MSR_IA32_{U,S}_CET: Controls the CET settings for user
>                       mode and suervisor mode respectively.
> 
>   MSR_IA32_PL{0,1,2,3}_SSP: Stores shadow stack pointers for
>                             CPL-0,1,2,3 level respectively.
> 
>   MSR_IA32_INT_SSP_TAB: Stores base address of shadow stack
>                         pointer table.
> 
> Two XSAVES state bits are introduced for CET:
>   IA32_XSS:[bit 11]: For saving/restoring user mode CET states
>   IA32_XSS:[bit 12]: For saving/restoring supervisor mode CET states.
> 
> Six VMCS fields are introduced for CET:
>   {HOST,GUEST}_S_CET: Stores CET settings for supervisor mode.
>   {HOST,GUEST}_SSP: Stores shadow stack pointer for supervisor mode.
>   {HOST,GUEST}_INTR_SSP_TABLE: Stores base address of shadow stack pointer
>                                table.
> 
> If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host's CET MSRs are restored
> from below VMCS fields at VM-Exit:
>   HOST_S_CET
>   HOST_SSP
>   HOST_INTR_SSP_TABLE
> 
> If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest's CET MSRs are loaded
> from below VMCS fields at VM-Entry:
>   GUEST_S_CET
>   GUEST_SSP
>   GUEST_INTR_SSP_TABLE
> 
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/include/asm/vmx.h      | 8 ++++++++
>  arch/x86/include/uapi/asm/kvm.h | 1 +
>  arch/x86/kvm/x86.c              | 1 +
>  arch/x86/kvm/x86.h              | 5 +++--
>  4 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index a39136b0d509..68bca290a203 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -90,6 +90,7 @@
>  #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
>  #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
>  #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
> +#define VM_EXIT_LOAD_HOST_CET_STATE             0x10000000
>  
>  #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
>  
> @@ -103,6 +104,7 @@
>  #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
>  #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
>  #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
> +#define VM_ENTRY_LOAD_GUEST_CET_STATE           0x00100000
>  
>  #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
>  
> @@ -321,6 +323,9 @@ enum vmcs_field {
>  	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
>  	GUEST_SYSENTER_ESP              = 0x00006824,
>  	GUEST_SYSENTER_EIP              = 0x00006826,
> +	GUEST_S_CET                     = 0x00006828,
> +	GUEST_SSP                       = 0x0000682a,
> +	GUEST_INTR_SSP_TABLE            = 0x0000682c,
>  	HOST_CR0                        = 0x00006c00,
>  	HOST_CR3                        = 0x00006c02,
>  	HOST_CR4                        = 0x00006c04,
> @@ -333,6 +338,9 @@ enum vmcs_field {
>  	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
>  	HOST_RSP                        = 0x00006c14,
>  	HOST_RIP                        = 0x00006c16,
> +	HOST_S_CET                      = 0x00006c18,
> +	HOST_SSP                        = 0x00006c1a,
> +	HOST_INTR_SSP_TABLE             = 0x00006c1c
>  };
>  
>  /*
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 503d3f42da16..e68d6b448730 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -31,6 +31,7 @@
>  #define MC_VECTOR 18
>  #define XM_VECTOR 19
>  #define VE_VECTOR 20
> +#define CP_VECTOR 21
>  
>  /* Select x86 specific features in <linux/kvm.h> */
>  #define __KVM_HAVE_PIT
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 290c3c3efb87..540490d5385f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -378,6 +378,7 @@ static int exception_class(int vector)
>  	case NP_VECTOR:
>  	case SS_VECTOR:
>  	case GP_VECTOR:
> +	case CP_VECTOR:
>  		return EXCPT_CONTRIBUTORY;
>  	default:
>  		break;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index f10c12b5197d..7e7b5b5cc956 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -114,7 +114,7 @@ static inline bool x86_exception_has_error_code(unsigned int vector)
>  {
>  	static u32 exception_has_error_code = BIT(DF_VECTOR) | BIT(TS_VECTOR) |
>  			BIT(NP_VECTOR) | BIT(SS_VECTOR) | BIT(GP_VECTOR) |
> -			BIT(PF_VECTOR) | BIT(AC_VECTOR);
> +			BIT(PF_VECTOR) | BIT(AC_VECTOR) | BIT(CP_VECTOR);
>  
>  	return (1U << vector) & exception_has_error_code;
>  }
> @@ -298,7 +298,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
>   * Right now, no XSS states are used on x86 platform,
>   * expand the macro for new features.

I assume this comment needs to be updated?

>   */
> -#define KVM_SUPPORTED_XSS	0
> +#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER \
> +				| XFEATURE_MASK_CET_KERNEL)
>  
>  extern u64 host_xcr0;
>  
> -- 
> 2.17.2
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs
  2019-11-01  8:52 ` [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs Yang Weijiang
@ 2019-12-10 21:18   ` Sean Christopherson
  2019-12-11  1:32     ` Yang Weijiang
  2019-12-16  2:18     ` Yang Weijiang
  0 siblings, 2 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-12-10 21:18 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Fri, Nov 01, 2019 at 04:52:18PM +0800, Yang Weijiang wrote:
> CET MSRs pass through Guest directly to enhance performance.
> CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> these MSRs are defined in kernel and re-used here.
> 
> MSR_IA32_U_CET and MSR_IA32_PL3_SSP are used for user mode protection,
> the contents could differ from process to process, therefore,
> kernel needs to save/restore them during context switch, it makes
> sense to pass through them so that the guest kernel can
> use xsaves/xrstors to operate them efficiently. Other MSRs are used
> for non-user mode protection. See CET spec for detailed info.
> 
> The difference between CET VMCS state fields and xsave components is that,
> the former used for CET state storage during VMEnter/VMExit,
> whereas the latter used for state retention between Guest task/process
> switch.
> 
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/cpuid.c   |  4 +--
>  arch/x86/kvm/cpuid.h   |  2 ++
>  arch/x86/kvm/vmx/vmx.c | 65 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 69 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index dd387a785c1e..4166c4fcad1e 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -371,13 +371,13 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
>  		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
>  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
>  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> -		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
> +		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | F(SHSTK);
>  
>  	/* cpuid 7.0.edx*/
>  	const u32 kvm_cpuid_7_0_edx_x86_features =
>  		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>  		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> -		F(MD_CLEAR);
> +		F(MD_CLEAR) | F(IBT);

Advertising CET to userspace/guest needs to be done at the end of the
series, or at least after CR4.CET is no longer reserved, e.g. KVM_SET_SREGS
will fail and the guest will get a #GP when trying to set CR4.CET.

I'm pretty sure I've said this at least twice in previous versions of
this series...

>  
>  	/* cpuid 7.1.eax */
>  	const u32 kvm_cpuid_7_1_eax_x86_features =
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index d78a61408243..1d77b880084d 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -27,6 +27,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>  
>  int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
>  
> +u64 kvm_supported_xss(void);
> +
>  static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.maxphyaddr;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index a84198cff397..db03d9dc1297 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2918,6 +2918,24 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  	vmcs_writel(GUEST_CR3, guest_cr3);
>  }
>  
> +static bool guest_cet_allowed(struct kvm_vcpu *vcpu, u32 feature, u32 mode)
> +{
> +	u64 kvm_xss = kvm_supported_xss();
> +
> +	/*
> +	 * Sanity check for guest CET dependencies, guest_cpu_has(SHSTK|IBT) has
> +	 * implied corresponding host CET status check.
> +	 */
> +	if (feature == X86_FEATURE_SHSTK)
> +		return guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> +		       (kvm_xss & mode);
> +	else if (feature == X86_FEATURE_IBT)
> +		return guest_cpuid_has(vcpu, X86_FEATURE_IBT) &&
> +		       (kvm_xss & mode);
> +
> +	return false;
> +}
> +
>  int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -7001,6 +7019,50 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
>  		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
>  }
>  
> +static void vmx_pass_cet_msrs(struct kvm_vcpu *vcpu)

"pass" isn't accurate, this function also does the opposite.  Maybe 
vmx_update_intercept_for_cet_msr()?  Or reuse the PT naming and go with
cet_update_intercept_for_msr()?

> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
> +
> +	/*
> +	 * U_CET is required for USER CET, per CET spec., meanwhile U_CET and
> +	 * PL3_SPP are a bundle for USER CET xsaves.
> +	 */
> +	if (guest_cet_allowed(vcpu, X86_FEATURE_SHSTK, XFEATURE_MASK_CET_USER) ||
> +	    guest_cet_allowed(vcpu, X86_FEATURE_IBT, XFEATURE_MASK_CET_USER)) {

IMO, the guest_cet_allowed() wrappers do more harm than good, e.g. I find
this easier to understand because it doesn't require digging into a random
helper.

	if ((kvm_supported_xss() & XFEATURE_MASK_CET_USER) &&
	    (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
	     guest_cpuid_has(vcpu, X86_FEATURE_IBT)))

> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> +	} else {
> +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW, true);
> +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW, true);
> +	}
> +	/*
> +	 * S_CET is required for KERNEL CET, meanwhile PL0_SSP ... PL2_SSP are a bundle
> +	 * for CET KERNEL xsaves.
> +	 */
> +	if (guest_cet_allowed(vcpu, X86_FEATURE_SHSTK, XFEATURE_MASK_CET_KERNEL) ||
> +	    guest_cet_allowed(vcpu, X86_FEATURE_IBT, XFEATURE_MASK_CET_KERNEL)) {
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW);
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW);
> +
> +		/* SSP_TAB only available for KERNEL SHSTK.*/
> +		if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> +			vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
> +						      MSR_TYPE_RW);
> +		else
> +			vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
> +						  MSR_TYPE_RW, true);
> +	} else {
> +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW, true);
> +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW, true);
> +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW, true);
> +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW, true);
> +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW, true);
> +	}
> +}
> +
>  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -7025,6 +7087,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
>  	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
>  			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
>  		update_intel_pt_cfg(vcpu);
> +
> +	if (!is_guest_mode(vcpu))
> +		vmx_pass_cet_msrs(vcpu);

Hmm, this looks insufficient, e.g. deliberately toggling CET from on->off
while in guest mode would put KVM in a weird state as the msr bitmap for
L1 would still allow L1 to access the CET MSRs.

Allowing KVM_SET_CPUID{2} while running a nested guest seems bogus, can we
kill that path entirely with -EINVAL?
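
E.g. can kvm_vcpu_ioctl_set_cpuid{,2}() simply do something like this
(untested sketch)?

	if (is_guest_mode(vcpu))
		return -EINVAL;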

>  }
>  
>  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> -- 
> 2.17.2
> 


* Re: [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit
  2019-11-01  8:52 ` [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit Yang Weijiang
@ 2019-12-10 21:23   ` Sean Christopherson
  2019-12-11  1:54     ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-12-10 21:23 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Fri, Nov 01, 2019 at 04:52:19PM +0800, Yang Weijiang wrote:
> "Load {guest,host} CET state" bit controls whether guest/host
> CET states will be loaded at VM entry/exit. Before doing that,
> KVM needs to check if CET is both enabled on host and guest.
> 
> Note: SHSTK and IBT features share one control MSR:
> MSR_IA32_{U,S}_CET, which means it's difficult to hide
> one feature from another in the case of SHSTK != IBT,
> after discussed in community, it's agreed to allow Guest
> control two features independently as it won't introduce
> security hole.
> 
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---

...

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index db03d9dc1297..e392e818e7eb 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -44,6 +44,7 @@
>  #include <asm/spec-ctrl.h>
>  #include <asm/virtext.h>
>  #include <asm/vmx.h>
> +#include <asm/cet.h>
>  
>  #include "capabilities.h"
>  #include "cpuid.h"
> @@ -2336,7 +2337,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>  	      VM_EXIT_LOAD_IA32_EFER |
>  	      VM_EXIT_CLEAR_BNDCFGS |
>  	      VM_EXIT_PT_CONCEAL_PIP |
> -	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
> +	      VM_EXIT_CLEAR_IA32_RTIT_CTL |
> +	      VM_EXIT_LOAD_HOST_CET_STATE;
>  	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
>  				&_vmexit_control) < 0)
>  		return -EIO;
> @@ -2360,7 +2362,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>  	      VM_ENTRY_LOAD_IA32_EFER |
>  	      VM_ENTRY_LOAD_BNDCFGS |
>  	      VM_ENTRY_PT_CONCEAL_PIP |
> -	      VM_ENTRY_LOAD_IA32_RTIT_CTL;
> +	      VM_ENTRY_LOAD_IA32_RTIT_CTL |
> +	      VM_ENTRY_LOAD_GUEST_CET_STATE;
>  	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
>  				&_vmentry_control) < 0)
>  		return -EIO;
> @@ -2834,6 +2837,9 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned long hw_cr0;
>  
> +	if (!(cr0 & X86_CR0_WP) && kvm_read_cr4_bits(vcpu, X86_CR4_CET))
> +		cr0 |= X86_CR0_WP;

Huh?  What's the interaction between CR4.CET and CR0.WP?  If there really
is some non-standard interaction then it needs to be documented in at least
the changelog and probably with a comment as well.

> +
>  	hw_cr0 = (cr0 & ~KVM_VM_CR0_ALWAYS_OFF);
>  	if (enable_unrestricted_guest)
>  		hw_cr0 |= KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST;
> @@ -2936,6 +2942,22 @@ static bool guest_cet_allowed(struct kvm_vcpu *vcpu, u32 feature, u32 mode)
>  	return false;
>  }
>  
> +bool is_cet_bit_allowed(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long cr0;
> +	bool cet_allowed;
> +
> +	cr0 = kvm_read_cr0(vcpu);
> +	cet_allowed = guest_cet_allowed(vcpu, X86_FEATURE_SHSTK,
> +					XFEATURE_MASK_CET_USER) ||
> +		      guest_cet_allowed(vcpu, X86_FEATURE_IBT,
> +					XFEATURE_MASK_CET_USER);
> +	if ((cr0 & X86_CR0_WP) && cet_allowed)
> +		return true;

So, attempting to set CR4.CET if CR0.WP=0 takes a #GP?  But attempting
to clear CR0.WP if CR4.CET=1 is ignored?

> +
> +	return false;
> +}
> +
>  int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -2976,6 +2998,9 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  			return 1;
>  	}
>  
> +	if ((cr4 & X86_CR4_CET) && !is_cet_bit_allowed(vcpu))
> +		return 1;
> +
>  	if (vmx->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
>  		return 1;
>  
> @@ -3839,6 +3864,12 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
>  
>  	if (cpu_has_load_ia32_efer())
>  		vmcs_write64(HOST_IA32_EFER, host_efer);
> +
> +	if (cpu_has_load_host_cet_states_ctrl()) {
> +		vmcs_writel(HOST_S_CET, 0);
> +		vmcs_writel(HOST_INTR_SSP_TABLE, 0);
> +		vmcs_writel(HOST_SSP, 0);
> +	}
>  }
>  
>  void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
> @@ -6436,6 +6467,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	unsigned long cr3, cr4;
> +	bool cet_allowed;
>  
>  	/* Record the guest's net vcpu time for enforced NMI injections. */
>  	if (unlikely(!enable_vnmi &&
> @@ -6466,6 +6498,25 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  		vmx->loaded_vmcs->host_state.cr3 = cr3;
>  	}
>  
> +	/* To be aligned with kernel code, only user mode is supported now. */
> +	cet_allowed = guest_cet_allowed(vcpu, X86_FEATURE_SHSTK,
> +					XFEATURE_MASK_CET_USER) ||
> +		      guest_cet_allowed(vcpu, X86_FEATURE_IBT,
> +					XFEATURE_MASK_CET_USER);
> +	if (cpu_has_load_guest_cet_states_ctrl() && cet_allowed)
> +		vmcs_set_bits(VM_ENTRY_CONTROLS,
> +			      VM_ENTRY_LOAD_GUEST_CET_STATE);
> +	else
> +		vmcs_clear_bits(VM_ENTRY_CONTROLS,
> +				VM_ENTRY_LOAD_GUEST_CET_STATE);
> +
> +	if (cpu_has_load_host_cet_states_ctrl() && cet_allowed)
> +		vmcs_set_bits(VM_EXIT_CONTROLS,
> +			      VM_EXIT_LOAD_HOST_CET_STATE);
> +	else
> +		vmcs_clear_bits(VM_EXIT_CONTROLS,
> +				VM_EXIT_LOAD_HOST_CET_STATE);
> +
>  	cr4 = cr4_read_shadow();
>  	if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) {
>  		vmcs_writel(HOST_CR4, cr4);
> -- 
> 2.17.2
> 


* Re: [PATCH v8 6/7] KVM: X86: Load guest fpu state when accessing MSRs managed by XSAVES
  2019-11-01  8:52 ` [PATCH v8 6/7] KVM: X86: Load guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
@ 2019-12-10 21:27   ` Sean Christopherson
  2019-12-11  2:03     ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-12-10 21:27 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Fri, Nov 01, 2019 at 04:52:21PM +0800, Yang Weijiang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> A handful of CET MSRs are not context switched through "traditional"
> methods, e.g. VMCS or manual switching, but rather are passed through
> to the guest and are saved and restored by XSAVES/XRSTORS, i.e. the
> guest's FPU state.
> 
> Load the guest's FPU state if userspace is accessing MSRs whose values
> are managed by XSAVES so that the MSR helper, e.g. vmx_{get,set}_msr(),
> can simply do {RD,WR}MSR to access the guest's value.
> 
> Note that guest_cpuid_has() is not queried as host userspace is allowed
> to access MSRs that have not been exposed to the guest, e.g. it might do
> KVM_SET_MSRS prior to KVM_SET_CPUID2.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/x86.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 540490d5385f..6275a75d5802 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -104,6 +104,8 @@ static void enter_smm(struct kvm_vcpu *vcpu);
>  static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
>  static void store_regs(struct kvm_vcpu *vcpu);
>  static int sync_regs(struct kvm_vcpu *vcpu);
> +static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
> +static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
>  
>  struct kvm_x86_ops *kvm_x86_ops __read_mostly;
>  EXPORT_SYMBOL_GPL(kvm_x86_ops);
> @@ -3000,6 +3002,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  }
>  EXPORT_SYMBOL_GPL(kvm_get_msr_common);
>  
> +static bool is_xsaves_msr(u32 index)
> +{
> +	return index == MSR_IA32_U_CET ||
> +	       (index >= MSR_IA32_PL0_SSP && index <= MSR_IA32_PL3_SSP);
> +}
> +
>  /*
>   * Read or write a bunch of msrs. All parameters are kernel addresses.
>   *
> @@ -3010,11 +3018,22 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
>  		    int (*do_msr)(struct kvm_vcpu *vcpu,
>  				  unsigned index, u64 *data))
>  {
> +	bool fpu_loaded = false;
>  	int i;
> +	const u64 cet_bits = XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL;
> +	bool cet_xss = kvm_supported_xss() & cet_bits;
>  
> -	for (i = 0; i < msrs->nmsrs; ++i)
> +	for (i = 0; i < msrs->nmsrs; ++i) {
> +		if (!fpu_loaded && cet_xss &&
> +		    is_xsaves_msr(entries[i].index)) {
> +			kvm_load_guest_fpu(vcpu);

This needs to also check for a non-NULL @vcpu.  KVM_GET_MSR can be called
on the VM to invoke do_get_msr_feature().
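
I.e. something like (untested):

	if (vcpu && !fpu_loaded && cet_xss &&
	    is_xsaves_msr(entries[i].index)) {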

> +			fpu_loaded = true;
> +		}
>  		if (do_msr(vcpu, entries[i].index, &entries[i].data))
>  			break;
> +	}
> +	if (fpu_loaded)
> +		kvm_put_guest_fpu(vcpu);
>  
>  	return i;
>  }
> -- 
> 2.17.2
> 


* Re: [PATCH v8 7/7] KVM: X86: Add user-space access interface for CET MSRs
  2019-11-01  8:52 ` [PATCH v8 7/7] KVM: X86: Add user-space access interface for CET MSRs Yang Weijiang
@ 2019-12-10 21:58   ` Sean Christopherson
  2019-12-11  2:19     ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-12-10 21:58 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Fri, Nov 01, 2019 at 04:52:22PM +0800, Yang Weijiang wrote:
> There're two different places storing Guest CET states, states
> managed with XSAVES/XRSTORS, as restored/saved
> in previous patch, can be read/write directly from/to the MSRs.
> For those stored in VMCS fields, they're access via vmcs_read/
> vmcs_write.
> 
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 140 +++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c     |   3 +
>  2 files changed, 143 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index bfb1b922a9ac..71aba264b5d2 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1671,6 +1671,98 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
>  	return 0;
>  }
>  
> +#define CET_MSR_RSVD_BITS_1    0x3
> +#define CET_MSR_RSVD_BITS_2   (0xF << 6)
> +
> +static bool cet_msr_write_allowed(struct kvm_vcpu *vcpu, struct msr_data *msr)
> +{
> +	u32 index = msr->index;
> +	u64 data = msr->data;
> +	u32 high_word = data >> 32;
> +
> +	if ((index == MSR_IA32_U_CET || index == MSR_IA32_S_CET) &&
> +	    (data & CET_MSR_RSVD_BITS_2))
> +		return false;
> +
> +	if (is_64_bit_mode(vcpu)) {
> +		if (is_noncanonical_address(data & PAGE_MASK, vcpu))

I don't think this is correct.  MSRs that contain an address usually only
fault on a non-canonical value and do the non-canonical check regardless
of mode.  E.g. VM-Enter's consistency checks on SYSENTER_E{I,S}P only care
about a canonical address and are not dependent on mode, and SYSENTER
itself states that bits 63:32 are ignored in 32-bit mode.  I assume the
same is true here.

If that is indeed the case, what about adding these to the common canonical
check in __kvm_set_msr()?  That'd cut down on the boilerplate here and
might make it easier to audit KVM's canonical checks.
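
E.g. (untested sketch; the U_CET/S_CET writes would presumably keep the
"data & PAGE_MASK" masking from this patch, since their low bits are
control bits rather than address bits):

	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
	case MSR_IA32_INT_SSP_TAB:
		if (is_noncanonical_address(data, vcpu))
			return 1;
		break;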

> +			return false;
> +		else if ((index == MSR_IA32_PL0_SSP ||
> +			  index == MSR_IA32_PL1_SSP ||
> +			  index == MSR_IA32_PL2_SSP ||
> +			  index == MSR_IA32_PL3_SSP) &&
> +			  (data & CET_MSR_RSVD_BITS_1))
> +			return false;
> +	} else {
> +		if (msr->index == MSR_IA32_INT_SSP_TAB)
> +			return false;
> +		else if ((index == MSR_IA32_U_CET ||
> +			  index == MSR_IA32_S_CET ||
> +			  index == MSR_IA32_PL0_SSP ||
> +			  index == MSR_IA32_PL1_SSP ||
> +			  index == MSR_IA32_PL2_SSP ||
> +			  index == MSR_IA32_PL3_SSP) &&
> +			  (high_word & ~0ul))
> +			return false;
> +	}
> +
> +	return true;
> +}

This helper seems like overkill, e.g. it's filled with index-specific
checks, but is called from code that has already switched on the index.
Open coding the individual checks is likely more readable and would require
less code, especially if the canonical checks are cleaned up.
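
E.g. with the canonical check handled in __kvm_set_msr(), the PLx_SSP case
in vmx_set_msr() could open code the rest along the lines of (untested):

	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
		if (!cet_msr_access_allowed(vcpu, msr_info))
			return 1;
		if (data & CET_MSR_RSVD_BITS_1)
			return 1;
		wrmsrl(msr_info->index, data);
		break;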

> +
> +static bool cet_msr_access_allowed(struct kvm_vcpu *vcpu, struct msr_data *msr)
> +{
> +	u64 kvm_xss;
> +	u32 index = msr->index;
> +
> +	if (is_guest_mode(vcpu))
> +		return false;

I may have missed this in an earlier discussion, does CET not support
nesting?

> +
> +	kvm_xss = kvm_supported_xss();
> +
> +	switch (index) {
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		if (!boot_cpu_has(X86_FEATURE_SHSTK))
> +			return false;
> +		if (!msr->host_initiated) {
> +			if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> +				return false;
> +		} else {

This looks wrong, WRMSR from the guest only checks CPUID, it doesn't check
kvm_xss.
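
I.e. the kvm_xss check needs to apply regardless of whether the write comes
from the guest or from host userspace, e.g. (untested):

	if (!msr->host_initiated &&
	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
		return false;
	if (index == MSR_IA32_PL3_SSP) {
		if (!(kvm_xss & XFEATURE_MASK_CET_USER))
			return false;
	} else if (!(kvm_xss & XFEATURE_MASK_CET_KERNEL)) {
		return false;
	}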

> +			if (index == MSR_IA32_PL3_SSP) {
> +				if (!(kvm_xss & XFEATURE_MASK_CET_USER))
> +					return false;
> +			} else {
> +				if (!(kvm_xss & XFEATURE_MASK_CET_KERNEL))
> +					return false;
> +			}
> +		}
> +		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:

Rather than bundle everything in a single access_allowed() helper, it might
be easier to have separate helpers for each class of MSR.   Except for the
guest_mode() check, there's no overlap between the classes.

> +		if (!boot_cpu_has(X86_FEATURE_SHSTK) &&
> +		    !boot_cpu_has(X86_FEATURE_IBT))
> +			return false;
> +
> +		if (!msr->host_initiated) {
> +			if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> +			    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> +				return false;
> +		} else if (index == MSR_IA32_U_CET &&
> +			   !(kvm_xss & XFEATURE_MASK_CET_USER))

Same comment about guest not checking kvm_xss.

> +			return false;
> +		break;
> +	case MSR_IA32_INT_SSP_TAB:
> +		if (!boot_cpu_has(X86_FEATURE_SHSTK))
> +			return false;
> +
> +		if (!msr->host_initiated &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> +			return false;
> +		break;
> +	default:
> +		return false;
> +	}
> +	return true;
> +}
>  /*
>   * Reads an msr value (of 'msr_index') into 'pdata'.
>   * Returns 0 on success, non-0 otherwise.
> @@ -1788,6 +1880,26 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		else
>  			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>  		break;
> +	case MSR_IA32_S_CET:
> +		if (!cet_msr_access_allowed(vcpu, msr_info))
> +			return 1;
> +		msr_info->data = vmcs_readl(GUEST_S_CET);
> +		break;
> +	case MSR_IA32_INT_SSP_TAB:
> +		if (!cet_msr_access_allowed(vcpu, msr_info))
> +			return 1;
> +		msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
> +		break;
> +	case MSR_IA32_U_CET:
> +		if (!cet_msr_access_allowed(vcpu, msr_info))
> +			return 1;
> +		rdmsrl(MSR_IA32_U_CET, msr_info->data);
> +		break;
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		if (!cet_msr_access_allowed(vcpu, msr_info))
> +			return 1;
> +		rdmsrl(msr_info->index, msr_info->data);
> +		break;
>  	case MSR_TSC_AUX:
>  		if (!msr_info->host_initiated &&
>  		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> @@ -2039,6 +2151,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		else
>  			vmx->pt_desc.guest.addr_a[index / 2] = data;
>  		break;
> +	case MSR_IA32_S_CET:
> +		if (!cet_msr_access_allowed(vcpu, msr_info))
> +			return 1;
> +		if (!cet_msr_write_allowed(vcpu, msr_info))
> +			return 1;
> +		vmcs_writel(GUEST_S_CET, data);
> +		break;
> +	case MSR_IA32_INT_SSP_TAB:
> +		if (!cet_msr_access_allowed(vcpu, msr_info))
> +			return 1;
> +		if (!cet_msr_write_allowed(vcpu, msr_info))
> +			return 1;
> +		vmcs_writel(GUEST_INTR_SSP_TABLE, data);
> +		break;
> +	case MSR_IA32_U_CET:
> +		if (!cet_msr_access_allowed(vcpu, msr_info))
> +			return 1;
> +		if (!cet_msr_write_allowed(vcpu, msr_info))
> +			return 1;
> +		wrmsrl(MSR_IA32_U_CET, data);
> +		break;
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		if (!cet_msr_access_allowed(vcpu, msr_info))
> +			return 1;
> +		if (!cet_msr_write_allowed(vcpu, msr_info))
> +			return 1;
> +		wrmsrl(msr_info->index, data);
> +		break;
>  	case MSR_TSC_AUX:
>  		if (!msr_info->host_initiated &&
>  		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6275a75d5802..1bbe4550da90 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1143,6 +1143,9 @@ static u32 msrs_to_save[] = {
>  	MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
>  	MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
>  	MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
> +	MSR_IA32_XSS, MSR_IA32_U_CET, MSR_IA32_S_CET,
> +	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
> +	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
>  };
>  
>  static unsigned num_msrs_to_save;
> -- 
> 2.17.2
> 


* Re: [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs
  2019-12-10 21:18   ` Sean Christopherson
@ 2019-12-11  1:32     ` Yang Weijiang
  2019-12-11  1:50       ` Sean Christopherson
  2019-12-16  2:18     ` Yang Weijiang
  1 sibling, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-12-11  1:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Tue, Dec 10, 2019 at 01:18:21PM -0800, Sean Christopherson wrote:
> On Fri, Nov 01, 2019 at 04:52:18PM +0800, Yang Weijiang wrote:
> > CET MSRs pass through Guest directly to enhance performance.
> > CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> > Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> > SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> > these MSRs are defined in kernel and re-used here.
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index dd387a785c1e..4166c4fcad1e 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -371,13 +371,13 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
> >  		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
> >  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> >  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> > -		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
> > +		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | F(SHSTK);
> >  
> >  	/* cpuid 7.0.edx*/
> >  	const u32 kvm_cpuid_7_0_edx_x86_features =
> >  		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> >  		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> > -		F(MD_CLEAR);
> > +		F(MD_CLEAR) | F(IBT);
> 
> Advertising CET to userspace/guest needs to be done at the end of the
> series, or at least after CR4.CET is no longer reserved, e.g. KVM_SET_SREGS
> will fail and the guest will get a #GP when trying to set CR4.CET.
> 
> I'm pretty sure I've said this at least twice in previous versions of
> this series...

Thanks Sean for picking these up!
The reason is that, starting from this patch, I'm using guest_cpuid_has(CET)
to check the availability of the guest CET CPUID bits, so logically I would
like readers to see how the CET-related CPUID words are defined first. But
no problem, I can move these definitions to a later patch since the patchset
is only meaningful as a whole.
> 
> >  
> >  	/* cpuid 7.1.eax */
> >  	const u32 kvm_cpuid_7_1_eax_x86_features =
> > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > index d78a61408243..1d77b880084d 100644
> > --- a/arch/x86/kvm/cpuid.h
> > +++ b/arch/x86/kvm/cpuid.h
> > @@ -27,6 +27,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> >  
> > +static void vmx_pass_cet_msrs(struct kvm_vcpu *vcpu)
> 
> "pass" isn't accurate, this function also does the opposite.  Maybe 
> vmx_update_intercept_for_cet_msr()?  Or reuse the PT naming and go with
> cet_update_intercept_for_msr()?
>
Sure, will change it.

> > +{
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
> > +
> > +	/*
> > +	 * U_CET is required for USER CET, per CET spec., meanwhile U_CET and
> > +	 * PL3_SPP are a bundle for USER CET xsaves.
> > +	 */
> > +	if (guest_cet_allowed(vcpu, X86_FEATURE_SHSTK, XFEATURE_MASK_CET_USER) ||
> > +	    guest_cet_allowed(vcpu, X86_FEATURE_IBT, XFEATURE_MASK_CET_USER)) {
> 
> IMO, the guest_cet_allowed() wrappers do more harm than good, e.g. I find
> this easier to understand because it doesn't require digging into a random
> helper.
> 
> 	if ((kvm_supported_xss() & XFEATURE_MASK_CET_USER) &&
> 	    (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
> 	     guest_cpuid_has(vcpu, X86_FEATURE_IBT)))
> 
Hmm, sounds like it's an unnecessary wrapper, will remove it, thanks!

> > +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
> > +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> > +	} else {
> > +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW, true);
> > +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW, true);
> > +	}
> > +	/*
> > +	 * S_CET is required for KERNEL CET, meanwhile PL0_SSP ... PL2_SSP are a bundle
> > +	 * for CET KERNEL xsaves.
> > +	 */
> > +	if (guest_cet_allowed(vcpu, X86_FEATURE_SHSTK, XFEATURE_MASK_CET_KERNEL) ||
> > +	    guest_cet_allowed(vcpu, X86_FEATURE_IBT, XFEATURE_MASK_CET_KERNEL)) {
> > +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
> > +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> > +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW);
> > +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW);
> > +
> > +		/* SSP_TAB only available for KERNEL SHSTK.*/
> > +		if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > +			vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
> > +						      MSR_TYPE_RW);
> > +		else
> > +			vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
> > +						  MSR_TYPE_RW, true);
> > +	} else {
> > +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW, true);
> > +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW, true);
> > +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW, true);
> > +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW, true);
> > +		vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW, true);
> > +	}
> > +}
> > +
> >  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -7025,6 +7087,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> >  	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> >  			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> >  		update_intel_pt_cfg(vcpu);
> > +
> > +	if (!is_guest_mode(vcpu))
> > +		vmx_pass_cet_msrs(vcpu);
> 
> Hmm, this looks insufficent, e.g. deliberately toggling CET from on->off
> while in guest mode would put KVM in a weird state as the msr bitmap for
> L1 would still allow L1 to access the CET MSRs.
I'm not sure I understand correctly: guest_cpuid_has(CET) implies a check of
the host CET status, so if CET is off on the host, the CET MSRs won't be
exposed to the L1 guest.
> 
> Allowing KVM_SET_CPUID{2} while running a nested guest seems bogus, can we
> kill that path entirely with -EINVAL?
> 
Do you mean preventing L1 from using KVM_SET_CPUID{2} to expose the CET
feature bits to L2?

> >  }
> >  
> >  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> > -- 
> > 2.17.2
> > 


* Re: [PATCH v8 2/7] KVM: VMX: Define CET VMCS fields and #CP flag
  2019-12-10 21:00   ` Sean Christopherson
@ 2019-12-11  1:45     ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2019-12-11  1:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Tue, Dec 10, 2019 at 01:00:44PM -0800, Sean Christopherson wrote:
> On Fri, Nov 01, 2019 at 04:52:17PM +0800, Yang Weijiang wrote:
> > CET(Control-flow Enforcement Technology) is an upcoming Intel(R)
> > processor feature that blocks Return/Jump-Oriented Programming(ROP)
> > attacks. It provides the following capabilities to defend
> > against ROP/JOP style control-flow subversion attacks:
> > 
> >  	return (1U << vector) & exception_has_error_code;
> >  }
> > @@ -298,7 +298,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
> >   * Right now, no XSS states are used on x86 platform,
> >   * expand the macro for new features.
> 
> I assume this comment needs to be updated?
>
I'm not sure which features in the upstream code are using XSS bits.
Should the comment read something like this: "In the future, other XSS
state bits can be added here to make them available to the guest"?
> >   */
> > -#define KVM_SUPPORTED_XSS	0
> > +#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER \
> > +				| XFEATURE_MASK_CET_KERNEL)
> >  
> >  extern u64 host_xcr0;
> >  
> > -- 
> > 2.17.2
> > 


* Re: [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs
  2019-12-11  1:32     ` Yang Weijiang
@ 2019-12-11  1:50       ` Sean Christopherson
  2019-12-11  2:27         ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-12-11  1:50 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Wed, Dec 11, 2019 at 09:32:07AM +0800, Yang Weijiang wrote:
> On Tue, Dec 10, 2019 at 01:18:21PM -0800, Sean Christopherson wrote:
> > On Fri, Nov 01, 2019 at 04:52:18PM +0800, Yang Weijiang wrote:
> > > CET MSRs pass through Guest directly to enhance performance.
> > > CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> > > Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> > > SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> > > these MSRs are defined in kernel and re-used here.
> > > 
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index dd387a785c1e..4166c4fcad1e 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -371,13 +371,13 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
> > >  		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
> > >  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> > >  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> > > -		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
> > > +		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | F(SHSTK);
> > >  
> > >  	/* cpuid 7.0.edx*/
> > >  	const u32 kvm_cpuid_7_0_edx_x86_features =
> > >  		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> > >  		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> > > -		F(MD_CLEAR);
> > > +		F(MD_CLEAR) | F(IBT);
> > 
> > Advertising CET to userspace/guest needs to be done at the end of the
> > series, or at least after CR4.CET is no longer reserved, e.g. KVM_SET_SREGS
> > will fail and the guest will get a #GP when trying to set CR4.CET.
> > 
> > I'm pretty sure I've said this at least twice in previous versions of
> > this series...
> 
> Thanks Sean for picking these up!
> The reason is, starting from this patch, I'm using guest_cpuid_has(CET)
> to check the availability of guest CET CPUID, so logically I would like to let
> the readers understand CET related CPUID word is
> defined as above. But no problem, I can move these definitions to a
> latter patch as the patchset only meaningful as a whole. 

Adding usage of guest_cpuid_has(CET) without advertising CET is perfectly
ok from a functionality perspective.  Having a user without a consumer
isn't ideal, but it's better than having one gigantic patch.

The problem with advertising CET when it's not fully supported is that it
will break bisection, e.g. trying to boot a CET-enabled guest would get a
#GP during boot and likely crash.  Whether or not a series is useful when
taken as a whole is orthogonal to the integrity of each individual patch.


* Re: [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit
  2019-12-10 21:23   ` Sean Christopherson
@ 2019-12-11  1:54     ` Yang Weijiang
  2019-12-11 16:35       ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-12-11  1:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Tue, Dec 10, 2019 at 01:23:05PM -0800, Sean Christopherson wrote:
> On Fri, Nov 01, 2019 at 04:52:19PM +0800, Yang Weijiang wrote:
> > @@ -2834,6 +2837,9 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >  	unsigned long hw_cr0;
> >  
> > +	if (!(cr0 & X86_CR0_WP) && kvm_read_cr4_bits(vcpu, X86_CR4_CET))
> > +		cr0 |= X86_CR0_WP;
> 
> Huh?  What's the interaction between CR4.CET and CR0.WP?  If there really
> is some non-standard interaction then it needs to be documented in at least
> the changelog and probably with a comment as well.
>
The processor does not allow CR4.CET to be set if CR0.WP = 0 (similarly,
it does not allow CR0.WP to be cleared while CR4.CET = 1).

> > +
> >  	hw_cr0 = (cr0 & ~KVM_VM_CR0_ALWAYS_OFF);
> >  	if (enable_unrestricted_guest)
> >  		hw_cr0 |= KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST;
> > @@ -2936,6 +2942,22 @@ static bool guest_cet_allowed(struct kvm_vcpu *vcpu, u32 feature, u32 mode)
> >  	return false;
> >  }
> >  
> > +bool is_cet_bit_allowed(struct kvm_vcpu *vcpu)
> > +{
> > +	unsigned long cr0;
> > +	bool cet_allowed;
> > +
> > +	cr0 = kvm_read_cr0(vcpu);
> > +	cet_allowed = guest_cet_allowed(vcpu, X86_FEATURE_SHSTK,
> > +					XFEATURE_MASK_CET_USER) ||
> > +		      guest_cet_allowed(vcpu, X86_FEATURE_IBT,
> > +					XFEATURE_MASK_CET_USER);
> > +	if ((cr0 & X86_CR0_WP) && cet_allowed)
> > +		return true;
> 
> So, attempting to set CR4.CET if CR0.WP=0 takes a #GP?  But attempting
> to clear CR0.WP if CR4.CET=1 is ignored?
> 
Per the above wording in the spec, should KVM inject a #GP into the guest
in either case?

> > +
> > +	return false;
> > +}
> > +
> >  int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -2976,6 +2998,9 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >  			return 1;
> >  	}
> >  
> > +	if ((cr4 & X86_CR4_CET) && !is_cet_bit_allowed(vcpu))
> > +		return 1;
> > +
> >  	if (vmx->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
> >  		return 1;
> >  
> > @@ -3839,6 +3864,12 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
> >  
> >  	if (cpu_has_load_ia32_efer())
> >  		vmcs_write64(HOST_IA32_EFER, host_efer);
> > +
> > +	if (cpu_has_load_host_cet_states_ctrl()) {
> > +		vmcs_writel(HOST_S_CET, 0);
> > +		vmcs_writel(HOST_INTR_SSP_TABLE, 0);
> > +		vmcs_writel(HOST_SSP, 0);
> > +	}
> >  }
> >  
> >  void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
> > @@ -6436,6 +6467,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >  	unsigned long cr3, cr4;
> > +	bool cet_allowed;
> >  
> >  	/* Record the guest's net vcpu time for enforced NMI injections. */
> >  	if (unlikely(!enable_vnmi &&
> > @@ -6466,6 +6498,25 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >  		vmx->loaded_vmcs->host_state.cr3 = cr3;
> >  	}
> >  
> > +	/* To be aligned with kernel code, only user mode is supported now. */
> > +	cet_allowed = guest_cet_allowed(vcpu, X86_FEATURE_SHSTK,
> > +					XFEATURE_MASK_CET_USER) ||
> > +		      guest_cet_allowed(vcpu, X86_FEATURE_IBT,
> > +					XFEATURE_MASK_CET_USER);
> > +	if (cpu_has_load_guest_cet_states_ctrl() && cet_allowed)
> > +		vmcs_set_bits(VM_ENTRY_CONTROLS,
> > +			      VM_ENTRY_LOAD_GUEST_CET_STATE);
> > +	else
> > +		vmcs_clear_bits(VM_ENTRY_CONTROLS,
> > +				VM_ENTRY_LOAD_GUEST_CET_STATE);
> > +
> > +	if (cpu_has_load_host_cet_states_ctrl() && cet_allowed)
> > +		vmcs_set_bits(VM_EXIT_CONTROLS,
> > +			      VM_EXIT_LOAD_HOST_CET_STATE);
> > +	else
> > +		vmcs_clear_bits(VM_EXIT_CONTROLS,
> > +				VM_EXIT_LOAD_HOST_CET_STATE);
> > +
> >  	cr4 = cr4_read_shadow();
> >  	if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) {
> >  		vmcs_writel(HOST_CR4, cr4);
> > -- 
> > 2.17.2
> > 


* Re: [PATCH v8 6/7] KVM: X86: Load guest fpu state when accessing MSRs managed by XSAVES
  2019-12-10 21:27   ` Sean Christopherson
@ 2019-12-11  2:03     ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2019-12-11  2:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Tue, Dec 10, 2019 at 01:27:48PM -0800, Sean Christopherson wrote:
> On Fri, Nov 01, 2019 at 04:52:21PM +0800, Yang Weijiang wrote:
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> > 
 
> > -	for (i = 0; i < msrs->nmsrs; ++i)
> > +	for (i = 0; i < msrs->nmsrs; ++i) {
> > +		if (!fpu_loaded && cet_xss &&
> > +		    is_xsaves_msr(entries[i].index)) {
> > +			kvm_load_guest_fpu(vcpu);
> 
> This needs to also check for a non-NULL @vcpu.  KVM_GET_MSR can be called
> on the VM to invoke do_get_msr_feature().
>
Yeah, I need to add the check, thanks!

> > +			fpu_loaded = true;
> > +		}
> >  		if (do_msr(vcpu, entries[i].index, &entries[i].data))
> >  			break;
> > +	}
> > +	if (fpu_loaded)
> > +		kvm_put_guest_fpu(vcpu);
> >  
> >  	return i;
> >  }
> > -- 
> > 2.17.2
> > 


* Re: [PATCH v8 7/7] KVM: X86: Add user-space access interface for CET MSRs
  2019-12-10 21:58   ` Sean Christopherson
@ 2019-12-11  2:19     ` Yang Weijiang
  2019-12-11 16:27       ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-12-11  2:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Tue, Dec 10, 2019 at 01:58:59PM -0800, Sean Christopherson wrote:
> On Fri, Nov 01, 2019 at 04:52:22PM +0800, Yang Weijiang wrote:
> > There're two different places storing Guest CET states, states
> > managed with XSAVES/XRSTORS, as restored/saved
> > in previous patch, can be read/write directly from/to the MSRs.
> > For those stored in VMCS fields, they're access via vmcs_read/
> > vmcs_write.
> > 
> >  
> > +#define CET_MSR_RSVD_BITS_1    0x3
> > +#define CET_MSR_RSVD_BITS_2   (0xF << 6)
> > +
> > +static bool cet_msr_write_allowed(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > +{
> > +	u32 index = msr->index;
> > +	u64 data = msr->data;
> > +	u32 high_word = data >> 32;
> > +
> > +	if ((index == MSR_IA32_U_CET || index == MSR_IA32_S_CET) &&
> > +	    (data & CET_MSR_RSVD_BITS_2))
> > +		return false;
> > +
> > +	if (is_64_bit_mode(vcpu)) {
> > +		if (is_noncanonical_address(data & PAGE_MASK, vcpu))
> 
> I don't think this is correct.  MSRs that contain an address usually only
> fault on a non-canonical value and do the non-canonical check regardless
> of mode.  E.g. VM-Enter's consistency checks on SYSENTER_E{I,S}P only care
> about a canonical address and are not dependent on mode, and SYSENTER
> itself states that bits 63:32 are ignored in 32-bit mode.  I assume the
> same is true here.
The spec. reads like this: Must be machine canonical when written on parts
that support 64 bit mode. On parts that do not support 64 bit mode, the
bits 63:32 are reserved and must be 0.

> If that is indeed the case, what about adding these to the common canonical
> check in __kvm_set_msr()?  That'd cut down on the boilerplate here and
> might make it easier to audit KVM's canonical checks.
> 
> > +			return false;
> > +		else if ((index == MSR_IA32_PL0_SSP ||
> > +			  index == MSR_IA32_PL1_SSP ||
> > +			  index == MSR_IA32_PL2_SSP ||
> > +			  index == MSR_IA32_PL3_SSP) &&
> > +			  (data & CET_MSR_RSVD_BITS_1))
> > +			return false;
> > +	} else {
> > +		if (msr->index == MSR_IA32_INT_SSP_TAB)
> > +			return false;
> > +		else if ((index == MSR_IA32_U_CET ||
> > +			  index == MSR_IA32_S_CET ||
> > +			  index == MSR_IA32_PL0_SSP ||
> > +			  index == MSR_IA32_PL1_SSP ||
> > +			  index == MSR_IA32_PL2_SSP ||
> > +			  index == MSR_IA32_PL3_SSP) &&
> > +			  (high_word & ~0ul))
> > +			return false;
> > +	}
> > +
> > +	return true;
> > +}
> 
> This helper seems like overkill, e.g. it's filled with index-specific
> checks, but is called from code that has already switched on the index.
> Open coding the individual checks is likely more readable and would require
> less code, especially if the canonical checks are cleaned up.
>
I'm afraid that if the checks are not wrapped in a helper, there will be a
lot of repeated checking code; that's why I'm using a wrapper.

> > +
> > +static bool cet_msr_access_allowed(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > +{
> > +	u64 kvm_xss;
> > +	u32 index = msr->index;
> > +
> > +	if (is_guest_mode(vcpu))
> > +		return false;
> 
> I may have missed this in an earlier discussion, does CET not support
> nesting?
>
I don't want to make CET available to nested guests for the time being; the
goal is to make it available to L1 guests first. So I need to avoid exposing
any CET CPUID/MSRs to a nested guest.

> > +
> > +	kvm_xss = kvm_supported_xss();
> > +
> > +	switch (index) {
> > +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> > +		if (!boot_cpu_has(X86_FEATURE_SHSTK))
> > +			return false;
> > +		if (!msr->host_initiated) {
> > +			if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > +				return false;
> > +		} else {
> 
> This looks wrong, WRMSR from the guest only checks CPUID, it doesn't check
> kvm_xss.
> 
Oops, I need to add the check, thank you!

> > +			if (index == MSR_IA32_PL3_SSP) {
> > +				if (!(kvm_xss & XFEATURE_MASK_CET_USER))
> > +					return false;
> > +			} else {
> > +				if (!(kvm_xss & XFEATURE_MASK_CET_KERNEL))
> > +					return false;
> > +			}
> > +		}
> > +		break;
> > +	case MSR_IA32_U_CET:
> > +	case MSR_IA32_S_CET:
> 
> Rather than bundle everything in a single access_allowed() helper, it might
> be easier to have separate helpers for each class of MSR.   Except for the
> guest_mode() check, there's no overlap between the classes.
>
Sure, let me double check the code.

> > +		if (!boot_cpu_has(X86_FEATURE_SHSTK) &&
> > +		    !boot_cpu_has(X86_FEATURE_IBT))
> > +			return false;
> > +
> > +		if (!msr->host_initiated) {
> > +			if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> > +			    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> > +				return false;
> > +		} else if (index == MSR_IA32_U_CET &&
> > +			   !(kvm_xss & XFEATURE_MASK_CET_USER))
> 
> Same comment about guest not checking kvm_xss.
>
OK.

> > +			return false;
> > +		break;
> > +	case MSR_IA32_INT_SSP_TAB:
> > +		if (!boot_cpu_has(X86_FEATURE_SHSTK))
> > +			return false;
> > +
> > +		if (!msr->host_initiated &&
> > +		    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > +			return false;
> > +		break;
> > +	default:
> > +		return false;
> > +	}
> > +	return true;
> > +}
> >  /*
> >   * Reads an msr value (of 'msr_index') into 'pdata'.
> >   * Returns 0 on success, non-0 otherwise.
> > @@ -1788,6 +1880,26 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  		else
> >  			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> >  		break;
> > +	case MSR_IA32_S_CET:
> > +		if (!cet_msr_access_allowed(vcpu, msr_info))
> > +			return 1;
> > +		msr_info->data = vmcs_readl(GUEST_S_CET);
> > +		break;
> > +	case MSR_IA32_INT_SSP_TAB:
> > +		if (!cet_msr_access_allowed(vcpu, msr_info))
> > +			return 1;
> > +		msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
> > +		break;
> > +	case MSR_IA32_U_CET:
> > +		if (!cet_msr_access_allowed(vcpu, msr_info))
> > +			return 1;
> > +		rdmsrl(MSR_IA32_U_CET, msr_info->data);
> > +		break;
> > +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> > +		if (!cet_msr_access_allowed(vcpu, msr_info))
> > +			return 1;
> > +		rdmsrl(msr_info->index, msr_info->data);
> > +		break;
> >  	case MSR_TSC_AUX:
> >  		if (!msr_info->host_initiated &&
> >  		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> > @@ -2039,6 +2151,34 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  		else
> >  			vmx->pt_desc.guest.addr_a[index / 2] = data;
> >  		break;
> > +	case MSR_IA32_S_CET:
> > +		if (!cet_msr_access_allowed(vcpu, msr_info))
> > +			return 1;
> > +		if (!cet_msr_write_allowed(vcpu, msr_info))
> > +			return 1;
> > +		vmcs_writel(GUEST_S_CET, data);
> > +		break;
> > +	case MSR_IA32_INT_SSP_TAB:
> > +		if (!cet_msr_access_allowed(vcpu, msr_info))
> > +			return 1;
> > +		if (!cet_msr_write_allowed(vcpu, msr_info))
> > +			return 1;
> > +		vmcs_writel(GUEST_INTR_SSP_TABLE, data);
> > +		break;
> > +	case MSR_IA32_U_CET:
> > +		if (!cet_msr_access_allowed(vcpu, msr_info))
> > +			return 1;
> > +		if (!cet_msr_write_allowed(vcpu, msr_info))
> > +			return 1;
> > +		wrmsrl(MSR_IA32_U_CET, data);
> > +		break;
> > +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> > +		if (!cet_msr_access_allowed(vcpu, msr_info))
> > +			return 1;
> > +		if (!cet_msr_write_allowed(vcpu, msr_info))
> > +			return 1;
> > +		wrmsrl(msr_info->index, data);
> > +		break;
> >  	case MSR_TSC_AUX:
> >  		if (!msr_info->host_initiated &&
> >  		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 6275a75d5802..1bbe4550da90 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1143,6 +1143,9 @@ static u32 msrs_to_save[] = {
> >  	MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
> >  	MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
> >  	MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
> > +	MSR_IA32_XSS, MSR_IA32_U_CET, MSR_IA32_S_CET,
> > +	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
> > +	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
> >  };
> >  
> >  static unsigned num_msrs_to_save;
> > -- 
> > 2.17.2
> > 


* Re: [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs
  2019-12-11  1:50       ` Sean Christopherson
@ 2019-12-11  2:27         ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2019-12-11  2:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Tue, Dec 10, 2019 at 05:50:52PM -0800, Sean Christopherson wrote:
> On Wed, Dec 11, 2019 at 09:32:07AM +0800, Yang Weijiang wrote:
> > On Tue, Dec 10, 2019 at 01:18:21PM -0800, Sean Christopherson wrote:
> > > On Fri, Nov 01, 2019 at 04:52:18PM +0800, Yang Weijiang wrote:
> > > > CET MSRs pass through Guest directly to enhance performance.
> > > > CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> > > > Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> > > > SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> > > > these MSRs are defined in kernel and re-used here.
> > > > 
> > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > index dd387a785c1e..4166c4fcad1e 100644
> > > > --- a/arch/x86/kvm/cpuid.c
> > > > +++ b/arch/x86/kvm/cpuid.c
> > > > @@ -371,13 +371,13 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
> > > >  		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
> > > >  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> > > >  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> > > > -		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
> > > > +		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | F(SHSTK);
> > > >  
> > > >  	/* cpuid 7.0.edx*/
> > > >  	const u32 kvm_cpuid_7_0_edx_x86_features =
> > > >  		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> > > >  		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> > > > -		F(MD_CLEAR);
> > > > +		F(MD_CLEAR) | F(IBT);
> > > 
> > > Advertising CET to userspace/guest needs to be done at the end of the
> > > series, or at least after CR4.CET is no longer reserved, e.g. KVM_SET_SREGS
> > > will fail and the guest will get a #GP when trying to set CR4.CET.
> > > 
> > > I'm pretty sure I've said this at least twice in previous versions of
> > > this series...
> > 
> > Thanks Sean for picking these up!
> > The reason is, starting from this patch, I'm using guest_cpuid_has(CET)
> > to check the availability of guest CET CPUID, so logically I would like to let
> > the readers understand CET related CPUID word is
> > defined as above. But no problem, I can move these definitions to a
> > latter patch as the patchset only meaningful as a whole. 
> 
> Adding usage of guest_cpuid_has(CET) without advertising CET is perfectly
> ok from a functionality perspective.  Having a user without a consumer
> isn't ideal, but it's better than having one gigantic patch.
> 
> The problem with advertising CET when it's not fully supported is that it
> will break bisection, e.g. trying to boot a CET-enabled guest would get a
> #GP during boot and likely crash.  Whether or not a series is useful when
> taken as a whole is orthogonal to the integrity of each invidiual patch.

Oh, I overlooked cases like bisection; you're right, I'll change it, thanks
a lot!



* Re: [PATCH v8 7/7] KVM: X86: Add user-space access interface for CET MSRs
  2019-12-11  2:19     ` Yang Weijiang
@ 2019-12-11 16:27       ` Sean Christopherson
  2019-12-12  0:42         ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-12-11 16:27 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Wed, Dec 11, 2019 at 10:19:51AM +0800, Yang Weijiang wrote:
> On Tue, Dec 10, 2019 at 01:58:59PM -0800, Sean Christopherson wrote:
> > On Fri, Nov 01, 2019 at 04:52:22PM +0800, Yang Weijiang wrote:
> > > There're two different places storing Guest CET states, states
> > > managed with XSAVES/XRSTORS, as restored/saved
> > > in previous patch, can be read/write directly from/to the MSRs.
> > > For those stored in VMCS fields, they're access via vmcs_read/
> > > vmcs_write.
> > > 
> > >  
> > > +#define CET_MSR_RSVD_BITS_1    0x3
> > > +#define CET_MSR_RSVD_BITS_2   (0xF << 6)
> > > +
> > > +static bool cet_msr_write_allowed(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > +{
> > > +	u32 index = msr->index;
> > > +	u64 data = msr->data;
> > > +	u32 high_word = data >> 32;
> > > +
> > > +	if ((index == MSR_IA32_U_CET || index == MSR_IA32_S_CET) &&
> > > +	    (data & CET_MSR_RSVD_BITS_2))
> > > +		return false;
> > > +
> > > +	if (is_64_bit_mode(vcpu)) {
> > > +		if (is_noncanonical_address(data & PAGE_MASK, vcpu))
> > 
> > I don't think this is correct.  MSRs that contain an address usually only
> > fault on a non-canonical value and do the non-canonical check regardless
> > of mode.  E.g. VM-Enter's consistency checks on SYSENTER_E{I,S}P only care
> > about a canonical address and are not dependent on mode, and SYSENTER
> > itself states that bits 63:32 are ignored in 32-bit mode.  I assume the
> > same is true here.
> The spec. reads like this:  Must be machine canonical when written on parts
> that support 64 bit mode. On parts that do not support 64 bit mode, the bits
> 63:32 are reserved and must be 0.

Yes, that agrees with me.  The key word is "support", i.e. "on parts that
support 64 bit mode" means "on parts with CPUID.0x80000001.EDX.LM=1."

The reason the architecture works this way is that unless hardware clears
the MSRs on transition from 64->32, bits 63:32 need to be ignored on the
way out instead of being validated on the way in, e.g. software writes a
64-bit value to the MSR and then transitions to 32-bit mode.  Clearing the
MSRs would be painful, slow and error prone, so it's easier for hardware
to simply ignore bits 63:32 in 32-bit mode.

> > If that is indeed the case, what about adding these to the common canonical
> > check in __kvm_set_msr()?  That'd cut down on the boilerplate here and
> > might make it easier to audit KVM's canonical checks.
> > 
> > > +			return false;
> > > +		else if ((index == MSR_IA32_PL0_SSP ||
> > > +			  index == MSR_IA32_PL1_SSP ||
> > > +			  index == MSR_IA32_PL2_SSP ||
> > > +			  index == MSR_IA32_PL3_SSP) &&
> > > +			  (data & CET_MSR_RSVD_BITS_1))
> > > +			return false;
> > > +	} else {
> > > +		if (msr->index == MSR_IA32_INT_SSP_TAB)
> > > +			return false;
> > > +		else if ((index == MSR_IA32_U_CET ||
> > > +			  index == MSR_IA32_S_CET ||
> > > +			  index == MSR_IA32_PL0_SSP ||
> > > +			  index == MSR_IA32_PL1_SSP ||
> > > +			  index == MSR_IA32_PL2_SSP ||
> > > +			  index == MSR_IA32_PL3_SSP) &&
> > > +			  (high_word & ~0ul))
> > > +			return false;
> > > +	}
> > > +
> > > +	return true;
> > > +}
> > 
> > This helper seems like overkill, e.g. it's filled with index-specific
> > checks, but is called from code that has already switched on the index.
> > Open coding the individual checks is likely more readable and would require
> > less code, especially if the canonical checks are cleaned up.
> >
> I'm afraid if the checks are not wrapped in a helper, there're many
> repeat checking-code, that's why I'm using a wrapper.

But you're adding almost as much, if not more, code to re-split the checks
in this helper.

> > > +
> > > +static bool cet_msr_access_allowed(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > +{
> > > +	u64 kvm_xss;
> > > +	u32 index = msr->index;
> > > +
> > > +	if (is_guest_mode(vcpu))
> > > +		return false;
> > 
> > I may have missed this in an earlier discussion, does CET not support
> > nesting?
> >
> I don't want to make CET avaible to nested guest at time being, first to
> make it available to L1 guest first. So I need to avoid exposing any CET
> CPUID/MSRs to a nested guest.


* Re: [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit
  2019-12-11  1:54     ` Yang Weijiang
@ 2019-12-11 16:35       ` Sean Christopherson
  2019-12-12  1:04         ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-12-11 16:35 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Wed, Dec 11, 2019 at 09:54:23AM +0800, Yang Weijiang wrote:
> On Tue, Dec 10, 2019 at 01:23:05PM -0800, Sean Christopherson wrote:
> > On Fri, Nov 01, 2019 at 04:52:19PM +0800, Yang Weijiang wrote:
> > > @@ -2834,6 +2837,9 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
> > >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > >  	unsigned long hw_cr0;
> > >  
> > > +	if (!(cr0 & X86_CR0_WP) && kvm_read_cr4_bits(vcpu, X86_CR4_CET))
> > > +		cr0 |= X86_CR0_WP;
> > 
> > Huh?  What's the interaction between CR4.CET and CR0.WP?  If there really
> > is some non-standard interaction then it needs to be documented in at least
> > the changelog and probably with a comment as well.
> >
> The processor does not allow CR4.CET to be set if CR0.WP = 0 (similarly, it
> does not allow CR0.WP to be cleared while CR4.CET = 1).

Ya, as you surmised below, this needs to be a #GP condition.
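
E.g. (untested sketch, assuming the checks land in kvm_set_cr0() and
vmx_set_cr4() respectively):

	/* kvm_set_cr0(): CR0.WP can't be cleared while CR4.CET=1. */
	if (!(cr0 & X86_CR0_WP) && kvm_read_cr4_bits(vcpu, X86_CR4_CET))
		return 1;

	/* vmx_set_cr4(): CR4.CET can't be set while CR0.WP=0. */
	if ((cr4 & X86_CR4_CET) && !(kvm_read_cr0(vcpu) & X86_CR0_WP))
		return 1;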

Have you tested SMM at all?  The interaction between CR0 and CR4 may be
problematic for em_rsm() and/or rsm_enter_protected_mode().

> > > +
> > >  	hw_cr0 = (cr0 & ~KVM_VM_CR0_ALWAYS_OFF);
> > >  	if (enable_unrestricted_guest)
> > >  		hw_cr0 |= KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST;
> > > @@ -2936,6 +2942,22 @@ static bool guest_cet_allowed(struct kvm_vcpu *vcpu, u32 feature, u32 mode)
> > >  	return false;
> > >  }
> > >  
> > > +bool is_cet_bit_allowed(struct kvm_vcpu *vcpu)
> > > +{
> > > +	unsigned long cr0;
> > > +	bool cet_allowed;
> > > +
> > > +	cr0 = kvm_read_cr0(vcpu);
> > > +	cet_allowed = guest_cet_allowed(vcpu, X86_FEATURE_SHSTK,
> > > +					XFEATURE_MASK_CET_USER) ||
> > > +		      guest_cet_allowed(vcpu, X86_FEATURE_IBT,
> > > +					XFEATURE_MASK_CET_USER);
> > > +	if ((cr0 & X86_CR0_WP) && cet_allowed)
> > > +		return true;
> > 
> > So, attempting to set CR4.CET if CR0.WP=0 takes a #GP?  But attempting
> > to clear CR0.WP if CR4.CET=1 is ignored?
> > 
> Per above words in spec., inject #GP to guest in either case?
> 
> > > +
> > > +	return false;
> > > +}
> > > +


* Re: [PATCH v8 7/7] KVM: X86: Add user-space access interface for CET MSRs
  2019-12-11 16:27       ` Sean Christopherson
@ 2019-12-12  0:42         ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2019-12-12  0:42 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Wed, Dec 11, 2019 at 08:27:02AM -0800, Sean Christopherson wrote:
> On Wed, Dec 11, 2019 at 10:19:51AM +0800, Yang Weijiang wrote:
> > On Tue, Dec 10, 2019 at 01:58:59PM -0800, Sean Christopherson wrote:
> > > On Fri, Nov 01, 2019 at 04:52:22PM +0800, Yang Weijiang wrote:
> > > > There're two different places storing Guest CET states, states
> > > > managed with XSAVES/XRSTORS, as restored/saved
> > > > in previous patch, can be read/write directly from/to the MSRs.
> > > > For those stored in VMCS fields, they're access via vmcs_read/
> > > > vmcs_write.
> > > > 
> > > >  
> > > > +#define CET_MSR_RSVD_BITS_1    0x3
> > > > +#define CET_MSR_RSVD_BITS_2   (0xF << 6)
> > > > +
> > > > +static bool cet_msr_write_allowed(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > > +{
> > > > +	u32 index = msr->index;
> > > > +	u64 data = msr->data;
> > > > +	u32 high_word = data >> 32;
> > > > +
> > > > +	if ((index == MSR_IA32_U_CET || index == MSR_IA32_S_CET) &&
> > > > +	    (data & CET_MSR_RSVD_BITS_2))
> > > > +		return false;
> > > > +
> > > > +	if (is_64_bit_mode(vcpu)) {
> > > > +		if (is_noncanonical_address(data & PAGE_MASK, vcpu))
> > > 
> > > I don't think this is correct.  MSRs that contain an address usually only
> > > fault on a non-canonical value and do the non-canonical check regardless
> > > of mode.  E.g. VM-Enter's consistency checks on SYSENTER_E{I,S}P only care
> > > about a canonical address and are not dependent on mode, and SYSENTER
> > > itself states that bits 63:32 are ignored in 32-bit mode.  I assume the
> > > same is true here.
> > The spec. reads like this:  Must be machine canonical when written on parts
> > that support 64 bit mode. On parts that do not support 64 bit mode, the bits
> > 63:32 are reserved and must be 0.
> 
> Yes, that agrees with me.  The key word is "support", i.e. "on parts that
> support 64 bit mode" means "on parts with CPUID.0x80000001.EDX.LM=1."
> 
> The reason the architecture works this way is that unless hardware clears
> the MSRs on transition from 64->32, bits 63:32 need to be ignored on the
> way out instead of being validated on the way in, e.g. software writes a
> 64-bit value to the MSR and then transitions to 32-bit mode.  Clearing the
> MSRs would be painful, slow and error prone, so it's easier for hardware
> to simply ignore bits 63:32 in 32-bit mode.
>
Makes sense, I'll move the canonical check up to kvm_set_msr() like the
other MSRs, thanks!
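
For the record, one possible shape for that common check, shown only as a
sketch (MSR indices are the ones proposed by the CET kernel series, only the
SSP-holding MSRs are shown, and is_noncanonical_address() is the existing
helper in arch/x86/kvm/x86.h):

    static bool cet_msr_data_is_noncanonical(struct kvm_vcpu *vcpu, u32 index,
                                             u64 data)
    {
            switch (index) {
            case MSR_IA32_PL0_SSP:
            case MSR_IA32_PL1_SSP:
            case MSR_IA32_PL2_SSP:
            case MSR_IA32_PL3_SSP:
            case MSR_IA32_INT_SSP_TAB:
                    /* These hold linear addresses, same rule as MSR_LSTAR & co. */
                    return is_noncanonical_address(data, vcpu);
            default:
                    return false;
            }
    }

__kvm_set_msr() could call this next to its existing MSR_FS_BASE/MSR_LSTAR
cases so the write fails before any vendor-specific code runs.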

> > > If that is indeed the case, what about adding these to the common canonical
> > > check in __kvm_set_msr()?  That'd cut down on the boilerplate here and
> > > might make it easier to audit KVM's canonical checks.
> > > 
> > > > +			return false;
> > > > +		else if ((index == MSR_IA32_PL0_SSP ||
> > > > +			  index == MSR_IA32_PL1_SSP ||
> > > > +			  index == MSR_IA32_PL2_SSP ||
> > > > +			  index == MSR_IA32_PL3_SSP) &&
> > > > +			  (data & CET_MSR_RSVD_BITS_1))
> > > > +			return false;
> > > > +	} else {
> > > > +		if (msr->index == MSR_IA32_INT_SSP_TAB)
> > > > +			return false;
> > > > +		else if ((index == MSR_IA32_U_CET ||
> > > > +			  index == MSR_IA32_S_CET ||
> > > > +			  index == MSR_IA32_PL0_SSP ||
> > > > +			  index == MSR_IA32_PL1_SSP ||
> > > > +			  index == MSR_IA32_PL2_SSP ||
> > > > +			  index == MSR_IA32_PL3_SSP) &&
> > > > +			  (high_word & ~0ul))
> > > > +			return false;
> > > > +	}
> > > > +
> > > > +	return true;
> > > > +}
> > > 
> > > This helper seems like overkill, e.g. it's filled with index-specific
> > > checks, but is called from code that has already switched on the index.
> > > Open coding the individual checks is likely more readable and would require
> > > less code, especially if the canonical checks are cleaned up.
> > >
> > I'm afraid that if the checks are not wrapped in a helper, there will be
> > a lot of repeated checking code; that's why I'm using a wrapper.
> 
> But you're adding almost as much, if not more, code to re-split the checks
> in this helper.
>
Sure, thanks!
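
For illustration, the open-coded form inside vmx_set_msr()'s switch might
look roughly like this (reserved-bit masks taken from the CET_MSR_RSVD_BITS_*
definitions quoted above; the canonical check is assumed to have moved into
the common kvm_set_msr() path, and the actual store is elided):

            case MSR_IA32_U_CET:
            case MSR_IA32_S_CET:
                    if (data & GENMASK_ULL(9, 6))   /* reserved bits */
                            return 1;               /* #GP */
                    /* ...store the value as the earlier patch does... */
                    break;
            case MSR_IA32_PL0_SSP:
            case MSR_IA32_PL1_SSP:
            case MSR_IA32_PL2_SSP:
            case MSR_IA32_PL3_SSP:
                    if (data & GENMASK_ULL(1, 0))   /* SSP bits 1:0 reserved */
                            return 1;               /* #GP */
                    /* ...store the value as the earlier patch does... */
                    break;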

> > > > +
> > > > +static bool cet_msr_access_allowed(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > > +{
> > > > +	u64 kvm_xss;
> > > > +	u32 index = msr->index;
> > > > +
> > > > +	if (is_guest_mode(vcpu))
> > > > +		return false;
> > > 
> > > I may have missed this in an earlier discussion, does CET not support
> > > nesting?
> > >
> > I don't want to make CET available to nested guests for the time being; I
> > want to make it available to L1 guests first. So I need to avoid exposing
> > any CET CPUID/MSRs to a nested guest.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit
  2019-12-11 16:35       ` Sean Christopherson
@ 2019-12-12  1:04         ` Yang Weijiang
  2019-12-18  0:30           ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-12-12  1:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Wed, Dec 11, 2019 at 08:35:10AM -0800, Sean Christopherson wrote:
> On Wed, Dec 11, 2019 at 09:54:23AM +0800, Yang Weijiang wrote:
> > On Tue, Dec 10, 2019 at 01:23:05PM -0800, Sean Christopherson wrote:
> > > On Fri, Nov 01, 2019 at 04:52:19PM +0800, Yang Weijiang wrote:
> > > > @@ -2834,6 +2837,9 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
> > > >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > >  	unsigned long hw_cr0;
> > > >  
> > > > +	if (!(cr0 & X86_CR0_WP) && kvm_read_cr4_bits(vcpu, X86_CR4_CET))
> > > > +		cr0 |= X86_CR0_WP;
> > > 
> > > Huh?  What's the interaction between CR4.CET and CR0.WP?  If there really
> > > is some non-standard interaction then it needs to be documented in at least
> > > the changelog and probably with a comment as well.
> > >
> > The processor does not allow CR4.CET to be set if CR0.WP = 0 (similarly, it
> > does not allow CR0.WP to be cleared while CR4.CET = 1).
> 
> Ya, as you surmised below, this needs to be a #GP condition.
>
OK, will do it.

> Have you tested SMM at all?  The interaction between CR0 and CR4 may be
> problematic for em_rsm() and/or rsm_enter_protected_mode().
>
Not yet, what's an easy way to test code in SMM mode?
Thanks!

> > > > +
> > > >  	hw_cr0 = (cr0 & ~KVM_VM_CR0_ALWAYS_OFF);
> > > >  	if (enable_unrestricted_guest)
> > > >  		hw_cr0 |= KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST;
> > > > @@ -2936,6 +2942,22 @@ static bool guest_cet_allowed(struct kvm_vcpu *vcpu, u32 feature, u32 mode)
> > > >  	return false;
> > > >  }
> > > >  
> > > > +bool is_cet_bit_allowed(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	unsigned long cr0;
> > > > +	bool cet_allowed;
> > > > +
> > > > +	cr0 = kvm_read_cr0(vcpu);
> > > > +	cet_allowed = guest_cet_allowed(vcpu, X86_FEATURE_SHSTK,
> > > > +					XFEATURE_MASK_CET_USER) ||
> > > > +		      guest_cet_allowed(vcpu, X86_FEATURE_IBT,
> > > > +					XFEATURE_MASK_CET_USER);
> > > > +	if ((cr0 & X86_CR0_WP) && cet_allowed)
> > > > +		return true;
> > > 
> > > So, attempting to set CR4.CET if CR0.WP=0 takes a #GP?  But attempting
> > > to clear CR0.WP if CR4.CET=1 is ignored?
> > > 
> > Per above words in spec., inject #GP to guest in either case?
> > 
> > > > +
> > > > +	return false;
> > > > +}
> > > > +

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 0/7] Introduce support for guest CET feature
  2019-11-01  8:52 [PATCH v8 0/7] Introduce support for guest CET feature Yang Weijiang
                   ` (6 preceding siblings ...)
  2019-11-01  8:52 ` [PATCH v8 7/7] KVM: X86: Add user-space access interface for CET MSRs Yang Weijiang
@ 2019-12-12 16:03 ` Konrad Rzeszutek Wilk
  2019-12-13  0:44   ` Yang Weijiang
  7 siblings, 1 reply; 32+ messages in thread
From: Konrad Rzeszutek Wilk @ 2019-12-12 16:03 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, sean.j.christopherson, jmattson,
	yu.c.zhang, yu-cheng.yu

On Fri, Nov 01, 2019 at 04:52:15PM +0800, Yang Weijiang wrote:
> Control-flow Enforcement Technology (CET) provides protection against
> Return/Jump-Oriented Programming (ROP/JOP) attack. It includes two
> sub-features: Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT).
> 
> KVM change is required to support guest CET feature.
> This patch serial implemented CET related CPUID/XSAVES enumeration, MSRs
> and vmentry/vmexit configuration etc.so that guest kernel can setup CET
> runtime infrastructure based on them. Some CET MSRs and related feature
> flags used reference the definitions in kernel patchset.
> 
> CET kernel patches is here:
> https://lkml.org/lkml/2019/8/13/1110
> https://lkml.org/lkml/2019/8/13/1109

Is there a git tree with all of them against v5.5-rc1 (so all three series)?
I tried your github tree: https://github.com/yyu168/linux_cet.git #cet
but sadly that does not apply against 5.5-rc1 :-(

Thanks!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 0/7] Introduce support for guest CET feature
  2019-12-12 16:03 ` [PATCH v8 0/7] Introduce support for guest CET feature Konrad Rzeszutek Wilk
@ 2019-12-13  0:44   ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2019-12-13  0:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini,
	sean.j.christopherson, jmattson, yu.c.zhang, yu-cheng.yu

On Thu, Dec 12, 2019 at 11:03:45AM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 01, 2019 at 04:52:15PM +0800, Yang Weijiang wrote:
> > Control-flow Enforcement Technology (CET) provides protection against
> > Return/Jump-Oriented Programming (ROP/JOP) attack. It includes two
> > sub-features: Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT).
> > 
> > KVM change is required to support guest CET feature.
> > This patch serial implemented CET related CPUID/XSAVES enumeration, MSRs
> > and vmentry/vmexit configuration etc.so that guest kernel can setup CET
> > runtime infrastructure based on them. Some CET MSRs and related feature
> > flags used reference the definitions in kernel patchset.
> > 
> > CET kernel patches is here:
> > https://lkml.org/lkml/2019/8/13/1110
> > https://lkml.org/lkml/2019/8/13/1109
> 
> Is there a git tree with all of them against v5.5-rc1 (so all three series)?
> I tried your github tree: https://github.com/yyu168/linux_cet.git #cet
> but sadly that does not apply against 5.5-rc1 :-(
> 
> Thanks!
Hi,
The CET work consists of two parts: one on the kernel side and the other in
KVM. The kernel patches on GitHub are maintained by my peer; he'll rebase
them onto the latest kernel tree shortly, after resolving some issues.
Thank you for your interest!


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs
  2019-12-10 21:18   ` Sean Christopherson
  2019-12-11  1:32     ` Yang Weijiang
@ 2019-12-16  2:18     ` Yang Weijiang
  2019-12-18  0:34       ` Sean Christopherson
  1 sibling, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-12-16  2:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Tue, Dec 10, 2019 at 01:18:21PM -0800, Sean Christopherson wrote:
> On Fri, Nov 01, 2019 at 04:52:18PM +0800, Yang Weijiang wrote:
> > CET MSRs pass through Guest directly to enhance performance.
> > CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> > Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> > SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> > these MSRs are defined in kernel and re-used here.
> > 
 > +
> >  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -7025,6 +7087,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> >  	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> >  			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> >  		update_intel_pt_cfg(vcpu);
> > +
> > +	if (!is_guest_mode(vcpu))
> > +		vmx_pass_cet_msrs(vcpu);
> 
> Hmm, this looks insufficent, e.g. deliberately toggling CET from on->off
> while in guest mode would put KVM in a weird state as the msr bitmap for
> L1 would still allow L1 to access the CET MSRs.
>
Hi Sean,
I don't follow; there's a guest-mode check before the CET MSRs are accessed,
so the access will fail if the vCPU is in guest mode.

> Allowing KVM_SET_CPUID{2} while running a nested guest seems bogus, can we
> kill that path entirely with -EINVAL?
>
Do you mean we shouldn't expose the CET CPUID bits to an L2 guest?
Thanks!

> >  }
> >  
> >  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> > -- 
> > 2.17.2
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit
  2019-12-12  1:04         ` Yang Weijiang
@ 2019-12-18  0:30           ` Sean Christopherson
  2019-12-18 13:20             ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-12-18  0:30 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Thu, Dec 12, 2019 at 09:04:24AM +0800, Yang Weijiang wrote:
> On Wed, Dec 11, 2019 at 08:35:10AM -0800, Sean Christopherson wrote:
> > Have you tested SMM at all?  The interaction between CR0 and CR4 may be
> > problematic for em_rsm() and/or rsm_enter_protected_mode().
> >
> Not yet, what's an easy way to test code in SMM mode?

IIRC, SeaBIOS does SMM stuff by default.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs
  2019-12-16  2:18     ` Yang Weijiang
@ 2019-12-18  0:34       ` Sean Christopherson
  2019-12-18 13:55         ` Yang Weijiang
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2019-12-18  0:34 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Mon, Dec 16, 2019 at 10:18:16AM +0800, Yang Weijiang wrote:
> On Tue, Dec 10, 2019 at 01:18:21PM -0800, Sean Christopherson wrote:
> > On Fri, Nov 01, 2019 at 04:52:18PM +0800, Yang Weijiang wrote:
> > > CET MSRs pass through Guest directly to enhance performance.
> > > CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> > > Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> > > SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> > > these MSRs are defined in kernel and re-used here.
> > > 
>  > +
> > >  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > >  {
> > >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > @@ -7025,6 +7087,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > >  	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> > >  			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> > >  		update_intel_pt_cfg(vcpu);
> > > +
> > > +	if (!is_guest_mode(vcpu))
> > > +		vmx_pass_cet_msrs(vcpu);
> > 
> > Hmm, this looks insufficent, e.g. deliberately toggling CET from on->off
> > while in guest mode would put KVM in a weird state as the msr bitmap for
> > L1 would still allow L1 to access the CET MSRs.
> >
> Hi, Sean,
> I don't get you, there's guest mode check before access CET msrs, it'll
> fail if it's in guest mode.

KVM can exit to userspace while L2 is active.  If userspace then did a
KVM_SET_CPUID2, e.g. instead of KVM_RUN, vmx_cpuid_update() would skip
vmx_pass_cet_msrs() and KVM would never update L1's MSR bitmaps.

> > Allowing KVM_SET_CPUID{2} while running a nested guest seems bogus, can we
> > kill that path entirely with -EINVAL?
> >
> Do you mean don't expose CET cpuids to L2 guest?

I mean completely disallow KVM_SET_CPUID and KVM_SET_CPUID2 if
is_guest_mode() is true.  My question is mostly directed at Paolo and
anyone else that has an opinion on whether we can massage the ABI to
retroactively change KVM_SET_CPUID{2} behavior.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit
  2019-12-18  0:30           ` Sean Christopherson
@ 2019-12-18 13:20             ` Yang Weijiang
  0 siblings, 0 replies; 32+ messages in thread
From: Yang Weijiang @ 2019-12-18 13:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Tue, Dec 17, 2019 at 04:30:05PM -0800, Sean Christopherson wrote:
> On Thu, Dec 12, 2019 at 09:04:24AM +0800, Yang Weijiang wrote:
> > On Wed, Dec 11, 2019 at 08:35:10AM -0800, Sean Christopherson wrote:
> > > Have you tested SMM at all?  The interaction between CR0 and CR4 may be
> > > problematic for em_rsm() and/or rsm_enter_protected_mode().
> > >
> > Not yet, what's an easy way to test code in SMM mode?
> 
> IIRC, SeaBIOS does SMM stuff by default.
Thanks Sean. I'll check this part.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs
  2019-12-18  0:34       ` Sean Christopherson
@ 2019-12-18 13:55         ` Yang Weijiang
  2019-12-18 16:02           ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Yang Weijiang @ 2019-12-18 13:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang,
	yu-cheng.yu

On Tue, Dec 17, 2019 at 04:34:55PM -0800, Sean Christopherson wrote:
> On Mon, Dec 16, 2019 at 10:18:16AM +0800, Yang Weijiang wrote:
> > On Tue, Dec 10, 2019 at 01:18:21PM -0800, Sean Christopherson wrote:
> > > On Fri, Nov 01, 2019 at 04:52:18PM +0800, Yang Weijiang wrote:
> > > > CET MSRs pass through Guest directly to enhance performance.
> > > > CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> > > > Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> > > > SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> > > > these MSRs are defined in kernel and re-used here.
> > > > 
> >  > +
> > > >  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > > >  {
> > > >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > > @@ -7025,6 +7087,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > > >  	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> > > >  			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> > > >  		update_intel_pt_cfg(vcpu);
> > > > +
> > > > +	if (!is_guest_mode(vcpu))
> > > > +		vmx_pass_cet_msrs(vcpu);
> > > 
> > > Hmm, this looks insufficent, e.g. deliberately toggling CET from on->off
> > > while in guest mode would put KVM in a weird state as the msr bitmap for
> > > L1 would still allow L1 to access the CET MSRs.
> > >
> > Hi Sean,
> > I don't follow; there's a guest-mode check before the CET MSRs are accessed,
> > so the access will fail if the vCPU is in guest mode.
> 
> KVM can exit to userspace while L2 is active.  If userspace then did a
> KVM_SET_CPUID2, e.g. instead of KVM_RUN, vmx_cpuid_update() would skip
> vmx_pass_cet_msrs() and KVM would never update L1's MSR bitmaps.
>
Thanks, that makes sense to me. Given the current implementation, how about
removing the above check and instead handling this in the CET CPUID
enumeration for L2, so that no CET MSRs are passed through to L2 when
is_guest_mode() is true?

> > > Allowing KVM_SET_CPUID{2} while running a nested guest seems bogus, can we
> > > kill that path entirely with -EINVAL?
> > >
> > Do you mean we shouldn't expose the CET CPUID bits to an L2 guest?
> 
> I mean completely disallow KVM_SET_CPUID and KVM_SET_CPUID2 if
> is_guest_mode() is true.  My question is mostly directed at Paolo and
> anyone else that has an opinion on whether we can massage the ABI to
> retroactively change KVM_SET_CPUID{2} behavior.
This sounds like something that deserves an individual patch once there is
agreement in the community. I'll put it aside for now.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs
  2019-12-18 13:55         ` Yang Weijiang
@ 2019-12-18 16:02           ` Sean Christopherson
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2019-12-18 16:02 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm, linux-kernel, pbonzini, jmattson, yu.c.zhang, yu-cheng.yu

On Wed, Dec 18, 2019 at 09:55:13PM +0800, Yang Weijiang wrote:
> On Tue, Dec 17, 2019 at 04:34:55PM -0800, Sean Christopherson wrote:
> > On Mon, Dec 16, 2019 at 10:18:16AM +0800, Yang Weijiang wrote:
> > > On Tue, Dec 10, 2019 at 01:18:21PM -0800, Sean Christopherson wrote:
> > > > On Fri, Nov 01, 2019 at 04:52:18PM +0800, Yang Weijiang wrote:
> > > > > CET MSRs pass through Guest directly to enhance performance.
> > > > > CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> > > > > Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> > > > > SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> > > > > these MSRs are defined in kernel and re-used here.
> > > > > 
> > >  > +
> > > > >  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > > > >  {
> > > > >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > > > @@ -7025,6 +7087,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > > > >  	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> > > > >  			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> > > > >  		update_intel_pt_cfg(vcpu);
> > > > > +
> > > > > +	if (!is_guest_mode(vcpu))
> > > > > +		vmx_pass_cet_msrs(vcpu);
> > > > 
> > > > Hmm, this looks insufficent, e.g. deliberately toggling CET from on->off
> > > > while in guest mode would put KVM in a weird state as the msr bitmap for
> > > > L1 would still allow L1 to access the CET MSRs.
> > > >
> > > Hi Sean,
> > > I don't follow; there's a guest-mode check before the CET MSRs are accessed,
> > > so the access will fail if the vCPU is in guest mode.
> > 
> > KVM can exit to userspace while L2 is active.  If userspace then did a
> > KVM_SET_CPUID2, e.g. instead of KVM_RUN, vmx_cpuid_update() would skip
> > vmx_pass_cet_msrs() and KVM would never update L1's MSR bitmaps.
> >
> Thanks, that makes sense to me. Given the current implementation, how about
> removing the above check and instead handling this in the CET CPUID
> enumeration for L2, so that no CET MSRs are passed through to L2 when
> is_guest_mode() is true?

But you still need to update L1's MSR bitmaps.  That can obviously be done
all at once, but it's annoying and IMO unnecessarily complex.
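
One possible shape for that, sketched under the assumption that the vmcs01
MSR bitmap and the existing vmx_{disable,enable}_intercept_for_msr() helpers
are used directly (the function name is illustrative; feature flags and MSR
indices are the ones proposed by the CET series, and the real check would
also need the XSS/host checks from the patch):

    static void vmx_update_cet_msr_intercepts(struct kvm_vcpu *vcpu)
    {
            static const u32 cet_msrs[] = {
                    MSR_IA32_U_CET, MSR_IA32_S_CET, MSR_IA32_INT_SSP_TAB,
                    MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP,
                    MSR_IA32_PL2_SSP, MSR_IA32_PL3_SSP,
            };
            struct vcpu_vmx *vmx = to_vmx(vcpu);
            unsigned long *bitmap = vmx->vmcs01.msr_bitmap;
            bool pass = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
                        guest_cpuid_has(vcpu, X86_FEATURE_IBT);
            int i;

            /*
             * Always update L1's (vmcs01) bitmap, even if L2 is active, and
             * re-enable the intercepts when CET is toggled off in CPUID.
             */
            for (i = 0; i < ARRAY_SIZE(cet_msrs); i++) {
                    if (pass)
                            vmx_disable_intercept_for_msr(bitmap, cet_msrs[i],
                                                          MSR_TYPE_RW);
                    else
                            vmx_enable_intercept_for_msr(bitmap, cet_msrs[i],
                                                         MSR_TYPE_RW);
            }
    }

With something along these lines, the !is_guest_mode() check in
vmx_cpuid_update() could simply be dropped.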

> > > > Allowing KVM_SET_CPUID{2} while running a nested guest seems bogus, can we
> > > > kill that path entirely with -EINVAL?
> > > >
> > > Do you mean we shouldn't expose the CET CPUID bits to an L2 guest?
> > 
> > I mean completely disallow KVM_SET_CPUID and KVM_SET_CPUID2 if
> > is_guest_mode() is true.  My question is mostly directed at Paolo and
> > anyone else that has an opinion on whether we can massage the ABI to
> > retroactively change KVM_SET_CPUID{2} behavior.
>
> This sounds like something that deserves an individual patch once there is
> agreement in the community. I'll put it aside for now.

Normally I would agree, but I think in this case it would significantly
reduce the complexity and implementation cost of CET support.  I'll send a
patch to kickstart the conversation, it's a tiny change in terms of code.
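
An illustrative sketch of that "tiny change" (not the actual patch; the same
early-out would go into kvm_vcpu_ioctl_set_cpuid() as well):

    int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
                                  struct kvm_cpuid2 *cpuid,
                                  struct kvm_cpuid_entry2 __user *entries)
    {
            /* Refuse CPUID updates from userspace while a nested guest runs. */
            if (is_guest_mode(vcpu))
                    return -EINVAL;

            /* ... existing body unchanged ... */
    }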

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2019-12-18 16:02 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-01  8:52 [PATCH v8 0/7] Introduce support for guest CET feature Yang Weijiang
2019-11-01  8:52 ` [PATCH v8 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration Yang Weijiang
2019-11-01  8:52 ` [PATCH v8 2/7] KVM: VMX: Define CET VMCS fields and #CP flag Yang Weijiang
2019-12-10 21:00   ` Sean Christopherson
2019-12-11  1:45     ` Yang Weijiang
2019-11-01  8:52 ` [PATCH v8 3/7] KVM: VMX: Pass through CET related MSRs Yang Weijiang
2019-12-10 21:18   ` Sean Christopherson
2019-12-11  1:32     ` Yang Weijiang
2019-12-11  1:50       ` Sean Christopherson
2019-12-11  2:27         ` Yang Weijiang
2019-12-16  2:18     ` Yang Weijiang
2019-12-18  0:34       ` Sean Christopherson
2019-12-18 13:55         ` Yang Weijiang
2019-12-18 16:02           ` Sean Christopherson
2019-11-01  8:52 ` [PATCH v8 4/7] KVM: VMX: Load CET states on vmentry/vmexit Yang Weijiang
2019-12-10 21:23   ` Sean Christopherson
2019-12-11  1:54     ` Yang Weijiang
2019-12-11 16:35       ` Sean Christopherson
2019-12-12  1:04         ` Yang Weijiang
2019-12-18  0:30           ` Sean Christopherson
2019-12-18 13:20             ` Yang Weijiang
2019-11-01  8:52 ` [PATCH v8 5/7] KVM: X86: Enable CET bits update in IA32_XSS Yang Weijiang
2019-11-01  8:52 ` [PATCH v8 6/7] KVM: X86: Load guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
2019-12-10 21:27   ` Sean Christopherson
2019-12-11  2:03     ` Yang Weijiang
2019-11-01  8:52 ` [PATCH v8 7/7] KVM: X86: Add user-space access interface for CET MSRs Yang Weijiang
2019-12-10 21:58   ` Sean Christopherson
2019-12-11  2:19     ` Yang Weijiang
2019-12-11 16:27       ` Sean Christopherson
2019-12-12  0:42         ` Yang Weijiang
2019-12-12 16:03 ` [PATCH v8 0/7] Introduce support for guest CET feature Konrad Rzeszutek Wilk
2019-12-13  0:44   ` Yang Weijiang
