linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/7] Introduce support for Guest CET feature
@ 2019-09-27  2:19 Yang Weijiang
  2019-09-27  2:19 ` [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration Yang Weijiang
                   ` (7 more replies)
  0 siblings, 8 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-09-27  2:19 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: mst, rkrcmar, jmattson, Yang Weijiang

Control-flow Enforcement Technology (CET) provides protection against
Return/Jump-Oriented Programming (ROP/JOP) attack. It includes two
sub-features: Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT).

KVM modification is required to support Guest CET feature.
This patch serial implemented CET related CPUID/XSAVES enumeration, MSRs
and VMEntry configuration etc.so that Guest kernel can setup CET
runtime infrastructure based on them. Some MSRs and related feature
flags used in the patches reference the definitions in kernel patch.

CET kernel patches is here:
https://lkml.org/lkml/2019/8/13/1110
https://lkml.org/lkml/2019/8/13/1109

v6 -> v7:
- Rebased patch to kernel v5.3
- Sean suggested to change CPUID(0xd, n) enumeration code as alined with
  existing one, and I think it's better to make the fix as an independent patch 
  since XSS MSR are being used widely on X86 platforms.
- Check more host and guest status before configure guest CET
  per Sean's feedback.
- Add error-check before guest accesses CET MSRs per Sean's feedback.
- Other minor fixes suggested by Sean.

v5 -> v6:
- Rebase patch to kernel v5.2.
- Move CPUID(0xD, n>=1) helper to a seperate patch.
- Merge xsave size fix with other patch.
- Other minor fixes per community feedback.

v4 -> v5:
- Rebase patch to kernel v5.1.
- Wrap CPUID(0xD, n>=1) code to a helper function.
- Pass through MSR_IA32_PL1_SSP and MSR_IA32_PL2_SSP to Guest.
- Add Co-developed-by expression in patch description.
- Refine patch description.

v3 -> v4:
- Add Sean's patch for loading Guest fpu state before access XSAVES
  managed CET MSRs.
- Melt down CET bits setting into CPUID configuration patch.
- Add VMX interface to query Host XSS.
- Check Host and Guest XSS support bits before set Guest XSS.
- Make Guest SHSTK and IBT feature enabling independent.
- Do not report CET support to Guest when Host CET feature is Disabled.

v2 -> v3:
- Modified patches to make Guest CET independent to Host enabling.
- Added patch 8 to add user space access for Guest CET MSR access.
- Modified code comments and patch description to reflect changes.

v1 -> v2:
- Re-ordered patch sequence, combined one patch.
- Added more description for CET related VMCS fields.
- Added Host CET capability check while enabling Guest CET loading bit.
- Added Host CET capability check while reporting Guest CPUID(EAX=7, EXC=0).
- Modified code in reporting Guest CPUID(EAX=D,ECX>=1), make it clearer.
- Added Host and Guest XSS mask check while setting bits for Guest XSS.


PATCH 1    : Fix CPUID(0xD, n) enumeration to support XSS MSR.
PATCH 2    : Define CET VMCS fields and bits.
PATCH 3    : Pass through CET MSRs to Guest.
PATCH 4    : Load Guest CET states when CET is enabled.
PATCH 5    : Add CET CR4 bit and CET xsaves support in XSS MSR.
PATCH 6    : Load Guest FPU states for XSAVES managed MSRs.
PATCH 7    : Add user-space CET MSR access interface.


Sean Christopherson (1):
  KVM: x86: Load Guest fpu state when accessing MSRs managed by XSAVES

Yang Weijiang (6):
  KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration
  KVM: VMX: Define CET VMCS fields and CPUID flags
  KVM: VMX: Pass through CET related MSRs to Guest
  KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest
  KVM: X86: Add CET CR4 bit and XSS support
  KVM: X86: Add user-space access interface for CET MSRs

 arch/x86/include/asm/kvm_host.h |   5 +-
 arch/x86/include/asm/vmx.h      |   8 ++
 arch/x86/kvm/cpuid.c            | 110 ++++++++++++++-------
 arch/x86/kvm/cpuid.h            |   2 +
 arch/x86/kvm/svm.c              |   7 ++
 arch/x86/kvm/vmx/vmx.c          | 169 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  22 ++++-
 arch/x86/kvm/x86.h              |   8 ++
 8 files changed, 287 insertions(+), 44 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration
  2019-09-27  2:19 [PATCH v7 0/7] Introduce support for Guest CET feature Yang Weijiang
@ 2019-09-27  2:19 ` Yang Weijiang
  2019-10-02 17:26   ` Jim Mattson
  2019-09-27  2:19 ` [PATCH v7 2/7] kvm: vmx: Define CET VMCS fields and CPUID flags Yang Weijiang
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-09-27  2:19 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: mst, rkrcmar, jmattson, Yang Weijiang

The control bits in IA32_XSS MSR are being used for new features,
but current CPUID(0xd,i) enumeration code doesn't support them, so
fix existing code first.

The supervisor states in IA32_XSS haven't been used in public
KVM code, so set KVM_SUPPORTED_XSS to 0 now, anyone who's developing
IA32_XSS related feature may expand the macro to add the CPUID support,
otherwise, CPUID(0xd,i>1) always reports 0 of the subleaf to guest.

Extracted old code into a new filter and keep it same flavor as others.

This patch passed selftest on a few Intel platforms.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            | 94 +++++++++++++++++++++------------
 arch/x86/kvm/svm.c              |  7 +++
 arch/x86/kvm/vmx/vmx.c          |  6 +++
 arch/x86/kvm/x86.h              |  7 +++
 5 files changed, 82 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 74e88e5edd9c..d018df8c5f32 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1209,6 +1209,7 @@ struct kvm_x86_ops {
 	uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
 
 	bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
+	u64 (*supported_xss)(void);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 22c2720cd948..9d282fec0a62 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -62,6 +62,11 @@ u64 kvm_supported_xcr0(void)
 	return xcr0;
 }
 
+u64 kvm_supported_xss(void)
+{
+	return KVM_SUPPORTED_XSS & kvm_x86_ops->supported_xss();
+}
+
 #define F(x) bit(X86_FEATURE_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -414,6 +419,50 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
 	}
 }
 
+static inline void do_cpuid_0xd_mask(struct kvm_cpuid_entry2 *entry, int index)
+{
+	unsigned int f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
+	/* cpuid 0xD.1.eax */
+	const u32 kvm_cpuid_D_1_eax_x86_features =
+		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
+	u64 u_supported = kvm_supported_xcr0();
+	u64 s_supported = kvm_supported_xss();
+	u64 supported;
+
+	switch (index) {
+	case 0:
+		entry->eax &= u_supported;
+		entry->ebx = xstate_required_size(u_supported, false);
+		entry->ecx = entry->ebx;
+		entry->edx = 0;
+		break;
+	case 1:
+		supported = u_supported | s_supported;
+		entry->eax &= kvm_cpuid_D_1_eax_x86_features;
+		cpuid_mask(&entry->eax, CPUID_D_1_EAX);
+		entry->ebx = 0;
+		entry->edx = 0;
+		entry->ecx &= s_supported;
+		if (entry->eax & (F(XSAVES) | F(XSAVEC)))
+			entry->ebx = xstate_required_size(supported, true);
+
+		break;
+	default:
+		supported = (entry->ecx & 1) ? s_supported : u_supported;
+		if (!(supported & ((u64)1 << index))) {
+			entry->eax = 0;
+			entry->ebx = 0;
+			entry->ecx = 0;
+			entry->edx = 0;
+			return;
+		}
+		if (entry->ecx)
+			entry->ebx = 0;
+		entry->edx = 0;
+		break;
+	}
+}
+
 static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 				  int *nent, int maxnent)
 {
@@ -428,7 +477,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 	unsigned f_lm = 0;
 #endif
 	unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0;
-	unsigned f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
 	unsigned f_intel_pt = kvm_x86_ops->pt_supported() ? F(INTEL_PT) : 0;
 
 	/* cpuid 1.edx */
@@ -482,10 +530,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
 		F(PMM) | F(PMM_EN);
 
-	/* cpuid 0xD.1.eax */
-	const u32 kvm_cpuid_D_1_eax_x86_features =
-		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
-
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
 
@@ -622,38 +666,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 		break;
 	}
 	case 0xd: {
-		int idx, i;
-		u64 supported = kvm_supported_xcr0();
-
-		entry->eax &= supported;
-		entry->ebx = xstate_required_size(supported, false);
-		entry->ecx = entry->ebx;
-		entry->edx &= supported >> 32;
-		if (!supported)
-			break;
+		int i, idx;
 
-		for (idx = 1, i = 1; idx < 64; ++idx) {
-			u64 mask = ((u64)1 << idx);
+		do_cpuid_0xd_mask(&entry[0], 0);
+		for (i = 1, idx = 1; idx < 64; ++idx) {
 			if (*nent >= maxnent)
 				goto out;
-
 			do_host_cpuid(&entry[i], function, idx);
-			if (idx == 1) {
-				entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
-				cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
-				entry[i].ebx = 0;
-				if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
-					entry[i].ebx =
-						xstate_required_size(supported,
-								     true);
-			} else {
-				if (entry[i].eax == 0 || !(supported & mask))
-					continue;
-				if (WARN_ON_ONCE(entry[i].ecx & 1))
-					continue;
-			}
-			entry[i].ecx = 0;
-			entry[i].edx = 0;
+			if (entry[i].eax == 0 && entry[i].ebx == 0 &&
+			    entry[i].ecx == 0 && entry[i].edx == 0)
+				continue;
+
+			do_cpuid_0xd_mask(&entry[i], idx);
+			if (entry[i].eax == 0 && entry[i].ebx == 0 &&
+			    entry[i].ecx == 0 && entry[i].edx == 0)
+				continue;
+
 			++*nent;
 			++i;
 		}
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e0368076a1ef..be967bf9a81d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7193,6 +7193,11 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
 	return false;
 }
 
+static u64 svm_supported_xss(void)
+{
+	return 0;
+}
+
 static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -7329,6 +7334,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.nested_get_evmcs_version = NULL,
 
 	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
+
+	.supported_xss = svm_supported_xss,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c6f6b05004d9..a84198cff397 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1651,6 +1651,11 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
 	return !(val & ~valid_bits);
 }
 
+static inline u64 vmx_supported_xss(void)
+{
+	return host_xss;
+}
+
 static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
@@ -7799,6 +7804,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.nested_enable_evmcs = NULL,
 	.nested_get_evmcs_version = NULL,
 	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
+	.supported_xss = vmx_supported_xss,
 };
 
 static void vmx_cleanup_l1d_flush(void)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6594020c0691..fbffabad0370 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -293,6 +293,13 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
 				| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU)
+
+/*
+ * Right now, no XSS states are used on x86 platform,
+ * expand the macro for new features.
+ */
+#define KVM_SUPPORTED_XSS	(0)
+
 extern u64 host_xcr0;
 
 extern u64 kvm_supported_xcr0(void);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 2/7] kvm: vmx: Define CET VMCS fields and CPUID flags
  2019-09-27  2:19 [PATCH v7 0/7] Introduce support for Guest CET feature Yang Weijiang
  2019-09-27  2:19 ` [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration Yang Weijiang
@ 2019-09-27  2:19 ` Yang Weijiang
  2019-10-02 18:04   ` Jim Mattson
  2019-09-27  2:19 ` [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest Yang Weijiang
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-09-27  2:19 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: mst, rkrcmar, jmattson, Yang Weijiang

CET(Control-flow Enforcement Technology) is an upcoming Intel(R)
processor feature that blocks Return/Jump-Oriented Programming(ROP)
attacks. It provides the following capabilities to defend
against ROP/JOP style control-flow subversion attacks:

Shadow Stack (SHSTK):
  A second stack for program which is used exclusively for
  control transfer operations.

Indirect Branch Tracking (IBT):
  Code branching protection to defend against jump/call oriented
  programming.

Several new CET MSRs are defined in kernel to support CET:
  MSR_IA32_{U,S}_CET: Controls the CET settings for user
                      mode and suervisor mode respectively.

  MSR_IA32_PL{0,1,2,3}_SSP: Stores shadow stack pointers for
                            CPL-0,1,2,3 level respectively.

  MSR_IA32_INT_SSP_TAB: Stores base address of shadow stack
                        pointer table.

Two XSAVES state bits are introduced for CET:
  IA32_XSS:[bit 11]: For saving/restoring user mode CET states
  IA32_XSS:[bit 12]: For saving/restoring supervisor mode CET states.

Six VMCS fields are introduced for CET:
  {HOST,GUEST}_S_CET: Stores CET settings for supervisor mode.
  {HOST,GUEST}_SSP: Stores shadow stack pointer for supervisor mode.
  {HOST,GUEST}_INTR_SSP_TABLE: Stores base address of shadow stack pointer
                               table.

If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host's CET MSRs are restored
from below VMCS fields at VM-Exit:
  HOST_S_CET
  HOST_SSP
  HOST_INTR_SSP_TABLE

If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest's CET MSRs are loaded
from below VMCS fields at VM-Entry:
  GUEST_S_CET
  GUEST_SSP
  GUEST_INTR_SSP_TABLE

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/vmx.h | 8 ++++++++
 arch/x86/kvm/cpuid.c       | 4 ++--
 arch/x86/kvm/x86.h         | 3 ++-
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index a39136b0d509..68bca290a203 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -90,6 +90,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_LOAD_HOST_CET_STATE             0x10000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -103,6 +104,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_GUEST_CET_STATE           0x00100000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
@@ -321,6 +323,9 @@ enum vmcs_field {
 	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
 	GUEST_SYSENTER_ESP              = 0x00006824,
 	GUEST_SYSENTER_EIP              = 0x00006826,
+	GUEST_S_CET                     = 0x00006828,
+	GUEST_SSP                       = 0x0000682a,
+	GUEST_INTR_SSP_TABLE            = 0x0000682c,
 	HOST_CR0                        = 0x00006c00,
 	HOST_CR3                        = 0x00006c02,
 	HOST_CR4                        = 0x00006c04,
@@ -333,6 +338,9 @@ enum vmcs_field {
 	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
 	HOST_RSP                        = 0x00006c14,
 	HOST_RIP                        = 0x00006c16,
+	HOST_S_CET                      = 0x00006c18,
+	HOST_SSP                        = 0x00006c1a,
+	HOST_INTR_SSP_TABLE             = 0x00006c1c
 };
 
 /*
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 9d282fec0a62..1aa86b87b6ab 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,13 +365,13 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
 		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
-		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
+		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | F(SHSTK);
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
 		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
 		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
-		F(MD_CLEAR);
+		F(MD_CLEAR) | F(IBT);
 
 	/* cpuid 7.1.eax */
 	const u32 kvm_cpuid_7_1_eax_x86_features =
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index fbffabad0370..a85800b23e6e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -298,7 +298,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
  * Right now, no XSS states are used on x86 platform,
  * expand the macro for new features.
  */
-#define KVM_SUPPORTED_XSS	(0)
+#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER \
+				| XFEATURE_MASK_CET_KERNEL)
 
 extern u64 host_xcr0;
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest
  2019-09-27  2:19 [PATCH v7 0/7] Introduce support for Guest CET feature Yang Weijiang
  2019-09-27  2:19 ` [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration Yang Weijiang
  2019-09-27  2:19 ` [PATCH v7 2/7] kvm: vmx: Define CET VMCS fields and CPUID flags Yang Weijiang
@ 2019-09-27  2:19 ` Yang Weijiang
  2019-10-02 18:18   ` Jim Mattson
  2019-09-27  2:19 ` [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest Yang Weijiang
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-09-27  2:19 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: mst, rkrcmar, jmattson, Yang Weijiang

CET MSRs pass through Guest directly to enhance performance.
CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
these MSRs are defined in kernel and re-used here.

MSR_IA32_U_CET and MSR_IA32_PL3_SSP are used for user mode protection,
the contents could differ from process to process, therefore,
kernel needs to save/restore them during context switch, it makes
sense to pass through them so that the guest kernel can
use xsaves/xrstors to operate them efficiently. Other MSRs are used
for non-user mode protection. See CET spec for detailed info.

The difference between CET VMCS state fields and xsave components is that,
the former used for CET state storage during VMEnter/VMExit,
whereas the latter used for state retention between Guest task/process
switch.

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/cpuid.c   |  1 +
 arch/x86/kvm/cpuid.h   |  2 ++
 arch/x86/kvm/vmx/vmx.c | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1aa86b87b6ab..0a47b9e565be 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -66,6 +66,7 @@ u64 kvm_supported_xss(void)
 {
 	return KVM_SUPPORTED_XSS & kvm_x86_ops->supported_xss();
 }
+EXPORT_SYMBOL_GPL(kvm_supported_xss);
 
 #define F(x) bit(X86_FEATURE_##x)
 
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index d78a61408243..1d77b880084d 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -27,6 +27,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 
 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
 
+u64 kvm_supported_xss(void);
+
 static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.maxphyaddr;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a84198cff397..f720baa7a9ba 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7001,6 +7001,43 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
+static void vmx_intercept_cet_msrs(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long *msr_bitmap;
+	u64 kvm_xss;
+	bool cet_en;
+
+	msr_bitmap = vmx->vmcs01.msr_bitmap;
+	kvm_xss = kvm_supported_xss();
+	cet_en = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
+		 guest_cpuid_has(vcpu, X86_FEATURE_IBT);
+
+	/*
+	 * U_CET is a must for USER CET, per CET spec., U_CET and PL3_SPP are
+	 * a bundle for USER CET xsaves.
+	 */
+	if (cet_en && (kvm_xss & XFEATURE_MASK_CET_USER)) {
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+	}
+	/*
+	 * S_CET is a must for KERNEL CET, PL0_SSP ... PL2_SSP are a bundle
+	 * for CET KERNEL xsaves.
+	 */
+	if (cet_en && (kvm_xss & XFEATURE_MASK_CET_KERNEL)) {
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+		/* SSP_TAB only available for KERNEL SHSTK.*/
+		if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
+			vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
+						      MSR_TYPE_RW);
+	}
+}
+
 static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7025,6 +7062,8 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
 			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
 		update_intel_pt_cfg(vcpu);
+
+	vmx_intercept_cet_msrs(vcpu);
 }
 
 static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest
  2019-09-27  2:19 [PATCH v7 0/7] Introduce support for Guest CET feature Yang Weijiang
                   ` (2 preceding siblings ...)
  2019-09-27  2:19 ` [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest Yang Weijiang
@ 2019-09-27  2:19 ` Yang Weijiang
  2019-10-02 18:54   ` Jim Mattson
  2019-09-27  2:19 ` [PATCH v7 5/7] kvm: x86: Add CET CR4 bit and XSS support Yang Weijiang
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-09-27  2:19 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: mst, rkrcmar, jmattson, Yang Weijiang

"Load Guest CET state" bit controls whether Guest CET states
will be loaded at Guest entry. Before doing that, KVM needs
to check if CPU CET feature is enabled on host and available
to Guest.

Note: SHSTK and IBT features share one control MSR:
MSR_IA32_{U,S}_CET, which means it's difficult to hide
one feature from another in the case of SHSTK != IBT,
after discussed in community, it's agreed to allow Guest
control two features independently as it won't introduce
security hole.

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f720baa7a9ba..ba1a83d11e69 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -44,6 +44,7 @@
 #include <asm/spec-ctrl.h>
 #include <asm/virtext.h>
 #include <asm/vmx.h>
+#include <asm/cet.h>
 
 #include "capabilities.h"
 #include "cpuid.h"
@@ -2918,6 +2919,37 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	vmcs_writel(GUEST_CR3, guest_cr3);
 }
 
+static int set_cet_bit(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	const u64 cet_bits = XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL;
+	bool cet_xss = vmx_xsaves_supported() &&
+		       (kvm_supported_xss() & cet_bits);
+	bool cet_cpuid = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
+			 guest_cpuid_has(vcpu, X86_FEATURE_IBT);
+	bool cet_on = !!(cr4 & X86_CR4_CET);
+
+	if (cet_on && vmx->nested.vmxon)
+		return 1;
+
+	if (cet_on && !cpu_x86_cet_enabled())
+		return 1;
+
+	if (cet_on && !cet_xss)
+		return 1;
+
+	if (cet_on && !cet_cpuid)
+		return 1;
+
+	if (cet_on)
+		vmcs_set_bits(VM_ENTRY_CONTROLS,
+			      VM_ENTRY_LOAD_GUEST_CET_STATE);
+	else
+		vmcs_clear_bits(VM_ENTRY_CONTROLS,
+				VM_ENTRY_LOAD_GUEST_CET_STATE);
+	return 0;
+}
+
 int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -2958,6 +2990,9 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	}
 
+	if (set_cet_bit(vcpu, cr4))
+		return 1;
+
 	if (vmx->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
 		return 1;
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 5/7] kvm: x86: Add CET CR4 bit and XSS support
  2019-09-27  2:19 [PATCH v7 0/7] Introduce support for Guest CET feature Yang Weijiang
                   ` (3 preceding siblings ...)
  2019-09-27  2:19 ` [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest Yang Weijiang
@ 2019-09-27  2:19 ` Yang Weijiang
  2019-10-02 19:05   ` Jim Mattson
  2019-09-27  2:19 ` [PATCH v7 6/7] KVM: x86: Load Guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-09-27  2:19 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: mst, rkrcmar, jmattson, Yang Weijiang

CR4.CET(bit 23) is master enable bit for CET feature.
Previously, KVM did not support setting any bits in XSS
so it's hardcoded to check and inject a #GP if Guest
attempted to write a non-zero value to XSS, now it supports
CET related bits setting.

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  4 +++-
 arch/x86/kvm/cpuid.c            | 11 +++++++++--
 arch/x86/kvm/vmx/vmx.c          |  6 +-----
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d018df8c5f32..8f97269d6d9f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -90,7 +90,8 @@
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+			  | X86_CR4_CET))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
@@ -623,6 +624,7 @@ struct kvm_vcpu_arch {
 
 	u64 xcr0;
 	u64 guest_supported_xcr0;
+	u64 guest_supported_xss;
 	u32 guest_xstate_size;
 
 	struct kvm_pio_request pio;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0a47b9e565be..dd3ddc6daa58 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -120,8 +120,15 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 	}
 
 	best = kvm_find_cpuid_entry(vcpu, 0xD, 1);
-	if (best && (best->eax & (F(XSAVES) | F(XSAVEC))))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+	if (best && (best->eax & (F(XSAVES) | F(XSAVEC)))) {
+		u64 kvm_xss = kvm_supported_xss();
+
+		best->ebx =
+			xstate_required_size(vcpu->arch.xcr0 | kvm_xss, true);
+		vcpu->arch.guest_supported_xss = best->ecx & kvm_xss;
+	} else {
+		vcpu->arch.guest_supported_xss = 0;
+	}
 
 	/*
 	 * The existing code assumes virtual address is 48-bit or 57-bit in the
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ba1a83d11e69..44913e4ab558 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1973,11 +1973,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		     !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
 		       guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
 			return 1;
-		/*
-		 * The only supported bit as of Skylake is bit 8, but
-		 * it is not supported on KVM.
-		 */
-		if (data != 0)
+		if (data & ~vcpu->arch.guest_supported_xss)
 			return 1;
 		vcpu->arch.ia32_xss = data;
 		if (vcpu->arch.ia32_xss != host_xss)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 6/7] KVM: x86: Load Guest fpu state when accessing MSRs managed by XSAVES
  2019-09-27  2:19 [PATCH v7 0/7] Introduce support for Guest CET feature Yang Weijiang
                   ` (4 preceding siblings ...)
  2019-09-27  2:19 ` [PATCH v7 5/7] kvm: x86: Add CET CR4 bit and XSS support Yang Weijiang
@ 2019-09-27  2:19 ` Yang Weijiang
  2019-10-02 19:56   ` Jim Mattson
  2019-09-27  2:19 ` [PATCH v7 7/7] KVM: x86: Add user-space access interface for CET MSRs Yang Weijiang
  2019-10-02 22:40 ` [PATCH v7 0/7] Introduce support for Guest CET feature Jim Mattson
  7 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-09-27  2:19 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: mst, rkrcmar, jmattson, Yang Weijiang

From: Sean Christopherson <sean.j.christopherson@intel.com>

A handful of CET MSRs are not context switched through "traditional"
methods, e.g. VMCS or manual switching, but rather are passed through
to the guest and are saved and restored by XSAVES/XRSTORS, i.e. the
guest's FPU state.

Load the guest's FPU state if userspace is accessing MSRs whose values
are managed by XSAVES so that the MSR helper, e.g. vmx_{get,set}_msr(),
can simply do {RD,WR}MSR to access the guest's value.

Note that guest_cpuid_has() is not queried as host userspace is allowed
to access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 290c3c3efb87..5b8116028a59 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -104,6 +104,8 @@ static void enter_smm(struct kvm_vcpu *vcpu);
 static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 static void store_regs(struct kvm_vcpu *vcpu);
 static int sync_regs(struct kvm_vcpu *vcpu);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
 
 struct kvm_x86_ops *kvm_x86_ops __read_mostly;
 EXPORT_SYMBOL_GPL(kvm_x86_ops);
@@ -2999,6 +3001,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 }
 EXPORT_SYMBOL_GPL(kvm_get_msr_common);
 
+static bool is_xsaves_msr(u32 index)
+{
+	return index == MSR_IA32_U_CET ||
+	       (index >= MSR_IA32_PL0_SSP && index <= MSR_IA32_PL3_SSP);
+}
+
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
  *
@@ -3009,11 +3017,23 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
+	const u64 cet_bits = XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL;
+	bool cet_xss = kvm_x86_ops->xsaves_supported() &&
+		       (kvm_supported_xss() & cet_bits);
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		if (!fpu_loaded && cet_xss &&
+		    is_xsaves_msr(entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 
 	return i;
 }
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v7 7/7] KVM: x86: Add user-space access interface for CET MSRs
  2019-09-27  2:19 [PATCH v7 0/7] Introduce support for Guest CET feature Yang Weijiang
                   ` (5 preceding siblings ...)
  2019-09-27  2:19 ` [PATCH v7 6/7] KVM: x86: Load Guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
@ 2019-09-27  2:19 ` Yang Weijiang
  2019-10-02 20:57   ` Jim Mattson
  2019-10-02 22:40 ` [PATCH v7 0/7] Introduce support for Guest CET feature Jim Mattson
  7 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-09-27  2:19 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, sean.j.christopherson
  Cc: mst, rkrcmar, jmattson, Yang Weijiang

There're two different places storing Guest CET states, the states
managed with XSAVES/XRSTORS, as restored/saved
in previous patch, can be read/write directly from/to the MSRs.
For those stored in VMCS fields, they're access via vmcs_read/
vmcs_write.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 83 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44913e4ab558..5265db7cd2af 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1671,6 +1671,49 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 	return 0;
 }
 
+static int check_cet_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	u64 kvm_xss = kvm_supported_xss();
+
+	switch (msr_info->index) {
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
+		if (!(kvm_xss | XFEATURE_MASK_CET_KERNEL))
+			return 1;
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
+			return 1;
+		break;
+	case MSR_IA32_PL3_SSP:
+		if (!(kvm_xss | XFEATURE_MASK_CET_USER))
+			return 1;
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
+			return 1;
+		break;
+	case MSR_IA32_U_CET:
+		if (!(kvm_xss | XFEATURE_MASK_CET_USER))
+			return 1;
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
+			return 1;
+		break;
+	case MSR_IA32_S_CET:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
+			return 1;
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
+			return 1;
+		break;
+	default:
+		return 1;
+	}
+	return 0;
+}
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -1788,6 +1831,26 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
 		break;
+	case MSR_IA32_S_CET:
+		if (check_cet_msr(vcpu, msr_info))
+			return 1;
+		msr_info->data = vmcs_readl(GUEST_S_CET);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		if (check_cet_msr(vcpu, msr_info))
+			return 1;
+		msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+		break;
+	case MSR_IA32_U_CET:
+		if (check_cet_msr(vcpu, msr_info))
+			return 1;
+		rdmsrl(MSR_IA32_U_CET, msr_info->data);
+		break;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		if (check_cet_msr(vcpu, msr_info))
+			return 1;
+		rdmsrl(msr_info->index, msr_info->data);
+		break;
 	case MSR_TSC_AUX:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
@@ -2039,6 +2102,26 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
+	case MSR_IA32_S_CET:
+		if (check_cet_msr(vcpu, msr_info))
+			return 1;
+		vmcs_writel(GUEST_S_CET, data);
+		break;
+	case MSR_IA32_INT_SSP_TAB:
+		if (check_cet_msr(vcpu, msr_info))
+			return 1;
+		vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+		break;
+	case MSR_IA32_U_CET:
+		if (check_cet_msr(vcpu, msr_info))
+			return 1;
+		wrmsrl(MSR_IA32_U_CET, data);
+		break;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		if (check_cet_msr(vcpu, msr_info))
+			return 1;
+		wrmsrl(msr_info->index, data);
+		break;
 	case MSR_TSC_AUX:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration
  2019-09-27  2:19 ` [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration Yang Weijiang
@ 2019-10-02 17:26   ` Jim Mattson
  2019-10-08  8:30     ` Yang Weijiang
  2019-10-17 19:46     ` Sean Christopherson
  0 siblings, 2 replies; 43+ messages in thread
From: Jim Mattson @ 2019-10-02 17:26 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> The control bits in IA32_XSS MSR are being used for new features,
> but current CPUID(0xd,i) enumeration code doesn't support them, so
> fix existing code first.
>
> The supervisor states in IA32_XSS haven't been used in public
> KVM code, so set KVM_SUPPORTED_XSS to 0 now, anyone who's developing
> IA32_XSS related feature may expand the macro to add the CPUID support,
> otherwise, CPUID(0xd,i>1) always reports 0 of the subleaf to guest.
>
> Extracted old code into a new filter and keep it same flavor as others.
>
> This patch passed selftest on a few Intel platforms.
>
> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/cpuid.c            | 94 +++++++++++++++++++++------------
>  arch/x86/kvm/svm.c              |  7 +++
>  arch/x86/kvm/vmx/vmx.c          |  6 +++
>  arch/x86/kvm/x86.h              |  7 +++
>  5 files changed, 82 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 74e88e5edd9c..d018df8c5f32 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1209,6 +1209,7 @@ struct kvm_x86_ops {
>         uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
>
>         bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
> +       u64 (*supported_xss)(void);
>  };
>
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 22c2720cd948..9d282fec0a62 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -62,6 +62,11 @@ u64 kvm_supported_xcr0(void)
>         return xcr0;
>  }
>
> +u64 kvm_supported_xss(void)
> +{
> +       return KVM_SUPPORTED_XSS & kvm_x86_ops->supported_xss();
> +}
> +
>  #define F(x) bit(X86_FEATURE_##x)
>
>  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> @@ -414,6 +419,50 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
>         }
>  }
>
> +static inline void do_cpuid_0xd_mask(struct kvm_cpuid_entry2 *entry, int index)
> +{
> +       unsigned int f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;

Does Intel have CPUs that support XSAVES but don't support the "enable
XSAVES/XRSTORS" VM-execution control? If so, what is the behavior of
XSAVESXRSTORS on those CPUs in VMX non-root mode? If not, why is this
conditional F(XSAVES) here?

> +       /* cpuid 0xD.1.eax */
> +       const u32 kvm_cpuid_D_1_eax_x86_features =
> +               F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
> +       u64 u_supported = kvm_supported_xcr0();
> +       u64 s_supported = kvm_supported_xss();
> +       u64 supported;
> +
> +       switch (index) {
> +       case 0:
> +               entry->eax &= u_supported;
> +               entry->ebx = xstate_required_size(u_supported, false);

EBX could actually be zero, couldn't it? Since this output is
context-dependent, I'm not sure how to interpret it when returned from
KVM_GET_SUPPORTED_CPUID.

> +               entry->ecx = entry->ebx;
> +               entry->edx = 0;

Shouldn't this be: entry->edx &= u_supported >> 32?

> +               break;
> +       case 1:
> +               supported = u_supported | s_supported;
> +               entry->eax &= kvm_cpuid_D_1_eax_x86_features;
> +               cpuid_mask(&entry->eax, CPUID_D_1_EAX);
> +               entry->ebx = 0;
> +               entry->edx = 0;

Shouldn't this be: entry->edx &= s_supported >> 32?

> +               entry->ecx &= s_supported;
> +               if (entry->eax & (F(XSAVES) | F(XSAVEC)))
> +                       entry->ebx = xstate_required_size(supported, true);

As above, can't EBX just be zero, since it's context-dependent? What
is the context when processing KVM_GET_SUPPORTED_CPUID? And why do we
only fill this in when XSAVES or XSAVEC is supported?

> +               break;
> +       default:
> +               supported = (entry->ecx & 1) ? s_supported : u_supported;
> +               if (!(supported & ((u64)1 << index))) {

Nit: 1ULL << index.

> +                       entry->eax = 0;
> +                       entry->ebx = 0;
> +                       entry->ecx = 0;
> +                       entry->edx = 0;
> +                       return;
> +               }
> +               if (entry->ecx)
> +                       entry->ebx = 0;

This seems to back up my claims above regarding the EBX output for
cases 0 and 1, but aside from those subleaves, is this correct? For
subleaves > 1, ECX bit 1 can be set for extended state components that
need to be cache-line aligned. Such components could map to a valid
bit in XCR0 and have a non-zero offset from the beginning of the
non-compacted XSAVE area.

> +               entry->edx = 0;

This seems too aggressive. See my comments above regarding EDX outputs
for cases 0 and 1.

> +               break;
> +       }
> +}
> +
>  static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
>                                   int *nent, int maxnent)
>  {
> @@ -428,7 +477,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
>         unsigned f_lm = 0;
>  #endif
>         unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0;
> -       unsigned f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
>         unsigned f_intel_pt = kvm_x86_ops->pt_supported() ? F(INTEL_PT) : 0;
>
>         /* cpuid 1.edx */
> @@ -482,10 +530,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
>                 F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
>                 F(PMM) | F(PMM_EN);
>
> -       /* cpuid 0xD.1.eax */
> -       const u32 kvm_cpuid_D_1_eax_x86_features =
> -               F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
> -
>         /* all calls to cpuid_count() should be made on the same cpu */
>         get_cpu();
>
> @@ -622,38 +666,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
>                 break;
>         }
>         case 0xd: {
> -               int idx, i;
> -               u64 supported = kvm_supported_xcr0();
> -
> -               entry->eax &= supported;
> -               entry->ebx = xstate_required_size(supported, false);
> -               entry->ecx = entry->ebx;
> -               entry->edx &= supported >> 32;
> -               if (!supported)
> -                       break;
> +               int i, idx;
>
> -               for (idx = 1, i = 1; idx < 64; ++idx) {
> -                       u64 mask = ((u64)1 << idx);
> +               do_cpuid_0xd_mask(&entry[0], 0);
> +               for (i = 1, idx = 1; idx < 64; ++idx) {
>                         if (*nent >= maxnent)
>                                 goto out;
> -
>                         do_host_cpuid(&entry[i], function, idx);
> -                       if (idx == 1) {
> -                               entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
> -                               cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
> -                               entry[i].ebx = 0;
> -                               if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
> -                                       entry[i].ebx =
> -                                               xstate_required_size(supported,
> -                                                                    true);
> -                       } else {
> -                               if (entry[i].eax == 0 || !(supported & mask))
> -                                       continue;
> -                               if (WARN_ON_ONCE(entry[i].ecx & 1))
> -                                       continue;
> -                       }
> -                       entry[i].ecx = 0;
> -                       entry[i].edx = 0;
> +                       if (entry[i].eax == 0 && entry[i].ebx == 0 &&
> +                           entry[i].ecx == 0 && entry[i].edx == 0)
> +                               continue;
> +
> +                       do_cpuid_0xd_mask(&entry[i], idx);
> +                       if (entry[i].eax == 0 && entry[i].ebx == 0 &&
> +                           entry[i].ecx == 0 && entry[i].edx == 0)
> +                               continue;
> +
>                         ++*nent;
>                         ++i;
>                 }
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index e0368076a1ef..be967bf9a81d 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7193,6 +7193,11 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
>         return false;
>  }
>
> +static u64 svm_supported_xss(void)
> +{
> +       return 0;
> +}
> +
>  static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>         .cpu_has_kvm_support = has_svm,
>         .disabled_by_bios = is_disabled,
> @@ -7329,6 +7334,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>         .nested_get_evmcs_version = NULL,
>
>         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> +
> +       .supported_xss = svm_supported_xss,
>  };
>
>  static int __init svm_init(void)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c6f6b05004d9..a84198cff397 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1651,6 +1651,11 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
>         return !(val & ~valid_bits);
>  }
>
> +static inline u64 vmx_supported_xss(void)
> +{
> +       return host_xss;
> +}

Do you really need vendor-specific code for this? Can't you just hoist
host_xss into common code (x86.c) and use that? [Note that Aaron Lewis
is currently working on a series that will include that hoisting, if
you want to wait.]


>  static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
>  {
>         switch (msr->index) {
> @@ -7799,6 +7804,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>         .nested_enable_evmcs = NULL,
>         .nested_get_evmcs_version = NULL,
>         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> +       .supported_xss = vmx_supported_xss,
>  };
>
>  static void vmx_cleanup_l1d_flush(void)
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 6594020c0691..fbffabad0370 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -293,6 +293,13 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
>                                 | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
>                                 | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
>                                 | XFEATURE_MASK_PKRU)
> +
> +/*
> + * Right now, no XSS states are used on x86 platform,
> + * expand the macro for new features.
> + */
> +#define KVM_SUPPORTED_XSS      (0)
> +

Nit: superfluous parentheses.

>  extern u64 host_xcr0;
>
>  extern u64 kvm_supported_xcr0(void);
> --
> 2.17.2
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 2/7] kvm: vmx: Define CET VMCS fields and CPUID flags
  2019-09-27  2:19 ` [PATCH v7 2/7] kvm: vmx: Define CET VMCS fields and CPUID flags Yang Weijiang
@ 2019-10-02 18:04   ` Jim Mattson
  2019-10-09  5:56     ` Yang Weijiang
  0 siblings, 1 reply; 43+ messages in thread
From: Jim Mattson @ 2019-10-02 18:04 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář,
	Aaron Lewis

On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> CET(Control-flow Enforcement Technology) is an upcoming Intel(R)
> processor feature that blocks Return/Jump-Oriented Programming(ROP)
> attacks. It provides the following capabilities to defend
> against ROP/JOP style control-flow subversion attacks:
>
> Shadow Stack (SHSTK):
>   A second stack for program which is used exclusively for
>   control transfer operations.
>
> Indirect Branch Tracking (IBT):
>   Code branching protection to defend against jump/call oriented
>   programming.
>
> Several new CET MSRs are defined in kernel to support CET:
>   MSR_IA32_{U,S}_CET: Controls the CET settings for user
>                       mode and suervisor mode respectively.
>
>   MSR_IA32_PL{0,1,2,3}_SSP: Stores shadow stack pointers for
>                             CPL-0,1,2,3 level respectively.
>
>   MSR_IA32_INT_SSP_TAB: Stores base address of shadow stack
>                         pointer table.
>
> Two XSAVES state bits are introduced for CET:
>   IA32_XSS:[bit 11]: For saving/restoring user mode CET states
>   IA32_XSS:[bit 12]: For saving/restoring supervisor mode CET states.
>
> Six VMCS fields are introduced for CET:
>   {HOST,GUEST}_S_CET: Stores CET settings for supervisor mode.
>   {HOST,GUEST}_SSP: Stores shadow stack pointer for supervisor mode.
>   {HOST,GUEST}_INTR_SSP_TABLE: Stores base address of shadow stack pointer
>                                table.
>
> If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host's CET MSRs are restored
> from below VMCS fields at VM-Exit:
>   HOST_S_CET
>   HOST_SSP
>   HOST_INTR_SSP_TABLE
>
> If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest's CET MSRs are loaded
> from below VMCS fields at VM-Entry:
>   GUEST_S_CET
>   GUEST_SSP
>   GUEST_INTR_SSP_TABLE
>
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/include/asm/vmx.h | 8 ++++++++
>  arch/x86/kvm/cpuid.c       | 4 ++--
>  arch/x86/kvm/x86.h         | 3 ++-
>  3 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index a39136b0d509..68bca290a203 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -90,6 +90,7 @@
>  #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
>  #define VM_EXIT_PT_CONCEAL_PIP                 0x01000000
>  #define VM_EXIT_CLEAR_IA32_RTIT_CTL            0x02000000
> +#define VM_EXIT_LOAD_HOST_CET_STATE             0x10000000
>
>  #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR      0x00036dff
>
> @@ -103,6 +104,7 @@
>  #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
>  #define VM_ENTRY_PT_CONCEAL_PIP                        0x00020000
>  #define VM_ENTRY_LOAD_IA32_RTIT_CTL            0x00040000
> +#define VM_ENTRY_LOAD_GUEST_CET_STATE           0x00100000
>
>  #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR     0x000011ff
>
> @@ -321,6 +323,9 @@ enum vmcs_field {
>         GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
>         GUEST_SYSENTER_ESP              = 0x00006824,
>         GUEST_SYSENTER_EIP              = 0x00006826,
> +       GUEST_S_CET                     = 0x00006828,
> +       GUEST_SSP                       = 0x0000682a,
> +       GUEST_INTR_SSP_TABLE            = 0x0000682c,
>         HOST_CR0                        = 0x00006c00,
>         HOST_CR3                        = 0x00006c02,
>         HOST_CR4                        = 0x00006c04,
> @@ -333,6 +338,9 @@ enum vmcs_field {
>         HOST_IA32_SYSENTER_EIP          = 0x00006c12,
>         HOST_RSP                        = 0x00006c14,
>         HOST_RIP                        = 0x00006c16,
> +       HOST_S_CET                      = 0x00006c18,
> +       HOST_SSP                        = 0x00006c1a,
> +       HOST_INTR_SSP_TABLE             = 0x00006c1c
>  };
>
>  /*
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 9d282fec0a62..1aa86b87b6ab 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -365,13 +365,13 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
>                 F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
>                 F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
>                 F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> -               F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
> +               F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | F(SHSTK);
>
>         /* cpuid 7.0.edx*/
>         const u32 kvm_cpuid_7_0_edx_x86_features =
>                 F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>                 F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> -               F(MD_CLEAR);
> +               F(MD_CLEAR) | F(IBT);

Claiming that SHSTK and IBT are supported in the guest seems premature
as of this change, since you haven't actually done anything to
virtualize the features yet.

>         /* cpuid 7.1.eax */
>         const u32 kvm_cpuid_7_1_eax_x86_features =
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index fbffabad0370..a85800b23e6e 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -298,7 +298,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
>   * Right now, no XSS states are used on x86 platform,
>   * expand the macro for new features.
>   */
> -#define KVM_SUPPORTED_XSS      (0)
> +#define KVM_SUPPORTED_XSS      (XFEATURE_MASK_CET_USER \
> +                               | XFEATURE_MASK_CET_KERNEL)

If IA32_XSS can dynamically change within the guest, it will have to
be enumerated by KVM_GET_MSR_INDEX_LIST. (Note that Aaron Lewis is
working on a series which will include that enumeration, if you'd like
to wait.) I'm also not convinced that there is sufficient
virtualization of these features to allow these bits to be set in the
guest IA32_XSS at this point.

>  extern u64 host_xcr0;
>
> --
> 2.17.2
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest
  2019-09-27  2:19 ` [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest Yang Weijiang
@ 2019-10-02 18:18   ` Jim Mattson
  2019-10-09  6:15     ` Yang Weijiang
  0 siblings, 1 reply; 43+ messages in thread
From: Jim Mattson @ 2019-10-02 18:18 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> CET MSRs pass through Guest directly to enhance performance.
> CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> these MSRs are defined in kernel and re-used here.

All of these new guest MSRs will have to be enumerated by
KVM_GET_MSR_INDEX_LIST.

> MSR_IA32_U_CET and MSR_IA32_PL3_SSP are used for user mode protection,
> the contents could differ from process to process, therefore,
> kernel needs to save/restore them during context switch, it makes
> sense to pass through them so that the guest kernel can
> use xsaves/xrstors to operate them efficiently. Other MSRs are used
> for non-user mode protection. See CET spec for detailed info.

I assume that XSAVES & XRSTORS bypass the MSR permission bitmap, like
other instructions that manipulate MSRs (e.g. SWAPGS, RDTSCP, etc.).
Is the guest OS likely to use RDMSR/WRMSR to access these MSRs?

> The difference between CET VMCS state fields and xsave components is that,
> the former used for CET state storage during VMEnter/VMExit,
> whereas the latter used for state retention between Guest task/process
> switch.
>
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/cpuid.c   |  1 +
>  arch/x86/kvm/cpuid.h   |  2 ++
>  arch/x86/kvm/vmx/vmx.c | 39 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 42 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 1aa86b87b6ab..0a47b9e565be 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -66,6 +66,7 @@ u64 kvm_supported_xss(void)
>  {
>         return KVM_SUPPORTED_XSS & kvm_x86_ops->supported_xss();
>  }
> +EXPORT_SYMBOL_GPL(kvm_supported_xss);
>
>  #define F(x) bit(X86_FEATURE_##x)
>
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index d78a61408243..1d77b880084d 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -27,6 +27,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>
>  int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
>
> +u64 kvm_supported_xss(void);
> +
>  static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
>  {
>         return vcpu->arch.maxphyaddr;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index a84198cff397..f720baa7a9ba 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7001,6 +7001,43 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
>                 vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
>  }
>
> +static void vmx_intercept_cet_msrs(struct kvm_vcpu *vcpu)

Nit: It seems like this function adjusts the MSR permission bitmap so
as *not* to intercept the CET MSRs.

> +{
> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> +       unsigned long *msr_bitmap;
> +       u64 kvm_xss;
> +       bool cet_en;
> +
> +       msr_bitmap = vmx->vmcs01.msr_bitmap;

What about nested guests? (i.e. vmcs02).

> +       kvm_xss = kvm_supported_xss();
> +       cet_en = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
> +                guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> +       /*
> +        * U_CET is a must for USER CET, per CET spec., U_CET and PL3_SPP are
> +        * a bundle for USER CET xsaves.
> +        */
> +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_USER)) {
> +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
> +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> +       }

Since this is called from vmx_cpuid_update, what happens if cet_en was
previously true and now it's false?

> +       /*
> +        * S_CET is a must for KERNEL CET, PL0_SSP ... PL2_SSP are a bundle
> +        * for CET KERNEL xsaves.
> +        */
> +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_KERNEL)) {
> +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
> +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW);
> +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW);
> +
> +               /* SSP_TAB only available for KERNEL SHSTK.*/
> +               if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> +                       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
> +                                                     MSR_TYPE_RW);
> +       }
> +}
> +
>  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
>  {
>         struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -7025,6 +7062,8 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
>         if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
>                         guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
>                 update_intel_pt_cfg(vcpu);
> +
> +       vmx_intercept_cet_msrs(vcpu);
>  }
>
>  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> --
> 2.17.2
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest
  2019-09-27  2:19 ` [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest Yang Weijiang
@ 2019-10-02 18:54   ` Jim Mattson
  2019-10-09  6:43     ` Yang Weijiang
  0 siblings, 1 reply; 43+ messages in thread
From: Jim Mattson @ 2019-10-02 18:54 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> "Load Guest CET state" bit controls whether Guest CET states
> will be loaded at Guest entry. Before doing that, KVM needs
> to check if CPU CET feature is enabled on host and available
> to Guest.
>
> Note: SHSTK and IBT features share one control MSR:
> MSR_IA32_{U,S}_CET, which means it's difficult to hide
> one feature from another in the case of SHSTK != IBT,
> after discussed in community, it's agreed to allow Guest
> control two features independently as it won't introduce
> security hole.
>
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f720baa7a9ba..ba1a83d11e69 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -44,6 +44,7 @@
>  #include <asm/spec-ctrl.h>
>  #include <asm/virtext.h>
>  #include <asm/vmx.h>
> +#include <asm/cet.h>
>
>  #include "capabilities.h"
>  #include "cpuid.h"
> @@ -2918,6 +2919,37 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>         vmcs_writel(GUEST_CR3, guest_cr3);
>  }
>
> +static int set_cet_bit(struct kvm_vcpu *vcpu, unsigned long cr4)

Nit: This function does not appear to set CR4.CET, as the name would imply.

> +{
> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> +       const u64 cet_bits = XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL;
> +       bool cet_xss = vmx_xsaves_supported() &&
> +                      (kvm_supported_xss() & cet_bits);
> +       bool cet_cpuid = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
> +                        guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> +       bool cet_on = !!(cr4 & X86_CR4_CET);
> +
> +       if (cet_on && vmx->nested.vmxon)
> +               return 1;

This constraint doesn't appear to be architected. Also, this prevents
enabling CR4.CET when in VMX operation, but what about the other way
around (i.e. entering VMX operation with CR4.CET enabled)?

> +       if (cet_on && !cpu_x86_cet_enabled())
> +               return 1;

This seems odd. Why is kernel support for (SHSTK or IBT) required for
the guest to use (SHSTK or IBT)? If there's a constraint, shouldn't it
be 1:1? (i.e. kernel support for SHSTK is required for the guest to
use SHSTK and kernel support for IBT is required for the guest to use
IBT?) Either way, enforcement of this constraint seems late, here,
when the guest is trying to set CR4 to a value that, per the guest
CPUID information, should be legal. Shouldn't this constraint be
applied when setting the guest CPUID information, disallowing the
enumeration of SHSTK and/or IBT support on a platform where these
features are unavailable or disabled in the kernel?

> +       if (cet_on && !cet_xss)
> +               return 1;

Again, this constraint seems like it's being applied too late.
Returning 1 here will result in the guest's MOV-to-CR4 raising #GP,
even though there is no architected reason for it to do so.

> +       if (cet_on && !cet_cpuid)
> +               return 1;

What about the constraint that CR4.CET can't be set when CR0.WP is
clear? (And the reverse needs to be handled in vmx_set_cr0).

> +       if (cet_on)
> +               vmcs_set_bits(VM_ENTRY_CONTROLS,
> +                             VM_ENTRY_LOAD_GUEST_CET_STATE);

Have we ensured that this VM-entry control is supported on the platform?

> +       else
> +               vmcs_clear_bits(VM_ENTRY_CONTROLS,
> +                               VM_ENTRY_LOAD_GUEST_CET_STATE);
> +       return 0;
> +}
> +
>  int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
>         struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -2958,6 +2990,9 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>                         return 1;
>         }
>
> +       if (set_cet_bit(vcpu, cr4))
> +               return 1;
> +
>         if (vmx->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
>                 return 1;
>
> --
> 2.17.2
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 5/7] kvm: x86: Add CET CR4 bit and XSS support
  2019-09-27  2:19 ` [PATCH v7 5/7] kvm: x86: Add CET CR4 bit and XSS support Yang Weijiang
@ 2019-10-02 19:05   ` Jim Mattson
  2019-10-17 19:56     ` Sean Christopherson
  0 siblings, 1 reply; 43+ messages in thread
From: Jim Mattson @ 2019-10-02 19:05 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> CR4.CET(bit 23) is master enable bit for CET feature.
> Previously, KVM did not support setting any bits in XSS
> so it's hardcoded to check and inject a #GP if Guest
> attempted to write a non-zero value to XSS, now it supports
> CET related bits setting.
>
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  4 +++-
>  arch/x86/kvm/cpuid.c            | 11 +++++++++--
>  arch/x86/kvm/vmx/vmx.c          |  6 +-----
>  3 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d018df8c5f32..8f97269d6d9f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -90,7 +90,8 @@
>                           | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
>                           | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
>                           | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
> -                         | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
> +                         | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
> +                         | X86_CR4_CET))
>
>  #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
>
> @@ -623,6 +624,7 @@ struct kvm_vcpu_arch {
>
>         u64 xcr0;
>         u64 guest_supported_xcr0;
> +       u64 guest_supported_xss;
>         u32 guest_xstate_size;
>
>         struct kvm_pio_request pio;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0a47b9e565be..dd3ddc6daa58 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -120,8 +120,15 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>         }
>
>         best = kvm_find_cpuid_entry(vcpu, 0xD, 1);
> -       if (best && (best->eax & (F(XSAVES) | F(XSAVEC))))
> -               best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> +       if (best && (best->eax & (F(XSAVES) | F(XSAVEC)))) {

Is XSAVEC alone sufficient? Don't we explicitly need XSAVES to
save/restore the extended state components enumerated by IA32_XSS?

> +               u64 kvm_xss = kvm_supported_xss();
> +
> +               best->ebx =
> +                       xstate_required_size(vcpu->arch.xcr0 | kvm_xss, true);

Shouldn't this size be based on the *current* IA32_XSS value, rather
than the supported IA32_XSS bits? (i.e.
s/kvm_xss/vcpu->arch.ia32_xss/)

> +               vcpu->arch.guest_supported_xss = best->ecx & kvm_xss;

Shouldn't unsupported bits in best->ecx be masked off, so that the
guest CPUID doesn't mis-report the capabilities of the vCPU?

> +       } else {
> +               vcpu->arch.guest_supported_xss = 0;
> +       }
>
>         /*
>          * The existing code assumes virtual address is 48-bit or 57-bit in the
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ba1a83d11e69..44913e4ab558 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1973,11 +1973,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                      !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
>                        guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
>                         return 1;
> -               /*
> -                * The only supported bit as of Skylake is bit 8, but
> -                * it is not supported on KVM.
> -                */
> -               if (data != 0)
> +               if (data & ~vcpu->arch.guest_supported_xss)
>                         return 1;
>                 vcpu->arch.ia32_xss = data;
>                 if (vcpu->arch.ia32_xss != host_xss)
> --
> 2.17.2
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 6/7] KVM: x86: Load Guest fpu state when accessing MSRs managed by XSAVES
  2019-09-27  2:19 ` [PATCH v7 6/7] KVM: x86: Load Guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
@ 2019-10-02 19:56   ` Jim Mattson
  2019-10-09  6:46     ` Yang Weijiang
  0 siblings, 1 reply; 43+ messages in thread
From: Jim Mattson @ 2019-10-02 19:56 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> From: Sean Christopherson <sean.j.christopherson@intel.com>
>
> A handful of CET MSRs are not context switched through "traditional"
> methods, e.g. VMCS or manual switching, but rather are passed through
> to the guest and are saved and restored by XSAVES/XRSTORS, i.e. the
> guest's FPU state.
>
> Load the guest's FPU state if userspace is accessing MSRs whose values
> are managed by XSAVES so that the MSR helper, e.g. vmx_{get,set}_msr(),
> can simply do {RD,WR}MSR to access the guest's value.
>
> Note that guest_cpuid_has() is not queried as host userspace is allowed
> to access MSRs that have not been exposed to the guest, e.g. it might do
> KVM_SET_MSRS prior to KVM_SET_CPUID2.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/x86.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 290c3c3efb87..5b8116028a59 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -104,6 +104,8 @@ static void enter_smm(struct kvm_vcpu *vcpu);
>  static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
>  static void store_regs(struct kvm_vcpu *vcpu);
>  static int sync_regs(struct kvm_vcpu *vcpu);
> +static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
> +static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
>
>  struct kvm_x86_ops *kvm_x86_ops __read_mostly;
>  EXPORT_SYMBOL_GPL(kvm_x86_ops);
> @@ -2999,6 +3001,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  }
>  EXPORT_SYMBOL_GPL(kvm_get_msr_common);
>
> +static bool is_xsaves_msr(u32 index)
> +{
> +       return index == MSR_IA32_U_CET ||
> +              (index >= MSR_IA32_PL0_SSP && index <= MSR_IA32_PL3_SSP);
> +}
> +
>  /*
>   * Read or write a bunch of msrs. All parameters are kernel addresses.
>   *
> @@ -3009,11 +3017,23 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
>                     int (*do_msr)(struct kvm_vcpu *vcpu,
>                                   unsigned index, u64 *data))
>  {
> +       bool fpu_loaded = false;
>         int i;
> +       const u64 cet_bits = XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL;
> +       bool cet_xss = kvm_x86_ops->xsaves_supported() &&
> +                      (kvm_supported_xss() & cet_bits);

It seems like I've seen a lot of checks like this. Can this be
simplified (throughout this series) by sinking the
kvm_x86_ops->xsaves_supported() check into kvm_supported_xss()? That
is, shouldn't kvm_supported_xss() return 0 if
kvm_x86_ops->xsaves_supported() is false?

> -       for (i = 0; i < msrs->nmsrs; ++i)
> +       for (i = 0; i < msrs->nmsrs; ++i) {
> +               if (!fpu_loaded && cet_xss &&
> +                   is_xsaves_msr(entries[i].index)) {
> +                       kvm_load_guest_fpu(vcpu);
> +                       fpu_loaded = true;
> +               }
>                 if (do_msr(vcpu, entries[i].index, &entries[i].data))
>                         break;
> +       }
> +       if (fpu_loaded)
> +               kvm_put_guest_fpu(vcpu);
>
>         return i;
>  }
> --
> 2.17.2
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 7/7] KVM: x86: Add user-space access interface for CET MSRs
  2019-09-27  2:19 ` [PATCH v7 7/7] KVM: x86: Add user-space access interface for CET MSRs Yang Weijiang
@ 2019-10-02 20:57   ` Jim Mattson
  2019-10-09  6:56     ` Yang Weijiang
  2019-10-17 19:58     ` Sean Christopherson
  0 siblings, 2 replies; 43+ messages in thread
From: Jim Mattson @ 2019-10-02 20:57 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> There're two different places storing Guest CET states, the states
> managed with XSAVES/XRSTORS, as restored/saved
> in previous patch, can be read/write directly from/to the MSRs.
> For those stored in VMCS fields, they're access via vmcs_read/
> vmcs_write.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 83 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 83 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 44913e4ab558..5265db7cd2af 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1671,6 +1671,49 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
>         return 0;
>  }
>
> +static int check_cet_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

I'd suggest changing return type to bool, since you are essentially
returning true or false.

> +{
> +       u64 kvm_xss = kvm_supported_xss();
> +
> +       switch (msr_info->index) {
> +       case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
> +               if (!(kvm_xss | XFEATURE_MASK_CET_KERNEL))
'|' should be '&'
> +                       return 1;
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> +                       return 1;
> +               break;
> +       case MSR_IA32_PL3_SSP:
> +               if (!(kvm_xss | XFEATURE_MASK_CET_USER))
'|' should be '&'
> +                       return 1;
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> +                       return 1;
> +               break;
> +       case MSR_IA32_U_CET:
> +               if (!(kvm_xss | XFEATURE_MASK_CET_USER))
'|' should be '&'
> +                       return 1;
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> +                       return 1;
> +               break;
> +       case MSR_IA32_S_CET:
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> +                       return 1;
> +               break;
> +       case MSR_IA32_INT_SSP_TAB:
> +               if (!msr_info->host_initiated &&
> +                   !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> +                       return 1;
> +               break;
> +       default:
> +               return 1;
> +       }
> +       return 0;
> +}
>  /*
>   * Reads an msr value (of 'msr_index') into 'pdata'.
>   * Returns 0 on success, non-0 otherwise.
> @@ -1788,6 +1831,26 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                 else
>                         msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>                 break;
> +       case MSR_IA32_S_CET:
> +               if (check_cet_msr(vcpu, msr_info))
> +                       return 1;
> +               msr_info->data = vmcs_readl(GUEST_S_CET);
Have we ensured that this VMCS field exists?
> +               break;
> +       case MSR_IA32_INT_SSP_TAB:
> +               if (check_cet_msr(vcpu, msr_info))
> +                       return 1;
> +               msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
Have we ensured that this VMCS field exists?
> +               break;
> +       case MSR_IA32_U_CET:
Can this be lumped together with MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP, below?
> +               if (check_cet_msr(vcpu, msr_info))
> +                       return 1;
> +               rdmsrl(MSR_IA32_U_CET, msr_info->data);
> +               break;
> +       case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +               if (check_cet_msr(vcpu, msr_info))
> +                       return 1;
> +               rdmsrl(msr_info->index, msr_info->data);
> +               break;
>         case MSR_TSC_AUX:
>                 if (!msr_info->host_initiated &&
>                     !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> @@ -2039,6 +2102,26 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                 else
>                         vmx->pt_desc.guest.addr_a[index / 2] = data;
>                 break;
> +       case MSR_IA32_S_CET:
> +               if (check_cet_msr(vcpu, msr_info))
> +                       return 1;
Bits 9:6 must be zero.
> +               vmcs_writel(GUEST_S_CET, data);
Have we ensured that this VMCS field exists?
> +               break;
> +       case MSR_IA32_INT_SSP_TAB:
> +               if (check_cet_msr(vcpu, msr_info))
> +                       return 1;
Must be canonical. vCPU must support longmode.
> +               vmcs_writel(GUEST_INTR_SSP_TABLE, data);
Have we ensured that this VMCS field exists?
> +               break;
> +       case MSR_IA32_U_CET:
> +               if (check_cet_msr(vcpu, msr_info))
> +                       return 1;
Bits 9:6 must be zero.
> +               wrmsrl(MSR_IA32_U_CET, data);
> +               break;
> +       case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +               if (check_cet_msr(vcpu, msr_info))
> +                       return 1;
'Data' must be canonical and 4-byte aligned. High dword must be zero
on vCPUs that don't support longmode.
> +               wrmsrl(msr_info->index, data);
> +               break;
>         case MSR_TSC_AUX:
>                 if (!msr_info->host_initiated &&
>                     !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> --
> 2.17.2
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 0/7] Introduce support for Guest CET feature
  2019-09-27  2:19 [PATCH v7 0/7] Introduce support for Guest CET feature Yang Weijiang
                   ` (6 preceding siblings ...)
  2019-09-27  2:19 ` [PATCH v7 7/7] KVM: x86: Add user-space access interface for CET MSRs Yang Weijiang
@ 2019-10-02 22:40 ` Jim Mattson
  2019-10-03 13:01   ` Yang Weijiang
  7 siblings, 1 reply; 43+ messages in thread
From: Jim Mattson @ 2019-10-02 22:40 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> Control-flow Enforcement Technology (CET) provides protection against
> Return/Jump-Oriented Programming (ROP/JOP) attack. It includes two
> sub-features: Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT).
>
> KVM modification is required to support Guest CET feature.
> This patch serial implemented CET related CPUID/XSAVES enumeration, MSRs
> and VMEntry configuration etc.so that Guest kernel can setup CET
> runtime infrastructure based on them. Some MSRs and related feature
> flags used in the patches reference the definitions in kernel patch.

I am still trying to make my way through the 358 page (!) spec for
this feature, but I already have some questions/comments about this
series:

1. Does CET "just work" with shadow paging? Shadow paging knows
nothing about "shadow-stack pages," and it's not clear to me how
shadow-stack pages will interact with dirty tracking.
2. I see non-trivial changes to task switch under CET. Does
emulator_do_task_switch need to be updated?
3. What about all of the emulator routines that emulate control
transfers (e.g. em_jmp_{far,abs}, em_call_(near_abs,far},
em_ret_{far,far_imm,near_imm}, etc)? Don't these have to be modified
to work correctly when CR4.CET is set?
4. You don't use the new "enable supervisor shadow stack control" bit
in the EPTP. I assume that this is entirely optional, right?
5. I think the easiest way to handle the nested issue (rather than
your explicit check for vmxon when setting CR4.CET when the vCPU is in
VMX operation) is just to leave CR4.CET out of IA32_VMX_CR4_FIXED1
(which is already the case).
6. The function, exception_class(), in x86.c, should be updated to
categorize #CP as contributory.
7. The function, x86_exception_has_error_code(), in x86.h, should be
updated to include #CP.
8. There appear to be multiple changes to SMM that you haven't
implemented (e.g saving/restoring the SSP registers in/from SMRAM.

CET is quite complex. Without any tests, I don't see how you can have
any confidence in the correctness of this patch series.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 0/7] Introduce support for Guest CET feature
  2019-10-02 22:40 ` [PATCH v7 0/7] Introduce support for Guest CET feature Jim Mattson
@ 2019-10-03 13:01   ` Yang Weijiang
  2019-10-03 16:33     ` Jim Mattson
  0 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-10-03 13:01 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 02, 2019 at 03:40:20PM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > Control-flow Enforcement Technology (CET) provides protection against
> > Return/Jump-Oriented Programming (ROP/JOP) attack. It includes two
> > sub-features: Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT).
> >
> > KVM modification is required to support Guest CET feature.
> > This patch serial implemented CET related CPUID/XSAVES enumeration, MSRs
> > and VMEntry configuration etc.so that Guest kernel can setup CET
> > runtime infrastructure based on them. Some MSRs and related feature
> > flags used in the patches reference the definitions in kernel patch.
> 
> I am still trying to make my way through the 358 page (!) spec for
> this feature, but I already have some questions/comments about this
> series:
> 
> 1. Does CET "just work" with shadow paging? Shadow paging knows
> nothing about "shadow-stack pages," and it's not clear to me how
> shadow-stack pages will interact with dirty tracking.
> 2. I see non-trivial changes to task switch under CET. Does
> emulator_do_task_switch need to be updated?
> 3. What about all of the emulator routines that emulate control
> transfers (e.g. em_jmp_{far,abs}, em_call_(near_abs,far},
> em_ret_{far,far_imm,near_imm}, etc)? Don't these have to be modified
> to work correctly when CR4.CET is set?
> 4. You don't use the new "enable supervisor shadow stack control" bit
> in the EPTP. I assume that this is entirely optional, right?
> 5. I think the easiest way to handle the nested issue (rather than
> your explicit check for vmxon when setting CR4.CET when the vCPU is in
> VMX operation) is just to leave CR4.CET out of IA32_VMX_CR4_FIXED1
> (which is already the case).
> 6. The function, exception_class(), in x86.c, should be updated to
> categorize #CP as contributory.
> 7. The function, x86_exception_has_error_code(), in x86.h, should be
> updated to include #CP.
> 8. There appear to be multiple changes to SMM that you haven't
> implemented (e.g saving/restoring the SSP registers in/from SMRAM.
> 
> CET is quite complex. Without any tests, I don't see how you can have
> any confidence in the correctness of this patch series.
Thanks Jim for the detailed comments. 

I missed adding test platform and
result introduction in cover letter. This serial of patch has passed CET
test in guest on Intel x86 emulator platform and develop machine. 
Some feature mentioned in the spec. has not been implemented yet. e.g., 
"supervisor shadow stack control". 

CET feature itself is complex, most of the enabling work is
inside kernel, the role of KVM is to expose CET related CPUID and MSRs
etc. to guest, and make guest take over control of the MSRs directly so that
CET can work efficiently for guest. There're QEMU patches for CET too.

I'll review your comments carefully, thank you again!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 0/7] Introduce support for Guest CET feature
  2019-10-03 13:01   ` Yang Weijiang
@ 2019-10-03 16:33     ` Jim Mattson
  2019-10-08  8:50       ` Yang Weijiang
  0 siblings, 1 reply; 43+ messages in thread
From: Jim Mattson @ 2019-10-03 16:33 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Oct 3, 2019 at 5:59 AM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> On Wed, Oct 02, 2019 at 03:40:20PM -0700, Jim Mattson wrote:
> > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > >
> > > Control-flow Enforcement Technology (CET) provides protection against
> > > Return/Jump-Oriented Programming (ROP/JOP) attack. It includes two
> > > sub-features: Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT).
> > >
> > > KVM modification is required to support Guest CET feature.
> > > This patch serial implemented CET related CPUID/XSAVES enumeration, MSRs
> > > and VMEntry configuration etc.so that Guest kernel can setup CET
> > > runtime infrastructure based on them. Some MSRs and related feature
> > > flags used in the patches reference the definitions in kernel patch.
> >
> > I am still trying to make my way through the 358 page (!) spec for
> > this feature, but I already have some questions/comments about this
> > series:
> >
> > 1. Does CET "just work" with shadow paging? Shadow paging knows
> > nothing about "shadow-stack pages," and it's not clear to me how
> > shadow-stack pages will interact with dirty tracking.
> > 2. I see non-trivial changes to task switch under CET. Does
> > emulator_do_task_switch need to be updated?
> > 3. What about all of the emulator routines that emulate control
> > transfers (e.g. em_jmp_{far,abs}, em_call_(near_abs,far},
> > em_ret_{far,far_imm,near_imm}, etc)? Don't these have to be modified
> > to work correctly when CR4.CET is set?
> > 4. You don't use the new "enable supervisor shadow stack control" bit
> > in the EPTP. I assume that this is entirely optional, right?
> > 5. I think the easiest way to handle the nested issue (rather than
> > your explicit check for vmxon when setting CR4.CET when the vCPU is in
> > VMX operation) is just to leave CR4.CET out of IA32_VMX_CR4_FIXED1
> > (which is already the case).
> > 6. The function, exception_class(), in x86.c, should be updated to
> > categorize #CP as contributory.
> > 7. The function, x86_exception_has_error_code(), in x86.h, should be
> > updated to include #CP.
> > 8. There appear to be multiple changes to SMM that you haven't
> > implemented (e.g saving/restoring the SSP registers in/from SMRAM.
> >
> > CET is quite complex. Without any tests, I don't see how you can have
> > any confidence in the correctness of this patch series.
> Thanks Jim for the detailed comments.
>
> I missed adding test platform and
> result introduction in cover letter. This serial of patch has passed CET
> test in guest on Intel x86 emulator platform and develop machine.
> Some feature mentioned in the spec. has not been implemented yet. e.g.,
> "supervisor shadow stack control".

What I should have said is that I'd like to see kvm-unit-tests to
exercise the new functionality, even if no one outside Intel can run
them yet.

> CET feature itself is complex, most of the enabling work is
> inside kernel, the role of KVM is to expose CET related CPUID and MSRs
> etc. to guest, and make guest take over control of the MSRs directly so that
> CET can work efficiently for guest. There're QEMU patches for CET too.
>
> I'll review your comments carefully, thank you again!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration
  2019-10-02 17:26   ` Jim Mattson
@ 2019-10-08  8:30     ` Yang Weijiang
  2019-10-17 19:46     ` Sean Christopherson
  1 sibling, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-08  8:30 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 02, 2019 at 10:26:10AM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > The control bits in IA32_XSS MSR are being used for new features,
> > but current CPUID(0xd,i) enumeration code doesn't support them, so
> > fix existing code first.
> >
> > The supervisor states in IA32_XSS haven't been used in public
> > KVM code, so set KVM_SUPPORTED_XSS to 0 now, anyone who's developing
> > IA32_XSS related feature may expand the macro to add the CPUID support,
> > otherwise, CPUID(0xd,i>1) always reports 0 of the subleaf to guest.
> >
> > Extracted old code into a new filter and keep it same flavor as others.
> >
> > This patch passed selftest on a few Intel platforms.
> >
> > Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  1 +
> >  arch/x86/kvm/cpuid.c            | 94 +++++++++++++++++++++------------
> >  arch/x86/kvm/svm.c              |  7 +++
> >  arch/x86/kvm/vmx/vmx.c          |  6 +++
> >  arch/x86/kvm/x86.h              |  7 +++
> >  5 files changed, 82 insertions(+), 33 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 74e88e5edd9c..d018df8c5f32 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1209,6 +1209,7 @@ struct kvm_x86_ops {
> >         uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
> >
> >         bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
> > +       u64 (*supported_xss)(void);
> >  };
> >
> >  struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 22c2720cd948..9d282fec0a62 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -62,6 +62,11 @@ u64 kvm_supported_xcr0(void)
> >         return xcr0;
> >  }
> >
> > +u64 kvm_supported_xss(void)
> > +{
> > +       return KVM_SUPPORTED_XSS & kvm_x86_ops->supported_xss();
> > +}
> > +
> >  #define F(x) bit(X86_FEATURE_##x)
> >
> >  int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> > @@ -414,6 +419,50 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
> >         }
> >  }
> >
> > +static inline void do_cpuid_0xd_mask(struct kvm_cpuid_entry2 *entry, int index)
> > +{
> > +       unsigned int f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
> 
> Does Intel have CPUs that support XSAVES but don't support the "enable
> XSAVES/XRSTORS" VM-execution control? If so, what is the behavior of
> XSAVESXRSTORS on those CPUs in VMX non-root mode? If not, why is this
> conditional F(XSAVES) here?
Thanks Jim for raising the question.
From SDM 25.3, if XSAVES/XRSTORS" VM-execution control == 0, these
instructions cause #UD.
> 
> > +       /* cpuid 0xD.1.eax */
> > +       const u32 kvm_cpuid_D_1_eax_x86_features =
> > +               F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
> > +       u64 u_supported = kvm_supported_xcr0();
> > +       u64 s_supported = kvm_supported_xss();
> > +       u64 supported;
> > +
> > +       switch (index) {
> > +       case 0:
> > +               entry->eax &= u_supported;
> > +               entry->ebx = xstate_required_size(u_supported, false);
> 
> EBX could actually be zero, couldn't it? Since this output is
> context-dependent, I'm not sure how to interpret it when returned from
> KVM_GET_SUPPORTED_CPUID.
I did't get your concern, What's context_dependent? IIUC, EBX actually
cannot be 0 since x87 and SSE states are always there.
> 
> > +               entry->ecx = entry->ebx;
> > +               entry->edx = 0;
> 
> Shouldn't this be: entry->edx &= u_supported >> 32?
Yes, but the SDM makes me puzzled, it says at the tail: Bits 31 - 00: Reserved,
Maybe I need to follow your suggestion in next version, thanks!
> > +               break;
> > +       case 1:
> > +               supported = u_supported | s_supported;
> > +               entry->eax &= kvm_cpuid_D_1_eax_x86_features;
> > +               cpuid_mask(&entry->eax, CPUID_D_1_EAX);
> > +               entry->ebx = 0;
> > +               entry->edx = 0;
> 
> Shouldn't this be: entry->edx &= s_supported >> 32?
> 
The same as above.
> > +               entry->ecx &= s_supported;
> > +               if (entry->eax & (F(XSAVES) | F(XSAVEC)))
> > +                       entry->ebx = xstate_required_size(supported, true);
> 
> As above, can't EBX just be zero, since it's context-dependent? What
> is the context when processing KVM_GET_SUPPORTED_CPUID? And why do we
> only fill this in when XSAVES or XSAVEC is supported?
> 
IIUC, EBX here reports the states(XCRO | IA32_XSS) in compacted format.
so it's only available when F(XSAVES) or F(XSAVEC) is on.
> > +               break;
> > +       default:
> > +               supported = (entry->ecx & 1) ? s_supported : u_supported;
> > +               if (!(supported & ((u64)1 << index))) {
> 
> Nit: 1ULL << index.
right, thank you.
> 
> > +                       entry->eax = 0;
> > +                       entry->ebx = 0;
> > +                       entry->ecx = 0;
> > +                       entry->edx = 0;
> > +                       return;
> > +               }
> > +               if (entry->ecx)
> > +                       entry->ebx = 0;
> 
> This seems to back up my claims above regarding the EBX output for
> cases 0 and 1, but aside from those subleaves, is this correct? For
> subleaves > 1, ECX bit 1 can be set for extended state components that
> need to be cache-line aligned. Such components could map to a valid
> bit in XCR0 and have a non-zero offset from the beginning of the
> non-compacted XSAVE area.
> 
thank you, should check bit 0 of ecx before clear ebx.
> > +               entry->edx = 0;
> 
> This seems too aggressive. See my comments above regarding EDX outputs
> for cases 0 and 1.
> 
OK, I'll change it together with other parts.
> > +               break;
> > +       }
> > +}
> > +
> >  static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> >                                   int *nent, int maxnent)
> >  {
> > @@ -428,7 +477,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> >         unsigned f_lm = 0;
> >  #endif
> >         unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0;
> > -       unsigned f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
> >         unsigned f_intel_pt = kvm_x86_ops->pt_supported() ? F(INTEL_PT) : 0;
> >
> >         /* cpuid 1.edx */
> > @@ -482,10 +530,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> >                 F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
> >                 F(PMM) | F(PMM_EN);
> >
> > -       /* cpuid 0xD.1.eax */
> > -       const u32 kvm_cpuid_D_1_eax_x86_features =
> > -               F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
> > -
> >         /* all calls to cpuid_count() should be made on the same cpu */
> >         get_cpu();
> >
> > @@ -622,38 +666,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> >                 break;
> >         }
> >         case 0xd: {
> > -               int idx, i;
> > -               u64 supported = kvm_supported_xcr0();
> > -
> > -               entry->eax &= supported;
> > -               entry->ebx = xstate_required_size(supported, false);
> > -               entry->ecx = entry->ebx;
> > -               entry->edx &= supported >> 32;
> > -               if (!supported)
> > -                       break;
> > +               int i, idx;
> >
> > -               for (idx = 1, i = 1; idx < 64; ++idx) {
> > -                       u64 mask = ((u64)1 << idx);
> > +               do_cpuid_0xd_mask(&entry[0], 0);
> > +               for (i = 1, idx = 1; idx < 64; ++idx) {
> >                         if (*nent >= maxnent)
> >                                 goto out;
> > -
> >                         do_host_cpuid(&entry[i], function, idx);
> > -                       if (idx == 1) {
> > -                               entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
> > -                               cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
> > -                               entry[i].ebx = 0;
> > -                               if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
> > -                                       entry[i].ebx =
> > -                                               xstate_required_size(supported,
> > -                                                                    true);
> > -                       } else {
> > -                               if (entry[i].eax == 0 || !(supported & mask))
> > -                                       continue;
> > -                               if (WARN_ON_ONCE(entry[i].ecx & 1))
> > -                                       continue;
> > -                       }
> > -                       entry[i].ecx = 0;
> > -                       entry[i].edx = 0;
> > +                       if (entry[i].eax == 0 && entry[i].ebx == 0 &&
> > +                           entry[i].ecx == 0 && entry[i].edx == 0)
> > +                               continue;
> > +
> > +                       do_cpuid_0xd_mask(&entry[i], idx);
> > +                       if (entry[i].eax == 0 && entry[i].ebx == 0 &&
> > +                           entry[i].ecx == 0 && entry[i].edx == 0)
> > +                               continue;
> > +
> >                         ++*nent;
> >                         ++i;
> >                 }
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index e0368076a1ef..be967bf9a81d 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -7193,6 +7193,11 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
> >         return false;
> >  }
> >
> > +static u64 svm_supported_xss(void)
> > +{
> > +       return 0;
> > +}
> > +
> >  static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >         .cpu_has_kvm_support = has_svm,
> >         .disabled_by_bios = is_disabled,
> > @@ -7329,6 +7334,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> >         .nested_get_evmcs_version = NULL,
> >
> >         .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > +
> > +       .supported_xss = svm_supported_xss,
> >  };
> >
> >  static int __init svm_init(void)
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index c6f6b05004d9..a84198cff397 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1651,6 +1651,11 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> >         return !(val & ~valid_bits);
> >  }
> >
> > +static inline u64 vmx_supported_xss(void)
> > +{
> > +       return host_xss;
> > +}
> 
> Do you really need vendor-specific code for this? Can't you just hoist
> host_xss into common code (x86.c) and use that? [Note that Aaron Lewis
> is currently working on a series that will include that hoisting, if
> you want to wait.]
> 
> 
> >  static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> >  {
> >         switch (msr->index) {
> > @@ -7799,6 +7804,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
> >         .nested_enable_evmcs = NULL,
> >         .nested_get_evmcs_version = NULL,
> >         .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > +       .supported_xss = vmx_supported_xss,
> >  };
> >
> >  static void vmx_cleanup_l1d_flush(void)
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > index 6594020c0691..fbffabad0370 100644
> > --- a/arch/x86/kvm/x86.h
> > +++ b/arch/x86/kvm/x86.h
> > @@ -293,6 +293,13 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
> >                                 | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
> >                                 | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> >                                 | XFEATURE_MASK_PKRU)
> > +
> > +/*
> > + * Right now, no XSS states are used on x86 platform,
> > + * expand the macro for new features.
> > + */
> > +#define KVM_SUPPORTED_XSS      (0)
> > +
> 
> Nit: superfluous parentheses.
> 
> >  extern u64 host_xcr0;
> >
> >  extern u64 kvm_supported_xcr0(void);
> > --
> > 2.17.2
> >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 0/7] Introduce support for Guest CET feature
  2019-10-03 16:33     ` Jim Mattson
@ 2019-10-08  8:50       ` Yang Weijiang
  0 siblings, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-08  8:50 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář

On Thu, Oct 03, 2019 at 09:33:45AM -0700, Jim Mattson wrote:
> On Thu, Oct 3, 2019 at 5:59 AM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > On Wed, Oct 02, 2019 at 03:40:20PM -0700, Jim Mattson wrote:
> > > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > > >
> > > > Control-flow Enforcement Technology (CET) provides protection against
> > > > Return/Jump-Oriented Programming (ROP/JOP) attack. It includes two
> > > > sub-features: Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT).
> > > >
> > > > KVM modification is required to support Guest CET feature.
> > > > This patch serial implemented CET related CPUID/XSAVES enumeration, MSRs
> > > > and VMEntry configuration etc.so that Guest kernel can setup CET
> > > > runtime infrastructure based on them. Some MSRs and related feature
> > > > flags used in the patches reference the definitions in kernel patch.
> > >
> > > I am still trying to make my way through the 358 page (!) spec for
> > > this feature, but I already have some questions/comments about this
> > > series:
> > >
> > > 1. Does CET "just work" with shadow paging? Shadow paging knows
> > > nothing about "shadow-stack pages," and it's not clear to me how
> > > shadow-stack pages will interact with dirty tracking.
> > > 2. I see non-trivial changes to task switch under CET. Does
> > > emulator_do_task_switch need to be updated?
> > > 3. What about all of the emulator routines that emulate control
> > > transfers (e.g. em_jmp_{far,abs}, em_call_(near_abs,far},
> > > em_ret_{far,far_imm,near_imm}, etc)? Don't these have to be modified
> > > to work correctly when CR4.CET is set?
> > > 4. You don't use the new "enable supervisor shadow stack control" bit
> > > in the EPTP. I assume that this is entirely optional, right?
> > > 5. I think the easiest way to handle the nested issue (rather than
> > > your explicit check for vmxon when setting CR4.CET when the vCPU is in
> > > VMX operation) is just to leave CR4.CET out of IA32_VMX_CR4_FIXED1
> > > (which is already the case).
> > > 6. The function, exception_class(), in x86.c, should be updated to
> > > categorize #CP as contributory.
> > > 7. The function, x86_exception_has_error_code(), in x86.h, should be
> > > updated to include #CP.
> > > 8. There appear to be multiple changes to SMM that you haven't
> > > implemented (e.g saving/restoring the SSP registers in/from SMRAM.
> > >
> > > CET is quite complex. Without any tests, I don't see how you can have
> > > any confidence in the correctness of this patch series.
> > Thanks Jim for the detailed comments.
> >
> > I missed adding test platform and
> > result introduction in cover letter. This serial of patch has passed CET
> > test in guest on Intel x86 emulator platform and develop machine.
> > Some feature mentioned in the spec. has not been implemented yet. e.g.,
> > "supervisor shadow stack control".
> 
> What I should have said is that I'd like to see kvm-unit-tests to
> exercise the new functionality, even if no one outside Intel can run
> them yet.
>
OK, let me figure out how to test it either in unit-test or selftest.
Thank you!
> > CET feature itself is complex, most of the enabling work is
> > inside kernel, the role of KVM is to expose CET related CPUID and MSRs
> > etc. to guest, and make guest take over control of the MSRs directly so that
> > CET can work efficiently for guest. There're QEMU patches for CET too.
> >
> > I'll review your comments carefully, thank you again!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 2/7] kvm: vmx: Define CET VMCS fields and CPUID flags
  2019-10-02 18:04   ` Jim Mattson
@ 2019-10-09  5:56     ` Yang Weijiang
  0 siblings, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-09  5:56 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář,
	Aaron Lewis

On Wed, Oct 02, 2019 at 11:04:07AM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > CET(Control-flow Enforcement Technology) is an upcoming Intel(R)
> > processor feature that blocks Return/Jump-Oriented Programming(ROP)
> > attacks. It provides the following capabilities to defend
> > against ROP/JOP style control-flow subversion attacks:
> >  /*
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 9d282fec0a62..1aa86b87b6ab 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -365,13 +365,13 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
> >                 F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
> >                 F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> >                 F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> > -               F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
> > +               F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | F(SHSTK);
> >
> >         /* cpuid 7.0.edx*/
> >         const u32 kvm_cpuid_7_0_edx_x86_features =
> >                 F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> >                 F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> > -               F(MD_CLEAR);
> > +               F(MD_CLEAR) | F(IBT);
> 
> Claiming that SHSTK and IBT are supported in the guest seems premature
> as of this change, since you haven't actually done anything to
> virtualize the features yet.
>
OK, will put the flags in other patch.
> >         /* cpuid 7.1.eax */
> >         const u32 kvm_cpuid_7_1_eax_x86_features =
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > index fbffabad0370..a85800b23e6e 100644
> > --- a/arch/x86/kvm/x86.h
> > +++ b/arch/x86/kvm/x86.h
> > @@ -298,7 +298,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
> >   * Right now, no XSS states are used on x86 platform,
> >   * expand the macro for new features.
> >   */
> > -#define KVM_SUPPORTED_XSS      (0)
> > +#define KVM_SUPPORTED_XSS      (XFEATURE_MASK_CET_USER \
> > +                               | XFEATURE_MASK_CET_KERNEL)
> 
> If IA32_XSS can dynamically change within the guest, it will have to
> be enumerated by KVM_GET_MSR_INDEX_LIST.
thanks for pointing it out, need to add IA32_XSS to msrs_to_save list.

>(Note that Aaron Lewis is
> working on a series which will include that enumeration, if you'd like
> to wait.) I'm also not convinced that there is sufficient
> virtualization of these features to allow these bits to be set in the
> guest IA32_XSS at this point.
> 
It's true CET is working in guest after I added XSS/XSAVES support in
KVM and QEMU, but I'd like to look at Aaron's new patch...

> >  extern u64 host_xcr0;
> >
> > --
> > 2.17.2
> >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest
  2019-10-02 18:18   ` Jim Mattson
@ 2019-10-09  6:15     ` Yang Weijiang
  2019-10-10 19:04       ` Jim Mattson
  2019-10-17 20:04       ` Sean Christopherson
  0 siblings, 2 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-09  6:15 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 02, 2019 at 11:18:32AM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > CET MSRs pass through Guest directly to enhance performance.
> > CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> > Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> > SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> > these MSRs are defined in kernel and re-used here.
> 
> All of these new guest MSRs will have to be enumerated by
> KVM_GET_MSR_INDEX_LIST.
> 
Since CET feature is Intel platform specific, but looks like KVM_GET_MSR_INDEX_LIST
fetchs x86 common MSRs, I have patch in QEMU to support CET
MSRs, the patch is here:
https://patchwork.ozlabs.org/patch/1058265/

> > MSR_IA32_U_CET and MSR_IA32_PL3_SSP are used for user mode protection,
> > the contents could differ from process to process, therefore,
> > kernel needs to save/restore them during context switch, it makes
> > sense to pass through them so that the guest kernel can
> > use xsaves/xrstors to operate them efficiently. Other MSRs are used
> > for non-user mode protection. See CET spec for detailed info.
> 
> I assume that XSAVES & XRSTORS bypass the MSR permission bitmap, like
> other instructions that manipulate MSRs (e.g. SWAPGS, RDTSCP, etc.).
> Is the guest OS likely to use RDMSR/WRMSR to access these MSRs?
> 
Yes, exactly, you may check the CET kernel code.

> > The difference between CET VMCS state fields and xsave components is that,
> > the former used for CET state storage during VMEnter/VMExit,
> > whereas the latter used for state retention between Guest task/process
> > switch.
> >
> > Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> >  arch/x86/kvm/cpuid.c   |  1 +
> >  arch/x86/kvm/cpuid.h   |  2 ++
> >  arch/x86/kvm/vmx/vmx.c | 39 +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 42 insertions(+)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 1aa86b87b6ab..0a47b9e565be 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -66,6 +66,7 @@ u64 kvm_supported_xss(void)
> >  {
> >         return KVM_SUPPORTED_XSS & kvm_x86_ops->supported_xss();
> >  }
> > +EXPORT_SYMBOL_GPL(kvm_supported_xss);
> >
> >  #define F(x) bit(X86_FEATURE_##x)
> >
> > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > index d78a61408243..1d77b880084d 100644
> > --- a/arch/x86/kvm/cpuid.h
> > +++ b/arch/x86/kvm/cpuid.h
> > @@ -27,6 +27,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> >
> >  int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
> >
> > +u64 kvm_supported_xss(void);
> > +
> >  static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
> >  {
> >         return vcpu->arch.maxphyaddr;
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index a84198cff397..f720baa7a9ba 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7001,6 +7001,43 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
> >                 vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
> >  }
> >
> > +static void vmx_intercept_cet_msrs(struct kvm_vcpu *vcpu)
> 
> Nit: It seems like this function adjusts the MSR permission bitmap so
> as *not* to intercept the CET MSRs.
>
OK, will rename it.
> > +{
> > +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +       unsigned long *msr_bitmap;
> > +       u64 kvm_xss;
> > +       bool cet_en;
> > +
> > +       msr_bitmap = vmx->vmcs01.msr_bitmap;
> 
> What about nested guests? (i.e. vmcs02).
> 
Hmm, I need to check the nested case, thank you.

> > +       kvm_xss = kvm_supported_xss();
> > +       cet_en = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
> > +                guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> > +       /*
> > +        * U_CET is a must for USER CET, per CET spec., U_CET and PL3_SPP are
> > +        * a bundle for USER CET xsaves.
> > +        */
> > +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_USER)) {
> > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
> > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> > +       }
> 
> Since this is called from vmx_cpuid_update, what happens if cet_en was
> previously true and now it's false?
> 
Yes, it's likely, but guest CPUID usually is fixed before
guest is launched, do you have any suggestion?
> > +       /*
> > +        * S_CET is a must for KERNEL CET, PL0_SSP ... PL2_SSP are a bundle
> > +        * for CET KERNEL xsaves.
> > +        */
> > +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_KERNEL)) {
> > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
> > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW);
> > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW);
> > +
> > +               /* SSP_TAB only available for KERNEL SHSTK.*/
> > +               if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > +                       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
> > +                                                     MSR_TYPE_RW);
> > +       }
> > +}
> > +
> >  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> >  {
> >         struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -7025,6 +7062,8 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> >         if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> >                         guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> >                 update_intel_pt_cfg(vcpu);
> > +
> > +       vmx_intercept_cet_msrs(vcpu);
> >  }
> >
> >  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> > --
> > 2.17.2
> >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest
  2019-10-02 18:54   ` Jim Mattson
@ 2019-10-09  6:43     ` Yang Weijiang
  2019-10-09 23:08       ` Jim Mattson
  0 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-10-09  6:43 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 02, 2019 at 11:54:26AM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > "Load Guest CET state" bit controls whether Guest CET states
> > will be loaded at Guest entry. Before doing that, KVM needs
> > to check if CPU CET feature is enabled on host and available
> > to Guest.
> >
> > Note: SHSTK and IBT features share one control MSR:
> > MSR_IA32_{U,S}_CET, which means it's difficult to hide
> > one feature from another in the case of SHSTK != IBT,
> > after discussed in community, it's agreed to allow Guest
> > control two features independently as it won't introduce
> > security hole.
> >
> > Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
> >  1 file changed, 35 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index f720baa7a9ba..ba1a83d11e69 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -44,6 +44,7 @@
> >  #include <asm/spec-ctrl.h>
> >  #include <asm/virtext.h>
> >  #include <asm/vmx.h>
> > +#include <asm/cet.h>
> >
> >  #include "capabilities.h"
> >  #include "cpuid.h"
> > @@ -2918,6 +2919,37 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >         vmcs_writel(GUEST_CR3, guest_cr3);
> >  }
> >
> > +static int set_cet_bit(struct kvm_vcpu *vcpu, unsigned long cr4)
> 
> Nit: This function does not appear to set CR4.CET, as the name would imply.
>
OK, will change it, thank you!
> > +{
> > +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +       const u64 cet_bits = XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL;
> > +       bool cet_xss = vmx_xsaves_supported() &&
> > +                      (kvm_supported_xss() & cet_bits);
> > +       bool cet_cpuid = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
> > +                        guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> > +       bool cet_on = !!(cr4 & X86_CR4_CET);
> > +
> > +       if (cet_on && vmx->nested.vmxon)
> > +               return 1;
> 
> This constraint doesn't appear to be architected. Also, this prevents
> enabling CR4.CET when in VMX operation, but what about the other way
> around (i.e. entering VMX operation with CR4.CET enabled)?
> 
Yes, will think more for nested case in next release.

> > +       if (cet_on && !cpu_x86_cet_enabled())
> > +               return 1;
> 
> This seems odd. Why is kernel support for (SHSTK or IBT) required for
> the guest to use (SHSTK or IBT)? If there's a constraint, shouldn't it
> be 1:1? (i.e. kernel support for SHSTK is required for the guest to
> use SHSTK and kernel support for IBT is required for the guest to use
> IBT?) Either way, enforcement of this constraint seems late, here,
> when the guest is trying to set CR4 to a value that, per the guest
> CPUID information, should be legal. Shouldn't this constraint be
> applied when setting the guest CPUID information, disallowing the
> enumeration of SHSTK and/or IBT support on a platform where these
> features are unavailable or disabled in the kernel?
> 
In KVM do_cpuid_7_mask(), SHSTK/IBT flags are emulated with
cpuid(0x7,0), there in cpuid_mask(), it'll enforce the check against
host kernel CET status,
and host boot_cpu_data.x86_capability[X86_FEATURE_SHSTK/IBT] will be cleared
during host boot up if the feature is disabled or unavailable.

You may check the kernel CET patch.

Rgarding the 1:1 check, I'll add more strict check in next version,
thanks!

> > +       if (cet_on && !cet_xss)
> > +               return 1;
> 
> Again, this constraint seems like it's being applied too late.
> Returning 1 here will result in the guest's MOV-to-CR4 raising #GP,
> even though there is no architected reason for it to do so.
> 
see above.
> > +       if (cet_on && !cet_cpuid)
> > +               return 1;
> 
> What about the constraint that CR4.CET can't be set when CR0.WP is
> clear? (And the reverse needs to be handled in vmx_set_cr0).
> 
OK, will check this case, thank you!
> > +       if (cet_on)
> > +               vmcs_set_bits(VM_ENTRY_CONTROLS,
> > +                             VM_ENTRY_LOAD_GUEST_CET_STATE);
> 
> Have we ensured that this VM-entry control is supported on the platform?
> 
If all the checks pass, is it enought to ensure the control bit supported?
> > +       else
> > +               vmcs_clear_bits(VM_ENTRY_CONTROLS,
> > +                               VM_ENTRY_LOAD_GUEST_CET_STATE);
> > +       return 0;
> > +}
> > +
> >  int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >  {
> >         struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -2958,6 +2990,9 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >                         return 1;
> >         }
> >
> > +       if (set_cet_bit(vcpu, cr4))
> > +               return 1;
> > +
> >         if (vmx->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
> >                 return 1;
> >
> > --
> > 2.17.2
> >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 6/7] KVM: x86: Load Guest fpu state when accessing MSRs managed by XSAVES
  2019-10-02 19:56   ` Jim Mattson
@ 2019-10-09  6:46     ` Yang Weijiang
  0 siblings, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-09  6:46 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 02, 2019 at 12:56:30PM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > From: Sean Christopherson <sean.j.christopherson@intel.com>
> >
 >  /*
> >   * Read or write a bunch of msrs. All parameters are kernel addresses.
> >   *
> > @@ -3009,11 +3017,23 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
> >                     int (*do_msr)(struct kvm_vcpu *vcpu,
> >                                   unsigned index, u64 *data))
> >  {
> > +       bool fpu_loaded = false;
> >         int i;
> > +       const u64 cet_bits = XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL;
> > +       bool cet_xss = kvm_x86_ops->xsaves_supported() &&
> > +                      (kvm_supported_xss() & cet_bits);
> 
> It seems like I've seen a lot of checks like this. Can this be
> simplified (throughout this series) by sinking the
> kvm_x86_ops->xsaves_supported() check into kvm_supported_xss()? That
> is, shouldn't kvm_supported_xss() return 0 if
> kvm_x86_ops->xsaves_supported() is false?
>
OK, let me add this check, thank you!

> > -       for (i = 0; i < msrs->nmsrs; ++i)
> > +       for (i = 0; i < msrs->nmsrs; ++i) {
> > +               if (!fpu_loaded && cet_xss &&
> > +                   is_xsaves_msr(entries[i].index)) {
> > +                       kvm_load_guest_fpu(vcpu);
> > +                       fpu_loaded = true;
> > +               }
> >                 if (do_msr(vcpu, entries[i].index, &entries[i].data))
> >                         break;
> > +       }
> > +       if (fpu_loaded)
> > +               kvm_put_guest_fpu(vcpu);
> >
> >         return i;
> >  }
> > --
> > 2.17.2
> >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 7/7] KVM: x86: Add user-space access interface for CET MSRs
  2019-10-02 20:57   ` Jim Mattson
@ 2019-10-09  6:56     ` Yang Weijiang
  2019-10-17 19:58     ` Sean Christopherson
  1 sibling, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-09  6:56 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 02, 2019 at 01:57:50PM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > There're two different places storing Guest CET states, the states
> > managed with XSAVES/XRSTORS, as restored/saved
> > in previous patch, can be read/write directly from/to the MSRs.
> > For those stored in VMCS fields, they're access via vmcs_read/
> > vmcs_write.
> >
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 83 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 83 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 44913e4ab558..5265db7cd2af 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1671,6 +1671,49 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> >         return 0;
> >  }
> >
> > +static int check_cet_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 
> I'd suggest changing return type to bool, since you are essentially
> returning true or false.
>
Sure, will change it.

> > +{
> > +       u64 kvm_xss = kvm_supported_xss();
> > +
> > +       switch (msr_info->index) {
> > +       case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
> > +               if (!(kvm_xss | XFEATURE_MASK_CET_KERNEL))
> '|' should be '&'
Oops, thanks for the capture!

> > +                       return 1;
> > +               if (!msr_info->host_initiated &&
> > +                   !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > +                       return 1;
> > +               break;
> > +       case MSR_IA32_PL3_SSP:
> > +               if (!(kvm_xss | XFEATURE_MASK_CET_USER))
> '|' should be '&'
> > +                       return 1;
> > +               if (!msr_info->host_initiated &&
> > +                   !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > +                       return 1;
> > +               break;
> > +       case MSR_IA32_U_CET:
> > +               if (!(kvm_xss | XFEATURE_MASK_CET_USER))
> '|' should be '&'
> > +                       return 1;
> > +               if (!msr_info->host_initiated &&
> > +                   !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> > +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> > +                       return 1;
> > +               break;
> > +       case MSR_IA32_S_CET:
> > +               if (!msr_info->host_initiated &&
> > +                   !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> > +                   !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> > +                       return 1;
> > +               break;
> > +       case MSR_IA32_INT_SSP_TAB:
> > +               if (!msr_info->host_initiated &&
> > +                   !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > +                       return 1;
> > +               break;
> > +       default:
> > +               return 1;
> > +       }
> > +       return 0;
> > +}
> >  /*
> >   * Reads an msr value (of 'msr_index') into 'pdata'.
> >   * Returns 0 on success, non-0 otherwise.
> > @@ -1788,6 +1831,26 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >                 else
> >                         msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> >                 break;
> > +       case MSR_IA32_S_CET:
> > +               if (check_cet_msr(vcpu, msr_info))
> > +                       return 1;
> > +               msr_info->data = vmcs_readl(GUEST_S_CET);
> Have we ensured that this VMCS field exists?
> > +               break;
> > +       case MSR_IA32_INT_SSP_TAB:
> > +               if (check_cet_msr(vcpu, msr_info))
> > +                       return 1;
> > +               msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
> Have we ensured that this VMCS field exists?
If host kernel sets correct info, we can make sure here.

> > +               break;
> > +       case MSR_IA32_U_CET:
> Can this be lumped together with MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP, below?
Unfortunately, this MSR is not adjacent to below MSRs.

> > +               if (check_cet_msr(vcpu, msr_info))
> > +                       return 1;
> > +               rdmsrl(MSR_IA32_U_CET, msr_info->data);
> > +               break;
> > +       case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> > +               if (check_cet_msr(vcpu, msr_info))
> > +                       return 1;
> > +               rdmsrl(msr_info->index, msr_info->data);
> > +               break;
> >         case MSR_TSC_AUX:
> >                 if (!msr_info->host_initiated &&
> >                     !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> > @@ -2039,6 +2102,26 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >                 else
> >                         vmx->pt_desc.guest.addr_a[index / 2] = data;
> >                 break;
> > +       case MSR_IA32_S_CET:
> > +               if (check_cet_msr(vcpu, msr_info))
> > +                       return 1;
> Bits 9:6 must be zero.
> > +               vmcs_writel(GUEST_S_CET, data);
> Have we ensured that this VMCS field exists?
> > +               break;
> > +       case MSR_IA32_INT_SSP_TAB:
> > +               if (check_cet_msr(vcpu, msr_info))
> > +                       return 1;
> Must be canonical. vCPU must support longmode.
> > +               vmcs_writel(GUEST_INTR_SSP_TABLE, data);
> Have we ensured that this VMCS field exists?
> > +               break;
> > +       case MSR_IA32_U_CET:
> > +               if (check_cet_msr(vcpu, msr_info))
> > +                       return 1;
> Bits 9:6 must be zero.
> > +               wrmsrl(MSR_IA32_U_CET, data);
> > +               break;
> > +       case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> > +               if (check_cet_msr(vcpu, msr_info))
> > +                       return 1;
> 'Data' must be canonical and 4-byte aligned. High dword must be zero
> on vCPUs that don't support longmode.

Thanks a lot for all above captures, will add sanity checks in next
version.

> > +               wrmsrl(msr_info->index, data);
> > +               break;
> >         case MSR_TSC_AUX:
> >                 if (!msr_info->host_initiated &&
> >                     !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> > --
> > 2.17.2
> >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest
  2019-10-09  6:43     ` Yang Weijiang
@ 2019-10-09 23:08       ` Jim Mattson
  2019-10-10  1:30         ` Yang Weijiang
  0 siblings, 1 reply; 43+ messages in thread
From: Jim Mattson @ 2019-10-09 23:08 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Tue, Oct 8, 2019 at 11:41 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> On Wed, Oct 02, 2019 at 11:54:26AM -0700, Jim Mattson wrote:
> > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > > +       if (cet_on)
> > > +               vmcs_set_bits(VM_ENTRY_CONTROLS,
> > > +                             VM_ENTRY_LOAD_GUEST_CET_STATE);
> >
> > Have we ensured that this VM-entry control is supported on the platform?
> >
> If all the checks pass, is it enought to ensure the control bit supported?

I don't think so. The only way to check to see if a VM-entry control
is supported is to check the relevant VMX capability MSR.

BTW, what about the corresponding VM-exit control?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest
  2019-10-09 23:08       ` Jim Mattson
@ 2019-10-10  1:30         ` Yang Weijiang
  2019-10-10 23:44           ` Jim Mattson
  0 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-10-10  1:30 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 09, 2019 at 04:08:50PM -0700, Jim Mattson wrote:
> On Tue, Oct 8, 2019 at 11:41 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > On Wed, Oct 02, 2019 at 11:54:26AM -0700, Jim Mattson wrote:
> > > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > > > +       if (cet_on)
> > > > +               vmcs_set_bits(VM_ENTRY_CONTROLS,
> > > > +                             VM_ENTRY_LOAD_GUEST_CET_STATE);
> > >
> > > Have we ensured that this VM-entry control is supported on the platform?
> > >
> > If all the checks pass, is it enought to ensure the control bit supported?
> 
> I don't think so. The only way to check to see if a VM-entry control
> is supported is to check the relevant VMX capability MSR.
>
It's a bit odd, there's no relevant CET bit in VMX cap. MSR, so I have
to check like this.

> BTW, what about the corresponding VM-exit control?
The kernel supervisor mode CET is not implemented yet, so I don't load host CET
states on VM-exit, in future, I'll add it.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest
  2019-10-09  6:15     ` Yang Weijiang
@ 2019-10-10 19:04       ` Jim Mattson
  2019-10-11  1:51         ` Yang Weijiang
  2019-10-17 20:04       ` Sean Christopherson
  1 sibling, 1 reply; 43+ messages in thread
From: Jim Mattson @ 2019-10-10 19:04 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Tue, Oct 8, 2019 at 11:13 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> On Wed, Oct 02, 2019 at 11:18:32AM -0700, Jim Mattson wrote:
> > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > >
> > > CET MSRs pass through Guest directly to enhance performance.
> > > CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> > > Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> > > SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> > > these MSRs are defined in kernel and re-used here.
> >
> > All of these new guest MSRs will have to be enumerated by
> > KVM_GET_MSR_INDEX_LIST.
> >
> Since CET feature is Intel platform specific, but looks like KVM_GET_MSR_INDEX_LIST
> fetchs x86 common MSRs, I have patch in QEMU to support CET
> MSRs, the patch is here:
> https://patchwork.ozlabs.org/patch/1058265/

Qemu is not the only user of kvm. All possible guest MSRs for the
platform *must* be enumerated by KVM_GET_MSR_INDEX_LIST. A number of
Intel-specific MSRs are already enumerated.

> > > MSR_IA32_U_CET and MSR_IA32_PL3_SSP are used for user mode protection,
> > > the contents could differ from process to process, therefore,
> > > kernel needs to save/restore them during context switch, it makes
> > > sense to pass through them so that the guest kernel can
> > > use xsaves/xrstors to operate them efficiently. Other MSRs are used
> > > for non-user mode protection. See CET spec for detailed info.
> >
> > I assume that XSAVES & XRSTORS bypass the MSR permission bitmap, like
> > other instructions that manipulate MSRs (e.g. SWAPGS, RDTSCP, etc.).
> > Is the guest OS likely to use RDMSR/WRMSR to access these MSRs?
> >
> Yes, exactly, you may check the CET kernel code.
>
> > > The difference between CET VMCS state fields and xsave components is that,
> > > the former used for CET state storage during VMEnter/VMExit,
> > > whereas the latter used for state retention between Guest task/process
> > > switch.
> > >
> > > Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > ---
> > >  arch/x86/kvm/cpuid.c   |  1 +
> > >  arch/x86/kvm/cpuid.h   |  2 ++
> > >  arch/x86/kvm/vmx/vmx.c | 39 +++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 42 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index 1aa86b87b6ab..0a47b9e565be 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -66,6 +66,7 @@ u64 kvm_supported_xss(void)
> > >  {
> > >         return KVM_SUPPORTED_XSS & kvm_x86_ops->supported_xss();
> > >  }
> > > +EXPORT_SYMBOL_GPL(kvm_supported_xss);
> > >
> > >  #define F(x) bit(X86_FEATURE_##x)
> > >
> > > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > > index d78a61408243..1d77b880084d 100644
> > > --- a/arch/x86/kvm/cpuid.h
> > > +++ b/arch/x86/kvm/cpuid.h
> > > @@ -27,6 +27,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> > >
> > >  int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
> > >
> > > +u64 kvm_supported_xss(void);
> > > +
> > >  static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
> > >  {
> > >         return vcpu->arch.maxphyaddr;
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index a84198cff397..f720baa7a9ba 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -7001,6 +7001,43 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
> > >                 vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
> > >  }
> > >
> > > +static void vmx_intercept_cet_msrs(struct kvm_vcpu *vcpu)
> >
> > Nit: It seems like this function adjusts the MSR permission bitmap so
> > as *not* to intercept the CET MSRs.
> >
> OK, will rename it.
> > > +{
> > > +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > +       unsigned long *msr_bitmap;
> > > +       u64 kvm_xss;
> > > +       bool cet_en;
> > > +
> > > +       msr_bitmap = vmx->vmcs01.msr_bitmap;
> >
> > What about nested guests? (i.e. vmcs02).
> >
> Hmm, I need to check the nested case, thank you.
>
> > > +       kvm_xss = kvm_supported_xss();
> > > +       cet_en = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
> > > +                guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> > > +       /*
> > > +        * U_CET is a must for USER CET, per CET spec., U_CET and PL3_SPP are
> > > +        * a bundle for USER CET xsaves.
> > > +        */
> > > +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_USER)) {
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> > > +       }
> >
> > Since this is called from vmx_cpuid_update, what happens if cet_en was
> > previously true and now it's false?
> >
> Yes, it's likely, but guest CPUID usually is fixed before
> guest is launched, do you have any suggestion?

How about an else clause?

> > > +       /*
> > > +        * S_CET is a must for KERNEL CET, PL0_SSP ... PL2_SSP are a bundle
> > > +        * for CET KERNEL xsaves.
> > > +        */
> > > +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_KERNEL)) {
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW);
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW);
> > > +
> > > +               /* SSP_TAB only available for KERNEL SHSTK.*/
> > > +               if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > > +                       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
> > > +                                                     MSR_TYPE_RW);
> > > +       }
> > > +}
> > > +
> > >  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > >  {
> > >         struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > @@ -7025,6 +7062,8 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > >         if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> > >                         guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> > >                 update_intel_pt_cfg(vcpu);
> > > +
> > > +       vmx_intercept_cet_msrs(vcpu);
> > >  }
> > >
> > >  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> > > --
> > > 2.17.2
> > >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest
  2019-10-10  1:30         ` Yang Weijiang
@ 2019-10-10 23:44           ` Jim Mattson
  2019-10-11  1:43             ` Yang Weijiang
  0 siblings, 1 reply; 43+ messages in thread
From: Jim Mattson @ 2019-10-10 23:44 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: kvm list, LKML, Paolo Bonzini, Sean Christopherson,
	Michael S. Tsirkin, Radim Krčmář

On Wed, Oct 9, 2019 at 6:28 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
>
> On Wed, Oct 09, 2019 at 04:08:50PM -0700, Jim Mattson wrote:
> > On Tue, Oct 8, 2019 at 11:41 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > >
> > > On Wed, Oct 02, 2019 at 11:54:26AM -0700, Jim Mattson wrote:
> > > > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > > > > +       if (cet_on)
> > > > > +               vmcs_set_bits(VM_ENTRY_CONTROLS,
> > > > > +                             VM_ENTRY_LOAD_GUEST_CET_STATE);
> > > >
> > > > Have we ensured that this VM-entry control is supported on the platform?
> > > >
> > > If all the checks pass, is it enought to ensure the control bit supported?
> >
> > I don't think so. The only way to check to see if a VM-entry control
> > is supported is to check the relevant VMX capability MSR.
> >
> It's a bit odd, there's no relevant CET bit in VMX cap. MSR, so I have
> to check like this.

Bit 52 of the IA32_VMX_ENTRY_CTLS MSR (index 484H) [and bit 52 of the
IA32_VMX_TRUE_ENTRY_CTLS MSR (index 490H), on hardware that supports
the "true" VMX capability MSRs] will be 1 if it is legal to set bit 20
of the VM-entry controls field to 1.

> > BTW, what about the corresponding VM-exit control?
> The kernel supervisor mode CET is not implemented yet, so I don't load host CET
> states on VM-exit, in future, I'll add it.

If you don't clear the supervisor mode CET state on VM-exit and the
guest has set IA32_S_CET.SH_STK_EN, doesn't that mean that
supervisor-mode shadow stacks will then be enabled on the host after
VM-exit?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest
  2019-10-10 23:44           ` Jim Mattson
@ 2019-10-11  1:43             ` Yang Weijiang
  0 siblings, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-11  1:43 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář

On Thu, Oct 10, 2019 at 04:44:17PM -0700, Jim Mattson wrote:
> On Wed, Oct 9, 2019 at 6:28 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > On Wed, Oct 09, 2019 at 04:08:50PM -0700, Jim Mattson wrote:
> > > On Tue, Oct 8, 2019 at 11:41 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > > >
> > > > On Wed, Oct 02, 2019 at 11:54:26AM -0700, Jim Mattson wrote:
> > > > > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > > > > > +       if (cet_on)
> > > > > > +               vmcs_set_bits(VM_ENTRY_CONTROLS,
> > > > > > +                             VM_ENTRY_LOAD_GUEST_CET_STATE);
> > > > >
> > > > > Have we ensured that this VM-entry control is supported on the platform?
> > > > >
> > > > If all the checks pass, is it enought to ensure the control bit supported?
> > >
> > > I don't think so. The only way to check to see if a VM-entry control
> > > is supported is to check the relevant VMX capability MSR.
> > >
> > It's a bit odd, there's no relevant CET bit in VMX cap. MSR, so I have
> > to check like this.
> 
> Bit 52 of the IA32_VMX_ENTRY_CTLS MSR (index 484H) [and bit 52 of the
> IA32_VMX_TRUE_ENTRY_CTLS MSR (index 490H), on hardware that supports
> the "true" VMX capability MSRs] will be 1 if it is legal to set bit 20
> of the VM-entry controls field to 1.
>
Oh, you meant this, I'll add the check, thanks.

> > > BTW, what about the corresponding VM-exit control?
> > The kernel supervisor mode CET is not implemented yet, so I don't load host CET
> > states on VM-exit, in future, I'll add it.
> 
> If you don't clear the supervisor mode CET state on VM-exit and the
> guest has set IA32_S_CET.SH_STK_EN, doesn't that mean that
> supervisor-mode shadow stacks will then be enabled on the host after
> VM-exit?

Yeah, I should clear the MSR on VM-exit before supervisor-mode is
implemented. Thank you!

BTW, I'll move this bit-set before VM-entry, vmx_set_cr4() is not a
suitable place to set the bit.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest
  2019-10-10 19:04       ` Jim Mattson
@ 2019-10-11  1:51         ` Yang Weijiang
  0 siblings, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-11  1:51 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Sean Christopherson, Michael S. Tsirkin,
	Radim Krčmář

On Thu, Oct 10, 2019 at 12:04:40PM -0700, Jim Mattson wrote:
> On Tue, Oct 8, 2019 at 11:13 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > On Wed, Oct 02, 2019 at 11:18:32AM -0700, Jim Mattson wrote:
> > > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > > >
> > > > CET MSRs pass through Guest directly to enhance performance.
> > > > CET runtime control settings are stored in MSR_IA32_{U,S}_CET,
> > > > Shadow Stack Pointer(SSP) are stored in MSR_IA32_PL{0,1,2,3}_SSP,
> > > > SSP table base address is stored in MSR_IA32_INT_SSP_TAB,
> > > > these MSRs are defined in kernel and re-used here.
> > >
> > > All of these new guest MSRs will have to be enumerated by
> > > KVM_GET_MSR_INDEX_LIST.
> > >
> > Since CET feature is Intel platform specific, but looks like KVM_GET_MSR_INDEX_LIST
> > fetchs x86 common MSRs, I have patch in QEMU to support CET
> > MSRs, the patch is here:
> > https://patchwork.ozlabs.org/patch/1058265/
> 
> Qemu is not the only user of kvm. All possible guest MSRs for the
> platform *must* be enumerated by KVM_GET_MSR_INDEX_LIST. A number of
> Intel-specific MSRs are already enumerated.
>
Sure, will do that, thank you!

> > > > MSR_IA32_U_CET and MSR_IA32_PL3_SSP are used for user mode protection,
> > > > the contents could differ from process to process, therefore,
> > > > kernel needs to save/restore them during context switch, it makes
> > > > sense to pass through them so that the guest kernel can
> > > > use xsaves/xrstors to operate them efficiently. Other MSRs are used
> > > > for non-user mode protection. See CET spec for detailed info.
> > >
> > > I assume that XSAVES & XRSTORS bypass the MSR permission bitmap, like
> > > other instructions that manipulate MSRs (e.g. SWAPGS, RDTSCP, etc.).
> > > Is the guest OS likely to use RDMSR/WRMSR to access these MSRs?
> > >
> > Yes, exactly, you may check the CET kernel code.
> >
> > > > The difference between CET VMCS state fields and xsave components is that,
> > > > the former used for CET state storage during VMEnter/VMExit,
> > > > whereas the latter used for state retention between Guest task/process
> > > > switch.
> > > >
> > > > Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > > ---
> > > >  arch/x86/kvm/cpuid.c   |  1 +
> > > >  arch/x86/kvm/cpuid.h   |  2 ++
> > > >  arch/x86/kvm/vmx/vmx.c | 39 +++++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 42 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > index 1aa86b87b6ab..0a47b9e565be 100644
> > > > --- a/arch/x86/kvm/cpuid.c
> > > > +++ b/arch/x86/kvm/cpuid.c
> > > > @@ -66,6 +66,7 @@ u64 kvm_supported_xss(void)
> > > >  {
> > > >         return KVM_SUPPORTED_XSS & kvm_x86_ops->supported_xss();
> > > >  }
> > > > +EXPORT_SYMBOL_GPL(kvm_supported_xss);
> > > >
> > > >  #define F(x) bit(X86_FEATURE_##x)
> > > >
> > > > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > > > index d78a61408243..1d77b880084d 100644
> > > > --- a/arch/x86/kvm/cpuid.h
> > > > +++ b/arch/x86/kvm/cpuid.h
> > > > @@ -27,6 +27,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> > > >
> > > >  int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
> > > >
> > > > +u64 kvm_supported_xss(void);
> > > > +
> > > >  static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
> > > >  {
> > > >         return vcpu->arch.maxphyaddr;
> > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > index a84198cff397..f720baa7a9ba 100644
> > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > @@ -7001,6 +7001,43 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
> > > >                 vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
> > > >  }
> > > >
> > > > +static void vmx_intercept_cet_msrs(struct kvm_vcpu *vcpu)
> > >
> > > Nit: It seems like this function adjusts the MSR permission bitmap so
> > > as *not* to intercept the CET MSRs.
> > >
> > OK, will rename it.
> > > > +{
> > > > +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > > +       unsigned long *msr_bitmap;
> > > > +       u64 kvm_xss;
> > > > +       bool cet_en;
> > > > +
> > > > +       msr_bitmap = vmx->vmcs01.msr_bitmap;
> > >
> > > What about nested guests? (i.e. vmcs02).
> > >
> > Hmm, I need to check the nested case, thank you.
> >
> > > > +       kvm_xss = kvm_supported_xss();
> > > > +       cet_en = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
> > > > +                guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> > > > +       /*
> > > > +        * U_CET is a must for USER CET, per CET spec., U_CET and PL3_SPP are
> > > > +        * a bundle for USER CET xsaves.
> > > > +        */
> > > > +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_USER)) {
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> > > > +       }
> > >
> > > Since this is called from vmx_cpuid_update, what happens if cet_en was
> > > previously true and now it's false?
> > >
> > Yes, it's likely, but guest CPUID usually is fixed before
> > guest is launched, do you have any suggestion?
> 
> How about an else clause?
>
OK, will add else clauses on the MSRs. thank you.

> > > > +       /*
> > > > +        * S_CET is a must for KERNEL CET, PL0_SSP ... PL2_SSP are a bundle
> > > > +        * for CET KERNEL xsaves.
> > > > +        */
> > > > +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_KERNEL)) {
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW);
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW);
> > > > +
> > > > +               /* SSP_TAB only available for KERNEL SHSTK.*/
> > > > +               if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > > > +                       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
> > > > +                                                     MSR_TYPE_RW);
> > > > +       }
> > > > +}
> > > > +
> > > >  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > > >  {
> > > >         struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > > @@ -7025,6 +7062,8 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > > >         if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> > > >                         guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> > > >                 update_intel_pt_cfg(vcpu);
> > > > +
> > > > +       vmx_intercept_cet_msrs(vcpu);
> > > >  }
> > > >
> > > >  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> > > > --
> > > > 2.17.2
> > > >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration
  2019-10-02 17:26   ` Jim Mattson
  2019-10-08  8:30     ` Yang Weijiang
@ 2019-10-17 19:46     ` Sean Christopherson
  2019-10-18  1:28       ` Yang Weijiang
  1 sibling, 1 reply; 43+ messages in thread
From: Sean Christopherson @ 2019-10-17 19:46 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 02, 2019 at 10:26:10AM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > @@ -414,6 +419,50 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
> >         }
> >  }
> >
> > +static inline void do_cpuid_0xd_mask(struct kvm_cpuid_entry2 *entry, int index)
> > +{
> > +       unsigned int f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
> 
> Does Intel have CPUs that support XSAVES but don't support the "enable
> XSAVES/XRSTORS" VM-execution control?

I doubt it.

> If so, what is the behavior of XSAVESXRSTORS on those CPUs in VMX
> non-root mode?

#UD.  If not, the CPU would be in violation of the SDM:

  If the "enable XSAVES/XRSTORS" VM-execution control is 0, XRSTORS causes
  an invalid-opcode exception (#UD).

> If not, why is this conditional F(XSAVES) here?

Because it's technically legal for the control to not be supported even
if the host doesn't have support.

> > +       /* cpuid 0xD.1.eax */
> > +       const u32 kvm_cpuid_D_1_eax_x86_features =
> > +               F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
> > +       u64 u_supported = kvm_supported_xcr0();
> > +       u64 s_supported = kvm_supported_xss();
> > +       u64 supported;
> > +
> > +       switch (index) {
> > +       case 0:
> > +               entry->eax &= u_supported;
> > +               entry->ebx = xstate_required_size(u_supported, false);
> 
> EBX could actually be zero, couldn't it? Since this output is
> context-dependent, I'm not sure how to interpret it when returned from
> KVM_GET_SUPPORTED_CPUID.

*sigh*.  It took me something like ten read throughs to understand what
you're saying.

Yes, it could be zero, though that ship may have sailed since the previous
code reported a non-zero value.  Whatever is done, KVM should be consistent
for all indices, i.e. either report zero or the max size.

> > +               entry->ecx = entry->ebx;
> > +               entry->edx = 0;
> 
> Shouldn't this be: entry->edx &= u_supported >> 32?

Probably.  The confusion likely stems from this wording in the SDM, where
it states the per-bit behavior and then also says all bits are reserved.
I think it makes sense to do as Jim suggested, and defer the reserved bit
handling to kvm_supported_{xcr0,xss}().

  Bit 31 - 00: Reports the supported bits of the upper 32 bits of XCR0.
  XCR0[n+32] can be set to 1 only if EDX[n] is 1.
  Bits 31 - 00: Reserved
 
> > +               break;
> > +       case 1:
> > +               supported = u_supported | s_supported;
> > +               entry->eax &= kvm_cpuid_D_1_eax_x86_features;
> > +               cpuid_mask(&entry->eax, CPUID_D_1_EAX);
> > +               entry->ebx = 0;
> > +               entry->edx = 0;
> 
> Shouldn't this be: entry->edx &= s_supported >> 32?

Same as above.
 
> > +               entry->ecx &= s_supported;
> > +               if (entry->eax & (F(XSAVES) | F(XSAVEC)))
> > +                       entry->ebx = xstate_required_size(supported, true);
> 
> As above, can't EBX just be zero, since it's context-dependent? What
> is the context when processing KVM_GET_SUPPORTED_CPUID? And why do we
> only fill this in when XSAVES or XSAVEC is supported?
> 
> > +               break;
> > +       default:
> > +               supported = (entry->ecx & 1) ? s_supported : u_supported;
> > +               if (!(supported & ((u64)1 << index))) {
> 
> Nit: 1ULL << index.

Even better:  BIT_ULL(index)

> > +                       entry->eax = 0;
> > +                       entry->ebx = 0;
> > +                       entry->ecx = 0;
> > +                       entry->edx = 0;
> > +                       return;
> > +               }
> > +               if (entry->ecx)
> > +                       entry->ebx = 0;
> 
> This seems to back up my claims above regarding the EBX output for
> cases 0 and 1, but aside from those subleaves, is this correct? For
> subleaves > 1, ECX bit 1 can be set for extended state components that
> need to be cache-line aligned. Such components could map to a valid
> bit in XCR0 and have a non-zero offset from the beginning of the
> non-compacted XSAVE area.
> 
> > +               entry->edx = 0;
> 
> This seems too aggressive. See my comments above regarding EDX outputs
> for cases 0 and 1.
> 
> > +               break;
> > +       }
> > +}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 5/7] kvm: x86: Add CET CR4 bit and XSS support
  2019-10-02 19:05   ` Jim Mattson
@ 2019-10-17 19:56     ` Sean Christopherson
  2019-10-18  1:58       ` Yang Weijiang
  0 siblings, 1 reply; 43+ messages in thread
From: Sean Christopherson @ 2019-10-17 19:56 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 02, 2019 at 12:05:23PM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > CR4.CET(bit 23) is master enable bit for CET feature.
> > Previously, KVM did not support setting any bits in XSS
> > so it's hardcoded to check and inject a #GP if Guest
> > attempted to write a non-zero value to XSS, now it supports
> > CET related bits setting.
> >
> > Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  4 +++-
> >  arch/x86/kvm/cpuid.c            | 11 +++++++++--
> >  arch/x86/kvm/vmx/vmx.c          |  6 +-----
> >  3 files changed, 13 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index d018df8c5f32..8f97269d6d9f 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -90,7 +90,8 @@
> >                           | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
> >                           | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
> >                           | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
> > -                         | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
> > +                         | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
> > +                         | X86_CR4_CET))
> >
> >  #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
> >
> > @@ -623,6 +624,7 @@ struct kvm_vcpu_arch {
> >
> >         u64 xcr0;
> >         u64 guest_supported_xcr0;
> > +       u64 guest_supported_xss;
> >         u32 guest_xstate_size;
> >
> >         struct kvm_pio_request pio;
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 0a47b9e565be..dd3ddc6daa58 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -120,8 +120,15 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> >         }
> >
> >         best = kvm_find_cpuid_entry(vcpu, 0xD, 1);
> > -       if (best && (best->eax & (F(XSAVES) | F(XSAVEC))))
> > -               best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> > +       if (best && (best->eax & (F(XSAVES) | F(XSAVEC)))) {
> 
> Is XSAVEC alone sufficient? Don't we explicitly need XSAVES to
> save/restore the extended state components enumerated by IA32_XSS?

Hmm, I think the check would be ok as-is if vcpu->arch.ia32_xss is used
below, as ia32_xss is guaranteed to be zero if XSAVES isn't supported.

> > +               u64 kvm_xss = kvm_supported_xss();
> > +
> > +               best->ebx =
> > +                       xstate_required_size(vcpu->arch.xcr0 | kvm_xss, true);
> 
> Shouldn't this size be based on the *current* IA32_XSS value, rather
> than the supported IA32_XSS bits? (i.e.
> s/kvm_xss/vcpu->arch.ia32_xss/)

Ya.

> > +               vcpu->arch.guest_supported_xss = best->ecx & kvm_xss;
> 
> Shouldn't unsupported bits in best->ecx be masked off, so that the
> guest CPUID doesn't mis-report the capabilities of the vCPU?

I thought KVM liked to let userspace blow off their foot whenever possible?
KVM already enumerated what features are supported, it's a userspace bug
if it ignores the enumeration.

> > +       } else {
> > +               vcpu->arch.guest_supported_xss = 0;
> > +       }
> >
> >         /*

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 7/7] KVM: x86: Add user-space access interface for CET MSRs
  2019-10-02 20:57   ` Jim Mattson
  2019-10-09  6:56     ` Yang Weijiang
@ 2019-10-17 19:58     ` Sean Christopherson
  2019-10-18  1:32       ` Yang Weijiang
  1 sibling, 1 reply; 43+ messages in thread
From: Sean Christopherson @ 2019-10-17 19:58 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Yang Weijiang, kvm list, LKML, Paolo Bonzini, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 02, 2019 at 01:57:50PM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> >
> > There're two different places storing Guest CET states, the states
> > managed with XSAVES/XRSTORS, as restored/saved
> > in previous patch, can be read/write directly from/to the MSRs.
> > For those stored in VMCS fields, they're access via vmcs_read/
> > vmcs_write.
> >
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 83 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 83 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 44913e4ab558..5265db7cd2af 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1671,6 +1671,49 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> >         return 0;
> >  }
> >
> > +static int check_cet_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 
> I'd suggest changing return type to bool, since you are essentially
> returning true or false.

I'd also be more explicit in the name, e.g. is_valid_cet_msr().

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest
  2019-10-09  6:15     ` Yang Weijiang
  2019-10-10 19:04       ` Jim Mattson
@ 2019-10-17 20:04       ` Sean Christopherson
  2019-10-18  1:31         ` Yang Weijiang
  1 sibling, 1 reply; 43+ messages in thread
From: Sean Christopherson @ 2019-10-17 20:04 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: Jim Mattson, kvm list, LKML, Paolo Bonzini, Michael S. Tsirkin,
	Radim Krčmář

On Wed, Oct 09, 2019 at 02:15:09PM +0800, Yang Weijiang wrote:
> On Wed, Oct 02, 2019 at 11:18:32AM -0700, Jim Mattson wrote:
> > > +       kvm_xss = kvm_supported_xss();
> > > +       cet_en = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
> > > +                guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> > > +       /*
> > > +        * U_CET is a must for USER CET, per CET spec., U_CET and PL3_SPP are
> > > +        * a bundle for USER CET xsaves.
> > > +        */
> > > +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_USER)) {
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> > > +       }
> > 
> > Since this is called from vmx_cpuid_update, what happens if cet_en was
> > previously true and now it's false?
> > 
> Yes, it's likely, but guest CPUID usually is fixed before
> guest is launched, do you have any suggestion?

Re-enable interception.  kvm_x86_ops->cpuid_update() is only called in
the ioctl flow, i.e. it's not performance critical.

> > > +       /*
> > > +        * S_CET is a must for KERNEL CET, PL0_SSP ... PL2_SSP are a bundle
> > > +        * for CET KERNEL xsaves.
> > > +        */
> > > +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_KERNEL)) {
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW);
> > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW);
> > > +
> > > +               /* SSP_TAB only available for KERNEL SHSTK.*/
> > > +               if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > > +                       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
> > > +                                                     MSR_TYPE_RW);
> > > +       }
> > > +}
> > > +
> > >  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > >  {
> > >         struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > @@ -7025,6 +7062,8 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > >         if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> > >                         guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> > >                 update_intel_pt_cfg(vcpu);
> > > +
> > > +       vmx_intercept_cet_msrs(vcpu);
> > >  }
> > >
> > >  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> > > --
> > > 2.17.2
> > >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration
  2019-10-17 19:46     ` Sean Christopherson
@ 2019-10-18  1:28       ` Yang Weijiang
  2019-10-22 19:46         ` Sean Christopherson
  0 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-10-18  1:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jim Mattson, Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Oct 17, 2019 at 12:46:22PM -0700, Sean Christopherson wrote:
> On Wed, Oct 02, 2019 at 10:26:10AM -0700, Jim Mattson wrote:
> > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > > @@ -414,6 +419,50 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
> > >         }
> > >  }
> > >
> > > +static inline void do_cpuid_0xd_mask(struct kvm_cpuid_entry2 *entry, int index)
> > > +{
> > > +       unsigned int f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
> > 
> > Does Intel have CPUs that support XSAVES but don't support the "enable
> > XSAVES/XRSTORS" VM-execution control?
> 
> I doubt it.
> 
> > If so, what is the behavior of XSAVESXRSTORS on those CPUs in VMX
> > non-root mode?
> 
> #UD.  If not, the CPU would be in violation of the SDM:
> 
>   If the "enable XSAVES/XRSTORS" VM-execution control is 0, XRSTORS causes
>   an invalid-opcode exception (#UD).
> 
> > If not, why is this conditional F(XSAVES) here?
> 
> Because it's technically legal for the control to not be supported even
> if the host doesn't have support.
> 
> > > +       /* cpuid 0xD.1.eax */
> > > +       const u32 kvm_cpuid_D_1_eax_x86_features =
> > > +               F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | f_xsaves;
> > > +       u64 u_supported = kvm_supported_xcr0();
> > > +       u64 s_supported = kvm_supported_xss();
> > > +       u64 supported;
> > > +
> > > +       switch (index) {
> > > +       case 0:
> > > +               entry->eax &= u_supported;
> > > +               entry->ebx = xstate_required_size(u_supported, false);
> > 
> > EBX could actually be zero, couldn't it? Since this output is
> > context-dependent, I'm not sure how to interpret it when returned from
> > KVM_GET_SUPPORTED_CPUID.
> 
> *sigh*.  It took me something like ten read throughs to understand what
> you're saying.
> 
> Yes, it could be zero, though that ship may have sailed since the previous
> code reported a non-zero value.  Whatever is done, KVM should be consistent
> for all indices, i.e. either report zero or the max size.
>
Thanks Seans! So I will add the check  *if (!supported)* back in next
version.

> > > +               entry->ecx = entry->ebx;
> > > +               entry->edx = 0;
> > 
> > Shouldn't this be: entry->edx &= u_supported >> 32?
> 
> Probably.  The confusion likely stems from this wording in the SDM, where
> it states the per-bit behavior and then also says all bits are reserved.
> I think it makes sense to do as Jim suggested, and defer the reserved bit
> handling to kvm_supported_{xcr0,xss}().
> 
>   Bit 31 - 00: Reports the supported bits of the upper 32 bits of XCR0.
>   XCR0[n+32] can be set to 1 only if EDX[n] is 1.
>   Bits 31 - 00: Reserved
>  
> > > +               break;
> > > +       case 1:
> > > +               supported = u_supported | s_supported;
> > > +               entry->eax &= kvm_cpuid_D_1_eax_x86_features;
> > > +               cpuid_mask(&entry->eax, CPUID_D_1_EAX);
> > > +               entry->ebx = 0;
> > > +               entry->edx = 0;
> > 
> > Shouldn't this be: entry->edx &= s_supported >> 32?
> 
> Same as above.
>  
Yes, I followed Jim's comments.

> > > +               entry->ecx &= s_supported;
> > > +               if (entry->eax & (F(XSAVES) | F(XSAVEC)))
> > > +                       entry->ebx = xstate_required_size(supported, true);
> > 
> > As above, can't EBX just be zero, since it's context-dependent? What
> > is the context when processing KVM_GET_SUPPORTED_CPUID? And why do we
> > only fill this in when XSAVES or XSAVEC is supported?
> > 
> > > +               break;
> > > +       default:
> > > +               supported = (entry->ecx & 1) ? s_supported : u_supported;
> > > +               if (!(supported & ((u64)1 << index))) {
> > 
> > Nit: 1ULL << index.
> 
> Even better:  BIT_ULL(index)
> 
> > > +                       entry->eax = 0;
> > > +                       entry->ebx = 0;
> > > +                       entry->ecx = 0;
> > > +                       entry->edx = 0;
> > > +                       return;
> > > +               }
> > > +               if (entry->ecx)
> > > +                       entry->ebx = 0;
> > 
> > This seems to back up my claims above regarding the EBX output for
> > cases 0 and 1, but aside from those subleaves, is this correct? For
> > subleaves > 1, ECX bit 1 can be set for extended state components that
> > need to be cache-line aligned. Such components could map to a valid
> > bit in XCR0 and have a non-zero offset from the beginning of the
> > non-compacted XSAVE area.
> > 
> > > +               entry->edx = 0;
> > 
> > This seems too aggressive. See my comments above regarding EDX outputs
> > for cases 0 and 1.
> > 
Sean, I don't know how to deal with entry->edx here as SDM says it's
reserved for valid subleaf.

> > > +               break;
> > > +       }
> > > +}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest
  2019-10-17 20:04       ` Sean Christopherson
@ 2019-10-18  1:31         ` Yang Weijiang
  0 siblings, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-18  1:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, Jim Mattson, kvm list, LKML, Paolo Bonzini,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Oct 17, 2019 at 01:04:11PM -0700, Sean Christopherson wrote:
> On Wed, Oct 09, 2019 at 02:15:09PM +0800, Yang Weijiang wrote:
> > On Wed, Oct 02, 2019 at 11:18:32AM -0700, Jim Mattson wrote:
> > > > +       kvm_xss = kvm_supported_xss();
> > > > +       cet_en = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
> > > > +                guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> > > > +       /*
> > > > +        * U_CET is a must for USER CET, per CET spec., U_CET and PL3_SPP are
> > > > +        * a bundle for USER CET xsaves.
> > > > +        */
> > > > +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_USER)) {
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> > > > +       }
> > > 
> > > Since this is called from vmx_cpuid_update, what happens if cet_en was
> > > previously true and now it's false?
> > > 
> > Yes, it's likely, but guest CPUID usually is fixed before
> > guest is launched, do you have any suggestion?
> 
> Re-enable interception.  kvm_x86_ops->cpuid_update() is only called in
> the ioctl flow, i.e. it's not performance critical.
>
OK, will add a *else* clause to re-intercept the MSRs as Jim suggested.

> > > > +       /*
> > > > +        * S_CET is a must for KERNEL CET, PL0_SSP ... PL2_SSP are a bundle
> > > > +        * for CET KERNEL xsaves.
> > > > +        */
> > > > +       if (cet_en && (kvm_xss & XFEATURE_MASK_CET_KERNEL)) {
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL1_SSP, MSR_TYPE_RW);
> > > > +               vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL2_SSP, MSR_TYPE_RW);
> > > > +
> > > > +               /* SSP_TAB only available for KERNEL SHSTK.*/
> > > > +               if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > > > +                       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB,
> > > > +                                                     MSR_TYPE_RW);
> > > > +       }
> > > > +}
> > > > +
> > > >  static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > > >  {
> > > >         struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > > @@ -7025,6 +7062,8 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > > >         if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> > > >                         guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> > > >                 update_intel_pt_cfg(vcpu);
> > > > +
> > > > +       vmx_intercept_cet_msrs(vcpu);
> > > >  }
> > > >
> > > >  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> > > > --
> > > > 2.17.2
> > > >

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 7/7] KVM: x86: Add user-space access interface for CET MSRs
  2019-10-17 19:58     ` Sean Christopherson
@ 2019-10-18  1:32       ` Yang Weijiang
  0 siblings, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-18  1:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jim Mattson, Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Oct 17, 2019 at 12:58:11PM -0700, Sean Christopherson wrote:
> On Wed, Oct 02, 2019 at 01:57:50PM -0700, Jim Mattson wrote:
> > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > >
> > > There're two different places storing Guest CET states, the states
> > > managed with XSAVES/XRSTORS, as restored/saved
> > > in previous patch, can be read/write directly from/to the MSRs.
> > > For those stored in VMCS fields, they're access via vmcs_read/
> > > vmcs_write.
> > >
> > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > ---
> > >  arch/x86/kvm/vmx/vmx.c | 83 ++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 83 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 44913e4ab558..5265db7cd2af 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -1671,6 +1671,49 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> > >         return 0;
> > >  }
> > >
> > > +static int check_cet_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > 
> > I'd suggest changing return type to bool, since you are essentially
> > returning true or false.
> 
> I'd also be more explicit in the name, e.g. is_valid_cet_msr().
Sure, thank you two.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 5/7] kvm: x86: Add CET CR4 bit and XSS support
  2019-10-17 19:56     ` Sean Christopherson
@ 2019-10-18  1:58       ` Yang Weijiang
  2019-10-22 20:13         ` Sean Christopherson
  0 siblings, 1 reply; 43+ messages in thread
From: Yang Weijiang @ 2019-10-18  1:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jim Mattson, Yang Weijiang, kvm list, LKML, Paolo Bonzini,
	Michael S. Tsirkin, Radim Krčmář

On Thu, Oct 17, 2019 at 12:56:42PM -0700, Sean Christopherson wrote:
> On Wed, Oct 02, 2019 at 12:05:23PM -0700, Jim Mattson wrote:
> > On Thu, Sep 26, 2019 at 7:17 PM Yang Weijiang <weijiang.yang@intel.com> wrote:
> > >
> > > CR4.CET(bit 23) is master enable bit for CET feature.
> > > Previously, KVM did not support setting any bits in XSS
> > > so it's hardcoded to check and inject a #GP if Guest
> > > attempted to write a non-zero value to XSS, now it supports
> > > CET related bits setting.
> > >
> > > Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > ---
> > >  arch/x86/include/asm/kvm_host.h |  4 +++-
> > >  arch/x86/kvm/cpuid.c            | 11 +++++++++--
> > >  arch/x86/kvm/vmx/vmx.c          |  6 +-----
> > >  3 files changed, 13 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index d018df8c5f32..8f97269d6d9f 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -90,7 +90,8 @@
> > >                           | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
> > >                           | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
> > >                           | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
> > > -                         | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
> > > +                         | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
> > > +                         | X86_CR4_CET))
> > >
> > >  #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
> > >
> > > @@ -623,6 +624,7 @@ struct kvm_vcpu_arch {
> > >
> > >         u64 xcr0;
> > >         u64 guest_supported_xcr0;
> > > +       u64 guest_supported_xss;
> > >         u32 guest_xstate_size;
> > >
> > >         struct kvm_pio_request pio;
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index 0a47b9e565be..dd3ddc6daa58 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -120,8 +120,15 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> > >         }
> > >
> > >         best = kvm_find_cpuid_entry(vcpu, 0xD, 1);
> > > -       if (best && (best->eax & (F(XSAVES) | F(XSAVEC))))
> > > -               best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> > > +       if (best && (best->eax & (F(XSAVES) | F(XSAVEC)))) {
> > 
> > Is XSAVEC alone sufficient? Don't we explicitly need XSAVES to
> > save/restore the extended state components enumerated by IA32_XSS?
> 
> Hmm, I think the check would be ok as-is if vcpu->arch.ia32_xss is used
> below, as ia32_xss is guaranteed to be zero if XSAVES isn't supported.
> 
Thanks Sean having me re-capture this reply thread, it's lost in my
folder.
I added kvm_x86_ops->xsaves_supported() in kvm_supported_xss() and it
returns 0 if xsaves is not supported which suggested by Jim.

> > > +               u64 kvm_xss = kvm_supported_xss();
> > > +
> > > +               best->ebx =
> > > +                       xstate_required_size(vcpu->arch.xcr0 | kvm_xss, true);
> > 
> > Shouldn't this size be based on the *current* IA32_XSS value, rather
> > than the supported IA32_XSS bits? (i.e.
> > s/kvm_xss/vcpu->arch.ia32_xss/)
> 
> Ya.
>
I'm not sure if I understand correctly, kvm_xss is what KVM supports,
but arch.ia32_xss reflects what guest currently is using, shoudn't CPUID
report what KVM supports instead of current status?
Will CPUID match current IA32_XSS status if guest changes it runtime?

> > > +               vcpu->arch.guest_supported_xss = best->ecx & kvm_xss;
> > 
> > Shouldn't unsupported bits in best->ecx be masked off, so that the
> > guest CPUID doesn't mis-report the capabilities of the vCPU?
> 
> I thought KVM liked to let userspace blow off their foot whenever possible?
> KVM already enumerated what features are supported, it's a userspace bug
> if it ignores the enumeration.
> 
> > > +       } else {
> > > +               vcpu->arch.guest_supported_xss = 0;
> > > +       }
> > >
> > >         /*

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration
  2019-10-18  1:28       ` Yang Weijiang
@ 2019-10-22 19:46         ` Sean Christopherson
  2019-10-23  1:16           ` Yang Weijiang
  0 siblings, 1 reply; 43+ messages in thread
From: Sean Christopherson @ 2019-10-22 19:46 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: Jim Mattson, kvm list, LKML, Paolo Bonzini, Michael S. Tsirkin,
	Radim Krčmář

On Fri, Oct 18, 2019 at 09:28:09AM +0800, Yang Weijiang wrote:
> On Thu, Oct 17, 2019 at 12:46:22PM -0700, Sean Christopherson wrote:
> > On Wed, Oct 02, 2019 at 10:26:10AM -0700, Jim Mattson wrote:
> > > > +                       entry->eax = 0;
> > > > +                       entry->ebx = 0;
> > > > +                       entry->ecx = 0;
> > > > +                       entry->edx = 0;
> > > > +                       return;
> > > > +               }
> > > > +               if (entry->ecx)
> > > > +                       entry->ebx = 0;
> > > 
> > > This seems to back up my claims above regarding the EBX output for
> > > cases 0 and 1, but aside from those subleaves, is this correct? For
> > > subleaves > 1, ECX bit 1 can be set for extended state components that
> > > need to be cache-line aligned. Such components could map to a valid
> > > bit in XCR0 and have a non-zero offset from the beginning of the
> > > non-compacted XSAVE area.
> > > 
> > > > +               entry->edx = 0;
> > > 
> > > This seems too aggressive. See my comments above regarding EDX outputs
> > > for cases 0 and 1.
> > > 
> Sean, I don't know how to deal with entry->edx here as SDM says it's
> reserved for valid subleaf.

The SDM also states:

  Bit 31 - 00: Reports the supported bits of the upper 32 bits of XCR0.
  XCR0[n+32] can be set to 1 only if EDX[n] is 1.

the second part, "Bits 31 - 00: Reserved" is at best superfluous, e.g. it
could be interpreted as saying that XCR0[63:32] are currently reserved,
and at worst the extra qualifier is an SDM bug and should be removed.

TL;DR: Ignore the blurb about the bits being reserved.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 5/7] kvm: x86: Add CET CR4 bit and XSS support
  2019-10-18  1:58       ` Yang Weijiang
@ 2019-10-22 20:13         ` Sean Christopherson
  2019-10-23  1:19           ` Yang Weijiang
  0 siblings, 1 reply; 43+ messages in thread
From: Sean Christopherson @ 2019-10-22 20:13 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: Jim Mattson, kvm list, LKML, Paolo Bonzini, Michael S. Tsirkin,
	Radim Krčmář

On Fri, Oct 18, 2019 at 09:58:02AM +0800, Yang Weijiang wrote:
> On Thu, Oct 17, 2019 at 12:56:42PM -0700, Sean Christopherson wrote:
> > On Wed, Oct 02, 2019 at 12:05:23PM -0700, Jim Mattson wrote:
> > > > +               u64 kvm_xss = kvm_supported_xss();
> > > > +
> > > > +               best->ebx =
> > > > +                       xstate_required_size(vcpu->arch.xcr0 | kvm_xss, true);
> > > 
> > > Shouldn't this size be based on the *current* IA32_XSS value, rather
> > > than the supported IA32_XSS bits? (i.e.
> > > s/kvm_xss/vcpu->arch.ia32_xss/)
> > 
> > Ya.
> >
> I'm not sure if I understand correctly, kvm_xss is what KVM supports,
> but arch.ia32_xss reflects what guest currently is using, shoudn't CPUID
> report what KVM supports instead of current status?
> Will CPUID match current IA32_XSS status if guest changes it runtime?

Not in this case.  Select CPUID output is dependent on current state as
opposed to being a constant defind by hardware.  Per the SDM, EBX is:

  The size in bytes of the XSAVE area containing all states enabled by
  XCRO | IA32_XSS

Since KVM is emulating CPUID for the guest, XCR0 and IA32_XSS in this
context refers to the guest's current/actual XCR0/IA32_XSS values.  The
purpose of this behavior is so that software can call CPUID to query the
actual amount of memory that is needed for XSAVE(S), as opposed to the
absolute max size that _might_ be needed.

MONITOR/MWAIT is the other case that comes to mind where CPUID dynamically
reflects configured state, e.g. MWAIT is reported as unsupported if it's
disabled via IA32_MISC_ENABLE MSR.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration
  2019-10-22 19:46         ` Sean Christopherson
@ 2019-10-23  1:16           ` Yang Weijiang
  0 siblings, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-23  1:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, Jim Mattson, kvm list, LKML, Paolo Bonzini,
	Michael S. Tsirkin, Radim Krčmář

On Tue, Oct 22, 2019 at 12:46:15PM -0700, Sean Christopherson wrote:
> On Fri, Oct 18, 2019 at 09:28:09AM +0800, Yang Weijiang wrote:
> > On Thu, Oct 17, 2019 at 12:46:22PM -0700, Sean Christopherson wrote:
> > > On Wed, Oct 02, 2019 at 10:26:10AM -0700, Jim Mattson wrote:
> > > > > +                       entry->eax = 0;
> > > > > +                       entry->ebx = 0;
> > > > > +                       entry->ecx = 0;
> > > > > +                       entry->edx = 0;
> > > > > +                       return;
> > > > > +               }
> > > > > +               if (entry->ecx)
> > > > > +                       entry->ebx = 0;
> > > > 
> > > > This seems to back up my claims above regarding the EBX output for
> > > > cases 0 and 1, but aside from those subleaves, is this correct? For
> > > > subleaves > 1, ECX bit 1 can be set for extended state components that
> > > > need to be cache-line aligned. Such components could map to a valid
> > > > bit in XCR0 and have a non-zero offset from the beginning of the
> > > > non-compacted XSAVE area.
> > > > 
> > > > > +               entry->edx = 0;
> > > > 
> > > > This seems too aggressive. See my comments above regarding EDX outputs
> > > > for cases 0 and 1.
> > > > 
> > Sean, I don't know how to deal with entry->edx here as SDM says it's
> > reserved for valid subleaf.
> 
> The SDM also states:
> 
>   Bit 31 - 00: Reports the supported bits of the upper 32 bits of XCR0.
>   XCR0[n+32] can be set to 1 only if EDX[n] is 1.
> 
> the second part, "Bits 31 - 00: Reserved" is at best superfluous, e.g. it
> could be interpreted as saying that XCR0[63:32] are currently reserved,
> and at worst the extra qualifier is an SDM bug and should be removed.
> 
> TL;DR: Ignore the blurb about the bits being reserved.
Thanks Sean, I'll follow it.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v7 5/7] kvm: x86: Add CET CR4 bit and XSS support
  2019-10-22 20:13         ` Sean Christopherson
@ 2019-10-23  1:19           ` Yang Weijiang
  0 siblings, 0 replies; 43+ messages in thread
From: Yang Weijiang @ 2019-10-23  1:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yang Weijiang, Jim Mattson, kvm list, LKML, Paolo Bonzini,
	Michael S. Tsirkin, Radim Krčmář

On Tue, Oct 22, 2019 at 01:13:21PM -0700, Sean Christopherson wrote:
> On Fri, Oct 18, 2019 at 09:58:02AM +0800, Yang Weijiang wrote:
> > On Thu, Oct 17, 2019 at 12:56:42PM -0700, Sean Christopherson wrote:
> > > On Wed, Oct 02, 2019 at 12:05:23PM -0700, Jim Mattson wrote:
> > > > > +               u64 kvm_xss = kvm_supported_xss();
> > > > > +
> > > > > +               best->ebx =
> > > > > +                       xstate_required_size(vcpu->arch.xcr0 | kvm_xss, true);
> > > > 
> > > > Shouldn't this size be based on the *current* IA32_XSS value, rather
> > > > than the supported IA32_XSS bits? (i.e.
> > > > s/kvm_xss/vcpu->arch.ia32_xss/)
> > > 
> > > Ya.
> > >
> > I'm not sure if I understand correctly, kvm_xss is what KVM supports,
> > but arch.ia32_xss reflects what guest currently is using, shoudn't CPUID
> > report what KVM supports instead of current status?
> > Will CPUID match current IA32_XSS status if guest changes it runtime?
> 
> Not in this case.  Select CPUID output is dependent on current state as
> opposed to being a constant defind by hardware.  Per the SDM, EBX is:
> 
>   The size in bytes of the XSAVE area containing all states enabled by
>   XCRO | IA32_XSS
> 
> Since KVM is emulating CPUID for the guest, XCR0 and IA32_XSS in this
> context refers to the guest's current/actual XCR0/IA32_XSS values.  The
> purpose of this behavior is so that software can call CPUID to query the
> actual amount of memory that is needed for XSAVE(S), as opposed to the
> absolute max size that _might_ be needed.
> 
> MONITOR/MWAIT is the other case that comes to mind where CPUID dynamically
> reflects configured state, e.g. MWAIT is reported as unsupported if it's
> disabled via IA32_MISC_ENABLE MSR.
Yep, make sense, thank you for explanation.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2019-10-23  1:17 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-27  2:19 [PATCH v7 0/7] Introduce support for Guest CET feature Yang Weijiang
2019-09-27  2:19 ` [PATCH v7 1/7] KVM: CPUID: Fix IA32_XSS support in CPUID(0xd,i) enumeration Yang Weijiang
2019-10-02 17:26   ` Jim Mattson
2019-10-08  8:30     ` Yang Weijiang
2019-10-17 19:46     ` Sean Christopherson
2019-10-18  1:28       ` Yang Weijiang
2019-10-22 19:46         ` Sean Christopherson
2019-10-23  1:16           ` Yang Weijiang
2019-09-27  2:19 ` [PATCH v7 2/7] kvm: vmx: Define CET VMCS fields and CPUID flags Yang Weijiang
2019-10-02 18:04   ` Jim Mattson
2019-10-09  5:56     ` Yang Weijiang
2019-09-27  2:19 ` [PATCH v7 3/7] KVM: VMX: Pass through CET related MSRs to Guest Yang Weijiang
2019-10-02 18:18   ` Jim Mattson
2019-10-09  6:15     ` Yang Weijiang
2019-10-10 19:04       ` Jim Mattson
2019-10-11  1:51         ` Yang Weijiang
2019-10-17 20:04       ` Sean Christopherson
2019-10-18  1:31         ` Yang Weijiang
2019-09-27  2:19 ` [PATCH v7 4/7] KVM: VMX: Load Guest CET via VMCS when CET is enabled in Guest Yang Weijiang
2019-10-02 18:54   ` Jim Mattson
2019-10-09  6:43     ` Yang Weijiang
2019-10-09 23:08       ` Jim Mattson
2019-10-10  1:30         ` Yang Weijiang
2019-10-10 23:44           ` Jim Mattson
2019-10-11  1:43             ` Yang Weijiang
2019-09-27  2:19 ` [PATCH v7 5/7] kvm: x86: Add CET CR4 bit and XSS support Yang Weijiang
2019-10-02 19:05   ` Jim Mattson
2019-10-17 19:56     ` Sean Christopherson
2019-10-18  1:58       ` Yang Weijiang
2019-10-22 20:13         ` Sean Christopherson
2019-10-23  1:19           ` Yang Weijiang
2019-09-27  2:19 ` [PATCH v7 6/7] KVM: x86: Load Guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
2019-10-02 19:56   ` Jim Mattson
2019-10-09  6:46     ` Yang Weijiang
2019-09-27  2:19 ` [PATCH v7 7/7] KVM: x86: Add user-space access interface for CET MSRs Yang Weijiang
2019-10-02 20:57   ` Jim Mattson
2019-10-09  6:56     ` Yang Weijiang
2019-10-17 19:58     ` Sean Christopherson
2019-10-18  1:32       ` Yang Weijiang
2019-10-02 22:40 ` [PATCH v7 0/7] Introduce support for Guest CET feature Jim Mattson
2019-10-03 13:01   ` Yang Weijiang
2019-10-03 16:33     ` Jim Mattson
2019-10-08  8:50       ` Yang Weijiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).