linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/19] Refresh queued CET virtualization series
@ 2022-06-16  8:46 Yang Weijiang
  2022-06-16  8:46 ` [PATCH 01/19] x86/cet/shstk: Add Kconfig option for Shadow Stack Yang Weijiang
                   ` (20 more replies)
  0 siblings, 21 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe; +Cc: weijiang.yang


The purpose of this patch series is to refresh the queued CET KVM
patches[1] with the latest dependent CET native patches, with the goal
that the whole series can be merged ahead of the CET native
series[2][3].

The patch set has been tested on Skylake (non-CET) and Sapphire Rapids
(CET-capable) platforms; no breakage was found in basic KVM functions
or in the KVM unit-tests/selftests.

----------------------------------------------------------------------
The motivations are:
1) Customers are interested in developing CET-related applications and
especially want to set up CET development environments in VMs, but are
burdened by non-trivial native and KVM patch rebasing work. Merging
this series early would save them a great deal of effort.

2) The kernel and KVM have evolved significantly since the patches were
queued; the KVM patches need to be fixed up to adapt them to the recent
mainline code.

3) The CET native patch series has been heavily refactored per the
maintainers' review, and some of its patches can be reused by the KVM
enabling patches.

4) PeterZ's supervisor IBT patch series was merged in 5.18; an
additional KVM patch is required to support it in the guest kernel.

----------------------------------------------------------------------
Guest CET states in KVM:
CET user-mode states (MSR_IA32_U_CET, MSR_IA32_PL3_SSP) rely on
XSAVES/XRSTORS and the CET user bit of MSR_IA32_XSS to be saved/restored
when a thread/process context switch happens. In the virtualization
world, after VM-exit and before the vcpu thread exits to user mode, the
guest user-mode states are swapped to the guest fpu area and the host
user-mode states are loaded; vice versa on VM-entry. See
kvm_load_guest_fpu() and kvm_put_guest_fpu() for details. With this
design, the guest CET xsave-supported states are retained while the vcpu
thread stays in ring-0 VMX root mode, and the transient guest states are
not expected to impact the host side.
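As a rough illustration, the swap described above behaves like the
userspace model below. This is a sketch only: the struct and function
names are invented for illustration and are not the real KVM symbols;
the real code uses XSAVES/XRSTORS on actual fpu save areas.

```c
#include <assert.h>

/* Invented model types: user-mode CET state is either "live" in
 * registers or parked in the matching fpu save area. */
struct cet_user_state {
	unsigned long long u_cet;   /* models MSR_IA32_U_CET */
	unsigned long long pl3_ssp; /* models MSR_IA32_PL3_SSP */
};

struct vcpu_model {
	struct cet_user_state regs;      /* state currently in registers */
	struct cet_user_state guest_fpu; /* guest fpu save area */
	struct cet_user_state host_fpu;  /* host (user thread) save area */
};

/* Models kvm_load_guest_fpu(): park host state, load guest state. */
static void load_guest_fpu(struct vcpu_model *v)
{
	v->host_fpu = v->regs;  /* xsaves to the host fpu area */
	v->regs = v->guest_fpu; /* xrstors from the guest fpu area */
}

/* Models kvm_put_guest_fpu(): park guest state, restore host state. */
static void put_guest_fpu(struct vcpu_model *v)
{
	v->guest_fpu = v->regs;
	v->regs = v->host_fpu;
}
```

The point of the model is that the two contexts never observe each
other's values: the guest's state only exists in registers between
load_guest_fpu() and put_guest_fpu().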

Moreover, the VMCS includes new fields for CET states: GUEST_S_CET,
GUEST_SSP and GUEST_INTR_SSP_TABLE for the guest, and HOST_S_CET,
HOST_SSP and HOST_INTR_SSP_TABLE for the host. When the load guest/host
state bits are set in the VMCS, the guest/host MSRs are swapped at
VM-exit/VM-entry, so these guest and host CET states are strictly
isolated. All CET supervisor states map to one of the fields. With the
new fields, the current guest supervisor IBT enabling does not depend on
XSAVES/XRSTORS or the CET supervisor bit of MSR_IA32_XSS.
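As a sketch, the VMCS-field-based isolation works like an automatic swap
at the VM-entry/VM-exit boundary. The model below is illustrative only
(simplified names, not hardware-accurate VMX emulation):

```c
#include <assert.h>

/* Illustrative model of the three CET supervisor-state fields. */
struct cet_sup_state {
	unsigned long long s_cet;   /* GUEST_S_CET / HOST_S_CET */
	unsigned long long ssp;     /* GUEST_SSP / HOST_SSP */
	unsigned long long ssp_tab; /* GUEST_/HOST_INTR_SSP_TABLE */
};

struct vmcs_model {
	struct cet_sup_state guest;
	struct cet_sup_state host;
};

/* With the "load guest CET state" entry control set, VM-entry loads
 * the guest fields into the MSRs. */
static void vm_entry(struct cet_sup_state *msrs, struct vmcs_model *vmcs)
{
	*msrs = vmcs->guest;
}

/* VM-exit saves the guest values back into the guest fields and loads
 * the host fields, so guest and host state stay strictly isolated. */
static void vm_exit(struct cet_sup_state *msrs, struct vmcs_model *vmcs)
{
	vmcs->guest = *msrs;
	*msrs = vmcs->host;
}
```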

---------------------------------------------------------------------
Impact on the existing kernel/KVM:
To minimize the impact on existing kernel/KVM code, most of the KVM
patch code can be bypassed at runtime. Uncheck "CONFIG_X86_KERNEL_IBT"
and "CONFIG_X86_SHADOW_STACK" in Kconfig before building the kernel to
remove the CET features from KVM. If neither of them is enabled, KVM
clears the related feature bits as well as the CET user bit in
supported_xss; this makes CET-related checks fail at the earliest
points. Since most of the patch code runs on non-hot paths of KVM, it
is expected to have little impact on existing code.
On legacy platforms the CET feature is simply not available, so the
execution flow is the same as on a CET-capable platform with the
features disabled at build time.
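A minimal sketch of that gating (a hypothetical helper, not the actual
KVM code; the real logic lives in the CPUID and XSS setup paths):

```c
#include <assert.h>

#define XFEATURE_MASK_CET_USER (1ULL << 11) /* CET user bit in XSS */

/*
 * Model of the build-time gating: if neither CONFIG_X86_SHADOW_STACK
 * nor CONFIG_X86_KERNEL_IBT is enabled, KVM drops the CET user bit
 * from supported_xss, so later CET checks fail at the first point.
 */
static unsigned long long kvm_supported_xss_model(unsigned long long host_xss,
						  int cfg_shstk, int cfg_ibt)
{
	unsigned long long xss = host_xss;

	if (!cfg_shstk && !cfg_ibt)
		xss &= ~XFEATURE_MASK_CET_USER;
	return xss;
}
```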

One known downside of an early merge is that the per-thread fpu area
expands by 16 bytes, due to enabling the XFEATURE_MASK_CET_USER bit on
CET-capable platforms.

Although the native SHSTK and IBT patch series were split apart, we
don't want to do the same for the KVM patches, since supervisor IBT has
already been merged and customers want the full set of user-mode
features in the guest.

We'd like to get your comments on the practice and patches, thanks!

Patch 1-5: Dependent CET native patches.
Patch 6-7: KVM XSS Supporting patches from kvm/queue.
Patch 8-18: Enabling patches for CET user mode.
Patch 19:  Enabling patch for supervisor IBT.

Change logs:
1. Removed XFEATURE_MASK_CET_KERNEL, MSR_IA32_PL{0,1,2}_SSP and
   MSR_IA32_INT_SSP_TAB related code since supervisor SHSTK design is
   still open.
2. Added support for guest kernel supervisor IBT.
3. Refactored some of previous helpers due to change 1) and 2).
4. Refactored the control logic between the XSS CET user bit and user-mode
   SHSTK/IBT, making supervisor IBT support independent of the XSS user bit.
5. Rebased the patch series onto kvm/queue:
   8baacf67c76c ("KVM: SEV-ES: reuse advance_sev_es_emulated_ins for OUT too")

[1]: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=intel
[2]: SHSTK: https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.com/
[3]: old IBT: https://lore.kernel.org/all/20210830182221.3535-1-yu-cheng.yu@intel.com/

Rick Edgecombe (1):
  x86/fpu: Add helper for modifying xstate

Sean Christopherson (2):
  KVM: x86: Report XSS as an MSR to be saved if there are supported
    features
  KVM: x86: Load guest fpu state when accessing MSRs managed by XSAVES

Yang Weijiang (12):
  KVM: x86: Refresh CPUID on writes to MSR_IA32_XSS
  KVM: x86: Add #CP support in guest exception classification.
  KVM: VMX: Introduce CET VMCS fields and flags
  KVM: x86: Add fault checks for CR4.CET
  KVM: VMX: Emulate reads and writes to CET MSRs
  KVM: VMX: Add a synthetic MSR to allow userspace VMM to access
    GUEST_SSP
  KVM: x86: Report CET MSRs as to-be-saved if CET is supported
  KVM: x86: Save/Restore GUEST_SSP to/from SMM state save area
  KVM: x86: Enable CET virtualization for VMX and advertise CET to
    userspace
  KVM: VMX: Pass through CET MSRs to the guest when supported
  KVM: nVMX: Enable CET support for nested VMX
  KVM: x86: Enable supervisor IBT support for guest

Yu-cheng Yu (4):
  x86/cet/shstk: Add Kconfig option for Shadow Stack
  x86/cpufeatures: Add CPU feature flags for shadow stacks
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

 arch/x86/Kconfig                         |  17 +++
 arch/x86/Kconfig.assembler               |   1 +
 arch/x86/include/asm/cpu.h               |   2 +-
 arch/x86/include/asm/cpufeatures.h       |   1 +
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/fpu/api.h           |   7 +-
 arch/x86/include/asm/fpu/types.h         |  14 ++-
 arch/x86/include/asm/fpu/xstate.h        |   6 +-
 arch/x86/include/asm/kvm_host.h          |   3 +-
 arch/x86/include/asm/vmx.h               |   8 ++
 arch/x86/include/uapi/asm/kvm.h          |   1 +
 arch/x86/include/uapi/asm/kvm_para.h     |   1 +
 arch/x86/kernel/cpu/common.c             |  14 +--
 arch/x86/kernel/cpu/cpuid-deps.c         |   1 +
 arch/x86/kernel/fpu/core.c               |  19 ++++
 arch/x86/kernel/fpu/xstate.c             |  93 ++++++++--------
 arch/x86/kernel/machine_kexec_64.c       |   2 +-
 arch/x86/kvm/cpuid.c                     |  21 +++-
 arch/x86/kvm/cpuid.h                     |   5 +
 arch/x86/kvm/emulate.c                   |  11 ++
 arch/x86/kvm/vmx/capabilities.h          |   4 +
 arch/x86/kvm/vmx/nested.c                |  19 +++-
 arch/x86/kvm/vmx/vmcs12.c                |   6 +
 arch/x86/kvm/vmx/vmcs12.h                |  14 ++-
 arch/x86/kvm/vmx/vmx.c                   | 134 ++++++++++++++++++++++-
 arch/x86/kvm/x86.c                       |  95 ++++++++++++++--
 arch/x86/kvm/x86.h                       |  47 +++++++-
 27 files changed, 468 insertions(+), 86 deletions(-)


base-commit: 8baacf67c76c560fed954ac972b63e6e59a6fba0
-- 
2.27.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 01/19] x86/cet/shstk: Add Kconfig option for Shadow Stack
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 02/19] x86/cpufeatures: Add CPU feature flags for shadow stacks Yang Weijiang
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Yu-cheng Yu, Kees Cook

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Shadow Stack provides protection against function return address
corruption. It is active when the processor supports it, the kernel has
CONFIG_X86_SHADOW_STACK enabled, and the application is built for the
feature. This is only implemented for the 64-bit kernel. When it is
enabled, legacy non-Shadow Stack applications continue to work, but without
protection.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Cc: Kees Cook <keescook@chromium.org>

---
v2:
 - Remove already wrong kernel size increase info (tlgx)
 - Change prompt to remove "Intel" (tglx)
 - Update line about what CPUs are supported (Dave)

Yu-cheng v25:
 - Remove X86_CET and use X86_SHADOW_STACK directly.

Yu-cheng v24:
 - Update for the splitting X86_CET to X86_SHADOW_STACK and X86_IBT.

 arch/x86/Kconfig           | 17 +++++++++++++++++
 arch/x86/Kconfig.assembler |  1 +
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9783ebc4e021..79c6b0490350 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,6 +26,7 @@ config X86_64
 	depends on 64BIT
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
+	select ARCH_HAS_SHADOW_STACK
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_ARCH_SOFT_DIRTY
@@ -1969,6 +1970,22 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config ARCH_HAS_SHADOW_STACK
+	def_bool n
+
+config X86_SHADOW_STACK
+	prompt "X86 Shadow Stack"
+	def_bool n
+	depends on ARCH_HAS_SHADOW_STACK
+	help
+	  Shadow Stack protection is a hardware feature that detects function
+	  return address corruption. Today the kernel's support is limited to
+	  virtualizing it in KVM guests.
+
+	  CPUs supporting shadow stacks were first released in 2020.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 26b8c08e2fc4..41428391e475 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -19,3 +19,4 @@ config AS_TPAUSE
 	def_bool $(as-instr,tpause %ecx)
 	help
 	  Supported by binutils >= 2.31.1 and LLVM integrated assembler >= V7
+
-- 
2.27.0



* [PATCH 02/19] x86/cpufeatures: Add CPU feature flags for shadow stacks
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
  2022-06-16  8:46 ` [PATCH 01/19] x86/cet/shstk: Add Kconfig option for Shadow Stack Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack Yang Weijiang
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Yu-cheng Yu, Kees Cook

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

The Control-Flow Enforcement Technology contains two related features,
one of which is Shadow Stacks. Future patches will utilize this feature
for shadow stack support in KVM, so add a CPU feature flag for Shadow
Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]).

To protect shadow stack state from malicious modification, the registers
are only accessible in supervisor mode. This implementation
context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend
on XSAVES.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Cc: Kees Cook <keescook@chromium.org>

---
v2:
 - Remove IBT reference in commit log (Kees)
 - Describe xsaves dependency using text from (Dave)

v1:
 - Remove IBT, can be added in a follow on IBT series.

Yu-cheng v25:
 - Make X86_FEATURE_IBT depend on X86_FEATURE_SHSTK.

Yu-cheng v24:
 - Update for splitting CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK and
   CONFIG_X86_IBT.
 - Move DISABLE_IBT definition to the IBT series.

 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 393f2bbb5e3a..2a3aaf5e1052 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -355,6 +355,7 @@
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* Shadow Stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 36369e76cc63..c61c65bbc58d 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -68,6 +68,12 @@
 # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif
 
+#ifdef CONFIG_X86_SHADOW_STACK
+#define DISABLE_SHSTK	0
+#else
+#define DISABLE_SHSTK	(1 << (X86_FEATURE_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -88,7 +94,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_SHSTK)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK19	0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..bf1b55a1ba21 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -78,6 +78,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
 	{}
 };
 
-- 
2.27.0



* [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
  2022-06-16  8:46 ` [PATCH 01/19] x86/cet/shstk: Add Kconfig option for Shadow Stack Yang Weijiang
  2022-06-16  8:46 ` [PATCH 02/19] x86/cpufeatures: Add CPU feature flags for shadow stacks Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16 10:24   ` Peter Zijlstra
  2022-06-16 10:25   ` Peter Zijlstra
  2022-06-16  8:46 ` [PATCH 04/19] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yang Weijiang
                   ` (17 subsequent siblings)
  20 siblings, 2 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Yu-cheng Yu, Kees Cook

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Utilizing CET features requires a CR4 bit to be enabled as well as bits
to be set in CET MSRs. Setting the CR4 bit does two things:
 1. Enables the usage of WRUSS instruction, which the kernel can use to
    write to userspace shadow stacks.
 2. Allows those individual aspects of CET to be enabled later via the MSR.

While future patches will allow the MSR values to be saved and restored
per task, the CR4 bit will allow WRUSS to be used regardless of whether
a task's CET MSRs have been restored.

Kernel IBT already enables the CR4 bit. Modify the logic to also enable
it when the kernel is configured with, and detects, shadow stack
support.
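The resulting decision logic can be sketched as follows (a userspace
model with invented names; the actual change is in this patch's diff):

```c
#include <assert.h>

/* Model of the refactored setup_cet() decisions (illustrative only). */
struct cet_actions {
	int write_s_cet; /* wrmsrl(MSR_IA32_S_CET, CET_ENDBR_EN) */
	int set_cr4_cet; /* cr4_set_bits(X86_CR4_CET) */
};

static struct cet_actions setup_cet_model(int kernel_ibt, int shstk)
{
	struct cet_actions a = { 0, 0 };

	if (kernel_ibt)
		a.write_s_cet = 1;      /* ENDBR enforcement for the kernel */
	if (kernel_ibt || shstk)
		a.set_cr4_cet = 1;      /* CR4.CET also enables WRUSS */
	return a;
}
```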

Rename cet_disable() to ibt_disable() since it no longer applies to all
CET features in the kernel.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Cc: Kees Cook <keescook@chromium.org>

---
v2:
 - Drop no_user_shstk (Dave Hansen)
 - Elaborate on what the CR4 bit does in the commit log
 - Integrate with Kernel IBT logic

v1:
 - Moved kernel-parameters.txt changes here from patch 1.

Yu-cheng v25:
 - Remove software-defined X86_FEATURE_CET.

Yu-cheng v24:
 - Update #ifdef placement to reflect Kconfig changes of splitting shadow stack
   and ibt.

 arch/x86/include/asm/cpu.h         |  2 +-
 arch/x86/kernel/cpu/common.c       | 14 +++++++-------
 arch/x86/kernel/machine_kexec_64.c |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 8cbf623f0ecf..a56270838435 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -74,7 +74,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
 static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c) {}
 #endif
 
-extern __noendbr void cet_disable(void);
+extern __noendbr void ibt_disable(void);
 
 struct ucode_cpu_info;
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c296cb1c0113..86102a8d451e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -598,23 +598,23 @@ __noendbr void ibt_restore(u64 save)
 
 static __always_inline void setup_cet(struct cpuinfo_x86 *c)
 {
+	bool kernel_ibt = HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT);
 	u64 msr = CET_ENDBR_EN;
 
-	if (!HAS_KERNEL_IBT ||
-	    !cpu_feature_enabled(X86_FEATURE_IBT))
-		return;
+	if (kernel_ibt)
+		wrmsrl(MSR_IA32_S_CET, msr);
 
-	wrmsrl(MSR_IA32_S_CET, msr);
-	cr4_set_bits(X86_CR4_CET);
+	if (kernel_ibt || cpu_feature_enabled(X86_FEATURE_SHSTK))
+		cr4_set_bits(X86_CR4_CET);
 
-	if (!ibt_selftest()) {
+	if (kernel_ibt && !ibt_selftest()) {
 		pr_err("IBT selftest: Failed!\n");
 		setup_clear_cpu_cap(X86_FEATURE_IBT);
 		return;
 	}
 }
 
-__noendbr void cet_disable(void)
+__noendbr void ibt_disable(void)
 {
 	if (cpu_feature_enabled(X86_FEATURE_IBT))
 		wrmsrl(MSR_IA32_S_CET, 0);
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 0611fd83858e..745024654fcd 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -311,7 +311,7 @@ void machine_kexec(struct kimage *image)
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
 	hw_breakpoint_disable();
-	cet_disable();
+	ibt_disable();
 
 	if (image->preserve_context) {
 #ifdef CONFIG_X86_IO_APIC
-- 
2.27.0



* [PATCH 04/19] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (2 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16 10:27   ` Peter Zijlstra
  2022-06-16  8:46 ` [PATCH 05/19] x86/fpu: Add helper for modifying xstate Yang Weijiang
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Yu-cheng Yu, Kees Cook

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

Shadow stack register state can be managed with XSAVE. The registers
can logically be separated into two groups:
        * Registers controlling user-mode operation
        * Registers controlling kernel-mode operation

The architecture has two new XSAVE state components, one for each of
those groups of registers. This lets an OS manage them separately if
it chooses. Future patches for host userspace and KVM guests will only
utilize the user-mode registers, so only configure XSAVE to save
user-mode registers. This state will add 16 bytes to the xsave buffer
size.

Future patches will use the user-mode XSAVE area to save guest user-mode
CET state. However, VMCS includes new fields for guest CET supervisor
states. KVM can use these to save and restore guest supervisor state, so
host supervisor XSAVE support is not required.

Adding this exacerbates the already unwieldy if statement in
check_xstate_against_struct() that handles warning about unimplemented
xfeatures. So refactor these checks by having XCHECK_SZ() set a bool
when it actually checks the xfeature. This ends up exceeding 80
characters, but was better on balance than the other options explored.
Pass the bool as a pointer to make it clear that XCHECK_SZ() can change
the variable.

While configuring user-mode XSAVE, clarify that kernel-mode registers
are not managed by XSAVE by defining the xfeature in
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, as is done for XFEATURE_MASK_PT.
This serves more of a documentation-as-code purpose; functionally, it
only enables a few safety checks.

Both XSAVE state components are supervisor states, even the state
controlling user-mode operation. This is a departure from earlier features
like protection keys, where the PKRU state is a normal user (non-supervisor)
state. Having the user state be supervisor-managed ensures there is no
direct, unprivileged access to it, making it harder for an attacker to
subvert CET.

To facilitate this privileged access, define the two user-mode CET MSRs,
and the bits defined in those MSRs relevant to future shadow stack
enablement patches.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Cc: Kees Cook <keescook@chromium.org>

---
v2:
 - Reword commit log using some verbiage posted by Dave Hansen
 - Remove unlikely to be used supervisor cet xsave struct
 - Clarify that supervisor cet state is not saved by xsave
 - Remove unused supervisor MSRs

v1:
 - Remove outdated reference to sigreturn checks on msr's.

Yu-cheng v29:
 - Move CET MSR definition up in msr-index.h.

Yu-cheng v28:
 - Add XFEATURE_MASK_CET_USER to XFEATURES_INIT_FPSTATE_HANDLED.

Yu-cheng v25:
 - Update xsave_cpuid_features[].  Now CET XSAVES features depend on
   X86_FEATURE_SHSTK (vs. the software-defined X86_FEATURE_CET).

 arch/x86/include/asm/fpu/types.h  | 14 ++++-
 arch/x86/include/asm/fpu/xstate.h |  6 +-
 arch/x86/kernel/fpu/xstate.c      | 93 ++++++++++++++++---------------
 3 files changed, 63 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index eb7cd1139d97..03aa98fb9c2b 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -115,8 +115,8 @@ enum xfeature {
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
 	XFEATURE_PASID,
-	XFEATURE_RSRVD_COMP_11,
-	XFEATURE_RSRVD_COMP_12,
+	XFEATURE_CET_USER,
+	XFEATURE_CET_KERNEL_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -138,6 +138,8 @@ enum xfeature {
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
+#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
 #define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
 #define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
@@ -252,6 +254,14 @@ struct pkru_state {
 	u32				pad;
 } __packed;
 
+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+	u64 user_cet;			/* user control-flow settings */
+	u64 user_ssp;			/* user shadow stack pointer */
+};
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index cd3dd170e23a..d4427b88ee12 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -50,7 +50,8 @@
 #define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
+					    XFEATURE_MASK_CET_USER)
 
 /*
  * A supervisor state component may not always contain valuable information,
@@ -77,7 +78,8 @@
  * Unsupported supervisor features. When a supervisor feature in this mask is
  * supported in the future, move it to the supported supervisor feature mask.
  */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
+					      XFEATURE_MASK_CET_KERNEL)
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..5e6a4867fd05 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -39,26 +39,26 @@
  */
 static const char *xfeature_names[] =
 {
-	"x87 floating point registers"	,
-	"SSE registers"			,
-	"AVX registers"			,
-	"MPX bounds registers"		,
-	"MPX CSR"			,
-	"AVX-512 opmask"		,
-	"AVX-512 Hi256"			,
-	"AVX-512 ZMM_Hi256"		,
-	"Processor Trace (unused)"	,
-	"Protection Keys User registers",
-	"PASID state",
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"AMX Tile config"		,
-	"AMX Tile data"			,
-	"unknown xstate feature"	,
+	"x87 floating point registers"			,
+	"SSE registers"					,
+	"AVX registers"					,
+	"MPX bounds registers"				,
+	"MPX CSR"					,
+	"AVX-512 opmask"				,
+	"AVX-512 Hi256"					,
+	"AVX-512 ZMM_Hi256"				,
+	"Processor Trace (unused)"			,
+	"Protection Keys User registers"		,
+	"PASID state"					,
+	"Control-flow User registers"			,
+	"Control-flow Kernel registers (unused)"	,
+	"unknown xstate feature"			,
+	"unknown xstate feature"			,
+	"unknown xstate feature"			,
+	"unknown xstate feature"			,
+	"AMX Tile config"				,
+	"AMX Tile data"					,
+	"unknown xstate feature"			,
 };
 
 static unsigned short xsave_cpuid_features[] __initdata = {
@@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_CET_USER]			= X86_FEATURE_SHSTK,
 	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
 	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
@@ -276,6 +277,7 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_CET_USER);
 	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
 	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
@@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 	 XFEATURE_MASK_BNDREGS |		\
 	 XFEATURE_MASK_BNDCSR |			\
 	 XFEATURE_MASK_PASID |			\
+	 XFEATURE_MASK_CET_USER |		\
 	 XFEATURE_MASK_XTILE)
 
 /*
@@ -446,13 +449,14 @@ static void __init __xstate_dump_leaves(void)
 	}									\
 } while (0)
 
-#define XCHECK_SZ(sz, nr, nr_macro, __struct) do {			\
-	if ((nr == nr_macro) &&						\
-	    WARN_ONCE(sz != sizeof(__struct),				\
-		"%s: struct is %zu bytes, cpu state %d bytes\n",	\
-		__stringify(nr_macro), sizeof(__struct), sz)) {		\
-		__xstate_dump_leaves();					\
-	}								\
+#define XCHECK_SZ(checked, sz, nr, nr_macro, __struct) do {			\
+	if (nr == nr_macro) {							\
+		*checked = true;						\
+		if (WARN_ONCE(sz != sizeof(__struct),				\
+			      "%s: struct is %zu bytes, cpu state %d bytes\n",	\
+			      __stringify(nr_macro), sizeof(__struct), sz))	\
+			__xstate_dump_leaves();					\
+	}									\
 } while (0)
 
 /**
@@ -527,33 +531,30 @@ static bool __init check_xstate_against_struct(int nr)
 	 * Ask the CPU for the size of the state.
 	 */
 	int sz = xfeature_size(nr);
+	bool chked = false;
+
 	/*
 	 * Match each CPU state with the corresponding software
 	 * structure.
 	 */
-	XCHECK_SZ(sz, nr, XFEATURE_YMM,       struct ymmh_struct);
-	XCHECK_SZ(sz, nr, XFEATURE_BNDREGS,   struct mpx_bndreg_state);
-	XCHECK_SZ(sz, nr, XFEATURE_BNDCSR,    struct mpx_bndcsr_state);
-	XCHECK_SZ(sz, nr, XFEATURE_OPMASK,    struct avx_512_opmask_state);
-	XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
-	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
-	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
-	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
-	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
+	XCHECK_SZ(&chked, sz, nr, XFEATURE_YMM,       struct ymmh_struct);
+	XCHECK_SZ(&chked, sz, nr, XFEATURE_BNDREGS,   struct mpx_bndreg_state);
+	XCHECK_SZ(&chked, sz, nr, XFEATURE_BNDCSR,    struct mpx_bndcsr_state);
+	XCHECK_SZ(&chked, sz, nr, XFEATURE_OPMASK,    struct avx_512_opmask_state);
+	XCHECK_SZ(&chked, sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
+	XCHECK_SZ(&chked, sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
+	XCHECK_SZ(&chked, sz, nr, XFEATURE_PKRU,      struct pkru_state);
+	XCHECK_SZ(&chked, sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
+	XCHECK_SZ(&chked, sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
+	XCHECK_SZ(&chked, sz, nr, XFEATURE_CET_USER,  struct cet_user_state);
 
 	/* The tile data size varies between implementations. */
-	if (nr == XFEATURE_XTILE_DATA)
+	if (nr == XFEATURE_XTILE_DATA) {
 		check_xtile_data_against_struct(sz);
+		chked = true;
+	}
 
-	/*
-	 * Make *SURE* to add any feature numbers in below if
-	 * there are "holes" in the xsave state component
-	 * numbers.
-	 */
-	if ((nr < XFEATURE_YMM) ||
-	    (nr >= XFEATURE_MAX) ||
-	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
+	if (!chked) {
 		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
 		XSTATE_WARN_ON(1);
 		return false;
-- 
2.27.0



* [PATCH 05/19] x86/fpu: Add helper for modifying xstate
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (3 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 04/19] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 06/19] KVM: x86: Report XSS as an MSR to be saved if there are supported features Yang Weijiang
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Thomas Gleixner

From: Rick Edgecombe <rick.p.edgecombe@intel.com>

Just like user xfeatures, supervisor xfeatures can be active in the
registers or present in the task FPU buffer. If the registers are
active, the registers can be modified directly. If the registers are
not active, the modification must be performed on the task FPU buffer.

When the state is not active, the kernel could perform modifications
directly to the buffer. But in order for it to do that, it needs
to know where in the buffer the specific state it wants to modify is
located. Doing this is not robust against optimizations that compact
the FPU buffer, as each access would require computing where in the
buffer it is.

The easiest way to modify supervisor xfeature data is to force restore
the registers and write directly to the MSRs. Oftentimes this is just fine
anyway as the registers need to be restored before returning to userspace.
Do this for now, leaving buffer writing optimizations for the future.

Add a new function, fpu_lock_and_load(), that can simultaneously call
fpregs_lock() and do this restore. Also perform some extra sanity
checks in this function since this will be used in non-fpu focused code.
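
The flow above can be sketched as a userspace toy model; the kernel's
fpregs_lock()/fpregs_restore_userregs() machinery is reduced to flag
updates here, and every toy_* name is illustrative rather than kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Models of preemption state, TIF_NEED_FPU_LOAD and "registers hold the
 * task's state"; all of these stand in for real kernel mechanisms. */
static bool preemption_enabled = true;
static bool need_fpu_load;
static bool fpregs_active;

static void toy_fpregs_lock(void)   { preemption_enabled = false; }
static void toy_fpregs_unlock(void) { preemption_enabled = true; }

/* Lock, then restore the task's state to the registers only if stale. */
static void toy_fpu_lock_and_load(void)
{
	toy_fpregs_lock();
	if (need_fpu_load) {
		fpregs_active = true;
		need_fpu_load = false;
	}
}

/* Start from a stale-register state and report whether the registers
 * ended up loaded with preemption disabled, as the helper guarantees. */
static bool toy_scenario_stale_state(void)
{
	need_fpu_load = true;
	fpregs_active = false;
	toy_fpu_lock_and_load();
	return fpregs_active && !need_fpu_load && !preemption_enabled;
}
```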

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>

---
v2:
 - Drop optimization of writing directly the buffer, and change API
   accordingly.
 - fpregs_lock_and_load() suggested by tglx
 - Some commit log verbiage from dhansen

v1:
 - New patch.

 arch/x86/include/asm/fpu/api.h |  7 ++++++-
 arch/x86/kernel/fpu/core.c     | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 6b0f31fb53f7..4f34812b4dd5 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -82,6 +82,12 @@ static inline void fpregs_unlock(void)
 		preempt_enable();
 }
 
+/*
+ * Lock and load the fpu state into the registers, if they are not already
+ * loaded.
+ */
+void fpu_lock_and_load(void);
+
 #ifdef CONFIG_X86_DEBUG_FPU
 extern void fpregs_assert_state_consistent(void);
 #else
@@ -163,5 +169,4 @@ static inline bool fpstate_is_confidential(struct fpu_guest *gfpu)
 
 /* prctl */
 extern long fpu_xstate_prctl(int option, unsigned long arg2);
-
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 0531d6a06df5..4d250dba1619 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -756,6 +756,25 @@ void switch_fpu_return(void)
 }
 EXPORT_SYMBOL_GPL(switch_fpu_return);
 
+void fpu_lock_and_load(void)
+{
+	/*
+	 * fpregs_lock() only disables preemption (mostly). So modifying state
+	 * from an interrupt could corrupt an in-progress fpregs operation while
+	 * appearing to work. Warn about it.
+	 */
+	WARN_ON_ONCE(!irq_fpu_usable());
+	WARN_ON_ONCE(current->flags & PF_KTHREAD);
+
+	fpregs_lock();
+
+	fpregs_assert_state_consistent();
+
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		fpregs_restore_userregs();
+}
+EXPORT_SYMBOL_GPL(fpu_lock_and_load);
+
 #ifdef CONFIG_X86_DEBUG_FPU
 /*
  * If current FPU state according to its tracking (loaded FPU context on this
-- 
2.27.0



* [PATCH 06/19] KVM: x86: Report XSS as an MSR to be saved if there are supported features
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (4 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 05/19] x86/fpu: Add helper for modifying xstate Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 07/19] KVM: x86: Refresh CPUID on writes to MSR_IA32_XSS Yang Weijiang
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add MSR_IA32_XSS to the list of MSRs reported to userspace if
supported_xss is non-zero, i.e. KVM supports at least one XSS based
feature.
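
The resulting behavior can be sketched as follows (toy_* names and MSR
values are illustrative stand-ins, not KVM's internals): the init-time
list walk simply drops MSR_IA32_XSS when no XSS-based feature is
supported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MSR_IA32_TSC 0x10u
#define TOY_MSR_IA32_XSS 0xDA0u

/* Mirror of the kvm_init_msr_list() filtering: report MSR_IA32_XSS to
 * userspace only when at least one XSS-based feature is supported. */
static size_t toy_count_reported(uint64_t supported_xss)
{
	const uint32_t candidates[] = { TOY_MSR_IA32_TSC, TOY_MSR_IA32_XSS };
	size_t count = 0;

	for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
		if (candidates[i] == TOY_MSR_IA32_XSS && !supported_xss)
			continue;
		count++;
	}
	return count;
}
```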

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220517154100.29983-4-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2318a99139fa..f525228168b8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1446,6 +1446,7 @@ static const u32 msrs_to_save_all[] = {
 	MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
 	MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+	MSR_IA32_XSS,
 };
 
 static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_all)];
@@ -6780,6 +6781,10 @@ static void kvm_init_msr_list(void)
 			if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
 				continue;
 			break;
+		case MSR_IA32_XSS:
+			if (!kvm_caps.supported_xss)
+				continue;
+			break;
 		default:
 			break;
 		}
-- 
2.27.0



* [PATCH 07/19] KVM: x86: Refresh CPUID on writes to MSR_IA32_XSS
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (5 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 06/19] KVM: x86: Report XSS as an MSR to be saved if there are supported features Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 08/19] KVM: x86: Load guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Zhang Yi Z

Update CPUID.0xD.0x1, which reports the current required storage size
of all features enabled via XCR0 | XSS, when the guest's XSS is modified.

Note, KVM does not yet support any XSS based features, i.e. supported_xss
is guaranteed to be zero at this time.
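
The size reported in EBX can be sketched as below; the per-component
sizes are made-up stand-ins for the CPUID-enumerated values, so only the
shape of the computation (the compacted size grows with every bit set in
XCR0 | XSS) matches the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative per-component sizes; real ones come from CPUID.0xD. */
static const uint32_t toy_comp_size[12] = {
	[2] = 256,	/* e.g. AVX state */
	[11] = 16,	/* e.g. CET user state */
};

/* Compacted-format size: legacy area + header + every enabled component,
 * mirroring xstate_required_size(xcr0 | xss, true). */
static uint32_t toy_xstate_required_size(uint64_t xstate_bv)
{
	uint32_t size = 512 + 64;	/* legacy XSAVE area + XSAVE header */
	int i;

	for (i = 2; i < 12; i++)
		if (xstate_bv & (1ULL << i))
			size += toy_comp_size[i];
	return size;
}
```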

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220517154100.29983-5-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/cpuid.c | 16 +++++++++++++---
 arch/x86/kvm/x86.c   |  6 ++++--
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d47222ab8e6e..46ca0f1abbcb 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -240,9 +240,19 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, false);
 
 	best = cpuid_entry2_find(entries, nent, 0xD, 1);
-	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
-		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+	if (best) {
+		if (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
+		    cpuid_entry_has(best, X86_FEATURE_XSAVEC))  {
+			u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;
+
+			best->ebx = xstate_required_size(xstate, true);
+		}
+
+		if (!cpuid_entry_has(best, X86_FEATURE_XSAVES)) {
+			best->ecx = 0;
+			best->edx = 0;
+		}
+	}
 
 	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
 	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f525228168b8..06fbd3daf393 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3605,8 +3605,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 */
 		if (data & ~kvm_caps.supported_xss)
 			return 1;
-		vcpu->arch.ia32_xss = data;
-		kvm_update_cpuid_runtime(vcpu);
+		if (vcpu->arch.ia32_xss != data) {
+			vcpu->arch.ia32_xss = data;
+			kvm_update_cpuid_runtime(vcpu);
+		}
 		break;
 	case MSR_SMI_COUNT:
 		if (!msr_info->host_initiated)
-- 
2.27.0



* [PATCH 08/19] KVM: x86: Load guest fpu state when accessing MSRs managed by XSAVES
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (6 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 07/19] KVM: x86: Refresh CPUID on writes to MSR_IA32_XSS Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 09/19] KVM: x86: Add #CP support in guest exception classification Yang Weijiang
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

If new feature MSRs are supported in XSS and passed through to the guest
they are saved and restored by XSAVES/XRSTORS, i.e. in the guest's FPU
state.

Load the guest's FPU state if userspace is accessing MSRs whose values are
managed by XSAVES so that the MSR helper, e.g. kvm_{get,set}_xsave_msr(),
can simply do {RD,WR}MSR to access the guest's value.

Because __msr_io() is also used for the KVM_GET_MSRS device ioctl(), explicitly
check that @vcpu is non-null before attempting to load guest state.  The
XSS supporting MSRs cannot be retrieved via the device ioctl() without
loading guest FPU state (which doesn't exist).

Note that guest_cpuid_has() is not queried as host userspace is allowed
to access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.
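
A minimal sketch of the resulting __msr_io() loop shape (toy_* names and
MSR numbers are illustrative): the guest FPU is loaded lazily, at most
once per batch, and only when an XSAVES-managed MSR appears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MSR_IA32_U_CET   0x6A0u
#define TOY_MSR_IA32_PL3_SSP 0x6A7u

static bool toy_is_xsaves_msr(uint32_t index)
{
	return index == TOY_MSR_IA32_U_CET || index == TOY_MSR_IA32_PL3_SSP;
}

/* Returns how many times the guest FPU would be loaded for this batch. */
static int toy_msr_io(const uint32_t *indices, int n)
{
	bool fpu_loaded = false;
	int loads = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (!fpu_loaded && toy_is_xsaves_msr(indices[i])) {
			loads++;	/* kvm_load_guest_fpu(vcpu) */
			fpu_loaded = true;
		}
		/* do_msr(vcpu, indices[i], &data) would run here */
	}
	/* kvm_put_guest_fpu(vcpu) would run here when fpu_loaded */
	return loads;
}
```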

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 06fbd3daf393..506454e17afc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -130,6 +130,9 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu);
 static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
 struct kvm_x86_ops kvm_x86_ops __read_mostly;
 
 #define KVM_X86_OP(func)					     \
@@ -4138,6 +4141,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 }
 EXPORT_SYMBOL_GPL(kvm_get_msr_common);
 
+static bool is_xsaves_msr(u32 index)
+{
+	return index == MSR_IA32_U_CET || index == MSR_IA32_PL3_SSP;
+}
+
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
  *
@@ -4148,11 +4156,20 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
+		    is_xsaves_msr(entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 
 	return i;
 }
-- 
2.27.0



* [PATCH 09/19] KVM: x86: Add #CP support in guest exception classification.
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (7 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 08/19] KVM: x86: Load guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 10/19] KVM: VMX: Introduce CET VMCS fields and flags Yang Weijiang
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe; +Cc: weijiang.yang

Add handling for Control Protection (#CP) exceptions, vector 21,
introduced by Intel's Control-Flow Enforcement Technology (CET). The #CP
exception is raised in the relevant CET violation cases.  See Intel's
SDM for details.
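
The classification change can be sketched in isolation (a toy model, not
the kernel function): #CP joins the contributory class only when the
guest can actually enable CR4.CET, and stays benign otherwise.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_EXCPT_BENIGN, TOY_EXCPT_CONTRIBUTORY, TOY_EXCPT_PF };

#define TOY_GP_VECTOR 13
#define TOY_PF_VECTOR 14
#define TOY_CP_VECTOR 21

/* cr4_cet_reserved models "X86_CR4_CET is reserved for this guest". */
static int toy_exception_class(bool cr4_cet_reserved, int vector)
{
	switch (vector) {
	case TOY_PF_VECTOR:
		return TOY_EXCPT_PF;
	case TOY_CP_VECTOR:
		return cr4_cet_reserved ? TOY_EXCPT_BENIGN
					: TOY_EXCPT_CONTRIBUTORY;
	case TOY_GP_VECTOR:
		return TOY_EXCPT_CONTRIBUTORY;
	default:
		return TOY_EXCPT_BENIGN;
	}
}
```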

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20210203113421.5759-5-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/x86.c              | 10 +++++++---
 arch/x86/kvm/x86.h              | 13 ++++++++++---
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 50a4e787d5e6..69146e7436af 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -32,6 +32,7 @@
 #define MC_VECTOR 18
 #define XM_VECTOR 19
 #define VE_VECTOR 20
+#define CP_VECTOR 21
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7d8cd0ebcc75..01fe23c6fa49 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2819,7 +2819,7 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		/* VM-entry interruption-info field: deliver error code */
 		should_have_error_code =
 			intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
-			x86_exception_has_error_code(vector);
+			x86_exception_has_error_code(vcpu, vector);
 		if (CC(has_error_code != should_have_error_code))
 			return -EINVAL;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 506454e17afc..40749e47cda7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -511,11 +511,15 @@ EXPORT_SYMBOL_GPL(kvm_spurious_fault);
 #define EXCPT_CONTRIBUTORY	1
 #define EXCPT_PF		2
 
-static int exception_class(int vector)
+static int exception_class(struct kvm_vcpu *vcpu, int vector)
 {
 	switch (vector) {
 	case PF_VECTOR:
 		return EXCPT_PF;
+	case CP_VECTOR:
+		if (vcpu->arch.cr4_guest_rsvd_bits & X86_CR4_CET)
+			return EXCPT_BENIGN;
+		return EXCPT_CONTRIBUTORY;
 	case DE_VECTOR:
 	case TS_VECTOR:
 	case NP_VECTOR:
@@ -659,8 +663,8 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 		return;
 	}
-	class1 = exception_class(prev_nr);
-	class2 = exception_class(nr);
+	class1 = exception_class(vcpu, prev_nr);
+	class2 = exception_class(vcpu, nr);
 	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
 		|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
 		/*
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 501b884b8cc4..b9b1fff6d97a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -148,13 +148,20 @@ static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
 	return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
 }
 
-static inline bool x86_exception_has_error_code(unsigned int vector)
+static inline bool x86_exception_has_error_code(struct kvm_vcpu *vcpu,
+						unsigned int vector)
 {
 	static u32 exception_has_error_code = BIT(DF_VECTOR) | BIT(TS_VECTOR) |
 			BIT(NP_VECTOR) | BIT(SS_VECTOR) | BIT(GP_VECTOR) |
-			BIT(PF_VECTOR) | BIT(AC_VECTOR);
+			BIT(PF_VECTOR) | BIT(AC_VECTOR) | BIT(CP_VECTOR);
 
-	return (1U << vector) & exception_has_error_code;
+	if (!((1U << vector) & exception_has_error_code))
+		return false;
+
+	if (vector == CP_VECTOR)
+		return !(vcpu->arch.cr4_guest_rsvd_bits & X86_CR4_CET);
+
+	return true;
 }
 
 static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
-- 
2.27.0



* [PATCH 10/19] KVM: VMX: Introduce CET VMCS fields and flags
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (8 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 09/19] KVM: x86: Add #CP support in guest exception classification Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 11/19] KVM: x86: Add fault checks for CR4.CET Yang Weijiang
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Zhang Yi Z

CET (Control-flow Enforcement Technology) is a CPU feature used to prevent
Return/Jump-Oriented Programming (ROP/JOP) attacks. CET introduces a new
exception type, Control Protection (#CP), and two sub-features to defend
against ROP/JOP style control-flow subversion attacks:

Shadow Stack (SHSTK):
  A shadow stack is a second stack used exclusively for control transfer
  operations. The shadow stack is separate from the data/normal stack and
  can be enabled individually in user and kernel mode.  When shadow stacks
  are enabled, CALL pushes the return address on both the data and shadow
  stack. RET pops the return address from both stacks and compares them.
  If the return addresses from the two stacks do not match, the processor
  signals a #CP.

Indirect Branch Tracking (IBT):
  IBT adds a new instruction, ENDBRANCH, that is used to mark valid target
  addresses of indirect branches (CALL, JMP, ENCLU[EEXIT], etc...). If an
  indirect branch is executed and the next instruction is _not_ an
  ENDBRANCH, the processor signals a #CP.
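
The shadow-stack matching described above can be modeled with two toy
stacks; this is a behavioral sketch only, and nothing here corresponds
to real hardware state layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: CALL pushes the return address on both stacks; RET pops
 * both copies and compares them, signaling #CP on mismatch. */
#define TOY_DEPTH 16
static uint64_t data_stack[TOY_DEPTH], shadow_stack[TOY_DEPTH];
static int data_top, shadow_top;

static void toy_call(uint64_t ret_addr)
{
	data_stack[data_top++] = ret_addr;
	shadow_stack[shadow_top++] = ret_addr;
}

/* Returns true on a clean return, false to model a #CP fault. */
static bool toy_ret(void)
{
	return data_stack[--data_top] == shadow_stack[--shadow_top];
}

static bool toy_clean_return(void)
{
	data_top = shadow_top = 0;
	toy_call(0x401000);
	return toy_ret();
}

static bool toy_corrupted_return(void)
{
	data_top = shadow_top = 0;
	toy_call(0x401000);
	data_stack[0] = 0xbad;	/* e.g. a stack-smashing overwrite */
	return toy_ret();
}
```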

Several new CET MSRs are defined to support CET:
  MSR_IA32_{U,S}_CET: Controls the CET settings for user mode and kernel
                      mode respectively.

  MSR_IA32_PL{0,1,2,3}_SSP: Stores shadow stack pointers for CPL-0,1,2,3
                            protection respectively.

  MSR_IA32_INT_SSP_TAB: Stores base address of shadow stack pointer table.

Two XSAVES state bits are introduced for CET:
  IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
  IA32_XSS:[bit 12]: Control saving/restoring kernel mode CET states.

Six VMCS fields are introduced for CET:
  {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
  {HOST,GUEST}_SSP: Stores shadow stack pointer of current task/thread.
  {HOST,GUEST}_INTR_SSP_TABLE: Stores base address of shadow stack pointer
  table.

If VM_EXIT_LOAD_CET_STATE = 1, the host CET states are restored from
the following VMCS fields at VM-Exit:
  HOST_S_CET
  HOST_SSP
  HOST_INTR_SSP_TABLE

If VM_ENTRY_LOAD_CET_STATE = 1, the guest CET states are loaded from
the following VMCS fields at VM-Entry:
  GUEST_S_CET
  GUEST_SSP
  GUEST_INTR_SSP_TABLE

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20210203113421.5759-6-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/vmx.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index c371ef695fcc..4e019fa968b8 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -102,6 +102,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_LOAD_CET_STATE                  0x10000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -115,6 +116,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_CET_STATE                 0x00100000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
@@ -343,6 +345,9 @@ enum vmcs_field {
 	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
 	GUEST_SYSENTER_ESP              = 0x00006824,
 	GUEST_SYSENTER_EIP              = 0x00006826,
+	GUEST_S_CET                     = 0x00006828,
+	GUEST_SSP                       = 0x0000682a,
+	GUEST_INTR_SSP_TABLE            = 0x0000682c,
 	HOST_CR0                        = 0x00006c00,
 	HOST_CR3                        = 0x00006c02,
 	HOST_CR4                        = 0x00006c04,
@@ -355,6 +360,9 @@ enum vmcs_field {
 	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
 	HOST_RSP                        = 0x00006c14,
 	HOST_RIP                        = 0x00006c16,
+	HOST_S_CET                      = 0x00006c18,
+	HOST_SSP                        = 0x00006c1a,
+	HOST_INTR_SSP_TABLE             = 0x00006c1c
 };
 
 /*
-- 
2.27.0



* [PATCH 11/19] KVM: x86: Add fault checks for CR4.CET
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (9 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 10/19] KVM: VMX: Introduce CET VMCS fields and flags Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 12/19] KVM: VMX: Emulate reads and writes to CET MSRs Yang Weijiang
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Sean Christopherson

Add the fault checks for CR4.CET, which is the master control for all
CET features (SHSTK and IBT).  In addition to basic support checks, CET
can be enabled if and only if CR0.WP==1, i.e. setting CR4.CET=1 faults
if CR0.WP==0 and setting CR0.WP=0 fails if CR4.CET==1.
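
The two cross-checks can be sketched as below; the toy constants mirror
the architectural bit positions (CR0.WP is bit 16, CR4.CET is bit 23),
but the functions themselves are illustrative, not KVM's.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_X86_CR0_WP  (1ULL << 16)
#define TOY_X86_CR4_CET (1ULL << 23)

/* Returns 1 to model the fault injected by the patch, 0 on success. */
static int toy_check_cr0(uint64_t new_cr0, uint64_t cr4)
{
	/* Clearing CR0.WP faults while CR4.CET is set. */
	if (!(new_cr0 & TOY_X86_CR0_WP) && (cr4 & TOY_X86_CR4_CET))
		return 1;
	return 0;
}

static int toy_check_cr4(uint64_t new_cr4, uint64_t cr0)
{
	/* Setting CR4.CET faults while CR0.WP is clear. */
	if ((new_cr4 & TOY_X86_CR4_CET) && !(cr0 & TOY_X86_CR0_WP))
		return 1;
	return 0;
}
```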

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20210203113421.5759-7-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 6 ++++++
 arch/x86/kvm/x86.h | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 40749e47cda7..cce789f1246a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -952,6 +952,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	    (is_64_bit_mode(vcpu) || kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)))
 		return 1;
 
+	if (!(cr0 & X86_CR0_WP) && kvm_read_cr4_bits(vcpu, X86_CR4_CET))
+		return 1;
+
 	static_call(kvm_x86_set_cr0)(vcpu, cr0);
 
 	kvm_post_set_cr0(vcpu, old_cr0, cr0);
@@ -1168,6 +1171,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	}
 
+	if ((cr4 & X86_CR4_CET) && !(kvm_read_cr0(vcpu) & X86_CR0_WP))
+		return 1;
+
 	static_call(kvm_x86_set_cr4)(vcpu, cr4);
 
 	kvm_post_set_cr4(vcpu, old_cr4, cr4);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index b9b1fff6d97a..01493b7ae150 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -477,6 +477,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 		__reserved_bits |= X86_CR4_VMXE;        \
 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
 		__reserved_bits |= X86_CR4_PCIDE;       \
+	if (!__cpu_has(__c, X86_FEATURE_SHSTK) &&	\
+	    !__cpu_has(__c, X86_FEATURE_IBT))		\
+		__reserved_bits |= X86_CR4_CET;		\
 	__reserved_bits;                                \
 })
 
-- 
2.27.0



* [PATCH 12/19] KVM: VMX: Emulate reads and writes to CET MSRs
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (10 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 11/19] KVM: x86: Add fault checks for CR4.CET Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 13/19] KVM: VMX: Add a synthetic MSR to allow userspace VMM to access GUEST_SSP Yang Weijiang
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Sean Christopherson

Add support for emulating read and write accesses to CET MSRs.
CET MSRs are universally "special" as they are either context
switched via dedicated VMCS fields or via XSAVES, i.e. no
additional in-memory tracking is needed, but emulated reads/writes
are more expensive.
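
The value checks the patch applies before writing these MSRs can be
sketched as below; the reserved-bit masks mirror the GENMASK() uses in
the diff, while canonical-address checking is omitted from this toy.

```c
#include <assert.h>
#include <stdint.h>

/* GENMASK(h, l): bits h..l set, as used in the patch. */
#define TOY_GENMASK(h, l) (((~0ULL) >> (63 - (h))) & ~((1ULL << (l)) - 1))

/* MSR_IA32_U_CET reserves bits 9:6. */
static int toy_u_cet_valid(uint64_t data)
{
	return !(data & TOY_GENMASK(9, 6));
}

/* MSR_IA32_PL3_SSP must be 4-byte aligned, i.e. bits 2:0 clear. */
static int toy_pl3_ssp_valid(uint64_t data)
{
	return !(data & TOY_GENMASK(2, 0));
}
```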

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h     | 31 +++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5e14e4c40007..d1f2ffa07576 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1767,6 +1767,26 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 	}
 }
 
+static bool cet_is_msr_accessible(struct kvm_vcpu *vcpu,
+				  struct msr_data *msr)
+{
+	if (!kvm_cet_user_supported())
+		return false;
+
+	if (msr->host_initiated)
+		return true;
+
+	if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
+	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
+		return false;
+
+	if (msr->index == MSR_IA32_PL3_SSP &&
+	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
+		return false;
+
+	return true;
+}
+
 /*
  * Reads an msr value (of 'msr_info->index') into 'msr_info->data'.
  * Returns 0 on success, non-0 otherwise.
@@ -1906,6 +1926,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL3_SSP:
+		if (!cet_is_msr_accessible(vcpu, msr_info))
+			return 1;
+		kvm_get_xsave_msr(msr_info);
+		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
 		break;
@@ -2238,6 +2264,22 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
+	case MSR_IA32_U_CET:
+		if (!cet_is_msr_accessible(vcpu, msr_info))
+			return 1;
+		if ((data & GENMASK(9, 6)) ||
+		    is_noncanonical_address(data, vcpu))
+			return 1;
+		kvm_set_xsave_msr(msr_info);
+		break;
+	case MSR_IA32_PL3_SSP:
+		if (!cet_is_msr_accessible(vcpu, msr_info))
+			return 1;
+		if ((data & GENMASK(2, 0)) ||
+		    is_noncanonical_address(data, vcpu))
+			return 1;
+		kvm_set_xsave_msr(msr_info);
+		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		if (data && !vcpu_to_pmu(vcpu)->version)
 			return 1;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 01493b7ae150..f6000e3fb195 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -2,6 +2,7 @@
 #ifndef ARCH_X86_KVM_X86_H
 #define ARCH_X86_KVM_X86_H
 
+#include <asm/fpu/api.h>
 #include <linux/kvm_host.h>
 #include <asm/mce.h>
 #include <asm/pvclock.h>
@@ -323,6 +324,16 @@ static inline bool kvm_mpx_supported(void)
 		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
 }
 
+/*
+ * Guest CET user mode states rely on host XSAVES/XRSTORS to save/restore
+ * them when the vCPU enters/exits user space. If the host doesn't support
+ * the CET user bit in the XSS MSR, treat CET user mode as unsupported by KVM.
+ */
+static inline bool kvm_cet_user_supported(void)
+{
+	return !!(kvm_caps.supported_xss & XFEATURE_MASK_CET_USER);
+}
+
 extern unsigned int min_timer_period_us;
 
 extern bool enable_vmware_backdoor;
@@ -491,4 +502,24 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 			 unsigned int port, void *data,  unsigned int count,
 			 int in);
 
+/*
+ * Guest FPU state was already loaded in __msr_io() when the MSR index was
+ * checked. In case the vCPU has been preempted since then, disable
+ * preemption and, if needed, reload the guest FPU state before issuing the
+ * MSR read/write; fpu_lock_and_load() serves that purpose.
+ */
+static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
+{
+	fpu_lock_and_load();
+	rdmsrl(msr_info->index, msr_info->data);
+	fpregs_unlock();
+}
+
+static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
+{
+	fpu_lock_and_load();
+	wrmsrl(msr_info->index, msr_info->data);
+	fpregs_unlock();
+}
+
 #endif
-- 
2.27.0



* [PATCH 13/19] KVM: VMX: Add a synthetic MSR to allow userspace VMM to access GUEST_SSP
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (11 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 12/19] KVM: VMX: Emulate reads and writes to CET MSRs Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 14/19] KVM: x86: Report CET MSRs as to-be-saved if CET is supported Yang Weijiang
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Sean Christopherson

Introduce a host-only synthetic MSR, MSR_KVM_GUEST_SSP so that the VMM
can read/write the guest's SSP, e.g. to migrate CET state.  Use a
synthetic MSR, as opposed to a VCPU_REG_, because GUEST_SSP is subject
to the same consistency checks as the PL*_SSP MSRs, i.e. can share code.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/vmx/vmx.c               | 15 ++++++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..7af465e4e0bd 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -58,6 +58,7 @@
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
+#define MSR_KVM_GUEST_SSP	0x4b564d09
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d1f2ffa07576..fc1229f23987 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1780,7 +1780,8 @@ static bool cet_is_msr_accessible(struct kvm_vcpu *vcpu,
 	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
 		return false;
 
-	if (msr->index == MSR_IA32_PL3_SSP &&
+	if ((msr->index == MSR_IA32_PL3_SSP ||
+	     msr->index == MSR_KVM_GUEST_SSP) &&
 	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
 		return false;
 
@@ -1928,9 +1929,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_U_CET:
 	case MSR_IA32_PL3_SSP:
+	case MSR_KVM_GUEST_SSP:
 		if (!cet_is_msr_accessible(vcpu, msr_info))
 			return 1;
-		kvm_get_xsave_msr(msr_info);
+		if (msr_info->index == MSR_KVM_GUEST_SSP)
+			msr_info->data = vmcs_readl(GUEST_SSP);
+		else
+			kvm_get_xsave_msr(msr_info);
 		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
@@ -2273,12 +2278,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		kvm_set_xsave_msr(msr_info);
 		break;
 	case MSR_IA32_PL3_SSP:
+	case MSR_KVM_GUEST_SSP:
 		if (!cet_is_msr_accessible(vcpu, msr_info))
 			return 1;
 		if ((data & GENMASK(2, 0)) ||
 		    is_noncanonical_address(data, vcpu))
 			return 1;
-		kvm_set_xsave_msr(msr_info);
+		if (msr_index == MSR_KVM_GUEST_SSP)
+			vmcs_writel(GUEST_SSP, data);
+		else
+			kvm_set_xsave_msr(msr_info);
 		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		if (data && !vcpu_to_pmu(vcpu)->version)
-- 
2.27.0
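The new MSR_KVM_GUEST_SSP shares vmx_set_msr()'s validity check with MSR_IA32_PL3_SSP: a written value must have bits 2:0 clear and be a canonical linear address. The guard can be sketched stand-alone as below — note this is an illustration, not KVM code: the 48-bit canonical check and the reserved-bit mask are simplified stand-ins for the kernel's is_noncanonical_address() and GENMASK(2, 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's GENMASK(2, 0): SSP bits 2:0 are reserved. */
#define SSP_RESERVED_BITS 0x7ULL

/* Simplified canonical check, assuming a 48-bit virtual address width. */
static bool is_noncanonical_48(uint64_t addr)
{
	return (uint64_t)(((int64_t)addr << 16) >> 16) != addr;
}

/* Mirrors the vmx_set_msr() guard for MSR_IA32_PL3_SSP/MSR_KVM_GUEST_SSP. */
static bool ssp_write_allowed(uint64_t data)
{
	if ((data & SSP_RESERVED_BITS) || is_noncanonical_48(data))
		return false;	/* the real code returns 1, injecting #GP */
	return true;
}
```

A host-initiated (userspace) write goes through the same check, so a migration source cannot smuggle in a misaligned or non-canonical SSP.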


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 14/19] KVM: x86: Report CET MSRs as to-be-saved if CET is supported
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (12 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 13/19] KVM: VMX: Add a synthetic MSR to allow userspace VMM to access GUEST_SSP Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 15/19] KVM: x86: Save/Restore GUEST_SSP to/from SMM state save area Yang Weijiang
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Sean Christopherson

Report all CET MSRs, including the synthetic GUEST_SSP MSR, as
to-be-saved, e.g. for migration, if CET is supported by KVM.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cce789f1246a..3613b73f13fb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1460,6 +1460,7 @@ static const u32 msrs_to_save_all[] = {
 	MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
 	MSR_IA32_XSS,
+	MSR_IA32_U_CET, MSR_IA32_PL3_SSP, MSR_KVM_GUEST_SSP,
 };
 
 static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_all)];
@@ -6814,6 +6815,12 @@ static void kvm_init_msr_list(void)
 			if (!kvm_caps.supported_xss)
 				continue;
 			break;
+		case MSR_KVM_GUEST_SSP:
+		case MSR_IA32_U_CET:
+		case MSR_IA32_PL3_SSP:
+			if (!kvm_cet_user_supported())
+				continue;
+			break;
 		default:
 			break;
 		}
-- 
2.27.0
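kvm_init_msr_list() builds the reported list by walking the static msrs_to_save_all[] array and skipping entries whose backing feature is unavailable, which is how the CET MSRs above disappear on non-CET hosts. That filtering pattern, reduced to a self-contained sketch (the capability is passed as a plain bool here; only MSR_KVM_GUEST_SSP's value is KVM-defined, the other two are architectural MSR numbers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_U_CET		0x6a0		/* architectural */
#define MSR_IA32_PL3_SSP	0x6a7		/* architectural */
#define MSR_KVM_GUEST_SSP	0x4b564d09	/* synthetic, KVM-defined */

static const uint32_t msrs_to_save_all[] = {
	MSR_IA32_U_CET, MSR_IA32_PL3_SSP, MSR_KVM_GUEST_SSP,
};

/*
 * Mimic kvm_init_msr_list(): copy only the MSRs the platform supports.
 * Returns the number of supported MSRs; 'out' may be NULL to just count.
 */
static size_t build_msr_list(bool cet_user_supported, uint32_t *out)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(msrs_to_save_all) / sizeof(msrs_to_save_all[0]); i++) {
		switch (msrs_to_save_all[i]) {
		case MSR_IA32_U_CET:
		case MSR_IA32_PL3_SSP:
		case MSR_KVM_GUEST_SSP:
			if (!cet_user_supported)
				continue;	/* skip unsupported entry */
			break;
		default:
			break;
		}
		if (out)
			out[n] = msrs_to_save_all[i];
		n++;
	}
	return n;
}
```

Userspace then retrieves the filtered list via KVM_GET_MSR_INDEX_LIST and saves/restores exactly those MSRs during migration.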


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 15/19] KVM: x86: Save/Restore GUEST_SSP to/from SMM state save area
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (13 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 14/19] KVM: x86: Report CET MSRs as to-be-saved if CET is supported Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 16/19] KVM: x86: Enable CET virtualization for VMX and advertise CET to userspace Yang Weijiang
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe; +Cc: weijiang.yang

Save GUEST_SSP in the SMM state save area when the guest enters SMM in
response to an SMI, and restore it when the guest exits SMM.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20210203113421.5759-15-weijiang.yang@intel.com>
[Change the SMM offset to some place that is actually free. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 11 +++++++++++
 arch/x86/kvm/x86.c     | 10 ++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 39ea9138224c..eb0d45ae5214 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2558,6 +2558,17 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
 			return r;
 	}
 
+	if (kvm_cet_user_supported()) {
+		struct msr_data msr;
+
+		val = GET_SMSTATE(u64, smstate, 0x7f08);
+		msr.index = MSR_KVM_GUEST_SSP;
+		msr.host_initiated = true;
+		msr.data = val;
+		/* Mimic host_initiated access to bypass ssp access check. */
+		kvm_x86_ops.set_msr(ctxt->vcpu, &msr);
+	}
+
 	return X86EMUL_CONTINUE;
 }
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3613b73f13fb..86bccb12f036 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9833,6 +9833,16 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
 
 	for (i = 0; i < 6; i++)
 		enter_smm_save_seg_64(vcpu, buf, i);
+
+	if (kvm_cet_user_supported()) {
+		struct msr_data msr;
+
+		msr.index = MSR_KVM_GUEST_SSP;
+		msr.host_initiated = true;
+		/* GUEST_SSP is stored in VMCS at vm-exit. */
+		kvm_x86_ops.get_msr(vcpu, &msr);
+		put_smstate(u64, buf, 0x7f08, msr.data);
+	}
 }
 #endif
 
-- 
2.27.0
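enter_smm_save_state_64() stashes GUEST_SSP at offset 0x7f08 of the 64-bit SMM state save area with put_smstate(), and rsm_load_state_64() reads it back with GET_SMSTATE(). Those helpers boil down to typed stores/loads into a byte buffer; a hedged stand-alone equivalent is below. For simplicity this sketch uses one flat 32 KiB buffer indexed by the raw offset — the kernel's macros actually operate on a 512-byte buffer with offsets rebased relative to the save area's start, and use direct typed accesses rather than memcpy().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SMRAM_SIZE		0x8000	/* simplified: whole SMRAM as one buffer */
#define SMRAM_OFFSET_SSP	0x7f08	/* slot chosen for GUEST_SSP */

static void put_smstate_u64(uint8_t *buf, uint32_t off, uint64_t val)
{
	memcpy(buf + off, &val, sizeof(val));	/* avoids alignment assumptions */
}

static uint64_t get_smstate_u64(const uint8_t *buf, uint32_t off)
{
	uint64_t val;

	memcpy(&val, buf + off, sizeof(val));
	return val;
}

/* Round-trip GUEST_SSP the way SMI entry and RSM do. */
static uint64_t smm_ssp_roundtrip(uint64_t ssp)
{
	uint8_t buf[SMRAM_SIZE] = {0};

	put_smstate_u64(buf, SMRAM_OFFSET_SSP, ssp);	/* on SMM entry */
	return get_smstate_u64(buf, SMRAM_OFFSET_SSP);	/* on RSM */
}
```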


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 16/19] KVM: x86: Enable CET virtualization for VMX and advertise CET to userspace
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (14 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 15/19] KVM: x86: Save/Restore GUEST_SSP to/from SMM state save area Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16 10:59   ` Peter Zijlstra
  2022-06-16  8:46 ` [PATCH 17/19] KVM: VMX: Pass through CET MSRs to the guest when supported Yang Weijiang
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Sean Christopherson

Set the feature bits so that CET capabilities are visible to the guest
via CPUID enumeration. Add CR4.CET support so that the guest can set
the CET master control bit (CR4.CET).

Disable the KVM CET feature if unrestricted_guest is unsupported or
disabled, as KVM does not support emulating CET.

Don't expose CET feature if dependent CET bits are cleared in host XSS,
or if XSAVES isn't supported.  Updating the CET features in common x86 is
a little ugly, but there is no clean solution without risking breakage of
SVM if SVM hardware ever gains support for CET, e.g. moving everything to
common x86 would prematurely expose CET on SVM.  The alternative is to
put all the logic in VMX, but that means rereading host_xss in VMX and
duplicating the XSAVES check across VMX and SVM.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/cpuid.c            |  5 +++--
 arch/x86/kvm/vmx/capabilities.h |  4 ++++
 arch/x86/kvm/vmx/vmx.c          | 37 ++++++++++++++++++++++++++++-----
 arch/x86/kvm/x86.c              | 21 ++++++++++++++++++-
 5 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7e98b2876380..a7e5463d0107 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -120,7 +120,8 @@
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+			  | X86_CR4_CET))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 46ca0f1abbcb..12d527e612e5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -581,7 +581,7 @@ void kvm_set_cpu_caps(void)
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
 		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
-		F(SGX_LC) | F(BUS_LOCK_DETECT)
+		F(SGX_LC) | F(BUS_LOCK_DETECT) | F(SHSTK)
 	);
 	/* Set LA57 based on hardware capability. */
 	if (cpuid_ecx(7) & F(LA57))
@@ -599,7 +599,8 @@ void kvm_set_cpu_caps(void)
 		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
 		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
 		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
-		F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16)
+		F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) |
+		F(IBT)
 	);
 
 	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 069d8d298e1d..6849888f7b46 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -106,6 +106,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
 }
 
+static inline bool cpu_has_load_cet_ctrl(void)
+{
+	return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
 static inline bool cpu_has_vmx_mpx(void)
 {
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fc1229f23987..4bdede87669a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2516,6 +2516,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		{ VM_ENTRY_LOAD_IA32_EFER,		VM_EXIT_LOAD_IA32_EFER },
 		{ VM_ENTRY_LOAD_BNDCFGS,		VM_EXIT_CLEAR_BNDCFGS },
 		{ VM_ENTRY_LOAD_IA32_RTIT_CTL,		VM_EXIT_CLEAR_IA32_RTIT_CTL },
+		{ VM_ENTRY_LOAD_CET_STATE,		VM_EXIT_LOAD_CET_STATE },
 	};
 
 	memset(vmcs_conf, 0, sizeof(*vmcs_conf));
@@ -2636,7 +2637,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	      VM_EXIT_LOAD_IA32_EFER |
 	      VM_EXIT_CLEAR_BNDCFGS |
 	      VM_EXIT_PT_CONCEAL_PIP |
-	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
+	      VM_EXIT_CLEAR_IA32_RTIT_CTL |
+	      VM_EXIT_LOAD_CET_STATE;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
 				&_vmexit_control) < 0)
 		return -EIO;
@@ -2660,7 +2662,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	      VM_ENTRY_LOAD_IA32_EFER |
 	      VM_ENTRY_LOAD_BNDCFGS |
 	      VM_ENTRY_PT_CONCEAL_PIP |
-	      VM_ENTRY_LOAD_IA32_RTIT_CTL;
+	      VM_ENTRY_LOAD_IA32_RTIT_CTL |
+	      VM_ENTRY_LOAD_CET_STATE;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
 				&_vmentry_control) < 0)
 		return -EIO;
@@ -2705,7 +2708,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		}
 	}
 
-
 	rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);
 
 	/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
@@ -6159,6 +6161,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
 		vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);
 
+	if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE) {
+		pr_err("S_CET = 0x%016lx\n", vmcs_readl(GUEST_S_CET));
+		pr_err("SSP = 0x%016lx\n", vmcs_readl(GUEST_SSP));
+		pr_err("SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(GUEST_INTR_SSP_TABLE));
+	}
 	pr_err("*** Host State ***\n");
 	pr_err("RIP = 0x%016lx  RSP = 0x%016lx\n",
 	       vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
@@ -6236,6 +6244,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID)
 		pr_err("Virtual processor ID = 0x%04x\n",
 		       vmcs_read16(VIRTUAL_PROCESSOR_ID));
+	if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE) {
+		pr_err("S_CET = 0x%016lx\n", vmcs_readl(HOST_S_CET));
+		pr_err("SSP = 0x%016lx\n", vmcs_readl(HOST_SSP));
+		pr_err("SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(HOST_INTR_SSP_TABLE));
+	}
 }
 
 /*
@@ -7679,9 +7693,10 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
 
 	/* CPUID 0xD.1 */
-	kvm_caps.supported_xss = 0;
-	if (!cpu_has_vmx_xsaves())
+	if (!cpu_has_vmx_xsaves()) {
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
+		kvm_caps.supported_xss = 0;
+	}
 
 	/* CPUID 0x80000001 and 0x7 (RDPID) */
 	if (!cpu_has_vmx_rdtscp()) {
@@ -7691,6 +7706,18 @@ static __init void vmx_set_cpu_caps(void)
 
 	if (cpu_has_vmx_waitpkg())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;
+	}
+
+#ifndef CONFIG_X86_SHADOW_STACK
+	if (boot_cpu_has(X86_FEATURE_SHSTK))
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+#endif
+
 }
 
 static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 86bccb12f036..fe049d0e5ecc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -217,6 +217,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
+#define KVM_SUPPORTED_XSS     (XFEATURE_MASK_CET_USER)
+
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
 
@@ -11807,8 +11809,10 @@ int kvm_arch_hardware_setup(void *opaque)
 
 	rdmsrl_safe(MSR_EFER, &host_efer);
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
+	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
 		rdmsrl(MSR_IA32_XSS, host_xss);
+		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+	}
 
 	kvm_init_pmu_capability();
 
@@ -11823,6 +11827,21 @@ int kvm_arch_hardware_setup(void *opaque)
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		kvm_caps.supported_xss = 0;
 
+	/* Update CET features now as kvm_caps.supported_xss is finalized. */
+	if (!kvm_cet_user_supported()) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	}
+
+	/*
+	 * If SHSTK and IBT are disabled, either by userspace or by being
+	 * deselected in Kconfig, the feature bits have been removed by this
+	 * point; clear the CET user bit in kvm_caps.supported_xss too.
+	 */
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;
+
 #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
 	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
 #undef __kvm_cpu_cap_has
-- 
2.27.0
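setup_vmcs_config() pairs VM_ENTRY_LOAD_CET_STATE with VM_EXIT_LOAD_CET_STATE in its vmentry_exit_pairs table so a control is only kept when hardware supports both directions; an inconsistent pair is dropped entirely, keeping CET state loading symmetric across VM-entry and VM-exit. The pairing logic, reduced to a sketch — the bit positions below are placeholders, not the architectural encodings:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder control bits; the real encodings live in asm/vmx.h. */
#define ENTRY_LOAD_CET_STATE	(1u << 20)
#define EXIT_LOAD_CET_STATE	(1u << 28)

struct ctrl_pair {
	uint32_t entry_control;
	uint32_t exit_control;
};

static const struct ctrl_pair vmentry_exit_pairs[] = {
	{ ENTRY_LOAD_CET_STATE, EXIT_LOAD_CET_STATE },
};

/* If only one half of a pair is supported, clear both halves. */
static void fixup_pairs(uint32_t *entry_ctl, uint32_t *exit_ctl)
{
	size_t i;

	for (i = 0; i < sizeof(vmentry_exit_pairs) / sizeof(vmentry_exit_pairs[0]); i++) {
		uint32_t e = vmentry_exit_pairs[i].entry_control;
		uint32_t x = vmentry_exit_pairs[i].exit_control;

		if (!(*entry_ctl & e) != !(*exit_ctl & x)) {
			*entry_ctl &= ~e;
			*exit_ctl &= ~x;
		}
	}
}

/* Helper for testing: 1 iff the CET pair survives the fixup. */
static int pair_kept(int has_entry, int has_exit)
{
	uint32_t en = has_entry ? ENTRY_LOAD_CET_STATE : 0;
	uint32_t ex = has_exit ? EXIT_LOAD_CET_STATE : 0;

	fixup_pairs(&en, &ex);
	return en != 0 && ex != 0;
}
```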


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 17/19] KVM: VMX: Pass through CET MSRs to the guest when supported
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (15 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 16/19] KVM: x86: Enable CET virtualization for VMX and advertise CET to userspace Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 18/19] KVM: nVMX: Enable CET support for nested VMX Yang Weijiang
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Zhang Yi Z, Sean Christopherson

Pass through CET user mode MSRs when the associated CET component
is enabled to improve guest performance.  All CET MSRs are context
switched, either via dedicated VMCS fields or XSAVES.

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bdede87669a..9aebd67ff03e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -583,6 +583,9 @@ static bool is_valid_passthrough_msr(u32 msr)
 	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
 		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
 		return true;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL3_SSP:
+		return true;
 	}
 
 	r = possible_passthrough_msr_slot(msr) != -ENOENT;
@@ -7595,6 +7598,23 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
+static bool is_cet_state_supported(struct kvm_vcpu *vcpu, u32 xss_state)
+{
+	return (kvm_caps.supported_xss & xss_state) &&
+	       (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
+		guest_cpuid_has(vcpu, X86_FEATURE_IBT));
+}
+
+static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
+{
+	bool incpt = !is_cet_state_supported(vcpu, XFEATURE_MASK_CET_USER);
+
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, incpt);
+
+	incpt |= !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, incpt);
+}
+
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7657,6 +7677,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
+
+	if (kvm_cet_user_supported())
+		vmx_update_intercept_for_cet_msr(vcpu);
 }
 
 static __init void vmx_set_cpu_caps(void)
-- 
2.27.0
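vmx_update_intercept_for_cet_msr() accumulates its intercept decision: MSR_IA32_U_CET is passed through whenever CET user state is supported by KVM and exposed to the guest, while MSR_IA32_PL3_SSP additionally requires SHSTK in the guest CPUID — the `incpt |=` can only ever tighten interception, never relax it. A sketch of that accumulation, with the guest capability inputs simplified to plain booleans (this is illustrative, not the KVM function itself):

```c
#include <assert.h>
#include <stdbool.h>

struct cet_intercepts {
	bool u_cet;	/* true = MSR_IA32_U_CET is intercepted */
	bool pl3_ssp;	/* true = MSR_IA32_PL3_SSP is intercepted */
};

/* Mirrors vmx_update_intercept_for_cet_msr()'s cumulative logic. */
static struct cet_intercepts
cet_msr_intercepts(bool cet_user_xss, bool guest_shstk, bool guest_ibt)
{
	struct cet_intercepts in;
	/* is_cet_state_supported(): XSS bit plus SHSTK or IBT in CPUID */
	bool incpt = !(cet_user_xss && (guest_shstk || guest_ibt));

	in.u_cet = incpt;

	incpt |= !guest_shstk;	/* can only add interception, never remove */
	in.pl3_ssp = incpt;

	return in;
}
```

So a guest with IBT but not SHSTK gets MSR_IA32_U_CET passed through while writes to MSR_IA32_PL3_SSP still trap.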


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 18/19] KVM: nVMX: Enable CET support for nested VMX
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (16 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 17/19] KVM: VMX: Pass through CET MSRs to the guest when supported Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16  8:46 ` [PATCH 19/19] KVM: x86: Enable supervisor IBT support for guest Yang Weijiang
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe
  Cc: weijiang.yang, Sean Christopherson

Add vmcs12 fields for all CET state, pass CET MSRs through to L2 when
possible, and enumerate the new VMCS controls and the CR4.CET bit as
supported.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 14 ++++++++++++--
 arch/x86/kvm/vmx/vmcs12.c |  6 ++++++
 arch/x86/kvm/vmx/vmcs12.h | 14 +++++++++++++-
 arch/x86/kvm/vmx/vmx.c    |  2 ++
 4 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 01fe23c6fa49..f31f3d394507 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -684,6 +684,13 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_PRED_CMD, MSR_TYPE_W);
 
+	/* Pass CET MSRs to the nested VM if both L0 and L1 pass them through. */
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_U_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
 	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
 
 	vmx->nested.force_msr_bitmap_recalc = false;
@@ -6593,7 +6600,9 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
 		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
-		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
+		VM_EXIT_LOAD_CET_STATE;
+
 	msrs->exit_ctls_high |=
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -6613,7 +6622,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
 		VM_ENTRY_IA32E_MODE |
 #endif
 		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
-		VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+		VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_CET_STATE;
+
 	msrs->entry_ctls_high |=
 		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER);
 
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 2251b60920f8..4c3836db548e 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -138,6 +138,9 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
 	FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
 	FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+	FIELD(GUEST_S_CET, guest_s_cet),
+	FIELD(GUEST_SSP, guest_ssp),
+	FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
 	FIELD(HOST_CR0, host_cr0),
 	FIELD(HOST_CR3, host_cr3),
 	FIELD(HOST_CR4, host_cr4),
@@ -150,5 +153,8 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
 	FIELD(HOST_RSP, host_rsp),
 	FIELD(HOST_RIP, host_rip),
+	FIELD(HOST_S_CET, host_s_cet),
+	FIELD(HOST_SSP, host_ssp),
+	FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
 };
 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 746129ddd5ae..672abd1f500b 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,13 @@ struct __packed vmcs12 {
 	natural_width host_ia32_sysenter_eip;
 	natural_width host_rsp;
 	natural_width host_rip;
-	natural_width paddingl[8]; /* room for future expansion */
+	natural_width host_s_cet;
+	natural_width host_ssp;
+	natural_width host_ssp_tbl;
+	natural_width guest_s_cet;
+	natural_width guest_ssp;
+	natural_width guest_ssp_tbl;
+	natural_width paddingl[2]; /* room for future expansion */
 	u32 pin_based_vm_exec_control;
 	u32 cpu_based_vm_exec_control;
 	u32 exception_bitmap;
@@ -293,6 +299,12 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
 	CHECK_OFFSET(host_rsp, 664);
 	CHECK_OFFSET(host_rip, 672);
+	CHECK_OFFSET(host_s_cet, 680);
+	CHECK_OFFSET(host_ssp, 688);
+	CHECK_OFFSET(host_ssp_tbl, 696);
+	CHECK_OFFSET(guest_s_cet, 704);
+	CHECK_OFFSET(guest_ssp, 712);
+	CHECK_OFFSET(guest_ssp_tbl, 720);
 	CHECK_OFFSET(pin_based_vm_exec_control, 744);
 	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
 	CHECK_OFFSET(exception_bitmap, 752);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9aebd67ff03e..00782d1750a5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7508,6 +7508,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 	cr4_fixed1_update(X86_CR4_PKE,        ecx, feature_bit(PKU));
 	cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
 	cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));
+	cr4_fixed1_update(X86_CR4_CET,	      ecx, feature_bit(SHSTK));
+	cr4_fixed1_update(X86_CR4_CET,	      edx, feature_bit(IBT));
 
 #undef cr4_fixed1_update
 }
-- 
2.27.0
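The vmcs12 change above carves the six new CET fields out of the paddingl[] reserve, shrinking it from 8 to 2 natural_width slots so every later field keeps its offset, and CHECK_OFFSET() pins the layout at build time. The same invariant can be expressed with offsetof() and _Static_assert in a reduced stand-alone sketch (only the struct tail is modeled here, so the absolute offsets differ from vmcs12's; what matters is that the fields plus the shrunken padding occupy exactly the old reserve):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t natural_width;	/* 64-bit build */

/* Reduced tail of vmcs12: 6 CET fields carved out of an 8-slot reserve. */
struct vmcs12_tail {
	natural_width host_rip;		/* last pre-existing field */
	natural_width host_s_cet;
	natural_width host_ssp;
	natural_width host_ssp_tbl;
	natural_width guest_s_cet;
	natural_width guest_ssp;
	natural_width guest_ssp_tbl;
	natural_width paddingl[2];	/* was [8]: room for future expansion */
	uint32_t pin_based_vm_exec_control;	/* first field after padding */
};

/* New fields + shrunken padding must fill exactly the old 8-slot reserve. */
_Static_assert(offsetof(struct vmcs12_tail, pin_based_vm_exec_control) -
	       offsetof(struct vmcs12_tail, host_s_cet) ==
	       8 * sizeof(natural_width),
	       "CET fields must not move later vmcs12 fields");
```

Because vmcs12 is ABI between KVM and userspace (it is migrated as an opaque blob), consuming padding instead of appending fields is the only layout-safe option.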


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 19/19] KVM: x86: Enable supervisor IBT support for guest
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (17 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 18/19] KVM: nVMX: Enable CET support for nested VMX Yang Weijiang
@ 2022-06-16  8:46 ` Yang Weijiang
  2022-06-16 11:05   ` Peter Zijlstra
  2022-06-16 11:19   ` Peter Zijlstra
  2022-06-16  9:10 ` [PATCH 00/19] Refresh queued CET virtualization series Christoph Hellwig
  2022-06-16 10:12 ` Peter Zijlstra
  20 siblings, 2 replies; 45+ messages in thread
From: Yang Weijiang @ 2022-06-16  8:46 UTC (permalink / raw)
  To: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe; +Cc: weijiang.yang

The mainline kernel now supports supervisor IBT (s-IBT) for kernel
code. To make s-IBT work in a guest (or nested guest), pass
MSR_IA32_S_CET through to the guest (or nested guest) when both the
host kernel and KVM have IBT enabled. Note, s-IBT works independently
of host XSAVES support because guest MSR_IA32_S_CET is stored/loaded
via a dedicated VMCS field.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/cpuid.h      |  5 +++++
 arch/x86/kvm/vmx/nested.c |  3 +++
 arch/x86/kvm/vmx/vmx.c    | 27 ++++++++++++++++++++++++---
 arch/x86/kvm/x86.c        | 13 ++++++++++++-
 4 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index ac72aabba981..c67c1e2fc11a 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -230,4 +230,9 @@ static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
 	return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
 }
 
+static __always_inline bool cet_kernel_ibt_supported(void)
+{
+	return HAS_KERNEL_IBT && kvm_cpu_cap_has(X86_FEATURE_IBT);
+}
+
 #endif
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f31f3d394507..d394136891d0 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -688,6 +688,9 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_U_CET, MSR_TYPE_RW);
 
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_S_CET, MSR_TYPE_RW);
+
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 00782d1750a5..6e7e596c0147 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -585,6 +585,7 @@ static bool is_valid_passthrough_msr(u32 msr)
 		return true;
 	case MSR_IA32_U_CET:
 	case MSR_IA32_PL3_SSP:
+	case MSR_IA32_S_CET:
 		return true;
 	}
 
@@ -1773,7 +1774,8 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 static bool cet_is_msr_accessible(struct kvm_vcpu *vcpu,
 				  struct msr_data *msr)
 {
-	if (!kvm_cet_user_supported())
+	if (!kvm_cet_user_supported() &&
+	    !cet_kernel_ibt_supported())
 		return false;
 
 	if (msr->host_initiated)
@@ -1783,6 +1785,10 @@ static bool cet_is_msr_accessible(struct kvm_vcpu *vcpu,
 	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
 		return false;
 
+	if (msr->index == MSR_IA32_S_CET &&
+	    guest_cpuid_has(vcpu, X86_FEATURE_IBT))
+		return true;
+
 	if ((msr->index == MSR_IA32_PL3_SSP ||
 	     msr->index == MSR_KVM_GUEST_SSP) &&
 	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
@@ -1933,10 +1939,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_U_CET:
 	case MSR_IA32_PL3_SSP:
 	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_S_CET:
 		if (!cet_is_msr_accessible(vcpu, msr_info))
 			return 1;
 		if (msr_info->index == MSR_KVM_GUEST_SSP)
 			msr_info->data = vmcs_readl(GUEST_SSP);
+		else if (msr_info->index == MSR_IA32_S_CET)
+			msr_info->data = vmcs_readl(GUEST_S_CET);
 		else
 			kvm_get_xsave_msr(msr_info);
 		break;
@@ -2273,12 +2282,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
 	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
 		if (!cet_is_msr_accessible(vcpu, msr_info))
 			return 1;
 		if ((data & GENMASK(9, 6)) ||
 		    is_noncanonical_address(data, vcpu))
 			return 1;
-		kvm_set_xsave_msr(msr_info);
+		if (msr_index == MSR_IA32_S_CET)
+			vmcs_writel(GUEST_S_CET, data);
+		else
+			kvm_set_xsave_msr(msr_info);
 		break;
 	case MSR_IA32_PL3_SSP:
 	case MSR_KVM_GUEST_SSP:
@@ -7615,6 +7628,9 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
 
 	incpt |= !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, incpt);
+
+	incpt |= !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, incpt);
 }
 
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
@@ -7680,7 +7696,7 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 
-	if (kvm_cet_user_supported())
+	if (kvm_cet_user_supported() || cet_kernel_ibt_supported())
 		vmx_update_intercept_for_cet_msr(vcpu);
 }
 
@@ -7743,6 +7759,11 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
 #endif
 
+#ifndef CONFIG_X86_KERNEL_IBT
+	if (boot_cpu_has(X86_FEATURE_IBT))
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+#endif
+
 }
 
 static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fe049d0e5ecc..c0118b33806a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1463,6 +1463,7 @@ static const u32 msrs_to_save_all[] = {
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
 	MSR_IA32_XSS,
 	MSR_IA32_U_CET, MSR_IA32_PL3_SSP, MSR_KVM_GUEST_SSP,
+	MSR_IA32_S_CET,
 };
 
 static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_all)];
@@ -6823,6 +6824,10 @@ static void kvm_init_msr_list(void)
 			if (!kvm_cet_user_supported())
 				continue;
 			break;
+		case MSR_IA32_S_CET:
+			if (!cet_kernel_ibt_supported())
+				continue;
+			break;
 		default:
 			break;
 		}
@@ -11830,7 +11835,13 @@ int kvm_arch_hardware_setup(void *opaque)
 	/* Update CET features now as kvm_caps.supported_xss is finalized. */
 	if (!kvm_cet_user_supported()) {
 		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
-		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+		/* If the CET user bit is disabled by a cmdline option such
+		 * as "noxsaves" but kernel IBT is enabled, kernel IBT alone
+		 * can still be exposed to the guest, since the CET user mode
+		 * MSRs are not passed through to the guest in that case.
+		 */
+		if (!cet_kernel_ibt_supported())
+			kvm_cpu_cap_clear(X86_FEATURE_IBT);
 	}
 
 	/*
-- 
2.27.0
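For MSR_IA32_U_CET and the VMCS-backed MSR_IA32_S_CET, vmx_set_msr() rejects values with bits 9:6 set (reserved in the CET control MSR layout) or a non-canonical address, since bits 63:12 hold the legacy code page bitmap base. A hedged sketch of that guard — the canonical check is simplified to a fixed 48-bit address width, and the mask is a stand-in for GENMASK(9, 6):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CET control MSR layout: bits 9:6 are reserved (GENMASK(9, 6)). */
#define CET_CTRL_RESERVED_BITS	0x3c0ULL

#define CET_SHSTK_EN	(1ULL << 0)	/* shadow stack enable */
#define CET_ENDBR_EN	(1ULL << 2)	/* indirect branch tracking enable */

/* Simplified canonical check, assuming a 48-bit virtual address width. */
static bool is_canonical_48(uint64_t addr)
{
	return (uint64_t)(((int64_t)addr << 16) >> 16) == addr;
}

/* Mirrors the vmx_set_msr() guard for MSR_IA32_U_CET/MSR_IA32_S_CET. */
static bool cet_ctrl_write_allowed(uint64_t data)
{
	if (data & CET_CTRL_RESERVED_BITS)
		return false;
	/* bits 63:12 hold the legacy bitmap base, so canonicality matters */
	return is_canonical_48(data);
}
```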


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (18 preceding siblings ...)
  2022-06-16  8:46 ` [PATCH 19/19] KVM: x86: Enable supervisor IBT support for guest Yang Weijiang
@ 2022-06-16  9:10 ` Christoph Hellwig
  2022-06-16 11:25   ` Peter Zijlstra
  2022-06-16 10:12 ` Peter Zijlstra
  20 siblings, 1 reply; 45+ messages in thread
From: Christoph Hellwig @ 2022-06-16  9:10 UTC (permalink / raw)
  To: Yang Weijiang; +Cc: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe

On Thu, Jun 16, 2022 at 04:46:24AM -0400, Yang Weijiang wrote:
> The purpose of this patch series is to refresh the queued CET KVM
> patches[1] with the latest dependent CET native patches, pursuing
> the result that whole series could be merged ahead of CET native
> series[2] [3].

It might be helpful to explain what CET is here..

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
                   ` (19 preceding siblings ...)
  2022-06-16  9:10 ` [PATCH 00/19] Refresh queued CET virtualization series Christoph Hellwig
@ 2022-06-16 10:12 ` Peter Zijlstra
  2022-06-16 10:21   ` Paolo Bonzini
  20 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2022-06-16 10:12 UTC (permalink / raw)
  To: Yang Weijiang; +Cc: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe

On Thu, Jun 16, 2022 at 04:46:24AM -0400, Yang Weijiang wrote:

> To minimize the impact on existing kernel/KVM code, most of the KVM
> patch code can be bypassed at runtime. Uncheck "CONFIG_X86_KERNEL_IBT"
> and "CONFIG_X86_SHADOW_STACK" in Kconfig before building the kernel to
> get rid of the CET features in KVM. If neither of them is enabled, KVM
> clears the related feature bits as well as the CET user bit in
> supported_xss, which makes CET-related checks bail out at the first
> check point. Since most of the patch code runs on non-hot paths of
> KVM, it's expected to have little impact on existing code.

Do I understand this right in that a host without X86_KERNEL_IBT cannot
run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
was exactly what I did while developing the X86_KERNEL_IBT patches.

I'm thinking that if the hardware supports it, KVM should expose it,
irrespective of the host kernel using it.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-06-16 10:12 ` Peter Zijlstra
@ 2022-06-16 10:21   ` Paolo Bonzini
  2022-06-16 14:18     ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2022-06-16 10:21 UTC (permalink / raw)
  To: Peter Zijlstra, Yang Weijiang
  Cc: seanjc, x86, kvm, linux-kernel, rick.p.edgecombe

On 6/16/22 12:12, Peter Zijlstra wrote:
> Do I understand this right in that a host without X86_KERNEL_IBT cannot
> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
> was exactly what I did while developing the X86_KERNEL_IBT patches.
> 
> I'm thinking that if the hardware supports it, KVM should expose it,
> irrespective of the host kernel using it.

For IBT in particular, I think all processor state is only loaded and 
stored at vmentry/vmexit (does not need XSAVES), so it should be feasible.

Paolo


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack
  2022-06-16  8:46 ` [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack Yang Weijiang
@ 2022-06-16 10:24   ` Peter Zijlstra
  2022-06-16 17:12     ` Edgecombe, Rick P
  2022-06-16 10:25   ` Peter Zijlstra
  1 sibling, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2022-06-16 10:24 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe,
	Yu-cheng Yu, Kees Cook

On Thu, Jun 16, 2022 at 04:46:27AM -0400, Yang Weijiang wrote:
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -74,7 +74,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
>  static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c) {}
>  #endif
>  
> -extern __noendbr void cet_disable(void);
> +extern __noendbr void ibt_disable(void);
>  
>  struct ucode_cpu_info;
>  
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index c296cb1c0113..86102a8d451e 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -598,23 +598,23 @@ __noendbr void ibt_restore(u64 save)

>  
> -__noendbr void cet_disable(void)
> +__noendbr void ibt_disable(void)
>  {
>  	if (cpu_feature_enabled(X86_FEATURE_IBT))
>  		wrmsrl(MSR_IA32_S_CET, 0);

Not sure about this rename; it really disables all of (S) CET.

Specifically, once we do S-SHSTK (after FRED) we might also very much
need to kill that for kexec.

> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 0611fd83858e..745024654fcd 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -311,7 +311,7 @@ void machine_kexec(struct kimage *image)
>  	/* Interrupts aren't acceptable while we reboot */
>  	local_irq_disable();
>  	hw_breakpoint_disable();
> -	cet_disable();
> +	ibt_disable();
>  
>  	if (image->preserve_context) {
>  #ifdef CONFIG_X86_IO_APIC
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack
  2022-06-16  8:46 ` [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack Yang Weijiang
  2022-06-16 10:24   ` Peter Zijlstra
@ 2022-06-16 10:25   ` Peter Zijlstra
  2022-06-16 17:36     ` Edgecombe, Rick P
  1 sibling, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2022-06-16 10:25 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe,
	Yu-cheng Yu, Kees Cook

On Thu, Jun 16, 2022 at 04:46:27AM -0400, Yang Weijiang wrote:

>  static __always_inline void setup_cet(struct cpuinfo_x86 *c)
>  {
> +	bool kernel_ibt = HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT);
>  	u64 msr = CET_ENDBR_EN;
>  
> +	if (kernel_ibt)
> +		wrmsrl(MSR_IA32_S_CET, msr);
>  
> +	if (kernel_ibt || cpu_feature_enabled(X86_FEATURE_SHSTK))
> +		cr4_set_bits(X86_CR4_CET);

Does flipping the CR4 and S_CET MSR write not result in simpler code?

>  
> +	if (kernel_ibt && !ibt_selftest()) {
>  		pr_err("IBT selftest: Failed!\n");
>  		setup_clear_cpu_cap(X86_FEATURE_IBT);

Looking at this error path; I think I forgot to clear S_CET here.

>  		return;
>  	}
>  }

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 04/19] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  2022-06-16  8:46 ` [PATCH 04/19] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yang Weijiang
@ 2022-06-16 10:27   ` Peter Zijlstra
  2022-06-16 17:12     ` Edgecombe, Rick P
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2022-06-16 10:27 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe,
	Yu-cheng Yu, Kees Cook

On Thu, Jun 16, 2022 at 04:46:28AM -0400, Yang Weijiang wrote:
> diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
> index eb7cd1139d97..03aa98fb9c2b 100644
> --- a/arch/x86/include/asm/fpu/types.h
> +++ b/arch/x86/include/asm/fpu/types.h
> @@ -115,8 +115,8 @@ enum xfeature {
>  	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
>  	XFEATURE_PKRU,
>  	XFEATURE_PASID,
> -	XFEATURE_RSRVD_COMP_11,
> -	XFEATURE_RSRVD_COMP_12,
> +	XFEATURE_CET_USER,
> +	XFEATURE_CET_KERNEL_UNIMPLEMENTED_SO_FAR,
>  	XFEATURE_RSRVD_COMP_13,
>  	XFEATURE_RSRVD_COMP_14,
>  	XFEATURE_LBR,
> @@ -138,6 +138,8 @@ enum xfeature {
>  #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
>  #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
>  #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
> +#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
> +#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL_UNIMPLEMENTED_SO_FAR)
>  #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
>  #define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
>  #define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)

I'm not sure about that UNIMPLEMENTED_SO_FAR thing, that is, I'm
thinking we *never* want XSAVE managed S_CET.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 16/19] KVM: x86: Enable CET virtualization for VMX and advertise CET to userspace
  2022-06-16  8:46 ` [PATCH 16/19] KVM: x86: Enable CET virtualization for VMX and advertise CET to userspace Yang Weijiang
@ 2022-06-16 10:59   ` Peter Zijlstra
  2022-06-16 15:27     ` Yang, Weijiang
  2022-06-25  6:55     ` Yang, Weijiang
  0 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2022-06-16 10:59 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe,
	Sean Christopherson

On Thu, Jun 16, 2022 at 04:46:40AM -0400, Yang Weijiang wrote:
> Set the feature bits so that CET capabilities can be seen in the guest via
> CPUID enumeration. Add CR4.CET bit support in order to allow the guest to
> set the CET master control bit (CR4.CET).
> 
> Disable KVM CET feature if unrestricted_guest is unsupported/disabled as
> KVM does not support emulating CET.
> 
> Don't expose CET feature if dependent CET bits are cleared in host XSS,
> or if XSAVES isn't supported.  Updating the CET features in common x86 is
> a little ugly, but there is no clean solution without risking breakage of
> SVM if SVM hardware ever gains support for CET, e.g. moving everything to
> common x86 would prematurely expose CET on SVM.  The alternative is to
> put all the logic in VMX, but that means rereading host_xss in VMX and
> duplicating the XSAVES check across VMX and SVM.

Doesn't Zen3 already have SHSTK ?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 19/19] KVM: x86: Enable supervisor IBT support for guest
  2022-06-16  8:46 ` [PATCH 19/19] KVM: x86: Enable supervisor IBT support for guest Yang Weijiang
@ 2022-06-16 11:05   ` Peter Zijlstra
  2022-06-16 11:19   ` Peter Zijlstra
  1 sibling, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2022-06-16 11:05 UTC (permalink / raw)
  To: Yang Weijiang; +Cc: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe

On Thu, Jun 16, 2022 at 04:46:43AM -0400, Yang Weijiang wrote:
> Mainline kernel now supports supervisor IBT for kernel code.
> To make S-IBT work in a guest (or nested guest), pass through
> MSR_IA32_S_CET to the guest if the host kernel and KVM
> have IBT enabled. Note, S-IBT can work independently of host XSAVES
> support because the guest MSR_IA32_S_CET can be stored/loaded from a
> dedicated VMCS field.
> 
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/cpuid.h      |  5 +++++
>  arch/x86/kvm/vmx/nested.c |  3 +++
>  arch/x86/kvm/vmx/vmx.c    | 27 ++++++++++++++++++++++++---
>  arch/x86/kvm/x86.c        | 13 ++++++++++++-
>  4 files changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index ac72aabba981..c67c1e2fc11a 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -230,4 +230,9 @@ static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
>  	return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
>  }
>  
> +static __always_inline bool cet_kernel_ibt_supported(void)
> +{
> +	return HAS_KERNEL_IBT && kvm_cpu_cap_has(X86_FEATURE_IBT);
> +}

As stated before; I would much rather it expose S_CET unconditional of
host kernel config.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 19/19] KVM: x86: Enable supervisor IBT support for guest
  2022-06-16  8:46 ` [PATCH 19/19] KVM: x86: Enable supervisor IBT support for guest Yang Weijiang
  2022-06-16 11:05   ` Peter Zijlstra
@ 2022-06-16 11:19   ` Peter Zijlstra
  2022-06-16 15:56     ` Yang, Weijiang
  1 sibling, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2022-06-16 11:19 UTC (permalink / raw)
  To: Yang Weijiang; +Cc: pbonzini, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe

On Thu, Jun 16, 2022 at 04:46:43AM -0400, Yang Weijiang wrote:
> Mainline kernel now supports supervisor IBT for kernel code.
> To make S-IBT work in a guest (or nested guest), pass through
> MSR_IA32_S_CET to the guest if the host kernel and KVM
> have IBT enabled. Note, S-IBT can work independently of host XSAVES
> support because the guest MSR_IA32_S_CET can be stored/loaded from a
> dedicated VMCS field.


> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fe049d0e5ecc..c0118b33806a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1463,6 +1463,7 @@ static const u32 msrs_to_save_all[] = {
>  	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
>  	MSR_IA32_XSS,
>  	MSR_IA32_U_CET, MSR_IA32_PL3_SSP, MSR_KVM_GUEST_SSP,
> +	MSR_IA32_S_CET,
>  };


So much like my local kvm/qemu hacks; this patch suffers the problem of
not exposing S_SHSTK. What happens if a guest tries to use that?

Should we intercept and reject setting those bits or complete this patch
and support full S_SHSTK? (with all the warts and horrors that entails)

I don't think throwing this out in this half-finished state makes much
sense (which is why I never much shared my hacks).


> @@ -11830,7 +11835,13 @@ int kvm_arch_hardware_setup(void *opaque)
>  	/* Update CET features now as kvm_caps.supported_xss is finalized. */
>  	if (!kvm_cet_user_supported()) {
>  		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> -		kvm_cpu_cap_clear(X86_FEATURE_IBT);
> +		/* If CET user bit is disabled due to cmdline option such as
> +		 * noxsaves, but kernel IBT is on, this means we can expose
> +		 * kernel IBT alone to guest since CET user mode msrs are not
> +		 * passed through to guest.
> +		 */

Invalid multi-line comment style.

> +		if (!cet_kernel_ibt_supported())
> +			kvm_cpu_cap_clear(X86_FEATURE_IBT);

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-06-16  9:10 ` [PATCH 00/19] Refresh queued CET virtualization series Christoph Hellwig
@ 2022-06-16 11:25   ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2022-06-16 11:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yang Weijiang, pbonzini, seanjc, x86, kvm, linux-kernel,
	rick.p.edgecombe

On Thu, Jun 16, 2022 at 02:10:50AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 16, 2022 at 04:46:24AM -0400, Yang Weijiang wrote:
> > The purpose of this patch series is to refresh the queued CET KVM
> > patches[1] with the latest dependent CET native patches, pursuing
> > the result that whole series could be merged ahead of CET native
> > series[2] [3].
> 
> It might be helpful to explain what CET is here..

Central European Time ofc :-)

I think it stands for Control-flow Enforcement Technology or something
along those lines, but this being Intel it loves to obfuscate stuff and
make it impossible to understand what's being said to increase the
buzzword bong hits.

It's a mostly pointless umbrella term for IBT (Indirect Branch Tracking)
and SHSTK (SHadow STacK), the first of which covers forward-edge control
flow and the second covers backward-edge control flow.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-06-16 10:21   ` Paolo Bonzini
@ 2022-06-16 14:18     ` Peter Zijlstra
  2022-06-16 15:06       ` Yang, Weijiang
  2022-06-16 15:28       ` Paolo Bonzini
  0 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2022-06-16 14:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Yang Weijiang, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe

On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
> On 6/16/22 12:12, Peter Zijlstra wrote:
> > Do I understand this right in that a host without X86_KERNEL_IBT cannot
> > run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
> > was exactly what I did while developing the X86_KERNEL_IBT patches.
> > 
> > I'm thinking that if the hardware supports it, KVM should expose it,
> > irrespective of the host kernel using it.
> 
> For IBT in particular, I think all processor state is only loaded and stored
> at vmentry/vmexit (does not need XSAVES), so it should be feasible.

That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
stuff is all XSAVE though.

But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
enumerate IBT and SS separately, but for each IBT/SS you have to
implement both U and S.

That was a problem with the first series, which only implemented support
for U_CET while advertising IBT and SS (very much including S_CET), and
still is a problem with this series because S_SS is missing while
advertised.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-06-16 14:18     ` Peter Zijlstra
@ 2022-06-16 15:06       ` Yang, Weijiang
  2022-06-16 15:28       ` Paolo Bonzini
  1 sibling, 0 replies; 45+ messages in thread
From: Yang, Weijiang @ 2022-06-16 15:06 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: seanjc, x86, kvm, linux-kernel, Edgecombe, Rick P


On 6/16/2022 10:18 PM, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
>> On 6/16/22 12:12, Peter Zijlstra wrote:
>>> Do I understand this right in that a host without X86_KERNEL_IBT cannot
>>> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
>>> was exactly what I did while developing the X86_KERNEL_IBT patches.
>>>
>>> I'm thinking that if the hardware supports it, KVM should expose it,
>>> irrespective of the host kernel using it.
>> For IBT in particular, I think all processor state is only loaded and stored
>> at vmentry/vmexit (does not need XSAVES), so it should be feasible.
> That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
> stuff is all XSAVE though.

Thank you Peter and Paolo!

In this version, I referenced the host kernel settings when exposing
X86_KERNEL_IBT to the guest. The reason is that _IF_ the host, for
whatever reason, disabled the IBT feature, exposing the feature blindly
to the guest could be risky, e.g., hitting some issue the host wants to
mitigate.

The actual implementation depends on the agreement we reach :-)

>
> But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
> enumerate IBT and SS separately, but for each IBT/SS you have to
> implement both U and S.

Exactly, the CPUID enumeration could be a pain point for the KVM
solution. It makes {U,S}_CET feature control harder for the guest.

>
> That was a problem with the first series, which only implemented support
> for U_CET while advertising IBT and SS (very much including S_CET), and
> still is a problem with this series because S_SS is missing while
> advertised.

KVM has a problem advertising S_SS alone to the guest when U_CET (both
SS and IBT) is not available to the guest. I would like to hear the
community's voice on how to make the feature controls straightforward
and reasonable. The existing CPUID enumeration cannot advertise
{U,S}_SS and {U,S}_IBT well.

>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 16/19] KVM: x86: Enable CET virtualization for VMX and advertise CET to userspace
  2022-06-16 10:59   ` Peter Zijlstra
@ 2022-06-16 15:27     ` Yang, Weijiang
  2022-06-25  6:55     ` Yang, Weijiang
  1 sibling, 0 replies; 45+ messages in thread
From: Yang, Weijiang @ 2022-06-16 15:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: pbonzini, seanjc, x86, kvm, linux-kernel, Edgecombe, Rick P,
	Sean Christopherson


On 6/16/2022 6:59 PM, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 04:46:40AM -0400, Yang Weijiang wrote:
>> Set the feature bits so that CET capabilities can be seen in the guest via
>> CPUID enumeration. Add CR4.CET bit support in order to allow the guest to
>> set the CET master control bit (CR4.CET).
>>
>> Disable KVM CET feature if unrestricted_guest is unsupported/disabled as
>> KVM does not support emulating CET.
>>
>> Don't expose CET feature if dependent CET bits are cleared in host XSS,
>> or if XSAVES isn't supported.  Updating the CET features in common x86 is
>> a little ugly, but there is no clean solution without risking breakage of
>> SVM if SVM hardware ever gains support for CET, e.g. moving everything to
>> common x86 would prematurely expose CET on SVM.  The alternative is to
>> put all the logic in VMX, but that means rereading host_xss in VMX and
>> duplicating the XSAVES check across VMX and SVM.
> Doesn't Zen3 already have SHSTK ?

Hmm, you remind me to read more specs from AMD... I'll check their HW
solution if it's available.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-06-16 14:18     ` Peter Zijlstra
  2022-06-16 15:06       ` Yang, Weijiang
@ 2022-06-16 15:28       ` Paolo Bonzini
  2022-06-18  6:43         ` Yang, Weijiang
  1 sibling, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2022-06-16 15:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yang Weijiang, seanjc, x86, kvm, linux-kernel, rick.p.edgecombe

On 6/16/22 16:18, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
>> On 6/16/22 12:12, Peter Zijlstra wrote:
>>> Do I understand this right in that a host without X86_KERNEL_IBT cannot
>>> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
>>> was exactly what I did while developing the X86_KERNEL_IBT patches.
>>>
>>> I'm thinking that if the hardware supports it, KVM should expose it,
>>> irrespective of the host kernel using it.
>>
>> For IBT in particular, I think all processor state is only loaded and stored
>> at vmentry/vmexit (does not need XSAVES), so it should be feasible.
> 
> That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
> stuff is all XSAVE though.

What matters is whether XFEATURE_MASK_USER_SUPPORTED includes 
XFEATURE_CET_USER.  If you build with !X86_KERNEL_IBT, KVM can still 
rely on the FPU state for U_CET state, and S_CET is saved/restored via 
the VMCS independent of X86_KERNEL_IBT.

Paolo

> But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
> enumerate IBT and SS separately, but for each IBT/SS you have to
> implement both U and S.
> 
> That was a problem with the first series, which only implemented support
> for U_CET while advertising IBT and SS (very much including S_CET), and
> still is a problem with this series because S_SS is missing while
> advertised.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 19/19] KVM: x86: Enable supervisor IBT support for guest
  2022-06-16 11:19   ` Peter Zijlstra
@ 2022-06-16 15:56     ` Yang, Weijiang
  0 siblings, 0 replies; 45+ messages in thread
From: Yang, Weijiang @ 2022-06-16 15:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: pbonzini, seanjc, x86, kvm, linux-kernel, Edgecombe, Rick P


On 6/16/2022 7:19 PM, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 04:46:43AM -0400, Yang Weijiang wrote:
>> Mainline kernel now supports supervisor IBT for kernel code,
>> to make s-IBT work in guest(nested guest), pass through
>> MSR_IA32_S_CET to guest(nested guest) if host kernel and KVM
>> enabled IBT. Note, s-IBT can work independent to host xsaves
>> support because guest MSR_IA32_S_CET can be stored/loaded from
>> specific VMCS field.
>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index fe049d0e5ecc..c0118b33806a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1463,6 +1463,7 @@ static const u32 msrs_to_save_all[] = {
>>   	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
>>   	MSR_IA32_XSS,
>>   	MSR_IA32_U_CET, MSR_IA32_PL3_SSP, MSR_KVM_GUEST_SSP,
>> +	MSR_IA32_S_CET,
>>   };
>
> So much like my local kvm/qemu hacks; this patch suffers the problem of
> not exposing S_SHSTK. What happens if a guest tries to use that?
With the current solution, I think the guest kernel will hit a #GP while
reading/writing PL0_SSP.
>
> Should we intercept and reject setting those bits or complete this patch
> and support full S_SHSTK? (with all the warts and horrors that entails)
>
> I don't think throwing this out in this half-finished state makes much
> sense (which is why I never much shared my hacks).

You reminded me to think over these cases even though I don't have a
solution now, thank you!

>
>
>> @@ -11830,7 +11835,13 @@ int kvm_arch_hardware_setup(void *opaque)
>>   	/* Update CET features now as kvm_caps.supported_xss is finalized. */
>>   	if (!kvm_cet_user_supported()) {
>>   		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
>> -		kvm_cpu_cap_clear(X86_FEATURE_IBT);
>> +		/* If CET user bit is disabled due to cmdline option such as
>> +		 * noxsaves, but kernel IBT is on, this means we can expose
>> +		 * kernel IBT alone to guest since CET user mode msrs are not
>> +		 * passed through to guest.
>> +		 */
> Invalid multi-line comment style.
Oops, last minute change messed it up :-(
>
>> +		if (!cet_kernel_ibt_supported())
>> +			kvm_cpu_cap_clear(X86_FEATURE_IBT);

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 04/19] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  2022-06-16 10:27   ` Peter Zijlstra
@ 2022-06-16 17:12     ` Edgecombe, Rick P
  0 siblings, 0 replies; 45+ messages in thread
From: Edgecombe, Rick P @ 2022-06-16 17:12 UTC (permalink / raw)
  To: peterz, Yang, Weijiang
  Cc: keescook, Christopherson,,
	Sean, kvm, Yu, Yu-cheng, pbonzini, x86, linux-kernel

On Thu, 2022-06-16 at 12:27 +0200, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 04:46:28AM -0400, Yang Weijiang wrote:
> > diff --git a/arch/x86/include/asm/fpu/types.h
> > b/arch/x86/include/asm/fpu/types.h
> > index eb7cd1139d97..03aa98fb9c2b 100644
> > --- a/arch/x86/include/asm/fpu/types.h
> > +++ b/arch/x86/include/asm/fpu/types.h
> > @@ -115,8 +115,8 @@ enum xfeature {
> >        XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
> >        XFEATURE_PKRU,
> >        XFEATURE_PASID,
> > -     XFEATURE_RSRVD_COMP_11,
> > -     XFEATURE_RSRVD_COMP_12,
> > +     XFEATURE_CET_USER,
> > +     XFEATURE_CET_KERNEL_UNIMPLEMENTED_SO_FAR,
> >        XFEATURE_RSRVD_COMP_13,
> >        XFEATURE_RSRVD_COMP_14,
> >        XFEATURE_LBR,
> > @@ -138,6 +138,8 @@ enum xfeature {
> >   #define XFEATURE_MASK_PT             (1 <<
> > XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
> >   #define XFEATURE_MASK_PKRU           (1 << XFEATURE_PKRU)
> >   #define XFEATURE_MASK_PASID          (1 << XFEATURE_PASID)
> > +#define XFEATURE_MASK_CET_USER               (1 <<
> > XFEATURE_CET_USER)
> > +#define XFEATURE_MASK_CET_KERNEL     (1 <<
> > XFEATURE_CET_KERNEL_UNIMPLEMENTED_SO_FAR)
> >   #define XFEATURE_MASK_LBR            (1 << XFEATURE_LBR)
> >   #define XFEATURE_MASK_XTILE_CFG              (1 <<
> > XFEATURE_XTILE_CFG)
> >   #define XFEATURE_MASK_XTILE_DATA     (1 << XFEATURE_XTILE_DATA)
> 
> I'm not sure about that UNIMPLEMENTED_SO_FAR thing, that is, I'm
> thinking we *never* want XSAVE managed S_CET.

Hmm, yes. I mostly was just keeping the pattern with
XFEATURE_PT_UNIMPLEMENTED_SO_FAR.

How about XFEATURE_CET_KERNEL_UNUSED?


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack
  2022-06-16 10:24   ` Peter Zijlstra
@ 2022-06-16 17:12     ` Edgecombe, Rick P
  2022-06-17 11:38       ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Edgecombe, Rick P @ 2022-06-16 17:12 UTC (permalink / raw)
  To: peterz, Yang, Weijiang
  Cc: keescook, Christopherson,,
	Sean, kvm, Yu, Yu-cheng, pbonzini, x86, linux-kernel

On Thu, 2022-06-16 at 12:24 +0200, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 04:46:27AM -0400, Yang Weijiang wrote:
> > --- a/arch/x86/include/asm/cpu.h
> > +++ b/arch/x86/include/asm/cpu.h
> > @@ -74,7 +74,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
> >   static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c) {}
> >   #endif
> >   
> > -extern __noendbr void cet_disable(void);
> > +extern __noendbr void ibt_disable(void);
> >   
> >   struct ucode_cpu_info;
> >   
> > diff --git a/arch/x86/kernel/cpu/common.c
> > b/arch/x86/kernel/cpu/common.c
> > index c296cb1c0113..86102a8d451e 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -598,23 +598,23 @@ __noendbr void ibt_restore(u64 save)
> >   
> > -__noendbr void cet_disable(void)
> > +__noendbr void ibt_disable(void)
> >   {
> >        if (cpu_feature_enabled(X86_FEATURE_IBT))
> >                wrmsrl(MSR_IA32_S_CET, 0);
> 
> Not sure about this rename; it really disables all of (S) CET.
> 
> Specifically, once we do S-SHSTK (after FRED) we might also very much
> need to kill that for kexec.

Sure, what about something like sup_cet_disable()?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack
  2022-06-16 10:25   ` Peter Zijlstra
@ 2022-06-16 17:36     ` Edgecombe, Rick P
  0 siblings, 0 replies; 45+ messages in thread
From: Edgecombe, Rick P @ 2022-06-16 17:36 UTC (permalink / raw)
  To: peterz, Yang, Weijiang
  Cc: keescook, Christopherson,,
	Sean, kvm, Yu, Yu-cheng, pbonzini, x86, linux-kernel

On Thu, 2022-06-16 at 12:25 +0200, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 04:46:27AM -0400, Yang Weijiang wrote:
> 
> >   static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> >   {
> > +     bool kernel_ibt = HAS_KERNEL_IBT &&
> > cpu_feature_enabled(X86_FEATURE_IBT);
> >        u64 msr = CET_ENDBR_EN;
> >   
> > +     if (kernel_ibt)
> > +             wrmsrl(MSR_IA32_S_CET, msr);
> >   
> > +     if (kernel_ibt || cpu_feature_enabled(X86_FEATURE_SHSTK))
> > +             cr4_set_bits(X86_CR4_CET);
> 
> Does flipping the CR4 and S_CET MSR write not result in simpler code?

I thought it was more defensive to reset S_CET before turning it on
with CR4. Of course CR4.CET could have been left on as well, but if CET
features were actually fully turned on, then we probably wouldn't have
gotten here. Seem reasonable?

> 
> >   
> > +     if (kernel_ibt && !ibt_selftest()) {
> >                pr_err("IBT selftest: Failed!\n");
> >                setup_clear_cpu_cap(X86_FEATURE_IBT);
> 
> Looking at this error path; I think I forgot to clear S_CET here.
> 

Yea. I can fix it in the next version of this if you want.

> >                return;
> >        }
> >   }

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack
  2022-06-16 17:12     ` Edgecombe, Rick P
@ 2022-06-17 11:38       ` Peter Zijlstra
  2022-06-17 21:18         ` Edgecombe, Rick P
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2022-06-17 11:38 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Yang, Weijiang, keescook, Christopherson,,
	Sean, kvm, Yu, Yu-cheng, pbonzini, x86, linux-kernel

On Thu, Jun 16, 2022 at 05:12:47PM +0000, Edgecombe, Rick P wrote:
> On Thu, 2022-06-16 at 12:24 +0200, Peter Zijlstra wrote:
> > On Thu, Jun 16, 2022 at 04:46:27AM -0400, Yang Weijiang wrote:
> > > --- a/arch/x86/include/asm/cpu.h
> > > +++ b/arch/x86/include/asm/cpu.h
> > > @@ -74,7 +74,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
> > >   static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c) {}
> > >   #endif
> > >   
> > > -extern __noendbr void cet_disable(void);
> > > +extern __noendbr void ibt_disable(void);
> > >   
> > >   struct ucode_cpu_info;
> > >   
> > > diff --git a/arch/x86/kernel/cpu/common.c
> > > b/arch/x86/kernel/cpu/common.c
> > > index c296cb1c0113..86102a8d451e 100644
> > > --- a/arch/x86/kernel/cpu/common.c
> > > +++ b/arch/x86/kernel/cpu/common.c
> > > @@ -598,23 +598,23 @@ __noendbr void ibt_restore(u64 save)
> > >   
> > > -__noendbr void cet_disable(void)
> > > +__noendbr void ibt_disable(void)
> > >   {
> > >        if (cpu_feature_enabled(X86_FEATURE_IBT))
> > >                wrmsrl(MSR_IA32_S_CET, 0);
> > 
> > Not sure about this rename; it really disables all of (S) CET.
> > 
> > Specifically, once we do S-SHSTK (after FRED) we might also very much
> > need to kill that for kexec.
> 
> Sure, what about something like sup_cet_disable()?

Why bother? Arguably kexec should clear U_CET too.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack
  2022-06-17 11:38       ` Peter Zijlstra
@ 2022-06-17 21:18         ` Edgecombe, Rick P
  0 siblings, 0 replies; 45+ messages in thread
From: Edgecombe, Rick P @ 2022-06-17 21:18 UTC (permalink / raw)
  To: peterz, ebiederm, kexec
  Cc: kvm, linux-kernel, keescook, Yu, Yu-cheng, x86, Christopherson,,
	Sean, Yang, Weijiang, pbonzini

+kexec people

On Fri, 2022-06-17 at 13:38 +0200, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 05:12:47PM +0000, Edgecombe, Rick P wrote:
> > On Thu, 2022-06-16 at 12:24 +0200, Peter Zijlstra wrote:
> > > On Thu, Jun 16, 2022 at 04:46:27AM -0400, Yang Weijiang wrote:
> > > > --- a/arch/x86/include/asm/cpu.h
> > > > +++ b/arch/x86/include/asm/cpu.h
> > > > @@ -74,7 +74,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86
> > > > *c);
> > > >    static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c)
> > > > {}
> > > >    #endif
> > > >    
> > > > -extern __noendbr void cet_disable(void);
> > > > +extern __noendbr void ibt_disable(void);
> > > >    
> > > >    struct ucode_cpu_info;
> > > >    
> > > > diff --git a/arch/x86/kernel/cpu/common.c
> > > > b/arch/x86/kernel/cpu/common.c
> > > > index c296cb1c0113..86102a8d451e 100644
> > > > --- a/arch/x86/kernel/cpu/common.c
> > > > +++ b/arch/x86/kernel/cpu/common.c
> > > > @@ -598,23 +598,23 @@ __noendbr void ibt_restore(u64 save)
> > > >    
> > > > -__noendbr void cet_disable(void)
> > > > +__noendbr void ibt_disable(void)
> > > >    {
> > > >         if (cpu_feature_enabled(X86_FEATURE_IBT))
> > > >                 wrmsrl(MSR_IA32_S_CET, 0);
> > > 
> > > Not sure about this rename; it really disables all of (S) CET.
> > > 
> > > Specifically, once we do S-SHSTK (after FRED) we might also very
> > > much
> > > need to kill that for kexec.
> > 
> > Sure, what about something like sup_cet_disable()?
> 
> Why bother? Arguably kexec should clear U_CET too.

Hmm, I think you're right. It doesn't look like the fpu code would
actually reset unknown xfeatures to init. So kernels with Kernel IBT
would set CR4.CET, and stale MSR_IA32_U_CET state might then survive to
the point where userspace would run with CET enabled.

It seems like a general kexec problem whenever the kernel enables new
xfeatures. I suppose stale vector-register data sticking around is not
going to show up the same way as new enforcement rules being applied.

But also, looking at this, the existing clearing of MSR_IA32_S_CET is
not sufficient, since it only happens on the CPU doing the kexec. I
think something like the below might be needed. Per the other
discussion, we are going to need to start setting CR4.CET whenever the
HW supports it, for KVM's benefit, so the other CPUs might end up with
supervisor IBT enabled if we don't clear the MSR on every CPU.

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9730c88530fc..eb57d7f4fa6a 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -96,6 +96,12 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
        cpu_emergency_stop_pt();
 
        disable_local_APIC();
+
+       /*
+        * Make sure to disable CET features before kexec so the new kernel
+        * doesn't get surprised by the enforcement.
+        */
+       cet_disable();
 }
 
 void kdump_nmi_shootdown_cpus(void)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 738226472468..de65bac0ae02 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -787,6 +787,12 @@ void __noreturn stop_this_cpu(void *dummy)
         */
        if (cpuid_eax(0x8000001f) & BIT(0))
                native_wbinvd();
+
+       /*
+        * Make sure to disable CET features before kexec so the new kernel
+        * doesn't get surprised by the enforcement.
+        */
+       cet_disable();
        for (;;) {
                /*
                 * Use native_halt() so that memory contents don't change


* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-06-16 15:28       ` Paolo Bonzini
@ 2022-06-18  6:43         ` Yang, Weijiang
  2022-07-14 19:36           ` Sean Christopherson
  0 siblings, 1 reply; 45+ messages in thread
From: Yang, Weijiang @ 2022-06-18  6:43 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Zijlstra
  Cc: seanjc, x86, kvm, linux-kernel, rick.p.edgecombe


On 6/16/2022 11:28 PM, Paolo Bonzini wrote:
> On 6/16/22 16:18, Peter Zijlstra wrote:
>> On Thu, Jun 16, 2022 at 12:21:20PM +0200, Paolo Bonzini wrote:
>>> On 6/16/22 12:12, Peter Zijlstra wrote:
>>>> Do I understand this right in that a host without X86_KERNEL_IBT 
>>>> cannot
>>>> run a guest with X86_KERNEL_IBT on? That seems unfortunate, since that
>>>> was exactly what I did while developing the X86_KERNEL_IBT patches.
>>>>
>>>> I'm thinking that if the hardware supports it, KVM should expose it,
>>>> irrespective of the host kernel using it.
>>>
>>> For IBT in particular, I think all processor state is only loaded 
>>> and stored
>>> at vmentry/vmexit (does not need XSAVES), so it should be feasible.
>>
>> That would be the S_CET stuff, yeah, that's VMCS managed. The U_CET
>> stuff is all XSAVE though.
>
> What matters is whether XFEATURE_MASK_USER_SUPPORTED includes 
> XFEATURE_CET_USER. 

Small correction: XFEATURE_CET_USER belongs to
XFEATURE_MASK_SUPERVISOR_SUPPORTED; the name is misleading.


> If you build with !X86_KERNEL_IBT, KVM can still rely on the FPU state 
> for U_CET state, and S_CET is saved/restored via the VMCS independent 
> of X86_KERNEL_IBT.

A fundamental question is: should KVM always honor host CET enablement
before exposing the feature to the guest, i.e., check X86_KERNEL_IBT and
X86_SHADOW_STACK?


>
> Paolo
>
>> But funny thing, CPUID doesn't enumerate {U,S}_CET separately. It *does*
>> enumerate IBT and SS separately, but for each IBT/SS you have to
>> implement both U and S.
>>
>> That was a problem with the first series, which only implemented support
>> for U_CET while advertising IBT and SS (very much including S_CET), and
>> still is a problem with this series because S_SS is missing while
>> advertised.
>


* Re: [PATCH 16/19] KVM: x86: Enable CET virtualization for VMX and advertise CET to userspace
  2022-06-16 10:59   ` Peter Zijlstra
  2022-06-16 15:27     ` Yang, Weijiang
@ 2022-06-25  6:55     ` Yang, Weijiang
  1 sibling, 0 replies; 45+ messages in thread
From: Yang, Weijiang @ 2022-06-25  6:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: pbonzini, seanjc, x86, kvm, linux-kernel, Edgecombe, Rick P,
	Sean Christopherson


On 6/16/2022 6:59 PM, Peter Zijlstra wrote:
> On Thu, Jun 16, 2022 at 04:46:40AM -0400, Yang Weijiang wrote:
>> Set the feature bits so that CET capabilities can be seen in guest via
>> CPUID enumeration. Add CR4.CET bit support in order to allow the guest to
>> set the CET master control bit (CR4.CET).
>>
>> Disable KVM CET feature if unrestricted_guest is unsupported/disabled as
>> KVM does not support emulating CET.
>>
>> Don't expose CET feature if dependent CET bits are cleared in host XSS,
>> or if XSAVES isn't supported.  Updating the CET features in common x86 is
>> a little ugly, but there is no clean solution without risking breakage of
>> SVM if SVM hardware ever gains support for CET, e.g. moving everything to
>> common x86 would prematurely expose CET on SVM.  The alternative is to
>> put all the logic in VMX, but that means rereading host_xss in VMX and
>> duplicating the XSAVES check across VMX and SVM.
> Doesn't Zen3 already have SHSTK ?

From what I read, AMD currently supports only SHSTK; IBT is not available.
Given possible implementation differences, and that the enabling code in
vmx is shared by SHSTK and IBT, I'd like to keep the guest CET enabling
code specific to vmx for now.

In the future, part of the code could be hoisted to common x86 to support
both.



* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-06-18  6:43         ` Yang, Weijiang
@ 2022-07-14 19:36           ` Sean Christopherson
  2022-07-15 15:04             ` Yang, Weijiang
  0 siblings, 1 reply; 45+ messages in thread
From: Sean Christopherson @ 2022-07-14 19:36 UTC (permalink / raw)
  To: Yang, Weijiang
  Cc: Paolo Bonzini, Peter Zijlstra, x86, kvm, linux-kernel, rick.p.edgecombe

On Sat, Jun 18, 2022, Yang, Weijiang wrote:
> 
> On 6/16/2022 11:28 PM, Paolo Bonzini wrote:
> > If you build with !X86_KERNEL_IBT, KVM can still rely on the FPU state
> > for U_CET state, and S_CET is saved/restored via the VMCS independent of
> > X86_KERNEL_IBT.
> 
> A fundamental question is: should KVM always honor host CET enablement
> before exposing the feature to the guest, i.e., check X86_KERNEL_IBT and
> X86_SHADOW_STACK?

If there is a legitimate use case to NOT require host enablement and it's 100%
safe to do so (without requiring hacks to the core kernel), then there's no hard
requirement that says KVM can't virtualize a feature that's not used by the host.

It's definitely uncommon; unless I'm forgetting features, LA57 is the only feature
that KVM fully virtualizes (as opposed to emulates in software) without requiring
host enablement.  Ah, and good ol' MPX, which is probably the best prior art since
it shares the same XSAVE+VMCS for user+supervisor state management.  So more than
one, but still not very many.

But, requiring host "support" is the de facto standard largely because features
tend to fall into one of three categories:

  1. The feature is always available, i.e. doesn't have a software enable/disable
     flag.

  2. The feature isn't explicitly disabled in cpufeatures / x86_capability even
     if it's not used by the host.  E.g. MONITOR/MWAIT comes to mind where the
     host can be configured to not use MWAIT for idle, but it's still reported
     as supported (and for that case, KVM does have to explicitly guard against
     X86_BUG_MONITOR).

  3. Require some amount of host support, e.g. exposing XSAVE without the kernel
     knowing how to save/restore all that state wouldn't end well.

In other words, virtualizing a feature if it's disabled in the host is allowed,
but it's rare because there just aren't many features where doing so is possible
_and_ necessary.


* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-07-14 19:36           ` Sean Christopherson
@ 2022-07-15 15:04             ` Yang, Weijiang
  2022-07-15 15:58               ` Sean Christopherson
  0 siblings, 1 reply; 45+ messages in thread
From: Yang, Weijiang @ 2022-07-15 15:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Peter Zijlstra, x86, kvm, linux-kernel, rick.p.edgecombe


On 7/15/2022 3:36 AM, Sean Christopherson wrote:
> On Sat, Jun 18, 2022, Yang, Weijiang wrote:
>> On 6/16/2022 11:28 PM, Paolo Bonzini wrote:
>>> If you build with !X86_KERNEL_IBT, KVM can still rely on the FPU state
>>> for U_CET state, and S_CET is saved/restored via the VMCS independent of
>>> X86_KERNEL_IBT.
>> A fundamental question is: should KVM always honor host CET enablement
>> before exposing the feature to the guest, i.e., check X86_KERNEL_IBT and
>> X86_SHADOW_STACK?
> If there is a legitimate use case to NOT require host enablement and it's 100%
> safe to do so (without requiring hacks to the core kernel), then there's no hard
> requirement that says KVM can't virtualize a feature that's not used by the host.

Yeah, CET definitely can be virtualized without considering host usage,
but to make things easier I'd still fall back on some kind of host-side
support, e.g., XSAVES.

>
> It's definitely uncommon; unless I'm forgetting features, LA57 is the only feature
> that KVM fully virtualizes (as opposed to emulates in software) without requiring
> host enablement.  Ah, and good ol' MPX, which is probably the best prior art since
> it shares the same XSAVE+VMCS for user+supervisor state management.  So more than
> one, but still not very many.

Speaking of MPX, is it really active in recent kernels? I can find only
a little code on the native side, but more in KVM.

>
> But, requiring host "support" is the de facto standard largely because features
> tend to fall into one of three categories:
>
>    1. The feature is always available, i.e. doesn't have a software enable/disable
>       flag.
>
>    2. The feature isn't explicitly disabled in cpufeatures / x86_capability even
>       if it's not used by the host.  E.g. MONITOR/MWAIT comes to mind where the
>       host can be configured to not use MWAIT for idle, but it's still reported
>       as supported (and for that case, KVM does have to explicitly guard against
>       X86_BUG_MONITOR).
>
>    3. Require some amount of host support, e.g. exposing XSAVE without the kernel
>       knowing how to save/restore all that state wouldn't end well.

CET may fall into one of the three, or a combination of them :-),
depending on the complexity of the implementation.

>
> In other words, virtualizing a feature if it's disabled in the host is allowed,
> but it's rare because there just aren't many features where doing so is possible
> _and_ necessary.

I'm thinking of tweaking the patches to construct a safe yet flexible
solution based on a bunch of MSRs/CPUIDs/VMCS fields/XSAVES elements
plus a few host-side constraints.

Thanks for the enlightenment!




* Re: [PATCH 00/19] Refresh queued CET virtualization series
  2022-07-15 15:04             ` Yang, Weijiang
@ 2022-07-15 15:58               ` Sean Christopherson
  0 siblings, 0 replies; 45+ messages in thread
From: Sean Christopherson @ 2022-07-15 15:58 UTC (permalink / raw)
  To: Yang, Weijiang
  Cc: Paolo Bonzini, Peter Zijlstra, x86, kvm, linux-kernel, rick.p.edgecombe

On Fri, Jul 15, 2022, Yang, Weijiang wrote:
> 
> On 7/15/2022 3:36 AM, Sean Christopherson wrote:
> > It's definitely uncommon; unless I'm forgetting features, LA57 is the only feature
> > that KVM fully virtualizes (as opposed to emulates in software) without requiring
> > host enablement.  Ah, and good ol' MPX, which is probably the best prior art since
> > it shares the same XSAVE+VMCS for user+supervisor state management.  So more than
> > one, but still not very many.
> 
> Speaking of MPX, is it really active in recent kernels? I can find only
> a little code on the native side, but more in KVM.

Nope, native MPX support was ripped out a year or two ago.  The kernel provides
just enough save+restore support so that KVM can continue to virtualize MPX.


end of thread, other threads:[~2022-07-15 15:58 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-16  8:46 [PATCH 00/19] Refresh queued CET virtualization series Yang Weijiang
2022-06-16  8:46 ` [PATCH 01/19] x86/cet/shstk: Add Kconfig option for Shadow Stack Yang Weijiang
2022-06-16  8:46 ` [PATCH 02/19] x86/cpufeatures: Add CPU feature flags for shadow stacks Yang Weijiang
2022-06-16  8:46 ` [PATCH 03/19] x86/cpufeatures: Enable CET CR4 bit for shadow stack Yang Weijiang
2022-06-16 10:24   ` Peter Zijlstra
2022-06-16 17:12     ` Edgecombe, Rick P
2022-06-17 11:38       ` Peter Zijlstra
2022-06-17 21:18         ` Edgecombe, Rick P
2022-06-16 10:25   ` Peter Zijlstra
2022-06-16 17:36     ` Edgecombe, Rick P
2022-06-16  8:46 ` [PATCH 04/19] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yang Weijiang
2022-06-16 10:27   ` Peter Zijlstra
2022-06-16 17:12     ` Edgecombe, Rick P
2022-06-16  8:46 ` [PATCH 05/19] x86/fpu: Add helper for modifying xstate Yang Weijiang
2022-06-16  8:46 ` [PATCH 06/19] KVM: x86: Report XSS as an MSR to be saved if there are supported features Yang Weijiang
2022-06-16  8:46 ` [PATCH 07/19] KVM: x86: Refresh CPUID on writes to MSR_IA32_XSS Yang Weijiang
2022-06-16  8:46 ` [PATCH 08/19] KVM: x86: Load guest fpu state when accessing MSRs managed by XSAVES Yang Weijiang
2022-06-16  8:46 ` [PATCH 09/19] KVM: x86: Add #CP support in guest exception classification Yang Weijiang
2022-06-16  8:46 ` [PATCH 10/19] KVM: VMX: Introduce CET VMCS fields and flags Yang Weijiang
2022-06-16  8:46 ` [PATCH 11/19] KVM: x86: Add fault checks for CR4.CET Yang Weijiang
2022-06-16  8:46 ` [PATCH 12/19] KVM: VMX: Emulate reads and writes to CET MSRs Yang Weijiang
2022-06-16  8:46 ` [PATCH 13/19] KVM: VMX: Add a synthetic MSR to allow userspace VMM to access GUEST_SSP Yang Weijiang
2022-06-16  8:46 ` [PATCH 14/19] KVM: x86: Report CET MSRs as to-be-saved if CET is supported Yang Weijiang
2022-06-16  8:46 ` [PATCH 15/19] KVM: x86: Save/Restore GUEST_SSP to/from SMM state save area Yang Weijiang
2022-06-16  8:46 ` [PATCH 16/19] KVM: x86: Enable CET virtualization for VMX and advertise CET to userspace Yang Weijiang
2022-06-16 10:59   ` Peter Zijlstra
2022-06-16 15:27     ` Yang, Weijiang
2022-06-25  6:55     ` Yang, Weijiang
2022-06-16  8:46 ` [PATCH 17/19] KVM: VMX: Pass through CET MSRs to the guest when supported Yang Weijiang
2022-06-16  8:46 ` [PATCH 18/19] KVM: nVMX: Enable CET support for nested VMX Yang Weijiang
2022-06-16  8:46 ` [PATCH 19/19] KVM: x86: Enable supervisor IBT support for guest Yang Weijiang
2022-06-16 11:05   ` Peter Zijlstra
2022-06-16 11:19   ` Peter Zijlstra
2022-06-16 15:56     ` Yang, Weijiang
2022-06-16  9:10 ` [PATCH 00/19] Refresh queued CET virtualization series Christoph Hellwig
2022-06-16 11:25   ` Peter Zijlstra
2022-06-16 10:12 ` Peter Zijlstra
2022-06-16 10:21   ` Paolo Bonzini
2022-06-16 14:18     ` Peter Zijlstra
2022-06-16 15:06       ` Yang, Weijiang
2022-06-16 15:28       ` Paolo Bonzini
2022-06-18  6:43         ` Yang, Weijiang
2022-07-14 19:36           ` Sean Christopherson
2022-07-15 15:04             ` Yang, Weijiang
2022-07-15 15:58               ` Sean Christopherson
