* [PATCH v3 00/21] Enable CET Virtualization
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang, john.allen

Control-flow Enforcement Technology (CET) is a CPU feature used to prevent
Return/Jump-Oriented Programming (ROP/JOP) attacks. CET introduces a new
exception type, Control Protection (#CP), and two sub-features (SHSTK, IBT)
to defend against ROP/JOP style control-flow subversion attacks.

Shadow Stack (SHSTK):
  A shadow stack is a second stack used exclusively for control transfer
  operations. The shadow stack is separate from the data/normal stack and
  can be enabled individually in user and kernel mode. When shadow stack
  is enabled, CALL pushes the return address on both the data and shadow
  stack. RET pops the return address from both stacks and compares them.
  If the return addresses from the two stacks do not match, the processor
  generates a #CP.
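
  As an illustration (my sketch, not part of the series; the frame offset
  is compiler/ABI dependent), a classic return-address overwrite that user
  SHSTK turns into a #CP at RET time:

  #include <stdio.h>

  void gadget(void)
  {
          printf("control flow hijacked\n");
  }

  void victim(void)
  {
          volatile unsigned long frame[2];

          /* Simulated overflow: clobber the saved return address on the
           * data stack. The shadow stack copy is untouched, so RET sees
           * mismatching addresses and the CPU raises #CP (surfaced to
           * userspace as SIGSEGV). */
          frame[3] = (unsigned long)gadget;       /* deliberately OOB */
  }

  int main(void)
  {
          victim();
          return 0;
  }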

Indirect Branch Tracking (IBT):
  IBT adds a new instruction, ENDBRANCH, to mark valid target addresses of
  indirect branches (CALL, JMP, etc.). If an indirect branch is executed
  and the next instruction is _not_ an ENDBRANCH, the processor generates
  a #CP. The instruction behaves as a NOP on platforms that don't support
  CET.
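
  A minimal sketch of the IBT failure mode (illustrative only; assumes the
  binary is built with -fcf-protection=branch and that IBT is actually
  enabled for the process):

  #include <stdio.h>

  void good_target(void)                  /* compiler emits endbr64 here */
  {
          printf("valid indirect-branch target\n");
  }

  int main(void)
  {
          void (*fn)(void) = good_target;

          fn();   /* OK: the indirect CALL lands on endbr64 */

          /* Skip the 4-byte endbr64: the first instruction executed after
           * the indirect CALL is no longer ENDBRANCH, so the CPU raises
           * #CP on IBT-capable hardware. */
          fn = (void (*)(void))((char *)good_target + 4);
          fn();
          return 0;
  }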


Dependency:
--------------------------------------------------------------------------
The first five patches are taken from the CET native series [1] in
linux-next. They're prerequisites for enabling guest user-mode SHSTK.
Apply this full series before building the host kernel for guest CET
testing. Also apply the CET enabling patches in [2] to build a qualified
QEMU. These kernel-dependent patches will be carried in the KVM series
until the CET native series is merged into the mainline tree.


Implementation:
--------------------------------------------------------------------------
Historically, the early KVM patches supported both user SHSTK and user
IBT, and most of those patches are carried forward, with changes, in
this new series. With the kernel IBT feature merged in v5.18, a new
patch was added to support that feature in guests. The last patch is
introduced to support supervisor SHSTK, but the feature is not enabled
on Intel platforms for now; the main purpose of that patch is to help
AMD folks enable the feature.

In summary, this new series enables CET user SHSTK/IBT and kernel IBT,
but doesn't fully support CET supervisor SHSTK; that enabling work is
left for the future.

Supported CET sub-features:

                  |
    User SHSTK    |    User IBT      (user mode)
--------------------------------------------------
    s-SHSTK (X)   |    Kernel IBT    (kernel mode)
                  |
    ((X): supervisor SHSTK, not enabled in this series)

Guest user-mode SHSTK/IBT relies on host-side XSAVES support (XSS[bit 11])
to swap CET states. Guest kernel IBT has no dependency on host XSAVES.
Supervisor SHSTK relies on host-side XSAVES support (XSS[bit 12]) to
save/restore the supervisor-mode CET states.
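
For reference, a sketch of that gating (bit positions per the Intel SDM;
the helper below is hypothetical):

  #include <linux/bits.h>
  #include <linux/types.h>

  #define XFEATURE_MASK_CET_USER          BIT_ULL(11)     /* MSR_IA32_{U_CET,PL3_SSP} */
  #define XFEATURE_MASK_CET_KERNEL        BIT_ULL(12)     /* MSR_IA32_PL{0,1,2}_SSP   */

  /* Guest user SHSTK/IBT need the host to XSAVES/XRSTORS the user CET
   * state; supervisor SHSTK would need bit 12 as well. */
  static bool host_can_swap_user_cet_state(u64 host_xss)
  {
          return host_xss & XFEATURE_MASK_CET_USER;
  }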

This version removes the unnecessary checks of host CET enabling status
before exposing CET features to the guest, decoupling guest CET enabling
from the host's. This is expected to be friendlier to cloud computing
scenarios.


CET states management:
--------------------------------------------------------------------------
The CET user-mode states, MSR_IA32_{U_CET,PL3_SSP}, depend on the
{XSAVES,XRSTORS} instructions to swap guest/host context when
vm-exit/vm-entry happens. On vm-exit, the guest CET states are stored to
the guest fpu area and the host user-mode states are loaded from
thread/process context before the vCPU returns to userspace; vice versa
on vm-entry. See kvm_{load|put}_guest_fpu() for details. So the
user-mode state validity depends on the host-side U_CET bit being set in
MSR_XSS.
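
A condensed sketch of that flow (the real code lives in
kvm_{load|put}_guest_fpu() in arch/x86/kvm/x86.c; fpu_swap_kvm_fpstate()
performs the actual XSAVES/XRSTORS):

  /* Entering the vcpu run loop: save the host/user xstate and load the
   * guest xstate, including user CET state once XSS[bit 11] is set. */
  static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
  {
          fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
  }

  /* Returning to userspace: save the guest xstate back into the guest
   * fpu area and restore the host/user state. */
  static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
  {
          fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
  }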

CET supervisor-mode states are grouped into two categories: XSAVES
dependent and non-dependent. The former includes MSR_IA32_PL{0,1,2}_SSP;
the latter consists of MSR_IA32_S_CET and MSR_IA32_INTR_SSP_TBL. Saving/
restoring the XSAVES-dependent MSRs depends on the supervisor CET bit
(XSS[bit 12]) being set in MSR_XSS. Since the native series doesn't
enable that bit, these s-SHSTK shadow stack pointers are invalid.

New VMCS fields, {GUEST|HOST}_{S_CET,SSP,INTR_SSP_TABLE}, are introduced
to switch the non-XSAVES-managed guest/host states. When the CET
entry/exit load bits are set, guest/host MSR_IA32_{S_CET,INTR_SSP_TBL,SSP}
are loaded from these fields at vm-entry/vm-exit. With these new fields,
the current guest kernel IBT enabling doesn't depend on the S_CET bit in
XSS, i.e., on host {XSAVES|XRSTORS} support.
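
A sketch of how the new fields get programmed (vmcs_writel() is KVM's
VMCS accessor; the host_*/guest_* variables are placeholders):

  /* Host values: the CPU restores these at VM-exit when
   * VM_EXIT_LOAD_CET_STATE is set in the VM-exit controls. */
  vmcs_writel(HOST_S_CET, host_s_cet);
  vmcs_writel(HOST_SSP, host_ssp);
  vmcs_writel(HOST_INTR_SSP_TABLE, host_ssp_tbl);

  /* Guest values: the CPU loads these at VM-entry when
   * VM_ENTRY_LOAD_CET_STATE is set in the VM-entry controls. */
  vmcs_writel(GUEST_S_CET, guest_s_cet);
  vmcs_writel(GUEST_SSP, guest_ssp);
  vmcs_writel(GUEST_INTR_SSP_TABLE, guest_ssp_tbl);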


Tests:
--------------------------------------------------------------------------
This series passed the basic CET user shadow stack test and kernel IBT
test in L1 and L2 guests. It also works with the CET KVM-unit-tests
application.

All KVM-unit-tests cases and KVM selftests were executed against this
series; all test cases passed except the vmx test, whose failure is due
to CR4_CET bit testing in test_vmxon_bad_cr(). After adding the CR4_CET
bit to the skip list, the test passed. I'll send a patch to fix this
issue later.


To run the user shadow stack test and kernel IBT test in a VM, you need
a CET-capable platform, e.g., a Sapphire Rapids server, and should follow
the steps below to build the host/guest kernels properly:

1. Build the host kernel: apply this series to the kernel tree and build
the kernel.

2. Build the guest kernel: apply the CET native series to the kernel tree
and opt in to the CONFIG_X86_KERNEL_IBT and CONFIG_X86_USER_SHADOW_STACK
options. Build with a CET-enabled gcc (version >= 8.5.0).

3. Use the patched QEMU to launch a VM.

Check kernel selftest test_shadow_stack_64 output:

[INFO]  new_ssp = 7f8c82100ff8, *new_ssp = 7f8c82101001
[INFO]  changing ssp from 7f8c82900ff0 to 7f8c82100ff8
[INFO]  ssp is now 7f8c82101000
[OK]    Shadow stack pivot
[OK]    Shadow stack faults
[INFO]  Corrupting shadow stack
[INFO]  Generated shadow stack violation successfully
[OK]    Shadow stack violation test
[INFO]  Gup read -> shstk access success
[INFO]  Gup write -> shstk access success
[INFO]  Violation from normal write
[INFO]  Gup read -> write access success
[INFO]  Violation from normal write
[INFO]  Gup write -> write access success
[INFO]  Cow gup write -> write access success
[OK]    Shadow gup test
[INFO]  Violation from shstk access
[OK]    mprotect() test
[SKIP]  Userfaultfd unavailable.
[OK]    32 bit test


Check kernel IBT with dmesg | grep CET:

CET detected: Indirect Branch Tracking enabled

--------------------------------------------------------------------------
Changes in v3:
1. Moved the MSR access check helper to the x86 common file. [Mike]
2. Modified the cover letter, commit logs and code per review comments. [PeterZ, Binbin, Rick]
3. Fixed an issue with host MSR_IA32_S_CET reload at vm-exit.
4. Rebased onto kvm-x86/next [4].


[1]: linux-next: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/?h=next-20230420
[2]: QEMU patch: https://lore.kernel.org/all/20230421041227.90915-1-weijiang.yang@intel.com/
[3]: v2 patchset: https://lore.kernel.org/all/20230421134615.62539-1-weijiang.yang@intel.com/
[4]: Rebase branch: https://github.com/kvm-x86/linux.git, commit: 5c291b93e5d6 (tag: kvm-x86-next-2023.04.26)


Rick Edgecombe (5):
  x86/shstk: Add Kconfig option for shadow stack
  x86/cpufeatures: Add CPU feature flags for shadow stacks
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
  x86/fpu: Add helper for modifying xstate

Sean Christopherson (2):
  KVM:x86: Report XSS as to-be-saved if there are supported features
  KVM:x86: Load guest FPU state when accessing xsaves-managed MSRs

Yang Weijiang (14):
  KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  KVM:x86: Init kvm_caps.supported_xss with supported feature bits
  KVM:x86: Add #CP support in guest exception classification
  KVM:VMX: Introduce CET VMCS fields and control bits
  KVM:x86: Add fault checks for guest CR4.CET setting
  KVM:VMX: Emulate reads and writes to CET MSRs
  KVM:VMX: Add a synthetic MSR to allow userspace to access GUEST_SSP
  KVM:x86: Report CET MSRs as to-be-saved if CET is supported
  KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area
  KVM:VMX: Pass through user CET MSRs to the guest
  KVM:x86: Enable CET virtualization for VMX and advertise to userspace
  KVM:nVMX: Enable user CET support for nested VMX
  KVM:x86: Enable kernel IBT support for guest
  KVM:x86: Support CET supervisor shadow stack MSR access

 arch/x86/Kconfig                         |  24 +++++
 arch/x86/Kconfig.assembler               |   5 +
 arch/x86/include/asm/cpufeatures.h       |   2 +
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/fpu/api.h           |   9 ++
 arch/x86/include/asm/fpu/types.h         |  16 ++-
 arch/x86/include/asm/fpu/xstate.h        |   6 +-
 arch/x86/include/asm/kvm_host.h          |   3 +-
 arch/x86/include/asm/vmx.h               |   8 ++
 arch/x86/include/uapi/asm/kvm.h          |   1 +
 arch/x86/include/uapi/asm/kvm_para.h     |   1 +
 arch/x86/kernel/cpu/common.c             |  35 +++++--
 arch/x86/kernel/cpu/cpuid-deps.c         |   1 +
 arch/x86/kernel/fpu/core.c               |  19 ++++
 arch/x86/kernel/fpu/xstate.c             |  90 ++++++++--------
 arch/x86/kvm/cpuid.c                     |  19 +++-
 arch/x86/kvm/cpuid.h                     |   6 ++
 arch/x86/kvm/smm.c                       |  20 ++++
 arch/x86/kvm/vmx/capabilities.h          |   4 +
 arch/x86/kvm/vmx/nested.c                |  29 +++++-
 arch/x86/kvm/vmx/vmcs12.c                |   6 ++
 arch/x86/kvm/vmx/vmcs12.h                |  14 ++-
 arch/x86/kvm/vmx/vmx.c                   | 124 ++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h                   |   6 +-
 arch/x86/kvm/x86.c                       | 122 ++++++++++++++++++++--
 arch/x86/kvm/x86.h                       |  47 ++++++++-
 26 files changed, 543 insertions(+), 82 deletions(-)


base-commit: 5c291b93e5d665380dbecc6944973583f9565ee5
-- 
2.27.0



* [PATCH v3 01/21] x86/shstk: Add Kconfig option for shadow stack
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Yu-cheng Yu, Borislav Petkov, Kees Cook, Pengfei Xu

From: Rick Edgecombe <rick.p.edgecombe@intel.com>

Shadow stack provides protection for applications against function return
address corruption. It is active when the processor supports it, the
kernel has CONFIG_X86_USER_SHADOW_STACK enabled, and the application is
built for the feature. This is only implemented for the 64-bit kernel.
When it is enabled, legacy non-shadow-stack applications continue to
work, but without protection.

Since there is another feature that utilizes CET (kernel IBT) and will
share implementation with shadow stacks, create CONFIG_X86_CET to signify
that at least one CET feature is configured.
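
The intended consumption pattern, taken from how a later patch in this
series rewrites setup_cet(): compile out the common CET code when neither
feature is configured.

  static __always_inline void setup_cet(struct cpuinfo_x86 *c)
  {
          if (!IS_ENABLED(CONFIG_X86_CET))
                  return;         /* neither kernel IBT nor user SHSTK */
          /* ... per-feature detection and enabling follows ... */
  }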

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230319001535.23210-3-rick.p.edgecombe%40intel.com
---
 arch/x86/Kconfig           | 24 ++++++++++++++++++++++++
 arch/x86/Kconfig.assembler |  5 +++++
 2 files changed, 29 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a825bf031f49..f03791b73f9f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1851,6 +1851,11 @@ config CC_HAS_IBT
 		  (CC_IS_CLANG && CLANG_VERSION >= 140000)) && \
 		  $(as-instr,endbr64)
 
+config X86_CET
+	def_bool n
+	help
+	  CET features configured (Shadow stack or IBT)
+
 config X86_KERNEL_IBT
 	prompt "Indirect Branch Tracking"
 	def_bool y
@@ -1858,6 +1863,7 @@ config X86_KERNEL_IBT
 	# https://github.com/llvm/llvm-project/commit/9d7001eba9c4cb311e03cd8cdc231f9e579f2d0f
 	depends on !LD_IS_LLD || LLD_VERSION >= 140000
 	select OBJTOOL
+	select X86_CET
 	help
 	  Build the kernel with support for Indirect Branch Tracking, a
 	  hardware support course-grain forward-edge Control Flow Integrity
@@ -1952,6 +1958,24 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config X86_USER_SHADOW_STACK
+	bool "X86 userspace shadow stack"
+	depends on AS_WRUSS
+	depends on X86_64
+	select ARCH_USES_HIGH_VMA_FLAGS
+	select X86_CET
+	help
+	  Shadow stack protection is a hardware feature that detects function
+	  return address corruption.  This helps mitigate ROP attacks.
+	  Applications must be enabled to use it, and old userspace does not
+	  get protection "for free".
+
+	  CPUs supporting shadow stacks were first released in 2020.
+
+	  See Documentation/x86/shstk.rst for more information.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index b88f784cb02e..8ad41da301e5 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -24,3 +24,8 @@ config AS_GFNI
 	def_bool $(as-instr,vgf2p8mulb %xmm0$(comma)%xmm1$(comma)%xmm2)
 	help
 	  Supported by binutils >= 2.30 and LLVM integrated assembler
+
+config AS_WRUSS
+	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
+	help
+	  Supported by binutils >= 2.31 and LLVM integrated assembler
-- 
2.27.0



* [PATCH v3 02/21] x86/cpufeatures: Add CPU feature flags for shadow stacks
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Yu-cheng Yu, Borislav Petkov, Kees Cook, Pengfei Xu

From: Rick Edgecombe <rick.p.edgecombe@intel.com>

The Control-Flow Enforcement Technology contains two related features,
one of which is Shadow Stacks. Future patches will utilize this feature
for shadow stack support in KVM, so add a CPU feature flag for Shadow
Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]).

To protect shadow stack state from malicious modification, the registers
are only accessible in supervisor mode. This implementation
context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend
on XSAVES.

The shadow stack feature, enumerated by the CPUID bit described above,
encompasses both supervisor and userspace support for shadow stack. In
near future patches, only userspace shadow stack will be enabled. In
expectation of future supervisor shadow stack support, create a software
CPU capability to enumerate kernel utilization of userspace shadow stack
support. This user shadow stack bit should depend on the HW "shstk"
capability and that logic will be implemented in future patches.
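
The effect of the new cpuid_deps[] entry, sketched: clearing XSAVES
transitively clears SHSTK as well, e.g. when booting with "noxsaves"
(from cpu_parse_early_param()):

  if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
          setup_clear_cpu_cap(X86_FEATURE_XSAVES);

  /* setup_clear_cpu_cap() walks cpuid_deps[] and, with the new entry,
   * also clears X86_FEATURE_SHSTK whenever XSAVES is cleared. */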

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230319001535.23210-4-rick.p.edgecombe%40intel.com
---
 arch/x86/include/asm/cpufeatures.h       | 2 ++
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 97327a1e3aff..3993ea7c6312 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
 #define X86_FEATURE_SMBA		(11*32+21) /* "" Slow Memory Bandwidth Allocation */
 #define X86_FEATURE_BMEC		(11*32+22) /* "" Bandwidth Monitoring Event Configuration */
+#define X86_FEATURE_USER_SHSTK		(11*32+23) /* Shadow stack support for user mode applications */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
@@ -379,6 +380,7 @@
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* "" Shadow stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 5dfa4fb76f4b..505f78ddca82 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -99,6 +99,12 @@
 # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif
 
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+#define DISABLE_USER_SHSTK	0
+#else
+#define DISABLE_USER_SHSTK	(1 << (X86_FEATURE_USER_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -114,7 +120,7 @@
 #define DISABLED_MASK9	(DISABLE_SGX)
 #define DISABLED_MASK10	0
 #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
-			 DISABLE_CALL_DEPTH_TRACKING)
+			 DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
 #define DISABLED_MASK12	0
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..e462c1d3800a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
 	{}
 };
 
-- 
2.27.0



* [PATCH v3 03/21] x86/cpufeatures: Enable CET CR4 bit for shadow stack
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Yu-cheng Yu, Borislav Petkov, Kees Cook, Pengfei Xu

From: Rick Edgecombe <rick.p.edgecombe@intel.com>

Setting CR4.CET is a prerequisite for utilizing any CET features, most of
which also require setting MSRs.

Kernel IBT already enables the CET CR4 bit when it detects IBT HW support
and is configured with kernel IBT. However, future patches that enable
userspace shadow stack support will need the bit set as well. So change
the logic to enable it in either case.

Clear MSR_IA32_U_CET in cet_disable() so that it can't live to see
userspace in a new kexec-ed kernel that has CR4.CET set from kernel IBT.

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230319001535.23210-5-rick.p.edgecombe%40intel.com
---
 arch/x86/kernel/cpu/common.c | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8cd4126d8253..cc686e5039be 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -600,27 +600,43 @@ __noendbr void ibt_restore(u64 save)
 
 static __always_inline void setup_cet(struct cpuinfo_x86 *c)
 {
-	u64 msr = CET_ENDBR_EN;
+	bool user_shstk, kernel_ibt;
 
-	if (!HAS_KERNEL_IBT ||
-	    !cpu_feature_enabled(X86_FEATURE_IBT))
+	if (!IS_ENABLED(CONFIG_X86_CET))
 		return;
 
-	wrmsrl(MSR_IA32_S_CET, msr);
+	kernel_ibt = HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT);
+	user_shstk = cpu_feature_enabled(X86_FEATURE_SHSTK) &&
+		     IS_ENABLED(CONFIG_X86_USER_SHADOW_STACK);
+
+	if (!kernel_ibt && !user_shstk)
+		return;
+
+	if (user_shstk)
+		set_cpu_cap(c, X86_FEATURE_USER_SHSTK);
+
+	if (kernel_ibt)
+		wrmsrl(MSR_IA32_S_CET, CET_ENDBR_EN);
+	else
+		wrmsrl(MSR_IA32_S_CET, 0);
+
 	cr4_set_bits(X86_CR4_CET);
 
-	if (!ibt_selftest()) {
+	if (kernel_ibt && !ibt_selftest()) {
 		pr_err("IBT selftest: Failed!\n");
 		wrmsrl(MSR_IA32_S_CET, 0);
 		setup_clear_cpu_cap(X86_FEATURE_IBT);
-		return;
 	}
 }
 
 __noendbr void cet_disable(void)
 {
-	if (cpu_feature_enabled(X86_FEATURE_IBT))
-		wrmsrl(MSR_IA32_S_CET, 0);
+	if (!(cpu_feature_enabled(X86_FEATURE_IBT) ||
+	      cpu_feature_enabled(X86_FEATURE_SHSTK)))
+		return;
+
+	wrmsrl(MSR_IA32_S_CET, 0);
+	wrmsrl(MSR_IA32_U_CET, 0);
 }
 
 /*
@@ -1482,6 +1498,9 @@ static void __init cpu_parse_early_param(void)
 	if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
 		setup_clear_cpu_cap(X86_FEATURE_XSAVES);
 
+	if (cmdline_find_option_bool(boot_command_line, "nousershstk"))
+		setup_clear_cpu_cap(X86_FEATURE_USER_SHSTK);
+
 	arglen = cmdline_find_option(boot_command_line, "clearcpuid", arg, sizeof(arg));
 	if (arglen <= 0)
 		return;
-- 
2.27.0



* [PATCH v3 04/21] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Yu-cheng Yu, Borislav Petkov, Kees Cook, Pengfei Xu

From: Rick Edgecombe <rick.p.edgecombe@intel.com>

Shadow stack register state can be managed with XSAVE. The registers
can logically be separated into two groups:
        * Registers controlling user-mode operation
        * Registers controlling kernel-mode operation

The architecture has two new XSAVE state components: one for each of
those groups of registers. This lets an OS manage them separately if
it chooses. Future patches for host userspace and KVM guests will only
utilize the user-mode registers, so only configure XSAVE to save
user-mode registers. This state will add 16 bytes to the xsave buffer
size.

Future patches will use the user-mode XSAVE area to save guest user-mode
CET state. However, VMCS includes new fields for guest CET supervisor
states. KVM can use these to save and restore guest supervisor state, so
host supervisor XSAVE support is not required.

Adding this exacerbates the already unwieldy if statement in
check_xstate_against_struct() that handles warning about un-implemented
xfeatures. So refactor these checks into a switch statement and have
XCHECK_SZ() report the result when it actually checks the xfeature. Some
lines end up exceeding 80 chars, but this was better on balance than
other options explored.

While configuring user-mode XSAVE, clarify that kernel-mode registers are
not managed by XSAVE by defining the xfeature in
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, as is done for XFEATURE_MASK_PT.
This serves more of a documentation-as-code purpose and, functionally,
only enables a few safety checks.

Both XSAVE state components are supervisor states, even the state
controlling user-mode operation. This is a departure from earlier features
like protection keys where the PKRU state is a normal user
(non-supervisor) state. Having the user state be supervisor-managed
ensures there is no direct, unprivileged access to it, making it harder
for an attacker to subvert CET.

To facilitate this privileged access, define the two user-mode CET MSRs,
and the bits defined in those MSRs relevant to future shadow stack
enablement patches.
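
For reference, the user-mode CET MSR layout referred to above (addresses
and bit meanings per the Intel SDM; the real defines land in
arch/x86/include/asm/msr-index.h):

  #define MSR_IA32_U_CET          0x000006a0      /* user-mode CET controls  */
  #define CET_SHSTK_EN            BIT_ULL(0)      /* enable shadow stacks    */
  #define CET_WRSS_EN             BIT_ULL(1)      /* enable WRSS{D,Q}        */
  #define CET_ENDBR_EN            BIT_ULL(2)      /* enable IBT              */
  #define MSR_IA32_PL3_SSP        0x000006a7      /* CPL3 shadow stack ptr   */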

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230319001535.23210-6-rick.p.edgecombe%40intel.com
---
 arch/x86/include/asm/fpu/types.h  | 16 +++++-
 arch/x86/include/asm/fpu/xstate.h |  6 ++-
 arch/x86/kernel/fpu/xstate.c      | 90 +++++++++++++++----------------
 3 files changed, 61 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 7f6d858ff47a..eb810074f1e7 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -115,8 +115,8 @@ enum xfeature {
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
 	XFEATURE_PASID,
-	XFEATURE_RSRVD_COMP_11,
-	XFEATURE_RSRVD_COMP_12,
+	XFEATURE_CET_USER,
+	XFEATURE_CET_KERNEL_UNUSED,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -138,6 +138,8 @@ enum xfeature {
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
+#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL_UNUSED)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
 #define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
 #define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
@@ -252,6 +254,16 @@ struct pkru_state {
 	u32				pad;
 } __packed;
 
+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+	/* user control-flow settings */
+	u64 user_cet;
+	/* user shadow stack pointer */
+	u64 user_ssp;
+};
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index cd3dd170e23a..d4427b88ee12 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -50,7 +50,8 @@
 #define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
+					    XFEATURE_MASK_CET_USER)
 
 /*
  * A supervisor state component may not always contain valuable information,
@@ -77,7 +78,8 @@
  * Unsupported supervisor features. When a supervisor feature in this mask is
  * supported in the future, move it to the supported supervisor feature mask.
  */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
+					      XFEATURE_MASK_CET_KERNEL)
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 714166cc25f2..13a80521dd51 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -39,26 +39,26 @@
  */
 static const char *xfeature_names[] =
 {
-	"x87 floating point registers"	,
-	"SSE registers"			,
-	"AVX registers"			,
-	"MPX bounds registers"		,
-	"MPX CSR"			,
-	"AVX-512 opmask"		,
-	"AVX-512 Hi256"			,
-	"AVX-512 ZMM_Hi256"		,
-	"Processor Trace (unused)"	,
+	"x87 floating point registers",
+	"SSE registers",
+	"AVX registers",
+	"MPX bounds registers",
+	"MPX CSR",
+	"AVX-512 opmask",
+	"AVX-512 Hi256",
+	"AVX-512 ZMM_Hi256",
+	"Processor Trace (unused)",
 	"Protection Keys User registers",
 	"PASID state",
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"AMX Tile config"		,
-	"AMX Tile data"			,
-	"unknown xstate feature"	,
+	"Control-flow User registers",
+	"Control-flow Kernel registers (unused)",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"AMX Tile config",
+	"AMX Tile data",
+	"unknown xstate feature",
 };
 
 static unsigned short xsave_cpuid_features[] __initdata = {
@@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_CET_USER]			= X86_FEATURE_SHSTK,
 	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
 	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
@@ -276,6 +277,7 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_CET_USER);
 	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
 	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
@@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 	 XFEATURE_MASK_BNDREGS |		\
 	 XFEATURE_MASK_BNDCSR |			\
 	 XFEATURE_MASK_PASID |			\
+	 XFEATURE_MASK_CET_USER |		\
 	 XFEATURE_MASK_XTILE)
 
 /*
@@ -446,14 +449,15 @@ static void __init __xstate_dump_leaves(void)
 	}									\
 } while (0)
 
-#define XCHECK_SZ(sz, nr, nr_macro, __struct) do {			\
-	if ((nr == nr_macro) &&						\
-	    WARN_ONCE(sz != sizeof(__struct),				\
-		"%s: struct is %zu bytes, cpu state %d bytes\n",	\
-		__stringify(nr_macro), sizeof(__struct), sz)) {		\
+#define XCHECK_SZ(sz, nr, __struct) ({					\
+	if (WARN_ONCE(sz != sizeof(__struct),				\
+	    "[%s]: struct is %zu bytes, cpu state %d bytes\n",		\
+	    xfeature_names[nr], sizeof(__struct), sz)) {		\
 		__xstate_dump_leaves();					\
 	}								\
-} while (0)
+	true;								\
+})
+
 
 /**
  * check_xtile_data_against_struct - Check tile data state size.
@@ -527,36 +531,28 @@ static bool __init check_xstate_against_struct(int nr)
 	 * Ask the CPU for the size of the state.
 	 */
 	int sz = xfeature_size(nr);
+
 	/*
 	 * Match each CPU state with the corresponding software
 	 * structure.
 	 */
-	XCHECK_SZ(sz, nr, XFEATURE_YMM,       struct ymmh_struct);
-	XCHECK_SZ(sz, nr, XFEATURE_BNDREGS,   struct mpx_bndreg_state);
-	XCHECK_SZ(sz, nr, XFEATURE_BNDCSR,    struct mpx_bndcsr_state);
-	XCHECK_SZ(sz, nr, XFEATURE_OPMASK,    struct avx_512_opmask_state);
-	XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
-	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
-	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
-	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
-	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
-
-	/* The tile data size varies between implementations. */
-	if (nr == XFEATURE_XTILE_DATA)
-		check_xtile_data_against_struct(sz);
-
-	/*
-	 * Make *SURE* to add any feature numbers in below if
-	 * there are "holes" in the xsave state component
-	 * numbers.
-	 */
-	if ((nr < XFEATURE_YMM) ||
-	    (nr >= XFEATURE_MAX) ||
-	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
+	switch (nr) {
+	case XFEATURE_YMM:	  return XCHECK_SZ(sz, nr, struct ymmh_struct);
+	case XFEATURE_BNDREGS:	  return XCHECK_SZ(sz, nr, struct mpx_bndreg_state);
+	case XFEATURE_BNDCSR:	  return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state);
+	case XFEATURE_OPMASK:	  return XCHECK_SZ(sz, nr, struct avx_512_opmask_state);
+	case XFEATURE_ZMM_Hi256:  return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state);
+	case XFEATURE_Hi16_ZMM:	  return XCHECK_SZ(sz, nr, struct avx_512_hi16_state);
+	case XFEATURE_PKRU:	  return XCHECK_SZ(sz, nr, struct pkru_state);
+	case XFEATURE_PASID:	  return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
+	case XFEATURE_XTILE_CFG:  return XCHECK_SZ(sz, nr, struct xtile_cfg);
+	case XFEATURE_CET_USER:	  return XCHECK_SZ(sz, nr, struct cet_user_state);
+	case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
+	default:
 		XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
 		return false;
 	}
+
 	return true;
 }
 
-- 
2.27.0



* [PATCH v3 05/21] x86/fpu: Add helper for modifying xstate
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Thomas Gleixner, Borislav Petkov, Kees Cook,
	Pengfei Xu

From: Rick Edgecombe <rick.p.edgecombe@intel.com>

Just like user xfeatures, supervisor xfeatures can be active in the
registers or present in the task FPU buffer. If the registers are
active, the registers can be modified directly. If the registers are
not active, the modification must be performed on the task FPU buffer.

When the state is not active, the kernel could perform modifications
directly to the buffer. But in order for it to do that, it needs
to know where in the buffer the specific state it wants to modify is
located. Doing this is not robust against optimizations that compact
the FPU buffer, as each access would require computing where in the
buffer it is.

The easiest way to modify supervisor xfeature data is to force-restore
the registers and write directly to the MSRs. Oftentimes this is just
fine anyway, as the registers need to be restored before returning to
userspace. Do this for now, leaving buffer-writing optimizations for the
future.

Add a new function fpregs_lock_and_load() that can simultaneously call
fpregs_lock() and do this restore. Also perform some extra sanity
checks in this function since this will be used in non-fpu focused code.
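
Typical usage of the new helper, as a sketch (later shstk patches in the
native series follow this pattern; new_ssp is a placeholder):

  fpregs_lock_and_load();                 /* xstate is now live in registers */
  wrmsrl(MSR_IA32_PL3_SSP, new_ssp);      /* modify the supervisor xfeature  */
  fpregs_unlock();                        /* normal save/restore resumes     */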

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230319001535.23210-7-rick.p.edgecombe%40intel.com
---
 arch/x86/include/asm/fpu/api.h |  9 +++++++++
 arch/x86/kernel/fpu/core.c     | 18 ++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 503a577814b2..aadc6893dcaa 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -82,6 +82,15 @@ static inline void fpregs_unlock(void)
 		preempt_enable();
 }
 
+/*
+ * FPU state gets lazily restored before returning to userspace. So when in the
+ * kernel, the valid FPU state may be kept in the buffer. This function will force
+ * restore all the fpu state to the registers early if needed, and lock them from
+ * being automatically saved/restored. Then FPU state can be modified safely in the
+ * registers, before unlocking with fpregs_unlock().
+ */
+void fpregs_lock_and_load(void);
+
 #ifdef CONFIG_X86_DEBUG_FPU
 extern void fpregs_assert_state_consistent(void);
 #else
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index caf33486dc5e..f851558b673f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -753,6 +753,24 @@ void switch_fpu_return(void)
 }
 EXPORT_SYMBOL_GPL(switch_fpu_return);
 
+void fpregs_lock_and_load(void)
+{
+	/*
+	 * fpregs_lock() only disables preemption (mostly). So modifying state
+	 * in an interrupt could screw up some in progress fpregs operation.
+	 * Warn about it.
+	 */
+	WARN_ON_ONCE(!irq_fpu_usable());
+	WARN_ON_ONCE(current->flags & PF_KTHREAD);
+
+	fpregs_lock();
+
+	fpregs_assert_state_consistent();
+
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		fpregs_restore_userregs();
+}
+
 #ifdef CONFIG_X86_DEBUG_FPU
 /*
  * If current FPU state according to its tracking (loaded FPU context on this
-- 
2.27.0



* [PATCH v3 06/21] KVM:x86: Report XSS as to-be-saved if there are supported features
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add MSR_IA32_XSS to the list of MSRs reported to userspace if
supported_xss is non-zero, i.e. KVM supports at least one XSS based
feature.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e7f78fe79b32..33a780fe820b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1454,6 +1454,7 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_UMWAIT_CONTROL,
 
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+	MSR_IA32_XSS,
 };
 
 static const u32 msrs_to_save_pmu[] = {
-- 
2.27.0



* [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Zhang Yi Z

Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified.
CPUID(EAX=0DH,ECX=1).EBX reports the current required storage size for
all features enabled via XCR0 | XSS, so that the guest can allocate a
correctly sized xsave buffer.

Note, KVM does not yet support any XSS based features, i.e. supported_xss
is guaranteed to be zero at this time.

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/cpuid.c | 7 +++++--
 arch/x86/kvm/x86.c   | 6 ++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 123bf8b97a4b..cbb1b8a65502 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -277,8 +277,11 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
 
 	best = cpuid_entry2_find(entries, nent, 0xD, 1);
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
-		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+		cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {
+		u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;
+
+		best->ebx = xstate_required_size(xstate, true);
+	}
 
 	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
 	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 33a780fe820b..ab3360a10933 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3776,8 +3776,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 */
 		if (data & ~kvm_caps.supported_xss)
 			return 1;
-		vcpu->arch.ia32_xss = data;
-		kvm_update_cpuid_runtime(vcpu);
+		if (vcpu->arch.ia32_xss != data) {
+			vcpu->arch.ia32_xss = data;
+			kvm_update_cpuid_runtime(vcpu);
+		}
 		break;
 	case MSR_SMI_COUNT:
 		if (!msr_info->host_initiated)
-- 
2.27.0



* [PATCH v3 08/21] KVM:x86: Init kvm_caps.supported_xss with supported feature bits
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang, john.allen

Initialize kvm_caps.supported_xss with the host XSS MSR value ANDed with
the KVM-supported XSS mask. KVM_SUPPORTED_XSS holds all potentially
supported feature bits; the result represents all KVM-supported feature
bits, which are used for swapping guest and host FPU contents.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 1 -
 arch/x86/kvm/x86.c     | 6 +++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..c872a5aafa50 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7806,7 +7806,6 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
 
 	/* CPUID 0xD.1 */
-	kvm_caps.supported_xss = 0;
 	if (!cpu_has_vmx_xsaves())
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab3360a10933..d2975ca96ac5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -223,6 +223,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
+#define KVM_SUPPORTED_XSS     0
+
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
 
@@ -9472,8 +9474,10 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	rdmsrl_safe(MSR_EFER, &host_efer);
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
+	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
 		rdmsrl(MSR_IA32_XSS, host_xss);
+		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+	}
 
 	kvm_init_pmu_capability(ops->pmu_ops);
 
-- 
2.27.0



* [PATCH v3 09/21] KVM:x86: Load guest FPU state when accessing xsaves-managed MSRs
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Load the guest's FPU state if userspace is accessing MSRs whose values
are managed by XSAVES. Two MSR access helpers, i.e.
kvm_{get,set}_xsave_msr(), are introduced by a later patch to facilitate
access to this kind of MSR.

If new feature MSRs supported in XSS are passed through to the guest,
they are saved and restored by {XSAVES|XRSTORS} to/from the guest's FPU
state at vm-entry/exit.

Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
explicitly check @vcpu is non-null before attempting to load guest state.
The XSS supporting MSRs cannot be retrieved via the device ioctl() without
loading guest FPU state (which doesn't exist).

Note that guest_cpuid_has() is not queried as host userspace is allowed
to access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d2975ca96ac5..7788646bbf1f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -130,6 +130,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 
 static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
 struct kvm_x86_ops kvm_x86_ops __read_mostly;
 
 #define KVM_X86_OP(func)					     \
@@ -4336,6 +4339,21 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 }
 EXPORT_SYMBOL_GPL(kvm_get_msr_common);
 
+static const u32 xsave_msrs[] = {
+	MSR_IA32_U_CET, MSR_IA32_PL3_SSP,
+};
+
+static bool is_xsaves_msr(u32 index)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(xsave_msrs); i++) {
+		if (index == xsave_msrs[i])
+			return true;
+	}
+	return false;
+}
+
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
  *
@@ -4346,11 +4364,20 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
+		    is_xsaves_msr(entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 
 	return i;
 }
-- 
2.27.0



* [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang, john.allen

Add handling for Control Protection (#CP) exceptions (vector 21).
The new vector is introduced for Intel's Control-Flow Enforcement
Technology (CET) related violation cases.

Although #CP belongs to the contributory exception class, the actual
effect is conditional on CET being exposed to the guest. If CET is not
available to the guest, #CP falls back to non-contributory and doesn't
have an error code. This matters for exception merging, e.g. two
contributory exceptions raised back-to-back escalate to #DF, while a
benign #CP is delivered serially. This rationale was used to fix a unit
test failure encountered in L1. Although the issue is now fixed in the
unit test case, keeping the handling is reasonable. cr4_guest_rsvd_bits
is used to avoid guest_cpuid_has() lookups.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/x86.c              | 10 +++++++---
 arch/x86/kvm/x86.h              | 13 ++++++++++---
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 7f467fe05d42..1c002abe2be8 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -33,6 +33,7 @@
 #define MC_VECTOR 18
 #define XM_VECTOR 19
 #define VE_VECTOR 20
+#define CP_VECTOR 21
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 96ede74a6067..7bc62cd72748 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2850,7 +2850,7 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		/* VM-entry interruption-info field: deliver error code */
 		should_have_error_code =
 			intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
-			x86_exception_has_error_code(vector);
+			x86_exception_has_error_code(vcpu, vector);
 		if (CC(has_error_code != should_have_error_code))
 			return -EINVAL;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7788646bbf1f..a768cbf3fbb7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -520,11 +520,15 @@ EXPORT_SYMBOL_GPL(kvm_spurious_fault);
 #define EXCPT_CONTRIBUTORY	1
 #define EXCPT_PF		2
 
-static int exception_class(int vector)
+static int exception_class(struct kvm_vcpu *vcpu, int vector)
 {
 	switch (vector) {
 	case PF_VECTOR:
 		return EXCPT_PF;
+	case CP_VECTOR:
+		if (vcpu->arch.cr4_guest_rsvd_bits & X86_CR4_CET)
+			return EXCPT_BENIGN;
+		return EXCPT_CONTRIBUTORY;
 	case DE_VECTOR:
 	case TS_VECTOR:
 	case NP_VECTOR:
@@ -707,8 +711,8 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 		return;
 	}
-	class1 = exception_class(prev_nr);
-	class2 = exception_class(nr);
+	class1 = exception_class(vcpu, prev_nr);
+	class2 = exception_class(vcpu, nr);
 	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) ||
 	    (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
 		/*
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c544602d07a3..2ba7c7fc4846 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -171,13 +171,20 @@ static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
 	return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
 }
 
-static inline bool x86_exception_has_error_code(unsigned int vector)
+static inline bool x86_exception_has_error_code(struct kvm_vcpu *vcpu,
+						unsigned int vector)
 {
 	static u32 exception_has_error_code = BIT(DF_VECTOR) | BIT(TS_VECTOR) |
 			BIT(NP_VECTOR) | BIT(SS_VECTOR) | BIT(GP_VECTOR) |
-			BIT(PF_VECTOR) | BIT(AC_VECTOR);
+			BIT(PF_VECTOR) | BIT(AC_VECTOR) | BIT(CP_VECTOR);
 
-	return (1U << vector) & exception_has_error_code;
+	if (!((1U << vector) & exception_has_error_code))
+		return false;
+
+	if (vector == CP_VECTOR)
+		return !(vcpu->arch.cr4_guest_rsvd_bits & X86_CR4_CET);
+
+	return true;
 }
 
 static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
-- 
2.27.0



* [PATCH v3 11/21] KVM:VMX: Introduce CET VMCS fields and control bits
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Zhang Yi Z

Control-flow Enforcement Technology(CET) is a CPU feature used to prevent
Return/Jump-Oriented Programming (ROP/JOP) attacks. CET introduces a new
exception type, Control Protection (#CP), and two sub-features (SHSTK, IBT)
to defend against ROP/JOP style control-flow subversion attacks.

Shadow Stack (SHSTK):
  A shadow stack is a second stack used exclusively for control transfer
  operations. The shadow stack is separate from the data/normal stack and
  can be enabled individually in user and kernel mode. When shadow stack
  is enabled, CALL pushes the return address on both the data and shadow
  stack. RET pops the return address from both stacks and compares them.
  If the return addresses from the two stacks do not match, the processor
  generates a #CP.

Indirect Branch Tracking (IBT):
  IBT adds a new instruction, ENDBRANCH, that is used to mark valid
  target addresses of indirect branches (CALL, JMP, ENCLU[EEXIT],
  etc.). If an indirect branch is executed and the next instruction is
  _not_ an ENDBRANCH, the processor generates a #CP.

Several new CET MSRs are defined to support CET:
  MSR_IA32_{U,S}_CET: Controls the CET settings for user mode and kernel
                      mode respectively.

  MSR_IA32_PL{0,1,2,3}_SSP: Stores shadow stack pointers for CPL-0,1,2,3
                            protection respectively.

  MSR_IA32_INT_SSP_TAB: Linear address of the shadow stack pointer table;
			entries are indexed by the IST field of the
			interrupt gate descriptor.

Two XSAVES state bits are introduced for CET:
  IA32_XSS:[bit 11]: Controls saving/restoring user mode CET states.
  IA32_XSS:[bit 12]: Controls saving/restoring kernel mode CET states.

Six VMCS fields are introduced for CET:
  {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
  {HOST,GUEST}_SSP: Stores shadow stack pointer of current active task/thread.
  {HOST,GUEST}_INTR_SSP_TABLE: Stores the currently active MSR_IA32_INT_SSP_TAB.

If VM_EXIT_LOAD_CET_STATE = 1, the host CET states are restored from
the following VMCS fields at VM-Exit:
  HOST_S_CET
  HOST_SSP
  HOST_INTR_SSP_TABLE

If VM_ENTRY_LOAD_CET_STATE = 1, the guest CET states are loaded from
the following VMCS fields at VM-Entry:
  GUEST_S_CET
  GUEST_SSP
  GUEST_INTR_SSP_TABLE
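
As an aside (illustrative only, not part of this patch), a hypervisor that
enables VM_ENTRY_LOAD_CET_STATE/VM_EXIT_LOAD_CET_STATE would context switch
CET state through the new fields roughly as below; the guest_*/host_*
variables are hypothetical:

	/* Guest CET state to be installed at the next VM-Entry. */
	vmcs_writel(GUEST_S_CET, guest_s_cet);
	vmcs_writel(GUEST_SSP, guest_ssp);
	vmcs_writel(GUEST_INTR_SSP_TABLE, guest_ssp_tbl);

	/* Host CET state to be restored at the next VM-Exit. */
	vmcs_writel(HOST_S_CET, host_s_cet);
	vmcs_writel(HOST_SSP, host_ssp);
	vmcs_writel(HOST_INTR_SSP_TABLE, host_ssp_tbl);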

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/vmx.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 498dc600bd5c..fe2aff27df8c 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -102,6 +102,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_LOAD_CET_STATE                  0x10000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -115,6 +116,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_CET_STATE                 0x00100000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
@@ -343,6 +345,9 @@ enum vmcs_field {
 	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
 	GUEST_SYSENTER_ESP              = 0x00006824,
 	GUEST_SYSENTER_EIP              = 0x00006826,
+	GUEST_S_CET                     = 0x00006828,
+	GUEST_SSP                       = 0x0000682a,
+	GUEST_INTR_SSP_TABLE            = 0x0000682c,
 	HOST_CR0                        = 0x00006c00,
 	HOST_CR3                        = 0x00006c02,
 	HOST_CR4                        = 0x00006c04,
@@ -355,6 +360,9 @@ enum vmcs_field {
 	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
 	HOST_RSP                        = 0x00006c14,
 	HOST_RIP                        = 0x00006c16,
+	HOST_S_CET                      = 0x00006c18,
+	HOST_SSP                        = 0x00006c1a,
+	HOST_INTR_SSP_TABLE             = 0x00006c1c
 };
 
 /*
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 12/21] KVM:x86: Add fault checks for guest CR4.CET setting
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (10 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 11/21] KVM:VMX: Introduce CET VMCS fields and control bits Yang Weijiang
@ 2023-05-11  4:08 ` Yang Weijiang
  2023-06-06 11:03   ` Chao Gao
  2023-05-11  4:08 ` [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs Yang Weijiang
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Sean Christopherson

Check potential faults for CR4.CET setting per the Intel SDM.
CR4.CET is the master control bit for the CET features (SHSTK and IBT).
In addition to the basic support checks, CET can be enabled if and only
if CR0.WP==1, i.e. setting CR4.CET=1 faults if CR0.WP==0 and clearing
CR0.WP faults if CR4.CET==1.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a768cbf3fbb7..b6eec9143129 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -995,6 +995,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
 		return 1;
 
+	if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+		return 1;
+
 	static_call(kvm_x86_set_cr0)(vcpu, cr0);
 
 	kvm_post_set_cr0(vcpu, old_cr0, cr0);
@@ -1210,6 +1213,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	}
 
+	if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+		return 1;
+
 	static_call(kvm_x86_set_cr4)(vcpu, cr4);
 
 	kvm_post_set_cr4(vcpu, old_cr4, cr4);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (11 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 12/21] KVM:x86: Add fault checks for guest CR4.CET setting Yang Weijiang
@ 2023-05-11  4:08 ` Yang Weijiang
  2023-05-23  8:21   ` Binbin Wu
  2023-06-23 23:53   ` Sean Christopherson
  2023-05-11  4:08 ` [PATCH v3 14/21] KVM:VMX: Add a synthetic MSR to allow userspace to access GUEST_SSP Yang Weijiang
                   ` (8 subsequent siblings)
  21 siblings, 2 replies; 99+ messages in thread
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Sean Christopherson

Add support for emulating read and write accesses to CET MSRs.
CET MSRs are universally "special" as they are either context switched
via dedicated VMCS fields or via XSAVES, i.e. no additional in-memory
tracking is needed, but emulated reads/writes are more expensive.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kernel/fpu/core.c |  1 +
 arch/x86/kvm/vmx/vmx.c     | 18 ++++++++++++++++++
 arch/x86/kvm/x86.c         | 20 ++++++++++++++++++++
 arch/x86/kvm/x86.h         | 31 +++++++++++++++++++++++++++++++
 4 files changed, 70 insertions(+)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index f851558b673f..b4e28487882c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -770,6 +770,7 @@ void fpregs_lock_and_load(void)
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		fpregs_restore_userregs();
 }
+EXPORT_SYMBOL_GPL(fpregs_lock_and_load);
 
 #ifdef CONFIG_X86_DEBUG_FPU
 /*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c872a5aafa50..0ccaa467d7d3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2093,6 +2093,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL3_SSP:
+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+			return 1;
+		kvm_get_xsave_msr(msr_info);
+		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
 		break;
@@ -2405,6 +2411,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL3_SSP:
+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+			return 1;
+		if (is_noncanonical_address(data, vcpu))
+			return 1;
+		if (msr_index == MSR_IA32_U_CET && (data & GENMASK(9, 6)))
+			return 1;
+		if (msr_index == MSR_IA32_PL3_SSP && (data & GENMASK(2, 0)))
+			return 1;
+		kvm_set_xsave_msr(msr_info);
+		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		if (data && !vcpu_to_pmu(vcpu)->version)
 			return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b6eec9143129..2e3a39c9297c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13630,6 +13630,26 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 }
 EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
 
+bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
+{
+	if (!kvm_cet_user_supported())
+		return false;
+
+	if (msr->host_initiated)
+		return true;
+
+	if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
+	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
+		return false;
+
+	if (msr->index == MSR_IA32_PL3_SSP &&
+	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
+		return false;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(kvm_cet_is_msr_accessible);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2ba7c7fc4846..93afa7631735 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -2,6 +2,7 @@
 #ifndef ARCH_X86_KVM_X86_H
 #define ARCH_X86_KVM_X86_H
 
+#include <asm/fpu/api.h>
 #include <linux/kvm_host.h>
 #include <asm/fpu/xstate.h>
 #include <asm/mce.h>
@@ -370,6 +371,16 @@ static inline bool kvm_mpx_supported(void)
 		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
 }
 
+/*
+ * Guest CET user mode states depend on host XSAVES/XRSTORS to save/restore
+ * when the vCPU enters/exits user space. If the host doesn't support the
+ * CET user bit in the XSS MSR, treat KVM as not supporting CET user mode.
+ */
+static inline bool kvm_cet_user_supported(void)
+{
+	return !!(kvm_caps.supported_xss & XFEATURE_MASK_CET_USER);
+}
+
 extern unsigned int min_timer_period_us;
 
 extern bool enable_vmware_backdoor;
@@ -546,5 +557,25 @@ int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
 int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 			 unsigned int port, void *data,  unsigned int count,
 			 int in);
+bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr);
+
+/*
+ * We've already loaded guest MSRs in __msr_io() after checking the MSR index.
+ * In case the vCPU has been preempted, disable preemption, then check and
+ * reload the guest FPU states before reading/writing XSAVES-managed MSRs.
+ */
+static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
+{
+	fpregs_lock_and_load();
+	rdmsrl(msr_info->index, msr_info->data);
+	fpregs_unlock();
+}
+
+static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
+{
+	fpregs_lock_and_load();
+	wrmsrl(msr_info->index, msr_info->data);
+	fpregs_unlock();
+}
 
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 14/21] KVM:VMX: Add a synthetic MSR to allow userspace to access GUEST_SSP
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (12 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs Yang Weijiang
@ 2023-05-11  4:08 ` Yang Weijiang
  2023-05-23  8:57   ` Binbin Wu
  2023-05-11  4:08 ` [PATCH v3 15/21] KVM:x86: Report CET MSRs as to-be-saved if CET is supported Yang Weijiang
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Sean Christopherson

Introduce a host-only synthetic MSR, MSR_KVM_GUEST_SSP, so that the VMM
can read/write the guest's SSP, e.g. to migrate CET state.  Use a synthetic
MSR, e.g. as opposed to a VCPU_REG_, as GUEST_SSP is subject to the same
consistency checks as the PL*_SSP MSRs, i.e. can share code.
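
For illustration, a minimal userspace sketch (not part of this patch;
read_guest_ssp() and the vCPU fd plumbing are assumptions) of how a VMM
could read the guest SSP through the synthetic MSR:

	#include <linux/kvm.h>
	#include <string.h>
	#include <sys/ioctl.h>

	#define MSR_KVM_GUEST_SSP	0x4b564d09

	static int read_guest_ssp(int vcpu_fd, __u64 *ssp)
	{
		struct {
			struct kvm_msrs hdr;
			struct kvm_msr_entry entry;
		} msrs;

		memset(&msrs, 0, sizeof(msrs));
		msrs.hdr.nmsrs = 1;
		msrs.entry.index = MSR_KVM_GUEST_SSP;

		/* KVM_GET_MSRS returns the number of MSRs read. */
		if (ioctl(vcpu_fd, KVM_GET_MSRS, &msrs) != 1)
			return -1;

		*ssp = msrs.entry.data;
		return 0;
	}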

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/vmx/vmx.c               | 15 ++++++++++++---
 arch/x86/kvm/x86.c                   |  4 ++++
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..7af465e4e0bd 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -58,6 +58,7 @@
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
+#define MSR_KVM_GUEST_SSP	0x4b564d09
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ccaa467d7d3..72149156bbd3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2095,9 +2095,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_U_CET:
 	case MSR_IA32_PL3_SSP:
+	case MSR_KVM_GUEST_SSP:
 		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
 			return 1;
-		kvm_get_xsave_msr(msr_info);
+		if (msr_info->index == MSR_KVM_GUEST_SSP)
+			msr_info->data = vmcs_readl(GUEST_SSP);
+		else
+			kvm_get_xsave_msr(msr_info);
 		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
@@ -2413,15 +2417,20 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_U_CET:
 	case MSR_IA32_PL3_SSP:
+	case MSR_KVM_GUEST_SSP:
 		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
 			return 1;
 		if (is_noncanonical_address(data, vcpu))
 			return 1;
 		if (msr_index == MSR_IA32_U_CET && (data & GENMASK(9, 6)))
 			return 1;
-		if (msr_index == MSR_IA32_PL3_SSP && (data & GENMASK(2, 0)))
+		if ((msr_index == MSR_IA32_PL3_SSP ||
+		     msr_index == MSR_KVM_GUEST_SSP) && (data & GENMASK(2, 0)))
 			return 1;
-		kvm_set_xsave_msr(msr_info);
+		if (msr_index == MSR_KVM_GUEST_SSP)
+			vmcs_writel(GUEST_SSP, data);
+		else
+			kvm_set_xsave_msr(msr_info);
 		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		if (data && !vcpu_to_pmu(vcpu)->version)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2e3a39c9297c..baac6acebd40 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13642,6 +13642,10 @@ bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
 		return false;
 
+	/* The synthetic MSR is for userspace access only. */
+	if (msr->index == MSR_KVM_GUEST_SSP)
+		return false;
+
 	if (msr->index == MSR_IA32_PL3_SSP &&
 	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
 		return false;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 15/21] KVM:x86: Report CET MSRs as to-be-saved if CET is supported
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (13 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 14/21] KVM:VMX: Add a synthetic MSR to allow userspace to access GUEST_SSP Yang Weijiang
@ 2023-05-11  4:08 ` Yang Weijiang
  2023-05-11  4:08 ` [PATCH v3 16/21] KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area Yang Weijiang
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Sean Christopherson

Report CET user mode MSRs, including the synthetic GUEST_SSP MSR,
as to-be-saved MSRs.
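
As context (a sketch under the standard KVM ABI; the 256-entry buffer size
is an arbitrary assumption), userspace discovers the to-be-saved list, now
including the CET MSRs, via KVM_GET_MSR_INDEX_LIST on the /dev/kvm fd:

	#include <linux/kvm.h>
	#include <stdio.h>
	#include <sys/ioctl.h>

	static void dump_save_list(int kvm_fd)
	{
		struct {
			struct kvm_msr_list hdr;
			__u32 indices[256];
		} list = { .hdr.nmsrs = 256 };
		__u32 i;

		if (ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, &list.hdr))
			return;

		for (i = 0; i < list.hdr.nmsrs; i++)
			printf("to-be-saved MSR: 0x%x\n", list.hdr.indices[i]);
	}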

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index baac6acebd40..50026557fb2a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1470,6 +1470,7 @@ static const u32 msrs_to_save_base[] = {
 
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
 	MSR_IA32_XSS,
+	MSR_IA32_U_CET, MSR_IA32_PL3_SSP, MSR_KVM_GUEST_SSP,
 };
 
 static const u32 msrs_to_save_pmu[] = {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 16/21] KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (14 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 15/21] KVM:x86: Report CET MSRs as to-be-saved if CET is supported Yang Weijiang
@ 2023-05-11  4:08 ` Yang Weijiang
  2023-06-23 22:30   ` Sean Christopherson
  2023-05-11  4:08 ` [PATCH v3 17/21] KVM:VMX: Pass through user CET MSRs to the guest Yang Weijiang
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang, john.allen

Save GUEST_SSP to the SMM state save area when the guest enters SMM
due to an SMI, and restore it to the VMCS field when the guest exits SMM.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/smm.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index b42111a24cc2..c54d3eb2b7e4 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -275,6 +275,16 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
 	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
 
 	smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
+
+	if (kvm_cet_user_supported()) {
+		struct msr_data msr;
+
+		msr.index = MSR_KVM_GUEST_SSP;
+		msr.host_initiated = true;
+		/* GUEST_SSP is stored in the VMCS at VM-exit. */
+		static_call(kvm_x86_get_msr)(vcpu, &msr);
+		smram->ssp = msr.data;
+	}
 }
 #endif
 
@@ -565,6 +575,16 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
 	static_call(kvm_x86_set_interrupt_shadow)(vcpu, 0);
 	ctxt->interruptibility = (u8)smstate->int_shadow;
 
+	if (kvm_cet_user_supported()) {
+		struct msr_data msr;
+
+		msr.index = MSR_KVM_GUEST_SSP;
+		msr.host_initiated = true;
+		msr.data = smstate->ssp;
+		/* Mimic host_initiated access to bypass ssp access check. */
+		static_call(kvm_x86_set_msr)(vcpu, &msr);
+	}
+
 	return X86EMUL_CONTINUE;
 }
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 17/21] KVM:VMX: Pass through user CET MSRs to the guest
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (15 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 16/21] KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area Yang Weijiang
@ 2023-05-11  4:08 ` Yang Weijiang
  2023-05-11  4:08 ` [PATCH v3 18/21] KVM:x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Zhang Yi Z, Sean Christopherson

Pass through CET user mode MSRs when the associated CET component
is enabled to improve guest performance. All CET MSRs are context
switched, either via dedicated VMCS fields or XSAVES.

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 72149156bbd3..c254c23f89f3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -709,6 +709,9 @@ static bool is_valid_passthrough_msr(u32 msr)
 	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
 		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
 		return true;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_PL3_SSP:
+		return true;
 	}
 
 	r = possible_passthrough_msr_slot(msr) != -ENOENT;
@@ -7702,6 +7705,23 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
+static bool is_cet_state_supported(struct kvm_vcpu *vcpu, u32 xss_state)
+{
+	return (kvm_caps.supported_xss & xss_state) &&
+	       (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
+		guest_cpuid_has(vcpu, X86_FEATURE_IBT));
+}
+
+static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
+{
+	bool incpt = !is_cet_state_supported(vcpu, XFEATURE_MASK_CET_USER);
+
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, incpt);
+
+	incpt |= !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, incpt);
+}
+
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7769,6 +7789,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
+
+	if (kvm_cet_user_supported())
+		vmx_update_intercept_for_cet_msr(vcpu);
 }
 
 static u64 vmx_get_perf_capabilities(void)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 18/21] KVM:x86: Enable CET virtualization for VMX and advertise to userspace
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (16 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 17/21] KVM:VMX: Pass through user CET MSRs to the guest Yang Weijiang
@ 2023-05-11  4:08 ` Yang Weijiang
  2023-05-24  6:35   ` Chenyi Qiang
  2023-05-11  4:08 ` [PATCH v3 19/21] KVM:nVMX: Enable user CET support for nested VMX Yang Weijiang
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Sean Christopherson

Set the feature bits so that CET capabilities can be seen in the guest via
CPUID enumeration. Add CR4.CET bit support in order to allow the guest to
set the CET master control bit (CR4.CET).

Disable KVM CET feature if unrestricted_guest is unsupported/disabled as
KVM does not support emulating CET.

Don't expose the CET feature if the dependent CET bits are cleared in
host XSS, or if XSAVES isn't supported.
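
From the guest's point of view (an illustrative sketch, not part of this
patch), the bits enumerated here can be checked with CPUID leaf 7: SHSTK
is CPUID.(EAX=7,ECX=0):ECX[7] and IBT is CPUID.(EAX=7,ECX=0):EDX[20].

	#include <cpuid.h>
	#include <stdbool.h>

	static bool guest_has_shstk(void)
	{
		unsigned int eax, ebx, ecx, edx;

		if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
			return false;
		return ecx & (1u << 7);		/* CPUID.7.0:ECX.SHSTK */
	}

	static bool guest_has_ibt(void)
	{
		unsigned int eax, ebx, ecx, edx;

		if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
			return false;
		return edx & (1u << 20);	/* CPUID.7.0:EDX.IBT */
	}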

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/cpuid.c            | 12 ++++++++++--
 arch/x86/kvm/vmx/capabilities.h |  4 ++++
 arch/x86/kvm/vmx/vmx.c          | 19 +++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h          |  6 ++++--
 arch/x86/kvm/x86.c              | 21 ++++++++++++++++++++-
 arch/x86/kvm/x86.h              |  3 +++
 7 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2865c3cb3501..58e20d5895d1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -125,7 +125,8 @@
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+			  | X86_CR4_CET))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index cbb1b8a65502..fefe8833f892 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -632,7 +632,7 @@ void kvm_set_cpu_caps(void)
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
 		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
-		F(SGX_LC) | F(BUS_LOCK_DETECT)
+		F(SGX_LC) | F(BUS_LOCK_DETECT) | F(SHSTK)
 	);
 	/* Set LA57 based on hardware capability. */
 	if (cpuid_ecx(7) & F(LA57))
@@ -650,7 +650,8 @@ void kvm_set_cpu_caps(void)
 		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
 		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
 		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
-		F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
+		F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D) |
+		F(IBT)
 	);
 
 	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
@@ -663,6 +664,13 @@ void kvm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP);
 	if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
+	/*
+	 * The feature bit in boot_cpu_data.x86_capability could have been
+	 * cleared by the ibt=off command line option; add it back if the
+	 * CPU supports IBT.
+	 */
+	if (cpuid_edx(7) & F(IBT))
+		kvm_cpu_cap_set(X86_FEATURE_IBT);
 
 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
 		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 45162c1bcd8f..85cffeae7f10 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -106,6 +106,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
 }
 
+static inline bool cpu_has_load_cet_ctrl(void)
+{
+	return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
 static inline bool cpu_has_vmx_mpx(void)
 {
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c254c23f89f3..cb5908433c09 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2607,6 +2607,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		{ VM_ENTRY_LOAD_IA32_EFER,		VM_EXIT_LOAD_IA32_EFER },
 		{ VM_ENTRY_LOAD_BNDCFGS,		VM_EXIT_CLEAR_BNDCFGS },
 		{ VM_ENTRY_LOAD_IA32_RTIT_CTL,		VM_EXIT_CLEAR_IA32_RTIT_CTL },
+		{ VM_ENTRY_LOAD_CET_STATE,		VM_EXIT_LOAD_CET_STATE },
 	};
 
 	memset(vmcs_conf, 0, sizeof(*vmcs_conf));
@@ -6316,6 +6317,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
 		vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);
 
+	if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE) {
+		pr_err("S_CET = 0x%016lx\n", vmcs_readl(GUEST_S_CET));
+		pr_err("SSP = 0x%016lx\n", vmcs_readl(GUEST_SSP));
+		pr_err("INTR SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(GUEST_INTR_SSP_TABLE));
+	}
 	pr_err("*** Host State ***\n");
 	pr_err("RIP = 0x%016lx  RSP = 0x%016lx\n",
 	       vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
@@ -6393,6 +6400,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID)
 		pr_err("Virtual processor ID = 0x%04x\n",
 		       vmcs_read16(VIRTUAL_PROCESSOR_ID));
+	if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE) {
+		pr_err("S_CET = 0x%016lx\n", vmcs_readl(HOST_S_CET));
+		pr_err("SSP = 0x%016lx\n", vmcs_readl(HOST_SSP));
+		pr_err("INTR SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(HOST_INTR_SSP_TABLE));
+	}
 }
 
 /*
@@ -7867,6 +7880,12 @@ static __init void vmx_set_cpu_caps(void)
 
 	if (cpu_has_vmx_waitpkg())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;
+	}
 }
 
 static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9e66531861cf..5e3ba69006f9 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -493,7 +493,8 @@ static inline u8 vmx_get_rvi(void)
 	 VM_ENTRY_LOAD_IA32_EFER |					\
 	 VM_ENTRY_LOAD_BNDCFGS |					\
 	 VM_ENTRY_PT_CONCEAL_PIP |					\
-	 VM_ENTRY_LOAD_IA32_RTIT_CTL)
+	 VM_ENTRY_LOAD_IA32_RTIT_CTL |					\
+	 VM_ENTRY_LOAD_CET_STATE)
 
 #define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS				\
 	(VM_EXIT_SAVE_DEBUG_CONTROLS |					\
@@ -515,7 +516,8 @@ static inline u8 vmx_get_rvi(void)
 	       VM_EXIT_LOAD_IA32_EFER |					\
 	       VM_EXIT_CLEAR_BNDCFGS |					\
 	       VM_EXIT_PT_CONCEAL_PIP |					\
-	       VM_EXIT_CLEAR_IA32_RTIT_CTL)
+	       VM_EXIT_CLEAR_IA32_RTIT_CTL |				\
+	       VM_EXIT_LOAD_CET_STATE)
 
 #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL			\
 	(PIN_BASED_EXT_INTR_MASK |					\
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 50026557fb2a..858cb68e781a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -226,7 +226,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
-#define KVM_SUPPORTED_XSS     0
+#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER)
 
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
@@ -9525,6 +9525,25 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	kvm_ops_update(ops);
 
+	/*
+	 * Check that the CET user bit is still set in kvm_caps.supported_xss;
+	 * if not, clear the cap bits as the user mode parts depend on
+	 * XSAVES support.
+	 */
+	if (!kvm_cet_user_supported()) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	}
+
+	/*
+	 * If neither SHSTK nor IBT is available in KVM, clear the CET user
+	 * bit in kvm_caps.supported_xss so that kvm_cet_user_supported()
+	 * returns false when called.
+	 */
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;
+
 	for_each_online_cpu(cpu) {
 		smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &r, 1);
 		if (r < 0)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 93afa7631735..09a8c8316914 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -547,6 +547,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 		__reserved_bits |= X86_CR4_VMXE;        \
 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
 		__reserved_bits |= X86_CR4_PCIDE;       \
+	if (!__cpu_has(__c, X86_FEATURE_SHSTK) &&       \
+	    !__cpu_has(__c, X86_FEATURE_IBT))           \
+		__reserved_bits |= X86_CR4_CET;         \
 	__reserved_bits;                                \
 })
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 19/21] KVM:nVMX: Enable user CET support for nested VMX
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (17 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 18/21] KVM:x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
@ 2023-05-11  4:08 ` Yang Weijiang
  2023-05-11  4:08 ` [PATCH v3 20/21] KVM:x86: Enable kernel IBT support for guest Yang Weijiang
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang,
	john.allen, Sean Christopherson

Add CET fields to vmcs12 and pass through CET user mode MSRs to
L2 if L1 supports them. Enable the nested VMCS control bits and
CR4.CET bit support.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 12 ++++++++++--
 arch/x86/kvm/vmx/vmcs12.c |  6 ++++++
 arch/x86/kvm/vmx/vmcs12.h | 14 +++++++++++++-
 arch/x86/kvm/vmx/vmx.c    |  2 ++
 4 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7bc62cd72748..522ac27d2534 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -660,6 +660,13 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
 
+	/* Pass CET MSRs through to the nested VM if both L0 and L1 pass them through. */
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_U_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
 	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
 
 	vmx->nested.force_msr_bitmap_recalc = false;
@@ -6785,7 +6792,7 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
 		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
-		VM_EXIT_CLEAR_BNDCFGS;
+		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
 	msrs->exit_ctls_high |=
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -6807,7 +6814,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
 #ifdef CONFIG_X86_64
 		VM_ENTRY_IA32E_MODE |
 #endif
-		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+		VM_ENTRY_LOAD_CET_STATE;
 	msrs->entry_ctls_high |=
 		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
 		 VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 106a72c923ca..4233b5ca9461 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -139,6 +139,9 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
 	FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
 	FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+	FIELD(GUEST_S_CET, guest_s_cet),
+	FIELD(GUEST_SSP, guest_ssp),
+	FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
 	FIELD(HOST_CR0, host_cr0),
 	FIELD(HOST_CR3, host_cr3),
 	FIELD(HOST_CR4, host_cr4),
@@ -151,5 +154,8 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
 	FIELD(HOST_RSP, host_rsp),
 	FIELD(HOST_RIP, host_rip),
+	FIELD(HOST_S_CET, host_s_cet),
+	FIELD(HOST_SSP, host_ssp),
+	FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
 };
 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 01936013428b..3884489e7f7e 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,13 @@ struct __packed vmcs12 {
 	natural_width host_ia32_sysenter_eip;
 	natural_width host_rsp;
 	natural_width host_rip;
-	natural_width paddingl[8]; /* room for future expansion */
+	natural_width host_s_cet;
+	natural_width host_ssp;
+	natural_width host_ssp_tbl;
+	natural_width guest_s_cet;
+	natural_width guest_ssp;
+	natural_width guest_ssp_tbl;
+	natural_width paddingl[2]; /* room for future expansion */
 	u32 pin_based_vm_exec_control;
 	u32 cpu_based_vm_exec_control;
 	u32 exception_bitmap;
@@ -292,6 +298,12 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
 	CHECK_OFFSET(host_rsp, 664);
 	CHECK_OFFSET(host_rip, 672);
+	CHECK_OFFSET(host_s_cet, 680);
+	CHECK_OFFSET(host_ssp, 688);
+	CHECK_OFFSET(host_ssp_tbl, 696);
+	CHECK_OFFSET(guest_s_cet, 704);
+	CHECK_OFFSET(guest_ssp, 712);
+	CHECK_OFFSET(guest_ssp_tbl, 720);
 	CHECK_OFFSET(pin_based_vm_exec_control, 744);
 	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
 	CHECK_OFFSET(exception_bitmap, 752);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cb5908433c09..a2494156902d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7645,6 +7645,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 	cr4_fixed1_update(X86_CR4_PKE,        ecx, feature_bit(PKU));
 	cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
 	cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));
+	cr4_fixed1_update(X86_CR4_CET,	      ecx, feature_bit(SHSTK));
+	cr4_fixed1_update(X86_CR4_CET,	      edx, feature_bit(IBT));
 
 #undef cr4_fixed1_update
 }
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 20/21] KVM:x86: Enable kernel IBT support for guest
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (18 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 19/21] KVM:nVMX: Enable user CET support for nested VMX Yang Weijiang
@ 2023-05-11  4:08 ` Yang Weijiang
  2023-06-24  0:03   ` Sean Christopherson
  2023-05-11  4:08 ` [PATCH v3 21/21] KVM:x86: Support CET supervisor shadow stack MSR access Yang Weijiang
  2023-06-15 23:30 ` [PATCH v3 00/21] Enable CET Virtualization Sean Christopherson
  21 siblings, 1 reply; 99+ messages in thread
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang, john.allen

Enable MSR_IA32_S_CET access for guest kernel IBT.

The mainline Linux kernel now supports supervisor IBT for kernel code.
To make supervisor IBT work in a guest (or nested guest), pass through
MSR_IA32_S_CET to the guest (or nested guest) if the host kernel and
KVM have IBT enabled.

Note, supervisor IBT works independently of host XSAVES support because
guest MSR_IA32_S_CET is stored to/loaded from the VMCS GUEST_S_CET field.
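
For illustration (a guest-kernel-side sketch, not part of this patch;
CET_ENDBR_EN is assumed to be the bit 2 define from asm/msr-index.h),
with MSR_IA32_S_CET passed through, the guest turns on kernel IBT by
setting the ENDBR_EN bit, and the value is context switched via the
VMCS GUEST_S_CET field:

	u64 msr;

	rdmsrl(MSR_IA32_S_CET, msr);
	msr |= CET_ENDBR_EN;	/* enable supervisor-mode ENDBRANCH tracking */
	wrmsrl(MSR_IA32_S_CET, msr);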

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/nested.c |  3 +++
 arch/x86/kvm/vmx/vmx.c    | 39 ++++++++++++++++++++++++++++++++++-----
 arch/x86/kvm/x86.c        |  7 ++++++-
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 522ac27d2534..bf690827bfee 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -664,6 +664,9 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_U_CET, MSR_TYPE_RW);
 
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_S_CET, MSR_TYPE_RW);
+
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a2494156902d..1d0151f9e575 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -711,6 +711,7 @@ static bool is_valid_passthrough_msr(u32 msr)
 		return true;
 	case MSR_IA32_U_CET:
 	case MSR_IA32_PL3_SSP:
+	case MSR_IA32_S_CET:
 		return true;
 	}
 
@@ -2097,14 +2098,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
 		break;
 	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
 	case MSR_IA32_PL3_SSP:
 	case MSR_KVM_GUEST_SSP:
 		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
 			return 1;
-		if (msr_info->index == MSR_KVM_GUEST_SSP)
+		if (msr_info->index == MSR_KVM_GUEST_SSP) {
 			msr_info->data = vmcs_readl(GUEST_SSP);
-		else
+		} else if (msr_info->index == MSR_IA32_S_CET) {
+			msr_info->data = vmcs_readl(GUEST_S_CET);
+		} else {
 			kvm_get_xsave_msr(msr_info);
+		}
 		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
@@ -2419,6 +2424,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
 	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
 	case MSR_IA32_PL3_SSP:
 	case MSR_KVM_GUEST_SSP:
 		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
@@ -2430,10 +2436,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if ((msr_index == MSR_IA32_PL3_SSP ||
 		     msr_index == MSR_KVM_GUEST_SSP) && (data & GENMASK(2, 0)))
 			return 1;
-		if (msr_index == MSR_KVM_GUEST_SSP)
+		if (msr_index == MSR_KVM_GUEST_SSP) {
 			vmcs_writel(GUEST_SSP, data);
-		else
+		} else if (msr_index == MSR_IA32_S_CET) {
+			vmcs_writel(GUEST_S_CET, data);
+		} else {
 			kvm_set_xsave_msr(msr_info);
+		}
 		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		if (data && !vcpu_to_pmu(vcpu)->version)
@@ -7322,6 +7331,19 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	kvm_wait_lapic_expire(vcpu);
 
+	/*
+	 * Save host MSR_IA32_S_CET so that it can be reloaded at VM-exit.
+	 * No need to save the other two VMCS fields as supervisor SHSTK
+	 * is not enabled on Intel platforms for now.
+	 */
+	if (IS_ENABLED(CONFIG_X86_KERNEL_IBT) &&
+	    (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_CET_STATE)) {
+		u64 msr;
+
+		rdmsrl(MSR_IA32_S_CET, msr);
+		vmcs_writel(HOST_S_CET, msr);
+	}
+
 	/* The actual VMENTER/EXIT is in the .noinstr.text section. */
 	vmx_vcpu_enter_exit(vcpu, __vmx_vcpu_run_flags(vmx));
 
@@ -7735,6 +7757,13 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
 
 	incpt |= !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, incpt);
+
+	/*
+	 * If IBT is available to the guest, pass through the S_CET MSR too
+	 * since kernel IBT is already in the mainline kernel tree.
+	 */
+	incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, incpt);
 }
 
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
@@ -7805,7 +7834,7 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 
-	if (kvm_cet_user_supported())
+	if (kvm_cet_user_supported() || kvm_cpu_cap_has(X86_FEATURE_IBT))
 		vmx_update_intercept_for_cet_msr(vcpu);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 858cb68e781a..b450361b94ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1471,6 +1471,7 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
 	MSR_IA32_XSS,
 	MSR_IA32_U_CET, MSR_IA32_PL3_SSP, MSR_KVM_GUEST_SSP,
+	MSR_IA32_S_CET,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -13652,7 +13653,8 @@ EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
 
 bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
 {
-	if (!kvm_cet_user_supported())
+	if (!kvm_cet_user_supported() &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
 		return false;
 
 	if (msr->host_initiated)
@@ -13666,6 +13668,9 @@ bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	if (msr->index == MSR_KVM_GUEST_SSP)
 		return false;
 
+	if (msr->index == MSR_IA32_S_CET)
+		return guest_cpuid_has(vcpu, X86_FEATURE_IBT);
+
 	if (msr->index == MSR_IA32_PL3_SSP &&
 	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
 		return false;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3 21/21] KVM:x86: Support CET supervisor shadow stack MSR access
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (19 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 20/21] KVM:x86: Enable kernel IBT support for guest Yang Weijiang
@ 2023-05-11  4:08 ` Yang Weijiang
  2023-06-15 23:30 ` [PATCH v3 00/21] Enable CET Virtualization Sean Christopherson
  21 siblings, 0 replies; 99+ messages in thread
From: Yang Weijiang @ 2023-05-11  4:08 UTC (permalink / raw)
  To: seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, weijiang.yang, john.allen

Add MSR access interfaces for supervisor shadow stack, i.e.,
MSR_IA32_PL{0,1,2}_SSP and MSR_IA32_INT_SSP_TAB, and pass them
through to {L1,L2} guests when {L0,L1} KVM supports supervisor
shadow stack.

Note, currently supervisor shadow stack is not supported on Intel
platforms, i.e., VMX always clears CPUID(EAX=07H,ECX=1):EDX[bit 18].

The main purpose of this patch is to make it easier for AMD folks to
enable supervisor shadow stack on their platforms.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/cpuid.h      |  6 ++++++
 arch/x86/kvm/vmx/nested.c | 12 ++++++++++++
 arch/x86/kvm/vmx/vmx.c    | 25 ++++++++++++++++++++++++-
 arch/x86/kvm/x86.c        | 21 ++++++++++++++++++---
 4 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index b1658c0de847..019a16b25b88 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -232,4 +232,10 @@ static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
 	return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
 }
 
+static __always_inline bool kvm_cet_kernel_shstk_supported(void)
+{
+	return !IS_ENABLED(CONFIG_KVM_INTEL) &&
+	       kvm_cpu_cap_has(X86_FEATURE_SHSTK);
+}
+
 #endif
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bf690827bfee..aaaae92dc9f6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -670,6 +670,18 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
 
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW);
+
 	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
 
 	vmx->nested.force_msr_bitmap_recalc = false;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1d0151f9e575..d70f2e94b187 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -713,6 +713,9 @@ static bool is_valid_passthrough_msr(u32 msr)
 	case MSR_IA32_PL3_SSP:
 	case MSR_IA32_S_CET:
 		return true;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
+	case MSR_IA32_INT_SSP_TAB:
+		return true;
 	}
 
 	r = possible_passthrough_msr_slot(msr) != -ENOENT;
@@ -2101,12 +2104,16 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_S_CET:
 	case MSR_IA32_PL3_SSP:
 	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
+	case MSR_IA32_INT_SSP_TAB:
 		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
 			return 1;
 		if (msr_info->index == MSR_KVM_GUEST_SSP) {
 			msr_info->data = vmcs_readl(GUEST_SSP);
 		} else if (msr_info->index == MSR_IA32_S_CET) {
 			msr_info->data = vmcs_readl(GUEST_S_CET);
+		} else if (msr_info->index == MSR_IA32_INT_SSP_TAB) {
+			msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
 		} else {
 			kvm_get_xsave_msr(msr_info);
 		}
@@ -2427,6 +2434,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_S_CET:
 	case MSR_IA32_PL3_SSP:
 	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
+	case MSR_IA32_INT_SSP_TAB:
 		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
 			return 1;
 		if (is_noncanonical_address(data, vcpu))
@@ -2440,6 +2449,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			vmcs_writel(GUEST_SSP, data);
 		} else if (msr_index == MSR_IA32_S_CET) {
 			vmcs_writel(GUEST_S_CET, data);
+		} else if (msr_index == MSR_IA32_INT_SSP_TAB) {
+			vmcs_writel(GUEST_INTR_SSP_TABLE, data);
 		} else {
 			kvm_set_xsave_msr(msr_info);
 		}
@@ -7764,6 +7775,17 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
 	 */
 	incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, incpt);
+
+	/*
+	 * Supervisor shadow stack is not supported in VMX for now; intercept
+	 * all related MSRs.
+	 */
+	incpt = !kvm_cet_kernel_shstk_supported();
+
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW, incpt);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, incpt);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, incpt);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, incpt);
 }
 
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
@@ -7834,7 +7856,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 
-	if (kvm_cet_user_supported() || kvm_cpu_cap_has(X86_FEATURE_IBT))
+	if (kvm_cet_user_supported() || kvm_cpu_cap_has(X86_FEATURE_IBT) ||
+	    kvm_cpu_cap_has(X86_FEATURE_SHSTK))
 		vmx_update_intercept_for_cet_msr(vcpu);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b450361b94ef..a9ab01293420 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1472,6 +1472,8 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_XSS,
 	MSR_IA32_U_CET, MSR_IA32_PL3_SSP, MSR_KVM_GUEST_SSP,
 	MSR_IA32_S_CET,
+	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+	MSR_IA32_INT_SSP_TAB,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -13653,8 +13655,11 @@ EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
 
 bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
 {
+	u64 mask;
+
 	if (!kvm_cet_user_supported() &&
-	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+	    !(kvm_cpu_cap_has(X86_FEATURE_IBT) ||
+	      kvm_cpu_cap_has(X86_FEATURE_SHSTK)))
 		return false;
 
 	if (msr->host_initiated)
@@ -13668,14 +13673,24 @@ bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	if (msr->index == MSR_KVM_GUEST_SSP)
 		return false;
 
+	if (msr->index == MSR_IA32_U_CET)
+		return true;
+
 	if (msr->index == MSR_IA32_S_CET)
-		return guest_cpuid_has(vcpu, X86_FEATURE_IBT);
+		return guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
+		       kvm_cet_kernel_shstk_supported();
+
+	if (msr->index == MSR_IA32_INT_SSP_TAB)
+		return guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
+		       kvm_cet_kernel_shstk_supported();
 
 	if (msr->index == MSR_IA32_PL3_SSP &&
 	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
 		return false;
 
-	return true;
+	mask = (msr->index == MSR_IA32_PL3_SSP) ? XFEATURE_MASK_CET_USER :
+						  XFEATURE_MASK_CET_KERNEL;
+	return !!(kvm_caps.supported_xss & mask);
 }
 EXPORT_SYMBOL_GPL(kvm_cet_is_msr_accessible);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-05-11  4:08 ` [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs Yang Weijiang
@ 2023-05-23  8:21   ` Binbin Wu
  2023-05-24  2:49     ` Yang, Weijiang
  2023-06-23 23:53   ` Sean Christopherson
  1 sibling, 1 reply; 99+ messages in thread
From: Binbin Wu @ 2023-05-23  8:21 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt,
	rick.p.edgecombe, john.allen, Sean Christopherson



On 5/11/2023 12:08 PM, Yang Weijiang wrote:
> Add support for emulating read and write accesses to CET MSRs.
> CET MSRs are universally "special" as they are either context switched
> via dedicated VMCS fields or via XSAVES, i.e. no additional in-memory
> tracking is needed, but emulated reads/writes are more expensive.
>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>   arch/x86/kernel/fpu/core.c |  1 +
>   arch/x86/kvm/vmx/vmx.c     | 18 ++++++++++++++++++
>   arch/x86/kvm/x86.c         | 20 ++++++++++++++++++++
>   arch/x86/kvm/x86.h         | 31 +++++++++++++++++++++++++++++++
>   4 files changed, 70 insertions(+)
>
...
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b6eec9143129..2e3a39c9297c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13630,6 +13630,26 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>   }
>   EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
>   
> +bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
> +{
> +	if (!kvm_cet_user_supported())
> +		return false;
> +
> +	if (msr->host_initiated)
> +		return true;
> +
> +	if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> +	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> +		return false;
> +
> +	if (msr->index == MSR_IA32_PL3_SSP &&
> +	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> +		return false;
It may be better to merge the two if statements into one to avoid 
calling guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) twice.

e.g,

     if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
         (!guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
          msr->index == MSR_IA32_PL3_SSP))
         return false;


> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(kvm_cet_is_msr_accessible);
> +
>   EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
>   EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
>   EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
>
...

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 14/21] KVM:VMX: Add a synthetic MSR to allow userspace to access GUEST_SSP
  2023-05-11  4:08 ` [PATCH v3 14/21] KVM:VMX: Add a synthetic MSR to allow userspace to access GUEST_SSP Yang Weijiang
@ 2023-05-23  8:57   ` Binbin Wu
  2023-05-24  2:55     ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Binbin Wu @ 2023-05-23  8:57 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt,
	rick.p.edgecombe, john.allen, Sean Christopherson



On 5/11/2023 12:08 PM, Yang Weijiang wrote:
> Introduce a host-only synthetic MSR, MSR_KVM_GUEST_SSP, so that the VMM
> can read/write the guest's SSP, e.g. to migrate CET state.  Use a synthetic
> MSR, e.g. as opposed to a VCPU_REG_, as GUEST_SSP is subject to the same
> consistency checks as the PL*_SSP MSRs, i.e. can share code.
>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>   arch/x86/include/uapi/asm/kvm_para.h |  1 +
>   arch/x86/kvm/vmx/vmx.c               | 15 ++++++++++++---
>   arch/x86/kvm/x86.c                   |  4 ++++
>   3 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 6e64b27b2c1e..7af465e4e0bd 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -58,6 +58,7 @@
>   #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
>   #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
>   #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
> +#define MSR_KVM_GUEST_SSP	0x4b564d09
>   
>   struct kvm_steal_time {
>   	__u64 steal;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 0ccaa467d7d3..72149156bbd3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2095,9 +2095,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		break;
>   	case MSR_IA32_U_CET:
>   	case MSR_IA32_PL3_SSP:
> +	case MSR_KVM_GUEST_SSP:
>   		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>   			return 1;
> -		kvm_get_xsave_msr(msr_info);
> +		if (msr_info->index == MSR_KVM_GUEST_SSP)
> +			msr_info->data = vmcs_readl(GUEST_SSP);
According to the change to kvm_cet_is_msr_accessible() below,
kvm_cet_is_msr_accessible() will return false for MSR_KVM_GUEST_SSP,
so isn't this code unreachable?

> +		else
> +			kvm_get_xsave_msr(msr_info);
>   		break;
>   	case MSR_IA32_DEBUGCTLMSR:
>   		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> @@ -2413,15 +2417,20 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		break;
>   	case MSR_IA32_U_CET:
>   	case MSR_IA32_PL3_SSP:
> +	case MSR_KVM_GUEST_SSP:
>   		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>   			return 1;
>   		if (is_noncanonical_address(data, vcpu))
>   			return 1;
>   		if (msr_index == MSR_IA32_U_CET && (data & GENMASK(9, 6)))
>   			return 1;
> -		if (msr_index == MSR_IA32_PL3_SSP && (data & GENMASK(2, 0)))
> +		if ((msr_index == MSR_IA32_PL3_SSP ||
> +		     msr_index == MSR_KVM_GUEST_SSP) && (data & GENMASK(2, 0)))
>   			return 1;
> -		kvm_set_xsave_msr(msr_info);
> +		if (msr_index == MSR_KVM_GUEST_SSP)
> +			vmcs_writel(GUEST_SSP, data);
> +		else
> +			kvm_set_xsave_msr(msr_info);
>   		break;
>   	case MSR_IA32_PERF_CAPABILITIES:
>   		if (data && !vcpu_to_pmu(vcpu)->version)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2e3a39c9297c..baac6acebd40 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13642,6 +13642,10 @@ bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
>   	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
>   		return false;
>   
> +	/* The synthetic MSR is for userspace access only. */
> +	if (msr->index == MSR_KVM_GUEST_SSP)
> +		return false;
> +
>   	if (msr->index == MSR_IA32_PL3_SSP &&
>   	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
>   		return false;


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-05-23  8:21   ` Binbin Wu
@ 2023-05-24  2:49     ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-05-24  2:49 UTC (permalink / raw)
  To: Binbin Wu
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt,
	rick.p.edgecombe, john.allen, Sean Christopherson


On 5/23/2023 4:21 PM, Binbin Wu wrote:
>
>
> On 5/11/2023 12:08 PM, Yang Weijiang wrote:
>> Add support for emulating read and write accesses to CET MSRs.
>> CET MSRs are universally "special" as they are either context switched
>> via dedicated VMCS fields or via XSAVES, i.e. no additional in-memory
>> tracking is needed, but emulated reads/writes are more expensive.
[...]
>> +
>> +    if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
>> +        !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
>> +        return false;
>> +
>> +    if (msr->index == MSR_IA32_PL3_SSP &&
>> +        !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
>> +        return false;
> It may be better to merge the two if statements into one to avoid 
> calling guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) twice.
>
Yeah, it sounds good to me, thanks!

> e.g,
>
>     if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
>         (!guest_cpuid_has(vcpu, X86_FEATURE_IBT) || msr->index == 
> MSR_IA32_PL3_SSP))
>         return false;
>
>
>> +
>> +    return true;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_cet_is_msr_accessible);
>> +
>>   EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
>>   EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
>>   EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
>>
> ...

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 14/21] KVM:VMX: Add a synthetic MSR to allow userspace to access GUEST_SSP
  2023-05-23  8:57   ` Binbin Wu
@ 2023-05-24  2:55     ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-05-24  2:55 UTC (permalink / raw)
  To: Binbin Wu
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt,
	rick.p.edgecombe, john.allen, Sean Christopherson


On 5/23/2023 4:57 PM, Binbin Wu wrote:
>
>
> On 5/11/2023 12:08 PM, Yang Weijiang wrote:
>> Introduce a host-only synthetic MSR, MSR_KVM_GUEST_SSP, so that the VMM
>> can read/write the guest's SSP, e.g. to migrate CET state.  Use a 
>> synthetic
>> MSR, e.g. as opposed to a VCPU_REG_, as GUEST_SSP is subject to the same
>> consistency checks as the PL*_SSP MSRs, i.e. can share code.
>>
>> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> ---
>>   arch/x86/include/uapi/asm/kvm_para.h |  1 +
>>   arch/x86/kvm/vmx/vmx.c               | 15 ++++++++++++---
>>   arch/x86/kvm/x86.c                   |  4 ++++
>>   3 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
>> b/arch/x86/include/uapi/asm/kvm_para.h
>> index 6e64b27b2c1e..7af465e4e0bd 100644
>> --- a/arch/x86/include/uapi/asm/kvm_para.h
>> +++ b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -58,6 +58,7 @@
>>   #define MSR_KVM_ASYNC_PF_INT    0x4b564d06
>>   #define MSR_KVM_ASYNC_PF_ACK    0x4b564d07
>>   #define MSR_KVM_MIGRATION_CONTROL    0x4b564d08
>> +#define MSR_KVM_GUEST_SSP    0x4b564d09
>>     struct kvm_steal_time {
>>       __u64 steal;
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 0ccaa467d7d3..72149156bbd3 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -2095,9 +2095,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, 
>> struct msr_data *msr_info)
>>           break;
>>       case MSR_IA32_U_CET:
>>       case MSR_IA32_PL3_SSP:
>> +    case MSR_KVM_GUEST_SSP:
>>           if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>>               return 1;
>> -        kvm_get_xsave_msr(msr_info);
>> +        if (msr_info->index == MSR_KVM_GUEST_SSP)
>> +            msr_info->data = vmcs_readl(GUEST_SSP);
> According to the change to kvm_cet_is_msr_accessible() below,
> kvm_cet_is_msr_accessible() will return false for MSR_KVM_GUEST_SSP,
> so isn't this code unreachable?

No, when the access is initiated from the host side,
kvm_cet_is_msr_accessible() returns true for MSR_KVM_GUEST_SSP.

So the code is reachable:

	if (msr->host_initiated)
		return true;
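
For readers following the thread: piecing together the hunks quoted from
patch 13 and this patch gives roughly the ordering below. This is a
sketch assembled from the quoted diffs, not the final code:

bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
{
	/*
	 * Host-initiated (userspace) accesses are always allowed, e.g. so
	 * that the VMM can read/write GUEST_SSP to migrate CET state.
	 */
	if (msr->host_initiated)
		return true;

	if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
		return false;

	/* The synthetic MSR is for userspace access only. */
	if (msr->index == MSR_KVM_GUEST_SSP)
		return false;

	if (msr->index == MSR_IA32_PL3_SSP &&
	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
		return false;

	return true;
}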


[...]

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 18/21] KVM:x86: Enable CET virtualization for VMX and advertise to userspace
  2023-05-11  4:08 ` [PATCH v3 18/21] KVM:x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
@ 2023-05-24  6:35   ` Chenyi Qiang
  2023-05-24  8:07     ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Chenyi Qiang @ 2023-05-24  6:35 UTC (permalink / raw)
  To: Yang Weijiang, seanjc, pbonzini, kvm, linux-kernel
  Cc: peterz, rppt, binbin.wu, rick.p.edgecombe, john.allen,
	Sean Christopherson



On 5/11/2023 12:08 PM, Yang Weijiang wrote:
> Set the feature bits so that CET capabilities can be seen in guest via
> CPUID enumeration. Add CR4.CET bit support in order to allow guest set
> CET master control bit(CR4.CET).
> 
> Disable KVM CET feature if unrestricted_guest is unsupported/disabled as
> KVM does not support emulating CET.
> 
> Don't expose CET feature if dependent CET bits are cleared in host XSS,
> or if XSAVES isn't supported.
> 
> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  3 ++-
>  arch/x86/kvm/cpuid.c            | 12 ++++++++++--
>  arch/x86/kvm/vmx/capabilities.h |  4 ++++
>  arch/x86/kvm/vmx/vmx.c          | 19 +++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.h          |  6 ++++--
>  arch/x86/kvm/x86.c              | 21 ++++++++++++++++++++-
>  arch/x86/kvm/x86.h              |  3 +++
>  7 files changed, 62 insertions(+), 6 deletions(-)

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 50026557fb2a..858cb68e781a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -226,7 +226,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
>  				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
>  				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
>  
> -#define KVM_SUPPORTED_XSS     0
> +#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER)
>  
>  u64 __read_mostly host_efer;
>  EXPORT_SYMBOL_GPL(host_efer);
> @@ -9525,6 +9525,25 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  
>  	kvm_ops_update(ops);
>  
> +	/*
> +	 * Check CET user bit is still set in kvm_caps.supported_xss,
> +	 * if not, clear the cap bits as the user parts depends on
> +	 * XSAVES support.
> +	 */
> +	if (!kvm_cet_user_supported()) {
> +		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> +		kvm_cpu_cap_clear(X86_FEATURE_IBT);
> +	}
> +
> +	/*
> +	 * If SHSTK and IBT are available in KVM, clear CET user bit in

Should it be "If SHSTK and IBT are *not* available ..."?

> +	 * kvm_caps.supported_xss so that kvm_cet_user_supported() returns
> +	 * false when called.
> +	 */
> +	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> +	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> +		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;
> +
>  	for_each_online_cpu(cpu) {
>  		smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &r, 1);
>  		if (r < 0)


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 06/21] KVM:x86: Report XSS as to-be-saved if there are supported features
  2023-05-11  4:08 ` [PATCH v3 06/21] KVM:x86: Report XSS as to-be-saved if there are supported features Yang Weijiang
@ 2023-05-24  7:06   ` Chao Gao
  2023-05-24  8:19     ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Chao Gao @ 2023-05-24  7:06 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson

On Thu, May 11, 2023 at 12:08:42AM -0400, Yang Weijiang wrote:
>From: Sean Christopherson <sean.j.christopherson@intel.com>
>
>Add MSR_IA32_XSS to the list of MSRs reported to userspace if
>supported_xss is non-zero, i.e. KVM supports at least one XSS based
>feature.

The changelog doesn't match what the patch does.

Do you need to check if supported_xss is non-zero in kvm_probe_msr_to_save(),
e.g.,
        case MSR_IA32_XSS:
                if (!kvm_caps.supported_xss)
                        return;
                break;

>
>Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>---
> arch/x86/kvm/x86.c | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index e7f78fe79b32..33a780fe820b 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -1454,6 +1454,7 @@ static const u32 msrs_to_save_base[] = {
> 	MSR_IA32_UMWAIT_CONTROL,
> 
> 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
>+	MSR_IA32_XSS,
> };
> 
> static const u32 msrs_to_save_pmu[] = {
>-- 
>2.27.0
>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 18/21] KVM:x86: Enable CET virtualization for VMX and advertise to userspace
  2023-05-24  6:35   ` Chenyi Qiang
@ 2023-05-24  8:07     ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-05-24  8:07 UTC (permalink / raw)
  To: Chenyi Qiang
  Cc: seanjc, pbonzini, linux-kernel, kvm, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 5/24/2023 2:35 PM, Chenyi Qiang wrote:
>
> On 5/11/2023 12:08 PM, Yang Weijiang wrote:

[...]

>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 50026557fb2a..858cb68e781a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -226,7 +226,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
>>   				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
>>   				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
>>   
>> -#define KVM_SUPPORTED_XSS     0
>> +#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER)
>>   
>>   u64 __read_mostly host_efer;
>>   EXPORT_SYMBOL_GPL(host_efer);
>> @@ -9525,6 +9525,25 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>>   
>>   	kvm_ops_update(ops);
>>   
>> +	/*
>> +	 * Check CET user bit is still set in kvm_caps.supported_xss,
>> +	 * if not, clear the cap bits as the user parts depends on
>> +	 * XSAVES support.
>> +	 */
>> +	if (!kvm_cet_user_supported()) {
>> +		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
>> +		kvm_cpu_cap_clear(X86_FEATURE_IBT);
>> +	}
>> +
>> +	/*
>> +	 * If SHSTK and IBT are available in KVM, clear CET user bit in
> Should it be "If SHSTK and IBT are *not* available ..."?

Good catch, thanks! I'll change it in the next version!

>
>> +	 * kvm_caps.supported_xss so that kvm_cet_user_supported() returns
>> +	 * false when called.
>> +	 */
>> +	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>> +	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
>> +		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;
>> +
>>   	for_each_online_cpu(cpu) {
>>   		smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &r, 1);
>>   		if (r < 0)

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 06/21] KVM:x86: Report XSS as to-be-saved if there are supported features
  2023-05-24  7:06   ` Chao Gao
@ 2023-05-24  8:19     ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-05-24  8:19 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson


On 5/24/2023 3:06 PM, Chao Gao wrote:
> On Thu, May 11, 2023 at 12:08:42AM -0400, Yang Weijiang wrote:
>> From: Sean Christopherson <sean.j.christopherson@intel.com>
>>
>> Add MSR_IA32_XSS to the list of MSRs reported to userspace if
>> supported_xss is non-zero, i.e. KVM supports at least one XSS based
>> feature.
> The changelog doesn't match what the patch does.
>
> Do you need to check if supported_xss is non-zero in kvm_probe_msr_to_save(),
> e.g.,
>          case MSR_IA32_XSS:
>                  if (!kvm_caps.supported_xss)
>                          return;
>                  break;

I looked back at the history of this patch; there was a similar check
originally, but it was lost during subsequent rebases. I'll add it back.
Thanks for pointing it out!
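
A minimal sketch of where that check would land, assuming the existing
kvm_probe_msr_to_save() switch in x86.c (all other cases elided):

static void kvm_probe_msr_to_save(u32 msr_index)
{
	switch (msr_index) {
	case MSR_IA32_XSS:
		/* Don't report XSS if KVM supports no XSS-based features. */
		if (!kvm_caps.supported_xss)
			return;
		break;
	/* ... other cases elided ... */
	default:
		break;
	}

	msrs_to_save[num_msrs_to_save++] = msr_index;
}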

>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> ---
>> arch/x86/kvm/x86.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e7f78fe79b32..33a780fe820b 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1454,6 +1454,7 @@ static const u32 msrs_to_save_base[] = {
>> 	MSR_IA32_UMWAIT_CONTROL,
>>
>> 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
>> +	MSR_IA32_XSS,
>> };
>>
>> static const u32 msrs_to_save_pmu[] = {
>> -- 
>> 2.27.0
>>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-05-11  4:08 ` [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS Yang Weijiang
@ 2023-05-25  6:10   ` Chao Gao
  2023-05-30  3:51     ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Chao Gao @ 2023-05-25  6:10 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Zhang Yi Z

On Thu, May 11, 2023 at 12:08:43AM -0400, Yang Weijiang wrote:
>Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified.
>CPUID(EAX=0DH,ECX=1).EBX reports current required storage size for all
>features enabled via XCR0 | XSS so that guest can allocate correct xsave
>buffer.
>
>Note, KVM does not yet support any XSS based features, i.e. supported_xss
>is guaranteed to be zero at this time.
>
>Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
>Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
>Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>---
> arch/x86/kvm/cpuid.c | 7 +++++--
> arch/x86/kvm/x86.c   | 6 ++++--
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
>diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>index 123bf8b97a4b..cbb1b8a65502 100644
>--- a/arch/x86/kvm/cpuid.c
>+++ b/arch/x86/kvm/cpuid.c
>@@ -277,8 +277,11 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
> 
> 	best = cpuid_entry2_find(entries, nent, 0xD, 1);
> 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
>-		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
>-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
>+		cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {

Align indentation.

 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
		     cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {

>+		u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;
>+
>+		best->ebx = xstate_required_size(xstate, true);
>+	}
> 
> 	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
> 	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 33a780fe820b..ab3360a10933 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -3776,8 +3776,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		 */
> 		if (data & ~kvm_caps.supported_xss)

Shouldn't we check against the supported value of _this_ guest? similar to
guest_supported_xcr0.

> 			return 1;
>-		vcpu->arch.ia32_xss = data;
>-		kvm_update_cpuid_runtime(vcpu);
>+		if (vcpu->arch.ia32_xss != data) {
>+			vcpu->arch.ia32_xss = data;
>+			kvm_update_cpuid_runtime(vcpu);
>+		}
> 		break;
> 	case MSR_SMI_COUNT:
> 		if (!msr_info->host_initiated)
>-- 
>2.27.0
>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-05-25  6:10   ` Chao Gao
@ 2023-05-30  3:51     ` Yang, Weijiang
  2023-05-30 12:08       ` Chao Gao
  0 siblings, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-05-30  3:51 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Zhang Yi Z


On 5/25/2023 2:10 PM, Chao Gao wrote:
> On Thu, May 11, 2023 at 12:08:43AM -0400, Yang Weijiang wrote:
>> Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified.
>> CPUID(EAX=0DH,ECX=1).EBX reports current required storage size for all
>> features enabled via XCR0 | XSS so that guest can allocate correct xsave
>> buffer.
>>
>> Note, KVM does not yet support any XSS based features, i.e. supported_xss
>> is guaranteed to be zero at this time.
>>
>> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
>> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> ---
>> arch/x86/kvm/cpuid.c | 7 +++++--
>> arch/x86/kvm/x86.c   | 6 ++++--
>> 2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index 123bf8b97a4b..cbb1b8a65502 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -277,8 +277,11 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
>>
>> 	best = cpuid_entry2_find(entries, nent, 0xD, 1);
>> 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
>> -		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
>> -		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
>> +		cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {
> Align indentation.

OK. Thanks!

>
>   	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
> 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {
>
>> +		u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;
>> +
>> +		best->ebx = xstate_required_size(xstate, true);
>> +	}
>>
>> 	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
>> 	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 33a780fe820b..ab3360a10933 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3776,8 +3776,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> 		 */
>> 		if (data & ~kvm_caps.supported_xss)
> Shouldn't we check against the supported value of _this_ guest? similar to
> guest_supported_xcr0.

I don't think it requires an extra variable to serve a per-guest purpose.

For guest XSS settings, we don't currently add extra constraints like we
do for XCR0, so all KVM-supported bits can be accessed by the guest.
There's already another variable, vcpu->arch.ia32_xss, that plays a
similar role. In the future, if there's a requirement for per-VM control,
I will align it with the XCR0 settings.


>
>> 			return 1;
>> -		vcpu->arch.ia32_xss = data;
>> -		kvm_update_cpuid_runtime(vcpu);
>> +		if (vcpu->arch.ia32_xss != data) {
>> +			vcpu->arch.ia32_xss = data;
>> +			kvm_update_cpuid_runtime(vcpu);
>> +		}
>> 		break;
>> 	case MSR_SMI_COUNT:
>> 		if (!msr_info->host_initiated)
>> -- 
>> 2.27.0
>>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-05-30  3:51     ` Yang, Weijiang
@ 2023-05-30 12:08       ` Chao Gao
  2023-05-31  1:11         ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Chao Gao @ 2023-05-30 12:08 UTC (permalink / raw)
  To: Yang, Weijiang
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Zhang Yi Z

>> > --- a/arch/x86/kvm/x86.c
>> > +++ b/arch/x86/kvm/x86.c
>> > @@ -3776,8 +3776,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> > 		 */
>> > 		if (data & ~kvm_caps.supported_xss)
>> Shouldn't we check against the supported value of _this_ guest? similar to
>> guest_supported_xcr0.
>
>I don't think it requires an extra variable to serve a per-guest purpose.
>
>For guest XSS settings, now we don't add extra constraints like XCR0, thus

QEMU can impose constraints by configuring guest CPUID.0xd.1 to indicate
certain supervisor state components cannot be managed by XSAVES, even
though KVM supports them. IOW, guests may differ in the supported values
for the IA32_XSS MSR.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-05-30 12:08       ` Chao Gao
@ 2023-05-31  1:11         ` Yang, Weijiang
  2023-06-15 23:45           ` Sean Christopherson
  0 siblings, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-05-31  1:11 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Zhang Yi Z


On 5/30/2023 8:08 PM, Chao Gao wrote:
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -3776,8 +3776,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>> 		 */
>>>> 		if (data & ~kvm_caps.supported_xss)
>>> Shouldn't we check against the supported value of _this_ guest? similar to
>>> guest_supported_xcr0.
>> I don't think it requires an extra variable to serve a per-guest purpose.
>>
>> For guest XSS settings, now we don't add extra constraints like XCR0, thus
> QEMU can impose constraints by configuring guest CPUID.0xd.1 to indicate
> certain supervisor state components cannot be managed by XSAVES, even
> though KVM supports them. IOW, guests may differ in the supported values
> for the IA32_XSS MSR.

OK, will change this part to align with xcr0 settings. Thanks!
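
A sketch of what aligning with the XCR0 handling could look like; the
guest_supported_xss field and its placement mirror the existing
guest_supported_xcr0 pattern and are assumptions, not the final code:

	/*
	 * In kvm_vcpu_after_set_cpuid(): derive the guest's permitted XSS
	 * from CPUID.(EAX=0DH,ECX=1).EDX:ECX, capped by KVM's own support.
	 */
	struct kvm_cpuid_entry2 *best;

	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1);
	if (best)
		vcpu->arch.guest_supported_xss =
			(((u64)best->edx << 32) | best->ecx) &
			kvm_caps.supported_xss;
	else
		vcpu->arch.guest_supported_xss = 0;

	/*
	 * In kvm_set_msr_common(), case MSR_IA32_XSS: reject bits _this_
	 * guest can't use, not just bits KVM as a whole doesn't support.
	 */
	if (data & ~vcpu->arch.guest_supported_xss)
		return 1;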


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 08/21] KVM:x86: Init kvm_caps.supported_xss with supported feature bits
  2023-05-11  4:08 ` [PATCH v3 08/21] KVM:x86: Init kvm_caps.supported_xss with supported feature bits Yang Weijiang
@ 2023-06-06  8:38   ` Chao Gao
  2023-06-08  5:42     ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Chao Gao @ 2023-06-06  8:38 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Thu, May 11, 2023 at 12:08:44AM -0400, Yang Weijiang wrote:
>Initialize kvm_caps.supported_xss with host XSS msr value AND XSS mask.
>KVM_SUPPORTED_XSS holds all potential supported feature bits,

>the result
>represents all KVM supported feature bits which is used for swapping guest
>and host FPU contents.

do you mean kvm_caps.supported_xss by "the result"? I don't see how
fpu_swap_kvm_fpstate() uses kvm_caps.supported_xss.

>
>Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>---
> arch/x86/kvm/vmx/vmx.c | 1 -
> arch/x86/kvm/x86.c     | 6 +++++-
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index 44fb619803b8..c872a5aafa50 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -7806,7 +7806,6 @@ static __init void vmx_set_cpu_caps(void)
> 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
> 
> 	/* CPUID 0xD.1 */
>-	kvm_caps.supported_xss = 0;

AMD has the same statement. Do you need to remove that one?

> 	if (!cpu_has_vmx_xsaves())
> 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
> 
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index ab3360a10933..d2975ca96ac5 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -223,6 +223,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
> 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
> 
>+#define KVM_SUPPORTED_XSS     0
>+
> u64 __read_mostly host_efer;
> EXPORT_SYMBOL_GPL(host_efer);
> 
>@@ -9472,8 +9474,10 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> 
> 	rdmsrl_safe(MSR_EFER, &host_efer);
> 
>-	if (boot_cpu_has(X86_FEATURE_XSAVES))
>+	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> 		rdmsrl(MSR_IA32_XSS, host_xss);
>+		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
>+	}
> 
> 	kvm_init_pmu_capability(ops->pmu_ops);
> 
>-- 
>2.27.0
>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-05-11  4:08 ` [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification Yang Weijiang
@ 2023-06-06  9:08   ` Chao Gao
  2023-06-08  6:01     ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Chao Gao @ 2023-06-06  9:08 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>Add handling for Control Protection (#CP) exceptions(vector 21).
>The new vector is introduced for Intel's Control-Flow Enforcement
>Technology (CET) relevant violation cases.
>

>Although #CP belongs to the contributory exception class, the actual
>effect is conditional on CET being exposed to the guest. If CET is not
>available to the guest, #CP falls back to non-contributory and doesn't
>have an error code.

This sounds weird. is this the hardware behavior? If yes, could you
point us to where this behavior is documented?

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 12/21] KVM:x86: Add fault checks for guest CR4.CET setting
  2023-05-11  4:08 ` [PATCH v3 12/21] KVM:x86: Add fault checks for guest CR4.CET setting Yang Weijiang
@ 2023-06-06 11:03   ` Chao Gao
  2023-06-08  6:06     ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Chao Gao @ 2023-06-06 11:03 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson

On Thu, May 11, 2023 at 12:08:48AM -0400, Yang Weijiang wrote:
>Check potential faults for CR4.CET setting per Intel SDM.

>CR4.CET is the master control bit for CET features (SHSTK and IBT).
>In addition to basic support checks,

To me, this implies some checks against CR4.CET when enabling SHSTK or
IBT, but the checks are not added by this patch. Then, why bother to
mention this?

>CET can be enabled if and only
>if CR0.WP==1, i.e. setting CR4.CET=1 faults if CR0.WP==0 and setting
>CR0.WP=0 fails if CR4.CET==1.
>
>Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
>Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>---
> arch/x86/kvm/x86.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index a768cbf3fbb7..b6eec9143129 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -995,6 +995,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
> 	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
> 		return 1;
> 
>+	if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
>+		return 1;
>+
> 	static_call(kvm_x86_set_cr0)(vcpu, cr0);
> 
> 	kvm_post_set_cr0(vcpu, old_cr0, cr0);
>@@ -1210,6 +1213,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> 			return 1;
> 	}
> 
>+	if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
>+		return 1;
>+
> 	static_call(kvm_x86_set_cr4)(vcpu, cr4);
> 
> 	kvm_post_set_cr4(vcpu, old_cr4, cr4);
>-- 
>2.27.0
>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 08/21] KVM:x86: Init kvm_caps.supported_xss with supported feature bits
  2023-06-06  8:38   ` Chao Gao
@ 2023-06-08  5:42     ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-08  5:42 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/6/2023 4:38 PM, Chao Gao wrote:
> On Thu, May 11, 2023 at 12:08:44AM -0400, Yang Weijiang wrote:
>> Initialize kvm_caps.supported_xss with host XSS msr value AND XSS mask.
>> KVM_SUPPORTED_XSS holds all potential supported feature bits,
>> the result
>> represents all KVM supported feature bits which is used for swapping guest
>> and host FPU contents.
> do you mean kvm_caps.supported_xss by "the result"? I don't see how
> fpu_swap_kvm_fpstate() uses kvm_caps.supported_xss.

The wording is not accurate; what I meant is: the resulting bits are
supported by fpu_swap_kvm_fpstate(). I will change the commit log in the
next version, thanks!

>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> ---
>> arch/x86/kvm/vmx/vmx.c | 1 -
>> arch/x86/kvm/x86.c     | 6 +++++-
>> 2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 44fb619803b8..c872a5aafa50 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -7806,7 +7806,6 @@ static __init void vmx_set_cpu_caps(void)
>> 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
>>
>> 	/* CPUID 0xD.1 */
>> -	kvm_caps.supported_xss = 0;
> AMD has the same statement. Do you need to remove that one?

Since it appears in svm.c, I assume Allen (AMD) will change it in his
follow-up patch series.

[...]


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-06  9:08   ` Chao Gao
@ 2023-06-08  6:01     ` Yang, Weijiang
  2023-06-15 23:58       ` Sean Christopherson
  0 siblings, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-08  6:01 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/6/2023 5:08 PM, Chao Gao wrote:
> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>> Add handling for Control Protection (#CP) exceptions(vector 21).
>> The new vector is introduced for Intel's Control-Flow Enforcement
>> Technology (CET) relevant violation cases.
>>
>> Although #CP belongs contributory exception class, but the actual
>> effect is conditional on CET being exposed to guest. If CET is not
>> available to guest, #CP falls back to non-contributory and doesn't
>> have an error code.
> This sounds weird. is this the hardware behavior? If yes, could you
> point us to where this behavior is documented?

It's not SDM-documented behavior.

The original description was provided by Sean here:

Re: [PATCH v15 04/14] KVM: x86: Add #CP support in guest exception 
dispatch - Sean Christopherson (kernel.org) 
<https://lore.kernel.org/all/YBsZwvwhshw+s7yQ@google.com/>

I also verified the issue on my side. If the KVM CET patches are present
in L1 but CET is not enabled, running some unit tests can still trigger
failures, even though the #CP-induced failure has been fixed in
KVM-unit-tests.
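
For reference, the conditional classification under discussion boils
down to something like the sketch below, modeled on the existing
exception_class() helper in x86.c (the EXCPT_* values and *_VECTOR
defines mirror that file; taking a vcpu parameter is this sketch's
assumption, not necessarily how the final patch is structured):

static int exception_class(struct kvm_vcpu *vcpu, int vector)
{
	switch (vector) {
	case PF_VECTOR:
		return EXCPT_PF;
	case DE_VECTOR:
	case TS_VECTOR:
	case NP_VECTOR:
	case SS_VECTOR:
	case GP_VECTOR:
		return EXCPT_CONTRIBUTORY;
	case CP_VECTOR:
		/*
		 * #CP (vector 21) is contributory only if CET is exposed
		 * to the guest; otherwise it falls back to benign and is
		 * delivered without an error code.
		 */
		if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
		    guest_cpuid_has(vcpu, X86_FEATURE_IBT))
			return EXCPT_CONTRIBUTORY;
		break;
	default:
		break;
	}

	return EXCPT_BENIGN;
}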


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 12/21] KVM:x86: Add fault checks for guest CR4.CET setting
  2023-06-06 11:03   ` Chao Gao
@ 2023-06-08  6:06     ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-08  6:06 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson


On 6/6/2023 7:03 PM, Chao Gao wrote:
> On Thu, May 11, 2023 at 12:08:48AM -0400, Yang Weijiang wrote:
>> Check potential faults for CR4.CET setting per Intel SDM.
>> CR4.CET is the master control bit for CET features (SHSTK and IBT).
>> In addition to basic support checks,
> To me, this implies some checks against CR4.CET when enabling SHSTK or
> IBT, but the checks are not added by this patch. Then, why bother to
> mention this?

OK, I'll remove these unnecessary words and change the commit log.

>
>> CET can be enabled if and only
>> if CR0.WP==1, i.e. setting CR4.CET=1 faults if CR0.WP==0 and setting
>> CR0.WP=0 fails if CR4.CET==1.
>>
>> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> ---
>> arch/x86/kvm/x86.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index a768cbf3fbb7..b6eec9143129 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -995,6 +995,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>> 	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
>> 		return 1;
>>
>> +	if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
>> +		return 1;
>> +
>> 	static_call(kvm_x86_set_cr0)(vcpu, cr0);
>>
>> 	kvm_post_set_cr0(vcpu, old_cr0, cr0);
>> @@ -1210,6 +1213,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>> 			return 1;
>> 	}
>>
>> +	if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
>> +		return 1;
>> +
>> 	static_call(kvm_x86_set_cr4)(vcpu, cr4);
>>
>> 	kvm_post_set_cr4(vcpu, old_cr4, cr4);
>> -- 
>> 2.27.0
>>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
                   ` (20 preceding siblings ...)
  2023-05-11  4:08 ` [PATCH v3 21/21] KVM:x86: Support CET supervisor shadow stack MSR access Yang Weijiang
@ 2023-06-15 23:30 ` Sean Christopherson
  2023-06-16  0:00   ` Sean Christopherson
  2023-06-16  8:25   ` Yang, Weijiang
  21 siblings, 2 replies; 99+ messages in thread
From: Sean Christopherson @ 2023-06-15 23:30 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Thu, May 11, 2023, Yang Weijiang wrote:
> The last patch is introduced to support supervisor SHSTK but the feature is
> not enabled on Intel platform for now, the main purpose of this patch is to
> facilitate AMD folks to enable the feature.

I am beyond confused by the SDM's wording of CET_SSS.

First, it says that CET_SSS says the CPU isn't buggy (or maybe "less buggy" is
more appropriate phrasing).

  Bit 18: CET_SSS. If 1, indicates that an operating system can enable supervisor
  shadow stacks as long as it ensures that certain supervisor shadow-stack pushes
  will not cause page faults (see Section 17.2.3 of the Intel® 64 and IA-32
  Architectures Software Developer’s Manual, Volume 1).

But then it says VMMs shouldn't set the bit.

  When emulating the CPUID instruction, a virtual-machine monitor should return
  this bit as 0 if those pushes can cause VM exits.

Based on the Xen code (which is sadly a far better source of information than the
SDM), I *think* that what the SDM is trying to say is that VMMs should not set
CET_SSS if VM-Exits can occur ***and*** the bit is not set in the host CPU.  Because
if the SDM really means "VMMs should never set the bit", then what on earth is the
point of the bit.

> In summary, this new series enables CET user SHSTK/IBT and kernel IBT, but
> doesn't fully support CET supervisor SHSTK, the enabling work is left for
> the future.

Why?  If my interpretation of the SDM is correct, then all the pieces are there.

> Executed all KVM-unit-test cases and KVM selftests against this series, all
> test cases passed except the vmx test, the failure is due to CR4_CET bit
> testing in test_vmxon_bad_cr(). After add CR4_CET bit to skip list, the test
> passed. I'll send a patch to fix this issue later.

Your cover letter from v2 back in April said the same thing.  Why hasn't the patch
been posted?  And what exactly is the issue?  IIUC, setting CR4.CET with
MSR_IA32_S_CET=0 and MSR_IA32_U_CET=0 should be a nop, which suggests that there's
a KVM bug.  And if that's the case, the next obvious questions is, why are you
posting known buggy code?

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-05-31  1:11         ` Yang, Weijiang
@ 2023-06-15 23:45           ` Sean Christopherson
  2023-06-16  1:58             ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-15 23:45 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Zhang Yi Z

On Wed, May 31, 2023, Weijiang Yang wrote:
> 
> On 5/30/2023 8:08 PM, Chao Gao wrote:
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -3776,8 +3776,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > > > 		 */
> > > > > 		if (data & ~kvm_caps.supported_xss)
> > > > Shouldn't we check against the supported value of _this_ guest? similar to
> > > > guest_supported_xcr0.
> > > I don't think it requires an extra variable to serve a per-guest purpose.
> > > 
> > > For guest XSS settings, now we don't add extra constraints like XCR0, thus
> > QEMU can impose constraints by configuring guest CPUID.0xd.1 to indicate
> > certain supervisor state components cannot be managed by XSAVES, even
> > though KVM supports them. IOW, guests may differ in the supported values
> > for the IA32_XSS MSR.
> 
> OK, will change this part to align with xcr0 settings. Thanks!

Please write KVM-Unit-Tests to verify KVM correctly handles the various MSRs related
to CET, e.g. a test_cet_msrs() subtest in msr.c would do nicely.  Hmm, though testing
the combinations of CPUID bits will require multiple x86/unittests.cfg entries.
Might be time to split up msr.c into a library and then multiple tests.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 09/21] KVM:x86: Load guest FPU state when accessing xsaves-managed MSRs
  2023-05-11  4:08 ` [PATCH v3 09/21] KVM:x86: Load guest FPU state when accessing xsaves-managed MSRs Yang Weijiang
@ 2023-06-15 23:50   ` Sean Christopherson
  2023-06-16  2:02     ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-15 23:50 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson

On Thu, May 11, 2023, Yang Weijiang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Load the guest's FPU state if userspace is accessing MSRs whose values are
> managed by XSAVES. Two MSR access helpers, i.e., kvm_{get,set}_xsave_msr(),
> are introduced by a later patch to facilitate access to this kind of MSRs.
> 
> If new feature MSRs supported in XSS are passed through to the guest they
> are saved and restored by {XSAVES|XRSTORS} to/from guest's FPU state at
> vm-entry/exit.
> 
> Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
> explicitly check @vcpu is non-null before attempting to load guest state.
> The XSS supporting MSRs cannot be retrieved via the device ioctl() without
> loading guest FPU state (which doesn't exist).
> 
> Note that guest_cpuid_has() is not queried as host userspace is allowed
> to access MSRs that have not been exposed to the guest, e.g. it might do
> KVM_SET_MSRS prior to KVM_SET_CPUID2.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/x86.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d2975ca96ac5..7788646bbf1f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -130,6 +130,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
>  static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
>  
>  static DEFINE_MUTEX(vendor_module_lock);
> +static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
> +static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
> +
>  struct kvm_x86_ops kvm_x86_ops __read_mostly;
>  
>  #define KVM_X86_OP(func)					     \
> @@ -4336,6 +4339,21 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  }
>  EXPORT_SYMBOL_GPL(kvm_get_msr_common);
>  
> +static const u32 xsave_msrs[] = {

Can you change this to "xstate_msrs"?


> +	MSR_IA32_U_CET, MSR_IA32_PL3_SSP,
> +};
> +
> +static bool is_xsaves_msr(u32 index)

And then is_xstate_msr().  The intent is to check if an MSR is managed as part of
the xstate, not if the MSR is somehow related to XSAVE itself.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-08  6:01     ` Yang, Weijiang
@ 2023-06-15 23:58       ` Sean Christopherson
  2023-06-16  6:56         ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-15 23:58 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Thu, Jun 08, 2023, Weijiang Yang wrote:
> 
> On 6/6/2023 5:08 PM, Chao Gao wrote:
> > On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
> > > Add handling for Control Protection (#CP) exceptions(vector 21).
> > > The new vector is introduced for Intel's Control-Flow Enforcement
> > > Technology (CET) relevant violation cases.
> > > 
> > > Although #CP belongs to the contributory exception class, the actual
> > > effect is conditional on CET being exposed to the guest. If CET is not
> > > available to the guest, #CP falls back to non-contributory and doesn't
> > > have an error code.
> > This sounds weird. is this the hardware behavior? If yes, could you
> > point us to where this behavior is documented?
> 
> It's not SDM-documented behavior.

The #CP behavior needs to be documented.  Please pester whoever you need to in
order to make that happen.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-06-15 23:30 ` [PATCH v3 00/21] Enable CET Virtualization Sean Christopherson
@ 2023-06-16  0:00   ` Sean Christopherson
  2023-06-16  1:00     ` Yang, Weijiang
  2023-06-16  8:25   ` Yang, Weijiang
  1 sibling, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-16  0:00 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Thu, Jun 15, 2023, Sean Christopherson wrote:
> Your cover letter from v2 back in April said the same thing.  Why hasn't the patch
> been posted?  And what exactly is the issue?  IIUC, setting CR4.CET with
> MSR_IA32_S_CET=0 and MSR_IA32_U_CET=0 should be a nop, which suggests that there's
> a KVM bug.  And if that's the case, the next obvious questions is, why are you
> posting known buggy code?

Ah, is the problem that the test doesn't set CR0.WP as required by CR4.CET=1?

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-06-16  0:00   ` Sean Christopherson
@ 2023-06-16  1:00     ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-16  1:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/16/2023 8:00 AM, Sean Christopherson wrote:
> On Thu, Jun 15, 2023, Sean Christopherson wrote:
>> Your cover letter from v2 back in April said the same thing.  Why hasn't the patch
>> been posted?  And what exactly is the issue?  IIUC, setting CR4.CET with
>> MSR_IA32_S_CET=0 and MSR_IA32_U_CET=0 should be a nop, which suggests that there's
>> a KVM bug.  And if that's the case, the next obvious questions is, why are you
>> posting known buggy code?
> Ah, is the problem that the test doesn't set CR0.WP as required by CR4.CET=1?

Thanks for taking the time to review this series!

Yes, it's because the CR0.WP bit is not set while CR4.CET is being set.

The check is imposed by patch 12.

I'll add the fixup patch together with the next version.
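
For what it's worth, the fixup presumably amounts to something like the
line below early in test_vmxon_bad_cr(), assuming kvm-unit-tests'
read_cr0()/write_cr0() helpers and the X86_CR0_WP define:

	/*
	 * Ensure CR0.WP=1 before walking the CR4 bits, so that setting
	 * CR4.CET doesn't fault for the unrelated CR0.WP==0 reason
	 * (see the check added in patch 12).
	 */
	write_cr0(read_cr0() | X86_CR0_WP);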


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-06-15 23:45           ` Sean Christopherson
@ 2023-06-16  1:58             ` Yang, Weijiang
  2023-06-23 23:21               ` Sean Christopherson
  0 siblings, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-16  1:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Zhang Yi Z


On 6/16/2023 7:45 AM, Sean Christopherson wrote:
> On Wed, May 31, 2023, Weijiang Yang wrote:
>> On 5/30/2023 8:08 PM, Chao Gao wrote:
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -3776,8 +3776,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>>>> 		 */
>>>>>> 		if (data & ~kvm_caps.supported_xss)
>>>>> Shouldn't we check against the supported value of _this_ guest? similar to
>>>>> guest_supported_xcr0.
> > > > I don't think it requires an extra variable to serve a per-guest purpose.
>>>>
>>>> For guest XSS settings, now we don't add extra constraints like XCR0, thus
>>> QEMU can impose constraints by configuring guest CPUID.0xd.1 to indicate
>>> certain supervisor state components cannot be managed by XSAVES, even
>>> though KVM supports them. IOW, guests may differ in the supported values
>>> for the IA32_XSS MSR.
>> OK, will change this part to align with xcr0 settings. Thanks!
> Please write KVM-Unit-Tests to verify KVM correctly handles the various MSRs related
> to CET, e.g. a test_cet_msrs() subtest in msr.c would do nicely.  Hmm, though testing
> the combinations of CPUID bits will require multiple x86/unittests.cfg entries.
> Might be time to split up msr.c into a library and then multiple tests.

Since there's already a CET-specific unit test app, do you mind adding
all CET-related tests to that app to keep everything in one place? E.g.,
validating the constraints between the CET CPUID bits and the CET/XSS
MSRs?
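
In case it's useful, a rough sketch of what such a subtest could look
like. The helpers (report(), report_skip(), this_cpu_has(), wrmsr_safe()
returning the faulting vector or 0) and the X86_FEATURE_* names are
assumed from the kvm-unit-tests library, and the MSR indices are defined
locally in case the headers lack them; treat all of it as a sketch, not
working code:

#define MSR_IA32_U_CET		0x6a0
#define MSR_IA32_PL3_SSP	0x6a7

static void test_cet_msrs(void)
{
	if (!this_cpu_has(X86_FEATURE_SHSTK) &&
	    !this_cpu_has(X86_FEATURE_IBT)) {
		report_skip("CET is not enumerated");
		return;
	}

	/* Reserved bits 9:6 of MSR_IA32_U_CET must #GP on WRMSR. */
	report(wrmsr_safe(MSR_IA32_U_CET, 1ull << 6) == GP_VECTOR,
	       "WRMSR(U_CET) faults on reserved bits");

	/* PL3_SSP must be 8-byte aligned, i.e. bits 2:0 clear. */
	report(wrmsr_safe(MSR_IA32_PL3_SSP, 0x4) == GP_VECTOR,
	       "WRMSR(PL3_SSP) faults on a misaligned value");

	/* An aligned, canonical value should be accepted. */
	report(!wrmsr_safe(MSR_IA32_PL3_SSP, 0x1000),
	       "WRMSR(PL3_SSP) accepts an aligned canonical value");
}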


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 09/21] KVM:x86: Load guest FPU state when accessing xsaves-managed MSRs
  2023-06-15 23:50   ` Sean Christopherson
@ 2023-06-16  2:02     ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-16  2:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson


On 6/16/2023 7:50 AM, Sean Christopherson wrote:
> On Thu, May 11, 2023, Yang Weijiang wrote:
>> From: Sean Christopherson <sean.j.christopherson@intel.com>
>>
>> Load the guest's FPU state if userspace is accessing MSRs whose values are
>> managed by XSAVES. Two MSR access helpers, i.e., kvm_{get,set}_xsave_msr(),
>> are introduced by a later patch to facilitate access to this kind of MSRs.
>>
>>
>> [...]
>>   
>>   #define KVM_X86_OP(func)					     \
>> @@ -4336,6 +4339,21 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_get_msr_common);
>>   
>> +static const u32 xsave_msrs[] = {
> Can you change this to "xstate_msrs"?
OK, will change it in the next version.
>
>
>> +	MSR_IA32_U_CET, MSR_IA32_PL3_SSP,
>> +};
>> +
>> +static bool is_xsaves_msr(u32 index)
> And then is_xstate_msr().  The intent is to check if an MSR is managed as part of
> the xstate, not if the MSR is somehow related to XSAVE itself.
Makes sense, will change it. Thanks!

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-15 23:58       ` Sean Christopherson
@ 2023-06-16  6:56         ` Yang, Weijiang
  2023-06-16 18:57           ` Sean Christopherson
  0 siblings, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-16  6:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/16/2023 7:58 AM, Sean Christopherson wrote:
> On Thu, Jun 08, 2023, Weijiang Yang wrote:
>> On 6/6/2023 5:08 PM, Chao Gao wrote:
>>> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>>>> Add handling for Control Protection (#CP) exceptions(vector 21).
>>>> The new vector is introduced for Intel's Control-Flow Enforcement
>>>> Technology (CET) relevant violation cases.
>>>>
>>>> Although #CP belongs to the contributory exception class, the actual
>>>> effect is conditional on CET being exposed to the guest. If CET is not
>>>> available to the guest, #CP falls back to non-contributory and doesn't
>>>> have an error code.
>>> This sounds weird. is this the hardware behavior? If yes, could you
>>> point us to where this behavior is documented?
>> It's not SDM-documented behavior.
> The #CP behavior needs to be documented.  Please pester whoever you need to in
> order to make that happen.

Do you mean documentation for #CP as a generic exception, or for the
behavior in KVM as this patch implements it?


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-06-15 23:30 ` [PATCH v3 00/21] Enable CET Virtualization Sean Christopherson
  2023-06-16  0:00   ` Sean Christopherson
@ 2023-06-16  8:25   ` Yang, Weijiang
  2023-06-16 17:56     ` Sean Christopherson
  1 sibling, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-16  8:25 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/16/2023 7:30 AM, Sean Christopherson wrote:
> On Thu, May 11, 2023, Yang Weijiang wrote:
>> The last patch is introduced to support supervisor SHSTK but the feature is
>> not enabled on Intel platform for now, the main purpose of this patch is to
>> facilitate AMD folks to enable the feature.
> I am beyond confused by the SDM's wording of CET_SSS.
>
> First, it says that CET_SSS says the CPU isn't buggy (or maybe "less buggy" is
> more appropriate phrasing).
>
>    Bit 18: CET_SSS. If 1, indicates that an operating system can enable supervisor
>    shadow stacks as long as it ensures that certain supervisor shadow-stack pushes
>    will not cause page faults (see Section 17.2.3 of the Intel® 64 and IA-32
>    Architectures Software Developer’s Manual, Volume 1).
>
> But then it says says VMMs shouldn't set the bit.
>
>    When emulating the CPUID instruction, a virtual-machine monitor should return
>    this bit as 0 if those pushes can cause VM exits.
>
> Based on the Xen code (which is sadly a far better source of information than the
> SDM), I *think* that what the SDM is trying to say is that VMMs should not set
> CET_SS if VM-Exits can occur ***and*** the bit is not set in the host CPU.  Because
> if the SDM really means "VMMs should never set the bit", then what on earth is the
> point of the bit.

I need to double-check that vague description.

From my understanding, on the bare metal side, if the bit is 1, the OS
can enable SSS as long as pushes won't cause a page fault. But for the
VM case, it's not recommended (regardless of the bit's state) to set the
bit, as VM-exits caused by guest SSS pushes cannot be fully excluded.

In other words, the bit is mainly for bare metal guidance now.

>> In summary, this new series enables CET user SHSTK/IBT and kernel IBT, but
>> doesn't fully support CET supervisor SHSTK, the enabling work is left for
>> the future.
> Why?  If my interpretation of the SDM is correct, then all the pieces are there.

My assumption is that VM supervisor SHSTK depends on bare metal kernel
support, as the PL0_SSP MSR is backed by XSAVES via IA32_XSS bit 12
(CET_S), but that part of the support is not there in Rick's native
series.

And also, based on the above SDM description, I don't want to add the
support blindly now.

> [...]

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-06-16  8:25   ` Yang, Weijiang
@ 2023-06-16 17:56     ` Sean Christopherson
  2023-06-19  6:41       ` Yang, Weijiang
  2023-07-10  0:28       ` Yang, Weijiang
  0 siblings, 2 replies; 99+ messages in thread
From: Sean Christopherson @ 2023-06-16 17:56 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Fri, Jun 16, 2023, Weijiang Yang wrote:
> 
> On 6/16/2023 7:30 AM, Sean Christopherson wrote:
> > On Thu, May 11, 2023, Yang Weijiang wrote:
> > > The last patch is introduced to support supervisor SHSTK but the feature is
> > > not enabled on Intel platform for now, the main purpose of this patch is to
> > > facilitate AMD folks to enable the feature.
> > I am beyond confused by the SDM's wording of CET_SSS.
> > 
> > First, it says that CET_SSS says the CPU isn't buggy (or maybe "less buggy" is
> > more appropriate phrasing).
> > 
> >    Bit 18: CET_SSS. If 1, indicates that an operating system can enable supervisor
> >    shadow stacks as long as it ensures that certain supervisor shadow-stack pushes
> >    will not cause page faults (see Section 17.2.3 of the Intel® 64 and IA-32
> >    Architectures Software Developer’s Manual, Volume 1).
> > 
> > But then it says VMMs shouldn't set the bit.
> > 
> >    When emulating the CPUID instruction, a virtual-machine monitor should return
> >    this bit as 0 if those pushes can cause VM exits.
> > 
> > Based on the Xen code (which is sadly a far better source of information than the
> > SDM), I *think* that what the SDM is trying to say is that VMMs should not set
> > CET_SSS if VM-Exits can occur ***and*** the bit is not set in the host CPU.  Because
> > if the SDM really means "VMMs should never set the bit", then what on earth is the
> > point of the bit.
> 
> I need to double-check that vague description.
> 
> From my understanding, on the bare metal side, if the bit is 1, the OS can
> enable SSS as long as pushes won't cause a page fault. But for the VM case,
> it's not recommended (regardless of the bit's state) to set the bit, as
> VM-exits caused by guest SSS pushes cannot be fully excluded.
> 
> In other words, the bit is mainly for bare metal guidance now.
> 
> > > In summary, this new series enables CET user SHSTK/IBT and kernel IBT, but
> > > doesn't fully support CET supervisor SHSTK, the enabling work is left for
> > > the future.
> > Why?  If my interpretation of the SDM is correct, then all the pieces are there.

...

> And also, based on the above SDM description, I don't want to add the
> support blindly now.

*sigh*

I got filled in on the details offlist.

1) In the next version of this series, please rework it to reincorporate Supervisor
   Shadow Stack support into the main series, i.e. pretend Intel's implementation
   isn't horribly flawed.  KVM can't guarantee that a VM-Exit won't occur, i.e.
   can't advertise CET_SS, but I want the baseline support to be implemented,
   otherwise the series as a whole is a big confusing mess with unanswered question
   left, right, and center.  And more importantly, architecturally SSS exists if
   X86_FEATURE_SHSTK is enumerated, i.e. the guest should be allowed to utilize
   SSS if it so chooses, with the obvious caveat that there's a non-zero chance
   the guest risks death by doing so.  Or if userspace can ensure no VM-Exit will
   occur, which is difficult but feasible (ignoring #MC), e.g. by statically
   partitioning memory, prefaulting all memory in guest firmware, and not dirty
   logging SSS pages.  In such an extreme setup, userspace can enumerate CET_SSS
   to the guest, and KVM should support that.
 
2) Add the below patch to document exactly why KVM doesn't advertise CET_SSS.
   While Intel is apparently ok with treating KVM developers like mushrooms, I
   am not.

---
From: Sean Christopherson <seanjc@google.com>
Date: Fri, 16 Jun 2023 10:04:37 -0700
Subject: [PATCH] KVM: x86: Explicitly document that KVM must not advertise
 CET_SSS

Explicitly call out that KVM must NOT advertise CET_SSS to userspace,
i.e. must not tell userspace and thus the guest that it is safe for the
guest to enable Supervisor Shadow Stacks (SSS).

Intel's implementation of SSS is fatally flawed for virtualized
environments, as despite wording in the SDM that suggests otherwise,
Intel CPUs' handling of shadow stack switches are NOT fully atomic.  Only
the check-and-update of the supervisor shadow stack token's busy bit is
atomic.  Per the SDM:

  If the far CALL or event delivery pushes a stack frame after the token
  is acquired and any of the pushes causes a fault or VM exit, the
  processor will revert to the old shadow stack and the busy bit in the
  new shadow stack's token remains set.

Or more bluntly, any fault or VM-Exit that occurs when pushing to the
shadow stack after the busy bit is set is fatal to the kernel, i.e. to
the guest in KVM's case.  The (guest) kernel can protect itself against
faults, e.g. by ensuring that the shadow stack always has a valid mapping,
but a guest kernel obviously has no control over, or even knowledge of,
VM-Exits due to host activity.

To help software determine when it is safe to use SSS, Intel defined
CPUID.0x7.1.EDX bit (CET_SSS) and updated Intel CPUs to enumerate CET_SSS,
i.e. bare metal Intel CPUs advertise to software that it is safe to enable
SSS.

  If CPUID.(EAX=07H,ECX=1H):EDX[bit 18] is enumerated as 1, it is
  sufficient for an operating system to ensure that none of the pushes can
  cause a page fault.

But CET_SSS also comes with a major caveat that is kinda sorta documented
in the SDM:

  When emulating the CPUID instruction, a virtual-machine monitor should
  return this bit as 0 if those pushes can cause VM exits.

In other words, CET_SSS (bit 18) does NOT enumerate that the underlying
CPU prevents VM-Exits, only that the environment in which the software is
running will not generate VM-Exits.  I.e. CET_SSS is a stopgap to stem the
bleeding and allow kernels to enable SSS, not an indication that the
underlying CPU is immune to the VM-Exit problem.

And unfortunately, KVM itself effectively has zero chance of ensuring that
a shadow stack switch can't trigger a VM-Exit, e.g. KVM zaps *all* SPTEs
when any memslot is deleted, enabling dirty logging write-protects SPTEs,
etc.  A sufficiently motivated userspace can, at least in theory, provide
a safe environment for SSS, e.g. by statically partitioning and
prefaulting (in guest firmware) all memory, disabling PML, never
write-protecting guest shadow stacks, etc.  But such a setup is far, far
beyond typical KVM deployments.

Note, AMD CPUs have a similar erratum, but AMD CPUs *DO* perform the full
shadow stack switch atomically so long as the stack is mapped WB and does
not cross a page boundary, i.e. a "normal" KVM setup and a well-behaved
guest play nice with SSS without additional shenanigans.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1e3ee96c879b..ecf4a68aaa08 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -658,7 +658,15 @@ void kvm_set_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
-		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI)
+		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
+
+		/*
+		 * Do NOT advertise CET_SSS, i.e. do not tell userspace and the
+		 * guest that it is safe to use Supervisor Shadow Stacks under
+		 * KVM when running on Intel CPUs.  KVM itself cannot guarantee
+		 * that a VM-Exit won't occur during a shadow stack update.
+		 */
+		0 /* F(CET_SSS) */
 	);
 
 	kvm_cpu_cap_mask(CPUID_D_1_EAX,

base-commit: 9305c14847719870e9e08294034861360577ce08
-- 


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-16  6:56         ` Yang, Weijiang
@ 2023-06-16 18:57           ` Sean Christopherson
  2023-06-19  9:28             ` Yang, Weijiang
  2023-06-30  9:34             ` Yang, Weijiang
  0 siblings, 2 replies; 99+ messages in thread
From: Sean Christopherson @ 2023-06-16 18:57 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Fri, Jun 16, 2023, Weijiang Yang wrote:
> 
> On 6/16/2023 7:58 AM, Sean Christopherson wrote:
> > On Thu, Jun 08, 2023, Weijiang Yang wrote:
> > > On 6/6/2023 5:08 PM, Chao Gao wrote:
> > > > On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
> > > > > Add handling for Control Protection (#CP) exceptions(vector 21).
> > > > > The new vector is introduced for Intel's Control-Flow Enforcement
> > > > > Technology (CET) relevant violation cases.
> > > > > 
> > > > > Although #CP belongs to the contributory exception class, the actual
> > > > > effect is conditional on CET being exposed to the guest. If CET is not
> > > > > available to the guest, #CP falls back to non-contributory and doesn't
> > > > > have an error code.
> > > > This sounds weird. Is this the hardware behavior? If yes, could you
> > > > point us to where this behavior is documented?
> > > It's not SDM documented behavior.
> > The #CP behavior needs to be documented.  Please pester whoever you need to in
> > order to make that happen.
> 
> Do you mean documentation for #CP as a generic exception or the behavior in
> KVM as this patch shows?

As I pointed out two *years* ago, this entry in the SDM

  — The field's deliver-error-code bit (bit 11) is 1 if each of the following
    holds: (1) the interruption type is hardware exception; (2) bit 0
    (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
    (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
    indicates one of the following exceptions: #DF (vector 8), #TS (10),
    #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).

needs to read something like

  — The field's deliver-error-code bit (bit 11) is 1 if each of the following
    holds: (1) the interruption type is hardware exception; (2) bit 0
    (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
    (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
    indicates one of the following exceptions: #DF (vector 8), #TS (10),
    #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]

    [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
        support for the 1-setting of CR4.CET.
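
In KVM terms, that boils down to something like the following sketch (not the
actual patch; the CP_VECTOR definition and the guest_cpuid_has() checks are
assumptions based on this thread):

#define CP_VECTOR 21	/* added by this series, not yet in mainline headers */

static bool x86_exception_has_error_code(struct kvm_vcpu *vcpu,
					 unsigned int vector)
{
	static u32 mask = BIT(DF_VECTOR) | BIT(TS_VECTOR) | BIT(NP_VECTOR) |
			  BIT(SS_VECTOR) | BIT(GP_VECTOR) | BIT(PF_VECTOR) |
			  BIT(AC_VECTOR);

	/*
	 * Per the proposed wording, #CP has an error code if and only if
	 * CR4.CET can be set to 1, i.e. iff SHSTK and/or IBT is exposed.
	 */
	if (vector == CP_VECTOR)
		return guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
		       guest_cpuid_has(vcpu, X86_FEATURE_IBT);

	return (1U << vector) & mask;
}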

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-06-16 17:56     ` Sean Christopherson
@ 2023-06-19  6:41       ` Yang, Weijiang
  2023-06-23 20:51         ` Sean Christopherson
  2023-07-10  0:28       ` Yang, Weijiang
  1 sibling, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-19  6:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/17/2023 1:56 AM, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> On 6/16/2023 7:30 AM, Sean Christopherson wrote:
>>> On Thu, May 11, 2023, Yang Weijiang wrote:
>>>> The last patch is introduced to support supervisor SHSTK but the feature is
>>>> not enabled on Intel platform for now, the main purpose of this patch is to
>>>> facilitate AMD folks to enable the feature.
>>> I am beyond confused by the SDM's wording of CET_SSS.
>>>
>>> First, it says that CET_SSS says the CPU isn't buggy (or maybe "less buggy" is
>>> more appropriate phrasing).
>>>
>>>     Bit 18: CET_SSS. If 1, indicates that an operating system can enable supervisor
>>>     shadow stacks as long as it ensures that certain supervisor shadow-stack pushes
>>>     will not cause page faults (see Section 17.2.3 of the Intel® 64 and IA-32
>>>     Architectures Software Developer’s Manual, Volume 1).
>>>
>>> But then it says VMMs shouldn't set the bit.
>>>
>>>     When emulating the CPUID instruction, a virtual-machine monitor should return
>>>     this bit as 0 if those pushes can cause VM exits.
>>>
>>> Based on the Xen code (which is sadly a far better source of information than the
>>> SDM), I *think* that what the SDM is trying to say is that VMMs should not set
>>> CET_SSS if VM-Exits can occur ***and*** the bit is not set in the host CPU.  Because
>>> if the SDM really means "VMMs should never set the bit", then what on earth is the
>>> point of the bit.
>> I need to double check the vague description.
>>
>> From my understanding, on the bare metal side, if the bit is 1, the OS can
>> enable SSS as long as pushes won't cause page faults. But in the VM case,
>> it's not recommended (regardless of the bit's state) to set the bit, as
>> vm-exits caused by guest SSS pushes cannot be fully excluded.
>>
>> In other words, the bit is mainly for bare metal guidance now.
>>
>>>> In summary, this new series enables CET user SHSTK/IBT and kernel IBT, but
>>>> doesn't fully support CET supervisor SHSTK, the enabling work is left for
>>>> the future.
>>> Why?  If my interpretation of the SDM is correct, then all the pieces are there.
> ...
>
>> And also based on the above SDM description, I don't want to add the support
>> blindly now.
> *sigh*
>
> I got filled in on the details offlist.
>
> 1) In the next version of this series, please rework it to reincorporate Supervisor
>     Shadow Stack support into the main series, i.e. pretend Intel's implementation
>     isn't horribly flawed.

Let me make it clear, you want me to do two things:

1) Add Supervisor Shadow Stack state support (i.e., XSS.bit12 (CET_S)) into
the kernel so that the host can support guest Supervisor Shadow Stack MSRs
in guest/host FPU context switches.

2) Add Supervisor Shadow Stack support into the KVM part so that a guest OS
is able to use SSS at its own risk.

Is that correct?

> KVM can't guarantee that a VM-Exit won't occur, i.e.
>     can't advertise CET_SS, but I want the baseline support to be implemented,
>     otherwise the series as a whole is a big confusing mess with unanswered question
>     left, right, and center.  And more importantly, architecturally SSS exists if
>     X86_FEATURE_SHSTK is enumerated, i.e. the guest should be allowed to utilize
>     SSS if it so chooses, with the obvious caveat that there's a non-zero chance
>     the guest risks death by doing so.  Or if userspace can ensure no VM-Exit will
>     occur, which is difficult but feasible (ignoring #MC), e.g. by statically
>     partitioning memory, prefaulting all memory in guest firmware, and not dirty
>     logging SSS pages.  In such an extreme setup, userspace can enumerate CET_SSS
>     to the guest, and KVM should support that.

Makes sense: provide the support, but the guest takes the risk on its own.

>   
> 2) Add the below patch to document exactly why KVM doesn't advertise CET_SSS.
>     While Intel is apparently ok with treating KVM developers like mushrooms, I
>     am not.

Will add it, thanks a lot for the detailed changelog!

>
> ---
> From: Sean Christopherson <seanjc@google.com>
> Date: Fri, 16 Jun 2023 10:04:37 -0700
> Subject: [PATCH] KVM: x86: Explicitly document that KVM must not advertise
>   CET_SSS
>
> Explicitly call out that KVM must NOT advertise CET_SSS to userspace,
> i.e. must not tell userspace and thus the guest that it is safe for the
> guest to enable Supervisor Shadow Stacks (SSS).
>
> Intel's implementation of SSS is fatally flawed for virtualized
> environments, as despite wording in the SDM that suggests otherwise,
> Intel CPUs' handling of shadow stack switches is NOT fully atomic.  Only
> the check-and-update of the supervisor shadow stack token's busy bit is
> atomic.  Per the SDM:
>
>    If the far CALL or event delivery pushes a stack frame after the token
>    is acquired and any of the pushes causes a fault or VM exit, the
>    processor will revert to the old shadow stack and the busy bit in the
>    new shadow stack's token remains set.
>
> Or more bluntly, any fault or VM-Exit that occurs when pushing to the
> shadow stack after the busy bit is set is fatal to the kernel, i.e. to
> the guest in KVM's case.  The (guest) kernel can protect itself against
> faults, e.g. by ensuring that the shadow stack always has a valid mapping,
> but a guest kernel obviously has no control over, or even knowledge of,
> VM-Exits due to host activity.
>
> To help software determine when it is safe to use SSS, Intel defined
> CPUID.0x7.1.EDX bit (CET_SSS) and updated Intel CPUs to enumerate CET_SSS,
> i.e. bare metal Intel CPUs advertise to software that it is safe to enable
> SSS.
>
>    If CPUID.(EAX=07H,ECX=1H):EDX[bit 18] is enumerated as 1, it is
>    sufficient for an operating system to ensure that none of the pushes can
>    cause a page fault.
>
> But CET_SSS also comes with a major caveat that is kinda sorta documented
> in the SDM:
>
>    When emulating the CPUID instruction, a virtual-machine monitor should
>    return this bit as 0 if those pushes can cause VM exits.
>
> In other words, CET_SSS (bit 18) does NOT enumerate that the underlying
> CPU prevents VM-Exits, only that the environment in which the software is
> running will not generate VM-Exits.  I.e. CET_SSS is a stopgap to stem the
> bleeding and allow kernels to enable SSS, not an indication that the
> underlying CPU is immune to the VM-Exit problem.
>
> And unfortunately, KVM itself effectively has zero chance of ensuring that
> a shadow stack switch can't trigger a VM-Exit, e.g. KVM zaps *all* SPTEs
> when any memslot is deleted, enabling dirty logging write-protects SPTEs,
> etc.  A sufficiently motivated userspace can, at least in theory, provide
> a safe environment for SSS, e.g. by statically partitioning and
> prefaulting (in guest firmware) all memory, disabling PML, never
> write-protecting guest shadow stacks, etc.  But such a setup is far, far
> beyond typical KVM deployments.
>
> Note, AMD CPUs have a similar erratum, but AMD CPUs *DO* perform the full
> shadow stack switch atomically so long as the stack is mapped WB and does
> not cross a page boundary, i.e. a "normal" KVM setup and a well-behaved
> guest play nice with SSS without additional shenanigans.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/cpuid.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 1e3ee96c879b..ecf4a68aaa08 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -658,7 +658,15 @@ void kvm_set_cpu_caps(void)
>   	);
>   
>   	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
> -		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI)
> +		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
> +
> +		/*
> +		 * Do NOT advertise CET_SSS, i.e. do not tell userspace and the
> +		 * guest that it is safe to use Supervisor Shadow Stacks under
> +		 * KVM when running on Intel CPUs.  KVM itself cannot guarantee
> +		 * that a VM-Exit won't occur during a shadow stack update.
> +		 */
> +		0 /* F(CET_SSS) */
>   	);
>   
>   	kvm_cpu_cap_mask(CPUID_D_1_EAX,
>
> base-commit: 9305c14847719870e9e08294034861360577ce08

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-16 18:57           ` Sean Christopherson
@ 2023-06-19  9:28             ` Yang, Weijiang
  2023-06-30  9:34             ` Yang, Weijiang
  1 sibling, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-19  9:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/17/2023 2:57 AM, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> On 6/16/2023 7:58 AM, Sean Christopherson wrote:
>>> On Thu, Jun 08, 2023, Weijiang Yang wrote:
>>>> On 6/6/2023 5:08 PM, Chao Gao wrote:
>>>>> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>>>>>> Add handling for Control Protection (#CP) exceptions(vector 21).
>>>>>> The new vector is introduced for Intel's Control-Flow Enforcement
>>>>>> Technology (CET) relevant violation cases.
>>>>>>
>>>>>> Although #CP belongs to the contributory exception class, the actual
>>>>>> effect is conditional on CET being exposed to the guest. If CET is not
>>>>>> available to the guest, #CP falls back to non-contributory and doesn't
>>>>>> have an error code.
>>>>> This sounds weird. Is this the hardware behavior? If yes, could you
>>>>> point us to where this behavior is documented?
>>>> It's not SDM documented behavior.
>>> The #CP behavior needs to be documented.  Please pester whoever you need to in
>>> order to make that happen.
>> Do you mean documentation for #CP as a generic exception or the behavior in
>> KVM as this patch shows?
> As I pointed out two *years* ago, this entry in the SDM
>
>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>      holds: (1) the interruption type is hardware exception; (2) bit 0
>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>      #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>
> needs to read something like
>
>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>      holds: (1) the interruption type is hardware exception; (2) bit 0
>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>      #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>
>      [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>          support for the 1-setting of CR4.CET.

OK, I'll route the messages to the related people, thanks!


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-06-19  6:41       ` Yang, Weijiang
@ 2023-06-23 20:51         ` Sean Christopherson
  2023-06-26  6:46           ` Yang, Weijiang
  2023-07-17  7:44           ` Yang, Weijiang
  0 siblings, 2 replies; 99+ messages in thread
From: Sean Christopherson @ 2023-06-23 20:51 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Mon, Jun 19, 2023, Weijiang Yang wrote:
> 
> On 6/17/2023 1:56 AM, Sean Christopherson wrote:
> > On Fri, Jun 16, 2023, Weijiang Yang wrote:
> > > On 6/16/2023 7:30 AM, Sean Christopherson wrote:
> > > > On Thu, May 11, 2023, Yang Weijiang wrote:
> > > > > The last patch is introduced to support supervisor SHSTK but the feature is
> > > > > not enabled on Intel platform for now, the main purpose of this patch is to
> > > > > facilitate AMD folks to enable the feature.
> > > > I am beyond confused by the SDM's wording of CET_SSS.
> > > > 
> > > > First, it says that CET_SSS says the CPU isn't buggy (or maybe "less buggy" is
> > > > more appropriate phrasing).
> > > > 
> > > >     Bit 18: CET_SSS. If 1, indicates that an operating system can enable supervisor
> > > >     shadow stacks as long as it ensures that certain supervisor shadow-stack pushes
> > > >     will not cause page faults (see Section 17.2.3 of the Intel® 64 and IA-32
> > > >     Architectures Software Developer’s Manual, Volume 1).
> > > > 
> > > > But then it says VMMs shouldn't set the bit.
> > > > 
> > > >     When emulating the CPUID instruction, a virtual-machine monitor should return
> > > >     this bit as 0 if those pushes can cause VM exits.
> > > > 
> > > > Based on the Xen code (which is sadly a far better source of information than the
> > > > SDM), I *think* that what the SDM is trying to say is that VMMs should not set
> > > > CET_SSS if VM-Exits can occur ***and*** the bit is not set in the host CPU.  Because
> > > > if the SDM really means "VMMs should never set the bit", then what on earth is the
> > > > point of the bit.
> > > I need to double check the vague description.
> > > 
> > > From my understanding, on the bare metal side, if the bit is 1, the OS can
> > > enable SSS as long as pushes won't cause page faults. But in the VM case,
> > > it's not recommended (regardless of the bit's state) to set the bit, as
> > > vm-exits caused by guest SSS pushes cannot be fully excluded.
> > > 
> > > In other words, the bit is mainly for bare metal guidance now.
> > > 
> > > > > In summary, this new series enables CET user SHSTK/IBT and kernel IBT, but
> > > > > doesn't fully support CET supervisor SHSTK, the enabling work is left for
> > > > > the future.
> > > > Why?  If my interpretation of the SDM is correct, then all the pieces are there.
> > ...
> > 
> > > And also based on the above SDM description, I don't want to add the support
> > > blindly now.
> > *sigh*
> > 
> > I got filled in on the details offlist.
> > 
> > 1) In the next version of this series, please rework it to reincorporate Supervisor
> >     Shadow Stack support into the main series, i.e. pretend Intel's implementation
> >     isn't horribly flawed.
> 
> Let me make it clear, you want me to do two things:
> 
> 1) Add Supervisor Shadow Stack state support (i.e., XSS.bit12 (CET_S)) into
> the kernel so that the host can support guest Supervisor Shadow Stack MSRs
> in guest/host FPU context switches.

If that's necessary for correct functionality, yes.

> 2) Add Supervisor Shadow Stack support into the KVM part so that a guest OS
> is able to use SSS at its own risk.

Yes.  Architecturally, if KVM advertises X86_FEATURE_SHSTK, then KVM needs to
provide both User and Supervisor support.  CET_SSS doesn't change the architecture,
it's little more than a hint.  And even if the guest follows SDM's recommendation
to not enable shadow stacks, a clever kernel can still utilize SSS assets, e.g. use
the MSRs as scratch registers.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/21] KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area
  2023-05-11  4:08 ` [PATCH v3 16/21] KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area Yang Weijiang
@ 2023-06-23 22:30   ` Sean Christopherson
  2023-06-26  8:59     ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-23 22:30 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Thu, May 11, 2023, Yang Weijiang wrote:
> Save GUEST_SSP to the SMM state save area when the guest exits to SMM
> due to an SMI, and restore it to the VMCS field when the guest exits SMM.

This fails to answer "Why does KVM need to do this?"

> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/smm.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
> index b42111a24cc2..c54d3eb2b7e4 100644
> --- a/arch/x86/kvm/smm.c
> +++ b/arch/x86/kvm/smm.c
> @@ -275,6 +275,16 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
>  	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
>  
>  	smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
> +
> +	if (kvm_cet_user_supported()) {

This is wrong, KVM should not save/restore state that doesn't exist from the guest's
perspective, i.e. this needs to check guest_cpuid_has().

On a related topic, I would love feedback on my series that adds a framework for
features like this, where KVM needs to check guest CPUID as well as host support.

https://lore.kernel.org/all/20230217231022.816138-1-seanjc@google.com

> +		struct msr_data msr;
> +
> +		msr.index = MSR_KVM_GUEST_SSP;
> +		msr.host_initiated = true;

Huh?

> +		/* GUEST_SSP is stored in VMCS at vm-exit. */

(a) this is not VMX code, i.e. referencing the VMCS is wrong, and (b) how the
guest's SSP is managed is irrelevant, all that matters is that KVM can get the
current guest value.

> +		static_call(kvm_x86_get_msr)(vcpu, &msr);
> +		smram->ssp = msr.data;
> +	}
>  }
>  #endif
>  
> @@ -565,6 +575,16 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
>  	static_call(kvm_x86_set_interrupt_shadow)(vcpu, 0);
>  	ctxt->interruptibility = (u8)smstate->int_shadow;
>  
> +	if (kvm_cet_user_supported()) {
> +		struct msr_data msr;
> +
> +		msr.index = MSR_KVM_GUEST_SSP;
> +		msr.host_initiated = true;
> +		msr.data = smstate->ssp;
> +		/* Mimic host_initiated access to bypass ssp access check. */

No, masquerading as a host access is all kinds of wrong.  I have no idea what
check you're trying to bypass, but whatever it is, it's wrong.  Per the SDM, the
SSP field in SMRAM is writable, which means that KVM needs to correctly handle
the scenario where SSP holds garbage, e.g. a non-canonical address.

Why can't this use kvm_get_msr() and kvm_set_msr()?
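
Something like the following sketch would work, under the assumption that
MSR_KVM_GUEST_SSP is routed through the common MSR machinery and thus picks
up all of the sanity checks for free:

	/* enter_smm_save_state_64(): read the SSP like any other MSR. */
	if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK)) {
		u64 ssp;

		if (!kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &ssp))
			smram->ssp = ssp;
	}

	/* rsm_load_state_64(): kvm_set_msr() rejects garbage, e.g. a
	 * non-canonical address, instead of stuffing it into the guest. */
	if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
	    kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smstate->ssp))
		return X86EMUL_UNHANDLEABLE;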

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-06-16  1:58             ` Yang, Weijiang
@ 2023-06-23 23:21               ` Sean Christopherson
  2023-06-26  9:24                 ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-23 23:21 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Zhang Yi Z

On Fri, Jun 16, 2023, Weijiang Yang wrote:
> 
> On 6/16/2023 7:45 AM, Sean Christopherson wrote:
> > On Wed, May 31, 2023, Weijiang Yang wrote:
> > > On 5/30/2023 8:08 PM, Chao Gao wrote:
> > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > @@ -3776,8 +3776,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > > > > > 		 */
> > > > > > > 		if (data & ~kvm_caps.supported_xss)
> > > > > > Shouldn't we check against the supported value of _this_ guest? similar to
> > > > > > guest_supported_xcr0.
> > > > > I don't think it requires an extra variable to serve per guest purpose.
> > > > > 
> > > > > For guest XSS settings, now we don't add extra constraints like XCR0, thus
> > > > QEMU can impose constraints by configuring guest CPUID.0xd.1 to indicate
> > > > certain supervisor state components cannot be managed by XSAVES, even
> > > > though KVM supports them. IOW, guests may differ in the supported values
> > > > for the IA32_XSS MSR.
> > > OK, will change this part to align with xcr0 settings. Thanks!
> > Please write KVM-Unit-Tests to verify KVM correctly handles the various MSRs related
> > to CET, e.g. a test_cet_msrs() subtest in msr.c would do nicely.  Hmm, though testing
> > the combinations of CPUID bits will require multiple x86/unittests.cfg entries.
> > Might be time to split up msr.c into a library and then multiple tests.
> 
> Since there's already a CET-specific unit test app, do you mind adding all
> CET-related stuff to the app to make it inclusive? E.g., validate constraints
> between CET CPUIDs vs. CET/XSS MSRs?

Hmm, that will get a bit kludgy since the MSR testcases will want to toggle IBT
and SHSTK on and off.

Actually, I take back my suggestion to add a KUT test.  Except for a few special
cases, e.g. 32-bit support, selftests is a better framework for testing MSRs than
KUT, as it's relatively easy to create a custom vCPU model in selftests, whereas
in KUT it requires handcoding an entry in unittests.cfg, and having corresponding
code in the test itself.

The biggest gap in selftests was the lack of decent reporting in guest code, but
Aaron is working on closing that gap[*].

I'm thinking something like this as a framework.  

	struct msr_data {
		const uint32_t idx;
		const char *name;
		const struct kvm_x86_cpu_feature feature1;
		const struct kvm_x86_cpu_feature feature2;
		const uint32_t nr_values;
		const uint64_t *values;
	};

	#define TEST_MSR2(msr, f1, f2) { .idx = msr, .name = #msr, .feature1 = f1, .feature2 = f2, .nr_values = ARRAY_SIZE(msr_VALUES), .values = msr_VALUES }
	#define TEST_MSR(msr, f) TEST_MSR2(msr, f, <a dummy value?>)
	#define TEST_MSR0(msr) TEST_MSR(msr, <a dummy value?>)

With CET usage looking like

	static const uint64_t MSR_IA32_S_CET_VALUES[] = {
		<super interesting values>
	};

	TEST_MSR2(MSR_IA32_S_CET, X86_FEATURE_IBT, X86_FEATURE_SHSTK);

Then the test could iterate over each entry and test the various combinations of
features being enabled (if supported by KVM).  And it could also test ioctls(),
which are all but impossible to test in KUT, e.g. verify that supported MSRs are
reported in KVM_GET_MSR_INDEX_LIST, verify that userspace can read/write MSRs
regardless of guest CPUID, etc.  Ooh, and we can even test MSR filtering.

I don't know that we'd want to cram all of those things in a single test, but we
can worry about that later as it shouldn't be difficult to put the framework and
MSR definitions in common code.
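
For instance, the per-MSR loop could look like the following sketch, where
vcpu_set_msr()/vcpu_get_msr()/kvm_cpu_has() are the existing selftest
helpers and everything else is an assumption following the outline above:

static void test_msr(struct kvm_vcpu *vcpu, const struct msr_data *msr)
{
	uint32_t i;

	/* Skip MSRs whose enumerating features aren't supported at all. */
	if (!kvm_cpu_has(msr->feature1) && !kvm_cpu_has(msr->feature2))
		return;

	for (i = 0; i < msr->nr_values; i++) {
		vcpu_set_msr(vcpu, msr->idx, msr->values[i]);
		TEST_ASSERT(vcpu_get_msr(vcpu, msr->idx) == msr->values[i],
			    "%s: wrote 0x%lx, read back 0x%lx", msr->name,
			    msr->values[i], vcpu_get_msr(vcpu, msr->idx));
	}
}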

[*] https://lore.kernel.org/all/20230607224520.4164598-1-aaronlewis@google.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-05-11  4:08 ` [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs Yang Weijiang
  2023-05-23  8:21   ` Binbin Wu
@ 2023-06-23 23:53   ` Sean Christopherson
  2023-06-26 14:05     ` Yang, Weijiang
  2023-07-07  9:10     ` Yang, Weijiang
  1 sibling, 2 replies; 99+ messages in thread
From: Sean Christopherson @ 2023-06-23 23:53 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson

On Thu, May 11, 2023, Yang Weijiang wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c872a5aafa50..0ccaa467d7d3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2093,6 +2093,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		else
>  			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>  		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_PL3_SSP:
> +		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> +			return 1;
> +		kvm_get_xsave_msr(msr_info);
> +		break;

Please put as much MSR handling in x86.c as possible.  We quite obviously know
that AMD support is coming along, there's no reason to duplicate all of this code.
And unless I'm missing something, John's series misses several #GP checks, e.g.
for MSR_IA32_S_CET reserved bits, which means that providing a common implementation
would actually fix bugs.

For MSRs that require vendor input and/or handling, please follow what was
recently done for MSR_IA32_CR_PAT, where the common bits are handled in common
code, and vendor code does its updates.

The divergent alignment between AMD and Intel could get annoying, but I'm sure
we can figure out a solution. 
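
Concretely, following the PAT pattern, the split could look something like
this sketch (kvm_cet_is_msr_accessible() is from this series, and
CET_S_CET_RESERVED_BITS is a hypothetical #define):

	/* x86.c, kvm_set_msr_common(): checks shared by all vendors. */
	case MSR_IA32_S_CET:
		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
			return 1;
		if (data & CET_S_CET_RESERVED_BITS)
			return 1;
		break;

	/* vmx.c, vmx_set_msr(): vendor update after common checks pass. */
	case MSR_IA32_S_CET:
		ret = kvm_set_msr_common(vcpu, msr_info);
		if (!ret)
			vmcs_writel(GUEST_S_CET, data);
		break;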

>  	case MSR_IA32_DEBUGCTLMSR:
>  		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
>  		break;
> @@ -2405,6 +2411,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		else
>  			vmx->pt_desc.guest.addr_a[index / 2] = data;
>  		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_PL3_SSP:
> +		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> +			return 1;
> +		if (is_noncanonical_address(data, vcpu))
> +			return 1;
> +		if (msr_index == MSR_IA32_U_CET && (data & GENMASK(9, 6)))
> +			return 1;
> +		if (msr_index == MSR_IA32_PL3_SSP && (data & GENMASK(2, 0)))

Please #define reserved bits, ideally using the inverse of the valid masks.  And
for SSP, it might be better to do IS_ALIGNED(data, 8) (or 4, pending my question
about the SDM's wording).
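
E.g. something like this sketch, where the masks are assumptions derived
from the GENMASK() checks in this patch:

	#define CET_U_CET_VALID_BITS		(~GENMASK_ULL(9, 6))
	#define CET_U_CET_RESERVED_BITS		(~CET_U_CET_VALID_BITS)

	if (msr_index == MSR_IA32_U_CET && (data & CET_U_CET_RESERVED_BITS))
		return 1;
	if (msr_index == MSR_IA32_PL3_SSP && !IS_ALIGNED(data, 8))
		return 1;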

Side topic, what on earth does the SDM mean by this?!?

  The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
  (hardware requires bits 1:0 to be 0).

I know Intel retroactively changed the alignment requirements, but the above
is nonsensical.  If ucode prevents writing bits 2:0, who cares what hardware
requires?

> +			return 1;
> +		kvm_set_xsave_msr(msr_info);
> +		break;
>  	case MSR_IA32_PERF_CAPABILITIES:
>  		if (data && !vcpu_to_pmu(vcpu)->version)
>  			return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b6eec9143129..2e3a39c9297c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13630,6 +13630,26 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>  }
>  EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
>  
> +bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
> +{
> +	if (!kvm_cet_user_supported())

This feels wrong.  KVM should differentiate between SHSTK and IBT in the host.
E.g. if running in a VM with SHSTK but not IBT, or vice versa, KVM should allow
writes to non-existent MSRs.  I.e. this looks wrong:

	/*
	 * If SHSTK and IBT are available in KVM, clear CET user bit in
	 * kvm_caps.supported_xss so that kvm_cet_user_supported() returns
	 * false when called.
	 */
	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;

and by extension, all dependent code is also wrong.  IIRC, there's a virtualization
hole, but I don't see any reason why KVM has to make the hole even bigger.

> +		return false;
> +
> +	if (msr->host_initiated)
> +		return true;
> +
> +	if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> +	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> +		return false;
> +
> +	if (msr->index == MSR_IA32_PL3_SSP &&
> +	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))

I probably asked this long ago, but if I did I since forgot.  Is it really just
PL3_SSP that depends on SHSTK?  I would expect all shadow stack MSRs to depend
on SHSTK.
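
I.e. I'd expect the dependencies to shake out like this sketch (assuming
all of PL[0-3]_SSP and INT_SSP_TAB are SHSTK-only):

static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, u32 index)
{
	switch (index) {
	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
	case MSR_IA32_INT_SSP_TAB:
		/* All shadow stack MSRs depend on SHSTK. */
		return guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
	case MSR_IA32_U_CET:
	case MSR_IA32_S_CET:
		/* U_CET/S_CET hold both SHSTK and IBT controls. */
		return guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) ||
		       guest_cpuid_has(vcpu, X86_FEATURE_IBT);
	default:
		return false;
	}
}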

> @@ -546,5 +557,25 @@ int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
>  int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>  			 unsigned int port, void *data,  unsigned int count,
>  			 int in);
> +bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr);
> +
> +/*
> + * We've already loaded guest MSRs in __msr_io() after check the MSR index.

Please avoid pronouns

> + * In case vcpu has been preempted, we need to disable preemption, check

vCPU.  And this doesn't make any sense.  The "vCPU" being preempted doesn't matter,
it's KVM, i.e. the task that's accessing vCPU state that cares about preemption.
I *think* what you're trying to say is that preemption needs to be disabled to
ensure that the guest values are resident.

> + * and reload the guest fpu states before read/write xsaves-managed MSRs.
> + */
> +static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
> +{
> +	fpregs_lock_and_load();

KVM already has helpers that do exactly this, and they have far better names for
KVM: kvm_fpu_get() and kvm_fpu_put().  Can you convert kvm_fpu_get() to
fpregs_lock_and_load() and use those instead?  And if the extra consistency checks
in fpregs_lock_and_load() fire, we definitely want to know, as it means we probably
have bugs in KVM.
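
I.e. a sketch of the suggested conversion (whether fpregs_lock_and_load()
can simply become kvm_fpu_get()'s body is an assumption):

static void kvm_fpu_get(void)
{
	fpregs_lock_and_load();
}

static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
{
	/* Guest FPU state must be resident while reading XSAVES MSRs. */
	kvm_fpu_get();
	rdmsrl(msr_info->index, msr_info->data);
	kvm_fpu_put();
}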

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 20/21] KVM:x86: Enable kernel IBT support for guest
  2023-05-11  4:08 ` [PATCH v3 20/21] KVM:x86: Enable kernel IBT support for guest Yang Weijiang
@ 2023-06-24  0:03   ` Sean Christopherson
  2023-06-26 12:10     ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-24  0:03 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Thu, May 11, 2023, Yang Weijiang wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index a2494156902d..1d0151f9e575 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -711,6 +711,7 @@ static bool is_valid_passthrough_msr(u32 msr)
>  		return true;
>  	case MSR_IA32_U_CET:
>  	case MSR_IA32_PL3_SSP:
> +	case MSR_IA32_S_CET:
>  		return true;
>  	}
>  
> @@ -2097,14 +2098,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>  		break;
>  	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:
>  	case MSR_IA32_PL3_SSP:
>  	case MSR_KVM_GUEST_SSP:
>  		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>  			return 1;
> -		if (msr_info->index == MSR_KVM_GUEST_SSP)
> +		if (msr_info->index == MSR_KVM_GUEST_SSP) {

Unnecessary curly braces.

>  			msr_info->data = vmcs_readl(GUEST_SSP);
> -		else
> +		} else if (msr_info->index == MSR_IA32_S_CET) {
> +			msr_info->data = vmcs_readl(GUEST_S_CET);
> +		} else {
>  			kvm_get_xsave_msr(msr_info);
> +		}
>  		break;
>  	case MSR_IA32_DEBUGCTLMSR:
>  		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> @@ -2419,6 +2424,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			vmx->pt_desc.guest.addr_a[index / 2] = data;
>  		break;
>  	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:
>  	case MSR_IA32_PL3_SSP:
>  	case MSR_KVM_GUEST_SSP:
>  		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> @@ -2430,10 +2436,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		if ((msr_index == MSR_IA32_PL3_SSP ||
>  		     msr_index == MSR_KVM_GUEST_SSP) && (data & GENMASK(2, 0)))
>  			return 1;
> -		if (msr_index == MSR_KVM_GUEST_SSP)
> +		if (msr_index == MSR_KVM_GUEST_SSP) {
>  			vmcs_writel(GUEST_SSP, data);
> -		else
> +		} else if (msr_index == MSR_IA32_S_CET) {
> +			vmcs_writel(GUEST_S_CET, data);
> +		} else {

Same here.

>  			kvm_set_xsave_msr(msr_info);
> +		}
>  		break;
>  	case MSR_IA32_PERF_CAPABILITIES:
>  		if (data && !vcpu_to_pmu(vcpu)->version)
> @@ -7322,6 +7331,19 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	kvm_wait_lapic_expire(vcpu);
>  
> +	/*
> +	 * Save host MSR_IA32_S_CET so that it can be reloaded at vm_exit.
> +	 * No need to save the other two vmcs fields as supervisor SHSTK
> +	 * are not enabled on Intel platform now.
> +	 */
> +	if (IS_ENABLED(CONFIG_X86_KERNEL_IBT) &&
> +	    (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_CET_STATE)) {
> +		u64 msr;
> +
> +		rdmsrl(MSR_IA32_S_CET, msr);

Reading the MSR on every VM-Enter can't possibly be necessary.  At the absolute
minimum, this could be moved outside of the fastpath; if the kernel modifies S_CET
from NMI context, KVM is hosed.  And *if* S_CET isn't static post-boot, this can
be done in .prepare_switch_to_guest() so long as S_CET isn't modified from IRQ
context.

But unless mine eyes deceive me, S_CET is only truly modified during setup_cet(),
i.e. is static post boot, which means it can be read once at KVM load time, e.g.
just like host_efer.
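
E.g. a sketch mirroring host_efer (host_s_cet is a hypothetical variable,
and reading it once assumes S_CET really is static after setup_cet()):

	u64 __read_mostly host_s_cet;

	/* Once, at hardware setup time: */
	if (cpu_feature_enabled(X86_FEATURE_IBT))
		rdmsrl(MSR_IA32_S_CET, host_s_cet);

	/* Once per VMCS, e.g. in vmx_set_constant_host_state(): */
	vmcs_writel(HOST_S_CET, host_s_cet);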

The kernel does save/restore IBT when making BIOS calls, but if KVM is running a
vCPU across a BIOS call then we've got bigger issues.

> +		vmcs_writel(HOST_S_CET, msr);
> +	}
> +
>  	/* The actual VMENTER/EXIT is in the .noinstr.text section. */
>  	vmx_vcpu_enter_exit(vcpu, __vmx_vcpu_run_flags(vmx));
>  
> @@ -7735,6 +7757,13 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
>  
>  	incpt |= !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>  	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, incpt);
> +
> +	/*
> +	 * If IBT is available to guest, then passthrough S_CET MSR too since
> +	 * kernel IBT is already in mainline kernel tree.
> +	 */
> +	incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> +	vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, incpt);
>  }
>  
>  static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> @@ -7805,7 +7834,7 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	/* Refresh #PF interception to account for MAXPHYADDR changes. */
>  	vmx_update_exception_bitmap(vcpu);
>  
> -	if (kvm_cet_user_supported())
> +	if (kvm_cet_user_supported() || kvm_cpu_cap_has(X86_FEATURE_IBT))

Yeah, kvm_cet_user_supported() simply looks wrong.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-06-23 20:51         ` Sean Christopherson
@ 2023-06-26  6:46           ` Yang, Weijiang
  2023-07-17  7:44           ` Yang, Weijiang
  1 sibling, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-26  6:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/24/2023 4:51 AM, Sean Christopherson wrote:
> On Mon, Jun 19, 2023, Weijiang Yang wrote:
>> On 6/17/2023 1:56 AM, Sean Christopherson wrote:
>>> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>>>> On 6/16/2023 7:30 AM, Sean Christopherson wrote:
>>>>> On Thu, May 11, 2023, Yang Weijiang wrote:
>>>>>> The last patch is introduced to support supervisor SHSTK but the feature is
>>>>>> not enabled on Intel platform for now, the main purpose of this patch is to
>>>>>> facilitate AMD folks to enable the feature.
>>>>> I am beyond confused by the SDM's wording of CET_SSS.
>>>>>
>>>>> First, it says that CET_SSS says the CPU isn't buggy (or maybe "less buggy" is
>>>>> more appropriate phrasing).
>>>>>
>>>>>      Bit 18: CET_SSS. If 1, indicates that an operating system can enable supervisor
>>>>>      shadow stacks as long as it ensures that certain supervisor shadow-stack pushes
>>>>>      will not cause page faults (see Section 17.2.3 of the Intel® 64 and IA-32
>>>>>      Architectures Software Developer’s Manual, Volume 1).
>>>>>
>>>>> But then it says VMMs shouldn't set the bit.
>>>>>
>>>>>      When emulating the CPUID instruction, a virtual-machine monitor should return
>>>>>      this bit as 0 if those pushes can cause VM exits.
>>>>>
>>>>> Based on the Xen code (which is sadly a far better source of information than the
>>>>> SDM), I *think* that what the SDM is trying to say is that VMMs should not set
>>>>> CET_SSS if VM-Exits can occur ***and*** the bit is not set in the host CPU.  Because
>>>>> if the SDM really means "VMMs should never set the bit", then what on earth is the
>>>>> point of the bit.
>>>> I need to double check the vague description.
>>>>
>>>> From my understanding, on the bare metal side, if the bit is 1, the OS can
>>>> enable SSS as long as pushes won't cause page faults. But in the VM case,
>>>> it's not recommended (regardless of the bit's state) to set the bit, as
>>>> vm-exits caused by guest SSS pushes cannot be fully excluded.
>>>>
>>>> In other words, the bit is mainly for bare metal guidance now.
>>>>
>>>>>> In summary, this new series enables CET user SHSTK/IBT and kernel IBT, but
>>>>>> doesn't fully support CET supervisor SHSTK, the enabling work is left for
>>>>>> the future.
>>>>> Why?  If my interpretation of the SDM is correct, then all the pieces are there.
>>> ...
>>>
>>>> And also based on the above SDM description, I don't want to add the support
>>>> blindly now.
>>> *sigh*
>>>
>>> I got filled in on the details offlist.
>>>
>>> 1) In the next version of this series, please rework it to reincorporate Supervisor
>>>      Shadow Stack support into the main series, i.e. pretend Intel's implementation
>>>      isn't horribly flawed.
>> Let me make it clear, you want me to do two things:
>>
>> 1) Add Supervisor Shadow Stack state support (i.e., XSS.bit12 (CET_S)) into
>> the kernel so that the host can support guest Supervisor Shadow Stack MSRs
>> in guest/host FPU context switches.
> If that's necessary for correct functionality, yes.
>
>> 2) Add Supervisor Shadow Stack support into the KVM part so that a guest OS
>> is able to use SSS at its own risk.
> Yes.  Architecturally, if KVM advertises X86_FEATURE_SHSTK, then KVM needs to
> provide both User and Supervisor support.  CET_SSS doesn't change the architecture,
> it's little more than a hint.  And even if the guest follows SDM's recommendation
> to not enable shadow stacks, a clever kernel can still utilize SSS assets, e.g. use
> the MSRs as scratch registers.

Understood, thanks!


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/21] KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area
  2023-06-23 22:30   ` Sean Christopherson
@ 2023-06-26  8:59     ` Yang, Weijiang
  2023-06-26 21:20       ` Sean Christopherson
  0 siblings, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-26  8:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/24/2023 6:30 AM, Sean Christopherson wrote:
> On Thu, May 11, 2023, Yang Weijiang wrote:
>> Save GUEST_SSP to the SMM state save area when the guest exits to SMM
>> due to an SMI, and restore it to the VMCS field when the guest exits SMM.
> This fails to answer "Why does KVM need to do this?"

How about this:

Guest SMM mode execution is outside the guest kernel; to avoid GUEST_SSP
corruption, KVM needs to save the current normal-mode GUEST_SSP to the SMRAM
area so that it can restore the original GUEST_SSP at the end of SMM.

>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> ---
>>   arch/x86/kvm/smm.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
>> index b42111a24cc2..c54d3eb2b7e4 100644
>> --- a/arch/x86/kvm/smm.c
>> +++ b/arch/x86/kvm/smm.c
>> @@ -275,6 +275,16 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
>>   	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
>>   
>>   	smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
>> +
>> +	if (kvm_cet_user_supported()) {
> This is wrong, KVM should not save/restore state that doesn't exist from the guest's
> perspective, i.e. this needs to check guest_cpuid_has().

Yes, the check missed the case where user space disables SHSTK. Will
change it, thanks!

>
> On a related topic, I would love feedback on my series that adds a framework for
> features like this, where KVM needs to check guest CPUID as well as host support.
>
> https://lore.kernel.org/all/20230217231022.816138-1-seanjc@google.com

The framework looks good; will it be merged into kvm_x86?

>
>> +		struct msr_data msr;
>> +
>> +		msr.index = MSR_KVM_GUEST_SSP;
>> +		msr.host_initiated = true;
> Huh?
>
>> +		/* GUEST_SSP is stored in VMCS at vm-exit. */
> (a) this is not VMX code, i.e. referencing the VMCS is wrong, and (b) how the
> guest's SSP is managed is irrelevant, all that matters is that KVM can get the
> current guest value.

Sorry, the comment is incorrect; my original intent was that it's stored in
a VM control structure field. Will change it.

>
>> +		static_call(kvm_x86_get_msr)(vcpu, &msr);
>> +		smram->ssp = msr.data;
>> +	}
>>   }
>>   #endif
>>   
>> @@ -565,6 +575,16 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
>>   	static_call(kvm_x86_set_interrupt_shadow)(vcpu, 0);
>>   	ctxt->interruptibility = (u8)smstate->int_shadow;
>>   
>> +	if (kvm_cet_user_supported()) {
>> +		struct msr_data msr;
>> +
>> +		msr.index = MSR_KVM_GUEST_SSP;
>> +		msr.host_initiated = true;
>> +		msr.data = smstate->ssp;
>> +		/* Mimic host_initiated access to bypass ssp access check. */
> No, masquerading as a host access is all kinds of wrong.  I have no idea what
> check you're trying to bypass, but whatever it is, it's wrong.  Per the SDM, the
> SSP field in SMRAM is writable, which means that KVM needs to correctly handle
> the scenario where SSP holds garbage, e.g. a non-canonical address.

MSR_KVM_GUEST_SSP is only accessible to user space, e.g., during live
migration; it's not accessible to the VM itself. So in
kvm_cet_is_msr_accessible(), I added a check to tell whether the access is
initiated from user space or not, and here I tried to bypass that check.
Yes, I will add the necessary checks here.

>
> Why can't this use kvm_get_msr() and kvm_set_msr()?

If my above assumption is correct, these helpers pass
host_initiated=false and cannot meet the requirements.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-06-23 23:21               ` Sean Christopherson
@ 2023-06-26  9:24                 ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-26  9:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Zhang Yi Z


On 6/24/2023 7:21 AM, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> On 6/16/2023 7:45 AM, Sean Christopherson wrote:
>>> On Wed, May 31, 2023, Weijiang Yang wrote:
>>>> On 5/30/2023 8:08 PM, Chao Gao wrote:
>>>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>>>> @@ -3776,8 +3776,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>>>>>> 		 */
>>>>>>>> 		if (data & ~kvm_caps.supported_xss)
>>>>>>> Shouldn't we check against the supported value of _this_ guest? similar to
>>>>>>> guest_supported_xcr0.
>>>>>> I don't think it requires an extra variable to serve per guest purpose.
>>>>>>
>>>>>> For guest XSS settings, now we don't add extra constraints like XCR0, thus
>>>>> QEMU can impose constraints by configuring guest CPUID.0xd.1 to indicate
>>>>> certain supervisor state components cannot be managed by XSAVES, even
>>>>> though KVM supports them. IOW, guests may differ in the supported values
>>>>> for the IA32_XSS MSR.
>>>> OK, will change this part to align with xcr0 settings. Thanks!
>>> Please write KVM-Unit-Tests to verify KVM correctly handles the various MSRs related
>>> to CET, e.g. a test_cet_msrs() subtest in msr.c would do nicely.  Hmm, though testing
>>> the combinations of CPUID bits will require multiple x86/unittests.cfg entries.
>>> Might be time to split up msr.c into a library and then multiple tests.
>> Since there's already a CET-specific unit test app, do you mind adding all
>> CET-related stuff to the app to make it inclusive? E.g., validate constraints
>> between CET CPUIDs vs. CET/XSS MSRs?
> Hmm, that will get a bit kludgy since the MSR testcases will want to toggle IBT
> and SHSTK on and off.
>
> Actually, I take back my suggestion to add a KUT test.  Except for a few special
> cases, e.g. 32-bit support, selftests is a better framework for testing MSRs than
> KUT, as it's relatively easy to create a custom vCPU model in selftests, whereas
> in KUT it requires handcoding an entry in unittests.cfg, and having corresponding
> code in the test itself.
>
> The biggest gap in selftests was the lack of decent reporting in guest code, but
> Aaron is working on closing that gap[*].
>
> I'm thinking something like this as a framework.
>
> 	struct msr_data {
> 		const uint32_t idx;
> 		const char *name;
> 		const struct kvm_x86_cpu_feature feature1;
> 		const struct kvm_x86_cpu_feature feature2;
> 		const uint32_t nr_values;
> 		const uint64_t *values;
> 	};
>
> 	#define TEST_MSR2(msr, f1, f2) { .idx = msr, .name = #msr, .feature1 = f1, .feature2 = f2, .nr_values = ARRAY_SIZE(msr_VALUES), .values = msr_VALUES }
> 	#define TEST_MSR(msr, f) TEST_MSR2(msr, f, <a dummy value?>)
> 	#define TEST_MSR0(msr) TEST_MSR(msr, <a dummy value?>)
>
> With CET usage looking like
>
> 	static const uint64_t MSR_IA32_S_CET_VALUES[] = {
> 		<super interesting values>
> 	};
>
> 	TEST_MSR2(MSR_IA32_S_CET, X86_FEATURE_IBT, X86_FEATURE_SHSTK);
>
> Then the test could iterate over each entry and test the various combinations of
> features being enabled (if supported by KVM).  And it could also test ioctls(),
> which are all but impossible to test in KUT, e.g. verify that supported MSRs are
> reported in KVM_GET_MSR_INDEX_LIST, verify that userspace can read/write MSRs
> regardless of guest CPUID, etc.  Ooh, and we can even test MSR filtering.
>
> I don't know that we'd want to cram all of those things in a single test, but we
> can worry about that later as it shouldn't be difficult to put the framework and
> MSR definitions in common code.

OK, I'll add a new selftest app which initially only includes CET MSR
testing but practices the above ideas.

>
> [*] https://lore.kernel.org/all/20230607224520.4164598-1-aaronlewis@google.com

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 20/21] KVM:x86: Enable kernel IBT support for guest
  2023-06-24  0:03   ` Sean Christopherson
@ 2023-06-26 12:10     ` Yang, Weijiang
  2023-06-26 20:50       ` Sean Christopherson
  0 siblings, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-26 12:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/24/2023 8:03 AM, Sean Christopherson wrote:
> On Thu, May 11, 2023, Yang Weijiang wrote:
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index a2494156902d..1d0151f9e575 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -711,6 +711,7 @@ static bool is_valid_passthrough_msr(u32 msr)
>>   		return true;
>>   	case MSR_IA32_U_CET:
>>   	case MSR_IA32_PL3_SSP:
>> +	case MSR_IA32_S_CET:
>>   		return true;
>>   	}
>>   
>> @@ -2097,14 +2098,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>>   		break;
>>   	case MSR_IA32_U_CET:
>> +	case MSR_IA32_S_CET:
>>   	case MSR_IA32_PL3_SSP:
>>   	case MSR_KVM_GUEST_SSP:
>>   		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>>   			return 1;
>> -		if (msr_info->index == MSR_KVM_GUEST_SSP)
>> +		if (msr_info->index == MSR_KVM_GUEST_SSP) {
> Unnecessary curly braces.

Something in my mind must be wrong :-), will remove them.

>
>>   			msr_info->data = vmcs_readl(GUEST_SSP);
>> -		else
>> +		} else if (msr_info->index == MSR_IA32_S_CET) {
>> +			msr_info->data = vmcs_readl(GUEST_S_CET);
>> +		} else {
>>   			kvm_get_xsave_msr(msr_info);
>> +		}
>>   		break;
>>   	case MSR_IA32_DEBUGCTLMSR:
>>   		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
>> @@ -2419,6 +2424,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   			vmx->pt_desc.guest.addr_a[index / 2] = data;
>>   		break;
>>   	case MSR_IA32_U_CET:
>> +	case MSR_IA32_S_CET:
>>   	case MSR_IA32_PL3_SSP:
>>   	case MSR_KVM_GUEST_SSP:
>>   		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>> @@ -2430,10 +2436,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   		if ((msr_index == MSR_IA32_PL3_SSP ||
>>   		     msr_index == MSR_KVM_GUEST_SSP) && (data & GENMASK(2, 0)))
>>   			return 1;
>> -		if (msr_index == MSR_KVM_GUEST_SSP)
>> +		if (msr_index == MSR_KVM_GUEST_SSP) {
>>   			vmcs_writel(GUEST_SSP, data);
>> -		else
>> +		} else if (msr_index == MSR_IA32_S_CET) {
>> +			vmcs_writel(GUEST_S_CET, data);
>> +		} else {
> Same here.
>
>>   			kvm_set_xsave_msr(msr_info);
>> +		}
>>   		break;
>>   	case MSR_IA32_PERF_CAPABILITIES:
>>   		if (data && !vcpu_to_pmu(vcpu)->version)
>> @@ -7322,6 +7331,19 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>   
>>   	kvm_wait_lapic_expire(vcpu);
>>   
>> +	/*
>> +	 * Save host MSR_IA32_S_CET so that it can be reloaded at vm_exit.
>> +	 * No need to save the other two vmcs fields as supervisor SHSTK
>> +	 * are not enabled on Intel platform now.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_X86_KERNEL_IBT) &&
>> +	    (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_CET_STATE)) {
>> +		u64 msr;
>> +
>> +		rdmsrl(MSR_IA32_S_CET, msr);
> Reading the MSR on every VM-Enter can't possibly be necessary.  At the absolute
> minimum, this could be moved outside of the fastpath; if the kernel modifies S_CET
> from NMI context, KVM is hosed.  And *if* S_CET isn't static post-boot, this can
> be done in .prepare_switch_to_guest() so long as S_CET isn't modified from IRQ
> context.

Agree with you.

>
> But unless mine eyes deceive me, S_CET is only truly modified during setup_cet(),
> i.e. is static post boot, which means it can be read once at KVM load time, e.g.
> just like host_efer.

I think handling S_CET like host_efer from a usage perspective is possible
given that currently only kernel IBT is enabled in the kernel. I'll remove
the code and initialize the VMCS field once, like host_efer.

>
> The kernel does save/restore IBT when making BIOS calls, but if KVM is running a
> vCPU across a BIOS call then we've got bigger issues.

What's the problem you're referring to?

>
>> +		vmcs_writel(HOST_S_CET, msr);
>> +	}
>> +
>>   	/* The actual VMENTER/EXIT is in the .noinstr.text section. */
>>   	vmx_vcpu_enter_exit(vcpu, __vmx_vcpu_run_flags(vmx));
>>   
>> @@ -7735,6 +7757,13 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
>>   
>>   	incpt |= !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>>   	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, incpt);
>> +
>> +	/*
>> +	 * If IBT is available to guest, then passthrough S_CET MSR too since
>> +	 * kernel IBT is already in mainline kernel tree.
>> +	 */
>> +	incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
>> +	vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, incpt);
>>   }
>>   
>>   static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>> @@ -7805,7 +7834,7 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>   	/* Refresh #PF interception to account for MAXPHYADDR changes. */
>>   	vmx_update_exception_bitmap(vcpu);
>>   
>> -	if (kvm_cet_user_supported())
>> +	if (kvm_cet_user_supported() || kvm_cpu_cap_has(X86_FEATURE_IBT))
> Yeah, kvm_cet_user_supported() simply looks wrong.

These are preconditions to set up CET MSRs for the guest; in
vmx_update_intercept_for_cet_msr(), the actual MSR control is based on
guest_cpuid_has() results.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-06-23 23:53   ` Sean Christopherson
@ 2023-06-26 14:05     ` Yang, Weijiang
  2023-06-26 21:15       ` Sean Christopherson
  2023-07-07  9:10     ` Yang, Weijiang
  1 sibling, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-26 14:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson


On 6/24/2023 7:53 AM, Sean Christopherson wrote:
> On Thu, May 11, 2023, Yang Weijiang wrote:
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index c872a5aafa50..0ccaa467d7d3 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -2093,6 +2093,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   		else
>>   			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>>   		break;
>> +	case MSR_IA32_U_CET:
>> +	case MSR_IA32_PL3_SSP:
>> +		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>> +			return 1;
>> +		kvm_get_xsave_msr(msr_info);
>> +		break;
> Please put as much MSR handling in x86.c as possible.  We quite obviously know
> that AMD support is coming along, there's no reason to duplicate all of this code.
> And unless I'm missing something, John's series misses several #GP checks, e.g.
> for MSR_IA32_S_CET reserved bits, which means that providing a common implementation
> would actually fix bugs.

OK, will move the common part to x86.c

>
> For MSRs that require vendor input and/or handling, please follow what was
> recently done for MSR_IA32_CR_PAT, where the common bits are handled in common
> code, and vendor code does its updates.
>
> The divergent alignment between AMD and Intel could get annoying, but I'm sure
> we can figure out a solution.
Got it, will refer to the PAT handling.
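
For reference, a rough sketch of the PAT-style split being suggested, i.e. the
common checks live in kvm_set_msr_common() and vendor code layers its VMCS
update on top (simplified and illustrative, not the final code):

	/* Common code, e.g. kvm_set_msr_common() in x86.c. */
	case MSR_IA32_U_CET:
		if (!kvm_cet_is_msr_accessible(vcpu, msr_info) ||
		    (data & GENMASK(9, 6)))
			return 1;
		kvm_set_xsave_msr(msr_info);
		break;

	/* Vendor code, e.g. vmx_set_msr(): common part first, then the VMCS write. */
	case MSR_IA32_S_CET:
		ret = kvm_set_msr_common(vcpu, msr_info);
		if (!ret)
			vmcs_writel(GUEST_S_CET, data);
		break;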
>
>>   	case MSR_IA32_DEBUGCTLMSR:
>>   		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
>>   		break;
>> @@ -2405,6 +2411,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   		else
>>   			vmx->pt_desc.guest.addr_a[index / 2] = data;
>>   		break;
>> +	case MSR_IA32_U_CET:
>> +	case MSR_IA32_PL3_SSP:
>> +		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>> +			return 1;
>> +		if (is_noncanonical_address(data, vcpu))
>> +			return 1;
>> +		if (msr_index == MSR_IA32_U_CET && (data & GENMASK(9, 6)))
>> +			return 1;
>> +		if (msr_index == MSR_IA32_PL3_SSP && (data & GENMASK(2, 0)))
> Please #define reserved bits, ideally using the inverse of the valid masks.  And
> for SSP, it might be better to do IS_ALIGNED(data, 8) (or 4, pending my question
> about the SDM's wording).

OK.
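
A sketch of the suggested form (the macro name is illustrative; the bit range
mirrors the GENMASK() check in the posted patch):

	/* Bits 9:6 of MSR_IA32_U_CET are reserved per the posted check. */
	#define CET_U_CET_RESERVED_BITS	GENMASK(9, 6)

	if (msr_index == MSR_IA32_U_CET && (data & CET_U_CET_RESERVED_BITS))
		return 1;
	if (msr_index == MSR_IA32_PL3_SSP && !IS_ALIGNED(data, 8))
		return 1;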

>
> Side topic, what on earth does the SDM mean by this?!?
>
>    The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
>    (hardware requires bits 1:0 to be 0).
>
> I know Intel retroactively changed the alignment requirements, but the above
> is nonsensical.  If ucode prevents writing bits 2:0, who cares what hardware
> requires?

So do I ;-/

>
>> +			return 1;
>> +		kvm_set_xsave_msr(msr_info);
>> +		break;
>>   	case MSR_IA32_PERF_CAPABILITIES:
>>   		if (data && !vcpu_to_pmu(vcpu)->version)
>>   			return 1;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b6eec9143129..2e3a39c9297c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -13630,6 +13630,26 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
>>   
>> +bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
>> +{
>> +	if (!kvm_cet_user_supported())
> This feels wrong.  KVM should differentiate between SHSTK and IBT in the host.
> E.g. if running in a VM with SHSTK but not IBT, or vice versa, KVM should allow
> writes to non-existent MSRs.

I don't follow you. In this case, on behalf of which part is KVM acting: the
guest or user space?

> I.e. this looks wrong:
>
> 	/*
> 	 * If SHSTK and IBT are available in KVM, clear CET user bit in
> 	 * kvm_caps.supported_xss so that kvm_cet_user_supported() returns
> 	 * false when called.
> 	 */
> 	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> 	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> 		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;

The comment is wrong, it should be "are not available in KVM". My intent is,
if both features are not available in KVM, then clear the precondition bit so
that all dependent checks will fail quickly.

>
> and by extension, all dependent code is also wrong.  IIRC, there's a virtualization
> hole, but I don't see any reason why KVM has to make the hole even bigger.

Do you mean the issue that both SHSTK and IBT share one control MSR? 
i.e., U_CET/S_CET?

>
>> +		return false;
>> +
>> +	if (msr->host_initiated)
>> +		return true;
>> +
>> +	if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
>> +	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
>> +		return false;
>> +
>> +	if (msr->index == MSR_IA32_PL3_SSP &&
>> +	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> I probably asked this long ago, but if I did I since forgot.  Is it really just
> PL3_SSP that depends on SHSTK?  I would expect all shadow stack MSRs to depend
> on SHSTK.

All PL{0,1,2,3}_SSP MSRs plus the INT_SSP_TAB MSR depend on SHSTK. In patch
21, I added more MSRs to this helper.

>> @@ -546,5 +557,25 @@ int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
>>   int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>>   			 unsigned int port, void *data,  unsigned int count,
>>   			 int in);
>> +bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr);
>> +
>> +/*
>> + * We've already loaded guest MSRs in __msr_io() after check the MSR index.
> Please avoid pronouns

OK.

>> + * In case vcpu has been preempted, we need to disable preemption, check
> vCPU.  And this doesn't make any sense.  The "vCPU" being preempted doesn't matter,
> it's KVM, i.e. the task that's accessing vCPU state that cares about preemption.
> I *think* what you're trying to say is that preemption needs to be disabled to
> ensure that the guest values are resident.

Sorry, the comment is broken. I meant to say that between kvm_load_guest_fpu()
and the place where this helper is used, the vCPU could have been preempted,
so we need to reload the guest FPU with fpregs_lock_and_load() and disable
preemption before accessing the MSR.


>> + * and reload the guest fpu states before read/write xsaves-managed MSRs.
>> + */
>> +static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
>> +{
>> +	fpregs_lock_and_load();
> KVM already has helpers that do exactly this, and they have far better names for
> KVM: kvm_fpu_get() and kvm_fpu_put().  Can you convert kvm_fpu_get() to
> fpregs_lock_and_load() and use those isntead? And if the extra consistency checks
> in fpregs_lock_and_load() fire, we definitely want to know, as it means we probably
> have bugs in KVM.

Do you want me to do some experiments to make sure the WARN() in
fpregs_lock_and_load() would be triggered or not?

If no WARN() triggers, then replace fpregs_lock_and_load()/fpregs_unlock()
with kvm_fpu_get()/kvm_fpu_put()?


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 20/21] KVM:x86: Enable kernel IBT support for guest
  2023-06-26 12:10     ` Yang, Weijiang
@ 2023-06-26 20:50       ` Sean Christopherson
  2023-06-27  1:53         ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-26 20:50 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Mon, Jun 26, 2023, Weijiang Yang wrote:
> 
> On 6/24/2023 8:03 AM, Sean Christopherson wrote:
> > > @@ -7322,6 +7331,19 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
> > >   	kvm_wait_lapic_expire(vcpu);
> > > +	/*
> > > +	 * Save host MSR_IA32_S_CET so that it can be reloaded at vm_exit.
> > > +	 * No need to save the other two vmcs fields as supervisor SHSTK
> > > +	 * are not enabled on Intel platform now.
> > > +	 */
> > > +	if (IS_ENABLED(CONFIG_X86_KERNEL_IBT) &&
> > > +	    (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_CET_STATE)) {
> > > +		u64 msr;
> > > +
> > > +		rdmsrl(MSR_IA32_S_CET, msr);
> > Reading the MSR on every VM-Enter can't possibly be necessary.  At the absolute
> > minimum, this could be moved outside of the fastpath; if the kernel modifies S_CET
> > from NMI context, KVM is hosed.  And *if* S_CET isn't static post-boot, this can
> > be done in .prepare_switch_to_guest() so long as S_CET isn't modified from IRQ
> > context.
> 
> Agree with you.
> 
> > 
> > But unless mine eyes deceive me, S_CET is only truly modified during setup_cet(),
> > i.e. is static post boot, which means it can be read once at KVM load time, e.g.
> > just like host_efer.
> 
> I think handling S_CET like host_efer is possible from a usage perspective,
> given that currently only kernel IBT is enabled in the kernel. I'll remove the
> code and initialize the VMCS field once, like host_efer.
> 
> > 
> > The kernel does save/restore IBT when making BIOS calls, but if KVM is running a
> > vCPU across a BIOS call then we've got bigger issues.
> 
> What's the problem you're referring to?

I was pointing out that S_CET isn't strictly constant, as it's saved/modified/restored
by ibt_save() + ibt_restore().  But KVM should never run between those paired
functions, so from KVM's perspective the host value is effectively constant.

> > > +		vmcs_writel(HOST_S_CET, msr);
> > > +	}
> > > +
> > >   	/* The actual VMENTER/EXIT is in the .noinstr.text section. */
> > >   	vmx_vcpu_enter_exit(vcpu, __vmx_vcpu_run_flags(vmx));
> > > @@ -7735,6 +7757,13 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
> > >   	incpt |= !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> > >   	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, incpt);
> > > +
> > > +	/*
> > > +	 * If IBT is available to guest, then passthrough S_CET MSR too since
> > > +	 * kernel IBT is already in mainline kernel tree.
> > > +	 */
> > > +	incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> > > +	vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, incpt);
> > >   }
> > >   static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > @@ -7805,7 +7834,7 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > >   	/* Refresh #PF interception to account for MAXPHYADDR changes. */
> > >   	vmx_update_exception_bitmap(vcpu);
> > > -	if (kvm_cet_user_supported())
> > > +	if (kvm_cet_user_supported() || kvm_cpu_cap_has(X86_FEATURE_IBT))
> > Yeah, kvm_cet_user_supported() simply looks wrong.
> 
> These are the preconditions for setting up CET MSRs for the guest in
> vmx_update_intercept_for_cet_msr(); the actual MSR interception control is
> based on the guest_cpuid_has() results.

I know.  My point is that with the below combination, 

	kvm_cet_user_supported()		= true
	kvm_cpu_cap_has(X86_FEATURE_IBT)	= false 
	guest_cpuid_has(vcpu, X86_FEATURE_IBT)	= true

KVM will passthrough MSR_IA32_S_CET for guest IBT even though IBT isn't supported
on the host.

	incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
	vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, incpt);

So either KVM is broken and is passing through S_CET when it shouldn't, or the
check on kvm_cet_user_supported() is redundant, i.e. the above combination is
impossible.

Either way, the code *looks* wrong, which is almost as bad as it being functionally
wrong.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-06-26 14:05     ` Yang, Weijiang
@ 2023-06-26 21:15       ` Sean Christopherson
  2023-06-27  3:32         ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-26 21:15 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson

On Mon, Jun 26, 2023, Weijiang Yang wrote:
> 
> On 6/24/2023 7:53 AM, Sean Christopherson wrote:
> > On Thu, May 11, 2023, Yang Weijiang wrote:
> > Side topic, what on earth does the SDM mean by this?!?
> > 
> >    The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
> >    (hardware requires bits 1:0 to be 0).
> > 
> > I know Intel retroactively changed the alignment requirements, but the above
> > is nonsensical.  If ucode prevents writing bits 2:0, who cares what hardware
> > requires?
> 
> So do I ;-/

Can you follow-up with someone to get clarification?  If writing bit 2 with '1'
does not #GP despite the statement that it "must be aligned", then KVM shouldn't
inject a #GP in that case.

> > > +			return 1;
> > > +		kvm_set_xsave_msr(msr_info);
> > > +		break;
> > >   	case MSR_IA32_PERF_CAPABILITIES:
> > >   		if (data && !vcpu_to_pmu(vcpu)->version)
> > >   			return 1;
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index b6eec9143129..2e3a39c9297c 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -13630,6 +13630,26 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> > >   }
> > >   EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
> > > +bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > +{
> > > +	if (!kvm_cet_user_supported())
> > This feels wrong.  KVM should differentiate between SHSTK and IBT in the host.
> > E.g. if running in a VM with SHSTK but not IBT, or vice versa, KVM should allow
> > writes to non-existent MSRs.
> 
> I don't follow you. In this case, on behalf of which part is KVM acting: the
> guest or user space?

Sorry, typo.  KVM *shouldn't* allow writes to non-existent MSRs.  

> > I.e. this looks wrong:
> > 
> > 	/*
> > 	 * If SHSTK and IBT are available in KVM, clear CET user bit in
> > 	 * kvm_caps.supported_xss so that kvm_cet_user_supported() returns
> > 	 * false when called.
> > 	 */
> > 	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> > 	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> > 		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;
> 
> The comment is wrong, it should be "are not available in KVM". My intent is,
> if both features are not available in KVM, then clear the precondition bit so
> that all dependent checks will fail quickly.

Checking kvm_caps.supported_xss.CET_USER is worthless in 99% of the cases though.
Unless I'm missing something, the only time it's useful is for CR4.CET, which
doesn't differentiate between SHSTK and IBT.  For everything else that KVM cares
about, at some point KVM needs to precisely check for SHSTK and IBT support
anyways

> > and by extension, all dependent code is also wrong.  IIRC, there's a virtualization
> > hole, but I don't see any reason why KVM has to make the hole even bigger.
> 
> Do you mean the issue that both SHSTK and IBT share one control MSR? i.e.,
> U_CET/S_CET?

I mean that passing through PLx_SSP if the host has IBT but *not* SHSTK is wrong.

> > > +		return false;
> > > +
> > > +	if (msr->host_initiated)
> > > +		return true;
> > > +
> > > +	if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> > > +	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> > > +		return false;
> > > +
> > > +	if (msr->index == MSR_IA32_PL3_SSP &&
> > > +	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> > I probably asked this long ago, but if I did I since forgot.  Is it really just
> > PL3_SSP that depends on SHSTK?  I would expect all shadow stack MSRs to depend
> > on SHSTK.
> 
> All PL{0,1,2,3}_SSP MSRs plus the INT_SSP_TAB MSR depend on SHSTK. In patch
> 21, I added more MSRs to this helper.

Sure, except that patch 21 never adds handling for PL{0,1,2}_SSP.  I see:

	if (!kvm_cet_user_supported() &&
	    !(kvm_cpu_cap_has(X86_FEATURE_IBT) ||
	      kvm_cpu_cap_has(X86_FEATURE_SHSTK)))
		return false;

	if (msr->host_initiated)
		return true;

	if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
		return false;

	/* The synthetic MSR is for userspace access only. */
	if (msr->index == MSR_KVM_GUEST_SSP)
		return false;

	if (msr->index == MSR_IA32_U_CET)
		return true;

	if (msr->index == MSR_IA32_S_CET)
		return guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
		       kvm_cet_kernel_shstk_supported();

	if (msr->index == MSR_IA32_INT_SSP_TAB)
		return guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
		       kvm_cet_kernel_shstk_supported();

	if (msr->index == MSR_IA32_PL3_SSP &&
	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
		return false;

	mask = (msr->index == MSR_IA32_PL3_SSP) ? XFEATURE_MASK_CET_USER :
						  XFEATURE_MASK_CET_KERNEL;
	return !!(kvm_caps.supported_xss & mask);

Which means that KVM will allow guest accesses to PL{0,1,2}_SSP regardless of
whether or not X86_FEATURE_SHSTK is enumerated to the guest.

And the above is also wrong for host_initiated writes to SHSTK MSRs.  E.g. if KVM
is running on a CPU that has IBT but not SHSTK, then userspace can write to MSRs
that do not exist.

Maybe this confusion is just a symptom of the series not providing proper
Supervisor Shadow Stack support, but that's still a poor excuse for posting
broken code.

I suspect you tried to get too fancy.  I don't see any reason to ever care about
kvm_caps.supported_xss beyond emulating writes to XSS itself.  Just require that
both CET_USER and CET_KERNEL are supported in XSS to allow IBT or SHSTK, i.e. let
X86_FEATURE_IBT and X86_FEATURE_SHSTK speak for themselves.  That way, this can
simply be:

bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
{
> 	if (is_shadow_stack_msr(...)) {
		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
			return false;

		return msr->host_initiated ||
		       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
	}

	if (!kvm_cpu_cap_has(X86_FEATURE_IBT) &&
	    !kvm_cpu_cap_has(X86_FEATURE_SHSTK))
		return false;

	return msr->host_initiated ||
	       guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
	       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
}

> > > + * and reload the guest fpu states before read/write xsaves-managed MSRs.
> > > + */
> > > +static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
> > > +{
> > > +	fpregs_lock_and_load();
> > KVM already has helpers that do exactly this, and they have far better names for
> > KVM: kvm_fpu_get() and kvm_fpu_put().  Can you convert kvm_fpu_get() to
> > fpregs_lock_and_load() and use those instead? And if the extra consistency checks
> > in fpregs_lock_and_load() fire, we definitely want to know, as it means we probably
> > have bugs in KVM.
> 
> Do you want me to do some experiments to make sure the WARN() in
> fpregs_lock_and_load() would be triggered or not?

Yes, though I shouldn't have to clarify that.  The well-documented (as of now)
expectation is that any code that someone posts is tested, unless explicitly
stated otherwise.  I.e. you should not have to ask if you should verify the WARN
doesn't trigger, because you should be doing that for all code you post.

> If no WARN() triggers, then replace fpregs_lock_and_load()/fpregs_unlock()
> with kvm_fpu_get()/kvm_fpu_put()?

Yes.
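
(For illustration, the converted accessor might then look roughly like this,
assuming kvm_fpu_get() is reworked to call fpregs_lock_and_load() internally;
a sketch only:

	static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
	{
		kvm_fpu_get();	/* now backed by fpregs_lock_and_load() */
		rdmsrl(msr_info->index, msr_info->data);
		kvm_fpu_put();
	}
)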

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/21] KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area
  2023-06-26  8:59     ` Yang, Weijiang
@ 2023-06-26 21:20       ` Sean Christopherson
  2023-06-27  3:50         ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-26 21:20 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen

On Mon, Jun 26, 2023, Weijiang Yang wrote:
> 
> On 6/24/2023 6:30 AM, Sean Christopherson wrote:
> > On Thu, May 11, 2023, Yang Weijiang wrote:
> > > Save GUEST_SSP to SMM state save area when guest exits to SMM
> > > due to SMI and restore it VMCS field when guest exits SMM.
> > This fails to answer "Why does KVM need to do this?"
> 
> How about this:
> 
> Guest SMM mode execution is outside the guest kernel; to avoid GUEST_SSP
> corruption, KVM needs to save the current normal-mode GUEST_SSP to the SMRAM
> area so that it can restore the original GUEST_SSP at the end of SMM.

The key point I am looking for is a call out that KVM is emulating architectural
behavior, i.e. that smram->ssp is defined in the SDM and that the documented
behavior of Intel CPUs is that the CPU's current SSP is saved on SMI and loaded
on RSM.  And I specifically say "loaded" and not "restored", because the field
is writable.
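
For illustration, the SMI-save side with the guest CPUID check might look
roughly like this (a sketch, assuming the series' smram->ssp field; the
KVM_BUG_ON() error handling is an assumption, not code from the series):

	if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
		KVM_BUG_ON(kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram->ssp),
			   vcpu->kvm);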

> > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > ---
> > >   arch/x86/kvm/smm.c | 20 ++++++++++++++++++++
> > >   1 file changed, 20 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
> > > index b42111a24cc2..c54d3eb2b7e4 100644
> > > --- a/arch/x86/kvm/smm.c
> > > +++ b/arch/x86/kvm/smm.c
> > > @@ -275,6 +275,16 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
> > >   	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
> > >   	smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
> > > +
> > > +	if (kvm_cet_user_supported()) {
> > This is wrong, KVM should not save/restore state that doesn't exist from the guest's
> > perspective, i.e. this needs to check guest_cpuid_has().
> 
> Yes, the check missed the case that user space disables SHSTK. Will change
> it, thanks!
> 
> > 
> > On a related topic, I would love feedback on my series that adds a framework for
> > features like this, where KVM needs to check guest CPUID as well as host support.
> > 
> > https://lore.kernel.org/all/20230217231022.816138-1-seanjc@google.com
> 
> The framework looks good, will it be merged in kvm_x86?

Yes, I would like to merge it at some point.

> > > @@ -565,6 +575,16 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
> > >   	static_call(kvm_x86_set_interrupt_shadow)(vcpu, 0);
> > >   	ctxt->interruptibility = (u8)smstate->int_shadow;
> > > +	if (kvm_cet_user_supported()) {
> > > +		struct msr_data msr;
> > > +
> > > +		msr.index = MSR_KVM_GUEST_SSP;
> > > +		msr.host_initiated = true;
> > > +		msr.data = smstate->ssp;
> > > +		/* Mimic host_initiated access to bypass ssp access check. */
> > No, masquerading as a host access is all kinds of wrong.  I have no idea what
> > check you're trying to bypass, but whatever it is, it's wrong.  Per the SDM, the
> > SSP field in SMRAM is writable, which means that KVM needs to correctly handle
> > the scenario where SSP holds garbage, e.g. a non-canonical address.
> 
> MSR_KVM_GUEST_SSP is only accessible to user space, e.g., during LM, it's not
> accessible to VM itself. So in kvm_cet_is_msr_accessible(), I added a check to
> tell whether the access is initiated from user space or not, I tried to bypass
> that check. Yes, I will add necessary checks here.
> 
> > 
> > Why can't this use kvm_get_msr() and kvm_set_msr()?
> 
> If my above assumption is correct, these helpers are passed
> host_initiated=false and cannot meet the requirements.

Sorry, I don't follow.  These writes are NOT initiated from the host, i.e.
kvm_get_msr() and kvm_set_msr() do the right thing, unless I'm missing something.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 20/21] KVM:x86: Enable kernel IBT support for guest
  2023-06-26 20:50       ` Sean Christopherson
@ 2023-06-27  1:53         ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-27  1:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/27/2023 4:50 AM, Sean Christopherson wrote:
> On Mon, Jun 26, 2023, Weijiang Yang wrote:
>> On 6/24/2023 8:03 AM, Sean Christopherson wrote:
>>>> @@ -7322,6 +7331,19 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>>    	kvm_wait_lapic_expire(vcpu);
>>>> +	/*
>>>> +	 * Save host MSR_IA32_S_CET so that it can be reloaded at vm_exit.
>>>> +	 * No need to save the other two vmcs fields as supervisor SHSTK
>>>> +	 * are not enabled on Intel platform now.
>>>> +	 */
>>>> +	if (IS_ENABLED(CONFIG_X86_KERNEL_IBT) &&
>>>> +	    (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_CET_STATE)) {
>>>> +		u64 msr;
>>>> +
>>>> +		rdmsrl(MSR_IA32_S_CET, msr);
>>> Reading the MSR on every VM-Enter can't possibly be necessary.  At the absolute
>>> minimum, this could be moved outside of the fastpath; if the kernel modifies S_CET
>>> from NMI context, KVM is hosed.  And *if* S_CET isn't static post-boot, this can
>>> be done in .prepare_switch_to_guest() so long as S_CET isn't modified from IRQ
>>> context.
>> Agree with you.
>>
>>> But unless mine eyes deceive me, S_CET is only truly modified during setup_cet(),
>>> i.e. is static post boot, which means it can be read once at KVM load time, e.g.
>>> just like host_efer.
>> I think handling S_CET like host_efer is possible from a usage perspective,
>> given that currently only kernel IBT is enabled in the kernel. I'll remove the
>> code and initialize the VMCS field once, like host_efer.
>>
>>> The kernel does save/restore IBT when making BIOS calls, but if KVM is running a
>>> vCPU across a BIOS call then we've got bigger issues.
>> What's the problem you're referring to?
> I was pointing out that S_CET isn't strictly constant, as it's saved/modified/restored
> by ibt_save() + ibt_restore().  But KVM should never run between those paired
> functions, so from KVM's perspective the host value is effectively constant.

Yeah, so I think the host S_CET setup can be handled like host_efer, thanks.

>
>>>> +		vmcs_writel(HOST_S_CET, msr);
>>>> +	}
>>>> +
>>>>    	/* The actual VMENTER/EXIT is in the .noinstr.text section. */
>>>>    	vmx_vcpu_enter_exit(vcpu, __vmx_vcpu_run_flags(vmx));
>>>> @@ -7735,6 +7757,13 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
>>>>    	incpt |= !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>>>>    	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, incpt);
>>>> +
>>>> +	/*
>>>> +	 * If IBT is available to guest, then passthrough S_CET MSR too since
>>>> +	 * kernel IBT is already in mainline kernel tree.
>>>> +	 */
>>>> +	incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
>>>> +	vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, incpt);
>>>>    }
>>>>    static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>>> @@ -7805,7 +7834,7 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>>>    	/* Refresh #PF interception to account for MAXPHYADDR changes. */
>>>>    	vmx_update_exception_bitmap(vcpu);
>>>> -	if (kvm_cet_user_supported())
>>>> +	if (kvm_cet_user_supported() || kvm_cpu_cap_has(X86_FEATURE_IBT))
>>> Yeah, kvm_cet_user_supported() simply looks wrong.
>> These are the preconditions for setting up CET MSRs for the guest in
>> vmx_update_intercept_for_cet_msr(); the actual MSR interception control is
>> based on the guest_cpuid_has() results.
> I know.  My point is that with the below combination,
>
> 	kvm_cet_user_supported()		= true
> 	kvm_cpu_cap_has(X86_FEATURE_IBT)	= false
> 	guest_cpuid_has(vcpu, X86_FEATURE_IBT)	= true
>
> KVM will passthrough MSR_IA32_S_CET for guest IBT even though IBT isn't supported
> on the host.
>
> 	incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, incpt);
>
> So either KVM is broken and is passing through S_CET when it shouldn't, or the
> check on kvm_cet_user_supported() is redundant, i.e. the above combination is
> impossible.
>
> Either way, the code *looks* wrong, which is almost as bad as it being functionally
> wrong.

Got your point, I'll refine the related code to make the handling reasonable.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-06-26 21:15       ` Sean Christopherson
@ 2023-06-27  3:32         ` Yang, Weijiang
  2023-06-27 14:55           ` Sean Christopherson
  0 siblings, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-27  3:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson


On 6/27/2023 5:15 AM, Sean Christopherson wrote:
> On Mon, Jun 26, 2023, Weijiang Yang wrote:
>> On 6/24/2023 7:53 AM, Sean Christopherson wrote:
>>> On Thu, May 11, 2023, Yang Weijiang wrote:
>>> Side topic, what on earth does the SDM mean by this?!?
>>>
>>>     The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
>>>     (hardware requires bits 1:0 to be 0).
>>>
>>> I know Intel retroactively changed the alignment requirements, but the above
>>> is nonsensical.  If ucode prevents writing bits 2:0, who cares what hardware
>>> requires?
>> So do I ;-/
> Can you follow-up with someone to get clarification?  If writing bit 2 with '1'
> does not #GP despite the statement that it "must be aligned", then KVM shouldn't
> inject a #GP in that case.

OK, will consult someone and get back to this thread.

>
>>>> +			return 1;
>>>> +		kvm_set_xsave_msr(msr_info);
>>>> +		break;
>>>>    	case MSR_IA32_PERF_CAPABILITIES:
>>>>    		if (data && !vcpu_to_pmu(vcpu)->version)
>>>>    			return 1;
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index b6eec9143129..2e3a39c9297c 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -13630,6 +13630,26 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
>>>> +bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>> +{
>>>> +	if (!kvm_cet_user_supported())
>>> This feels wrong.  KVM should differentiate between SHSTK and IBT in the host.
>>> E.g. if running in a VM with SHSTK but not IBT, or vice versa, KVM should allow
>>> writes to non-existent MSRs.
>> I don't follow you. In this case, on behalf of which part is KVM acting: the
>> guest or user space?
> Sorry, typo.  KVM *shouldn't* allow writes to non-existent MSRs.
>
>>> I.e. this looks wrong:
>>>
>>> 	/*
>>> 	 * If SHSTK and IBT are available in KVM, clear CET user bit in
>>> 	 * kvm_caps.supported_xss so that kvm_cet_user_supported() returns
>>> 	 * false when called.
>>> 	 */
>>> 	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>>> 	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
>>> 		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;
>> The comment is wrong, it should be "are not available in KVM". My intent is,
>> if both features are not available in KVM, then clear the precondition bit so
>> that all dependent checks will fail quickly.
> Checking kvm_caps.supported_xss.CET_USER is worthless in 99% of the cases though.
> Unless I'm missing something, the only time it's useful is for CR4.CET, which
> doesn't differentiate between SHSTK and IBT.  For everything else that KVM cares
> about, at some point KVM needs to precisely check for SHSTK and IBT support
> anyways

I will tweak the patches and do precise checks based on the features available
to the guest.

>>> and by extension, all dependent code is also wrong.  IIRC, there's a virtualization
>>> hole, but I don't see any reason why KVM has to make the hole even bigger.
>> Do you mean the issue that both SHSTK and IBT share one control MSR? i.e.,
>> U_CET/S_CET?
> I mean that passing through PLx_SSP if the host has IBT but *not* SHSTK is wrong.

Understood.

>
>>>> +		return false;
>>>> +
>>>> +	if (msr->host_initiated)
>>>> +		return true;
>>>> +
>>>> +	if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
>>>> +	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
>>>> +		return false;
>>>> +
>>>> +	if (msr->index == MSR_IA32_PL3_SSP &&
>>>> +	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
>>> I probably asked this long ago, but if I did I since forgot.  Is it really just
>>> PL3_SSP that depends on SHSTK?  I would expect all shadow stack MSRs to depend
>>> on SHSTK.
>> All PL{0,1,2,3}_SSP MSRs plus the INT_SSP_TAB MSR depend on SHSTK. In patch
>> 21, I added more MSRs to this helper.
> Sure, except that patch 21 never adds handling for PL{0,1,2}_SSP.  I see:
>
> 	if (!kvm_cet_user_supported() &&
> 	    !(kvm_cpu_cap_has(X86_FEATURE_IBT) ||
> 	      kvm_cpu_cap_has(X86_FEATURE_SHSTK)))
> 		return false;
>
> 	if (msr->host_initiated)
> 		return true;
>
> 	if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> 	    !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> 		return false;
>
> 	/* The synthetic MSR is for userspace access only. */
> 	if (msr->index == MSR_KVM_GUEST_SSP)
> 		return false;
>
> 	if (msr->index == MSR_IA32_U_CET)
> 		return true;
>
> 	if (msr->index == MSR_IA32_S_CET)
> 		return guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> 		       kvm_cet_kernel_shstk_supported();
>
> 	if (msr->index == MSR_IA32_INT_SSP_TAB)
> 		return guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> 		       kvm_cet_kernel_shstk_supported();
>
> 	if (msr->index == MSR_IA32_PL3_SSP &&
> 	    !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
> 		return false;
>
> 	mask = (msr->index == MSR_IA32_PL3_SSP) ? XFEATURE_MASK_CET_USER :
> 						  XFEATURE_MASK_CET_KERNEL;
> 	return !!(kvm_caps.supported_xss & mask);
>
> Which means that KVM will allow guest accesses to PL{0,1,2}_SSP regardless of
> whether or not X86_FEATURE_SHSTK is enumerated to the guest.

Hmm, the check of X86_FEATURE_SHSTK is missing in this case.

>
> And the above is also wrong for host_initiated writes to SHSTK MSRs.  E.g. if KVM
> is running on a CPU that has IBT but not SHSTK, then userspace can write to MSRs
> that do not exist.
>
> Maybe this confusion is just a symptom of the series not providing proper
> Supervisor Shadow Stack support, but that's still a poor excuse for posting
> broken code.
>
> I suspect you tried to get too fancy.  I don't see any reason to ever care about
> kvm_caps.supported_xss beyond emulating writes to XSS itself.  Just require that
> both CET_USER and CET_KERNEL are supported in XSS to allow IBT or SHSTK, i.e. let
> X86_FEATURE_IBT and X86_FEATURE_SHSTK speak for themselves.  That way, this can
> simply be:

You're right, kvm_cet_user_supported() is overused.

Let me recap to see if I understand correctly:

1. Check that both CET_USER and CET_KERNEL are supported in XSS before
advertising SHSTK support in KVM and exposing it to the guest; the reason is
that once SHSTK is exposed to the guest, KVM should support both modes to
honor architectural integrity.

2. Check that CET_USER is supported before advertising IBT support in KVM and
exposing IBT; the reason is that user IBT (MSR_U_CET) depends on the CET_USER
bit while kernel IBT (MSR_S_CET) doesn't.

>
> bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
> {
> 	if (is_shadow_stack_msr(...)) {
> 		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> 			return false;
>
> 		return msr->host_initiated ||
> 		       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> 	}
>
> 	if (!kvm_cpu_cap_has(X86_FEATURE_IBT) &&
> 	    !kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> 		return false;

Move above checks to the beginning?

>
> 	return msr->host_initiated ||
> 	       guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> 	       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> }
>
>>>> + * and reload the guest fpu states before read/write xsaves-managed MSRs.
>>>> + */
>>>> +static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
>>>> +{
>>>> +	fpregs_lock_and_load();
>>> KVM already has helpers that do exactly this, and they have far better names for
>>> KVM: kvm_fpu_get() and kvm_fpu_put().  Can you convert kvm_fpu_get() to
>>> fpregs_lock_and_load() and use those isntead? And if the extra consistency checks
>>> in fpregs_lock_and_load() fire, we definitely want to know, as it means we probably
>>> have bugs in KVM.
>> Do you want me to do some experiments to make sure the WARN() in
>> fpregs_lock_and_load() would be triggered or not?
> Yes, though I shouldn't have to clarify that.  The well-documented (as of now)
> expectation is that any code that someone posts is tested, unless explicitly
> stated otherwise.  I.e. you should not have to ask if you should verify the WARN
> doesn't trigger, because you should be doing that for all code you post.

Sure, I will run tests on the change.

>
>> If no WARN() triggers, then replace fpregs_lock_and_load()/fpregs_unlock()
>> with kvm_fpu_get()/kvm_fpu_put()?
> Yes.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/21] KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area
  2023-06-26 21:20       ` Sean Christopherson
@ 2023-06-27  3:50         ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-27  3:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen


On 6/27/2023 5:20 AM, Sean Christopherson wrote:
> On Mon, Jun 26, 2023, Weijiang Yang wrote:
>> On 6/24/2023 6:30 AM, Sean Christopherson wrote:
>>> On Thu, May 11, 2023, Yang Weijiang wrote:
>>>> Save GUEST_SSP to SMM state save area when guest exits to SMM
>>>> due to SMI and restore it VMCS field when guest exits SMM.
>>> This fails to answer "Why does KVM need to do this?"
>> How about this:
>>
>> Guest SMM mode execution is outside the guest kernel; to avoid GUEST_SSP
>> corruption, KVM needs to save the current normal-mode GUEST_SSP to the SMRAM
>> area so that it can restore the original GUEST_SSP at the end of SMM.
> The key point I am looking for is a call out that KVM is emulating architectural
> behavior, i.e. that smram->ssp is defined in the SDM and that the documented
> behavior of Intel CPUs is that the CPU's current SSP is saved on SMI and loaded
> on RSM.  And I specifically say "loaded" and not "restored", because the field
> is writable.

OK, will incorporate these points.
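
For illustration, the RSM side via the normal MSR path might look roughly like
the following (a sketch; failing the emulation on a garbage SSP is my reading
of the requirement, not code from the series):

	if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
	    kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smstate->ssp))
		return X86EMUL_UNHANDLEABLE;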

>
>>>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>>>> ---
>>>>    arch/x86/kvm/smm.c | 20 ++++++++++++++++++++
>>>>    1 file changed, 20 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
>>>> index b42111a24cc2..c54d3eb2b7e4 100644
>>>> --- a/arch/x86/kvm/smm.c
>>>> +++ b/arch/x86/kvm/smm.c
>>>> @@ -275,6 +275,16 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
>>>>    	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
>>>>    	smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
>>>> +
>>>> +	if (kvm_cet_user_supported()) {
>>> This is wrong, KVM should not save/restore state that doesn't exist from the guest's
>>> perspective, i.e. this needs to check guest_cpuid_has().
>> Yes, the check missed the case that user space disables SHSTK. Will change
>> it, thanks!
>>
>>> On a related topic, I would love feedback on my series that adds a framework for
>>> features like this, where KVM needs to check guest CPUID as well as host support.
>>>
>>> https://lore.kernel.org/all/20230217231022.816138-1-seanjc@google.com
>> The framework looks good, will it be merged in kvm_x86?
> Yes, I would like to merge it at some point.
>
>>>> @@ -565,6 +575,16 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
>>>>    	static_call(kvm_x86_set_interrupt_shadow)(vcpu, 0);
>>>>    	ctxt->interruptibility = (u8)smstate->int_shadow;
>>>> +	if (kvm_cet_user_supported()) {
>>>> +		struct msr_data msr;
>>>> +
>>>> +		msr.index = MSR_KVM_GUEST_SSP;
>>>> +		msr.host_initiated = true;
>>>> +		msr.data = smstate->ssp;
>>>> +		/* Mimic host_initiated access to bypass ssp access check. */
>>> No, masquerading as a host access is all kinds of wrong.  I have no idea what
>>> check you're trying to bypass, but whatever it is, it's wrong.  Per the SDM, the
>>> SSP field in SMRAM is writable, which means that KVM needs to correctly handle
>>> the scenario where SSP holds garbage, e.g. a non-canonical address.
>> MSR_KVM_GUEST_SSP is only accessible to user space, e.g., during LM, it's not
>> accessible to VM itself. So in kvm_cet_is_msr_accessible(), I added a check to
>> tell whether the access is initiated from user space or not, I tried to bypass
>> that check. Yes, I will add necessary checks here.
>>
>>> Why can't this use kvm_get_msr() and kvm_set_msr()?
>> If my above assumption is correct, these helpers are passed
>> host_initiated=false and cannot meet the requirements.
> Sorry, I don't follow.  These writes are NOT initiated from the host, i.e.
> kvm_get_msr() and kvm_set_msr() do the right thing, unless I'm missing something.

In this series, in patch 14, I added the below check:

+	/* The synthetic MSR is for userspace access only. */
+	if (msr->index == MSR_KVM_GUEST_SSP)
+		return false;

If kvm_get_msr() or kvm_set_msr() is used (host_initiated=false), it'll hit
this check and fail to write the MSR. But there's another check at the
beginning of kvm_cet_is_msr_accessible():

+	if (msr->host_initiated)
+		return true;

I thought to use host_initiated = true to bypass the former check. Now that
the helper is going to be overhauled, this is no longer an issue.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-06-27  3:32         ` Yang, Weijiang
@ 2023-06-27 14:55           ` Sean Christopherson
  2023-06-28  1:42             ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-27 14:55 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson

On Tue, Jun 27, 2023, Weijiang Yang wrote:
> 
> On 6/27/2023 5:15 AM, Sean Christopherson wrote:
> > And the above is also wrong for host_initiated writes to SHSTK MSRs.  E.g. if KVM
> > is running on a CPU that has IBT but not SHSTK, then userspace can write to MSRs
> > that do not exist.
> > 
> > Maybe this confusion is just a symptom of the series not providing proper
> > Supervisor Shadow Stack support, but that's still a poor excuse for posting
> > broken code.
> > 
> > I suspect you tried to get too fancy.  I don't see any reason to ever care about
> > kvm_caps.supported_xss beyond emulating writes to XSS itself.  Just require that
> > both CET_USER and CET_KERNEL are supported in XSS to allow IBT or SHSTK, i.e. let
> > X86_FEATURE_IBT and X86_FEATURE_SHSTK speak for themselves.  That way, this can
> > simply be:
> 
> You're right, kvm_cet_user_supported() is overused.
> 
> Let me recap to see if I understand correctly:
> 
> 1. Check that both CET_USER and CET_KERNEL are supported in XSS before
> advertising SHSTK support in KVM and exposing it to the guest; the reason is
> that once SHSTK is exposed to the guest, KVM should support both modes to
> honor architectural integrity.
> 
> 2. Check that CET_USER is supported before advertising IBT support in KVM and
> exposing IBT; the reason is that user IBT (MSR_U_CET) depends on the CET_USER
> bit while kernel IBT (MSR_S_CET) doesn't.

IBT can also be used by the kernel...

Just require that both CET_USER and CET_KERNEL are supported to advertise IBT
or SHSTK.  I don't see why this needs to be any more complex than that.
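
A minimal sketch of that policy (illustrative; the mask macro is my own name,
and this would sit wherever KVM finalizes kvm_caps.supported_xss and its CPU
caps):

	#define KVM_CET_XSS	(XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)

	if ((kvm_caps.supported_xss & KVM_CET_XSS) != KVM_CET_XSS) {
		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
		kvm_cpu_cap_clear(X86_FEATURE_IBT);
	}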

> > bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > {
> > 	if (is_shadow_stack_msr(...)) {
> > 		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> > 			return false;
> > 
> > 		return msr->host_initiated ||
> > 		       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> > 	}
> > 
> > 	if (!kvm_cpu_cap_has(X86_FEATURE_IBT) &&
> > 	    !kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> > 		return false;
> 
> Move above checks to the beginning?

Why?  The is_shadow_stack_msr() would still have to recheck X86_FEATURE_SHSTK,
so hoisting the checks to the top would be doing unnecessary work.

> > 	return msr->host_initiated ||
> > 	       guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> > 	       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> > }

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-06-27 14:55           ` Sean Christopherson
@ 2023-06-28  1:42             ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-28  1:42 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson


On 6/27/2023 10:55 PM, Sean Christopherson wrote:
> On Tue, Jun 27, 2023, Weijiang Yang wrote:
>> On 6/27/2023 5:15 AM, Sean Christopherson wrote:
>>> And the above is also wrong for host_initiated writes to SHSTK MSRs.  E.g. if KVM
>>> is running on a CPU that has IBT but not SHSTK, then userspace can write to MSRs
>>> that do not exist.
>>>
>>> Maybe this confusion is just a symptom of the series not providing proper
>>> Supervisor Shadow Stack support, but that's still a poor excuse for posting
>>> broken code.
>>>
>>> I suspect you tried to get too fancy.  I don't see any reason to ever care about
>>> kvm_caps.supported_xss beyond emulating writes to XSS itself.  Just require that
>>> both CET_USER and CET_KERNEL are supported in XSS to allow IBT or SHSTK, i.e. let
>>> X86_FEATURE_IBT and X86_FEATURE_SHSTK speak for themselves.  That way, this can
>>> simply be:
>> You're right, kvm_cet_user_supported() is overused.
>>
>> Let me recap to see if I understand correctly:
>>
>> 1. Check that both CET_USER and CET_KERNEL are supported in XSS before
>> advertising SHSTK support in KVM and exposing it to the guest; the reason is
>> that once SHSTK is exposed to the guest, KVM should support both modes to
>> honor architectural integrity.
>>
>> 2. Check that CET_USER is supported before advertising IBT support in KVM and
>> exposing IBT; the reason is that user IBT (MSR_U_CET) depends on the CET_USER
>> bit while kernel IBT (MSR_S_CET) doesn't.
> IBT can also be used by the kernel...
>
> Just require that both CET_USER and CET_KERNEL are supported to advertise IBT
> or SHSTK.  I don't see why this needs to be any more complex than that.

The arch control for user/kernel mode CET is the big source of complexity in
the helpers. Currently, the CET_USER bit manages IA32_U_CET and IA32_PL3_SSP,
and the CET_KERNEL bit manages PL{0,1,2}_SSP, but the architectural
control/enable of IBT (user or kernel) is through IA32_{U,S}_CET; the former
is XSAVE-managed, but the latter is not.

Checking both before enabling the features would make things much easier, but
a CET_KERNEL check for kernel IBT looks excessive; I just want to get your
opinion on this. Thanks!

>
>>> bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>> {
>>> 	if (is_shadow_stack_msr(...)) {
>>> 		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
>>> 			return false;
>>>
>>> 		return msr->host_initiated ||
>>> 		       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>>> 	}
>>>
>>> 	if (!kvm_cpu_cap_has(X86_FEATURE_IBT) &&
>>> 	    !kvm_cpu_cap_has(X86_FEATURE_SHSTK))
>>> 		return false;
>> Move above checks to the beginning?
> Why?  The is_shadow_stack_msr() would still have to recheck X86_FEATURE_SHSTK,
> so hoisting the checks to the top would be doing unnecessary work.

Yeah, I was just considering the change from a readability perspective, but
it does introduce an unnecessary check. Will follow your suggestion.

>
>>> 	return msr->host_initiated ||
>>> 	       guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
>>> 	       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>>> }

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-16 18:57           ` Sean Christopherson
  2023-06-19  9:28             ` Yang, Weijiang
@ 2023-06-30  9:34             ` Yang, Weijiang
  2023-06-30 10:27               ` Chao Gao
  2023-06-30 15:07               ` Sean Christopherson
  1 sibling, 2 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-30  9:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, gil.neiger


On 6/17/2023 2:57 AM, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> On 6/16/2023 7:58 AM, Sean Christopherson wrote:
>>> On Thu, Jun 08, 2023, Weijiang Yang wrote:
>>>> On 6/6/2023 5:08 PM, Chao Gao wrote:
>>>>> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>>>>>> Add handling for Control Protection (#CP) exceptions(vector 21).
>>>>>> The new vector is introduced for Intel's Control-Flow Enforcement
>>>>>> Technology (CET) relevant violation cases.
>>>>>>
>>>>>> Although #CP belongs contributory exception class, but the actual
>>>>>> effect is conditional on CET being exposed to guest. If CET is not
>>>>>> available to guest, #CP falls back to non-contributory and doesn't
>>>>>> have an error code.
>>>>> This sounds weird. is this the hardware behavior? If yes, could you
>>>>> point us to where this behavior is documented?
>>>> It's not SDM documented behavior.
>>> The #CP behavior needs to be documented.  Please pester whoever you need to in
>>> order to make that happen.
>> Do you mean documentation for #CP as an generic exception or the behavior in
>> KVM as this patch shows?
> As I pointed out two *years* ago, this entry in the SDM
>
>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>      holds: (1) the interruption type is hardware exception; (2) bit 0
>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>      #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>
> needs to read something like
>
>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>      holds: (1) the interruption type is hardware exception; (2) bit 0
>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>      #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>
>      [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>          support for the 1-setting of CR4.CET.

Hi, Sean,

I sent the above change request to Gil (added in cc), but he shared a
different opinion on this issue:


"It is the case that all CET-capable parts enumerate IA32_VMX_BASIC[56] 
as 1.

  However, there were earlier parts without CET that enumerated 
IA32_VMX_BASIC[56] as 0.

  On those parts, an attempt to inject an exception with vector 21 (#CP) 
with an error code would fail.

(Injection of exception 21 with no error code would be allowed.)

  It may make things clearer if we document the statement above (all 
CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).

I will see if we can update future revisions of the SDM to clarify this."


Then if this is the case, KVM needs to check IA32_VMX_BASIC[56] before
injecting the exception into a nested VM.

This patch could then be removed and replaced with another patch like the one
below:

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ad35355ee43e..6b33aacc8587 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1076,6 +1076,7 @@
  #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
  #define VMX_BASIC_MEM_TYPE_WB    6LLU
  #define VMX_BASIC_INOUT        0x0040000000000000LLU
+#define VMX_BASIC_CHECK_ERRCODE    0x0100000000000000LLU

  /* Resctrl MSRs: */
  /* - Intel: */
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 85cffeae7f10..4b1ed4dc03bc 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -79,6 +79,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
      return    (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
  }

+static inline bool cpu_has_vmx_basic_check_errcode(void)
+{
+    return    (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_CHECK_ERRCODE);
+}
+
  static inline bool cpu_has_virtual_nmis(void)
  {
      return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 78524daa2cb2..92aa4fc3d233 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1227,9 +1227,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
  {
      const u64 feature_and_reserved =
          /* feature (except bit 48; see below) */
-        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
+        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
          /* reserved */
-        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
+        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
      u64 vmx_basic = vmcs_config.nested.basic;

      if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
@@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
          should_have_error_code =
              intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
              x86_exception_has_error_code(vector);
-        if (CC(has_error_code != should_have_error_code))
+        if (!cpu_has_vmx_basic_check_errcode() &&
+            CC(has_error_code != should_have_error_code))
              return -EINVAL;

          /* VM-entry exception error code */
@@ -6986,6 +6987,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)

      if (cpu_has_vmx_basic_inout())
          msrs->basic |= VMX_BASIC_INOUT;
+    if (cpu_has_vmx_basic_check_errcode())
+        msrs->basic |= VMX_BASIC_CHECK_ERRCODE;
  }

  static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d70f2e94b187..95c0eab7805c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2748,7 +2748,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
      rdmsrl(MSR_IA32_VMX_MISC, misc_msr);

      vmcs_conf->size = vmx_msr_high & 0x1fff;
-    vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
+    vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;

      vmcs_conf->revision_id = vmx_msr_low;



^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-30  9:34             ` Yang, Weijiang
@ 2023-06-30 10:27               ` Chao Gao
  2023-06-30 12:05                 ` Yang, Weijiang
  2023-06-30 15:07               ` Sean Christopherson
  1 sibling, 1 reply; 99+ messages in thread
From: Chao Gao @ 2023-06-30 10:27 UTC (permalink / raw)
  To: Yang, Weijiang
  Cc: Sean Christopherson, pbonzini, kvm, linux-kernel, peterz, rppt,
	binbin.wu, rick.p.edgecombe, john.allen, gil.neiger

On Fri, Jun 30, 2023 at 05:34:28PM +0800, Yang, Weijiang wrote:
>
>On 6/17/2023 2:57 AM, Sean Christopherson wrote:
>> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> > On 6/16/2023 7:58 AM, Sean Christopherson wrote:
>> > > On Thu, Jun 08, 2023, Weijiang Yang wrote:
>> > > > On 6/6/2023 5:08 PM, Chao Gao wrote:
>> > > > > On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>> > > > > > Add handling for Control Protection (#CP) exceptions(vector 21).
>> > > > > > The new vector is introduced for Intel's Control-Flow Enforcement
>> > > > > > Technology (CET) relevant violation cases.
>> > > > > > 
>> > > > > > Although #CP belongs contributory exception class, but the actual
>> > > > > > effect is conditional on CET being exposed to guest. If CET is not
>> > > > > > available to guest, #CP falls back to non-contributory and doesn't
>> > > > > > have an error code.
>> > > > > This sounds weird. is this the hardware behavior? If yes, could you
>> > > > > point us to where this behavior is documented?
>> > > > It's not SDM documented behavior.
>> > > The #CP behavior needs to be documented.  Please pester whoever you need to in
>> > > order to make that happen.
>> > Do you mean documentation for #CP as an generic exception or the behavior in
>> > KVM as this patch shows?
>> As I pointed out two *years* ago, this entry in the SDM
>> 
>>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>      holds: (1) the interruption type is hardware exception; (2) bit 0
>>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>      #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>> 
>> needs to read something like
>> 
>>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>      holds: (1) the interruption type is hardware exception; (2) bit 0
>>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>      #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>> 
>>      [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>>          support for the 1-setting of CR4.CET.
>
>Hi, Sean,
>
>I sent the above change request to Gil (added in Cc), but he shared a different
>opinion on this issue:
>
>
>"It is the case that all CET-capable parts enumerate IA32_VMX_BASIC[56] as 1.
>
> However, there were earlier parts without CET that enumerated
>IA32_VMX_BASIC[56] as 0.
>
> On those parts, an attempt to inject an exception with vector 21 (#CP) with
>an error code would fail.
>
>(Injection of exception 21 with no error code would be allowed.)
>
> It may make things clearer if we document the statement above (all
>CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
>
>I will see if we can update future revisions of the SDM to clarify this."
>
>
>Then if this is the case, KVM needs to check IA32_VMX_BASIC[56] before
>injecting exceptions into a nested VM.

And KVM can hide CET from guests if IA32_VMX_BASIC[56] is 0.

>
>And this patch could be removed; instead we need another patch like the one below:
>
>diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>index ad35355ee43e..6b33aacc8587 100644
>--- a/arch/x86/include/asm/msr-index.h
>+++ b/arch/x86/include/asm/msr-index.h
>@@ -1076,6 +1076,7 @@
> #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
> #define VMX_BASIC_MEM_TYPE_WB    6LLU
> #define VMX_BASIC_INOUT        0x0040000000000000LLU
>+#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU
>
> /* Resctrl MSRs: */
> /* - Intel: */
>diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
>index 85cffeae7f10..4b1ed4dc03bc 100644
>--- a/arch/x86/kvm/vmx/capabilities.h
>+++ b/arch/x86/kvm/vmx/capabilities.h
>@@ -79,6 +79,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
>     return    (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
> }
>
>+static inline bool cpu_has_vmx_basic_check_errcode(void)
>+{
>+    return    (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_CHECK_ERRCODE);
>+}
>+
> static inline bool cpu_has_virtual_nmis(void)
> {
>     return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
>diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>index 78524daa2cb2..92aa4fc3d233 100644
>--- a/arch/x86/kvm/vmx/nested.c
>+++ b/arch/x86/kvm/vmx/nested.c
>@@ -1227,9 +1227,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
> {
>     const u64 feature_and_reserved =
>         /* feature (except bit 48; see below) */
>-        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
>+        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
>         /* reserved */
>-        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
>+        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
>     u64 vmx_basic = vmcs_config.nested.basic;
>
>     if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
>@@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
>         should_have_error_code =
>             intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>             x86_exception_has_error_code(vector);
>-        if (CC(has_error_code != should_have_error_code))
>+        if (!cpu_has_vmx_basic_check_errcode() &&

We can skip computing should_have_error_code, and we should check whether
IA32_VMX_BASIC[56] is set for this vCPU (i.e. in vmx->nested.msrs.basic)
rather than the host/KVM capability.
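
For illustration, a minimal sketch of such a per-vCPU helper (the helper name is
hypothetical; it assumes the VMX_BASIC_CHECK_ERRCODE define from the diff above
and KVM's existing to_vmx() accessor):

	static inline bool nested_cpu_has_vmx_basic_check_errcode(struct kvm_vcpu *vcpu)
	{
		/* Consult the vCPU's virtual IA32_VMX_BASIC[56], not the host MSR. */
		return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_CHECK_ERRCODE;
	}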

>+            CC(has_error_code != should_have_error_code))
>             return -EINVAL;
>
>         /* VM-entry exception error code */
>@@ -6986,6 +6987,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
>
>     if (cpu_has_vmx_basic_inout())
>         msrs->basic |= VMX_BASIC_INOUT;
>+    if (cpu_has_vmx_basic_check_errcode())
>+        msrs->basic |= VMX_BASIC_CHECK_ERRCODE;
> }
>
> static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index d70f2e94b187..95c0eab7805c 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -2748,7 +2748,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>     rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
>
>     vmcs_conf->size = vmx_msr_high & 0x1fff;
>-    vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
>+    vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;
>
>     vmcs_conf->revision_id = vmx_msr_low;
>
>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-30 10:27               ` Chao Gao
@ 2023-06-30 12:05                 ` Yang, Weijiang
  2023-06-30 15:05                   ` Neiger, Gil
  0 siblings, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-06-30 12:05 UTC (permalink / raw)
  To: Chao Gao
  Cc: Sean Christopherson, pbonzini, kvm, linux-kernel, peterz, rppt,
	binbin.wu, rick.p.edgecombe, john.allen, gil.neiger


On 6/30/2023 6:27 PM, Chao Gao wrote:
> On Fri, Jun 30, 2023 at 05:34:28PM +0800, Yang, Weijiang wrote:
>> On 6/17/2023 2:57 AM, Sean Christopherson wrote:
>>> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>>>> On 6/16/2023 7:58 AM, Sean Christopherson wrote:
>>>>> On Thu, Jun 08, 2023, Weijiang Yang wrote:
>>>>>> On 6/6/2023 5:08 PM, Chao Gao wrote:
>>>>>>> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>>>>>>>> Add handling for Control Protection (#CP) exceptions (vector 21).
>>>>>>>> The new vector is introduced for Intel's Control-Flow Enforcement
>>>>>>>> Technology (CET) relevant violation cases.
>>>>>>>>
>>>>>>>> Although #CP belongs to the contributory exception class, the actual
>>>>>>>> effect is conditional on CET being exposed to the guest. If CET is not
>>>>>>>> available to the guest, #CP falls back to non-contributory and doesn't
>>>>>>>> have an error code.
>>>>>>> This sounds weird. Is this the hardware behavior? If yes, could you
>>>>>>> point us to where this behavior is documented?
>>>>>> It's not SDM-documented behavior.
>>>>> The #CP behavior needs to be documented.  Please pester whoever you need to in
>>>>> order to make that happen.
>>>> Do you mean documentation for #CP as a generic exception, or for the behavior in
>>>> KVM as this patch shows?
>>> As I pointed out two *years* ago, this entry in the SDM
>>>
>>>     — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>>       holds: (1) the interruption type is hardware exception; (2) bit 0
>>>       (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>>       (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>>       indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>>       #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>>>
>>> needs to read something like
>>>
>>>     — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>>       holds: (1) the interruption type is hardware exception; (2) bit 0
>>>       (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>>       (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>>       indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>>       #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>>>
>>>       [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>>>           support for the 1-setting of CR4.CET.
>> Hi, Sean,
>>
>> I sent the above change request to Gil (added in Cc), but he shared a different
>> opinion on this issue:
>>
>>
>> "It is the case that all CET-capable parts enumerate IA32_VMX_BASIC[56] as 1.
>>
>>   However, there were earlier parts without CET that enumerated
>> IA32_VMX_BASIC[56] as 0.
>>
>>   On those parts, an attempt to inject an exception with vector 21 (#CP) with
>> an error code would fail.
>>
>> (Injection of exception 21 with no error code would be allowed.)
>>
>>   It may make things clearer if we document the statement above (all
>> CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
>>
>> I will see if we can update future revisions of the SDM to clarify this."
>>
>>
>> Then if this is the case, KVM needs to check IA32_VMX_BASIC[56] before
>> injecting exceptions into a nested VM.
> And KVM can hide CET from guests if IA32_VMX_BASIC[56] is 0.

Yes, this scratch patch didn't cover the cross-check with CET enabling, thanks!

>
>> And this patch could be removed; instead we need another patch like the one below:
>>
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index ad35355ee43e..6b33aacc8587 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1076,6 +1076,7 @@
>>   #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
>>   #define VMX_BASIC_MEM_TYPE_WB    6LLU
>>   #define VMX_BASIC_INOUT        0x0040000000000000LLU
>> +#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU
>>
>>   /* Resctrl MSRs: */
>>   /* - Intel: */
>> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
>> index 85cffeae7f10..4b1ed4dc03bc 100644
>> --- a/arch/x86/kvm/vmx/capabilities.h
>> +++ b/arch/x86/kvm/vmx/capabilities.h
>> @@ -79,6 +79,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
>>       return    (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
>>   }
>>
>> +static inline bool cpu_has_vmx_basic_check_errcode(void)
>> +{
>> +    return    (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_CHECK_ERRCODE);
>> +}
>> +
>>   static inline bool cpu_has_virtual_nmis(void)
>>   {
>>       return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 78524daa2cb2..92aa4fc3d233 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -1227,9 +1227,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
>>   {
>>       const u64 feature_and_reserved =
>>           /* feature (except bit 48; see below) */
>> -        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
>> +        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
>>           /* reserved */
>> -        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
>> +        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
>>       u64 vmx_basic = vmcs_config.nested.basic;
>>
>>       if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
>> @@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
>>           should_have_error_code =
>>               intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>>               x86_exception_has_error_code(vector);
>> -        if (CC(has_error_code != should_have_error_code))
>> +        if (!cpu_has_vmx_basic_check_errcode() &&
> We can skip computing should_have_error_code, and we should check whether
> IA32_VMX_BASIC[56] is set for this vCPU (i.e. in vmx->nested.msrs.basic)
> rather than the host/KVM capability.

Oops, I confused myself. Yes, I need to reshape the code a bit and use
msrs.basic to check the bit status, thanks!

>
>> +            CC(has_error_code != should_have_error_code))
>>               return -EINVAL;
>>
>>           /* VM-entry exception error code */
>> @@ -6986,6 +6987,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
>>
>>       if (cpu_has_vmx_basic_inout())
>>           msrs->basic |= VMX_BASIC_INOUT;
>> +    if (cpu_has_vmx_basic_check_errcode())
>> +        msrs->basic |= VMX_BASIC_CHECK_ERRCODE;
>>   }
>>
>>   static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index d70f2e94b187..95c0eab7805c 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -2748,7 +2748,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>>       rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
>>
>>       vmcs_conf->size = vmx_msr_high & 0x1fff;
>> -    vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
>> +    vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;
>>
>>       vmcs_conf->revision_id = vmx_msr_low;
>>
>>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-30 12:05                 ` Yang, Weijiang
@ 2023-06-30 15:05                   ` Neiger, Gil
  2023-06-30 15:15                     ` Sean Christopherson
  2023-07-01  1:54                     ` Yang, Weijiang
  0 siblings, 2 replies; 99+ messages in thread
From: Neiger, Gil @ 2023-06-30 15:05 UTC (permalink / raw)
  To: Yang, Weijiang, Gao, Chao
  Cc: Christopherson, Sean, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	Edgecombe, Rick P, john.allen

Intel will not produce any CPU with CET that does not enumerate IA32_VMX_BASIC[56] as 1.

One can check that bit before injecting a #CP with error code, but it should not be necessary if CET is enumerated.

Of course, if KVM runs as a guest of another VMM/hypervisor, the virtual CPU in which KVM operates may enumerate CET but clear the bit in IA32_VMX_BASIC.

				- Gil

-----Original Message-----
From: Yang, Weijiang <weijiang.yang@intel.com> 
Sent: Friday, June 30, 2023 05:05
To: Gao, Chao <chao.gao@intel.com>
Cc: Christopherson, Sean <seanjc@google.com>; pbonzini@redhat.com; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; peterz@infradead.org; rppt@kernel.org; binbin.wu@linux.intel.com; Edgecombe, Rick P <rick.p.edgecombe@intel.com>; john.allen@amd.com; Neiger, Gil <gil.neiger@intel.com>
Subject: Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification


[...]

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-30  9:34             ` Yang, Weijiang
  2023-06-30 10:27               ` Chao Gao
@ 2023-06-30 15:07               ` Sean Christopherson
  2023-06-30 15:21                 ` Neiger, Gil
  2023-07-01  1:57                 ` Yang, Weijiang
  1 sibling, 2 replies; 99+ messages in thread
From: Sean Christopherson @ 2023-06-30 15:07 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, gil.neiger

On Fri, Jun 30, 2023, Weijiang Yang wrote:
> 
> On 6/17/2023 2:57 AM, Sean Christopherson wrote:
> > > Do you mean documentation for #CP as a generic exception, or for the behavior in
> > > KVM as this patch shows?
> > As I pointed out two *years* ago, this entry in the SDM
> > 
> >    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
> >      holds: (1) the interruption type is hardware exception; (2) bit 0
> >      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
> >      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
> >      indicates one of the following exceptions: #DF (vector 8), #TS (10),
> >      #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
> > 
> > needs to read something like
> > 
> >    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
> >      holds: (1) the interruption type is hardware exception; (2) bit 0
> >      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
> >      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
> >      indicates one of the following exceptions: #DF (vector 8), #TS (10),
> >      #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
> > 
> >      [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
> >          support for the 1-setting of CR4.CET.
> 
> Hi, Sean,
> 
> I sent the above change request to Gil (added in Cc), but he shared a different
> opinion on this issue:

Heh, "opinion".

>  It may make things clearer if we document the statement above (all
> CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
> 
> I will see if we can update future revisions of the SDM to clarify this."

That would be helpful.  Though to be perfectly honest, I simply overlooked the
existence of IA32_VMX_BASIC[56].

Thanks!

> Then if this is the case, KVM needs to check IA32_VMX_BASIC[56] before
> injecting exceptions into a nested VM.
> 
> And this patch could be removed; instead we need another patch like the one below:
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index ad35355ee43e..6b33aacc8587 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1076,6 +1076,7 @@
>  #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
>  #define VMX_BASIC_MEM_TYPE_WB    6LLU
>  #define VMX_BASIC_INOUT        0x0040000000000000LLU
> +#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU

"Check Error Code" isn't a great description.  The flag enumerates that there the
CPU does *not* perform consistency checks on the error code when injecting hardware
exceptions.

So something like this?

  VMX_BASIC_NO_HW_ERROR_CODE_CC

or maybe

  VMX_BASIC_PM_NO_HW_ERROR_CODE_CC

if we want to capture that only protected mode is exempt (I personally prefer
just VMX_BASIC_NO_HW_ERROR_CODE_CC as "PM" is a bit ambiguous).

> @@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
>          should_have_error_code =
>              intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>              x86_exception_has_error_code(vector);
> -        if (CC(has_error_code != should_have_error_code))
> +        if (!cpu_has_vmx_basic_check_errcode() &&
> +            CC(has_error_code != should_have_error_code))

This is wrong on multiple fronts:

  1. The new feature flag only exempts hardware exceptions delivered to guests
     with CR0.PE=1.  The above will skip the consistency check for all event injection.

  2. KVM needs to check the CPU model that is exposed to L1, not the capabilities
     of the host CPU.

Highlighting the key phrases in the SDM:

  The field's deliver-error-code bit (bit 11) is 1 if each of the following holds: (1) the interruption type is
                                                      ^^^^^^^
  hardware exception; (2) bit 0 (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
  (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector indicates one of the following
  exceptions: #DF (vector 8), #TS (10), #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
  
  The field's deliver-error-code bit is 0 if any of the following holds: (1) the interruption type is not hardware
                                             ^^^^^^
  exception; (2) bit 0 is clear in the CR0 field in the guest-state area; or (3) IA32_VMX_BASIC[56] is read as
  0 and the vector is in one of the following ranges: 0–7, 9, 15, 16, or 18–31.

I think what we want is:

		/* VM-entry interruption-info field: deliver error code */
		if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
		    !nested_cpu_has_no_hw_error_code_cc(vcpu)) {
			should_have_error_code =
				intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
				x86_exception_has_error_code(vector);
			if (CC(has_error_code != should_have_error_code))
				return -EINVAL;
		}
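
For reference, nested_cpu_has_no_hw_error_code_cc() doesn't exist yet; a minimal
sketch, assuming the VMX_BASIC_NO_HW_ERROR_CODE_CC name proposed above and that
the flag lands in the vCPU's virtual IA32_VMX_BASIC value, could look like:

	static inline bool nested_cpu_has_no_hw_error_code_cc(struct kvm_vcpu *vcpu)
	{
		/* Bit 56 of the virtual IA32_VMX_BASIC MSR exposed to L1. */
		return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
	}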

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-30 15:05                   ` Neiger, Gil
@ 2023-06-30 15:15                     ` Sean Christopherson
  2023-07-01  1:58                       ` Yang, Weijiang
  2023-07-01  1:54                     ` Yang, Weijiang
  1 sibling, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-06-30 15:15 UTC (permalink / raw)
  To: Gil Neiger
  Cc: Weijiang Yang, Chao Gao, pbonzini, kvm, linux-kernel, peterz,
	rppt, binbin.wu, Rick P Edgecombe, john.allen

On Fri, Jun 30, 2023, Gil Neiger wrote:
> Intel will not produce any CPU with CET that does not enumerate IA32_VMX_BASIC[56] as 1.
> 
> One can check that bit before injecting a #CP with error code, but it should
> not be necessary if CET is enumerated.
> 
> Of course, if KVM runs as a guest of another VMM/hypervisor, the virtual CPU
> in which KVM operates may enumerate CET but clear the bit in IA32_VMX_BASIC.

Yeah, I think KVM should be paranoid and expose CET to the guest if and only if
IA32_VMX_BASIC[56] is 1.  That'll also help validate nested support, e.g. it will
make it more obvious if userspace+KVM provides a "bad" model to L1.
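
A minimal sketch of that paranoia (the capability helper name is a placeholder
patterned on the scratch patch earlier in the thread; kvm_cpu_cap_clear() is
KVM's existing mechanism for hiding features):

	/*
	 * Don't advertise CET if the CPU, or the VMM underneath KVM, doesn't
	 * enumerate IA32_VMX_BASIC[56].
	 */
	if (!cpu_has_vmx_basic_no_hw_error_code_cc()) {
		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
		kvm_cpu_cap_clear(X86_FEATURE_IBT);
	}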

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-30 15:07               ` Sean Christopherson
@ 2023-06-30 15:21                 ` Neiger, Gil
  2023-07-01  1:57                 ` Yang, Weijiang
  1 sibling, 0 replies; 99+ messages in thread
From: Neiger, Gil @ 2023-06-30 15:21 UTC (permalink / raw)
  To: Christopherson, Sean, Yang, Weijiang
  Cc: Gao, Chao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	Edgecombe, Rick P, john.allen

Just in case it is not clear:  event delivery in real mode never includes an error code.  That is why the PE bit in CR0 is checked.

			- Gil

-----Original Message-----
From: Sean Christopherson <seanjc@google.com> 
Sent: Friday, June 30, 2023 08:08
To: Yang, Weijiang <weijiang.yang@intel.com>
Cc: Gao, Chao <chao.gao@intel.com>; pbonzini@redhat.com; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; peterz@infradead.org; rppt@kernel.org; binbin.wu@linux.intel.com; Edgecombe, Rick P <rick.p.edgecombe@intel.com>; john.allen@amd.com; Neiger, Gil <gil.neiger@intel.com>
Subject: Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification

[...]

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-30 15:05                   ` Neiger, Gil
  2023-06-30 15:15                     ` Sean Christopherson
@ 2023-07-01  1:54                     ` Yang, Weijiang
  1 sibling, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-07-01  1:54 UTC (permalink / raw)
  To: Neiger, Gil, Gao, Chao
  Cc: Christopherson, Sean, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	Edgecombe, Rick P, john.allen


On 6/30/2023 11:05 PM, Neiger, Gil wrote:
> Intel will not produce any CPU with CET that does not enumerate IA32_VMX_BASIC[56] as 1.
>
> One can check that bit before injecting a #CP with error code, but it should not be necessary if CET is enumerated.
>
> Of course, if KVM runs as a guest of another VMM/hypervisor, the virtual CPU in which KVM operates may enumerate CET but clear the bit in IA32_VMX_BASIC.
>
> 				- Gil
Thanks, Gil, for the clarification!
>
> -----Original Message-----
> From: Yang, Weijiang <weijiang.yang@intel.com>
> Sent: Friday, June 30, 2023 05:05
> To: Gao, Chao <chao.gao@intel.com>
> Cc: Christopherson, Sean <seanjc@google.com>; pbonzini@redhat.com; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; peterz@infradead.org; rppt@kernel.org; binbin.wu@linux.intel.com; Edgecombe, Rick P <rick.p.edgecombe@intel.com>; john.allen@amd.com; Neiger, Gil <gil.neiger@intel.com>
> Subject: Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
>
>
> On 6/30/2023 6:27 PM, Chao Gao wrote:
> [...]

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-30 15:07               ` Sean Christopherson
  2023-06-30 15:21                 ` Neiger, Gil
@ 2023-07-01  1:57                 ` Yang, Weijiang
  1 sibling, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-07-01  1:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, gil.neiger


On 6/30/2023 11:07 PM, Sean Christopherson wrote:
> On Fri, Jun 30, 2023, Weijiang Yang wrote:
>> On 6/17/2023 2:57 AM, Sean Christopherson wrote:
>>>> Do you mean documentation for #CP as a generic exception, or for the behavior in
>>>> KVM as this patch shows?
>>> As I pointed out two *years* ago, this entry in the SDM
>>>
>>>     — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>>       holds: (1) the interruption type is hardware exception; (2) bit 0
>>>       (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>>       (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>>       indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>>       #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>>>
>>> needs to read something like
>>>
>>>     — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>>       holds: (1) the interruption type is hardware exception; (2) bit 0
>>>       (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>>       (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>>       indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>>       #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>>>
>>>       [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>>>           support for the 1-setting of CR4.CET.
>> Hi, Sean,
>>
>> I sent the above change request to Gil (added in Cc), but he shared a different
>> opinion on this issue:
> Heh, "opinion".
>
>>   It may make things clearer if we document the statement above (all
>> CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
>>
>> I will see if we can update future revisions of the SDM to clarify this."
> That would be helpful.  Though to be perfectly honest, I simply overlooked the
> existence of IA32_VMX_BASIC[56].
>
> Thanks!
>
>> Then if this is the case, KVM needs to check IA32_VMX_BASIC[56] before
>> injecting exceptions into a nested VM.
>>
>> And this patch could be removed; instead we need another patch like the one below:
>>
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index ad35355ee43e..6b33aacc8587 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1076,6 +1076,7 @@
>>   #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
>>   #define VMX_BASIC_MEM_TYPE_WB    6LLU
>>   #define VMX_BASIC_INOUT        0x0040000000000000LLU
>> +#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU
> "Check Error Code" isn't a great description.  The flag enumerates that there the
> CPU does *not* perform consistency checks on the error code when injecting hardware
> exceptions.
>
> So something like this?
>
>    VMX_BASIC_NO_HW_ERROR_CODE_CC
>
> or maybe
>
>    VMX_BASIC_PM_NO_HW_ERROR_CODE_CC
>
> if we want to capture that only protected mode is exempt (I personally prefer
> just VMX_BASIC_NO_HW_ERROR_CODE_CC as "PM" is a bit ambiguous).

I like VMX_BASIC_NO_HW_ERROR_CODE_CC too :-), thanks!

>
>> @@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
>>           should_have_error_code =
>>               intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>>               x86_exception_has_error_code(vector);
>> -        if (CC(has_error_code != should_have_error_code))
>> +        if (!cpu_has_vmx_basic_check_errcode() &&
>> +            CC(has_error_code != should_have_error_code))
> This is wrong on multiple fronts:
>
>    1. The new feature flag only exempts hardware exceptions delivered to guests
>       with CR0.PE=1.  The above will skip the consistency check for all event injection.
>
>    2. KVM needs to check the CPU model that is exposed to L1, not the capabilities
>       of the host CPU.
>
> Highlighting the key phrases in the SDM:
>
>    The field's deliver-error-code bit (bit 11) is 1 if each of the following holds: (1) the interruption type is
>                                                        ^^^^^^^
>    hardware exception; (2) bit 0 (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>    (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector indicates one of the following
>    exceptions: #DF (vector 8), #TS (10), #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>    
>    The field's deliver-error-code bit is 0 if any of the following holds: (1) the interruption type is not hardware
>                                               ^^^^^^
>    exception; (2) bit 0 is clear in the CR0 field in the guest-state area; or (3) IA32_VMX_BASIC[56] is read as
>    0 and the vector is in one of the following ranges: 0–7, 9, 15, 16, or 18–31.
>
> I think what we want is:
>
> 		/* VM-entry interruption-info field: deliver error code */
> 		if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
> 		    !nested_cpu_has_no_hw_error_code_cc(vcpu)) {
> 			should_have_error_code =
> 				intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
> 				x86_exception_has_error_code(vector);
> 			if (CC(has_error_code != should_have_error_code))
> 				return -EINVAL;
> 		}

It looks good to me, will take it, thanks a lot!


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
  2023-06-30 15:15                     ` Sean Christopherson
@ 2023-07-01  1:58                       ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-07-01  1:58 UTC (permalink / raw)
  To: Sean Christopherson, Gil Neiger
  Cc: Chao Gao, pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	Rick P Edgecombe, john.allen


On 6/30/2023 11:15 PM, Sean Christopherson wrote:
> On Fri, Jun 30, 2023, Gil Neiger wrote:
>> Intel will not produce any CPU with CET that does not enumerate IA32_VMX_BASIC[56] as 1.
>>
>> One can check that bit before injecting a #CP with error code, but it should
>> not be necessary if CET is enumerated.
>>
>> Of course, if KVM runs as a guest of another VMM/hypervisor, the virtual CPU
>> in which KVM operates may enumerate CET but clear the bit in IA32_VMX_BASIC.
> Yeah, I think KVM should be paranoid and expose CET to the guest if and only if
> IA32_VMX_BASIC[56] is 1.  That'll also help validate nested support, e.g. it will
> make it more obvious if userspace+KVM provides a "bad" model to L1.

OK, will do it, thank you both!


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-06-23 23:53   ` Sean Christopherson
  2023-06-26 14:05     ` Yang, Weijiang
@ 2023-07-07  9:10     ` Yang, Weijiang
  2023-07-07 15:28       ` Neiger, Gil
  2023-07-12 16:42       ` Sean Christopherson
  1 sibling, 2 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-07-07  9:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson, Neiger, Gil


>> +	case MSR_IA32_PL3_SSP:
>> +		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>> +			return 1;
>> +		if (is_noncanonical_address(data, vcpu))
>> +			return 1;
>> +		if (msr_index == MSR_IA32_U_CET && (data & GENMASK(9, 6)))
>> +			return 1;
>> +		if (msr_index == MSR_IA32_PL3_SSP && (data & GENMASK(2, 0)))
> Please #define reserved bits, ideally using the inverse of the valid masks.  And
> for SSP, it might be better to do IS_ALIGNED(data, 8) (or 4, pending my question
> about the SDM's wording).
>
> Side topic, what on earth does the SDM mean by this?!?
>
>    The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
>    (hardware requires bits 1:0 to be 0).
>
> I know Intel retroactively changed the alignment requirements, but the above
> is nonsensical.  If ucode prevents writing bits 2:0, who cares what hardware
> requires?

Hi, Sean,

Regarding the alignment check, I got an update from Gil:

==================================================

The WRMSR instruction to load IA32_PL[0-3]_SSP will #GP if the value to 
be loaded sets either bit 0 or bit 1.  It does not check bit 2.
IDT event delivery, when changing to rings 0-2, will load SSP from the 
MSR corresponding to the new ring.  These transitions check that bits 
2:0 of the new value are all zero and will generate a nested fault if 
any of those bits are set.  (Far CALL using a call gate also checks this 
if changing CPL.)

For a VMM that is emulating a WRMSR by a guest OS (because it was 
intercepting writes to that MSR), it suffices to perform the same checks 
as the CPU would (i.e., only bits 1:0):
•    If the VMM sees bits 1:0 clear, it can perform the write on the 
part of the guest OS.  If the guest OS later encounters a #GP during IDT 
event delivery (because bit 2 is set), it is its own fault.
•    If the VMM sets either bit 0 or bit 1 set, it should inject a #GP 
into the guest, as that is what the CPU would do in this case.

For an OS that is writing to the MSRs to set up shadow stacks, it should 
WRMSR the base addresses of those stacks.  Because of the token-based 
architecture used for supervisor shadow stacks (for rings 0-2), the base 
addresses of those stacks should be 8-byte aligned (clearing bits 2:0).  
Thus, the values that an OS writes to the corresponding MSRs should 
clear bits 2:0.

(Of course, most OS’s will use only the MSR for ring 0, as most OS’s do 
not use rings 1 and 2.)

In contrast, the IA32_PL3_SSP MSR holds the current SSP for user 
software.  When a user thread is created, I suppose it may reference the 
base of the user shadow stack.  For a 32-bit app, that needs to be 
4-byte aligned (bits 1:0 clear); for a 64-bit app, it may be necessary 
for it to be 8-byte aligned (bits 2:0 clear).

Once the user thread is executing, the CPU will load IA32_PL3_SSP with 
the user’s value of SSP on every exception and interrupt to ring 0.  The 
value at that time may be 4-byte or 8-byte aligned, depending on how the 
user thread is using the shadow stack.  On context switches, the OS 
should WRMSR whatever value was saved (by RDMSR) the last time there was 
a context switch away from the incoming thread.  The OS should not need 
to inspect or change this value.

===================================================

Based on his feedback, I think the VMM needs to check bits 1:0 when
writing the SSP MSRs. Is that right?
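
A minimal sketch of a WRMSR check along those lines (the reserved-bits define is
a placeholder per Sean's earlier request; IS_ALIGNED(data, 4) is exactly the
bits-1:0 check Gil describes):

	#define CET_US_RESERVED_BITS	GENMASK(9, 6)	/* example, for MSR_IA32_U_CET */

	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
		if (is_noncanonical_address(data, vcpu))
			return 1;
		/* Hardware #GPs only when bit 0 or bit 1 is set; bit 2 is unchecked. */
		if (!IS_ALIGNED(data, 4))
			return 1;
		break;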


^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-07-07  9:10     ` Yang, Weijiang
@ 2023-07-07 15:28       ` Neiger, Gil
  2023-07-12 16:42       ` Sean Christopherson
  1 sibling, 0 replies; 99+ messages in thread
From: Neiger, Gil @ 2023-07-07 15:28 UTC (permalink / raw)
  To: Yang, Weijiang, Christopherson, Sean
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu, Edgecombe,
	Rick P, john.allen, Sean Christopherson

There is a small typo below (which came from me originally):

Where it says, "If the VMM sets either bit 0 or bit 1 set, it should inject a #GP" - it should be "If the VMM _sees_ ...".

	- Gil

-----Original Message-----
From: Yang, Weijiang <weijiang.yang@intel.com> 
Sent: Friday, July 7, 2023 02:10
To: Christopherson, Sean <seanjc@google.com>
Cc: pbonzini@redhat.com; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; peterz@infradead.org; rppt@kernel.org; binbin.wu@linux.intel.com; Edgecombe, Rick P <rick.p.edgecombe@intel.com>; john.allen@amd.com; Sean Christopherson <sean.j.christopherson@intel.com>; Neiger, Gil <gil.neiger@intel.com>
Subject: Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs


[...]

•    If the VMM sees bits 1:0 clear, it can perform the write on the part of the guest OS.  If the guest OS later encounters a #GP during IDT event delivery (because bit 2 is set), it is its own fault.
•    If the VMM sets either bit 0 or bit 1 set, it should inject a #GP into the guest, as that is what the CPU would do in this case.

[...]


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-06-16 17:56     ` Sean Christopherson
  2023-06-19  6:41       ` Yang, Weijiang
@ 2023-07-10  0:28       ` Yang, Weijiang
  2023-07-10 22:18         ` Sean Christopherson
  1 sibling, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-07-10  0:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Neiger, Gil


> *sigh*
>
> I got filled in on the details offlist.
>
> 1) In the next version of this series, please rework it to reincorporate Supervisor
>     Shadow Stack support into the main series, i.e. pretend Intel's implementation
>     isn't horribly flawed.  KVM can't guarantee that a VM-Exit won't occur, i.e.
>     can't advertise CET_SSS, but I want the baseline support to be implemented,
>     otherwise the series as a whole is a big confusing mess with unanswered question
>     left, right, and center.  And more importantly, architecturally SSS exists if
>     X86_FEATURE_SHSTK is enumerated, i.e. the guest should be allowed to utilize
>     SSS if it so chooses, with the obvious caveat that there's a non-zero chance
>     the guest risks death by doing so.  Or if userspace can ensure no VM-Exit will
>     occur, which is difficult but feasible (ignoring #MC), e.g. by statically
>     partitioning memory, prefaulting all memory in guest firmware, and not dirty
>     logging SSS pages.  In such an extreme setup, userspace can enumerate CET_SSS
>     to the guest, and KVM should support that.
>   
> 2) Add the below patch to document exactly why KVM doesn't advertise CET_SSS.
>     While Intel is apparently ok with treating KVM developers like mushrooms, I
>     am not.
>
> ---
> From: Sean Christopherson<seanjc@google.com>
> Date: Fri, 16 Jun 2023 10:04:37 -0700
> Subject: [PATCH] KVM: x86: Explicitly document that KVM must not advertise
>   CET_SSS
>
> Explicitly call out that KVM must NOT advertise CET_SSS to userspace,
> i.e. must not tell userspace and thus the guest that it is safe for the
> guest to enable Supervisor Shadow Stacks (SSS).
>
> Intel's implementation of SSS is fatally flawed for virtualized
> environments, as despite wording in the SDM that suggests otherwise,
> Intel CPUs' handling of shadow stack switches is NOT fully atomic.  Only
> the check-and-update of the supervisor shadow stack token's busy bit is
> atomic.  Per the SDM:
>
>    If the far CALL or event delivery pushes a stack frame after the token
>    is acquired and any of the pushes causes a fault or VM exit, the
>    processor will revert to the old shadow stack and the busy bit in the
>    new shadow stack's token remains set.
>
> Or more bluntly, any fault or VM-Exit that occurs when pushing to the
> shadow stack after the busy bit is set is fatal to the kernel, i.e. to
> the guest in KVM's case.  The (guest) kernel can protect itself against
> faults, e.g. by ensuring that the shadow stack always has a valid mapping,
> but a guest kernel obviously has no control over, or even knowledge of,
> VM-Exits due to host activity.
>
> To help software determine when it is safe to use SSS, Intel defined
>    CPUID.0x7.1.EDX bit (CET_SSS) and updated Intel CPUs to enumerate CET_SSS,
> i.e. bare metal Intel CPUs advertise to software that it is safe to enable
> SSS.
>
>    If CPUID.(EAX=07H,ECX=1H):EDX[bit 18] is enumerated as 1, it is
>    sufficient for an operating system to ensure that none of the pushes can
>    cause a page fault.
>
> But CET_SSS also comes with a major caveat that is kinda sorta documented
> in the SDM:
>
>    When emulating the CPUID instruction, a virtual-machine monitor should
>    return this bit as 0 if those pushes can cause VM exits.
>
> In other words, CET_SSS (bit 18) does NOT enumerate that the underlying
> CPU prevents VM-Exits, only that the environment in which the software is
> running will not generate VM-Exits.  I.e. CET_SSS is a stopgap to stem the
> bleeding and allow kernels to enable SSS, not an indication that the
> underlying CPU is immune to the VM-Exit problem.
>
> And unfortunately, KVM itself effectively has zero chance of ensuring that
> a shadow stack switch can't trigger a VM-Exit, e.g. KVM zaps *all* SPTEs
> when any memslot is deleted, enabling dirty logging write-protects SPTEs,
> etc.  A sufficiently motivated userspace can, at least in theory, provide
> a safe environment for SSS, e.g. by statically partitioning and
> prefaulting (in guest firmware) all memory, disabling PML, never
> write-protecting guest shadow stacks, etc.  But such a setup is far, far
> beyond typical KVM deployments.
>
> Note, AMD CPUs have a similar erratum, but AMD CPUs *DO* perform the full
> shadow stack switch atomically so long as the stack is mapped WB and does
> not cross a page boundary, i.e. a "normal" KVM setup and a well-behaved
> guest play nice with SSS without additional shenanigans.
>
> Signed-off-by: Sean Christopherson<seanjc@google.com>
> ---
>   arch/x86/kvm/cpuid.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 1e3ee96c879b..ecf4a68aaa08 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -658,7 +658,15 @@ void kvm_set_cpu_caps(void)
>   	);
>   
>   	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
> -		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI)
> +		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
> +
> +		/*
> +		 * Do NOT advertise CET_SSS, i.e. do not tell userspace and the
> +		 * guest that it is safe to use Supervisor Shadow Stacks under
> +		 * KVM when running on Intel CPUs.  KVM itself cannot guarantee
> +		 * that a VM-Exit won't occur during a shadow stack update.
> +		 */
> +		0 /* F(CET_SSS) */
>   	);
>   
>   	kvm_cpu_cap_mask(CPUID_D_1_EAX,
>
> base-commit: 9305c14847719870e9e08294034861360577ce08

Hi, Sean,

Gil reminded me that the SDM was recently updated with CET SSS related
topics (June release):

======================================================================

Section 17.2.3 (Supervisor Shadow Stack Token) in Volume 1 of the SDM:
     If the far CALL or event delivery pushes a stack frame after the
     token is acquired and any of the pushes causes a fault or VM exit,
     the processor will revert to the old shadow stack and the busy bit
     in the new shadow stack's token remains set. The new shadow stack
     is said to be prematurely busy. Software should enable supervisor
     shadow stacks only if it is certain that this situation cannot
     occur. If CPUID.(EAX=07H,ECX=1H):EDX[bit 18] is enumerated as 1,
     it is sufficient for an operating system to ensure that none of
     the pushes can cause a page fault.

Volume 2A: CPUID.(EAX=07H,ECX=1H):EDX[bit 18] as follows:
     CET_SSS. If 1, indicates that an operating system can enable
     supervisor shadow stacks as long as it ensures that a supervisor
     shadow stack cannot become prematurely busy due to page faults
     (see Section 17.2.3 of the Intel® 64 and IA-32 Architectures
     Software Developer’s Manual, Volume 1). When emulating the CPUID
     instruction, a virtual-machine monitor (VMM) should return this
     bit as 1 only if it ensures that VM exits cannot cause a guest
     supervisor shadow stack to appear to be prematurely busy. Such a
     VMM could set the “prematurely busy shadow stack” VM-exit control
     and use the additional information that it provides.

Volume 3C: new “prematurely busy shadow stack” VM-exit control.

========================================================================

And Gil told me additional information was planned to be released later
in the summer.

Maybe you need to modify the above changelog a bit per the update.

Given that the updated parts are a technical forecast, I don't plan to
implement it in this series and will still enumerate CET_SSS == 0 for
the guest. What are your thoughts?


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-10  0:28       ` Yang, Weijiang
@ 2023-07-10 22:18         ` Sean Christopherson
  2023-07-11  1:24           ` Yang, Weijiang
  0 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-07-10 22:18 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Gil Neiger

On Mon, Jul 10, 2023, Weijiang Yang wrote:
> Maybe you need to modify the above changelog a bit per the update.

Ya, I'll make sure the changelog gets updated before CET support is merged, though
feel free to post the next version without waiting for new changelog.

> Given that the updated parts are a technical forecast, I don't plan to
> implement it in this series and will still enumerate CET_SSS == 0 for
> the guest. What are your thoughts?

Yes, definitely punt shadow-stack fixup to future enabling work. 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-10 22:18         ` Sean Christopherson
@ 2023-07-11  1:24           ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-07-11  1:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Gil Neiger


On 7/11/2023 6:18 AM, Sean Christopherson wrote:
> On Mon, Jul 10, 2023, Weijiang Yang wrote:
>> Maybe you need to modify the above changelog a bit per the update.
> Ya, I'll make sure the changelog gets updated before CET support is merged, though
> feel free to post the next version without waiting for new changelog.

Sure, thanks!

>> Given that the updated parts are a technical forecast, I don't plan to
>> implement it in this series and will still enumerate CET_SSS == 0 for
>> the guest. What are your thoughts?
> Yes, definitely punt shadow-stack fixup to future enabling work.
Got it.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
  2023-07-07  9:10     ` Yang, Weijiang
  2023-07-07 15:28       ` Neiger, Gil
@ 2023-07-12 16:42       ` Sean Christopherson
  1 sibling, 0 replies; 99+ messages in thread
From: Sean Christopherson @ 2023-07-12 16:42 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Sean Christopherson, Gil Neiger

On Fri, Jul 07, 2023, Weijiang Yang wrote:
> > Side topic, what on earth does the SDM mean by this?!?
> > 
> >    The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
> >    (hardware requires bits 1:0 to be 0).
> > 
> > I know Intel retroactively changed the alignment requirements, but the above
> > is nonsensical.  If ucode prevents writing bits 2:0, who cares what hardware
> > requires?
> 
> Hi, Sean,
> 
> Regarding the alignment check, I got an update from Gil:
> 
> ==================================================
> 
> The WRMSR instruction to load IA32_PL[0-3]_SSP will #GP if the value to be
> loaded sets either bit 0 or bit 1.  It does not check bit 2.
> IDT event delivery, when changing to rings 0-2, will load SSP from the MSR
> corresponding to the new ring.  These transitions check that bits 2:0 of the
> new value are all zero and will generate a nested fault if any of those bits
> are set.  (Far CALL using a call gate also checks this if changing CPL.)
> 
> For a VMM that is emulating a WRMSR by a guest OS (because it was
> intercepting writes to that MSR), it suffices to perform the same checks as
> the CPU would (i.e., only bits 1:0):
> •    If the VMM sees bits 1:0 clear, it can perform the write on behalf of
> the guest OS.  If the guest OS later encounters a #GP during IDT event
> delivery (because bit 2 is set), it is its own fault.
> •    If the VMM sees either bit 0 or bit 1 set, it should inject a #GP into
> the guest, as that is what the CPU would do in this case.
> 
> For an OS that is writing to the MSRs to set up shadow stacks, it should
> WRMSR the base addresses of those stacks.  Because of the token-based
> architecture used for supervisor shadow stacks (for rings 0-2), the base
> addresses of those stacks should be 8-byte aligned (clearing bits 2:0). 
> Thus, the values that an OS writes to the corresponding MSRs should clear
> bits 2:0.
> 
> (Of course, most OS’s will use only the MSR for ring 0, as most OS’s do not
> use rings 1 and 2.)
> 
> In contrast, the IA32_PL3_SSP MSR holds the current SSP for user software. 
> When a user thread is created, I suppose it may reference the base of the
> user shadow stack.  For a 32-bit app, that needs to be 4-byte aligned (bits
> 1:0 clear); for a 64-bit app, it may be necessary for it to be 8-byte
> aligned (bits 2:0 clear).
> 
> Once the user thread is executing, the CPU will load IA32_PL3_SSP with the
> user’s value of SSP on every exception and interrupt to ring 0.  The value
> at that time may be 4-byte or 8-byte aligned, depending on how the user
> thread is using the shadow stack.  On context switches, the OS should WRMSR
> whatever value was saved (by RDMSR) the last time there was a context switch
> away from the incoming thread.  The OS should not need to inspect or change
> this value.
> 
> ===================================================
> 
> Based on his feedback, I think the VMM needs to check only bits 1:0 when
> writing the SSP MSRs. Is that right?

Yep, KVM should only check bits 1:0 when emulating WRMSR.  KVM doesn't emulate
event delivery except for Real Mode, and I don't see that ever changing.  So to
"handle" the #GP during event delivery case, KVM just needs to propagate the "bad"
value into guest context, which KVM needs to do anyways.
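
For illustration, a minimal sketch of that WRMSR check, assuming it
lives in KVM's common MSR emulation path (the helper name is
hypothetical and cet_s_ssp[] mirrors the snippets elsewhere in this
thread; treat this as a sketch, not the actual series code):

    /*
     * Sketch: emulate a guest WRMSR to MSR_IA32_PL[0-2]_SSP.  Hardware
     * #GPs only if bit 0 or bit 1 is set; bit 2 is checked later, at
     * IDT event delivery time, so a set bit 2 is propagated into guest
     * context as-is.
     */
    static int emulate_pl_ssp_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
    {
            if (data & GENMASK_ULL(1, 0))
                    return 1;       /* caller injects #GP into the guest */

            vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
            return 0;
    }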

Thanks for following up on this!

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-06-23 20:51         ` Sean Christopherson
  2023-06-26  6:46           ` Yang, Weijiang
@ 2023-07-17  7:44           ` Yang, Weijiang
  2023-07-19 19:41             ` Sean Christopherson
  1 sibling, 1 reply; 99+ messages in thread
From: Yang, Weijiang @ 2023-07-17  7:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Chao Gao


On 6/24/2023 4:51 AM, Sean Christopherson wrote:
> On Mon, Jun 19, 2023, Weijiang Yang wrote:
>> On 6/17/2023 1:56 AM, Sean Christopherson wrote:
>>> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>>>> On 6/16/2023 7:30 AM, Sean Christopherson wrote:
>>>>> On Thu, May 11, 2023, Yang Weijiang wrote:
>>>>>
[...]
>> Let me make it clear, you want me to do two things:
>>
>> 1)Add Supervisor Shadow Stack  state support(i.e., XSS.bit12(CET_S)) into
>> kernel so that host can support guest Supervisor Shadow Stack MSRs in g/h FPU
>> context switch.
> If that's necessary for correct functionality, yes.

Hi, Sean,

I held off posting the new version and want to sync up with you on this
point to avoid surprising you.

After discussing adding the patch to the kernel with Rick and Chao, we
reached the conclusions below:

the Pros:
  - Super easy to implement for KVM.
  - Automatically avoids saving and restoring this data when the vmexit
    is handled within KVM.

the Cons:
  - Unnecessarily restores XFEATURE_CET_KERNEL when switching to a
    non-KVM task's userspace.
  - Forces allocating space for this state on all tasks, whether or not
    they use KVM, with likely zero users today and in the near future.
  - Complicates the FPU optimization thinking by including things that
    can have no effect on userspace in the FPU.

Given the above reasons, I implemented guest CET supervisor state
management in KVM instead of adding a kernel patch for it.

Below are 3 KVM patches to support it:

Patch 1: Save/reload guest CET supervisor states when necessary:

=======================================================================

commit 16147ede75dee29583b7d42a6621d10d55b63595
Author: Yang Weijiang <weijiang.yang@intel.com>
Date:   Tue Jul 11 02:26:17 2023 -0400

     KVM:x86: Make guest supervisor states non-XSAVE managed

     Save and reload guest CET supervisor states, i.e., PL{0,1,2}_SSP,
     when vCPU context is swapped before and after a userspace<->kernel
     transition; also do the same when the vCPU is sched-in or
     sched-out.

     Enable CET supervisor state management only in KVM because:
     1) Currently, supervisor SHSTK is not enabled on the host side, so
     only KVM needs to care about the guest's supervisor SHSTK states.
     2) Enabling them in the kernel FPU state framework has global
     effects on all threads on the host kernel, but the majority of
     threads do not use CET supervisor states. It would also require
     additional storage in the per-thread FPU state area.

     Add a new helper, kvm_arch_sched_out(), for that purpose. Adding
     the support in kvm_arch_vcpu_put/load() without the new helper
     looks possible, but the put/load functions are also called in
     vcpu_put()/load(), which are heavily used in KVM, so adding a new
     helper makes the implementation clearer.

     Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7e7e19ef6993..98235cb3d258 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1023,6 +1023,7 @@ void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu);

  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
  void kvm_arm_init_debug(void);
  void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 957121a495f0..56c5e85ba5a3 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -893,6 +893,7 @@ static inline void kvm_arch_free_memslot(struct kvm *kvm,
                                          struct kvm_memory_slot *slot) {}
  static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 14ee0dece853..11587d953bf6 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -880,6 +880,7 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
  static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
  static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index ee0acccb1d3b..6ff4a04fe0f2 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -244,6 +244,7 @@ struct kvm_vcpu_arch {

  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
  #define KVM_ARCH_WANT_MMU_NOTIFIER

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 2bbc3d54959d..d1750a6a86cf 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -1033,6 +1033,7 @@ extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);

  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
  static inline void kvm_arch_free_memslot(struct kvm *kvm,
                                          struct kvm_memory_slot *slot) {}
  static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e2c549f147a5..7d9cfb7e2fe8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11212,6 +11212,31 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
         trace_kvm_fpu(0);
  }

+static void kvm_save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
+{
+       preempt_disable();
+       if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
+               rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
+               rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
+               rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
+               wrmsrl(MSR_IA32_PL0_SSP, 0);
+               wrmsrl(MSR_IA32_PL1_SSP, 0);
+               wrmsrl(MSR_IA32_PL2_SSP, 0);
+       }
+       preempt_enable();
+}
+
+static void kvm_reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
+{
+       preempt_disable();
+       if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
+               wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
+               wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
+               wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
+       }
+       preempt_enable();
+}
+
  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
  {
         struct kvm_queued_exception *ex = &vcpu->arch.exception;
@@ -11222,6 +11247,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
         kvm_sigset_activate(vcpu);
         kvm_run->flags = 0;
         kvm_load_guest_fpu(vcpu);
+       kvm_reload_cet_supervisor_ssp(vcpu);

         kvm_vcpu_srcu_read_lock(vcpu);
         if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
@@ -11310,6 +11336,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
         r = vcpu_run(vcpu);

  out:
+       kvm_save_cet_supervisor_ssp(vcpu);
         kvm_put_guest_fpu(vcpu);
         if (kvm_run->kvm_valid_regs)
                 store_regs(vcpu);
@@ -12398,9 +12425,17 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
                 pmu->need_cleanup = true;
                 kvm_make_request(KVM_REQ_PMU, vcpu);
         }
+
+       kvm_reload_cet_supervisor_ssp(vcpu);
+
         static_call(kvm_x86_sched_in)(vcpu, cpu);
  }
+void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu)
+{
+       kvm_save_cet_supervisor_ssp(vcpu);
+}
+
  void kvm_arch_free_vm(struct kvm *kvm)
  {
         kfree(to_kvm_hv(kvm)->hv_pa_pg);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d90331f16db1..b3032a5f0641 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1423,6 +1423,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);

  void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
+void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu);

  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 66c1447d3c7f..42f28e8905e1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5885,6 +5885,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
  {
         struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

+       kvm_arch_sched_out(vcpu, 0);
         if (current->on_rq) {
                 WRITE_ONCE(vcpu->preempted, true);
                 WRITE_ONCE(vcpu->ready, true);

Patch 2: an optimization of the patch above:

===================================================================

commit ae5fe7c81cc3b93193758d1b7b4ab74a92a51dad
Author: Yang Weijiang <weijiang.yang@intel.com>
Date:   Fri Jul 14 20:03:52 2023 -0400

     KVM:x86: Optimize CET supervisor SSP save/reload

     Make PL{0,1,2}_SSP write-intercepted to detect whether the guest
     is using these MSRs. Disable interception of the MSRs once they're
     written with non-zero values. KVM saves/reloads the MSRs only if
     they're used by the guest.

     Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69cbc9d9b277..c50b555234fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -748,6 +748,7 @@ struct kvm_vcpu_arch {
         bool tpr_access_reporting;
         bool xsaves_enabled;
         bool xfd_no_write_intercept;
+       bool cet_sss_active;
         u64 ia32_xss;
         u64 microcode_version;
         u64 arch_capabilities;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 90ce1c7d3fd7..21c89d200c88 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2156,6 +2156,18 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
         return debugctl;
  }

+static void vmx_disable_write_intercept_sss_msr(struct kvm_vcpu *vcpu)
+{
+       if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
+               vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
+                               MSR_TYPE_RW, false);
+               vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
+                               MSR_TYPE_RW, false);
+               vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
+                               MSR_TYPE_RW, false);
+       }
+}
+
  /*
   * Writes msr value into the appropriate "register".
   * Returns 0 on success, non-0 otherwise.
@@ -2427,7 +2439,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
  #define VMX_CET_CONTROL_MASK   (~GENMASK_ULL(9,6))
  #define LEG_BITMAP_BASE(data)  ((data) >> 12)
         case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
-               return kvm_set_msr_common(vcpu, msr_info);
+               if (kvm_set_msr_common(vcpu, msr_info))
+                       return 1;
+               /*
+                * Writes to the base SSP MSRs should happen ahead of
+                * toggling the IA32_S_CET.SH_STK_EN bit.
+                */
+               if (!msr_info->host_initiated &&
+                   msr_index != MSR_IA32_PL3_SSP && data) {
+                       vmx_disable_write_intercept_sss_msr(vcpu);
+                       wrmsrl(msr_index, data);
+               }
                 break;
         case MSR_IA32_U_CET:
         case MSR_IA32_S_CET:
@@ -7773,12 +7795,17 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
                                 MSR_TYPE_RW, false);
                 vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
                                 MSR_TYPE_RW, false);
+               /*
+                * Supervisor shadow stack MSRs are intercepted until
+                * they're written by the guest; this is designed to
+                * reduce the save/restore overhead.
+                */
                 vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
-                               MSR_TYPE_RW, false);
+                               MSR_TYPE_R, false);
                 vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
-                               MSR_TYPE_RW, false);
+                               MSR_TYPE_R, false);
                 vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
-                               MSR_TYPE_RW, false);
+                               MSR_TYPE_R, false);
                 vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP,

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cab31dbb2bec..06dc5111da3b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4049,8 +4049,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
                 if (!IS_ALIGNED(data, 4))
                         return 1;
                 if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
-                   msr == MSR_IA32_PL2_SSP)
+                   msr == MSR_IA32_PL2_SSP) {
+                       if (!msr_info->host_initiated && data)
+                               vcpu->arch.cet_sss_active = true;
                        vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
+               }
                 else if (msr == MSR_IA32_PL3_SSP)
                         kvm_set_xsave_msr(msr_info);
                 break;
@@ -11250,7 +11253,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
         kvm_sigset_activate(vcpu);
         kvm_run->flags = 0;
         kvm_load_guest_fpu(vcpu);
-       kvm_reload_cet_supervisor_ssp(vcpu);
+       if (vcpu->arch.cet_sss_active)
+               kvm_reload_cet_supervisor_ssp(vcpu);

         kvm_vcpu_srcu_read_lock(vcpu);
         if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
@@ -11339,7 +11343,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
         r = vcpu_run(vcpu);

  out:
-       kvm_save_cet_supervisor_ssp(vcpu);
+       if (vcpu->arch.cet_sss_active)
+               kvm_save_cet_supervisor_ssp(vcpu);
         kvm_put_guest_fpu(vcpu);
         if (kvm_run->kvm_valid_regs)
                 store_regs(vcpu);
@@ -12428,15 +12433,16 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
                 pmu->need_cleanup = true;
                 kvm_make_request(KVM_REQ_PMU, vcpu);
         }
-
-       kvm_reload_cet_supervisor_ssp(vcpu);
+       if (vcpu->arch.cet_sss_active)
+               kvm_reload_cet_supervisor_ssp(vcpu);

         static_call(kvm_x86_sched_in)(vcpu, cpu);
  }

  void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu)
  {
-       kvm_save_cet_supervisor_ssp(vcpu);
+       if (vcpu->arch.cet_sss_active)
+               kvm_save_cet_supervisor_ssp(vcpu);
  }

  void kvm_arch_free_vm(struct kvm *kvm)

=============================================================

Patch 3: support guest CET supervisor xstate bit:

commit 2708b3c959db56fb9243f9a157884c2120b8810c
Author: Yang Weijiang <weijiang.yang@intel.com>
Date:   Sat Jul 15 20:56:37 2023 -0400

     KVM:x86: Enable guest CET supervisor xstate bit support

     Add the S_CET bit to kvm_caps.supported_xss so that the guest can
     enumerate the feature in CPUID(0xd,1).ECX.

     The guest S_CET xstate bit is specially handled, i.e., it can be
     exposed without related enabling on the host side, because KVM
     manually saves/reloads the guest supervisor SHSTK SSPs, and the
     current XSS swap logic for host/guest also supports doing so; thus
     it's safe to enable the bit without host support.

     Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2653e5eb54ee..071bcdedc530 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -228,7 +228,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
                                 | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
                                 | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)

-#define KVM_SUPPORTED_XSS      (XFEATURE_MASK_CET_USER)
+#define KVM_SUPPORTED_XSS      (XFEATURE_MASK_CET_USER | \
+                                XFEATURE_MASK_CET_KERNEL)

  u64 __read_mostly host_efer;
  EXPORT_SYMBOL_GPL(host_efer);
@@ -9639,6 +9640,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
         if (boot_cpu_has(X86_FEATURE_XSAVES)) {
                 rdmsrl(MSR_IA32_XSS, host_xss);
                 kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+               kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
         }

         kvm_init_pmu_capability(ops->pmu_ops);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index f8f042c91728..df187d7c3e74 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -362,7 +362,7 @@ static inline bool kvm_mpx_supported(void)
                 == (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
  }

-#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
+#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)
  /*
   * Shadow Stack and Indirect Branch Tracking feature enabling depends on
   * whether host side CET user xstate bit is supported or not.

=================================================================

What are your thoughts on the solution? Is it appropriate for KVM?

Thanks!

[...]


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-17  7:44           ` Yang, Weijiang
@ 2023-07-19 19:41             ` Sean Christopherson
  2023-07-19 20:26               ` Sean Christopherson
                                 ` (2 more replies)
  0 siblings, 3 replies; 99+ messages in thread
From: Sean Christopherson @ 2023-07-19 19:41 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Chao Gao

On Mon, Jul 17, 2023, Weijiang Yang wrote:
> 
> On 6/24/2023 4:51 AM, Sean Christopherson wrote:
> > > 1)Add Supervisor Shadow Stack� state support(i.e., XSS.bit12(CET_S)) into
> > > kernel so that host can support guest Supervisor Shadow Stack MSRs in g/h FPU
> > > context switch.
> > If that's necessary for correct functionality, yes.

...

> the Pros:
> �- Super easy to implement for KVM.
> �- Automatically avoids saving and restoring this data when the vmexit
> �� is handled within KVM.
> 
> the Cons:
> �- Unnecessarily restores XFEATURE_CET_KERNEL when switching to
> �� non-KVM task's userspace.
> �- Forces allocating space for this state on all tasks, whether or not
> �� they use KVM, and with likely zero users today and the near future.
> �- Complicates the FPU optimization thinking by including things that
> �� can have no affect on userspace in the FPU
> 
> Given above reasons, I implemented guest CET supervisor states management
> in KVM instead of adding a kernel patch for it.
> 
> Below are 3 KVM patches to support it:
> 
> Patch 1: Save/reload guest CET supervisor states when necessary:
> 
> =======================================================================
> 
> commit 16147ede75dee29583b7d42a6621d10d55b63595
> Author: Yang Weijiang <weijiang.yang@intel.com>
> Date:�� Tue Jul 11 02:26:17 2023 -0400
> 
> ��� KVM:x86: Make guest supervisor states as non-XSAVE managed
> 
> ��� Save and reload guest CET supervisor states, i.e.,PL{0,1,2}_SSP,
> ��� when vCPU context is being swapped before and after userspace
> ��� <->kernel entry, also do the same operation when vCPU is sched-in
> ��� or sched-out.

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e2c549f147a5..7d9cfb7e2fe8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11212,6 +11212,31 @@ static void kvm_put_guest_fpu(struct kvm_vcpu
> *vcpu)
> ������� trace_kvm_fpu(0);
> �}
> 
> +static void kvm_save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
> +{
> +������ preempt_disable();
> +������ if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> +�������������� rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
> +�������������� rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
> +�������������� rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
> +�������������� wrmsrl(MSR_IA32_PL0_SSP, 0);
> +�������������� wrmsrl(MSR_IA32_PL1_SSP, 0);
> +�������������� wrmsrl(MSR_IA32_PL2_SSP, 0);
> +������ }
> +������ preempt_enable();
> +}
> +
> +static void kvm_reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
> +{
> +������ preempt_disable();
> +������ if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> +�������������� wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
> +�������������� wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
> +�������������� wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
> +������ }
> +������ preempt_enable();
> +}

My understanding is that PL[0-2]_SSP are used only on transitions to the
corresponding privilege level from a *different* privilege level.  That means
KVM should be able to utilize the user_return_msr framework to load the host
values.  Though if Linux ever supports SSS, I'm guessing the core kernel will
have some sort of mechanism to defer loading MSR_IA32_PL0_SSP until an exit to
userspace, e.g. to avoid having to write PL0_SSP, which will presumably be
per-task, on every context switch.
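
As a rough sketch of that idea, assuming the host never enables SSS and
thus always wants PL0_SSP restored to 0 on return to userspace (the
function names and placement are hypothetical; only
kvm_add_user_return_msr() and kvm_set_user_return_msr() are the existing
framework entry points):

    static int pl0_ssp_uret_slot;

    /* At vendor module init: register PL0_SSP with the framework. */
    static void cet_uret_init(void)
    {
            pl0_ssp_uret_slot = kvm_add_user_return_msr(MSR_IA32_PL0_SSP);
    }

    /*
     * Before entering the guest: load the vCPU's value.  The framework
     * snapshots the host value on first use and restores it lazily via
     * the user-return notifier on the next exit to userspace.
     */
    static void cet_load_guest_pl0_ssp(struct kvm_vcpu *vcpu)
    {
            kvm_set_user_return_msr(pl0_ssp_uret_slot,
                                    vcpu->arch.cet_s_ssp[0], -1ull);
    }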

But note my original wording: **If that's necessary**

If nothing in the host ever consumes those MSRs, i.e. if SSS is NOT enabled in
IA32_S_CET, then running host stuff with guest values should be ok.  KVM only
needs to guarantee that it doesn't leak values between guests.  But that should
Just Work, e.g. KVM should load the new vCPU's values if SHSTK is exposed to the
guest, and intercept (to inject #GP) if SHSTK is not exposed to the guest.

And regardless of what the mechanism ends up managing SSP MSRs, it should only
ever touch PL0_SSP, because Linux never runs anything at CPL1 or CPL2, i.e. will
never consume PL{1,2}_SSP.

Am I missing something?

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-19 19:41             ` Sean Christopherson
@ 2023-07-19 20:26               ` Sean Christopherson
  2023-07-20  1:58                 ` Yang, Weijiang
  2023-07-19 20:36               ` Peter Zijlstra
  2023-07-20  1:55               ` Yang, Weijiang
  2 siblings, 1 reply; 99+ messages in thread
From: Sean Christopherson @ 2023-07-19 20:26 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Chao Gao

On Wed, Jul 19, 2023, Sean Christopherson wrote:
> On Mon, Jul 17, 2023, Weijiang Yang wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index e2c549f147a5..7d9cfb7e2fe8 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -11212,6 +11212,31 @@ static void kvm_put_guest_fpu(struct kvm_vcpu
> > *vcpu)
> > ������� trace_kvm_fpu(0);
> > �}

Huh.  After a bit of debugging, the mangling is due to mutt's default for send_charset
being

  "us-ascii:iso-8859-1:utf-8"

and selecting iso-8859-1 instead of utf-8 as the encoding despite the original
mail being utf-8.  In this case, mutt ran afoul of nbsp (u+00a0).

AFAICT, the solution is to essentially tell mutt to never try to use iso-8859-1
for sending mail

  set send_charset="us-ascii:utf-8"

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-19 19:41             ` Sean Christopherson
  2023-07-19 20:26               ` Sean Christopherson
@ 2023-07-19 20:36               ` Peter Zijlstra
  2023-07-20  5:26                 ` Pankaj Gupta
  2023-07-20  1:55               ` Yang, Weijiang
  2 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2023-07-19 20:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Weijiang Yang, pbonzini, kvm, linux-kernel, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Chao Gao

On Wed, Jul 19, 2023 at 12:41:47PM -0700, Sean Christopherson wrote:

> My understanding is that PL[0-2]_SSP are used only on transitions to the
> corresponding privilege level from a *different* privilege level.  That means
> KVM should be able to utilize the user_return_msr framework to load the host
> values.  Though if Linux ever supports SSS, I'm guessing the core kernel will
> have some sort of mechanism to defer loading MSR_IA32_PL0_SSP until an exit to
> userspace, e.g. to avoid having to write PL0_SSP, which will presumably be
> per-task, on every context switch.
> 
> But note my original wording: **If that's necessary**
> 
> If nothing in the host ever consumes those MSRs, i.e. if SSS is NOT enabled in
> IA32_S_CET, then running host stuff with guest values should be ok.  KVM only
> needs to guarantee that it doesn't leak values between guests.  But that should
> Just Work, e.g. KVM should load the new vCPU's values if SHSTK is exposed to the
> guest, and intercept (to inject #GP) if SHSTK is not exposed to the guest.
> 
> And regardless of what the mechanism ends up managing SSP MSRs, it should only
> ever touch PL0_SSP, because Linux never runs anything at CPL1 or CPL2, i.e. will
> never consume PL{1,2}_SSP.

To clarify, Linux will only use SSS in FRED mode -- FRED removes CPL1,2.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-19 19:41             ` Sean Christopherson
  2023-07-19 20:26               ` Sean Christopherson
  2023-07-19 20:36               ` Peter Zijlstra
@ 2023-07-20  1:55               ` Yang, Weijiang
  2 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-07-20  1:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Chao Gao


On 7/20/2023 3:41 AM, Sean Christopherson wrote:
> [...]
> My understanding is that PL[0-2]_SSP are used only on transitions to the
> corresponding privilege level from a *different* privilege level.  That means
> KVM should be able to utilize the user_return_msr framework to load the host
> values.  Though if Linux ever supports SSS, I'm guessing the core kernel will
> have some sort of mechanism to defer loading MSR_IA32_PL0_SSP until an exit to
> userspace, e.g. to avoid having to write PL0_SSP, which will presumably be
> per-task, on every context switch.
>
> But note my original wording: **If that's necessary**

Thanks!

I think host SSS enabling won't happen in the short term, so handling
the guest supervisor states on the KVM side is doable.

>
> If nothing in the host ever consumes those MSRs, i.e. if SSS is NOT enabled in
> IA32_S_CET, then running host stuff with guest values should be ok.  KVM only
> needs to guarantee that it doesn't leak values between guests.  But that should
> Just Work, e.g. KVM should load the new vCPU's values if SHSTK is exposed to the
> guest, and intercept (to inject #GP) if SHSTK is not exposed to the guest.

Yes, this handling has been covered by the new version.

>
> And regardless of what the mechanism ends up managing SSP MSRs, it should only
> ever touch PL0_SSP, because Linux never runs anything at CPL1 or CPL2, i.e. will
> never consume PL{1,2}_SSP.
>
> Am I missing something?

I think guest PL{0,1,2}_SSP can be handled as a bundle to keep the
handling simple (instead of handling each separately), because the
guest can be a non-Linux system; as you said before, they could even be
used as scratch registers.

But on the host side, since it's Linux, I can omit reloading/resetting
host PL{1,2}_SSP when the vCPU thread is preempted.

I will post a new version to the community if the above is only a minor
divergence.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-19 20:26               ` Sean Christopherson
@ 2023-07-20  1:58                 ` Yang, Weijiang
  0 siblings, 0 replies; 99+ messages in thread
From: Yang, Weijiang @ 2023-07-20  1:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, linux-kernel, peterz, rppt, binbin.wu,
	rick.p.edgecombe, john.allen, Chao Gao


On 7/20/2023 4:26 AM, Sean Christopherson wrote:
> On Wed, Jul 19, 2023, Sean Christopherson wrote:
>> On Mon, Jul 17, 2023, Weijiang Yang wrote:
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index e2c549f147a5..7d9cfb7e2fe8 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -11212,6 +11212,31 @@ static void kvm_put_guest_fpu(struct kvm_vcpu
>>> *vcpu)
>>> ������� trace_kvm_fpu(0);
>>> �}
> Huh.  After a bit of debugging, the mangling is due to mutt's default for send_charset
> being
>
>    "us-ascii:iso-8859-1:utf-8"
>
> and selecting iso-8859-1 instead of utf-8 as the encoding despite the original
> mail being utf-8.  In this case, mutt ran afoul of nbsp (u+00a0).
>
> AFAICT, the solution is to essentially tell mutt to never try to use iso-8859-1
> for sending mail
>
>    set send_charset="us-ascii:utf-8"

It made me feel a bit guilty, as I thought it might have resulted from
wrong settings in my email system :-)


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-19 20:36               ` Peter Zijlstra
@ 2023-07-20  5:26                 ` Pankaj Gupta
  2023-07-20  8:03                   ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Pankaj Gupta @ 2023-07-20  5:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sean Christopherson, Weijiang Yang, pbonzini, kvm, linux-kernel,
	rppt, binbin.wu, rick.p.edgecombe, john.allen, Chao Gao

> > My understanding is that PL[0-2]_SSP are used only on transitions to the
> > corresponding privilege level from a *different* privilege level.  That means
> > KVM should be able to utilize the user_return_msr framework to load the host
> > values.  Though if Linux ever supports SSS, I'm guessing the core kernel will
> > have some sort of mechanism to defer loading MSR_IA32_PL0_SSP until an exit to
> > userspace, e.g. to avoid having to write PL0_SSP, which will presumably be
> > per-task, on every context switch.
> >
> > But note my original wording: **If that's necessary**
> >
> > If nothing in the host ever consumes those MSRs, i.e. if SSS is NOT enabled in
> > IA32_S_CET, then running host stuff with guest values should be ok.  KVM only
> > needs to guarantee that it doesn't leak values between guests.  But that should
> > Just Work, e.g. KVM should load the new vCPU's values if SHSTK is exposed to the
> > guest, and intercept (to inject #GP) if SHSTK is not exposed to the guest.
> >
> > And regardless of what the mechanism ends up managing SSP MSRs, it should only
> > ever touch PL0_SSP, because Linux never runs anything at CPL1 or CPL2, i.e. will
> > never consume PL{1,2}_SSP.
>
> To clarify, Linux will only use SSS in FRED mode -- FRED removes CPL1,2.

Trying to understand more about what prevents SSS from being enabled
pre-FRED: is it better #CP exception handling with other nested
exceptions?

Won't the same problems (to some extent) happen with user-mode shadow
stacks (and, in the case of a guest, SSS inside a VM)?

Thanks,
Pankaj

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-20  5:26                 ` Pankaj Gupta
@ 2023-07-20  8:03                   ` Peter Zijlstra
  2023-07-20  8:09                     ` Peter Zijlstra
  2023-07-20 10:46                     ` Andrew Cooper
  0 siblings, 2 replies; 99+ messages in thread
From: Peter Zijlstra @ 2023-07-20  8:03 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Sean Christopherson, Weijiang Yang, pbonzini, kvm, linux-kernel,
	rppt, binbin.wu, rick.p.edgecombe, john.allen, Chao Gao,
	Andrew Cooper

On Thu, Jul 20, 2023 at 07:26:04AM +0200, Pankaj Gupta wrote:
> > > My understanding is that PL[0-2]_SSP are used only on transitions to the
> > > corresponding privilege level from a *different* privilege level.  That means
> > > KVM should be able to utilize the user_return_msr framework to load the host
> > > values.  Though if Linux ever supports SSS, I'm guessing the core kernel will
> > > have some sort of mechanism to defer loading MSR_IA32_PL0_SSP until an exit to
> > > userspace, e.g. to avoid having to write PL0_SSP, which will presumably be
> > > per-task, on every context switch.
> > >
> > > But note my original wording: **If that's necessary**
> > >
> > > If nothing in the host ever consumes those MSRs, i.e. if SSS is NOT enabled in
> > > IA32_S_CET, then running host stuff with guest values should be ok.  KVM only
> > > needs to guarantee that it doesn't leak values between guests.  But that should
> > > Just Work, e.g. KVM should load the new vCPU's values if SHSTK is exposed to the
> > > guest, and intercept (to inject #GP) if SHSTK is not exposed to the guest.
> > >
> > > And regardless of what the mechanism ends up managing SSP MSRs, it should only
> > > ever touch PL0_SSP, because Linux never runs anything at CPL1 or CPL2, i.e. will
> > > never consume PL{1,2}_SSP.
> >
> > To clarify, Linux will only use SSS in FRED mode -- FRED removes CPL1,2.
> 
> Trying to understand more about what prevents SSS from being enabled
> pre-FRED: is it better #CP exception handling with other nested
> exceptions?

SSS took the syscall gap and made it worse -- as in *way* worse.

To top it off, the whole SSS busy bit thing is fundamentally
incompatible with how we manage to survive nested exceptions in NMI
context.

Basically, the whole x86 exception / stack switching logic was already
borderline impossible (consider taking an MCE in the early NMI path
where we set up, but have not finished, the re-entrancy stuff), and
pushed it over the edge and set it on fire.

And NMI isn't the only problem, the various new virt exceptions #VC and
#HV are on their own already near impossible, adding SSS again pushes
the whole thing into clear insanity.

There's a good exposition of the whole trainwreck by Andrew here:

  https://www.youtube.com/watch?v=qcORS8CN0ow

(that is, sorry for the youtube link, but Google is failing me in
finding the actual Google Doc that talk is based on, or even the slide
deck :/)



FRED solves all that by:

 - removing the stack gap, cc/ip/ss/sp/ssp/gs will all be switched
   atomically and consistently for every transition.

 - removing the non-reentrant IST mechanism and replacing it with stack
   levels

 - adding an explicit NMI latch

 - re-organising the actual shadow stacks and doing away with that busy
   bit thing (I need to re-read the FRED spec on this detail again).



Crazy as we are, we're not touching legacy/IDT SSS with a ten foot pole,
sorry.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-20  8:03                   ` Peter Zijlstra
@ 2023-07-20  8:09                     ` Peter Zijlstra
  2023-07-20  9:14                       ` Pankaj Gupta
  2023-07-20 10:46                     ` Andrew Cooper
  1 sibling, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2023-07-20  8:09 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Sean Christopherson, Weijiang Yang, pbonzini, kvm, linux-kernel,
	rppt, binbin.wu, rick.p.edgecombe, john.allen, Chao Gao,
	Andrew Cooper

On Thu, Jul 20, 2023 at 10:03:58AM +0200, Peter Zijlstra wrote:

> > Trying to understand more about what prevents SSS from being enabled
> > pre-FRED: is it better #CP exception handling with other nested
> > exceptions?
> 
> SSS took the syscall gap and made it worse -- as in *way* worse.
> 
> To top it off, the whole SSS busy bit thing is fundamentally
> incompatible with how we manage to survive nested exceptions in NMI
> context.
> 
> Basically, the whole x86 exception / stack switching logic was already
> borderline impossible (consider taking an MCE in the early NMI path
> where we set up, but have not finished, the re-entrancy stuff), and

SSS

> pushed it over the edge and set it on fire.
> 
> And NMI isn't the only problem, the various new virt exceptions #VC and
> #HV are on their own already near impossible, adding SSS again pushes
> the whole thing into clear insanity.
> 
> There's a good exposition of the whole trainwreck by Andrew here:
> 
>   https://www.youtube.com/watch?v=qcORS8CN0ow
> 
> (that is, sorry for the youtube link, but Google is failing me in
> finding the actual Google Doc that talk is based on, or even the slide
> deck :/)
> 
> 
> 
> FRED solves all that by:
> 
>  - removing the stack gap, cc/ip/ss/sp/ssp/gs will all be switched
>    atomically and consistently for every transition.
> 
>  - removing the non-reentrant IST mechanism and replacing it with stack
>    levels
> 
>  - adding an explicit NMI latch
> 
>  - re-organising the actual shadow stacks and doing away with that busy
>    bit thing (I need to re-read the FRED spec on this detail again).
> 
> 
> 
> Crazy as we are, we're not touching legacy/IDT SSS with a ten foot pole,
> sorry.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-20  8:09                     ` Peter Zijlstra
@ 2023-07-20  9:14                       ` Pankaj Gupta
  0 siblings, 0 replies; 99+ messages in thread
From: Pankaj Gupta @ 2023-07-20  9:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sean Christopherson, Weijiang Yang, pbonzini, kvm, linux-kernel,
	rppt, binbin.wu, rick.p.edgecombe, john.allen, Chao Gao,
	Andrew Cooper

> > > Trying to understand more about what prevents SSS from being enabled
> > > pre-FRED: is it better #CP exception handling with other nested
> > > exceptions?
> >
> > SSS took the syscall gap and made it worse -- as in *way* worse.
> >
> > To top it off, the whole SSS busy bit thing is fundamentally
> > incompatible with how we manage to survive nested exceptions in NMI
> > context.
> >
> > Basically, the whole x86 exception / stack switching logic was already
> > borderline impossible (consider taking an MCE in the early NMI path
> > where we set up, but have not finished, the re-entrancy stuff), and
>
> SSS
>
> > pushed it over the edge and set it on fire.

ah I see. SSS takes it to the next level.

> >
> > And NMI isn't the only problem, the various new virt exceptions #VC and
> > #HV are on their own already near impossible, adding SSS again pushes
> > the whole thing into clear insanity.
> >
> > There's a good exposition of the whole trainwreck by Andrew here:
> >
> >   https://www.youtube.com/watch?v=qcORS8CN0ow
> >
> > (that is, sorry for the youtube link, but Google is failing me in
> > finding the actual Google Doc that talk is based on, or even the slide
> > deck :/)

I think I got the link:
https://docs.google.com/document/d/1hWejnyDkjRRAW-JEsRjA5c9CKLOPc6VKJQsuvODlQEI/edit?pli=1

> >
> >
> >
> > FRED solves all that by:
> >
> >  - removing the stack gap, cc/ip/ss/sp/ssp/gs will all be switched
> >    atomically and consistently for every transition.
> >
> >  - removing the non-reentrant IST mechanism and replacing it with stack
> >    levels
> >
> >  - adding an explicit NMI latch
> >
> >  - re-organising the actual shadow stacks and doing away with that busy
> >    bit thing (I need to re-read the FRED spec on this detail again).
> >

Thank you for explaining. I will also study the FRED spec and the
corresponding kernel patches posted on the mailing list.
> >
> >
> > Crazy as we are, we're not touching legacy/IDT SSS with a ten foot pole,
> > sorry.

ya, interesting.

Best regards,
Pankaj

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 00/21] Enable CET Virtualization
  2023-07-20  8:03                   ` Peter Zijlstra
  2023-07-20  8:09                     ` Peter Zijlstra
@ 2023-07-20 10:46                     ` Andrew Cooper
  1 sibling, 0 replies; 99+ messages in thread
From: Andrew Cooper @ 2023-07-20 10:46 UTC (permalink / raw)
  To: Peter Zijlstra, Pankaj Gupta
  Cc: Sean Christopherson, Weijiang Yang, pbonzini, kvm, linux-kernel,
	rppt, binbin.wu, rick.p.edgecombe, john.allen, Chao Gao

On 20/07/2023 9:03 am, Peter Zijlstra wrote:
> On Thu, Jul 20, 2023 at 07:26:04AM +0200, Pankaj Gupta wrote:
>>>> My understanding is that PL[0-2]_SSP are used only on transitions to the
>>>> corresponding privilege level from a *different* privilege level.  That means
>>>> KVM should be able to utilize the user_return_msr framework to load the host
>>>> values.  Though if Linux ever supports SSS, I'm guessing the core kernel will
>>>> have some sort of mechanism to defer loading MSR_IA32_PL0_SSP until an exit to
>>>> userspace, e.g. to avoid having to write PL0_SSP, which will presumably be
>>>> per-task, on every context switch.
>>>>
>>>> But note my original wording: **If that's necessary**
>>>>
>>>> If nothing in the host ever consumes those MSRs, i.e. if SSS is NOT enabled in
>>>> IA32_S_CET, then running host stuff with guest values should be ok.  KVM only
>>>> needs to guarantee that it doesn't leak values between guests.  But that should
>>>> Just Work, e.g. KVM should load the new vCPU's values if SHSTK is exposed to the
>>>> guest, and intercept (to inject #GP) if SHSTK is not exposed to the guest.
>>>>
>>>> And regardless of what the mechanism ends up managing SSP MSRs, it should only
>>>> ever touch PL0_SSP, because Linux never runs anything at CPL1 or CPL2, i.e. will
>>>> never consume PL{1,2}_SSP.
>>> To clarify, Linux will only use SSS in FRED mode -- FRED removes CPL1,2.
>> Trying to understand more about what prevents SSS from being enabled
>> pre-FRED: is it better #CP exception handling with other nested
>> exceptions?
> SSS 

Careful with SSS for "supervisor shadow stacks".   Because there's a
brand new CET_SSS CPUID bit to cover the (mis)feature where shstk
supervisor tokens can be *prematurely busy*.

(11/10 masterful wordsmithing, because it does lull you into the
impression that this isn't WTF^2 levels of crazy)

> took the syscall gap and made it worse -- as in *way* worse.

More impressively, it created a sysenter gap where there wasn't one
previously.

> To top it off, the whole SSS busy bit thing is fundamentally
> incompatible with how we manage to survive nested exceptions in NMI
> context.

To be clear, this is about the regular supervisor shadow stack busy
bits, not the CET_SSS premature-busy problem.

>
> Basically, the whole x86 exception / stack switching logic was already
> borderline impossible (consider taking an MCE in the early NMI path
> where we set up, but have not finished, the re-entrancy stuff), and
> pushed it over the edge and set it on fire.
>
> And NMI isn't the only problem, the various new virt exceptions #VC and
> #HV are on their own already near impossible, adding SSS again pushes
> the whole thing into clear insanity.
>
> There's a good exposition of the whole trainwreck by Andrew here:
>
>   https://www.youtube.com/watch?v=qcORS8CN0ow
>
> (that is, sorry for the youtube link, but Google is failing me in
> finding the actual Google Doc that talk is based on, or even the slide
> deck :/)

https://docs.google.com/presentation/d/10vWC02kpy4QneI43qsT3worfF_e3sbAE3Ifr61Sq3dY/edit?usp=sharing
is the slide deck.

I'm very glad I put a "only accurate as of $PRESENTATION_DATE"
disclaimer on slide 14.  It makes the whole presentation still
technically correct.

FRED is now at draft 5, and importantly shstk tokens have been removed.
They've been replaced with an alternative MSR-based mechanism, mostly
for performance reasons, but a consequence is that the prematurely busy
bug can't happen.

~Andrew

^ permalink raw reply	[flat|nested] 99+ messages in thread

end of thread, other threads:[~2023-07-20 10:46 UTC | newest]

Thread overview: 99+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-11  4:08 [PATCH v3 00/21] Enable CET Virtualization Yang Weijiang
2023-05-11  4:08 ` [PATCH v3 01/21] x86/shstk: Add Kconfig option for shadow stack Yang Weijiang
2023-05-11  4:08 ` [PATCH v3 02/21] x86/cpufeatures: Add CPU feature flags for shadow stacks Yang Weijiang
2023-05-11  4:08 ` [PATCH v3 03/21] x86/cpufeatures: Enable CET CR4 bit for shadow stack Yang Weijiang
2023-05-11  4:08 ` [PATCH v3 04/21] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yang Weijiang
2023-05-11  4:08 ` [PATCH v3 05/21] x86/fpu: Add helper for modifying xstate Yang Weijiang
2023-05-11  4:08 ` [PATCH v3 06/21] KVM:x86: Report XSS as to-be-saved if there are supported features Yang Weijiang
2023-05-24  7:06   ` Chao Gao
2023-05-24  8:19     ` Yang, Weijiang
2023-05-11  4:08 ` [PATCH v3 07/21] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS Yang Weijiang
2023-05-25  6:10   ` Chao Gao
2023-05-30  3:51     ` Yang, Weijiang
2023-05-30 12:08       ` Chao Gao
2023-05-31  1:11         ` Yang, Weijiang
2023-06-15 23:45           ` Sean Christopherson
2023-06-16  1:58             ` Yang, Weijiang
2023-06-23 23:21               ` Sean Christopherson
2023-06-26  9:24                 ` Yang, Weijiang
2023-05-11  4:08 ` [PATCH v3 08/21] KVM:x86: Init kvm_caps.supported_xss with supported feature bits Yang Weijiang
2023-06-06  8:38   ` Chao Gao
2023-06-08  5:42     ` Yang, Weijiang
2023-05-11  4:08 ` [PATCH v3 09/21] KVM:x86: Load guest FPU state when accessing xsaves-managed MSRs Yang Weijiang
2023-06-15 23:50   ` Sean Christopherson
2023-06-16  2:02     ` Yang, Weijiang
2023-05-11  4:08 ` [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification Yang Weijiang
2023-06-06  9:08   ` Chao Gao
2023-06-08  6:01     ` Yang, Weijiang
2023-06-15 23:58       ` Sean Christopherson
2023-06-16  6:56         ` Yang, Weijiang
2023-06-16 18:57           ` Sean Christopherson
2023-06-19  9:28             ` Yang, Weijiang
2023-06-30  9:34             ` Yang, Weijiang
2023-06-30 10:27               ` Chao Gao
2023-06-30 12:05                 ` Yang, Weijiang
2023-06-30 15:05                   ` Neiger, Gil
2023-06-30 15:15                     ` Sean Christopherson
2023-07-01  1:58                       ` Yang, Weijiang
2023-07-01  1:54                     ` Yang, Weijiang
2023-06-30 15:07               ` Sean Christopherson
2023-06-30 15:21                 ` Neiger, Gil
2023-07-01  1:57                 ` Yang, Weijiang
2023-05-11  4:08 ` [PATCH v3 11/21] KVM:VMX: Introduce CET VMCS fields and control bits Yang Weijiang
2023-05-11  4:08 ` [PATCH v3 12/21] KVM:x86: Add fault checks for guest CR4.CET setting Yang Weijiang
2023-06-06 11:03   ` Chao Gao
2023-06-08  6:06     ` Yang, Weijiang
2023-05-11  4:08 ` [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs Yang Weijiang
2023-05-23  8:21   ` Binbin Wu
2023-05-24  2:49     ` Yang, Weijiang
2023-06-23 23:53   ` Sean Christopherson
2023-06-26 14:05     ` Yang, Weijiang
2023-06-26 21:15       ` Sean Christopherson
2023-06-27  3:32         ` Yang, Weijiang
2023-06-27 14:55           ` Sean Christopherson
2023-06-28  1:42             ` Yang, Weijiang
2023-07-07  9:10     ` Yang, Weijiang
2023-07-07 15:28       ` Neiger, Gil
2023-07-12 16:42       ` Sean Christopherson
2023-05-11  4:08 ` [PATCH v3 14/21] KVM:VMX: Add a synthetic MSR to allow userspace to access GUEST_SSP Yang Weijiang
2023-05-23  8:57   ` Binbin Wu
2023-05-24  2:55     ` Yang, Weijiang
2023-05-11  4:08 ` [PATCH v3 15/21] KVM:x86: Report CET MSRs as to-be-saved if CET is supported Yang Weijiang
2023-05-11  4:08 ` [PATCH v3 16/21] KVM:x86: Save/Restore GUEST_SSP to/from SMM state save area Yang Weijiang
2023-06-23 22:30   ` Sean Christopherson
2023-06-26  8:59     ` Yang, Weijiang
2023-06-26 21:20       ` Sean Christopherson
2023-06-27  3:50         ` Yang, Weijiang
2023-05-11  4:08 ` [PATCH v3 17/21] KVM:VMX: Pass through user CET MSRs to the guest Yang Weijiang
2023-05-11  4:08 ` [PATCH v3 18/21] KVM:x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
2023-05-24  6:35   ` Chenyi Qiang
2023-05-24  8:07     ` Yang, Weijiang
2023-05-11  4:08 ` [PATCH v3 19/21] KVM:nVMX: Enable user CET support for nested VMX Yang Weijiang
2023-05-11  4:08 ` [PATCH v3 20/21] KVM:x86: Enable kernel IBT support for guest Yang Weijiang
2023-06-24  0:03   ` Sean Christopherson
2023-06-26 12:10     ` Yang, Weijiang
2023-06-26 20:50       ` Sean Christopherson
2023-06-27  1:53         ` Yang, Weijiang
2023-05-11  4:08 ` [PATCH v3 21/21] KVM:x86: Support CET supervisor shadow stack MSR access Yang Weijiang
2023-06-15 23:30 ` [PATCH v3 00/21] Enable CET Virtualization Sean Christopherson
2023-06-16  0:00   ` Sean Christopherson
2023-06-16  1:00     ` Yang, Weijiang
2023-06-16  8:25   ` Yang, Weijiang
2023-06-16 17:56     ` Sean Christopherson
2023-06-19  6:41       ` Yang, Weijiang
2023-06-23 20:51         ` Sean Christopherson
2023-06-26  6:46           ` Yang, Weijiang
2023-07-17  7:44           ` Yang, Weijiang
2023-07-19 19:41             ` Sean Christopherson
2023-07-19 20:26               ` Sean Christopherson
2023-07-20  1:58                 ` Yang, Weijiang
2023-07-19 20:36               ` Peter Zijlstra
2023-07-20  5:26                 ` Pankaj Gupta
2023-07-20  8:03                   ` Peter Zijlstra
2023-07-20  8:09                     ` Peter Zijlstra
2023-07-20  9:14                       ` Pankaj Gupta
2023-07-20 10:46                     ` Andrew Cooper
2023-07-20  1:55               ` Yang, Weijiang
2023-07-10  0:28       ` Yang, Weijiang
2023-07-10 22:18         ` Sean Christopherson
2023-07-11  1:24           ` Yang, Weijiang
