* [PATCH v5 00/19] Enable CET Virtualization
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Control-flow Enforcement Technology (CET) is a CPU feature designed to
prevent Return/Call/Jump-Oriented Programming (ROP/COP/JOP) attacks. It
provides two sub-features, Shadow Stack (SHSTK) and Indirect Branch
Tracking (IBT), to defend against these control-flow subversion attacks.

Shadow Stack (SHSTK):
  A shadow stack is a second stack used exclusively for control transfer
  operations. The shadow stack is separate from the data/normal stack and
  can be enabled individually in user and kernel mode. When shadow stack
  is enabled, CALL pushes the return address on both the data and shadow
  stack. RET pops the return address from both stacks and compares them.
  If the return addresses from the two stacks do not match, the processor
  generates a #CP.
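
As a purely illustrative C model of the behavior described above (the
arrays stand in for the hardware stacks and abort() for the #CP fault;
this is a sketch, not kernel or KVM code):

  #include <stdio.h>
  #include <stdlib.h>

  /* Conceptual model only: a data stack plus a parallel shadow stack. */
  static unsigned long data_stack[64], shadow_stack[64];
  static int d_top, s_top;

  static void model_call(unsigned long ret_addr)
  {
          data_stack[d_top++] = ret_addr;    /* CALL pushes on both stacks */
          shadow_stack[s_top++] = ret_addr;
  }

  static void model_ret(void)
  {
          unsigned long a = data_stack[--d_top];
          unsigned long b = shadow_stack[--s_top];

          if (a != b) {                      /* mismatch => #CP */
                  fprintf(stderr, "#CP: control protection fault\n");
                  abort();
          }
  }

  int main(void)
  {
          model_call(0x401000);
          data_stack[0] = 0xbad;             /* simulate a ROP overwrite */
          model_ret();                       /* the model raises the #CP */
          return 0;
  }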

Indirect Branch Tracking (IBT):
  IBT introduces a new instruction (ENDBRANCH) to mark valid target addresses
  of indirect branches (CALL, JMP, etc.). If an indirect branch is executed
  and the next instruction is _not_ an ENDBRANCH, the processor generates a
  #CP. The instruction behaves as a NOP on platforms that do not support
  CET.
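
Similarly, a sketch of the IBT rule (the ENDBR64 opcode bytes f3 0f 1e fa
are architectural; the check itself is of course performed by hardware,
not software):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Conceptual model only: an indirect branch target is legal iff its
   * first instruction is ENDBR64.
   */
  static void model_indirect_call(const unsigned char *target)
  {
          static const unsigned char endbr64[] = { 0xf3, 0x0f, 0x1e, 0xfa };

          if (memcmp(target, endbr64, sizeof(endbr64)) != 0) {
                  fprintf(stderr, "#CP: missing ENDBRANCH at target\n");
                  abort();
          }
          /* ...otherwise the call proceeds normally... */
  }

  int main(void)
  {
          static const unsigned char good[] = { 0xf3, 0x0f, 0x1e, 0xfa };

          model_indirect_call(good);  /* OK: target starts with ENDBR64 */
          return 0;
  }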


Dependency:
--------------------------------------------------------------------------
The first 2 patches are taken from the CET native series[1] in the kernel
tip tree. They are prerequisites for this KVM patch series because the CET
user mode xstate and some feature bits are defined there. Apply this KVM
series on top of the kernel tree to build a host kernel that supports guest
CET features, and apply the QEMU CET enabling patches[2] to build a
qualified QEMU. These kernel-dependent patches will be carried in the KVM
series until the CET native series is merged into the mainline tree.


Implementation:
--------------------------------------------------------------------------
This series enables full support for guest CET SHSTK/IBT register states,
i.e., the CET register states in the usage models below are backed by KVM.

                  |
    User SHSTK    |    User IBT      (user mode)
--------------------------------------------------
    Kernel SHSTK  |    Kernel IBT    (kernel mode)
                  |

KVM cooperates with the host kernel to back the CET register states in each
model. In this series, KVM manages the guest CET kernel registers (MSRs) by
itself and relies on the host kernel to manage the user mode registers, so
KVM requires the corresponding capability in the host XSS MSR before
exposing CET features to the guest.

Note, guest supervisor (kernel) SHSTK cannot be fully supported by this
series, therefore the guest SSS_CET bit, CPUID(0x7,1):EDX[bit 18], is
cleared. See the SDM (Vol 1, Section 17.2.3) for details.


CET states management:
--------------------------------------------------------------------------
The CET user mode states, MSR_IA32_{U_CET,PL3_SSP}, depend on the
{XSAVES,XRSTORS} instructions to swap the guest's and host's states. When
the vCPU exits to userspace, the guest user states are saved to the guest
fpu area and the host user mode states are loaded from the thread context;
vice versa when the vCPU re-enters the guest. See kvm_{load,put}_guest_fpu()
for details. Thus CET user mode state management depends on the CET user
mode bit (U_CET) being set in the host XSS MSR.
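
In rough terms, the swap looks like the following simplified sketch of
those helpers (tracing, preemption handling and other details elided):

  /*
   * Simplified sketch of kvm_{load,put}_guest_fpu(): the CET user states
   * ride along with the rest of the xstate because U_CET is set in the
   * host XSS, so XSAVES/XRSTORS naturally include them.
   */
  static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
  {
          /* Save host user states, load guest user states. */
          fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
  }

  static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
  {
          /* Save guest user states, load host user states. */
          fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
  }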

CET supervisor mode states are grouped into two categories: XSAVE-managed
and non-XSAVE-managed. The former includes MSR_IA32_PL{0,1,2}_SSP and is
controlled by the CET supervisor mode bit (S_CET) in XSS; the latter
consists of MSR_IA32_S_CET and MSR_IA32_INT_SSP_TAB.

The XSAVE-managed supervisor states could in theory be handled by enabling
the S_CET bit in the host XSS. But given that supervisor shadow stack is not
enabled in the Linux kernel, enabling that control bit the same way as for
the user mode states would have global side effects on all threads/tasks
running on the host, i.e.:
1) It introduces unnecessary XSAVE operations when switching context between
non-vCPU tasks within the current FPU framework.
2) It forces allocation of additional space for the CET supervisor states in
every thread context, regardless of whether it is a vCPU thread or not.

To avoid these downsides, this series provides a KVM solution to save/reload
vCPU's supervisor SHSTK states.

VMX introduces new VMCS fields, {GUEST,HOST}_{S_CET,SSP,INTR_SSP_TABLE}, to
hold the guest/host non-XSAVES-managed states. When the VMX CET entry/exit
load bits are set, the guest values of S_CET, SSP and INT_SSP_TAB are loaded
from the GUEST_* fields at VM-entry, and the host values from the HOST_*
fields at VM-exit. With these new fields, such supervisor states require no
additional KVM save/reload actions.
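
For example, with the fields and control bits introduced in patch 10, the
handling reduces to plain VMCS accesses (a sketch, not a verbatim excerpt
from the series):

  /* Let the CPU load guest/host CET supervisor state at VM-entry/exit. */
  vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_CET_STATE);
  vm_exit_controls_setbit(vmx, VM_EXIT_LOAD_CET_STATE);

  /* Guest values simply live in the VMCS; no KVM save/reload needed. */
  vmcs_writel(GUEST_S_CET, s_cet);
  vmcs_writel(GUEST_SSP, ssp);
  vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);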


Tests:
--------------------------------------------------------------------------
This series passed the basic CET user shadow stack test and the kernel IBT
test in both L1 and L2 guests.

The patch series _does_ affect existing vmx test cases in KVM-unit-tests;
the failures have been fixed in this patch[3].

All other parts of KVM unit-tests and selftests passed with this series. One new
selftest app for CET MSRs is posted here[4].

Note, this series hasn't been tested on AMD platform yet.

To run the user SHSTK test and the kernel IBT test in a guest, a CET-capable
platform is required, e.g., a Sapphire Rapids server. Follow the steps below
to build the host/guest kernels properly:

1. Build host kernel: Add this series to kernel tree and build kernel.

2. Build guest kernel: Add the full CET _native_ series to the kernel tree
and enable the CONFIG_X86_KERNEL_IBT and CONFIG_X86_USER_SHADOW_STACK
options. Build with a CET-enabled gcc (>= 8.5.0).

3. Use patched QEMU to launch a guest.

Check kernel selftest test_shadow_stack_64 output:

[INFO]  new_ssp = 7f8c82100ff8, *new_ssp = 7f8c82101001
[INFO]  changing ssp from 7f8c82900ff0 to 7f8c82100ff8
[INFO]  ssp is now 7f8c82101000
[OK]    Shadow stack pivot
[OK]    Shadow stack faults
[INFO]  Corrupting shadow stack
[INFO]  Generated shadow stack violation successfully
[OK]    Shadow stack violation test
[INFO]  Gup read -> shstk access success
[INFO]  Gup write -> shstk access success
[INFO]  Violation from normal write
[INFO]  Gup read -> write access success
[INFO]  Violation from normal write
[INFO]  Gup write -> write access success
[INFO]  Cow gup write -> write access success
[OK]    Shadow gup test
[INFO]  Violation from shstk access
[OK]    mprotect() test
[SKIP]  Userfaultfd unavailable.
[OK]    32 bit test


Check kernel IBT with dmesg | grep CET:

CET detected: Indirect Branch Tracking enabled

--------------------------------------------------------------------------
Changes in v5:
1. Consolidated CET MSR access code into one patch to make it clearer. [Chao]
2. Refined SSP handling when entering/exiting SMM mode. [Chao]
3. Refined CET MSR interception to make it adaptive to enable/disable cases. [Chao]
4. Refined guest PL{0,1,2}_SSP handling to comply with existing code logic. [Chao]
5. Other tweaks per Sean and Chao's feedback.
6. Rebased to: https://github.com/kvm-x86/linux/tree/next tag: kvm-x86-next-2023.07.28


[1]: CET native series: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/shstk
[2]: QEMU patch: https://lore.kernel.org/all/20230720111445.99509-1-weijiang.yang@intel.com/
[3]: KVM-unit-tests fixup: https://lore.kernel.org/all/20230720115810.104890-1-weijiang.yang@intel.com/ 
[4]: Selftests for CET MSRs: https://lore.kernel.org/all/20230720120401.105770-1-weijiang.yang@intel.com/ 
[5]: v4 patchset: https://lore.kernel.org/all/20230721030352.72414-1-weijiang.yang@intel.com/ 


Patch 1-2:	Native dependent CET patches.
Patch 3-5:	Enable XSS support in KVM.
Patch 6:	Prepare patch for XSAVE-managed MSR access.
Patch 7-9:	Common patches to support CET on x86.
Patch 10-11:	Emulate CET MSR access.
Patch 12:	Handle SSP at entry/exit to SMM.
Patch 13-17:	Add CET virtualization settings.
Patch 18-19:	nVMX patches for CET support in nested VM.


Rick Edgecombe (2):
  x86/cpufeatures: Add CPU feature flags for shadow stacks
  x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

Sean Christopherson (2):
  KVM:x86: Report XSS as to-be-saved if there are supported features
  KVM:x86: Load guest FPU state when access XSAVE-managed MSRs

Yang Weijiang (15):
  KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  KVM:x86: Initialize kvm_caps.supported_xss
  KVM:x86: Add fault checks for guest CR4.CET setting
  KVM:x86: Report KVM supported CET MSRs as to-be-saved
  KVM:x86: Make guest supervisor states as non-XSAVE managed
  KVM:VMX: Introduce CET VMCS fields and control bits
  KVM:VMX: Emulate read and write to CET MSRs
  KVM:x86: Save and reload SSP to/from SMRAM
  KVM:VMX: Set up interception for CET MSRs
  KVM:VMX: Set host constant supervisor states to VMCS fields
  KVM:x86: Optimize CET supervisor SSP save/reload
  KVM:x86: Enable CET virtualization for VMX and advertise to userspace
  KVM:x86: Enable guest CET supervisor xstate bit support
  KVM:nVMX: Refine error code injection to nested VM
  KVM:nVMX: Enable CET support for nested VM

 arch/x86/include/asm/cpufeatures.h       |   2 +
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/fpu/types.h         |  16 +-
 arch/x86/include/asm/fpu/xstate.h        |   6 +-
 arch/x86/include/asm/kvm_host.h          |   6 +-
 arch/x86/include/asm/msr-index.h         |   1 +
 arch/x86/include/asm/vmx.h               |   8 +
 arch/x86/include/uapi/asm/kvm_para.h     |   1 +
 arch/x86/kernel/cpu/cpuid-deps.c         |   1 +
 arch/x86/kernel/fpu/xstate.c             |  90 ++++-----
 arch/x86/kvm/cpuid.c                     |  32 ++-
 arch/x86/kvm/cpuid.h                     |  11 ++
 arch/x86/kvm/smm.c                       |  11 ++
 arch/x86/kvm/smm.h                       |   2 +-
 arch/x86/kvm/svm/svm.c                   |   2 +
 arch/x86/kvm/vmx/capabilities.h          |  10 +
 arch/x86/kvm/vmx/nested.c                |  49 ++++-
 arch/x86/kvm/vmx/nested.h                |   7 +
 arch/x86/kvm/vmx/vmcs12.c                |   6 +
 arch/x86/kvm/vmx/vmcs12.h                |  14 +-
 arch/x86/kvm/vmx/vmx.c                   | 133 ++++++++++++-
 arch/x86/kvm/vmx/vmx.h                   |   6 +-
 arch/x86/kvm/x86.c                       | 242 +++++++++++++++++++++--
 arch/x86/kvm/x86.h                       |  35 ++++
 24 files changed, 614 insertions(+), 85 deletions(-)


base-commit: d406b457840171306ada37400e4f3d3c6f0f4960
-- 
2.27.0



* [PATCH v5 01/19] x86/cpufeatures: Add CPU feature flags for shadow stacks
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang,
	Yu-cheng Yu, Borislav Petkov, Kees Cook, Mike Rapoport,
	Pengfei Xu

From: Rick Edgecombe <rick.p.edgecombe@intel.com>

The Control-Flow Enforcement Technology contains two related features,
one of which is Shadow Stacks. Future patches will utilize this feature
for shadow stack support in KVM, so add a CPU feature flag for Shadow
Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]).

To protect shadow stack state from malicious modification, the registers
are only accessible in supervisor mode. This implementation
context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend
on XSAVES.

The shadow stack feature, enumerated by the CPUID bit described above,
encompasses both supervisor and userspace support for shadow stack. In
near future patches, only userspace shadow stack will be enabled. In
expectation of future supervisor shadow stack support, create a software
CPU capability to enumerate kernel utilization of userspace shadow stack
support. This user shadow stack bit should depend on the HW "shstk"
capability and that logic will be implemented in future patches.

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230613001108.3040476-9-rick.p.edgecombe%40intel.com
---
 arch/x86/include/asm/cpufeatures.h       | 2 ++
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 31c862d79fae..8ea5c290259c 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
 #define X86_FEATURE_SMBA		(11*32+21) /* "" Slow Memory Bandwidth Allocation */
 #define X86_FEATURE_BMEC		(11*32+22) /* "" Bandwidth Monitoring Event Configuration */
+#define X86_FEATURE_USER_SHSTK		(11*32+23) /* Shadow stack support for user mode applications */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
@@ -380,6 +381,7 @@
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* "" Shadow stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..b9c7eae2e70f 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -105,6 +105,12 @@
 # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif
 
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+#define DISABLE_USER_SHSTK	0
+#else
+#define DISABLE_USER_SHSTK	(1 << (X86_FEATURE_USER_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -120,7 +126,7 @@
 #define DISABLED_MASK9	(DISABLE_SGX)
 #define DISABLED_MASK10	0
 #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
-			 DISABLE_CALL_DEPTH_TRACKING)
+			 DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
 #define DISABLED_MASK12	(DISABLE_LAM)
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..e462c1d3800a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
 	{}
 };
 
-- 
2.27.0



* [PATCH v5 02/19] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang,
	Yu-cheng Yu, Borislav Petkov, Kees Cook, Mike Rapoport,
	Pengfei Xu

From: Rick Edgecombe <rick.p.edgecombe@intel.com>

Shadow stack register state can be managed with XSAVE. The registers
can logically be separated into two groups:
        * Registers controlling user-mode operation
        * Registers controlling kernel-mode operation

The architecture has two new XSAVE state components: one for each of
those groups of registers. This lets an OS manage them separately if
it chooses. Future patches for host userspace and KVM guests will only
utilize the user-mode registers, so only configure XSAVE to save
user-mode registers. This state will add 16 bytes to the xsave buffer
size.

Future patches will use the user-mode XSAVE area to save guest user-mode
CET state. However, VMCS includes new fields for guest CET supervisor
states. KVM can use these to save and restore guest supervisor state, so
host supervisor XSAVE support is not required.

Adding this exacerbates the already unwieldy if statement in
check_xstate_against_struct() that handles warning about unimplemented
xfeatures. So refactor these checks into a switch statement, having
XCHECK_SZ() evaluate to a value so it can be used directly in a return
statement. This ends up exceeding 80 chars, but was better on balance
than other options explored.

While configuring user-mode XSAVE, clarify that the kernel-mode registers
are not managed by XSAVE by defining the xfeature in
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, as is done for XFEATURE_MASK_PT.
This serves more of a documentation-as-code purpose and, functionally,
only enables a few safety checks.

Both XSAVE state components are supervisor states, even the state
controlling user-mode operation. This is a departure from earlier features
like protection keys where the PKRU state is a normal user
(non-supervisor) state. Having the user state be supervisor-managed
ensures there is no direct, unprivileged access to it, making it harder
for an attacker to subvert CET.

To facilitate this privileged access, define the two user-mode CET MSRs
and the bits in those MSRs that are relevant to future shadow stack
enablement patches.
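
For reference, the user-mode definitions look like the following (the MSR
numbers are the architectural ones from the Intel SDM; the actual
msr-index.h hunk is not reproduced in this excerpt):

  /* User-mode CET MSRs (architectural numbers per the Intel SDM). */
  #define MSR_IA32_U_CET		0x000006a0 /* user mode cet setting */
  #define MSR_IA32_PL3_SSP	0x000006a7 /* ring-3 shadow stack pointer */

  /* MSR_IA32_{U,S}_CET bits relevant to shadow stack enabling. */
  #define CET_SHSTK_EN		BIT_ULL(0)
  #define CET_WR_SHSTK_EN	BIT_ULL(1)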

Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230613001108.3040476-25-rick.p.edgecombe%40intel.com
---
 arch/x86/include/asm/fpu/types.h  | 16 +++++-
 arch/x86/include/asm/fpu/xstate.h |  6 ++-
 arch/x86/kernel/fpu/xstate.c      | 90 +++++++++++++++----------------
 3 files changed, 61 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 7f6d858ff47a..eb810074f1e7 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -115,8 +115,8 @@ enum xfeature {
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
 	XFEATURE_PASID,
-	XFEATURE_RSRVD_COMP_11,
-	XFEATURE_RSRVD_COMP_12,
+	XFEATURE_CET_USER,
+	XFEATURE_CET_KERNEL_UNUSED,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -138,6 +138,8 @@ enum xfeature {
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
+#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL_UNUSED)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
 #define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
 #define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
@@ -252,6 +254,16 @@ struct pkru_state {
 	u32				pad;
 } __packed;
 
+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+	/* user control-flow settings */
+	u64 user_cet;
+	/* user shadow stack pointer */
+	u64 user_ssp;
+};
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index cd3dd170e23a..d4427b88ee12 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -50,7 +50,8 @@
 #define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
+					    XFEATURE_MASK_CET_USER)
 
 /*
  * A supervisor state component may not always contain valuable information,
@@ -77,7 +78,8 @@
  * Unsupported supervisor features. When a supervisor feature in this mask is
  * supported in the future, move it to the supported supervisor feature mask.
  */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
+					      XFEATURE_MASK_CET_KERNEL)
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0bab497c9436..4fa4751912d9 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -39,26 +39,26 @@
  */
 static const char *xfeature_names[] =
 {
-	"x87 floating point registers"	,
-	"SSE registers"			,
-	"AVX registers"			,
-	"MPX bounds registers"		,
-	"MPX CSR"			,
-	"AVX-512 opmask"		,
-	"AVX-512 Hi256"			,
-	"AVX-512 ZMM_Hi256"		,
-	"Processor Trace (unused)"	,
+	"x87 floating point registers",
+	"SSE registers",
+	"AVX registers",
+	"MPX bounds registers",
+	"MPX CSR",
+	"AVX-512 opmask",
+	"AVX-512 Hi256",
+	"AVX-512 ZMM_Hi256",
+	"Processor Trace (unused)",
 	"Protection Keys User registers",
 	"PASID state",
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"unknown xstate feature"	,
-	"AMX Tile config"		,
-	"AMX Tile data"			,
-	"unknown xstate feature"	,
+	"Control-flow User registers",
+	"Control-flow Kernel registers (unused)",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"unknown xstate feature",
+	"AMX Tile config",
+	"AMX Tile data",
+	"unknown xstate feature",
 };
 
 static unsigned short xsave_cpuid_features[] __initdata = {
@@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_CET_USER]			= X86_FEATURE_SHSTK,
 	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
 	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
@@ -276,6 +277,7 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_CET_USER);
 	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
 	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
@@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 	 XFEATURE_MASK_BNDREGS |		\
 	 XFEATURE_MASK_BNDCSR |			\
 	 XFEATURE_MASK_PASID |			\
+	 XFEATURE_MASK_CET_USER |		\
 	 XFEATURE_MASK_XTILE)
 
 /*
@@ -446,14 +449,15 @@ static void __init __xstate_dump_leaves(void)
 	}									\
 } while (0)
 
-#define XCHECK_SZ(sz, nr, nr_macro, __struct) do {			\
-	if ((nr == nr_macro) &&						\
-	    WARN_ONCE(sz != sizeof(__struct),				\
-		"%s: struct is %zu bytes, cpu state %d bytes\n",	\
-		__stringify(nr_macro), sizeof(__struct), sz)) {		\
+#define XCHECK_SZ(sz, nr, __struct) ({					\
+	if (WARN_ONCE(sz != sizeof(__struct),				\
+	    "[%s]: struct is %zu bytes, cpu state %d bytes\n",		\
+	    xfeature_names[nr], sizeof(__struct), sz)) {		\
 		__xstate_dump_leaves();					\
 	}								\
-} while (0)
+	true;								\
+})
+
 
 /**
  * check_xtile_data_against_struct - Check tile data state size.
@@ -527,36 +531,28 @@ static bool __init check_xstate_against_struct(int nr)
 	 * Ask the CPU for the size of the state.
 	 */
 	int sz = xfeature_size(nr);
+
 	/*
 	 * Match each CPU state with the corresponding software
 	 * structure.
 	 */
-	XCHECK_SZ(sz, nr, XFEATURE_YMM,       struct ymmh_struct);
-	XCHECK_SZ(sz, nr, XFEATURE_BNDREGS,   struct mpx_bndreg_state);
-	XCHECK_SZ(sz, nr, XFEATURE_BNDCSR,    struct mpx_bndcsr_state);
-	XCHECK_SZ(sz, nr, XFEATURE_OPMASK,    struct avx_512_opmask_state);
-	XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
-	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
-	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
-	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
-	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
-
-	/* The tile data size varies between implementations. */
-	if (nr == XFEATURE_XTILE_DATA)
-		check_xtile_data_against_struct(sz);
-
-	/*
-	 * Make *SURE* to add any feature numbers in below if
-	 * there are "holes" in the xsave state component
-	 * numbers.
-	 */
-	if ((nr < XFEATURE_YMM) ||
-	    (nr >= XFEATURE_MAX) ||
-	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
+	switch (nr) {
+	case XFEATURE_YMM:	  return XCHECK_SZ(sz, nr, struct ymmh_struct);
+	case XFEATURE_BNDREGS:	  return XCHECK_SZ(sz, nr, struct mpx_bndreg_state);
+	case XFEATURE_BNDCSR:	  return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state);
+	case XFEATURE_OPMASK:	  return XCHECK_SZ(sz, nr, struct avx_512_opmask_state);
+	case XFEATURE_ZMM_Hi256:  return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state);
+	case XFEATURE_Hi16_ZMM:	  return XCHECK_SZ(sz, nr, struct avx_512_hi16_state);
+	case XFEATURE_PKRU:	  return XCHECK_SZ(sz, nr, struct pkru_state);
+	case XFEATURE_PASID:	  return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
+	case XFEATURE_XTILE_CFG:  return XCHECK_SZ(sz, nr, struct xtile_cfg);
+	case XFEATURE_CET_USER:	  return XCHECK_SZ(sz, nr, struct cet_user_state);
+	case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
+	default:
 		XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
 		return false;
 	}
+
 	return true;
 }
 
-- 
2.27.0



* [PATCH v5 03/19] KVM:x86: Report XSS as to-be-saved if there are supported features
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

From: Sean Christopherson <seanjc@google.com>

Add MSR_IA32_XSS to the list of MSRs reported to userspace if
supported_xss is non-zero, i.e. KVM supports at least one XSS
based feature.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6b9bea62fb8..0b9033551d8c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1451,6 +1451,7 @@ static const u32 msrs_to_save_base[] = {
 	MSR_IA32_UMWAIT_CONTROL,
 
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+	MSR_IA32_XSS,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7172,6 +7173,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
 			return;
 		break;
+	case MSR_IA32_XSS:
+		if (!kvm_caps.supported_xss)
+			return;
+		break;
 	default:
 		break;
 	}
-- 
2.27.0



* [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang, Zhang Yi Z

Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified.
CPUID(EAX=0DH,ECX=1).EBX reports the required storage size of
all enabled xstate features in XCR0 | XSS. The guest can allocate
a sufficiently sized xsave buffer based on this info.

Note, KVM does not yet support any XSS based features, i.e.
supported_xss is guaranteed to be zero at this time.
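
As a guest-side illustration (a hypothetical userspace probe, not part of
this patch), the reported size can be read back like so:

  #include <cpuid.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* CPUID(EAX=0DH,ECX=1).EBX: bytes needed for an XSAVES area
           * covering all features enabled in XCR0 | IA32_XSS.
           */
          __cpuid_count(0x0d, 1, eax, ebx, ecx, edx);
          printf("xsave buffer size for XCR0|XSS: %u bytes\n", ebx);
          return 0;
  }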

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            | 20 ++++++++++++++++++--
 arch/x86/kvm/x86.c              |  8 +++++---
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 28bd38303d70..20bbcd95511f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -804,6 +804,7 @@ struct kvm_vcpu_arch {
 
 	u64 xcr0;
 	u64 guest_supported_xcr0;
+	u64 guest_supported_xss;
 
 	struct kvm_pio_request pio;
 	void *pio_data;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7f4d13383cf2..0338316b827c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -249,6 +249,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
 	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
 }
 
+static u64 cpuid_get_supported_xss(struct kvm_cpuid_entry2 *entries, int nent)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = cpuid_entry2_find(entries, nent, 0xd, 1);
+	if (!best)
+		return 0;
+
+	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
+}
+
 static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
 				       int nent)
 {
@@ -276,8 +287,11 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
 
 	best = cpuid_entry2_find(entries, nent, 0xD, 1);
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
-		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
-		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+		     cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {
+		u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;
+
+		best->ebx = xstate_required_size(xstate, true);
+	}
 
 	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
 	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
@@ -325,6 +339,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.guest_supported_xcr0 =
 		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
+	vcpu->arch.guest_supported_xss =
+		cpuid_get_supported_xss(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
 
 	/*
 	 * FP+SSE can always be saved/restored via KVM_{G,S}ET_XSAVE, even if
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b9033551d8c..5d6d6fa33e5b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3780,10 +3780,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
 		 * XSAVES/XRSTORS to save/restore PT MSRs.
 		 */
-		if (data & ~kvm_caps.supported_xss)
+		if (data & ~vcpu->arch.guest_supported_xss)
 			return 1;
-		vcpu->arch.ia32_xss = data;
-		kvm_update_cpuid_runtime(vcpu);
+		if (vcpu->arch.ia32_xss != data) {
+			vcpu->arch.ia32_xss = data;
+			kvm_update_cpuid_runtime(vcpu);
+		}
 		break;
 	case MSR_SMI_COUNT:
 		if (!msr_info->host_initiated)
-- 
2.27.0



* [PATCH v5 05/19] KVM:x86: Initialize kvm_caps.supported_xss
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Set kvm_caps.supported_xss to host_xss & KVM XSS mask.
host_xss contains the host supported xstate feature bits for thread
context switching, and KVM_SUPPORTED_XSS includes all KVM-enabled XSS
feature bits; the AND of the two represents all the feature bits KVM
supports. Since the result is a subset of host_xss, the related
XSAVE-managed MSRs are automatically swapped between guest and host when
the vCPU exits to userspace.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 1 -
 arch/x86/kvm/x86.c     | 6 +++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..c8d9870cfecb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7849,7 +7849,6 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
 
 	/* CPUID 0xD.1 */
-	kvm_caps.supported_xss = 0;
 	if (!cpu_has_vmx_xsaves())
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d6d6fa33e5b..e9f3627d5fdd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -225,6 +225,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
+#define KVM_SUPPORTED_XSS     0
+
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
 
@@ -9498,8 +9500,10 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	rdmsrl_safe(MSR_EFER, &host_efer);
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
+	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
 		rdmsrl(MSR_IA32_XSS, host_xss);
+		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+	}
 
 	kvm_init_pmu_capability(ops->pmu_ops);
 
-- 
2.27.0



* [PATCH v5 06/19] KVM:x86: Load guest FPU state when access XSAVE-managed MSRs
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

From: Sean Christopherson <seanjc@google.com>

Load the guest's FPU state if userspace is accessing MSRs whose values are
managed by XSAVES. Two MSR access helpers, i.e., kvm_{get,set}_xsave_msr(),
are introduced by a later patch to facilitate access to this kind of MSR.

If MSRs supported in kvm_caps.supported_xss are passed through to the guest,
the guest MSR values are swapped with the host's contents before the vCPU
exits to userspace and swapped back after it re-enters the kernel.

Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
explicitly check @vcpu is non-null before attempting to load guest state.
The XSS supporting MSRs cannot be retrieved via the device ioctl() without
loading guest FPU state (which doesn't exist).

Note that guest_cpuid_has() is not queried as host userspace is allowed
to access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9f3627d5fdd..015fb0ef102c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -132,6 +132,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 
 static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
 struct kvm_x86_ops kvm_x86_ops __read_mostly;
 
 #define KVM_X86_OP(func)					     \
@@ -4345,6 +4348,21 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 }
 EXPORT_SYMBOL_GPL(kvm_get_msr_common);
 
+static const u32 xstate_msrs[] = {
+	MSR_IA32_U_CET, MSR_IA32_PL3_SSP,
+};
+
+static bool is_xstate_msr(u32 index)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(xstate_msrs); i++) {
+		if (index == xstate_msrs[i])
+			return true;
+	}
+	return false;
+}
+
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
  *
@@ -4355,11 +4373,20 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 		    int (*do_msr)(struct kvm_vcpu *vcpu,
 				  unsigned index, u64 *data))
 {
+	bool fpu_loaded = false;
 	int i;
 
-	for (i = 0; i < msrs->nmsrs; ++i)
+	for (i = 0; i < msrs->nmsrs; ++i) {
+		if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
+		    is_xstate_msr(entries[i].index)) {
+			kvm_load_guest_fpu(vcpu);
+			fpu_loaded = true;
+		}
 		if (do_msr(vcpu, entries[i].index, &entries[i].data))
 			break;
+	}
+	if (fpu_loaded)
+		kvm_put_guest_fpu(vcpu);
 
 	return i;
 }
-- 
2.27.0



* [PATCH v5 07/19] KVM:x86: Add fault checks for guest CR4.CET setting
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Check for potential faults on guest CR4.CET setting per the Intel SDM.
CET can be enabled if and only if CR0.WP==1, i.e. setting
CR4.CET=1 faults if CR0.WP==0, and clearing CR0.WP faults
if CR4.CET==1.

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 015fb0ef102c..82b9f14990da 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -993,6 +993,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	    (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
 		return 1;
 
+	if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+		return 1;
+
 	static_call(kvm_x86_set_cr0)(vcpu, cr0);
 
 	kvm_post_set_cr0(vcpu, old_cr0, cr0);
@@ -1204,6 +1207,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			return 1;
 	}
 
+	if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+		return 1;
+
 	static_call(kvm_x86_set_cr4)(vcpu, cr4);
 
 	kvm_post_set_cr4(vcpu, old_cr4, cr4);
-- 
2.27.0



* [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Add all CET MSRs, including the synthesized GUEST_SSP, to the report list.
PL{0,1,2}_SSP are made independent of host XSAVE management by later
patches. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are XSAVE-managed on the
host side. MSR_IA32_S_CET, MSR_IA32_INT_SSP_TAB and MSR_KVM_GUEST_SSP
are not XSAVE-managed.

When CET IBT/SHSTK are enumerated to the guest, both user and supervisor
modes should be supported for architectural integrity, i.e., the two
modes are supported as both or neither.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/x86.c                   | 10 ++++++++++
 arch/x86/kvm/x86.h                   | 10 ++++++++++
 3 files changed, 21 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..7af465e4e0bd 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -58,6 +58,7 @@
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
+#define MSR_KVM_GUEST_SSP	0x4b564d09
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 82b9f14990da..d68ef87fe007 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1463,6 +1463,9 @@ static const u32 msrs_to_save_base[] = {
 
 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
 	MSR_IA32_XSS,
+	MSR_IA32_U_CET, MSR_IA32_S_CET,
+	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB, MSR_KVM_GUEST_SSP,
 };
 
 static const u32 msrs_to_save_pmu[] = {
@@ -7214,6 +7217,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		if (!kvm_caps.supported_xss)
 			return;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!kvm_is_cet_supported())
+			return;
+		break;
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..6e6292915f8c 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -362,6 +362,16 @@ static inline bool kvm_mpx_supported(void)
 		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
 }
 
+#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
+/*
+ * Shadow Stack and Indirect Branch Tracking feature enabling depends on
+ * whether host side CET user xstate bit is supported or not.
+ */
+static inline bool kvm_is_cet_supported(void)
+{
+	return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;
+}
+
 extern unsigned int min_timer_period_us;
 
 extern bool enable_vmware_backdoor;
-- 
2.27.0



* [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Save the guest CET supervisor states, i.e., PL{0,1,2}_SSP, when the vCPU
is exiting to userspace or being preempted. Reload the MSRs
before VM-entry.

Embed the helpers in {vmx,svm}_prepare_switch_to_guest() and
vmx_prepare_switch_to_host()/svm_prepare_host_switch() to employ the
existing guest state management and optimize the invocation of
the helpers.

CET supervisor state management is done in KVM instead of via host XSS
because enabling the S_CET bit in host XSS would mean:
 - Introducing unnecessary XSAVE operations when switching to non-vCPU
userspace within the current FPU framework.
 - Forcing allocation of additional space for the CET supervisor states
in every thread context, regardless of whether it is a vCPU thread or not.

Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.h            | 11 +++++++++++
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/vmx.c          |  2 ++
 arch/x86/kvm/x86.c              | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/x86.h              |  3 +++
 6 files changed, 46 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20bbcd95511f..69cbc9d9b277 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,6 +805,7 @@ struct kvm_vcpu_arch {
 	u64 xcr0;
 	u64 guest_supported_xcr0;
 	u64 guest_supported_xss;
+	u64 cet_s_ssp[3];
 
 	struct kvm_pio_request pio;
 	void *pio_data;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index b1658c0de847..b221a663de4c 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -232,4 +232,15 @@ static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
 	return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
 }
 
+/*
+ * FIXME: When the "KVM-governed" enabling patchset is merged, rebase this
+ * series on top of that and add a new patch for CET to replace this helper
+ * with the qualified one.
+ */
+static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
+					  unsigned int feature)
+{
+	return kvm_cpu_cap_has(feature) && guest_cpuid_has(vcpu, feature);
+}
+
 #endif
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1bc0936bbd51..8652e86fbfb2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1515,11 +1515,13 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	if (likely(tsc_aux_uret_slot >= 0))
 		kvm_set_user_return_msr(tsc_aux_uret_slot, svm->tsc_aux, -1ull);
 
+	reload_cet_supervisor_ssp(vcpu);
 	svm->guest_state_loaded = true;
 }
 
 static void svm_prepare_host_switch(struct kvm_vcpu *vcpu)
 {
+	save_cet_supervisor_ssp(vcpu);
 	to_svm(vcpu)->guest_state_loaded = false;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c8d9870cfecb..6aa76124e81e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1323,6 +1323,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 	gs_base = segment_base(gs_sel);
 #endif
 
+	reload_cet_supervisor_ssp(vcpu);
 	vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
 	vmx->guest_state_loaded = true;
 }
@@ -1362,6 +1363,7 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
 	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 #endif
 	load_fixmap_gdt(raw_smp_processor_id());
+	save_cet_supervisor_ssp(&vmx->vcpu);
 	vmx->guest_state_loaded = false;
 	vmx->guest_uret_msrs_loaded = false;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d68ef87fe007..5b63441fd2d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11128,6 +11128,31 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 	trace_kvm_fpu(0);
 }
 
+void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
+{
+	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
+		rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
+		rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
+		rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
+		/*
+		 * Omit reset to host PL{1,2}_SSP because Linux will never use
+		 * these MSRs.
+		 */
+		wrmsrl(MSR_IA32_PL0_SSP, 0);
+	}
+}
+EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp);
+
+void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
+{
+	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
+		wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
+		wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
+		wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
+	}
+}
+EXPORT_SYMBOL_GPL(reload_cet_supervisor_ssp);
+
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
 	struct kvm_queued_exception *ex = &vcpu->arch.exception;
@@ -12133,6 +12158,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vcpu->arch.cr3 = 0;
 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
+	memset(vcpu->arch.cet_s_ssp, 0, sizeof(vcpu->arch.cet_s_ssp));
 
 	/*
 	 * CR0.CD/NW are set on RESET, preserved on INIT.  Note, some versions
@@ -12313,6 +12339,7 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 		pmu->need_cleanup = true;
 		kvm_make_request(KVM_REQ_PMU, vcpu);
 	}
+
 	static_call(kvm_x86_sched_in)(vcpu, cpu);
 }
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6e6292915f8c..c69fc027f5ec 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -501,6 +501,9 @@ static inline void kvm_machine_check(void)
 
 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
+void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu);
+void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu);
+
 int kvm_spec_ctrl_test_value(u64 value);
 bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
-- 
2.27.0



* [PATCH v5 10/19] KVM:VMX: Introduce CET VMCS fields and control bits
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang, Zhang Yi Z

Control-flow Enforcement Technology (CET) is a CPU feature designed to
prevent Return/Call/Jump-Oriented Programming (ROP/COP/JOP) attacks. It
provides two sub-features, Shadow Stack (SHSTK) and Indirect Branch
Tracking (IBT), to defend against these control-flow subversion attacks.

Shadow Stack (SHSTK):
  A shadow stack is a second stack used exclusively for control transfer
  operations. The shadow stack is separate from the data/normal stack and
  can be enabled individually in user and kernel mode. When shadow stack
  is enabled, CALL pushes the return address on both the data and shadow
  stack. RET pops the return address from both stacks and compares them.
  If the return addresses from the two stacks do not match, the processor
  generates a #CP.

Indirect Branch Tracking (IBT):
  IBT introduces a new instruction (ENDBRANCH) to mark valid target addresses
  of indirect branches (CALL, JMP, etc.). If an indirect branch is executed
  and the next instruction is _not_ an ENDBRANCH, the processor generates a
  #CP. The instruction behaves as a NOP on platforms that do not support
  CET.

Several new CET MSRs are defined to support CET:
  MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} mode respectively.

  MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}.

  MSR_IA32_INT_SSP_TAB: Linear address of the SHSTK table; each entry is
			indexed by the IST of an interrupt gate descriptor.

Two XSAVES state bits are introduced for CET:
  IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
  IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.

Six VMCS fields are introduced for CET:
  {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
  {HOST,GUEST}_SSP: Stores shadow stack pointer of current active task/thread.
  {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.

On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
control fields:
If VM_EXIT_LOAD_CET_STATE = 1, the host CET states are restored from
the following VMCS fields at VM-Exit:
  HOST_S_CET
  HOST_SSP
  HOST_INTR_SSP_TABLE

If VM_ENTRY_LOAD_CET_STATE = 1, the guest CET states are loaded from
the following VMCS fields at VM-Entry:
  GUEST_S_CET
  GUEST_SSP
  GUEST_INTR_SSP_TABLE

Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/include/asm/vmx.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0d02c4aafa6f..db7f93307349 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -104,6 +104,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_LOAD_CET_STATE                  0x10000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -117,6 +118,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_CET_STATE                 0x00100000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
@@ -345,6 +347,9 @@ enum vmcs_field {
 	GUEST_PENDING_DBG_EXCEPTIONS    = 0x00006822,
 	GUEST_SYSENTER_ESP              = 0x00006824,
 	GUEST_SYSENTER_EIP              = 0x00006826,
+	GUEST_S_CET                     = 0x00006828,
+	GUEST_SSP                       = 0x0000682a,
+	GUEST_INTR_SSP_TABLE            = 0x0000682c,
 	HOST_CR0                        = 0x00006c00,
 	HOST_CR3                        = 0x00006c02,
 	HOST_CR4                        = 0x00006c04,
@@ -357,6 +362,9 @@ enum vmcs_field {
 	HOST_IA32_SYSENTER_EIP          = 0x00006c12,
 	HOST_RSP                        = 0x00006c14,
 	HOST_RIP                        = 0x00006c16,
+	HOST_S_CET                      = 0x00006c18,
+	HOST_SSP                        = 0x00006c1a,
+	HOST_INTR_SSP_TABLE             = 0x00006c1c
 };
 
 /*
-- 
2.27.0



* [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
From: Yang Weijiang @ 2023-08-03  4:27 UTC
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Add an emulation interface for CET MSR reads and writes.
The emulation code is split into a common part and a vendor-specific
part: the former resides in x86.c to benefit different x86 CPU
vendors; the latter, for VMX, is implemented in this patch.
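
As a usage illustration (not part of the patch), a VMM or live-migration
tool could access the synthesized SSP through the regular MSR ioctls; a
minimal sketch with error handling omitted, using the MSR index defined
later in this series:

	#include <linux/kvm.h>
	#include <string.h>
	#include <sys/ioctl.h>

	/* Hypothetical helper: read MSR_KVM_GUEST_SSP from a vCPU fd. */
	static unsigned long long read_guest_ssp(int vcpu_fd)
	{
		struct {
			struct kvm_msrs hdr;
			struct kvm_msr_entry entry;
		} m;

		memset(&m, 0, sizeof(m));
		m.hdr.nmsrs = 1;
		m.entry.index = 0x4b564d09;	/* MSR_KVM_GUEST_SSP */
		ioctl(vcpu_fd, KVM_GET_MSRS, &m);
		return m.entry.data;
	}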

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c |  27 +++++++++++
 arch/x86/kvm/x86.c     | 104 +++++++++++++++++++++++++++++++++++++----
 arch/x86/kvm/x86.h     |  18 +++++++
 3 files changed, 141 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6aa76124e81e..ccf750e79608 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2095,6 +2095,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
 		break;
+	case MSR_IA32_S_CET:
+	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_INT_SSP_TAB:
+		if (kvm_get_msr_common(vcpu, msr_info))
+			return 1;
+		if (msr_info->index == MSR_KVM_GUEST_SSP)
+			msr_info->data = vmcs_readl(GUEST_SSP);
+		else if (msr_info->index == MSR_IA32_S_CET)
+			msr_info->data = vmcs_readl(GUEST_S_CET);
+		else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
+			msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
 		break;
@@ -2404,6 +2416,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
+	case MSR_IA32_S_CET:
+	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_INT_SSP_TAB:
+		if (kvm_set_msr_common(vcpu, msr_info))
+			return 1;
+		if (msr_index == MSR_KVM_GUEST_SSP)
+			vmcs_writel(GUEST_SSP, data);
+		else if (msr_index == MSR_IA32_S_CET)
+			vmcs_writel(GUEST_S_CET, data);
+		else if (msr_index == MSR_IA32_INT_SSP_TAB)
+			vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		if (data && !vcpu_to_pmu(vcpu)->version)
 			return 1;
@@ -4864,6 +4888,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		vmcs_write64(GUEST_BNDCFGS, 0);
 
 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
+	vmcs_writel(GUEST_SSP, 0);
+	vmcs_writel(GUEST_S_CET, 0);
+	vmcs_writel(GUEST_INTR_SSP_TABLE, 0);
 
 	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5b63441fd2d2..98f3ff6078e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3627,6 +3627,39 @@ static bool kvm_is_msr_to_save(u32 msr_index)
 	return false;
 }
 
+static inline bool is_shadow_stack_msr(u32 msr)
+{
+	return msr == MSR_IA32_PL0_SSP ||
+		msr == MSR_IA32_PL1_SSP ||
+		msr == MSR_IA32_PL2_SSP ||
+		msr == MSR_IA32_PL3_SSP ||
+		msr == MSR_IA32_INT_SSP_TAB ||
+		msr == MSR_KVM_GUEST_SSP;
+}
+
+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr)
+{
+	if (is_shadow_stack_msr(msr->index)) {
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+			return false;
+
+		if (msr->index == MSR_KVM_GUEST_SSP)
+			return msr->host_initiated;
+
+		return msr->host_initiated ||
+			guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+	}
+
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		return false;
+
+	return msr->host_initiated ||
+		guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
+		guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+}
+
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	u32 msr = msr_info->index;
@@ -3981,6 +4014,45 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.guest_fpu.xfd_err = data;
 		break;
 #endif
+#define CET_EXCLUSIVE_BITS		(CET_SUPPRESS | CET_WAIT_ENDBR)
+#define CET_CTRL_RESERVED_BITS		GENMASK(9, 6)
+#define CET_SHSTK_MASK_BITS		GENMASK(1, 0)
+#define CET_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | \
+					 GENMASK_ULL(63, 10))
+#define CET_LEG_BITMAP_BASE(data)	((data) >> 12)
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+			return 1;
+		if (!!(data & CET_CTRL_RESERVED_BITS))
+			return 1;
+		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+		    (data & CET_SHSTK_MASK_BITS))
+			return 1;
+		if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
+		    (data & CET_IBT_MASK_BITS))
+			return 1;
+		if (!IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
+		    (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS)
+			return 1;
+		if (msr == MSR_IA32_U_CET)
+			kvm_set_xsave_msr(msr_info);
+		break;
+	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+			return 1;
+		if (is_noncanonical_address(data, vcpu))
+			return 1;
+		if (!IS_ALIGNED(data, 4))
+			return 1;
+		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
+		    msr == MSR_IA32_PL2_SSP) {
+			vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
+		} else if (msr == MSR_IA32_PL3_SSP) {
+			kvm_set_xsave_msr(msr_info);
+		}
+		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
@@ -4051,7 +4123,9 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
-	switch (msr_info->index) {
+	u32 msr = msr_info->index;
+
+	switch (msr) {
 	case MSR_IA32_PLATFORM_ID:
 	case MSR_IA32_EBL_CR_POWERON:
 	case MSR_IA32_LASTBRANCHFROMIP:
@@ -4086,7 +4160,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
 	case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
 	case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
-		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_get_msr(vcpu, msr_info);
 		msr_info->data = 0;
 		break;
@@ -4137,7 +4211,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_MTRRcap:
 	case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
 	case MSR_MTRRdefType:
-		return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
+		return kvm_mtrr_get_msr(vcpu, msr, &msr_info->data);
 	case 0xcd: /* fsb frequency */
 		msr_info->data = 3;
 		break;
@@ -4159,7 +4233,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = kvm_get_apic_base(vcpu);
 		break;
 	case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
-		return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
+		return kvm_x2apic_msr_read(vcpu, msr, &msr_info->data);
 	case MSR_IA32_TSC_DEADLINE:
 		msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
 		break;
@@ -4253,7 +4327,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
 	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
-		return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
+		return get_msr_mce(vcpu, msr, &msr_info->data,
 				   msr_info->host_initiated);
 	case MSR_IA32_XSS:
 		if (!msr_info->host_initiated &&
@@ -4284,7 +4358,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case HV_X64_MSR_TSC_EMULATION_STATUS:
 	case HV_X64_MSR_TSC_INVARIANT_CONTROL:
 		return kvm_hv_get_msr_common(vcpu,
-					     msr_info->index, &msr_info->data,
+					     msr, &msr_info->data,
 					     msr_info->host_initiated);
 	case MSR_IA32_BBL_CR_CTL3:
 		/* This legacy MSR exists but isn't fully documented in current
@@ -4337,8 +4411,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vcpu->arch.guest_fpu.xfd_err;
 		break;
 #endif
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+	case MSR_KVM_GUEST_SSP:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+			return 1;
+		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
+		    msr == MSR_IA32_PL2_SSP) {
+			msr_info->data =
+				vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
+		} else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
+			kvm_get_xsave_msr(msr_info);
+		}
+		break;
 	default:
-		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_get_msr(vcpu, msr_info);
 
 		/*
@@ -4346,7 +4434,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * to-be-saved, even if an MSR isn't fully supported.
 		 */
 		if (msr_info->host_initiated &&
-		    kvm_is_msr_to_save(msr_info->index)) {
+		    kvm_is_msr_to_save(msr)) {
 			msr_info->data = 0;
 			break;
 		}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c69fc027f5ec..3b79d6db2f83 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -552,4 +552,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 			 unsigned int port, void *data,  unsigned int count,
 			 int in);
 
+/*
+ * Guest xstate MSRs have been loaded in __msr_io(); disable preemption before
+ * accessing the MSRs to avoid MSR content corruption.
+ */
+static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
+{
+	kvm_fpu_get();
+	rdmsrl(msr_info->index, msr_info->data);
+	kvm_fpu_put();
+}
+
+static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
+{
+	kvm_fpu_get();
+	wrmsrl(msr_info->index, msr_info->data);
+	kvm_fpu_put();
+}
+
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v5 12/19] KVM:x86: Save and reload SSP to/from SMRAM
  2023-08-03  4:27 [PATCH v5 00/19] Enable CET Virtualization Yang Weijiang
                   ` (10 preceding siblings ...)
  2023-08-03  4:27 ` [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs Yang Weijiang
@ 2023-08-03  4:27 ` Yang Weijiang
  2023-08-04  7:53   ` Chao Gao
  2023-08-03  4:27 ` [PATCH v5 13/19] KVM:VMX: Set up interception for CET MSRs Yang Weijiang
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 82+ messages in thread
From: Yang Weijiang @ 2023-08-03  4:27 UTC (permalink / raw)
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Save CET SSP to SMRAM on SMI and reload it on RSM.
KVM emulates architectural behavior when the guest enters/leaves SMM,
i.e., it saves registers to SMRAM on SMM entry and reloads them on
SMM exit. Per the SDM, SSP is defined as one of the SMRAM fields for
64-bit mode, so handle the state accordingly.

Check is_smm() in kvm_cet_is_msr_accessible() so that
kvm_{set,get}_msr() can access MSR_KVM_GUEST_SSP while in SMM mode.
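
Condensed, the accessibility rule this creates looks like the sketch
below (a restatement of the hunk in this patch, not new logic):

	/*
	 * MSR_KVM_GUEST_SSP is reachable by host userspace and by KVM's
	 * own SMM emulation, but never directly by the guest.
	 */
	if (msr->index == MSR_KVM_GUEST_SSP)
		return msr->host_initiated ||
		       (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu));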

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/smm.c | 11 +++++++++++
 arch/x86/kvm/smm.h |  2 +-
 arch/x86/kvm/x86.c | 11 ++++++++++-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index b42111a24cc2..e0b62d211306 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -309,6 +309,12 @@ void enter_smm(struct kvm_vcpu *vcpu)
 
 	kvm_smm_changed(vcpu, true);
 
+#ifdef CONFIG_X86_64
+	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+	    kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram.smram64.ssp))
+		goto error;
+#endif
+
 	if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram)))
 		goto error;
 
@@ -586,6 +592,11 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
 	if ((vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK) == 0)
 		static_call(kvm_x86_set_nmi_mask)(vcpu, false);
 
+#ifdef CONFIG_X86_64
+	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+	    kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smram.smram64.ssp))
+		return X86EMUL_UNHANDLEABLE;
+#endif
 	kvm_smm_changed(vcpu, false);
 
 	/*
diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
index a1cf2ac5bd78..1e2a3e18207f 100644
--- a/arch/x86/kvm/smm.h
+++ b/arch/x86/kvm/smm.h
@@ -116,8 +116,8 @@ struct kvm_smram_state_64 {
 	u32 smbase;
 	u32 reserved4[5];
 
-	/* ssp and svm_* fields below are not implemented by KVM */
 	u64 ssp;
+	/* svm_* fields below are not implemented by KVM */
 	u64 svm_guest_pat;
 	u64 svm_host_efer;
 	u64 svm_host_cr4;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 98f3ff6078e6..56aa5a3d3913 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3644,8 +3644,17 @@ static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
 		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
 			return false;
 
-		if (msr->index == MSR_KVM_GUEST_SSP)
+		/*
+		 * This MSR is synthesized mainly for userspace access during
+		 * live migration; it can also be accessed in SMM mode by the
+		 * VMM. The guest is not allowed to access this MSR.
+		 */
+		if (msr->index == MSR_KVM_GUEST_SSP) {
+			if (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu))
+				return true;
+
 			return msr->host_initiated;
+		}
 
 		return msr->host_initiated ||
 			guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v5 13/19] KVM:VMX: Set up interception for CET MSRs
  2023-08-03  4:27 [PATCH v5 00/19] Enable CET Virtualization Yang Weijiang
                   ` (11 preceding siblings ...)
  2023-08-03  4:27 ` [PATCH v5 12/19] KVM:x86: Save and reload SSP to/from SMRAM Yang Weijiang
@ 2023-08-03  4:27 ` Yang Weijiang
  2023-08-04  8:16   ` Chao Gao
  2023-08-03  4:27 ` [PATCH v5 14/19] KVM:VMX: Set host constant supervisor states to VMCS fields Yang Weijiang
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 82+ messages in thread
From: Yang Weijiang @ 2023-08-03  4:27 UTC (permalink / raw)
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Pass through CET MSRs when the associated feature is enabled.
The Shadow Stack feature requires all CET MSRs to be passed through
for full architectural support in the guest. The IBT feature only
depends on MSR_IA32_U_CET and MSR_IA32_S_CET to enable both user and
supervisor IBT. Note, this MSR design introduces an architectural
limitation on SHSTK and IBT control for the guest, i.e., when SHSTK
is exposed, IBT is also architecturally available to the guest,
since IBT relies on a subset of the SHSTK-related MSRs.
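
The resulting intercept policy, condensed from the diff below into a
sketch:

	bool shstk = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
	bool ibt   = guest_can_use(vcpu, X86_FEATURE_IBT);

	/* U_CET/S_CET: pass through if the guest has SHSTK or IBT. */
	vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW,
				  !(shstk || ibt));
	/* PL{0,1,2,3}_SSP/INT_SSP_TAB: pass through only with SHSTK. */
	vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, !shstk);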

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ccf750e79608..6779b8a63789 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -709,6 +709,10 @@ static bool is_valid_passthrough_msr(u32 msr)
 	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
 		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
 		return true;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+		return true;
 	}
 
 	r = possible_passthrough_msr_slot(msr) != -ENOENT;
@@ -7747,6 +7751,41 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
+static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
+{
+	bool incpt;
+
+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+		incpt = !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB,
+					  MSR_TYPE_RW, incpt);
+		if (!incpt)
+			return;
+	}
+
+	if (kvm_cpu_cap_has(X86_FEATURE_IBT)) {
+		incpt = !guest_can_use(vcpu, X86_FEATURE_IBT);
+
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
+					  MSR_TYPE_RW, incpt);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
+					  MSR_TYPE_RW, incpt);
+	}
+}
+
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7814,6 +7853,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
+
+	vmx_update_intercept_for_cet_msr(vcpu);
 }
 
 static u64 vmx_get_perf_capabilities(void)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v5 14/19] KVM:VMX: Set host constant supervisor states to VMCS fields
  2023-08-03  4:27 [PATCH v5 00/19] Enable CET Virtualization Yang Weijiang
                   ` (12 preceding siblings ...)
  2023-08-03  4:27 ` [PATCH v5 13/19] KVM:VMX: Set up interception for CET MSRs Yang Weijiang
@ 2023-08-03  4:27 ` Yang Weijiang
  2023-08-04  8:23   ` Chao Gao
  2023-08-03  4:27 ` [PATCH v5 15/19] KVM:x86: Optimize CET supervisor SSP save/reload Yang Weijiang
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 82+ messages in thread
From: Yang Weijiang @ 2023-08-03  4:27 UTC (permalink / raw)
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Set constant values for the HOST_{S_CET,SSP,INTR_SSP_TABLE} VMCS
fields explicitly. Kernel IBT is supported and the setting in
MSR_IA32_S_CET is static after boot (except in the BIOS call case,
which a vCPU thread never crosses), i.e., KVM doesn't need to
refresh the HOST_S_CET field before every VM-Enter/VM-Exit
sequence.

Host supervisor shadow stack is not enabled now and SSP is not
accessible to kernel mode, thus it's safe to set the host
IA32_INT_SSP_TAB/SSP VMCS fields to 0. When shadow stack is enabled
for CPL3, SSP is reloaded from IA32_PL3_SSP before the CPU exits to
userspace. Check SDM Vol 2A/B Chapter 3/4 for SYSCALL/SYSRET/
SYSENTER/SYSEXIT/RDSSP/CALL etc.

Prevent the KVM module from loading if host supervisor shadow stack
(SHSTK_EN set in MSR_IA32_S_CET) is enabled, as KVM cannot co-exist
with it correctly.
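
A minimal sketch of that load-time check (CET_SHSTK_EN is bit 0 of
MSR_IA32_S_CET):

	u64 s_cet;

	if (boot_cpu_has(X86_FEATURE_SHSTK)) {
		rdmsrl(MSR_IA32_S_CET, s_cet);
		/* Refuse to load: KVM would clobber host SSS state. */
		if (s_cet & CET_SHSTK_EN)
			return -EIO;
	}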

Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/capabilities.h |  4 ++++
 arch/x86/kvm/vmx/vmx.c          | 15 +++++++++++++++
 arch/x86/kvm/x86.c              | 14 ++++++++++++++
 arch/x86/kvm/x86.h              |  1 +
 4 files changed, 34 insertions(+)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index d0abee35d7ba..b1883f6c08eb 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -106,6 +106,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
 }
 
+static inline bool cpu_has_load_cet_ctrl(void)
+{
+	return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
 static inline bool cpu_has_vmx_mpx(void)
 {
 	return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6779b8a63789..99bf63b2a779 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4341,6 +4341,21 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
 
 	if (cpu_has_load_ia32_efer())
 		vmcs_write64(HOST_IA32_EFER, host_efer);
+
+	/*
+	 * Supervisor shadow stack is not enabled on the host side, i.e.,
+	 * the host IA32_S_CET.SHSTK_EN bit is guaranteed to be 0 now. Per
+	 * the SDM description (RDSSP instruction), SSP is not readable in
+	 * CPL0, so resetting the two registers to 0 at VM-Exit does no
+	 * harm to kernel execution. When the execution flow exits to
+	 * userspace, SSP is reloaded from IA32_PL3_SSP. Check SDM Vol.2A/B
+	 * Chapters 3 and 4 for details.
+	 */
+	if (cpu_has_load_cet_ctrl()) {
+		vmcs_writel(HOST_S_CET, host_s_cet);
+		vmcs_writel(HOST_SSP, 0);
+		vmcs_writel(HOST_INTR_SSP_TABLE, 0);
+	}
 }
 
 void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 56aa5a3d3913..01b4f10fa8ab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -113,6 +113,8 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 #endif
 
 static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
+u64 __read_mostly host_s_cet;
+EXPORT_SYMBOL_GPL(host_s_cet);
 
 #define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
 
@@ -9615,6 +9617,18 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		return -EIO;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_SHSTK)) {
+		rdmsrl(MSR_IA32_S_CET, host_s_cet);
+		/*
+		 * Linux doesn't yet support supervisor shadow stacks (SSS), so
+		 * KVM doesn't save/restore the associated MSRs, i.e. KVM may
+		 * clobber the host values.  Yell and refuse to load if SSS is
+		 * unexpectedly enabled, e.g. to avoid crashing the host.
+		 */
+		if (WARN_ON_ONCE(host_s_cet & CET_SHSTK_EN))
+			return -EIO;
+	}
+
 	x86_emulator_cache = kvm_alloc_emulator_cache();
 	if (!x86_emulator_cache) {
 		pr_err("failed to allocate cache for x86 emulator\n");
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 3b79d6db2f83..e42e5263fcf7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -323,6 +323,7 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
 
 extern u64 host_xcr0;
 extern u64 host_xss;
+extern u64 host_s_cet;
 
 extern struct kvm_caps kvm_caps;
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v5 15/19] KVM:x86: Optimize CET supervisor SSP save/reload
  2023-08-03  4:27 [PATCH v5 00/19] Enable CET Virtualization Yang Weijiang
                   ` (13 preceding siblings ...)
  2023-08-03  4:27 ` [PATCH v5 14/19] KVM:VMX: Set host constant supervisor states to VMCS fields Yang Weijiang
@ 2023-08-03  4:27 ` Yang Weijiang
  2023-08-04  8:43   ` Chao Gao
  2023-08-03  4:27 ` [PATCH v5 16/19] KVM:x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 82+ messages in thread
From: Yang Weijiang @ 2023-08-03  4:27 UTC (permalink / raw)
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Make PL{0,1,2}_SSP write-intercepted to detect whether the guest
is using these MSRs. Disable the write intercept once they're
written with non-zero values. KVM saves/reloads the MSRs only if
they're used by the guest.
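
The pattern, condensed from the vmx_set_msr() hunk below:

	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
		if (kvm_set_msr_common(vcpu, msr_info))
			return 1;
		if (data) {
			/* First non-zero write: stop intercepting writes so
			 * later guest writes go straight to the hardware MSR.
			 */
			vmx_disable_write_intercept_sss_msr(vcpu);
			wrmsrl(msr_index, data);
		}
		break;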

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx/vmx.c          | 31 ++++++++++++++++++++++++++++---
 arch/x86/kvm/x86.c              |  8 ++++++--
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69cbc9d9b277..c50b555234fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -748,6 +748,7 @@ struct kvm_vcpu_arch {
 	bool tpr_access_reporting;
 	bool xsaves_enabled;
 	bool xfd_no_write_intercept;
+	bool cet_sss_active;
 	u64 ia32_xss;
 	u64 microcode_version;
 	u64 arch_capabilities;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 99bf63b2a779..96e22515ed13 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2152,6 +2152,18 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
 	return debugctl;
 }
 
+static void vmx_disable_write_intercept_sss_msr(struct kvm_vcpu *vcpu)
+{
+	if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
+					  MSR_TYPE_RW, false);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
+					  MSR_TYPE_RW, false);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
+					  MSR_TYPE_RW, false);
+	}
+}
+
 /*
  * Writes msr value into the appropriate "register".
  * Returns 0 on success, non-0 otherwise.
@@ -2420,6 +2432,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
+		if (kvm_set_msr_common(vcpu, msr_info))
+			return 1;
+		if (data) {
+			vmx_disable_write_intercept_sss_msr(vcpu);
+			wrmsrl(msr_index, data);
+		}
+		break;
 	case MSR_IA32_S_CET:
 	case MSR_KVM_GUEST_SSP:
 	case MSR_IA32_INT_SSP_TAB:
@@ -7777,12 +7797,17 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
 					  MSR_TYPE_RW, incpt);
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
 					  MSR_TYPE_RW, incpt);
+		/*
+		 * Supervisor shadow stack MSRs are write-intercepted
+		 * until they're written by the guest; this is designed
+		 * to minimize the save/restore overhead.
+		 */
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
-					  MSR_TYPE_RW, incpt);
+					  MSR_TYPE_R, incpt);
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
-					  MSR_TYPE_RW, incpt);
+					  MSR_TYPE_R, incpt);
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
-					  MSR_TYPE_RW, incpt);
+					  MSR_TYPE_R, incpt);
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP,
 					  MSR_TYPE_RW, incpt);
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 01b4f10fa8ab..fa3e7f7c639f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4060,6 +4060,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
 		    msr == MSR_IA32_PL2_SSP) {
 			vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
+			if (!vcpu->arch.cet_sss_active && data)
+				vcpu->arch.cet_sss_active = true;
 		} else if (msr == MSR_IA32_PL3_SSP) {
 			kvm_set_xsave_msr(msr_info);
 		}
@@ -11241,7 +11243,8 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 
 void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
 {
-	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
+	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+		     vcpu->arch.cet_sss_active)) {
 		rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
 		rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
 		rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
@@ -11256,7 +11259,8 @@ EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp);
 
 void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
 {
-	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
+	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+		     vcpu->arch.cet_sss_active)) {
 		wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
 		wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
 		wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v5 16/19] KVM:x86: Enable CET virtualization for VMX and advertise to userspace
  2023-08-03  4:27 [PATCH v5 00/19] Enable CET Virtualization Yang Weijiang
                   ` (14 preceding siblings ...)
  2023-08-03  4:27 ` [PATCH v5 15/19] KVM:x86: Optimize CET supervisor SSP save/reload Yang Weijiang
@ 2023-08-03  4:27 ` Yang Weijiang
  2023-08-03  4:27 ` [PATCH v5 17/19] KVM:x86: Enable guest CET supervisor xstate bit support Yang Weijiang
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 82+ messages in thread
From: Yang Weijiang @ 2023-08-03  4:27 UTC (permalink / raw)
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Enable CET-related feature bits in the KVM capabilities array and make
X86_CR4_CET available to the guest. Remove the feature bits if host-side
dependencies cannot be met.

Set the feature bits so that CET features are available in guest CPUID.
Add CR4.CET bit support in order to allow the guest to set the CET
master control bit (CR4.CET).

Disable the KVM CET feature if unrestricted_guest is unsupported or
disabled, as KVM does not support emulating CET.
Don't expose the CET feature if the dependent CET bit (U_CET) is cleared
in host XSS or if XSAVES isn't supported.

The CET bits in the VM_ENTRY/VM_EXIT control fields should be set to
isolate guest CET states from the host's. CET is only available on
platforms that enumerate VMX_BASIC[bit 56] as 1.
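
For illustration, the VMX_BASIC[bit 56] dependency can be probed as in
the sketch below, using the VMX_BASIC_NO_HW_ERROR_CODE mask added in
this patch:

	u64 basic;

	rdmsrl(MSR_IA32_VMX_BASIC, basic);
	if (!(basic & VMX_BASIC_NO_HW_ERROR_CODE)) {
		/* The CPU can't inject error-code-less hard exceptions for
		 * all vectors at VM-Entry, so don't expose CET.
		 */
		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
		kvm_cpu_cap_clear(X86_FEATURE_IBT);
	}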

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/asm/kvm_host.h  |  3 ++-
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/kvm/cpuid.c             | 12 ++++++++++--
 arch/x86/kvm/vmx/capabilities.h  |  6 ++++++
 arch/x86/kvm/vmx/vmx.c           | 20 ++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h           |  6 ++++--
 arch/x86/kvm/x86.c               | 16 +++++++++++++++-
 arch/x86/kvm/x86.h               |  3 +++
 8 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c50b555234fb..f883696723f4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -125,7 +125,8 @@
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
 			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+			  | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+			  | X86_CR4_CET))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3aedae61af4f..7ce0850c6067 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1078,6 +1078,7 @@
 #define VMX_BASIC_MEM_TYPE_MASK	0x003c000000000000LLU
 #define VMX_BASIC_MEM_TYPE_WB	6LLU
 #define VMX_BASIC_INOUT		0x0040000000000000LLU
+#define VMX_BASIC_NO_HW_ERROR_CODE	0x0100000000000000LLU
 
 /* Resctrl MSRs: */
 /* - Intel: */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0338316b827c..1a601be7b4fa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -624,7 +624,7 @@ void kvm_set_cpu_caps(void)
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
 		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
-		F(SGX_LC) | F(BUS_LOCK_DETECT)
+		F(SGX_LC) | F(BUS_LOCK_DETECT) | F(SHSTK)
 	);
 	/* Set LA57 based on hardware capability. */
 	if (cpuid_ecx(7) & F(LA57))
@@ -642,7 +642,8 @@ void kvm_set_cpu_caps(void)
 		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
 		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
 		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
-		F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
+		F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D) |
+		F(IBT)
 	);
 
 	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
@@ -655,6 +656,13 @@ void kvm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP);
 	if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
+	/*
+	 * The feature bit in boot_cpu_data.x86_capability could have been
+	 * cleared due to the ibt=off cmdline option, so add it back if the
+	 * CPU supports IBT.
+	 */
+	if (cpuid_edx(7) & F(IBT))
+		kvm_cpu_cap_set(X86_FEATURE_IBT);
 
 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
 		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index b1883f6c08eb..2948a288d0b4 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -79,6 +79,12 @@ static inline bool cpu_has_vmx_basic_inout(void)
 	return	(((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
 }
 
+static inline bool cpu_has_vmx_basic_no_hw_errcode(void)
+{
+	return	((u64)vmcs_config.basic_cap << 32) &
+		 VMX_BASIC_NO_HW_ERROR_CODE;
+}
+
 static inline bool cpu_has_virtual_nmis(void)
 {
 	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 96e22515ed13..2f2b6f7c33d9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2624,6 +2624,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		{ VM_ENTRY_LOAD_IA32_EFER,		VM_EXIT_LOAD_IA32_EFER },
 		{ VM_ENTRY_LOAD_BNDCFGS,		VM_EXIT_CLEAR_BNDCFGS },
 		{ VM_ENTRY_LOAD_IA32_RTIT_CTL,		VM_EXIT_CLEAR_IA32_RTIT_CTL },
+		{ VM_ENTRY_LOAD_CET_STATE,		VM_EXIT_LOAD_CET_STATE },
 	};
 
 	memset(vmcs_conf, 0, sizeof(*vmcs_conf));
@@ -6357,6 +6358,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
 		vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);
 
+	if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE) {
+		pr_err("S_CET = 0x%016lx\n", vmcs_readl(GUEST_S_CET));
+		pr_err("SSP = 0x%016lx\n", vmcs_readl(GUEST_SSP));
+		pr_err("INTR SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(GUEST_INTR_SSP_TABLE));
+	}
 	pr_err("*** Host State ***\n");
 	pr_err("RIP = 0x%016lx  RSP = 0x%016lx\n",
 	       vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
@@ -6434,6 +6441,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID)
 		pr_err("Virtual processor ID = 0x%04x\n",
 		       vmcs_read16(VIRTUAL_PROCESSOR_ID));
+	if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE) {
+		pr_err("S_CET = 0x%016lx\n", vmcs_readl(HOST_S_CET));
+		pr_err("SSP = 0x%016lx\n", vmcs_readl(HOST_SSP));
+		pr_err("INTR SSP TABLE = 0x%016lx\n",
+		       vmcs_readl(HOST_INTR_SSP_TABLE));
+	}
 }
 
 /*
@@ -7970,6 +7983,13 @@ static __init void vmx_set_cpu_caps(void)
 
 	if (cpu_has_vmx_waitpkg())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
+	    !cpu_has_vmx_basic_no_hw_errcode()) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+		kvm_caps.supported_xss &= ~CET_XSTATE_MASK;
+	}
 }
 
 static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 32384ba38499..4e88b5fb45e8 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -481,7 +481,8 @@ static inline u8 vmx_get_rvi(void)
 	 VM_ENTRY_LOAD_IA32_EFER |					\
 	 VM_ENTRY_LOAD_BNDCFGS |					\
 	 VM_ENTRY_PT_CONCEAL_PIP |					\
-	 VM_ENTRY_LOAD_IA32_RTIT_CTL)
+	 VM_ENTRY_LOAD_IA32_RTIT_CTL |					\
+	 VM_ENTRY_LOAD_CET_STATE)
 
 #define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS				\
 	(VM_EXIT_SAVE_DEBUG_CONTROLS |					\
@@ -503,7 +504,8 @@ static inline u8 vmx_get_rvi(void)
 	       VM_EXIT_LOAD_IA32_EFER |					\
 	       VM_EXIT_CLEAR_BNDCFGS |					\
 	       VM_EXIT_PT_CONCEAL_PIP |					\
-	       VM_EXIT_CLEAR_IA32_RTIT_CTL)
+	       VM_EXIT_CLEAR_IA32_RTIT_CTL |				\
+	       VM_EXIT_LOAD_CET_STATE)
 
 #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL			\
 	(PIN_BASED_EXT_INTR_MASK |					\
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa3e7f7c639f..aa92dec66f1e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -230,7 +230,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
-#define KVM_SUPPORTED_XSS     0
+#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER)
 
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
@@ -9669,6 +9669,20 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	kvm_ops_update(ops);
 
+	if (!kvm_is_cet_supported()) {
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	}
+
+	/*
+	 * If SHSTK and IBT are not available in KVM, clear CET user bit in
+	 * kvm_caps.supported_xss so that kvm_is_cet_supported() returns
+	 * false when called.
+	 */
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		kvm_caps.supported_xss &= ~CET_XSTATE_MASK;
+
 	for_each_online_cpu(cpu) {
 		smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &r, 1);
 		if (r < 0)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e42e5263fcf7..373386fb9ed2 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -542,6 +542,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 		__reserved_bits |= X86_CR4_VMXE;        \
 	if (!__cpu_has(__c, X86_FEATURE_PCID))          \
 		__reserved_bits |= X86_CR4_PCIDE;       \
+	if (!__cpu_has(__c, X86_FEATURE_SHSTK) &&       \
+	    !__cpu_has(__c, X86_FEATURE_IBT))           \
+		__reserved_bits |= X86_CR4_CET;         \
 	__reserved_bits;                                \
 })
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v5 17/19] KVM:x86: Enable guest CET supervisor xstate bit support
  2023-08-03  4:27 [PATCH v5 00/19] Enable CET Virtualization Yang Weijiang
                   ` (15 preceding siblings ...)
  2023-08-03  4:27 ` [PATCH v5 16/19] KVM:x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
@ 2023-08-03  4:27 ` Yang Weijiang
  2023-08-04 22:02   ` Paolo Bonzini
  2023-08-03  4:27 ` [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM Yang Weijiang
  2023-08-03  4:27 ` [PATCH v5 19/19] KVM:nVMX: Enable CET support for " Yang Weijiang
  18 siblings, 1 reply; 82+ messages in thread
From: Yang Weijiang @ 2023-08-03  4:27 UTC (permalink / raw)
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Add the S_CET bit to kvm_caps.supported_xss so that the guest can
enumerate the feature in CPUID(0xd,1).ECX.

The guest S_CET xstate bit is handled specially, i.e., it can be
exposed without the related enabling on the host side, because KVM
manually saves/reloads the guest supervisor SHSTK SSPs, and the
current XSS swap logic for host/guest also supports doing so; thus
it's safe to enable the bit without host support.
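
From inside the guest, the result becomes visible via CPUID; a sketch of
the enumeration check a guest kernel could perform:

	u32 eax, ebx, ecx, edx;

	cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx);
	if (ecx & XFEATURE_MASK_CET_KERNEL)	/* CPUID(0xd,1).ECX[bit 12] */
		pr_info("CET supervisor state is XSAVES-manageable\n");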

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/x86.c | 8 +++++++-
 arch/x86/kvm/x86.h | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aa92dec66f1e..2e200a5d00e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -230,7 +230,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
-#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER)
+#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER | \
+				 XFEATURE_MASK_CET_KERNEL)
 
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
@@ -9657,8 +9658,13 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	rdmsrl_safe(MSR_EFER, &host_efer);
 
 	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+		u32 eax, ebx, ecx, edx;
+
+		cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx);
 		rdmsrl(MSR_IA32_XSS, host_xss);
 		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+		if (ecx & XFEATURE_MASK_CET_KERNEL)
+			kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
 	}
 
 	kvm_init_pmu_capability(ops->pmu_ops);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 373386fb9ed2..ea0ecb8f0df6 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -363,7 +363,7 @@ static inline bool kvm_mpx_supported(void)
 		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
 }
 
-#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
+#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)
 /*
  * Shadow Stack and Indirect Branch Tracking feature enabling depends on
  * whether host side CET user xstate bit is supported or not.
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM
  2023-08-03  4:27 [PATCH v5 00/19] Enable CET Virtualization Yang Weijiang
                   ` (16 preceding siblings ...)
  2023-08-03  4:27 ` [PATCH v5 17/19] KVM:x86: Enable guest CET supervisor xstate bit support Yang Weijiang
@ 2023-08-03  4:27 ` Yang Weijiang
  2023-08-04 21:38   ` Sean Christopherson
  2023-08-03  4:27 ` [PATCH v5 19/19] KVM:nVMX: Enable CET support for " Yang Weijiang
  18 siblings, 1 reply; 82+ messages in thread
From: Yang Weijiang @ 2023-08-03  4:27 UTC (permalink / raw)
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Per the SDM description (Vol.3D, Appendix A.1):
"If bit 56 is read as 1, software can use VM entry to deliver
a hardware exception with or without an error code, regardless
of vector"

Modify the has_error_code check before injecting events into the
nested guest. Only enforce the check when the guest is in real mode,
when the exception is not a hardware exception, or when the platform
doesn't enumerate bit 56 in VMX_BASIC; otherwise ignore it.
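
Concretely, with bit 56 enumerated, L1 may request injection of e.g. #CP
(vector 21) with an error code even though KVM's legacy vector table
doesn't list it as error-code-bearing; a sketch of such a vmcs12 setup
(the error code value is just an example):

	vmcs12->vm_entry_intr_info_field = INTR_INFO_VALID_MASK |
					   INTR_TYPE_HARD_EXCEPTION |
					   INTR_INFO_DELIVER_CODE_MASK |
					   21;	/* #CP */
	vmcs12->vm_entry_exception_error_code = 1;	/* e.g. NEAR-RET */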

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++--------
 arch/x86/kvm/vmx/nested.h |  7 +++++++
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 516391cc0d64..9bcd989252f7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1205,9 +1205,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
 {
 	const u64 feature_and_reserved =
 		/* feature (except bit 48; see below) */
-		BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
+		BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
 		/* reserved */
-		BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
+		BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
 	u64 vmx_basic = vmcs_config.nested.basic;
 
 	if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
@@ -2846,12 +2846,16 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		    CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
 			return -EINVAL;
 
-		/* VM-entry interruption-info field: deliver error code */
-		should_have_error_code =
-			intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
-			x86_exception_has_error_code(vector);
-		if (CC(has_error_code != should_have_error_code))
-			return -EINVAL;
+		if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
+		    !nested_cpu_has_no_hw_errcode(vcpu)) {
+			/* VM-entry interruption-info field: deliver error code */
+			should_have_error_code =
+				intr_type == INTR_TYPE_HARD_EXCEPTION &&
+				prot_mode &&
+				x86_exception_has_error_code(vector);
+			if (CC(has_error_code != should_have_error_code))
+				return -EINVAL;
+		}
 
 		/* VM-entry exception error code */
 		if (CC(has_error_code &&
@@ -6967,6 +6971,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
 
 	if (cpu_has_vmx_basic_inout())
 		msrs->basic |= VMX_BASIC_INOUT;
+	if (cpu_has_vmx_basic_no_hw_errcode())
+		msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE;
 }
 
 static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 96952263b029..1884628294e4 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -284,6 +284,13 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
 	       __kvm_is_valid_cr4(vcpu, val);
 }
 
+static inline bool nested_cpu_has_no_hw_errcode(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	return vmx->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE;
+}
+
 /* No difference in the restrictions on guest and host CR4 in VMX operation. */
 #define nested_guest_cr4_valid	nested_cr4_valid
 #define nested_host_cr4_valid	nested_cr4_valid
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v5 19/19] KVM:nVMX: Enable CET support for nested VM
  2023-08-03  4:27 [PATCH v5 00/19] Enable CET Virtualization Yang Weijiang
                   ` (17 preceding siblings ...)
  2023-08-03  4:27 ` [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM Yang Weijiang
@ 2023-08-03  4:27 ` Yang Weijiang
  18 siblings, 0 replies; 82+ messages in thread
From: Yang Weijiang @ 2023-08-03  4:27 UTC (permalink / raw)
  To: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, weijiang.yang

Set up CET MSRs, the related VM_ENTRY/EXIT control bits, and the
fixed-bit setting for CR4 to enable CET for nested VMs.
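
For example, an L1 hypervisor can now program CET state switching for its
L2 guest (a sketch; the l2_* values are placeholders):

	vmcs12->vm_entry_controls |= VM_ENTRY_LOAD_CET_STATE;
	vmcs12->vm_exit_controls  |= VM_EXIT_LOAD_CET_STATE;
	vmcs12->guest_s_cet   = l2_s_cet;
	vmcs12->guest_ssp     = l2_ssp;
	vmcs12->guest_ssp_tbl = l2_ssp_tbl;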

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 27 +++++++++++++++++++++++++--
 arch/x86/kvm/vmx/vmcs12.c |  6 ++++++
 arch/x86/kvm/vmx/vmcs12.h | 14 +++++++++++++-
 arch/x86/kvm/vmx/vmx.c    |  2 ++
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9bcd989252f7..bd6883033f69 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -660,6 +660,28 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
 					 MSR_IA32_FLUSH_CMD, MSR_TYPE_W);
 
+	/* Pass CET MSRs to nested VM if L0 and L1 are set to pass-through. */
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_U_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_S_CET, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW);
+
 	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
 
 	vmx->nested.force_msr_bitmap_recalc = false;
@@ -6793,7 +6815,7 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
 		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
-		VM_EXIT_CLEAR_BNDCFGS;
+		VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
 	msrs->exit_ctls_high |=
 		VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 		VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -6815,7 +6837,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
 #ifdef CONFIG_X86_64
 		VM_ENTRY_IA32E_MODE |
 #endif
-		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+		VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+		VM_ENTRY_LOAD_CET_STATE;
 	msrs->entry_ctls_high |=
 		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
 		 VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 106a72c923ca..4233b5ca9461 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -139,6 +139,9 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
 	FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
 	FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+	FIELD(GUEST_S_CET, guest_s_cet),
+	FIELD(GUEST_SSP, guest_ssp),
+	FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
 	FIELD(HOST_CR0, host_cr0),
 	FIELD(HOST_CR3, host_cr3),
 	FIELD(HOST_CR4, host_cr4),
@@ -151,5 +154,8 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
 	FIELD(HOST_RSP, host_rsp),
 	FIELD(HOST_RIP, host_rip),
+	FIELD(HOST_S_CET, host_s_cet),
+	FIELD(HOST_SSP, host_ssp),
+	FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
 };
 const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 01936013428b..3884489e7f7e 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,13 @@ struct __packed vmcs12 {
 	natural_width host_ia32_sysenter_eip;
 	natural_width host_rsp;
 	natural_width host_rip;
-	natural_width paddingl[8]; /* room for future expansion */
+	natural_width host_s_cet;
+	natural_width host_ssp;
+	natural_width host_ssp_tbl;
+	natural_width guest_s_cet;
+	natural_width guest_ssp;
+	natural_width guest_ssp_tbl;
+	natural_width paddingl[2]; /* room for future expansion */
 	u32 pin_based_vm_exec_control;
 	u32 cpu_based_vm_exec_control;
 	u32 exception_bitmap;
@@ -292,6 +298,12 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(host_ia32_sysenter_eip, 656);
 	CHECK_OFFSET(host_rsp, 664);
 	CHECK_OFFSET(host_rip, 672);
+	CHECK_OFFSET(host_s_cet, 680);
+	CHECK_OFFSET(host_ssp, 688);
+	CHECK_OFFSET(host_ssp_tbl, 696);
+	CHECK_OFFSET(guest_s_cet, 704);
+	CHECK_OFFSET(guest_ssp, 712);
+	CHECK_OFFSET(guest_ssp_tbl, 720);
 	CHECK_OFFSET(pin_based_vm_exec_control, 744);
 	CHECK_OFFSET(cpu_based_vm_exec_control, 748);
 	CHECK_OFFSET(exception_bitmap, 752);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2f2b6f7c33d9..491039aeb61b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7726,6 +7726,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
 	cr4_fixed1_update(X86_CR4_PKE,        ecx, feature_bit(PKU));
 	cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
 	cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));
+	cr4_fixed1_update(X86_CR4_CET,	      ecx, feature_bit(SHSTK));
+	cr4_fixed1_update(X86_CR4_CET,	      edx, feature_bit(IBT));
 
 #undef cr4_fixed1_update
 }
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 07/19] KVM:x86: Add fault checks for guest CR4.CET setting
  2023-08-03  4:27 ` [PATCH v5 07/19] KVM:x86: Add fault checks for guest CR4.CET setting Yang Weijiang
@ 2023-08-03  9:07   ` Chao Gao
  0 siblings, 0 replies; 82+ messages in thread
From: Chao Gao @ 2023-08-03  9:07 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

nit: add a space between "KVM:" and "x86:" in the short changelog.

On Thu, Aug 03, 2023 at 12:27:20AM -0400, Yang Weijiang wrote:
>Check potential faults for CR4.CET setting per Intel SDM.
>CET can be enabled if and only if CR0.WP==1, i.e. setting
>CR4.CET=1 faults if CR0.WP==0 and setting CR0.WP=0 fails
>if CR4.CET==1.
>
>Co-developed-by: Sean Christopherson <seanjc@google.com>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
>Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-03  4:27 ` [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved Yang Weijiang
@ 2023-08-03 10:39   ` Chao Gao
  2023-08-04  3:13     ` Yang, Weijiang
  2023-08-04 18:55   ` Sean Christopherson
  2023-08-04 21:47   ` Paolo Bonzini
  2 siblings, 1 reply; 82+ messages in thread
From: Chao Gao @ 2023-08-03 10:39 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Thu, Aug 03, 2023 at 12:27:21AM -0400, Yang Weijiang wrote:
>Add all CET MSRs including the synthesized GUEST_SSP to report list.
>PL{0,1,2}_SSP are independent to host XSAVE management with later
>patches. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are XSAVE-managed on
>host side. MSR_IA32_S_CET/MSR_IA32_INT_SSP_TAB/MSR_KVM_GUEST_SSP
>are not XSAVE-managed.
>
>When CET IBT/SHSTK are enumerated to guest, both user and supervisor
>modes should be supported for architectural integrity, i.e., two
>modes are supported as both or neither.

I think whether MSRs are XSAVE-managed or not isn't related or important in
this patch. And I don't get what's the intent of the last paragraph.

how about:

Add CET MSRs to the list of MSRs reported to userspace if the feature,
i.e., IBT or SHSTK, associated with the MSRs is supported by KVM.

>
>Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>---
> arch/x86/include/uapi/asm/kvm_para.h |  1 +
> arch/x86/kvm/x86.c                   | 10 ++++++++++
> arch/x86/kvm/x86.h                   | 10 ++++++++++
> 3 files changed, 21 insertions(+)
>
>diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>index 6e64b27b2c1e..7af465e4e0bd 100644
>--- a/arch/x86/include/uapi/asm/kvm_para.h
>+++ b/arch/x86/include/uapi/asm/kvm_para.h
>@@ -58,6 +58,7 @@
> #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
> #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
> #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
>+#define MSR_KVM_GUEST_SSP	0x4b564d09
> 
> struct kvm_steal_time {
> 	__u64 steal;
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 82b9f14990da..d68ef87fe007 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -1463,6 +1463,9 @@ static const u32 msrs_to_save_base[] = {
> 
> 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
> 	MSR_IA32_XSS,
>+	MSR_IA32_U_CET, MSR_IA32_S_CET,
>+	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
>+	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB, MSR_KVM_GUEST_SSP,

MSR_KVM_GUEST_SSP really should be added by a separate patch.

it is incorrect to put MSR_KVM_GUEST_SSP here because the rdmsr_safe() in
kvm_probe_msr_to_save() will fail since hardware doesn't have this MSR.

IMO, MSR_KVM_GUEST_SSP should go to emulated_msrs_all[].

> };
> 
> static const u32 msrs_to_save_pmu[] = {
>@@ -7214,6 +7217,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
> 		if (!kvm_caps.supported_xss)
> 			return;
> 		break;
>+	case MSR_IA32_U_CET:
>+	case MSR_IA32_S_CET:
>+	case MSR_KVM_GUEST_SSP:
>+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>+		if (!kvm_is_cet_supported())

shall we consider the case where IBT is supported while SS isn't
(e.g., in L1 guest)?

if yes, we should do
	case MSR_IA32_U_CET:
	case MSR_IA32_S_CET:
		if (!kvm_is_cet_supported())
			return;
	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
			return;
	


>+			return;
>+		break;
> 	default:
> 		break;
> 	}
>diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>index 82e3dafc5453..6e6292915f8c 100644
>--- a/arch/x86/kvm/x86.h
>+++ b/arch/x86/kvm/x86.h
>@@ -362,6 +362,16 @@ static inline bool kvm_mpx_supported(void)
> 		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
> }
> 
>+#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
>+/*
>+ * Shadow Stack and Indirect Branch Tracking feature enabling depends on
>+ * whether host side CET user xstate bit is supported or not.
>+ */
>+static inline bool kvm_is_cet_supported(void)
>+{
>+	return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;

why not just check if SHSTK or IBT is supported explicitly, i.e.,

	return kvm_cpu_cap_has(X86_FEATURE_SHSTK) ||
	       kvm_cpu_cap_has(X86_FEATURE_IBT);

this is straightforward. And strictly speaking, the support of a feature and
the support of managing a feature's state via XSAVE(S) are two different things.

then patch 16 has no need to do

+	/*
+	 * If SHSTK and IBT are not available in KVM, clear CET user bit in
+	 * kvm_caps.supported_xss so that kvm_is_cet_supported() returns
+	 * false when called.
+	 */
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		kvm_caps.supported_xss &= ~CET_XSTATE_MASK;

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-03  4:27 ` [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed Yang Weijiang
@ 2023-08-03 11:15   ` Chao Gao
  2023-08-04  3:26     ` Yang, Weijiang
  0 siblings, 1 reply; 82+ messages in thread
From: Chao Gao @ 2023-08-03 11:15 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Thu, Aug 03, 2023 at 12:27:22AM -0400, Yang Weijiang wrote:
>+void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>+{
>+	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
>+		rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>+		rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>+		rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);

>+		/*
>+		 * Omit reset to host PL{1,2}_SSP because Linux will never use
>+		 * these MSRs.
>+		 */
>+		wrmsrl(MSR_IA32_PL0_SSP, 0);

This wrmsrl() can be dropped because host doesn't support SSS yet.

>+	}
>+}
>+EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp);
>+
>+void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>+{
>+	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {

ditto

>+		wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>+		wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>+		wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
>+	}
>+}
>+EXPORT_SYMBOL_GPL(reload_cet_supervisor_ssp);
>+
> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> {
> 	struct kvm_queued_exception *ex = &vcpu->arch.exception;
>@@ -12133,6 +12158,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> 
> 	vcpu->arch.cr3 = 0;
> 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
>+	memset(vcpu->arch.cet_s_ssp, 0, sizeof(vcpu->arch.cet_s_ssp));
> 
> 	/*
> 	 * CR0.CD/NW are set on RESET, preserved on INIT.  Note, some versions
>@@ -12313,6 +12339,7 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
> 		pmu->need_cleanup = true;
> 		kvm_make_request(KVM_REQ_PMU, vcpu);
> 	}
>+

remove the stray newline.

> 	static_call(kvm_x86_sched_in)(vcpu, cpu);
> }
> 
>diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>index 6e6292915f8c..c69fc027f5ec 100644
>--- a/arch/x86/kvm/x86.h
>+++ b/arch/x86/kvm/x86.h
>@@ -501,6 +501,9 @@ static inline void kvm_machine_check(void)
> 
> void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
> void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
>+void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu);
>+void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu);

nit: please add kvm_ prefix to the function names because they are exposed to
other modules. "cet" in the names is a little redundant. I slightly prefer
kvm_save/load_guest_supervisor_ssp()
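i.e., signatures only, bodies unchanged:

	void kvm_save_guest_supervisor_ssp(struct kvm_vcpu *vcpu);
	void kvm_load_guest_supervisor_ssp(struct kvm_vcpu *vcpu);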

Overall, this patch looks good to me. Hence,

Reviewed-by: Chao Gao <chao.gao@intel.com>

>+
> int kvm_spec_ctrl_test_value(u64 value);
> bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
> int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
>-- 
>2.27.0
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-03 10:39   ` Chao Gao
@ 2023-08-04  3:13     ` Yang, Weijiang
  2023-08-04  5:51       ` Chao Gao
  0 siblings, 1 reply; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-04  3:13 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/3/2023 6:39 PM, Chao Gao wrote:
> On Thu, Aug 03, 2023 at 12:27:21AM -0400, Yang Weijiang wrote:
>> Add all CET MSRs including the synthesized GUEST_SSP to report list.
>> PL{0,1,2}_SSP are made independent of host XSAVE management by later
>> patches. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are XSAVE-managed on
>> host side. MSR_IA32_S_CET/MSR_IA32_INT_SSP_TAB/MSR_KVM_GUEST_SSP
>> are not XSAVE-managed.
>>
>> When CET IBT/SHSTK are enumerated to guest, both user and supervisor
>> modes should be supported for architectural integrity, i.e., two
>> modes are supported as both or neither.
> I think whether MSRs are XSAVE-managed or not isn't relevant or important in
> this patch. And I don't get what the intent of the last paragraph is.
The original intent was to say that although kvm_is_cet_supported() only checks
host support for the user mode states, user and supervisor mode states are exposed
as a bundle, i.e., both or neither, so the same check can be enforced for both
user and supervisor state support.
> how about:
>
> Add CET MSRs to the list of MSRs reported to userspace if the feature,
> i.e., IBT or SHSTK, associated with the MSRs is supported by KVM.
It's OK for me, thanks!
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> ---
>> arch/x86/include/uapi/asm/kvm_para.h |  1 +
>> arch/x86/kvm/x86.c                   | 10 ++++++++++
>> arch/x86/kvm/x86.h                   | 10 ++++++++++
>> 3 files changed, 21 insertions(+)
>>
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> index 6e64b27b2c1e..7af465e4e0bd 100644
>> --- a/arch/x86/include/uapi/asm/kvm_para.h
>> +++ b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -58,6 +58,7 @@
>> #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
>> #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
>> #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
>> +#define MSR_KVM_GUEST_SSP	0x4b564d09
>>
>> struct kvm_steal_time {
>> 	__u64 steal;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 82b9f14990da..d68ef87fe007 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1463,6 +1463,9 @@ static const u32 msrs_to_save_base[] = {
>>
>> 	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
>> 	MSR_IA32_XSS,
>> +	MSR_IA32_U_CET, MSR_IA32_S_CET,
>> +	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
>> +	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB, MSR_KVM_GUEST_SSP,
> MSR_KVM_GUEST_SSP really should be added by a separate patch.
>
> it is incorrect to put MSR_KVM_GUEST_SSP here because the rdmsr_safe() in
> kvm_probe_msr_to_save() will fail since hardware doesn't have this MSR.
>
> IMO, MSR_KVM_GUEST_SSP should go to emulated_msrs_all[].
Nice catch! Will move it to emulated_msrs_all, thanks!
>> };
>>
>> static const u32 msrs_to_save_pmu[] = {
>> @@ -7214,6 +7217,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>> 		if (!kvm_caps.supported_xss)
>> 			return;
>> 		break;
>> +	case MSR_IA32_U_CET:
>> +	case MSR_IA32_S_CET:
>> +	case MSR_KVM_GUEST_SSP:
>> +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>> +		if (!kvm_is_cet_supported())
> shall we consider the case where IBT is supported while SS isn't
> (e.g., in L1 guest)?
Yes, but userspace should be able to access SHSTK MSRs even if only IBT is exposed to the
guest, as long as KVM can support the SHSTK MSRs. And the point here is to advertise all the
supported CET MSRs, so maybe we don't need to check for specific feature support.
> if yes, we should do
> 	case MSR_IA32_U_CET:
> 	case MSR_IA32_S_CET:
> 		if (!kvm_is_cet_supported())
> 			return;
> 	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> 		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> 			return;
> 	
>
>
>> +			return;
>> +		break;
>> 	default:
>> 		break;
>> 	}
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index 82e3dafc5453..6e6292915f8c 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -362,6 +362,16 @@ static inline bool kvm_mpx_supported(void)
>> 		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
>> }
>>
>> +#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
>> +/*
>> + * Shadow Stack and Indirect Branch Tracking feature enabling depends on
>> + * whether host side CET user xstate bit is supported or not.
>> + */
>> +static inline bool kvm_is_cet_supported(void)
>> +{
>> +	return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;
> why not just check if SHSTK or IBT is supported explicitly, i.e.,
>
> 	return kvm_cpu_cap_has(X86_FEATURE_SHSTK) ||
> 	       kvm_cpu_cap_has(X86_FEATURE_IBT);
>
> this is straightforward. And strictly speaking, the support of a feature and
>> the support of managing a feature's state via XSAVE(S) are two different things.
I think using the existing check implies two things:
1. Platform/KVM can support CET features.
2. CET user mode MSRs are backed by host thus are guaranteed to be valid.
i.e., the purpose is to check guest CET dependencies instead of features' availability.

kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)

only tells at least one of the CET features is supported by KVM.

> then patch 16 has no need to do
>
> +	/*
> +	 * If SHSTK and IBT are not available in KVM, clear CET user bit in
> +	 * kvm_caps.supported_xss so that kvm_is_cet__supported() returns
> +	 * false when called.
> +	 */
> +	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> +	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> +		kvm_caps.supported_xss &= ~CET_XSTATE_MASK;


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-03 11:15   ` Chao Gao
@ 2023-08-04  3:26     ` Yang, Weijiang
  2023-08-04 20:45       ` Sean Christopherson
  0 siblings, 1 reply; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-04  3:26 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/3/2023 7:15 PM, Chao Gao wrote:
> On Thu, Aug 03, 2023 at 12:27:22AM -0400, Yang Weijiang wrote:
>> +void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>> +{
>> +	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
>> +		rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>> +		rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>> +		rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
>> +		/*
>> +		 * Omit reset to host PL{1,2}_SSP because Linux will never use
>> +		 * these MSRs.
>> +		 */
>> +		wrmsrl(MSR_IA32_PL0_SSP, 0);
> This wrmsrl() can be dropped because host doesn't support SSS yet.
Frankly speaking, I wanted to remove this line of code, but that would mess up the MSR
on the host side, i.e., from the host's perspective the MSRs could be filled with garbage
data, which looks awful. Anyway, I can remove it.
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp);
>> +
>> +void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>> +{
>> +	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> ditto
Below is to reload guest supervisor SSPs instead of resetting host ones.
>> +		wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>> +		wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>> +		wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(reload_cet_supervisor_ssp);
>> +
>> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>> {
>> 	struct kvm_queued_exception *ex = &vcpu->arch.exception;
>> @@ -12133,6 +12158,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>
>> 	vcpu->arch.cr3 = 0;
>> 	kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
>> +	memset(vcpu->arch.cet_s_ssp, 0, sizeof(vcpu->arch.cet_s_ssp));
>>
>> 	/*
>> 	 * CR0.CD/NW are set on RESET, preserved on INIT.  Note, some versions
>> @@ -12313,6 +12339,7 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
>> 		pmu->need_cleanup = true;
>> 		kvm_make_request(KVM_REQ_PMU, vcpu);
>> 	}
>> +
> remove the stray newline.
OK.
>> 	static_call(kvm_x86_sched_in)(vcpu, cpu);
>> }
>>
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index 6e6292915f8c..c69fc027f5ec 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -501,6 +501,9 @@ static inline void kvm_machine_check(void)
>>
>> void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
>> void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
>> +void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu);
>> +void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu);
> nit: please add kvm_ prefix to the function names because they are exposed to
> other modules. "cet" in the names is a little redundant. I slightly prefer
> kvm_save/load_guest_supervisor_ssp()
Sure. Actually I wanted to add the prefix, but on second thought, the kvm_-prefixed
functions are mostly generic KVM functions, whereas these are CET-specific ones.
>
> Overall, this patch looks good to me. Hence,
>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
Thanks a lot for the review!
>> +
>> int kvm_spec_ctrl_test_value(u64 value);
>> bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
>> int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
>> -- 
>> 2.27.0
>>


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-03  4:27 ` [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs Yang Weijiang
@ 2023-08-04  5:14   ` Chao Gao
  2023-08-04 21:27     ` Sean Christopherson
  2023-08-04  8:28   ` Chao Gao
  2023-08-04 21:40   ` Paolo Bonzini
  2 siblings, 1 reply; 82+ messages in thread
From: Chao Gao @ 2023-08-04  5:14 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Thu, Aug 03, 2023 at 12:27:24AM -0400, Yang Weijiang wrote:
>Add emulation interface for CET MSR read and write.
>The emulation code is split into a common part and a vendor-specific
>part; the former resides in x86.c to benefit different x86 CPU
>vendors, and the latter, for VMX, is implemented in this patch.
>
>Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>---
> arch/x86/kvm/vmx/vmx.c |  27 +++++++++++
> arch/x86/kvm/x86.c     | 104 +++++++++++++++++++++++++++++++++++++----
> arch/x86/kvm/x86.h     |  18 +++++++
> 3 files changed, 141 insertions(+), 8 deletions(-)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index 6aa76124e81e..ccf750e79608 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -2095,6 +2095,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		else
> 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> 		break;
>+	case MSR_IA32_S_CET:
>+	case MSR_KVM_GUEST_SSP:
>+	case MSR_IA32_INT_SSP_TAB:
>+		if (kvm_get_msr_common(vcpu, msr_info))
>+			return 1;
>+		if (msr_info->index == MSR_KVM_GUEST_SSP)
>+			msr_info->data = vmcs_readl(GUEST_SSP);
>+		else if (msr_info->index == MSR_IA32_S_CET)
>+			msr_info->data = vmcs_readl(GUEST_S_CET);
>+		else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
>+			msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);

This if-else-if suggests that they are forcibly grouped together to just
share the call of kvm_get_msr_common(). For readability, I think it is better
to handle them separately.

e.g.,
	case MSR_IA32_S_CET:
		if (kvm_get_msr_common(vcpu, msr_info))
			return 1;
		msr_info->data = vmcs_readl(GUEST_S_CET);
		break;

	case MSR_KVM_GUEST_SSP:
		if (kvm_get_msr_common(vcpu, msr_info))
			return 1;
		msr_info->data = vmcs_readl(GUEST_SSP);
		break;

	...


>+		break;
> 	case MSR_IA32_DEBUGCTLMSR:
> 		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> 		break;
>@@ -2404,6 +2416,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		else
> 			vmx->pt_desc.guest.addr_a[index / 2] = data;
> 		break;
>+	case MSR_IA32_S_CET:
>+	case MSR_KVM_GUEST_SSP:
>+	case MSR_IA32_INT_SSP_TAB:
>+		if (kvm_set_msr_common(vcpu, msr_info))
>+			return 1;
>+		if (msr_index == MSR_KVM_GUEST_SSP)
>+			vmcs_writel(GUEST_SSP, data);
>+		else if (msr_index == MSR_IA32_S_CET)
>+			vmcs_writel(GUEST_S_CET, data);
>+		else if (msr_index == MSR_IA32_INT_SSP_TAB)
>+			vmcs_writel(GUEST_INTR_SSP_TABLE, data);

ditto

>+		break;
> 	case MSR_IA32_PERF_CAPABILITIES:
> 		if (data && !vcpu_to_pmu(vcpu)->version)
> 			return 1;
>@@ -4864,6 +4888,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> 		vmcs_write64(GUEST_BNDCFGS, 0);
> 
> 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
>+	vmcs_writel(GUEST_SSP, 0);
>+	vmcs_writel(GUEST_S_CET, 0);
>+	vmcs_writel(GUEST_INTR_SSP_TABLE, 0);

where are MSR_IA32_PL3_SSP and MSR_IA32_U_CET reset?

I thought that guest FPU would be reset in kvm_vcpu_reset(). But it turns out
only MPX states are reset in KVM while other FPU states are unchanged. This
is aligned with "Table 10.1 IA-32 and Intel® 64 Processor States Following
Power-up, Reset, or INIT"

Could you double confirm the hardware behavior that CET states are reset to 0
on INIT? If CET states are reset, we need to handle CET_IA32_PL3_SSP and
MSR_IA32_U_CET like MPX.
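If they are reset on INIT, a minimal sketch of MPX-style handling in
kvm_vcpu_reset() could look like this (assuming the CET user states live in
the guest FPU area, as elsewhere in this series):

	if (kvm_is_cet_supported()) {
		struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate;

		if (init_event)
			kvm_put_guest_fpu(vcpu);

		/* Clears MSR_IA32_U_CET and MSR_IA32_PL3_SSP in the xsave area. */
		fpstate_clear_xstate_component(fpstate, XFEATURE_CET_USER);

		if (init_event)
			kvm_load_guest_fpu(vcpu);
	}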

> 
> 	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
> 
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 5b63441fd2d2..98f3ff6078e6 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -3627,6 +3627,39 @@ static bool kvm_is_msr_to_save(u32 msr_index)
> 	return false;
> }
> 
>+static inline bool is_shadow_stack_msr(u32 msr)
>+{
>+	return msr == MSR_IA32_PL0_SSP ||
>+		msr == MSR_IA32_PL1_SSP ||
>+		msr == MSR_IA32_PL2_SSP ||
>+		msr == MSR_IA32_PL3_SSP ||
>+		msr == MSR_IA32_INT_SSP_TAB ||
>+		msr == MSR_KVM_GUEST_SSP;
>+}
>+
>+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
>+				      struct msr_data *msr)
>+{
>+	if (is_shadow_stack_msr(msr->index)) {
>+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
>+			return false;
>+
>+		if (msr->index == MSR_KVM_GUEST_SSP)
>+			return msr->host_initiated;
>+
>+		return msr->host_initiated ||
>+			guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>+	}
>+
>+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
>+		return false;
>+
>+	return msr->host_initiated ||
>+		guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
>+		guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>+}
>+
> int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> {
> 	u32 msr = msr_info->index;
>@@ -3981,6 +4014,45 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		vcpu->arch.guest_fpu.xfd_err = data;
> 		break;
> #endif
>+#define CET_EXCLUSIVE_BITS		(CET_SUPPRESS | CET_WAIT_ENDBR)
>+#define CET_CTRL_RESERVED_BITS		GENMASK(9, 6)
>+#define CET_SHSTK_MASK_BITS		GENMASK(1, 0)
>+#define CET_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | \
>+					 GENMASK_ULL(63, 10))
>+#define CET_LEG_BITMAP_BASE(data)	((data) >> 12)
>+	case MSR_IA32_U_CET:
>+	case MSR_IA32_S_CET:
>+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>+			return 1;
>+		if (!!(data & CET_CTRL_RESERVED_BITS))
>+			return 1;
>+		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>+		    (data & CET_SHSTK_MASK_BITS))
>+			return 1;
>+		if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
>+		    (data & CET_IBT_MASK_BITS))
>+			return 1;
>+		if (!IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
>+		    (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS)
>+			return 1;
>+		if (msr == MSR_IA32_U_CET)

can you add a comment before this if() statement, like:
		/* MSR_IA32_S_CET is handled by vendor code */

>+	case MSR_KVM_GUEST_SSP:
>+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>+			return 1;
>+		if (is_noncanonical_address(data, vcpu))
>+			return 1;
>+		if (!IS_ALIGNED(data, 4))
>+			return 1;
>+		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
>+		    msr == MSR_IA32_PL2_SSP) {
>+			vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
>+		} else if (msr == MSR_IA32_PL3_SSP) {
>+			kvm_set_xsave_msr(msr_info);
>+		}

the braces are not needed.

also add a comment for MSR_KVM_GUEST_SSP.

>+		break;
> 	default:
> 		if (kvm_pmu_is_valid_msr(vcpu, msr))
> 			return kvm_pmu_set_msr(vcpu, msr_info);
>@@ -4051,7 +4123,9 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
> 
> int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> {
>-	switch (msr_info->index) {
>+	u32 msr = msr_info->index;
>+
>+	switch (msr) {
> 	case MSR_IA32_PLATFORM_ID:
> 	case MSR_IA32_EBL_CR_POWERON:
> 	case MSR_IA32_LASTBRANCHFROMIP:
>@@ -4086,7 +4160,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
> 	case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
> 	case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
>-		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
>+		if (kvm_pmu_is_valid_msr(vcpu, msr))
> 			return kvm_pmu_get_msr(vcpu, msr_info);
> 		msr_info->data = 0;
> 		break;
>@@ -4137,7 +4211,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_MTRRcap:
> 	case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
> 	case MSR_MTRRdefType:
>-		return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
>+		return kvm_mtrr_get_msr(vcpu, msr, &msr_info->data);
> 	case 0xcd: /* fsb frequency */
> 		msr_info->data = 3;
> 		break;
>@@ -4159,7 +4233,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		msr_info->data = kvm_get_apic_base(vcpu);
> 		break;
> 	case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
>-		return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
>+		return kvm_x2apic_msr_read(vcpu, msr, &msr_info->data);
> 	case MSR_IA32_TSC_DEADLINE:
> 		msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
> 		break;
>@@ -4253,7 +4327,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_IA32_MCG_STATUS:
> 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> 	case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
>-		return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
>+		return get_msr_mce(vcpu, msr, &msr_info->data,
> 				   msr_info->host_initiated);
> 	case MSR_IA32_XSS:
> 		if (!msr_info->host_initiated &&
>@@ -4284,7 +4358,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case HV_X64_MSR_TSC_EMULATION_STATUS:
> 	case HV_X64_MSR_TSC_INVARIANT_CONTROL:
> 		return kvm_hv_get_msr_common(vcpu,
>-					     msr_info->index, &msr_info->data,
>+					     msr, &msr_info->data,
> 					     msr_info->host_initiated);
> 	case MSR_IA32_BBL_CR_CTL3:
> 		/* This legacy MSR exists but isn't fully documented in current
>@@ -4337,8 +4411,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		msr_info->data = vcpu->arch.guest_fpu.xfd_err;
> 		break;
> #endif
>+	case MSR_IA32_U_CET:
>+	case MSR_IA32_S_CET:
>+	case MSR_KVM_GUEST_SSP:
>+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>+			return 1;
>+		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
>+		    msr == MSR_IA32_PL2_SSP) {
>+			msr_info->data =
>+				vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
>+		} else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
>+			kvm_get_xsave_msr(msr_info);
>+		}

Again, for readability and clarity, how about:

	case MSR_IA32_U_CET:
	case MSR_IA32_PL3_SSP:
		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
			return 1;
		kvm_get_xsave_msr(msr_info);
		break;
	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
			return 1;
		msr_info->data = vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
		break;
	case MSR_IA32_S_CET:
	case MSR_KVM_GUEST_SSP:
		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
			return 1;
		/* Further handling in vendor code */
		break;

>+		break;
> 	default:
>-		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
>+		if (kvm_pmu_is_valid_msr(vcpu, msr))
> 			return kvm_pmu_get_msr(vcpu, msr_info);
> 
> 		/*
>@@ -4346,7 +4434,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		 * to-be-saved, even if an MSR isn't fully supported.
> 		 */
> 		if (msr_info->host_initiated &&
>-		    kvm_is_msr_to_save(msr_info->index)) {
>+		    kvm_is_msr_to_save(msr)) {
> 			msr_info->data = 0;
> 			break;
> 		}
>diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>index c69fc027f5ec..3b79d6db2f83 100644
>--- a/arch/x86/kvm/x86.h
>+++ b/arch/x86/kvm/x86.h
>@@ -552,4 +552,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> 			 unsigned int port, void *data,  unsigned int count,
> 			 int in);
> 
>+/*
>+ * Guest xstate MSRs have been loaded in __msr_io(), disable preemption before
>+ * access the MSRs to avoid MSR content corruption.
>+ */

I think it is better to describe what the function does prior to jumping into
details like where guest FPU is loaded.

/*
 * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
 * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
 * guest FPU should have been loaded already.
 */
>+static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
>+{
>+	kvm_fpu_get();
>+	rdmsrl(msr_info->index, msr_info->data);
>+	kvm_fpu_put();
>+}
>+
>+static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
>+{
>+	kvm_fpu_get();
>+	wrmsrl(msr_info->index, msr_info->data);
>+	kvm_fpu_put();
>+}

Can you rename functions to kvm_get/set_xstate_msr() to align with the comment
and patch 6? And if there is no user outside x86.c, you can just put these two
functions right after the is_xstate_msr() added in patch 6.
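i.e. (renamed per the above; bodies unchanged from the patch):

	static inline void kvm_get_xstate_msr(struct msr_data *msr_info)
	{
		kvm_fpu_get();
		rdmsrl(msr_info->index, msr_info->data);
		kvm_fpu_put();
	}

	static inline void kvm_set_xstate_msr(struct msr_data *msr_info)
	{
		kvm_fpu_get();
		wrmsrl(msr_info->index, msr_info->data);
		kvm_fpu_put();
	}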

>+
> #endif
>-- 
>2.27.0
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-04  3:13     ` Yang, Weijiang
@ 2023-08-04  5:51       ` Chao Gao
  2023-08-04 18:51         ` Sean Christopherson
  2023-08-06  8:54         ` Yang, Weijiang
  0 siblings, 2 replies; 82+ messages in thread
From: Chao Gao @ 2023-08-04  5:51 UTC (permalink / raw)
  To: Yang, Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Fri, Aug 04, 2023 at 11:13:36AM +0800, Yang, Weijiang wrote:
>> > @@ -7214,6 +7217,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>> > 		if (!kvm_caps.supported_xss)
>> > 			return;
>> > 		break;
>> > +	case MSR_IA32_U_CET:
>> > +	case MSR_IA32_S_CET:
>> > +	case MSR_KVM_GUEST_SSP:
>> > +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>> > +		if (!kvm_is_cet_supported())
>> shall we consider the case where IBT is supported while SS isn't
>> (e.g., in L1 guest)?
>Yes, but userspace should be able to access SHSTK MSRs even only IBT is exposed to guest so
>far as KVM can support SHSTK MSRs.

Why should userspace be allowed to access SHSTK MSRs in this case? L1 may not
even enumerate SHSTK (qemu removes -shstk explicitly but keeps IBT), so how can KVM in
L1 allow its userspace to do that?

>> > +static inline bool kvm_is_cet_supported(void)
>> > +{
>> > +	return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;
>> why not just check if SHSTK or IBT is supported explicitly, i.e.,
>> 
>> 	return kvm_cpu_cap_has(X86_FEATURE_SHSTK) ||
>> 	       kvm_cpu_cap_has(X86_FEATURE_IBT);
>> 
>> this is straightforward. And strictly speaking, the support of a feature and
>> the support of managing a feature's state via XSAVE(S) are two different things.x
>I think using exiting check implies two things:
>1. Platform/KVM can support CET features.
>2. CET user mode MSRs are backed by host thus are guaranteed to be valid.
>i.e., the purpose is to check guest CET dependencies instead of features' availability.

When KVM claims a feature is supported, it should ensure all its dependencies are
met. That is, KVM's support of a feature also implies all dependencies are met.
Function-wise, the two approaches have no difference. I just think checking
KVM's support of SHSTK/IBT is clearer because the function name is
kvm_is_cet_supported() rather than, e.g., kvm_is_cet_state_managed_by_xsave().

>
>kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)
>
>only tells at least one of the CET features is supported by KVM.
>
>> then patch 16 has no need to do
>> 
>> +	/*
>> +	 * If SHSTK and IBT are not available in KVM, clear CET user bit in
>> +	 * kvm_caps.supported_xss so that kvm_is_cet__supported() returns
>> +	 * false when called.
>> +	 */
>> +	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>> +	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
>> +		kvm_caps.supported_xss &= ~CET_XSTATE_MASK;
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 12/19] KVM:x86: Save and reload SSP to/from SMRAM
  2023-08-03  4:27 ` [PATCH v5 12/19] KVM:x86: Save and reload SSP to/from SMRAM Yang Weijiang
@ 2023-08-04  7:53   ` Chao Gao
  2023-08-04 15:25     ` Sean Christopherson
  0 siblings, 1 reply; 82+ messages in thread
From: Chao Gao @ 2023-08-04  7:53 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Thu, Aug 03, 2023 at 12:27:25AM -0400, Yang Weijiang wrote:
>Save CET SSP to SMRAM on SMI and reload it on RSM.
>KVM emulates architectural behavior when guest enters/leaves SMM
>mode, i.e., save registers to SMRAM at the entry of SMM and reload
>them at the exit of SMM. Per SDM, SSP is defined as one of
>the fields in SMRAM for 64-bit mode, so handle the state accordingly.
>
>Check is_smm() to determine whether kvm_cet_is_msr_accessible()
>is called in SMM mode so that kvm_{set,get}_msr() works in SMM mode.
>
>Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>---
> arch/x86/kvm/smm.c | 11 +++++++++++
> arch/x86/kvm/smm.h |  2 +-
> arch/x86/kvm/x86.c | 11 ++++++++++-
> 3 files changed, 22 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
>index b42111a24cc2..e0b62d211306 100644
>--- a/arch/x86/kvm/smm.c
>+++ b/arch/x86/kvm/smm.c
>@@ -309,6 +309,12 @@ void enter_smm(struct kvm_vcpu *vcpu)
> 
> 	kvm_smm_changed(vcpu, true);
> 
>+#ifdef CONFIG_X86_64
>+	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>+	    kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram.smram64.ssp))
>+		goto error;
>+#endif

SSP save/load should go to enter_smm_save_state_64() and rsm_load_state_64(),
where other fields of SMRAM are handled.
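Roughly like this (a sketch only, keeping this patch's kvm_get/set_msr() calls
and assuming the vcpu pointer already available in those helpers; error
handling is the awkward part since enter_smm_save_state_64() returns void):

	/* in enter_smm_save_state_64() */
	if (guest_can_use(vcpu, X86_FEATURE_SHSTK))
		WARN_ON_ONCE(kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram->ssp));

	/* in rsm_load_state_64() */
	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
	    kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smram->ssp))
		return X86EMUL_UNHANDLEABLE;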

>+
> 	if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram)))
> 		goto error;
> 
>@@ -586,6 +592,11 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
> 	if ((vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK) == 0)
> 		static_call(kvm_x86_set_nmi_mask)(vcpu, false);
> 
>+#ifdef CONFIG_X86_64
>+	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>+	    kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smram.smram64.ssp))
>+		return X86EMUL_UNHANDLEABLE;
>+#endif
> 	kvm_smm_changed(vcpu, false);
> 
> 	/*
>diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
>index a1cf2ac5bd78..1e2a3e18207f 100644
>--- a/arch/x86/kvm/smm.h
>+++ b/arch/x86/kvm/smm.h
>@@ -116,8 +116,8 @@ struct kvm_smram_state_64 {
> 	u32 smbase;
> 	u32 reserved4[5];
> 
>-	/* ssp and svm_* fields below are not implemented by KVM */
> 	u64 ssp;
>+	/* svm_* fields below are not implemented by KVM */
> 	u64 svm_guest_pat;
> 	u64 svm_host_efer;
> 	u64 svm_host_cr4;
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 98f3ff6078e6..56aa5a3d3913 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -3644,8 +3644,17 @@ static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
> 		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> 			return false;
> 
>-		if (msr->index == MSR_KVM_GUEST_SSP)
>+		/*
>+		 * This MSR is synthesized mainly for userspace access during
>+		 * Live Migration, it also can be accessed in SMM mode by VMM.
>+		 * Guest is not allowed to access this MSR.
>+		 */
>+		if (msr->index == MSR_KVM_GUEST_SSP) {
>+			if (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu))
>+				return true;

On second thoughts, this is incorrect. We don't want guest in SMM
mode to read/write SSP via the synthesized MSR. Right?

You can
1. move set/get guest SSP into two helper functions, e.g., kvm_set/get_ssp()
2. call kvm_set/get_ssp() for host-initiated MSR accesses and SMM transitions.
3. refuse guest accesses to the synthesized MSR.
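For example (a rough sketch, assuming the helpers live in x86.c next to
__kvm_get_msr()/__kvm_set_msr() so they can reuse the host_initiated path):

	static int kvm_get_ssp(struct kvm_vcpu *vcpu, u64 *data)
	{
		return __kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, data, true);
	}

	static int kvm_set_ssp(struct kvm_vcpu *vcpu, u64 data)
	{
		return __kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, data, true);
	}

SMM transitions and host-initiated accesses would call these directly, and
kvm_cet_is_msr_accessible() could then unconditionally reject guest accesses
to the synthesized MSR.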

>+
> 			return msr->host_initiated;
>+		}
> 
> 		return msr->host_initiated ||
> 			guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>-- 
>2.27.0
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 13/19] KVM:VMX: Set up interception for CET MSRs
  2023-08-03  4:27 ` [PATCH v5 13/19] KVM:VMX: Set up interception for CET MSRs Yang Weijiang
@ 2023-08-04  8:16   ` Chao Gao
  2023-08-06  9:22     ` Yang, Weijiang
  0 siblings, 1 reply; 82+ messages in thread
From: Chao Gao @ 2023-08-04  8:16 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Thu, Aug 03, 2023 at 12:27:26AM -0400, Yang Weijiang wrote:
>Pass through CET MSRs when the associated feature is enabled.
>The Shadow Stack feature requires all the CET MSRs to provide
>architectural support in the guest. The IBT feature only depends on
>MSR_IA32_U_CET and MSR_IA32_S_CET to enable both user and
>supervisor IBT. Note, this MSR design introduces an architectural
>limitation on SHSTK and IBT control for the guest, i.e., when SHSTK
>is exposed, IBT is also available to the guest at the architectural
>level since IBT relies on a subset of the SHSTK-relevant MSRs.
>
>Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>

one nit below

>---
> arch/x86/kvm/vmx/vmx.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index ccf750e79608..6779b8a63789 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -709,6 +709,10 @@ static bool is_valid_passthrough_msr(u32 msr)
> 	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
> 		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
> 		return true;
>+	case MSR_IA32_U_CET:
>+	case MSR_IA32_S_CET:
>+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>+		return true;
> 	}
> 
> 	r = possible_passthrough_msr_slot(msr) != -ENOENT;
>@@ -7747,6 +7751,41 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
> 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
> }
> 
>+static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
>+{
>+	bool incpt;
>+
>+	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
>+		incpt = !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);

...

>+
>+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
>+					  MSR_TYPE_RW, incpt);
>+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
>+					  MSR_TYPE_RW, incpt);
>+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
>+					  MSR_TYPE_RW, incpt);
>+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
>+					  MSR_TYPE_RW, incpt);
>+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
>+					  MSR_TYPE_RW, incpt);
>+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP,
>+					  MSR_TYPE_RW, incpt);
>+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB,
>+					  MSR_TYPE_RW, incpt);
>+		if (!incpt)
>+			return;
>+	}
>+
>+	if (kvm_cpu_cap_has(X86_FEATURE_IBT)) {
>+		incpt = !guest_can_use(vcpu, X86_FEATURE_IBT);

can you use guest_can_use() or guest_cpuid_has() consistently?
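e.g., use guest_can_use() for SHSTK as well:

	incpt = !guest_can_use(vcpu, X86_FEATURE_SHSTK);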

>+
>+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
>+					  MSR_TYPE_RW, incpt);
>+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
>+					  MSR_TYPE_RW, incpt);
>+	}
>+}
>+
> static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> {
> 	struct vcpu_vmx *vmx = to_vmx(vcpu);
>@@ -7814,6 +7853,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> 
> 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
> 	vmx_update_exception_bitmap(vcpu);
>+
>+	vmx_update_intercept_for_cet_msr(vcpu);
> }
> 
> static u64 vmx_get_perf_capabilities(void)
>-- 
>2.27.0
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 14/19] KVM:VMX: Set host constant supervisor states to VMCS fields
  2023-08-03  4:27 ` [PATCH v5 14/19] KVM:VMX: Set host constant supervisor states to VMCS fields Yang Weijiang
@ 2023-08-04  8:23   ` Chao Gao
  0 siblings, 0 replies; 82+ messages in thread
From: Chao Gao @ 2023-08-04  8:23 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Thu, Aug 03, 2023 at 12:27:27AM -0400, Yang Weijiang wrote:
>Set constant values to HOST_{S_CET,SSP,INTR_SSP_TABLE} VMCS
>fields explicitly. Kernel IBT is supported and the setting in
>MSR_IA32_S_CET is static post-boot (except in the BIOS call
>case, which the vCPU thread never crosses), i.e. KVM doesn't need
>to refresh HOST_S_CET field before every VM-Enter/VM-Exit
>sequence.
>
>Host supervisor shadow stack is not enabled now and SSP is not
>accessible to kernel mode, thus it's safe to set host IA32_INT_
>SSP_TAB/SSP VMCS fields to 0s. When shadow stack is enabled for
>CPL3, SSP is reloaded from IA32_PL3_SSP before it exits to userspace.
>Check SDM Vol 2A/B Chapter 3/4 for SYSCALL/SYSRET/SYSENTER SYSEXIT/
>RDSSP/CALL etc.
>
>Prevent KVM module loading if host supervisor shadow stack
>SHSTK_EN is set in MSR_IA32_S_CET, as KVM cannot co-exist with it
>correctly.
>
>Suggested-by: Sean Christopherson <seanjc@google.com>
>Suggested-by: Chao Gao <chao.gao@intel.com>
>Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>

Reviewed-by: Chao Gao <chao.gao@intel.com>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-03  4:27 ` [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs Yang Weijiang
  2023-08-04  5:14   ` Chao Gao
@ 2023-08-04  8:28   ` Chao Gao
  2023-08-09  7:12     ` Yang, Weijiang
  2023-08-04 21:40   ` Paolo Bonzini
  2 siblings, 1 reply; 82+ messages in thread
From: Chao Gao @ 2023-08-04  8:28 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

>+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>+		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>+			return 1;
>+		if (is_noncanonical_address(data, vcpu))
>+			return 1;
>+		if (!IS_ALIGNED(data, 4))
>+			return 1;

Why should MSR_IA32_INT_SSP_TAB be 4-byte aligned? I don't see
this requirement in the SDM.

IA32_INTERRUPT_SSP_TABLE_ADDR:

Linear address of a table of seven shadow
stack pointers that are selected in IA-32e
mode using the IST index (when not 0) from
the interrupt gate descriptor. (R/W)
This MSR is not present on processors that
do not support Intel 64 architecture. This
field cannot represent a non-canonical
address.
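i.e., arguably only the SSP MSRs need the IS_ALIGNED(data, 4) check, e.g.
(a sketch; the kvm_cet_is_msr_accessible() check is omitted for brevity):

	case MSR_IA32_INT_SSP_TAB:
		if (is_noncanonical_address(data, vcpu))
			return 1;
		/* existing per-MSR handling follows */
		break;
	case MSR_KVM_GUEST_SSP:
	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
		if (is_noncanonical_address(data, vcpu) ||
		    !IS_ALIGNED(data, 4))
			return 1;
		/* existing per-MSR handling follows */
		break;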

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 15/19] KVM:x86: Optimize CET supervisor SSP save/reload
  2023-08-03  4:27 ` [PATCH v5 15/19] KVM:x86: Optimize CET supervisor SSP save/reload Yang Weijiang
@ 2023-08-04  8:43   ` Chao Gao
  2023-08-09  9:00     ` Yang, Weijiang
  0 siblings, 1 reply; 82+ messages in thread
From: Chao Gao @ 2023-08-04  8:43 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Thu, Aug 03, 2023 at 12:27:28AM -0400, Yang Weijiang wrote:
>Make PL{0,1,2}_SSP write-intercepted to detect whether the
>guest is using these MSRs. Disable interception of the MSRs
>once they're written with non-zero values. KVM saves/reloads
>the MSRs only if they're used by the guest.

What would happen if guest tries to use XRSTORS to load S_CET state from a
xsave area without any writes to the PL0-2_SSP (i.e., at that point, writes to
the MSRs are still intercepted)?

>@@ -2420,6 +2432,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		else
> 			vmx->pt_desc.guest.addr_a[index / 2] = data;
> 		break;
>+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
>+		if (kvm_set_msr_common(vcpu, msr_info))
>+			return 1;
>+		if (data) {
>+			vmx_disable_write_intercept_sss_msr(vcpu);
>+			wrmsrl(msr_index, data);

Is it necessary to do the wrmsrl()?
It looks like the next kvm_x86_prepare_switch_to_guest() will load PL0-2_SSP from the
cached values.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 12/19] KVM:x86: Save and reload SSP to/from SMRAM
  2023-08-04  7:53   ` Chao Gao
@ 2023-08-04 15:25     ` Sean Christopherson
  2023-08-06  9:14       ` Yang, Weijiang
  0 siblings, 1 reply; 82+ messages in thread
From: Sean Christopherson @ 2023-08-04 15:25 UTC (permalink / raw)
  To: Chao Gao
  Cc: Yang Weijiang, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Fri, Aug 04, 2023, Chao Gao wrote:
> On Thu, Aug 03, 2023 at 12:27:25AM -0400, Yang Weijiang wrote:
> >Save CET SSP to SMRAM on SMI and reload it on RSM.
> >KVM emulates architectural behavior when guest enters/leaves SMM
> >mode, i.e., save registers to SMRAM at the entry of SMM and reload
> >them at the exit of SMM. Per SDM, SSP is defined as one of
> >the fields in SMRAM for 64-bit mode, so handle the state accordingly.
> >
> >Check is_smm() to determine whether kvm_cet_is_msr_accessible()
> >is called in SMM mode so that kvm_{set,get}_msr() works in SMM mode.
> >
> >Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> >---
> > arch/x86/kvm/smm.c | 11 +++++++++++
> > arch/x86/kvm/smm.h |  2 +-
> > arch/x86/kvm/x86.c | 11 ++++++++++-
> > 3 files changed, 22 insertions(+), 2 deletions(-)
> >
> >diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
> >index b42111a24cc2..e0b62d211306 100644
> >--- a/arch/x86/kvm/smm.c
> >+++ b/arch/x86/kvm/smm.c
> >@@ -309,6 +309,12 @@ void enter_smm(struct kvm_vcpu *vcpu)
> > 
> > 	kvm_smm_changed(vcpu, true);
> > 
> >+#ifdef CONFIG_X86_64
> >+	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> >+	    kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram.smram64.ssp))
> >+		goto error;
> >+#endif
> 
> SSP save/load should go to enter_smm_save_state_64() and rsm_load_state_64(),
> where other fields of SMRAM are handled.

+1.  The right way to get/set MSRs like this is to use __kvm_get_msr() and pass
%true for @host_initiated.  Though I would add a prep patch to provide wrappers
for __kvm_get_msr() and __kvm_set_msr().  Naming will be hard, but I think we
can use kvm_{read,write}_msr() to go along with the KVM-initiated register
accessors/mutators, e.g. kvm_register_read(), kvm_pdptr_write(), etc.

Then you don't need to wait until after kvm_smm_changed(), and kvm_cet_is_msr_accessible()
doesn't need the confusing (and broken) SMM waiver, e.g. as Chao points out below,
that would allow the guest to access the synthetic MSR.

Delta patch at the bottom (would need to be split up, rebased, etc.).

> > 	if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram)))
> > 		goto error;
> > 
> >@@ -586,6 +592,11 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
> > 	if ((vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK) == 0)
> > 		static_call(kvm_x86_set_nmi_mask)(vcpu, false);
> > 
> >+#ifdef CONFIG_X86_64
> >+	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> >+	    kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smram.smram64.ssp))
> >+		return X86EMUL_UNHANDLEABLE;
> >+#endif
> > 	kvm_smm_changed(vcpu, false);
> > 
> > 	/*
> >diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
> >index a1cf2ac5bd78..1e2a3e18207f 100644
> >--- a/arch/x86/kvm/smm.h
> >+++ b/arch/x86/kvm/smm.h
> >@@ -116,8 +116,8 @@ struct kvm_smram_state_64 {
> > 	u32 smbase;
> > 	u32 reserved4[5];
> > 
> >-	/* ssp and svm_* fields below are not implemented by KVM */
> > 	u64 ssp;
> >+	/* svm_* fields below are not implemented by KVM */
> > 	u64 svm_guest_pat;
> > 	u64 svm_host_efer;
> > 	u64 svm_host_cr4;
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index 98f3ff6078e6..56aa5a3d3913 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -3644,8 +3644,17 @@ static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
> > 		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> > 			return false;
> > 
> >-		if (msr->index == MSR_KVM_GUEST_SSP)
> >+		/*
> >+		 * This MSR is synthesized mainly for userspace access during
> >+		 * Live Migration, it also can be accessed in SMM mode by VMM.
> >+		 * Guest is not allowed to access this MSR.
> >+		 */
> >+		if (msr->index == MSR_KVM_GUEST_SSP) {
> >+			if (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu))
> >+				return true;
> 
> On second thoughts, this is incorrect. We don't want guest in SMM
> mode to read/write SSP via the synthesized MSR. Right?

It's not a guest read though, KVM is doing the read while emulating SMI/RSM.

> You can
> 1. move set/get guest SSP into two helper functions, e.g., kvm_set/get_ssp()
> 2. call kvm_set/get_ssp() for host-initiated MSR accesses and SMM transitions.

We could, but that would largely defeat the purpose of kvm_x86_ops.{g,s}et_msr(),
i.e. we already have hooks to get at MSR values that are buried in the VMCS/VMCB,
the interface is just a bit kludgy.
 
> 3. refuse guest accesses to the synthesized MSR.

---
 arch/x86/include/asm/kvm_host.h |  8 +++++++-
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/smm.c              | 10 ++++------
 arch/x86/kvm/x86.c              | 17 +++++++++++++----
 4 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f883696723f4..fe8484bc8082 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1939,7 +1939,13 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
 
 void kvm_enable_efer_bits(u64);
 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
-int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated);
+
+/*
+ * kvm_msr_{read,write}() are KVM-internal helpers, i.e. for when KVM needs to
+ * get/set an MSR value when emulating CPU behavior.
+ */
+int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data);
 int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1a601be7b4fa..b595645b2af7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1515,7 +1515,7 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 		*edx = entry->edx;
 		if (function == 7 && index == 0) {
 			u64 data;
-		        if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
+		        if (!kvm_msr_read(vcpu, MSR_IA32_TSX_CTRL, &data) &&
 			    (data & TSX_CTRL_CPUID_CLEAR))
 				*ebx &= ~(F(RTM) | F(HLE));
 		} else if (function == 0x80000007) {
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index e0b62d211306..8db12831877e 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -275,6 +275,10 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
 	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
 
 	smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
+
+	if (guest_can_use(vcpu, X86_FEATURE_SHSTK))
+		KVM_BUG_ON(kvm_msr_read(vcpu, MSR_KVM_GUEST_SSP,
+					&smram->ssp), vcpu->kvm);
 }
 #endif
 
@@ -309,12 +313,6 @@ void enter_smm(struct kvm_vcpu *vcpu)
 
 	kvm_smm_changed(vcpu, true);
 
-#ifdef CONFIG_X86_64
-	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
-	    kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram.smram64.ssp))
-		goto error;
-#endif
-
 	if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram)))
 		goto error;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2e200a5d00e9..872767b7bf51 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1924,8 +1924,8 @@ static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
  * Returns 0 on success, non-0 otherwise.
  * Assumes vcpu_load() was already called.
  */
-int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
-		  bool host_initiated)
+static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+			 bool host_initiated)
 {
 	struct msr_data msr;
 	int ret;
@@ -1951,6 +1951,16 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
 	return ret;
 }
 
+int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+	return __kvm_set_msr(vcpu, index, data, true);
+}
+
+int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+	return __kvm_get_msr(vcpu, index, data, true);
+}
+
 static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
 				     u32 index, u64 *data, bool host_initiated)
 {
@@ -4433,8 +4443,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
 		    msr == MSR_IA32_PL2_SSP) {
-			msr_info->data =
-				vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
+			msr_info->data = vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
 		} else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
 			kvm_get_xsave_msr(msr_info);
 		}

base-commit: 82e95ab0094bf1b823a6f9c9a07238852b375a22
-- 


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-08-03  4:27 ` [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS Yang Weijiang
@ 2023-08-04 16:02   ` Sean Christopherson
  2023-08-04 21:43     ` Paolo Bonzini
  2023-08-08 14:20     ` Yang, Weijiang
  2023-08-04 18:27   ` Sean Christopherson
  1 sibling, 2 replies; 82+ messages in thread
From: Sean Christopherson @ 2023-08-04 16:02 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu, Zhang Yi Z

On Thu, Aug 03, 2023, Yang Weijiang wrote:
> Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified.
> CPUID(EAX=0DH,ECX=1).EBX reports required storage size of
> all enabled xstate features in XCR0 | XSS. Guest can allocate
> sufficient xsave buffer based on the info.

Please wrap changelogs closer to ~75 chars.  I'm pretty sure this isn't the first
time I've made this request...

> Note, KVM does not yet support any XSS based features, i.e.
> supported_xss is guaranteed to be zero at this time.
> 
> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/cpuid.c            | 20 ++++++++++++++++++--
>  arch/x86/kvm/x86.c              |  8 +++++---
>  3 files changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 28bd38303d70..20bbcd95511f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -804,6 +804,7 @@ struct kvm_vcpu_arch {
>  
>  	u64 xcr0;
>  	u64 guest_supported_xcr0;
> +	u64 guest_supported_xss;
>  
>  	struct kvm_pio_request pio;
>  	void *pio_data;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 7f4d13383cf2..0338316b827c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -249,6 +249,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
>  	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
>  }
>  
> +static u64 cpuid_get_supported_xss(struct kvm_cpuid_entry2 *entries, int nent)
> +{
> +	struct kvm_cpuid_entry2 *best;
> +
> +	best = cpuid_entry2_find(entries, nent, 0xd, 1);
> +	if (!best)
> +		return 0;
> +
> +	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
> +}
> +
>  static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
>  				       int nent)
>  {
> @@ -276,8 +287,11 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
>  
>  	best = cpuid_entry2_find(entries, nent, 0xD, 1);
>  	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
> -		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
> -		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> +		     cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {
> +		u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;

Nit, the variable should be xfeatures, not xstate.  Though I vote to avoid the
variable entirely,

	best = cpuid_entry2_find(entries, nent, 0xD, 1);
	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
		best->ebx = xstate_required_size(vcpu->arch.xcr0 |
						 vcpu->arch.ia32_xss, true);

though it's only a slight preference, i.e. feel free to keep your approach if
you or others feel strongly about the style.

> +	}
>  
>  	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
>  	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> @@ -325,6 +339,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.guest_supported_xcr0 =
>  		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
> +	vcpu->arch.guest_supported_xss =
> +		cpuid_get_supported_xss(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);

Blech.  I tried to clean up this ugliness, but Paolo disagreed[*].  Can you fold in
the below (compile tested only) patch at the very beginning of this series?  It
implements my suggested alternative.  And then this would become:

static u64 vcpu_get_supported_xss(struct kvm_vcpu *vcpu)
{
	struct kvm_cpuid_entry2 *best;

	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1);
	if (!best)
		return 0;

	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
}

[*] https://lore.kernel.org/all/ZGfius5UkckpUyXl@google.com

>  	/*
>  	 * FP+SSE can always be saved/restored via KVM_{G,S}ET_XSAVE, even if
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0b9033551d8c..5d6d6fa33e5b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3780,10 +3780,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
>  		 * XSAVES/XRSTORS to save/restore PT MSRs.
>  		 */
> -		if (data & ~kvm_caps.supported_xss)
> +		if (data & ~vcpu->arch.guest_supported_xss)
>  			return 1;
> -		vcpu->arch.ia32_xss = data;
> -		kvm_update_cpuid_runtime(vcpu);
> +		if (vcpu->arch.ia32_xss != data) {
> +			vcpu->arch.ia32_xss = data;
> +			kvm_update_cpuid_runtime(vcpu);
> +		}

Nit, I prefer this style:

		if (vcpu->arch.ia32_xss == data)
			break;

		vcpu->arch.ia32_xss = data;
		kvm_update_cpuid_runtime(vcpu);

so that the common path isn't buried in an if-statement.

>  		break;
>  	case MSR_SMI_COUNT:
>  		if (!msr_info->host_initiated)
> -- 


From: Sean Christopherson <seanjc@google.com>
Date: Fri, 4 Aug 2023 08:48:03 -0700
Subject: [PATCH] KVM: x86: Rework cpuid_get_supported_xcr0() to operate on
 vCPU data

Rework and rename cpuid_get_supported_xcr0() to explicitly operate on vCPU
state, i.e. on a vCPU's CPUID state.  Prior to commit 275a87244ec8 ("KVM:
x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)"), KVM
incorrectly fudged guest CPUID at runtime, which in turn necessitated
massaging the incoming CPUID state for KVM_SET_CPUID{2} so as not to run
afoul of kvm_cpuid_check_equal().

Opportunistically move the helper below kvm_update_cpuid_runtime() to make
it harder to repeat the mistake of querying supported XCR0 for runtime
updates.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7f4d13383cf2..5e42846c948a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -234,21 +234,6 @@ void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
 		vcpu->arch.pv_cpuid.features = best->eax;
 }
 
-/*
- * Calculate guest's supported XCR0 taking into account guest CPUID data and
- * KVM's supported XCR0 (comprised of host's XCR0 and KVM_SUPPORTED_XCR0).
- */
-static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
-{
-	struct kvm_cpuid_entry2 *best;
-
-	best = cpuid_entry2_find(entries, nent, 0xd, 0);
-	if (!best)
-		return 0;
-
-	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
-}
-
 static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
 				       int nent)
 {
@@ -299,6 +284,21 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
 
+/*
+ * Calculate guest's supported XCR0 taking into account guest CPUID data and
+ * KVM's supported XCR0 (comprised of host's XCR0 and KVM_SUPPORTED_XCR0).
+ */
+static u64 vcpu_get_supported_xcr0(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 0);
+	if (!best)
+		return 0;
+
+	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
+}
+
 static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
 {
 	struct kvm_cpuid_entry2 *entry;
@@ -323,8 +323,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		kvm_apic_set_version(vcpu);
 	}
 
-	vcpu->arch.guest_supported_xcr0 =
-		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
+	vcpu->arch.guest_supported_xcr0 = vcpu_get_supported_xcr0(vcpu);
 
 	/*
 	 * FP+SSE can always be saved/restored via KVM_{G,S}ET_XSAVE, even if

base-commit: f0147fcfab840fe9a3f03e9645d25c1326373fe6
-- 


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-08-03  4:27 ` [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS Yang Weijiang
  2023-08-04 16:02   ` Sean Christopherson
@ 2023-08-04 18:27   ` Sean Christopherson
  2023-08-07  6:55     ` Paolo Bonzini
  2023-08-09  8:56     ` Yang, Weijiang
  1 sibling, 2 replies; 82+ messages in thread
From: Sean Christopherson @ 2023-08-04 18:27 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu, Zhang Yi Z

On Thu, Aug 03, 2023, Yang Weijiang wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0b9033551d8c..5d6d6fa33e5b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3780,10 +3780,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
>  		 * XSAVES/XRSTORS to save/restore PT MSRs.
>  		 */
> -		if (data & ~kvm_caps.supported_xss)
> +		if (data & ~vcpu->arch.guest_supported_xss)

Hmm, this is arguably wrong for userspace-initiated writes, as it would prevent
userspace from restoring MSRs before CPUID.

And it would make the handling of MSR_IA32_XSS writes inconsistent just within
this case statement.  The initial "can this MSR be written at all" check would
*not* honor guest CPUID for host writes, but then the per-bit check *would* honor
guest CPUID for host writes.

But if we exempt host writes, then we'll end up with another mess, as exempting
host writes for MSR_KVM_GUEST_SSP would let the guest coerce KVM into writing an
illegal value by modifying SMRAM while in SMM.

Blech.

If we can get away with it, i.e. not break userspace, I think my preference is
to enforce guest CPUID for host accesses to XSS, XFD, XFD_ERR, etc.  I'm 99%
certain we can make that change, because there are many, many MSRs that do NOT
exempt host writes, i.e. the only way this would be a breaking change is if
userspace is writing things like XSS before KVM_SET_CPUID2, but other MSRs after
KVM_SET_CPUID2.

I'm pretty sure I've advocated for the exact opposite in the past, i.e. argued
that KVM's ABI is to not enforce ordering between KVM_SET_CPUID2 and KVM_SET_MSR.
But this is becoming untenable, juggling the dependencies in KVM is complex and
is going to result in a nasty bug at some point.

For this series, let's just tighten the rules for XSS, i.e. drop the host_initiated
exemption.  And in a parallel/separate series, try to do a wholesale cleanup of
all the cases that essentially allow userspace to do KVM_SET_MSR before KVM_SET_CPUID2.
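
Concretely, something like this (untested sketch; guest_supported_xss is the
per-vCPU field this series already adds):

	case MSR_IA32_XSS:
		/*
		 * No host_initiated exemption: guest CPUID is enforced even
		 * for userspace writes, i.e. KVM_SET_MSR must be ordered
		 * after KVM_SET_CPUID2.
		 */
		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
			return 1;
		if (data & ~vcpu->arch.guest_supported_xss)
			return 1;
		vcpu->arch.ia32_xss = data;
		kvm_update_cpuid_runtime(vcpu);
		break;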

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 05/19] KVM:x86: Initialize kvm_caps.supported_xss
  2023-08-03  4:27 ` [PATCH v5 05/19] KVM:x86: Initialize kvm_caps.supported_xss Yang Weijiang
@ 2023-08-04 18:45   ` Sean Christopherson
  2023-08-08 15:08     ` Yang, Weijiang
  0 siblings, 1 reply; 82+ messages in thread
From: Sean Christopherson @ 2023-08-04 18:45 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu

On Thu, Aug 03, 2023, Yang Weijiang wrote:
> Set kvm_caps.supported_xss to host_xss & KVM XSS mask.
> host_xss contains the host supported xstate feature bits for thread
> context switch, KVM_SUPPORTED_XSS includes all KVM enabled XSS feature
> bits, the operation result represents all KVM supported feature bits.
> Since the result is a subset of host_xss, the related XSAVE-managed MSRs
> are automatically swapped for guest and host when vCPU exits to
> userspace.
> 
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 1 -
>  arch/x86/kvm/x86.c     | 6 +++++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 0ecf4be2c6af..c8d9870cfecb 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7849,7 +7849,6 @@ static __init void vmx_set_cpu_caps(void)
>  		kvm_cpu_cap_set(X86_FEATURE_UMIP);
>  
>  	/* CPUID 0xD.1 */
> -	kvm_caps.supported_xss = 0;

Dropping this code in *this* patch is wrong, this belongs in whatever patch(es) adds
IBT and SHSTK support in VMX.

And that does matter because it means this common patch can be carried with SVM
support without breaking VMX.

>  	if (!cpu_has_vmx_xsaves())
>  		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5d6d6fa33e5b..e9f3627d5fdd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -225,6 +225,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
>  				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
>  				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
>  
> +#define KVM_SUPPORTED_XSS     0
> +
>  u64 __read_mostly host_efer;
>  EXPORT_SYMBOL_GPL(host_efer);
>  
> @@ -9498,8 +9500,10 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  
>  	rdmsrl_safe(MSR_EFER, &host_efer);
>  
> -	if (boot_cpu_has(X86_FEATURE_XSAVES))
> +	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
>  		rdmsrl(MSR_IA32_XSS, host_xss);
> +		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
> +	}

Can you opportunistically (in this patch) hoist this above EFER so that XCR0 and
XSS are colocated?  I.e. end up with this:

	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
		host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
		kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
	}
	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
		rdmsrl(MSR_IA32_XSS, host_xss);
		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
	}

	rdmsrl_safe(MSR_EFER, &host_efer);


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-04  5:51       ` Chao Gao
@ 2023-08-04 18:51         ` Sean Christopherson
  2023-08-04 22:01           ` Paolo Bonzini
  2023-08-08 15:16           ` Yang, Weijiang
  2023-08-06  8:54         ` Yang, Weijiang
  1 sibling, 2 replies; 82+ messages in thread
From: Sean Christopherson @ 2023-08-04 18:51 UTC (permalink / raw)
  To: Chao Gao
  Cc: Weijiang Yang, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Fri, Aug 04, 2023, Chao Gao wrote:
> On Fri, Aug 04, 2023 at 11:13:36AM +0800, Yang, Weijiang wrote:
> >> > @@ -7214,6 +7217,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
> >> > 		if (!kvm_caps.supported_xss)
> >> > 			return;
> >> > 		break;
> >> > +	case MSR_IA32_U_CET:
> >> > +	case MSR_IA32_S_CET:
> >> > +	case MSR_KVM_GUEST_SSP:
> >> > +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> >> > +		if (!kvm_is_cet_supported())
> >> shall we consider the case where IBT is supported while SS isn't
> >> (e.g., in L1 guest)?
> >Yes, but userspace should be able to access SHSTK MSRs even if only IBT is exposed
> >to the guest, so long as KVM can support SHSTK MSRs.
> 
> Why should userspace be allowed to access SHSTK MSRs in this case? L1 may not
> even enumerate SHSTK (qemu removes -shstk explicitly but keeps IBT), so how can
> KVM in L1 allow its userspace to do that?

+1.  And specifically, this isn't about SHSTK being exposed to the guest, it's about
SHSTK being _supported by KVM_.  This is all about KVM telling userspace what MSRs
are valid and/or need to be saved+restored.  If KVM doesn't support a feature,
then the MSRs are invalid and there is no reason for userspace to save+restore
the MSRs on live migration.

> >> > +static inline bool kvm_is_cet_supported(void)
> >> > +{
> >> > +	return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;
> >> why not just check if SHSTK or IBT is supported explicitly, i.e.,
> >> 
> >> 	return kvm_cpu_cap_has(X86_FEATURE_SHSTK) ||
> >> 	       kvm_cpu_cap_has(X86_FEATURE_IBT);
> >> 
> >> this is straightforward. And strictly speaking, the support of a feature and
> >> the support of managing a feature's state via XSAVE(S) are two different things.
> >I think using the existing check implies two things:
> >1. Platform/KVM can support CET features.
> >2. CET user mode MSRs are backed by the host and thus guaranteed to be valid.
> >i.e., the purpose is to check guest CET dependencies instead of features' availability.
> 
> When KVM claims a feature is supported, it should ensure all its dependencies are
> met. That is, KVM's support of a feature also implies all dependencies are met.
> Function-wise, the two approaches have no difference. I just think checking
> KVM's support of SHSTK/IBT is more clear because the function name is
> kvm_is_cet_supported() rather than e.g., kvm_is_cet_state_managed_by_xsave().

+1, one of the big reasons kvm_cpu_cap_has() came about was that KVM had a giant
mess of one-off helpers.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-03  4:27 ` [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved Yang Weijiang
  2023-08-03 10:39   ` Chao Gao
@ 2023-08-04 18:55   ` Sean Christopherson
  2023-08-08 15:26     ` Yang, Weijiang
  2023-08-04 21:47   ` Paolo Bonzini
  2 siblings, 1 reply; 82+ messages in thread
From: Sean Christopherson @ 2023-08-04 18:55 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu

On Thu, Aug 03, 2023, Yang Weijiang wrote:
> Add all CET MSRs, including the synthesized GUEST_SSP, to the report list.
> PL{0,1,2}_SSP are made independent of host XSAVE management by later
> patches. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are XSAVE-managed on
> the host side. MSR_IA32_S_CET/MSR_IA32_INT_SSP_TAB/MSR_KVM_GUEST_SSP
> are not XSAVE-managed.
> 
> When CET IBT/SHSTK are enumerated to the guest, both user and supervisor
> modes should be supported for architectural integrity, i.e., the two
> modes are supported together or not at all.
> 
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>  arch/x86/include/uapi/asm/kvm_para.h |  1 +
>  arch/x86/kvm/x86.c                   | 10 ++++++++++
>  arch/x86/kvm/x86.h                   | 10 ++++++++++
>  3 files changed, 21 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 6e64b27b2c1e..7af465e4e0bd 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -58,6 +58,7 @@
>  #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
>  #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
>  #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
> +#define MSR_KVM_GUEST_SSP	0x4b564d09
>  
>  struct kvm_steal_time {
>  	__u64 steal;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 82b9f14990da..d68ef87fe007 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1463,6 +1463,9 @@ static const u32 msrs_to_save_base[] = {
>  
>  	MSR_IA32_XFD, MSR_IA32_XFD_ERR,
>  	MSR_IA32_XSS,
> +	MSR_IA32_U_CET, MSR_IA32_S_CET,
> +	MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
> +	MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB, MSR_KVM_GUEST_SSP,
>  };
>  
>  static const u32 msrs_to_save_pmu[] = {
> @@ -7214,6 +7217,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>  		if (!kvm_caps.supported_xss)
>  			return;
>  		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:
> +	case MSR_KVM_GUEST_SSP:
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> +		if (!kvm_is_cet_supported())
> +			return;
> +		break;
>  	default:
>  		break;
>  	}
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 82e3dafc5453..6e6292915f8c 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -362,6 +362,16 @@ static inline bool kvm_mpx_supported(void)
>  		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
>  }
>  
> +#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)

This is funky.  As of this patch, KVM reports MSR_IA32_S_CET, a supervisor MSR,
but does not require XFEATURE_MASK_CET_KERNEL.  That eventually comes along with
"KVM:x86: Enable guest CET supervisor xstate bit support", but as of this patch
KVM is busted.

The whole cpuid_count() code in that patch shouldn't exist, so the easiest thing
is to just fold the KVM_SUPPORTED_XSS and CET_XSTATE_MASK changes from that patch
into this one.
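
I.e. end up with something like this (constant names taken from this series,
untested):

	#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER | \
					 XFEATURE_MASK_CET_KERNEL)

	#define CET_XSTATE_MASK		(XFEATURE_MASK_CET_USER | \
					 XFEATURE_MASK_CET_KERNEL)

so that kvm_is_cet_supported() requires both the user and supervisor xstate
bits from the very first patch that reports the MSRs.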

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-04  3:26     ` Yang, Weijiang
@ 2023-08-04 20:45       ` Sean Christopherson
  2023-08-04 20:59         ` Peter Zijlstra
                           ` (3 more replies)
  0 siblings, 4 replies; 82+ messages in thread
From: Sean Christopherson @ 2023-08-04 20:45 UTC (permalink / raw)
  To: Weijiang Yang
  Cc: Chao Gao, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Fri, Aug 04, 2023, Weijiang Yang wrote:
> On 8/3/2023 7:15 PM, Chao Gao wrote:
> > On Thu, Aug 03, 2023 at 12:27:22AM -0400, Yang Weijiang wrote:
> > > +void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
> > > +{
> > > +	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {

Drop the unlikely, KVM should not speculate on the guest configuration or underlying
hardware.

> > > +		rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
> > > +		rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
> > > +		rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
> > > +		/*
> > > +		 * Omit reset to host PL{1,2}_SSP because Linux will never use
> > > +		 * these MSRs.
> > > +		 */
> > > +		wrmsrl(MSR_IA32_PL0_SSP, 0);
> > This wrmsrl() can be dropped because the host doesn't support SSS yet.
> Frankly speaking, I want to remove this line of code. But that would mess up the MSR
> on the host side, i.e., from the host's perspective, the MSRs could be filled with
> garbage data, which looks awful.

So?  :-)

That's the case for all of the MSRs that KVM defers restoring until the host
returns to userspace, i.e. running in the host with bogus values in hardware is
nothing new.

And as I mentioned in the other thread regarding the assertion that SSS isn't
enabled in the host, sanitizing hardware values for something that should never
be consumed is a fool's errand.

> Anyway, I can remove it.

Yes please, though it may be a moot point.

> > > +	}
> > > +}
> > > +EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp);
> > > +
> > > +void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
> > > +{
> > > +	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> > ditto
> The code below reloads guest supervisor SSPs instead of resetting host ones.
> > > +		wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
> > > +		wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
> > > +		wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);

Pulling back in the justification from v3:

 the Pros:
  - Super easy to implement for KVM.
  - Automatically avoids saving and restoring this data when the vmexit
    is handled within KVM.

 the Cons:
  - Unnecessarily restores XFEATURE_CET_KERNEL when switching to
    non-KVM task's userspace.
  - Forces allocating space for this state on all tasks, whether or not
    they use KVM, and with likely zero users today and in the near future.
  - Complicates the FPU optimization thinking by including things that
    can have no effect on userspace in the FPU

IMO the pros far outweigh the cons.  3x RDMSR and 3x WRMSR when loading host/guest
state is non-trivial overhead.  That can be mitigated, e.g. by utilizing the
user return MSR framework, but it's still unpalatable.  It's unlikely many guests
will use SSS in the *near* future, but I don't want to end up with code that performs
poorly in the future and needs to be rewritten.

Especially because another big negative is that not utilizing XSTATE bleeds into
KVM's ABI.  Userspace has to be told to manually save+restore MSRs instead of just
letting KVM_{G,S}ET_XSAVE handle the state.  And that will create a bit of a
snafu if Linux does gain support for SSS.

On the other hand, the extra per-task memory is all of 24 bytes.  AFAICT, there's
literally zero effect on guest XSTATE allocations because those are vmalloc'd and
thus rounded up to PAGE_SIZE, i.e. the next 4KiB.  And XSTATE needs to be 64-byte
aligned, so the 24 bytes is only actually meaningful if the current size is within
24 bytes of the next cache line.  And the "current" size is variable depending on
which features are present and enabled, i.e. it's a roll of the dice as to whether
or not using XSTATE for supervisor CET would actually increase memory usage.  And
_if_ it does increase memory consumption, I have a very hard time believing an
extra 64 bytes in the worst case scenario is a dealbreaker.

If the performance is a concern, i.e. we don't want to eat saving/restoring the
MSRs when switching to/from host FPU context, then I *think* that's simply a matter
of keeping guest state resident when loading non-guest FPU state.

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1015af1ae562..8e7599e3b923 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -167,6 +167,16 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
                 */
                xfd_update_state(fpstate);
 
+               /*
+                * Leave supervisor CET state as-is when loading host state
+                * (kernel or userspace).  Supervisor CET state is managed via
+                * XSTATE for KVM guests, but the host never consumes said
+                * state (doesn't support supervisor shadow stacks), i.e. it's
+                * safe to keep guest state loaded into hardware.
+                */
+               if (!fpstate->is_guest)
+                       mask &= ~XFEATURE_MASK_CET_KERNEL;
+
                /*
                 * Restoring state always needs to modify all features
                 * which are in @mask even if the current task cannot use


So unless I'm missing something, NAK to this approach, at least not without trying
the kernel FPU approach, i.e. I want someone like PeterZ or tglx to actually
full on NAK the kernel approach before we consider shoving a hack into KVM.

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-04 20:45       ` Sean Christopherson
@ 2023-08-04 20:59         ` Peter Zijlstra
  2023-08-04 21:32         ` Paolo Bonzini
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 82+ messages in thread
From: Peter Zijlstra @ 2023-08-04 20:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Weijiang Yang, Chao Gao, pbonzini, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Fri, Aug 04, 2023 at 01:45:11PM -0700, Sean Christopherson wrote:

> So unless I'm missing something, NAK to this approach, at least not without trying
> the kernel FPU approach, i.e. I want someone like PeterZ or tglx to actually
> full on NAK the kernel approach before we consider shoving a hack into KVM.

Not having fully followed things (I'll go read up), SSS is blocked on
FRED. But it is definitely on the books to do SSS once FRED is go.

So if the approach as chosen gets in the way of host kernel SS
management, that is a wee problem.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-04  5:14   ` Chao Gao
@ 2023-08-04 21:27     ` Sean Christopherson
  2023-08-04 21:45       ` Paolo Bonzini
  2023-08-06  8:44       ` Yang, Weijiang
  0 siblings, 2 replies; 82+ messages in thread
From: Sean Christopherson @ 2023-08-04 21:27 UTC (permalink / raw)
  To: Chao Gao
  Cc: Yang Weijiang, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Fri, Aug 04, 2023, Chao Gao wrote:
> On Thu, Aug 03, 2023 at 12:27:24AM -0400, Yang Weijiang wrote:
> >Add emulation interface for CET MSR read and write.
>The emulation code is split into a common part and a vendor-specific
>part; the former resides in x86.c to benefit different x86 CPU
>vendors, the latter for VMX is implemented in this patch.
> >
> >Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> >---
> > arch/x86/kvm/vmx/vmx.c |  27 +++++++++++
> > arch/x86/kvm/x86.c     | 104 +++++++++++++++++++++++++++++++++++++----
> > arch/x86/kvm/x86.h     |  18 +++++++
> > 3 files changed, 141 insertions(+), 8 deletions(-)
> >
> >diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >index 6aa76124e81e..ccf750e79608 100644
> >--- a/arch/x86/kvm/vmx/vmx.c
> >+++ b/arch/x86/kvm/vmx/vmx.c
> >@@ -2095,6 +2095,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > 		else
> > 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> > 		break;
> >+	case MSR_IA32_S_CET:
> >+	case MSR_KVM_GUEST_SSP:
> >+	case MSR_IA32_INT_SSP_TAB:
> >+		if (kvm_get_msr_common(vcpu, msr_info))
> >+			return 1;
> >+		if (msr_info->index == MSR_KVM_GUEST_SSP)
> >+			msr_info->data = vmcs_readl(GUEST_SSP);
> >+		else if (msr_info->index == MSR_IA32_S_CET)
> >+			msr_info->data = vmcs_readl(GUEST_S_CET);
> >+		else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
> >+			msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
> 
> This if-else-if suggests that they are forcibly grouped together to just
> share the call of kvm_get_msr_common(). For readability, I think it is better
> to handle them separately.
> 
> e.g.,
> 	case MSR_IA32_S_CET:
> 		if (kvm_get_msr_common(vcpu, msr_info))
> 			return 1;
> 		msr_info->data = vmcs_readl(GUEST_S_CET);
> 		break;
> 
> 	case MSR_KVM_GUEST_SSP:
> 		if (kvm_get_msr_common(vcpu, msr_info))
> 			return 1;
> 		msr_info->data = vmcs_readl(GUEST_SSP);
> 		break;

Actually, we can do even better.  We have an existing framework for these types
of prechecks, I just completely forgot about it :-(  (my "look at PAT" was a bad
suggestion).

Handle the checks in __kvm_set_msr() and __kvm_get_msr(), i.e. *before* calling
into vendor code.  Then vendor code doesn't need to make weird callbacks.

> > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > {
> > 	u32 msr = msr_info->index;
> >@@ -3981,6 +4014,45 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > 		vcpu->arch.guest_fpu.xfd_err = data;
> > 		break;
> > #endif
> >+#define CET_EXCLUSIVE_BITS		(CET_SUPPRESS | CET_WAIT_ENDBR)
> >+#define CET_CTRL_RESERVED_BITS		GENMASK(9, 6)

Please use a single namespace for these #defines, e.g. CET_CTRL_* or maybe
CET_US_* for everything.

> >+#define CET_SHSTK_MASK_BITS		GENMASK(1, 0)
> >+#define CET_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | \
> >+					 GENMASK_ULL(63, 10))
> >+#define CET_LEG_BITMAP_BASE(data)	((data) >> 12)

Bah, stupid SDM.  Please spell out "LEGACY", I thought "LEG" was short for "LEGAL"
since this looks a lot like a page shift, i.e. getting a pfn.

> >+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
> >+				      struct msr_data *msr)
> >+{
> >+	if (is_shadow_stack_msr(msr->index)) {
> >+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> >+			return false;
> >+
> >+		if (msr->index == MSR_KVM_GUEST_SSP)
> >+			return msr->host_initiated;
> >+
> >+		return msr->host_initiated ||
> >+			guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> >+	}
> >+
> >+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> >+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> >+		return false;
> >+
> >+	return msr->host_initiated ||
> >+		guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> >+		guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);

Similar to my suggestion for XSS, I think we drop the waiver for host_initiated
accesses, i.e. require the feature to be enabled and exposed to the guest, even
for the host.

> >diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> >index c69fc027f5ec..3b79d6db2f83 100644
> >--- a/arch/x86/kvm/x86.h
> >+++ b/arch/x86/kvm/x86.h
> >@@ -552,4 +552,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> > 			 unsigned int port, void *data,  unsigned int count,
> > 			 int in);
> > 
> >+/*
> >+ * Guest xstate MSRs have been loaded in __msr_io(), disable preemption before
> >+ * access the MSRs to avoid MSR content corruption.
> >+ */
> 
> I think it is better to describe what the function does prior to jumping into
> details like where guest FPU is loaded.
> 
> /*
>  * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
>  * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
>  * guest FPU should have been loaded already.
>  */
> >+static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
> >+{
> >+	kvm_fpu_get();
> >+	rdmsrl(msr_info->index, msr_info->data);
> >+	kvm_fpu_put();
> >+}
> >+
> >+static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
> >+{
> >+	kvm_fpu_get();
> >+	wrmsrl(msr_info->index, msr_info->data);
> >+	kvm_fpu_put();
> >+}
> 
> Can you rename functions to kvm_get/set_xstate_msr() to align with the comment
> and patch 6? And if there is no user outside x86.c, you can just put these two
> functions right after the is_xstate_msr() added in patch 6.

+1.  These should also assert that (a) guest FPU state is loaded and (b) the MSR
is passed through to the guest.  I might be ok dropping (b) if both VMX and SVM
passthrough all MSRs if they're exposed to the guest, i.e. not lazily passed
through.
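
E.g. for (a), something like this (hypothetical sketch; assumes the helpers
grow a @vcpu parameter and that fpstate::in_use is the right bit to key off of):

	static inline void kvm_get_xstate_msr(struct kvm_vcpu *vcpu,
					      struct msr_data *msr_info)
	{
		/* Guest FPU state must be resident in hardware. */
		WARN_ON_ONCE(!vcpu->arch.guest_fpu.fpstate->in_use);
		kvm_fpu_get();
		rdmsrl(msr_info->index, msr_info->data);
		kvm_fpu_put();
	}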

Sans any changes to kvm_{g,s}et_xsave_msr(), I think this?  (completely untested)


---
 arch/x86/kvm/vmx/vmx.c |  34 +++-------
 arch/x86/kvm/x86.c     | 151 +++++++++++++++--------------------------
 2 files changed, 64 insertions(+), 121 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 491039aeb61b..1211eb469d06 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2100,16 +2100,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
 		break;
 	case MSR_IA32_S_CET:
+		msr_info->data = vmcs_readl(GUEST_S_CET);
+		break;
 	case MSR_KVM_GUEST_SSP:
+		msr_info->data = vmcs_readl(GUEST_SSP);
+		break;
 	case MSR_IA32_INT_SSP_TAB:
-		if (kvm_get_msr_common(vcpu, msr_info))
-			return 1;
-		if (msr_info->index == MSR_KVM_GUEST_SSP)
-			msr_info->data = vmcs_readl(GUEST_SSP);
-		else if (msr_info->index == MSR_IA32_S_CET)
-			msr_info->data = vmcs_readl(GUEST_S_CET);
-		else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
-			msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+		msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
 		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
@@ -2432,25 +2429,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			vmx->pt_desc.guest.addr_a[index / 2] = data;
 		break;
-	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
-		if (kvm_set_msr_common(vcpu, msr_info))
-			return 1;
-		if (data) {
-			vmx_disable_write_intercept_sss_msr(vcpu);
-			wrmsrl(msr_index, data);
-		}
-		break;
 	case MSR_IA32_S_CET:
+		vmcs_writel(GUEST_S_CET, data);
+		break;
 	case MSR_KVM_GUEST_SSP:
+		vmcs_writel(GUEST_SSP, data);
+		break;
 	case MSR_IA32_INT_SSP_TAB:
-		if (kvm_set_msr_common(vcpu, msr_info))
-			return 1;
-		if (msr_index == MSR_KVM_GUEST_SSP)
-			vmcs_writel(GUEST_SSP, data);
-		else if (msr_index == MSR_IA32_S_CET)
-			vmcs_writel(GUEST_S_CET, data);
-		else if (msr_index == MSR_IA32_INT_SSP_TAB)
-			vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+		vmcs_writel(GUEST_INTR_SSP_TABLE, data);
 		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		if (data && !vcpu_to_pmu(vcpu)->version)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7385fc25a987..75e6de7c9268 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1838,6 +1838,11 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
 }
 EXPORT_SYMBOL_GPL(kvm_msr_allowed);
 
+#define CET_US_RESERVED_BITS		GENMASK(9, 6)
+#define CET_US_SHSTK_MASK_BITS		GENMASK(1, 0)
+#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)
+
 /*
  * Write @data into the MSR specified by @index.  Select MSR specific fault
  * checks are bypassed if @host_initiated is %true.
@@ -1897,6 +1902,35 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
 
 		data = (u32)data;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+		    !guest_can_use(vcpu, X86_FEATURE_IBT))
+			return 1;
+		if (data & CET_US_RESERVED_BITS)
+			return 1;
+		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+		    (data & CET_US_SHSTK_MASK_BITS))
+			return 1;
+		if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
+		    (data & CET_US_IBT_MASK_BITS))
+			return 1;
+		if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
+			return 1;
+
+		/* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
+		if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
+			return 1;
+		break;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+	case MSR_KVM_GUEST_SSP:
+		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK))
+			return 1;
+		if (is_noncanonical_address(data, vcpu))
+			return 1;
+		if (!IS_ALIGNED(data, 4))
+			return 1;
+		break;
 	}
 
 	msr.data = data;
@@ -1940,6 +1974,17 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
 			return 1;
 		break;
+	case MSR_IA32_U_CET:
+	case MSR_IA32_S_CET:
+		if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
+		    !guest_can_use(vcpu, X86_FEATURE_SHSTK))
+			return 1;
+		break;
+	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+	case MSR_KVM_GUEST_SSP:
+		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK))
+			return 1;
+		break;
 	}
 
 	msr.index = index;
@@ -3640,47 +3685,6 @@ static bool kvm_is_msr_to_save(u32 msr_index)
 	return false;
 }
 
-static inline bool is_shadow_stack_msr(u32 msr)
-{
-	return msr == MSR_IA32_PL0_SSP ||
-		msr == MSR_IA32_PL1_SSP ||
-		msr == MSR_IA32_PL2_SSP ||
-		msr == MSR_IA32_PL3_SSP ||
-		msr == MSR_IA32_INT_SSP_TAB ||
-		msr == MSR_KVM_GUEST_SSP;
-}
-
-static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
-				      struct msr_data *msr)
-{
-	if (is_shadow_stack_msr(msr->index)) {
-		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
-			return false;
-
-		/*
-		 * This MSR is synthesized mainly for userspace access during
-		 * Live Migration, it also can be accessed in SMM mode by VMM.
-		 * Guest is not allowed to access this MSR.
-		 */
-		if (msr->index == MSR_KVM_GUEST_SSP) {
-			if (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu))
-				return true;
-
-			return msr->host_initiated;
-		}
-
-		return msr->host_initiated ||
-			guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
-	}
-
-	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
-	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
-		return false;
-
-	return msr->host_initiated ||
-		guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
-}
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
@@ -4036,46 +4040,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.guest_fpu.xfd_err = data;
 		break;
 #endif
-#define CET_EXCLUSIVE_BITS		(CET_SUPPRESS | CET_WAIT_ENDBR)
-#define CET_CTRL_RESERVED_BITS		GENMASK(9, 6)
-#define CET_SHSTK_MASK_BITS		GENMASK(1, 0)
-#define CET_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | \
-					 GENMASK_ULL(63, 10))
-#define CET_LEG_BITMAP_BASE(data)	((data) >> 12)
 	case MSR_IA32_U_CET:
-	case MSR_IA32_S_CET:
-		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
-			return 1;
-		if (!!(data & CET_CTRL_RESERVED_BITS))
-			return 1;
-		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
-		    (data & CET_SHSTK_MASK_BITS))
-			return 1;
-		if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
-		    (data & CET_IBT_MASK_BITS))
-			return 1;
-		if (!IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
-		    (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS)
-			return 1;
-		if (msr == MSR_IA32_U_CET)
-			kvm_set_xsave_msr(msr_info);
-		break;
-	case MSR_KVM_GUEST_SSP:
-	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
-		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
-			return 1;
-		if (is_noncanonical_address(data, vcpu))
-			return 1;
-		if (!IS_ALIGNED(data, 4))
-			return 1;
-		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
-		    msr == MSR_IA32_PL2_SSP) {
-			vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
-			if (!vcpu->arch.cet_sss_active && data)
-				vcpu->arch.cet_sss_active = true;
-		} else if (msr == MSR_IA32_PL3_SSP) {
-			kvm_set_xsave_msr(msr_info);
-		}
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		kvm_set_xsave_msr(msr_info);
 		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
@@ -4436,17 +4403,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 #endif
 	case MSR_IA32_U_CET:
-	case MSR_IA32_S_CET:
-	case MSR_KVM_GUEST_SSP:
-	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
-		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
-			return 1;
-		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
-		    msr == MSR_IA32_PL2_SSP) {
-			msr_info->data = vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
-		} else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
-			kvm_get_xsave_msr(msr_info);
-		}
+	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+		kvm_get_xsave_msr(msr_info);
 		break;
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
@@ -7330,9 +7288,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		break;
 	case MSR_IA32_U_CET:
 	case MSR_IA32_S_CET:
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+		    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+			return;
+		break;
 	case MSR_KVM_GUEST_SSP:
 	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
-		if (!kvm_is_cet_supported())
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
 			return;
 		break;
 	default:
@@ -9664,13 +9626,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
 	}
 	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
-		u32 eax, ebx, ecx, edx;
-
-		cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx);
 		rdmsrl(MSR_IA32_XSS, host_xss);
 		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
-		if (ecx & XFEATURE_MASK_CET_KERNEL)
-			kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
 	}
 
 	rdmsrl_safe(MSR_EFER, &host_efer);

base-commit: efb9177acd7a4df5883b844e1ec9c69ef0899c9c
-- 


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-04 20:45       ` Sean Christopherson
  2023-08-04 20:59         ` Peter Zijlstra
@ 2023-08-04 21:32         ` Paolo Bonzini
  2023-08-09  2:51           ` Yang, Weijiang
  2023-08-09  2:39         ` Yang, Weijiang
  2023-08-10  9:29         ` Yang, Weijiang
  3 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-04 21:32 UTC (permalink / raw)
  To: Sean Christopherson, Weijiang Yang
  Cc: Chao Gao, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/4/23 22:45, Sean Christopherson wrote:
>>>> +void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> Drop the unlikely, KVM should not speculate on the guest configuration or underlying
> hardware.

In general unlikely() can still be a good idea if you have a fast path 
vs. a slow path; the extra cost of a branch will be much more visible on 
the fast path.  That said, the compiler should already be doing that.

>  the Pros:
>   - Super easy to implement for KVM.
>   - Automatically avoids saving and restoring this data when the vmexit
>     is handled within KVM.
>
>  the Cons:
>   - Unnecessarily restores XFEATURE_CET_KERNEL when switching to
>     non-KVM task's userspace.
>   - Forces allocating space for this state on all tasks, whether or not
>     they use KVM, and with likely zero users today and in the near future.
>   - Complicates the FPU optimization thinking by including things that
>     can have no effect on userspace in the FPU

I'm not sure if Linux will ever use XFEATURE_CET_KERNEL.  Linux does not 
use MSR_IA32_PL{1,2}_SSP; MSR_IA32_PL0_SSP probably would be per-CPU but 
it is not used while in ring 0 (except for SETSSBSY) and the restore can 
be delayed until return to userspace.  It is not unlike the SYSCALL MSRs.

So I would treat the bit similarly to the dynamic features even if it's 
not guarded by XFD, for example

#define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA
#define XFEATURE_MASK_USER_OPTIONAL \
	(XFEATURE_MASK_USER_DYNAMIC | XFEATURE_MASK_CET_KERNEL)

where XFEATURE_MASK_USER_DYNAMIC is used for xfd-related tasks but 
everything else uses XFEATURE_MASK_USER_OPTIONAL.

Then you'd enable the feature by hand when allocating the guest fpstate.
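
I.e. a minimal sketch, assuming fpu_alloc_guest_fpstate() keeps its current
shape (the size/offset bookkeeping for the extra component is hand-waved):

	/* In fpu_alloc_guest_fpstate(), after the defaults are set: */
	fpstate->is_guest   = true;
	/* Force-enable supervisor CET for guest fpstates only. */
	fpstate->xfeatures |= XFEATURE_MASK_CET_KERNEL;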

> Especially because another big negative is that not utilizing XSTATE bleeds into
> KVM's ABI.  Userspace has to be told to manually save+restore MSRs instead of just
> letting KVM_{G,S}ET_XSAVE handle the state.  And that will create a bit of a
> snafu if Linux does gain support for SSS.

I don't think this matters, we don't have any MSRs in KVM_GET/SET_XSAVE 
and in fact we can't even add them since the uABI uses the non-compacted 
format.  MSRs should be retrieved and set via KVM_GET/SET_MSR and 
userspace will learn about the index automatically via 
KVM_GET_MSR_INDEX_LIST.

Paolo


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM
  2023-08-03  4:27 ` [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM Yang Weijiang
@ 2023-08-04 21:38   ` Sean Christopherson
  2023-08-09  3:00     ` Yang, Weijiang
  0 siblings, 1 reply; 82+ messages in thread
From: Sean Christopherson @ 2023-08-04 21:38 UTC (permalink / raw)
  To: Yang Weijiang
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu

This is not "refinement", this is full on supporting a new nVMX feature.  Please
phrase the shortlog accordingly, e.g. something like this (it's not very good,
but it's a start).

  KVM: nVMX: Add support for exposing "No PM H/W error code checks" to L1

Regarding shortlog, please update all of them in this series to put a space after
the colon, i.e. "KVM: VMX:" and "KVM: x86:", not "KVM:x86:".

>  static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index 96952263b029..1884628294e4 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -284,6 +284,13 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
>  	       __kvm_is_valid_cr4(vcpu, val);
>  }
>  
> +static inline bool nested_cpu_has_no_hw_errcode(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	return vmx->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE;

The "CC" part of my suggestion is critical to this being sane.  As is, this reads
"nested CPU has no hardware error code", which is not even remotely close to the
truth.

static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu)
{
	return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
}

[*] https://lore.kernel.org/all/ZJ7vyBw1nbTBOfuf@google.com

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-03  4:27 ` [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs Yang Weijiang
  2023-08-04  5:14   ` Chao Gao
  2023-08-04  8:28   ` Chao Gao
@ 2023-08-04 21:40   ` Paolo Bonzini
  2023-08-09  3:05     ` Yang, Weijiang
  2 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-04 21:40 UTC (permalink / raw)
  To: Yang Weijiang, seanjc, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu

On 8/3/23 06:27, Yang Weijiang wrote:
> +		if (msr_info->index == MSR_KVM_GUEST_SSP)
> +			msr_info->data = vmcs_readl(GUEST_SSP);

Accesses to MSR_KVM_(GUEST_)SSP must be rejected unless host-initiated.

Paolo


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-08-04 16:02   ` Sean Christopherson
@ 2023-08-04 21:43     ` Paolo Bonzini
  2023-08-09  3:11       ` Yang, Weijiang
  2023-08-08 14:20     ` Yang, Weijiang
  1 sibling, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-04 21:43 UTC (permalink / raw)
  To: Sean Christopherson, Yang Weijiang
  Cc: peterz, john.allen, kvm, linux-kernel, rick.p.edgecombe,
	chao.gao, binbin.wu, Zhang Yi Z

On 8/4/23 18:02, Sean Christopherson wrote:
>> Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified.
>> CPUID(EAX=0DH,ECX=1).EBX reports required storage size of
>> all enabled xstate features in XCR0 | XSS. Guest can allocate
>> sufficient xsave buffer based on the info.
>
> Please wrap changelogs closer to ~75 chars.  I'm pretty sure this isn't the first
> time I've made this request...

I suspect this is because of the long "word" CPUID(EAX=0DH,ECX=1).EBX. 
It would make the lengths less homogeneous if line 1 stayed the same 
but lines 2-4 became longer.

Paolo


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-04 21:27     ` Sean Christopherson
@ 2023-08-04 21:45       ` Paolo Bonzini
  2023-08-04 22:21         ` Sean Christopherson
  2023-08-06  8:44       ` Yang, Weijiang
  1 sibling, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-04 21:45 UTC (permalink / raw)
  To: Sean Christopherson, Chao Gao
  Cc: Yang Weijiang, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/4/23 23:27, Sean Christopherson wrote:
>>> +
>>> +	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>>> +	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
>>> +		return false;
>>> +
>>> +	return msr->host_initiated ||
>>> +		guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
>>> +		guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>
> Similar to my suggestion for XSS, I think we drop the waiver for host_initiated
> accesses, i.e. require the feature to be enabled and exposed to the guest, even
> for the host.

No, please don't.  Allowing host-initiated accesses is what makes it 
possible to take the list of MSR indices and pass it blindly to 
KVM_GET_MSR and KVM_SET_MSR.  This should be documented, will send a patch.
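
For reference, the userspace flow that this enables is roughly (sketch only;
index-list allocation/sizing and error handling omitted, fds opened via the
usual /dev/kvm dance):

	struct kvm_msr_list *list = ...;	/* sized via a probe call */
	struct kvm_msrs *msrs;
	int i;

	/* system-scope ioctl: discover every save/restore-able MSR */
	ioctl(kvm_fd, KVM_GET_MSR_INDEX_LIST, list);

	msrs = calloc(1, sizeof(*msrs) +
			 list->nmsrs * sizeof(struct kvm_msr_entry));
	msrs->nmsrs = list->nmsrs;
	for (i = 0; i < list->nmsrs; i++)
		msrs->entries[i].index = list->indices[i];

	/* blindly save on the source, restore on the destination */
	ioctl(src_vcpu_fd, KVM_GET_MSRS, msrs);
	ioctl(dst_vcpu_fd, KVM_SET_MSRS, msrs);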

Paolo


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-03  4:27 ` [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved Yang Weijiang
  2023-08-03 10:39   ` Chao Gao
  2023-08-04 18:55   ` Sean Christopherson
@ 2023-08-04 21:47   ` Paolo Bonzini
  2023-08-09  3:14     ` Yang, Weijiang
  2 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-04 21:47 UTC (permalink / raw)
  To: Yang Weijiang, seanjc, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu

On 8/3/23 06:27, Yang Weijiang wrote:
> Add all CET MSRs, including the synthesized GUEST_SSP, to the report list.
> PL{0,1,2}_SSP are made independent of host XSAVE management by later
> patches. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are XSAVE-managed on
> the host side. MSR_IA32_S_CET/MSR_IA32_INT_SSP_TAB/MSR_KVM_GUEST_SSP
> are not XSAVE-managed.

MSR_KVM_GUEST_SSP -> MSR_KVM_SSP

Also please add a comment,

/*
  * SSP can only be read via RDSSP; even writing it requires
  * destructive and potentially faulting operations such as
  * SAVEPREVSSP/RSTORSSP or SETSSBSY/CLRSSBSY.  Let the host
  * use a pseudo-MSR that is just a wrapper for the GUEST_SSP
  * field of the VMCS.
  */

Paolo


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-04 18:51         ` Sean Christopherson
@ 2023-08-04 22:01           ` Paolo Bonzini
  2023-08-08 15:16           ` Yang, Weijiang
  1 sibling, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-04 22:01 UTC (permalink / raw)
  To: Sean Christopherson, Chao Gao
  Cc: Weijiang Yang, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/4/23 20:51, Sean Christopherson wrote:
>>>>> +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>>>>> +		if (!kvm_is_cet_supported())
>>>> shall we consider the case where IBT is supported while SS isn't
>>>> (e.g., in L1 guest)?
>>>
>>> Yes, but userspace should be able to access SHSTK MSRs even if only IBT is
>>> exposed to the guest, so long as KVM can support SHSTK MSRs.
>>
>> Why should userspace be allowed to access SHSTK MSRs in this case? L1 may not
>> even enumerate SHSTK (qemu removes -shstk explicitly but keeps IBT), how KVM in
>> L1 can allow its userspace to do that?
>
> +1.  And specifically, this isn't about SHSTK being exposed to the guest, it's about
> SHSTK being _supported by KVM_.  This is all about KVM telling userspace what MSRs
> are valid and/or need to be saved+restored.  If KVM doesn't support a feature,
> then the MSRs are invalid and there is no reason for userspace to save+restore
> the MSRs on live migration.

I think you three are talking past each other.

There are four cases:

- U_CET/S_CET supported by the host and exposed (obvious).

- U_CET/S_CET supported by the host, IBT or SHSTK partially exposed.  The
MSRs should still be guest-accessible and bits that apply to absent features
should be reserved (bits 0-1 for SHSTK, bits 2-63 for IBT).

- U_CET/S_CET supported by the host, IBT or SHSTK not exposed.  The MSRs
should still be host-accessible and writable to the default value (see the
sketch below).  This is clearer if you think of KVM_GET_MSR_INDEX_LIST as a
system ioctl.  Whether to allow writing 0 from the guest is debatable.

- U_CET/S_CET not supported by the host.  Then the MSRs should not be
enabled and should not be in KVM_GET_MSR_INDEX_LIST, and also IBT/SHSTK
should not be in KVM_GET_SUPPORTED_CPUID.

In my opinion it is reasonable to require both U_CET and S_CET to be
supported from the beginning in the host in order to support CET.  It
is simpler and keeps the feature matrix at bay.
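
For the third case, in code that would be roughly (hypothetical sketch, using
the kvm_set_msr_common() style from earlier in the thread):

	case MSR_IA32_U_CET:
	case MSR_IA32_S_CET:
		/* Supported by KVM but not exposed to this guest. */
		if (!msr_info->host_initiated)
			return 1;
		/* Host-writable, but only to the default value. */
		if (data)
			return 1;
		break;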

Paolo


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 17/19] KVM:x86: Enable guest CET supervisor xstate bit support
  2023-08-03  4:27 ` [PATCH v5 17/19] KVM:x86: Enable guest CET supervisor xstate bit support Yang Weijiang
@ 2023-08-04 22:02   ` Paolo Bonzini
  2023-08-09  6:07     ` Yang, Weijiang
  0 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-04 22:02 UTC (permalink / raw)
  To: Yang Weijiang, seanjc, peterz, john.allen, kvm, linux-kernel
  Cc: rick.p.edgecombe, chao.gao, binbin.wu

On 8/3/23 06:27, Yang Weijiang wrote:
>   	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> +		u32 eax, ebx, ecx, edx;
> +
> +		cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx);
>   		rdmsrl(MSR_IA32_XSS, host_xss);
>   		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
> +		if (ecx & XFEATURE_MASK_CET_KERNEL)
> +			kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
>   	}

This is a bit hackish and makes me lean more towards adding support for 
XFEATURE_MASK_CET_KERNEL in host MSR_IA32_XSS (and then possibly hiding it 
in the actual calls to XSAVE/XRSTORS for non-guest FPU).

Paolo


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-04 21:45       ` Paolo Bonzini
@ 2023-08-04 22:21         ` Sean Christopherson
  2023-08-07  7:03           ` Paolo Bonzini
  0 siblings, 1 reply; 82+ messages in thread
From: Sean Christopherson @ 2023-08-04 22:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Chao Gao, Yang Weijiang, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On Fri, Aug 04, 2023, Paolo Bonzini wrote:
> On 8/4/23 23:27, Sean Christopherson wrote:
> > > > +
> > > > +	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> > > > +	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> > > > +		return false;
> > > > +
> > > > +	return msr->host_initiated ||
> > > > +		guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> > > > +		guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> > 
> > Similar to my suggestion for XSS, I think we drop the waiver for host_initiated
> > accesses, i.e. require the feature to be enabled and exposed to the guest, even
> > for the host.
> 
> No, please don't.  Allowing host-initiated accesses is what makes it
> possible to take the list of MSR indices and pass it blindly to KVM_GET_MSR
> and KVM_SET_MSR.

I don't see how that can work today.  Oooh, the MSRs that don't exempt host_initiated
are added to the list of MSRs to save/restore, i.e. KVM "silently" supports
MSR_AMD64_OSVW_ID_LENGTH and MSR_AMD64_OSVW_STATUS.

And guest_pv_has() returns true unless userspace has opted in to enforcement.

Sad panda.

That means we need to figure out a solution for KVM stuffing GUEST_SSP on RSM,
which is a "host" write but a guest controlled value.
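
For context, the RSM path in question is roughly (sketch; the SMRAM field and
exact plumbing are approximations):

	/*
	 * rsm_load_state_64(): the SSP value comes from guest-writable
	 * SMRAM, but the write itself is host-initiated, so a blanket
	 * host_initiated exemption would let the guest smuggle in an
	 * unchecked, non-canonical value.
	 */
	__kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smstate->ssp, true);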

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-04 21:27     ` Sean Christopherson
  2023-08-04 21:45       ` Paolo Bonzini
@ 2023-08-06  8:44       ` Yang, Weijiang
  2023-08-07  7:00         ` Paolo Bonzini
  1 sibling, 1 reply; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-06  8:44 UTC (permalink / raw)
  To: Sean Christopherson, Chao Gao
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/5/2023 5:27 AM, Sean Christopherson wrote:
> On Fri, Aug 04, 2023, Chao Gao wrote:
>> On Thu, Aug 03, 2023 at 12:27:24AM -0400, Yang Weijiang wrote:
>>> Add emulation interface for CET MSR read and write.
>>> The emulation code is split into a common part and a vendor-specific
>>> part; the former resides in x86.c to benefit different x86 CPU
>>> vendors, the latter for VMX is implemented in this patch.
>>>
>>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>>> ---
>>> arch/x86/kvm/vmx/vmx.c |  27 +++++++++++
>>> arch/x86/kvm/x86.c     | 104 +++++++++++++++++++++++++++++++++++++----
>>> arch/x86/kvm/x86.h     |  18 +++++++
>>> 3 files changed, 141 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index 6aa76124e81e..ccf750e79608 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -2095,6 +2095,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>> 		else
>>> 			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>>> 		break;
>>> +	case MSR_IA32_S_CET:
>>> +	case MSR_KVM_GUEST_SSP:
>>> +	case MSR_IA32_INT_SSP_TAB:
>>> +		if (kvm_get_msr_common(vcpu, msr_info))
>>> +			return 1;
>>> +		if (msr_info->index == MSR_KVM_GUEST_SSP)
>>> +			msr_info->data = vmcs_readl(GUEST_SSP);
>>> +		else if (msr_info->index == MSR_IA32_S_CET)
>>> +			msr_info->data = vmcs_readl(GUEST_S_CET);
>>> +		else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
>>> +			msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
>> This if-else-if suggests that they are forcibly grouped together to just
>> share the call of kvm_get_msr_common(). For readability, I think it is better
>> to handle them separately.
>>
>> e.g.,
>> 	case MSR_IA32_S_CET:
>> 		if (kvm_get_msr_common(vcpu, msr_info))
>> 			return 1;
>> 		msr_info->data = vmcs_readl(GUEST_S_CET);
>> 		break;
>>
>> 	case MSR_KVM_GUEST_SSP:
>> 		if (kvm_get_msr_common(vcpu, msr_info))
>> 			return 1;
>> 		msr_info->data = vmcs_readl(GUEST_SSP);
>> 		break;
> Actually, we can do even better.  We have an existing framework for these types
> of prechecks, I just completely forgot about it :-(  (my "look at PAT" was a bad
> suggestion).
>
> Handle the checks in __kvm_set_msr() and __kvm_get_msr(), i.e. *before* calling
> into vendor code.  Then vendor code doesn't need to make weird callbacks.
I see, will change it, thank you!
>>> int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>> {
>>> 	u32 msr = msr_info->index;
>>> @@ -3981,6 +4014,45 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>> 		vcpu->arch.guest_fpu.xfd_err = data;
>>> 		break;
>>> #endif
>>> +#define CET_EXCLUSIVE_BITS		(CET_SUPPRESS | CET_WAIT_ENDBR)
>>> +#define CET_CTRL_RESERVED_BITS		GENMASK(9, 6)
> Please use a single namespace for these #defines, e.g. CET_CTRL_* or maybe
> CET_US_* for everything.
OK.
>>> +#define CET_SHSTK_MASK_BITS		GENMASK(1, 0)
>>> +#define CET_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | \
>>> +					 GENMASK_ULL(63, 10))
>>> +#define CET_LEG_BITMAP_BASE(data)	((data) >> 12)
> Bah, stupid SDM.  Please spell out "LEGACY", I thought "LEG" was short for "LEGAL"
> since this looks a lot like a page shift, i.e. getting a pfn.
Sure :-)
>>> +static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
>>> +				      struct msr_data *msr)
>>> +{
>>> +	if (is_shadow_stack_msr(msr->index)) {
>>> +		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
>>> +			return false;
>>> +
>>> +		if (msr->index == MSR_KVM_GUEST_SSP)
>>> +			return msr->host_initiated;
>>> +
>>> +		return msr->host_initiated ||
>>> +			guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>>> +	}
>>> +
>>> +	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>>> +	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
>>> +		return false;
>>> +
>>> +	return msr->host_initiated ||
>>> +		guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
>>> +		guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> Similar to my suggestion for XSS, I think we drop the waiver for host_initiated
> accesses, i.e. require the feature to be enabled and exposed to the guest, even
> for the host.
I saw Paolo share a different opinion on this, so I'll hold off for a while...
>>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>>> index c69fc027f5ec..3b79d6db2f83 100644
>>> --- a/arch/x86/kvm/x86.h
>>> +++ b/arch/x86/kvm/x86.h
>>> @@ -552,4 +552,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>>> 			 unsigned int port, void *data,  unsigned int count,
>>> 			 int in);
>>>
>>> +/*
>>> + * Guest xstate MSRs have been loaded in __msr_io(), disable preemption before
>>> + * access the MSRs to avoid MSR content corruption.
>>> + */
>> I think it is better to describe what the function does prior to jumping into
>> details like where guest FPU is loaded.
OK, will do it, thanks!
>> /*
>>   * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
>>   * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
>>   * guest FPU should have been loaded already.
>>   */
>>> +static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
>>> +{
>>> +	kvm_fpu_get();
>>> +	rdmsrl(msr_info->index, msr_info->data);
>>> +	kvm_fpu_put();
>>> +}
>>> +
>>> +static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
>>> +{
>>> +	kvm_fpu_get();
>>> +	wrmsrl(msr_info->index, msr_info->data);
>>> +	kvm_fpu_put();
>>> +}
>> Can you rename functions to kvm_get/set_xstate_msr() to align with the comment
>> and patch 6? And if there is no user outside x86.c, you can just put these two
>> functions right after the is_xstate_msr() added in patch 6.
OK, I think I added the helpers in this patch due to the compilation error "function is defined but not used".
> +1.  These should also assert that (a) guest FPU state is loaded and
Do you mean something like this:
WARN_ON_ONCE(!vcpu->arch.guest_fpu->in_use) or  KVM_BUG_ON()
added in the helpers?
> (b) the MSR
> is passed through to the guest.  I might be ok dropping (b) if both VMX and SVM
> passthrough all MSRs if they're exposed to the guest, i.e. not lazily passed
> through.
I'm OK with adding the assert if all the CET MSRs end up being passed through directly.
> Sans any changes to kvm_{g,s}et_xsave_msr(), I think this?  (completely untested)
>
>
> ---
>   arch/x86/kvm/vmx/vmx.c |  34 +++-------
>   arch/x86/kvm/x86.c     | 151 +++++++++++++++--------------------------
>   2 files changed, 64 insertions(+), 121 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 491039aeb61b..1211eb469d06 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2100,16 +2100,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   			msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>   		break;
>   	case MSR_IA32_S_CET:
> +		msr_info->data = vmcs_readl(GUEST_S_CET);
> +		break;
>   	case MSR_KVM_GUEST_SSP:
> +		msr_info->data = vmcs_readl(GUEST_SSP);
> +		break;
>   	case MSR_IA32_INT_SSP_TAB:
> -		if (kvm_get_msr_common(vcpu, msr_info))
> -			return 1;
> -		if (msr_info->index == MSR_KVM_GUEST_SSP)
> -			msr_info->data = vmcs_readl(GUEST_SSP);
> -		else if (msr_info->index == MSR_IA32_S_CET)
> -			msr_info->data = vmcs_readl(GUEST_S_CET);
> -		else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
> -			msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
> +		msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
>   		break;
>   	case MSR_IA32_DEBUGCTLMSR:
>   		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> @@ -2432,25 +2429,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		else
>   			vmx->pt_desc.guest.addr_a[index / 2] = data;
>   		break;
> -	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
> -		if (kvm_set_msr_common(vcpu, msr_info))
> -			return 1;
> -		if (data) {
> -			vmx_disable_write_intercept_sss_msr(vcpu);
> -			wrmsrl(msr_index, data);
> -		}
> -		break;
>   	case MSR_IA32_S_CET:
> +		vmcs_writel(GUEST_S_CET, data);
> +		break;
>   	case MSR_KVM_GUEST_SSP:
> +		vmcs_writel(GUEST_SSP, data);
> +		break;
>   	case MSR_IA32_INT_SSP_TAB:
> -		if (kvm_set_msr_common(vcpu, msr_info))
> -			return 1;
> -		if (msr_index == MSR_KVM_GUEST_SSP)
> -			vmcs_writel(GUEST_SSP, data);
> -		else if (msr_index == MSR_IA32_S_CET)
> -			vmcs_writel(GUEST_S_CET, data);
> -		else if (msr_index == MSR_IA32_INT_SSP_TAB)
> -			vmcs_writel(GUEST_INTR_SSP_TABLE, data);
> +		vmcs_writel(GUEST_INTR_SSP_TABLE, data);
>   		break;
>   	case MSR_IA32_PERF_CAPABILITIES:
>   		if (data && !vcpu_to_pmu(vcpu)->version)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7385fc25a987..75e6de7c9268 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1838,6 +1838,11 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
>   }
>   EXPORT_SYMBOL_GPL(kvm_msr_allowed);
>   
> +#define CET_US_RESERVED_BITS		GENMASK(9, 6)
> +#define CET_US_SHSTK_MASK_BITS		GENMASK(1, 0)
> +#define CET_US_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
> +#define CET_US_LEGACY_BITMAP_BASE(data)	((data) >> 12)
> +
>   /*
>    * Write @data into the MSR specified by @index.  Select MSR specific fault
>    * checks are bypassed if @host_initiated is %true.
> @@ -1897,6 +1902,35 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>   
>   		data = (u32)data;
>   		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:
> +		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> +		    !guest_can_use(vcpu, X86_FEATURE_IBT))
> +			return 1;
> +		if (data & CET_US_RESERVED_BITS)
> +			return 1;
> +		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> +		    (data & CET_US_SHSTK_MASK_BITS))
> +			return 1;
> +		if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
> +		    (data & CET_US_IBT_MASK_BITS))
> +			return 1;
> +		if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
> +			return 1;
> +
> +		/* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
> +		if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
> +			return 1;
> +		break;
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> +	case MSR_KVM_GUEST_SSP:
> +		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK))
> +			return 1;
> +		if (is_noncanonical_address(data, vcpu))
> +			return 1;
> +		if (!IS_ALIGNED(data, 4))
> +			return 1;
> +		break;
>   	}
>   
>   	msr.data = data;
> @@ -1940,6 +1974,17 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
>   		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
>   			return 1;
>   		break;
> +	case MSR_IA32_U_CET:
> +	case MSR_IA32_S_CET:
> +		if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
> +		    !guest_can_use(vcpu, X86_FEATURE_SHSTK))
> +			return 1;
> +		break;
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> +	case MSR_KVM_GUEST_SSP:
> +		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK))
> +			return 1;
> +		break;
>   	}
>   
>   	msr.index = index;
> @@ -3640,47 +3685,6 @@ static bool kvm_is_msr_to_save(u32 msr_index)
>   	return false;
>   }
>   
> -static inline bool is_shadow_stack_msr(u32 msr)
> -{
> -	return msr == MSR_IA32_PL0_SSP ||
> -		msr == MSR_IA32_PL1_SSP ||
> -		msr == MSR_IA32_PL2_SSP ||
> -		msr == MSR_IA32_PL3_SSP ||
> -		msr == MSR_IA32_INT_SSP_TAB ||
> -		msr == MSR_KVM_GUEST_SSP;
> -}
> -
> -static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
> -				      struct msr_data *msr)
> -{
> -	if (is_shadow_stack_msr(msr->index)) {
> -		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> -			return false;
> -
> -		/*
> -		 * This MSR is synthesized mainly for userspace access during
> -		 * Live Migration, it also can be accessed in SMM mode by VMM.
> -		 * Guest is not allowed to access this MSR.
> -		 */
> -		if (msr->index == MSR_KVM_GUEST_SSP) {
> -			if (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu))
> -				return true;
> -
> -			return msr->host_initiated;
> -		}
> -
> -		return msr->host_initiated ||
> -			guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> -	}
> -
> -	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> -	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> -		return false;
> -
> -	return msr->host_initiated ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> -}
>   
>   int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   {
> @@ -4036,46 +4040,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		vcpu->arch.guest_fpu.xfd_err = data;
>   		break;
>   #endif
> -#define CET_EXCLUSIVE_BITS		(CET_SUPPRESS | CET_WAIT_ENDBR)
> -#define CET_CTRL_RESERVED_BITS		GENMASK(9, 6)
> -#define CET_SHSTK_MASK_BITS		GENMASK(1, 0)
> -#define CET_IBT_MASK_BITS		(GENMASK_ULL(5, 2) | \
> -					 GENMASK_ULL(63, 10))
> -#define CET_LEG_BITMAP_BASE(data)	((data) >> 12)
>   	case MSR_IA32_U_CET:
> -	case MSR_IA32_S_CET:
> -		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> -			return 1;
> -		if (!!(data & CET_CTRL_RESERVED_BITS))
> -			return 1;
> -		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> -		    (data & CET_SHSTK_MASK_BITS))
> -			return 1;
> -		if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
> -		    (data & CET_IBT_MASK_BITS))
> -			return 1;
> -		if (!IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
> -		    (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS)
> -			return 1;
> -		if (msr == MSR_IA32_U_CET)
> -			kvm_set_xsave_msr(msr_info);
> -		break;
> -	case MSR_KVM_GUEST_SSP:
> -	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> -		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> -			return 1;
> -		if (is_noncanonical_address(data, vcpu))
> -			return 1;
> -		if (!IS_ALIGNED(data, 4))
> -			return 1;
> -		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
> -		    msr == MSR_IA32_PL2_SSP) {
> -			vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
> -			if (!vcpu->arch.cet_sss_active && data)
> -				vcpu->arch.cet_sss_active = true;
> -		} else if (msr == MSR_IA32_PL3_SSP) {
> -			kvm_set_xsave_msr(msr_info);
> -		}
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		kvm_set_xsave_msr(msr_info);
>   		break;
>   	default:
>   		if (kvm_pmu_is_valid_msr(vcpu, msr))
> @@ -4436,17 +4403,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		break;
>   #endif
>   	case MSR_IA32_U_CET:
> -	case MSR_IA32_S_CET:
> -	case MSR_KVM_GUEST_SSP:
> -	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> -		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> -			return 1;
> -		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
> -		    msr == MSR_IA32_PL2_SSP) {
> -			msr_info->data = vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
> -		} else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
> -			kvm_get_xsave_msr(msr_info);
> -		}
> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> +		kvm_get_xsave_msr(msr_info);
>   		break;
>   	default:
>   		if (kvm_pmu_is_valid_msr(vcpu, msr))
> @@ -7330,9 +7288,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>   		break;
>   	case MSR_IA32_U_CET:
>   	case MSR_IA32_S_CET:
> +		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> +		    !kvm_cpu_cap_has(X86_FEATURE_IBT))
> +			return;
> +		break;
>   	case MSR_KVM_GUEST_SSP:
>   	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> -		if (!kvm_is_cet_supported())
> +		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
>   			return;
>   		break;
>   	default:
> @@ -9664,13 +9626,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>   		kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
>   	}
>   	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> -		u32 eax, ebx, ecx, edx;
> -
> -		cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx);
>   		rdmsrl(MSR_IA32_XSS, host_xss);
>   		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
> -		if (ecx & XFEATURE_MASK_CET_KERNEL)
> -			kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
>   	}
>   
>   	rdmsrl_safe(MSR_EFER, &host_efer);
>
> base-commit: efb9177acd7a4df5883b844e1ec9c69ef0899c9c
The code looks good to me except for the handling of MSR_KVM_GUEST_SSP:
non-host-initiated reads/writes should be prevented.
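e.g., on top of the diff above, a minimal (untested) sketch of such a guard in
vmx_get_msr(); the write side would need the same check:

	case MSR_KVM_GUEST_SSP:
		/* Synthesized MSR, for host/userspace (live migration) only. */
		if (!msr_info->host_initiated)
			return 1;
		msr_info->data = vmcs_readl(GUEST_SSP);
		break;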


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-04  5:51       ` Chao Gao
  2023-08-04 18:51         ` Sean Christopherson
@ 2023-08-06  8:54         ` Yang, Weijiang
  1 sibling, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-06  8:54 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/4/2023 1:51 PM, Chao Gao wrote:
> On Fri, Aug 04, 2023 at 11:13:36AM +0800, Yang, Weijiang wrote:
>>>> @@ -7214,6 +7217,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>>>> 		if (!kvm_caps.supported_xss)
>>>> 			return;
>>>> 		break;
>>>> +	case MSR_IA32_U_CET:
>>>> +	case MSR_IA32_S_CET:
>>>> +	case MSR_KVM_GUEST_SSP:
>>>> +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>>>> +		if (!kvm_is_cet_supported())
>>> shall we consider the case where IBT is supported while SS isn't
>>> (e.g., in L1 guest)?
>> Yes, but userspace should be able to access SHSTK MSRs even when only IBT is
>> exposed to the guest, as long as KVM can support the SHSTK MSRs.
> Why should userspace be allowed to access SHSTK MSRs in this case? L1 may not
> even enumerate SHSTK (qemu removes -shstk explicitly but keeps IBT), so how can
> KVM in L1 allow its userspace to do that?
Let me hold off on this until the host_initiated access handling is finalized.
>>>> +static inline bool kvm_is_cet_supported(void)
>>>> +{
>>>> +	return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;
>>> why not just check if SHSTK or IBT is supported explicitly, i.e.,
>>>
>>> 	return kvm_cpu_cap_has(X86_FEATURE_SHSTK) ||
>>> 	       kvm_cpu_cap_has(X86_FEATURE_IBT);
>>>
>>> this is straightforward. And strictly speaking, the support of a feature and
>>> the support of managing a feature's state via XSAVE(S) are two different things.
>> I think using the existing check implies two things:
>> 1. Platform/KVM can support CET features.
>> 2. CET user mode MSRs are backed by host thus are guaranteed to be valid.
>> i.e., the purpose is to check guest CET dependencies instead of features' availability.
> When KVM claims a feature is supported, it should ensure all its dependencies are
> met. That is, KVM's support of a feature also implies all dependencies are met.
> Function-wise, the two approaches have no difference. I just think checking
> KVM's support of SHSTK/IBT is more clear because the function name is
> kvm_is_cet_supported() rather than e.g., kvm_is_cet_state_managed_by_xsave().
OK, maybe the helper is not necessary anymore, I will remove it, thank you!
>> kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)
>>
>> only tells at least one of the CET features is supported by KVM.
>>
>>> then patch 16 has no need to do
>>>
>>> +	/*
>>> +	 * If SHSTK and IBT are not available in KVM, clear CET user bit in
>>> +	 * kvm_caps.supported_xss so that kvm_is_cet_supported() returns
>>> +	 * false when called.
>>> +	 */
>>> +	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>>> +	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
>>> +		kvm_caps.supported_xss &= ~CET_XSTATE_MASK;


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 12/19] KVM:x86: Save and reload SSP to/from SMRAM
  2023-08-04 15:25     ` Sean Christopherson
@ 2023-08-06  9:14       ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-06  9:14 UTC (permalink / raw)
  To: Sean Christopherson, Chao Gao
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/4/2023 11:25 PM, Sean Christopherson wrote:
> On Fri, Aug 04, 2023, Chao Gao wrote:
>> On Thu, Aug 03, 2023 at 12:27:25AM -0400, Yang Weijiang wrote:
>>> Save CET SSP to SMRAM on SMI and reload it on RSM.
>>> KVM emulates architectural behavior when guest enters/leaves SMM
>>> mode, i.e., save registers to SMRAM at the entry of SMM and reload
>>> them at the exit of SMM. Per SDM, SSP is defined as one of
>>> the fields in SMRAM for 64-bit mode, so handle the state accordingly.
>>>
>>> Check is_smm() to determine whether kvm_cet_is_msr_accessible()
>>> is called in SMM mode so that kvm_{set,get}_msr() works in SMM mode.
>>>
>>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>>> ---
>>> arch/x86/kvm/smm.c | 11 +++++++++++
>>> arch/x86/kvm/smm.h |  2 +-
>>> arch/x86/kvm/x86.c | 11 ++++++++++-
>>> 3 files changed, 22 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
>>> index b42111a24cc2..e0b62d211306 100644
>>> --- a/arch/x86/kvm/smm.c
>>> +++ b/arch/x86/kvm/smm.c
>>> @@ -309,6 +309,12 @@ void enter_smm(struct kvm_vcpu *vcpu)
>>>
>>> 	kvm_smm_changed(vcpu, true);
>>>
>>> +#ifdef CONFIG_X86_64
>>> +	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>>> +	    kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram.smram64.ssp))
>>> +		goto error;
>>> +#endif
>> SSP save/load should go to enter_smm_save_state_64() and rsm_load_state_64(),
>> where other fields of SMRAM are handled.
> +1.  The right way to get/set MSRs like this is to use __kvm_get_msr() and pass
> %true for @host_initiated.  Though I would add a prep patch to provide wrappers
> for __kvm_get_msr() and __kvm_set_msr().  Naming will be hard, but I think we
> can use kvm_{read,write}_msr() to go along with the KVM-initiated register
> accessors/mutators, e.g. kvm_register_read(), kvm_pdptr_write(), etc.
>
> Then you don't need to wait until after kvm_smm_changed(), and kvm_cet_is_msr_accessible()
> doesn't need the confusing (and broken) SMM waiver, e.g. as Chao points out below,
> that would allow the guest to access the synthetic MSR.
>
> Delta patch at the bottom (would need to be split up, rebased, etc.).
Thanks! Will change the related stuff per your suggestions!
>>> 	if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram)))
>>> 		goto error;
>>>
>>> @@ -586,6 +592,11 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
>>> 	if ((vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK) == 0)
>>> 		static_call(kvm_x86_set_nmi_mask)(vcpu, false);
>>>
>>> +#ifdef CONFIG_X86_64
>>> +	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>>> +	    kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smram.smram64.ssp))
>>> +		return X86EMUL_UNHANDLEABLE;
>>> +#endif
>>> 	kvm_smm_changed(vcpu, false);
>>>
>>> 	/*
>>> diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
>>> index a1cf2ac5bd78..1e2a3e18207f 100644
>>> --- a/arch/x86/kvm/smm.h
>>> +++ b/arch/x86/kvm/smm.h
>>> @@ -116,8 +116,8 @@ struct kvm_smram_state_64 {
>>> 	u32 smbase;
>>> 	u32 reserved4[5];
>>>
>>> -	/* ssp and svm_* fields below are not implemented by KVM */
>>> 	u64 ssp;
>>> +	/* svm_* fields below are not implemented by KVM */
>>> 	u64 svm_guest_pat;
>>> 	u64 svm_host_efer;
>>> 	u64 svm_host_cr4;
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 98f3ff6078e6..56aa5a3d3913 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -3644,8 +3644,17 @@ static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
>>> 		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
>>> 			return false;
>>>
>>> -		if (msr->index == MSR_KVM_GUEST_SSP)
>>> +		/*
>>> +		 * This MSR is synthesized mainly for userspace access during
>>> +		 * Live Migration, it also can be accessed in SMM mode by VMM.
>>> +		 * Guest is not allowed to access this MSR.
>>> +		 */
>>> +		if (msr->index == MSR_KVM_GUEST_SSP) {
>>> +			if (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu))
>>> +				return true;
>> On second thoughts, this is incorrect. We don't want guest in SMM
>> mode to read/write SSP via the synthesized MSR. Right?
> It's not a guest read though, KVM is doing the read while emulating SMI/RSM.
>
>> You can
>> 1. move set/get guest SSP into two helper functions, e.g., kvm_set/get_ssp()
>> 2. call kvm_set/get_ssp() for host-initiated MSR accesses and SMM transitions.
> We could, but that would largely defeat the purpose of kvm_x86_ops.{g,s}et_msr(),
> i.e. we already have hooks to get at MSR values that are buried in the VMCS/VMCB,
> the interface is just a bit kludgy.
>   
>> 3. refuse guest accesses to the synthesized MSR.
> ---
>   arch/x86/include/asm/kvm_host.h |  8 +++++++-
>   arch/x86/kvm/cpuid.c            |  2 +-
>   arch/x86/kvm/smm.c              | 10 ++++------
>   arch/x86/kvm/x86.c              | 17 +++++++++++++----
>   4 files changed, 25 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f883696723f4..fe8484bc8082 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1939,7 +1939,13 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
>   
>   void kvm_enable_efer_bits(u64);
>   bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
> -int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiated);
> +
> +/*
> + * kvm_msr_{read,write}() are KVM-internal helpers, i.e. for when KVM needs to
> + * get/set an MSR value when emulating CPU behavior.
> + */
> +int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
> +int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
>   int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data);
>   int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data);
>   int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 1a601be7b4fa..b595645b2af7 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1515,7 +1515,7 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>   		*edx = entry->edx;
>   		if (function == 7 && index == 0) {
>   			u64 data;
> -		        if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> +		        if (!kvm_msr_read(vcpu, MSR_IA32_TSX_CTRL, &data) &&
>   			    (data & TSX_CTRL_CPUID_CLEAR))
>   				*ebx &= ~(F(RTM) | F(HLE));
>   		} else if (function == 0x80000007) {
> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
> index e0b62d211306..8db12831877e 100644
> --- a/arch/x86/kvm/smm.c
> +++ b/arch/x86/kvm/smm.c
> @@ -275,6 +275,10 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
>   	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
>   
>   	smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
> +
> +	if (guest_can_use(vcpu, X86_FEATURE_SHSTK))
> +		KVM_BUG_ON(kvm_msr_read(vcpu, MSR_KVM_GUEST_SSP,
> +					&smram->ssp), vcpu->kvm);
>   }
>   #endif
>   
> @@ -309,12 +313,6 @@ void enter_smm(struct kvm_vcpu *vcpu)
>   
>   	kvm_smm_changed(vcpu, true);
>   
> -#ifdef CONFIG_X86_64
> -	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> -	    kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram.smram64.ssp))
> -		goto error;
> -#endif
> -
>   	if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram)))
>   		goto error;
>   
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2e200a5d00e9..872767b7bf51 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1924,8 +1924,8 @@ static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
>    * Returns 0 on success, non-0 otherwise.
>    * Assumes vcpu_load() was already called.
>    */
> -int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
> -		  bool host_initiated)
> +static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
> +			 bool host_initiated)
>   {
>   	struct msr_data msr;
>   	int ret;
> @@ -1951,6 +1951,16 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
>   	return ret;
>   }
>   
> +int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
> +{
> +	return __kvm_set_msr(vcpu, index, data, true);
> +}
> +
> +int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
> +{
> +	return __kvm_get_msr(vcpu, index, data, true);
> +}
> +
>   static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
>   				     u32 index, u64 *data, bool host_initiated)
>   {
> @@ -4433,8 +4443,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   			return 1;
>   		if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
>   		    msr == MSR_IA32_PL2_SSP) {
> -			msr_info->data =
> -				vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
> +			msr_info->data = vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
>   		} else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
>   			kvm_get_xsave_msr(msr_info);
>   		}
>
> base-commit: 82e95ab0094bf1b823a6f9c9a07238852b375a22
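For completeness, I assume the RSM-side counterpart in rsm_load_state_64() would
mirror the save side, e.g. (a sketch only; it assumes the 64-bit SMRAM state is
passed in as smram and reuses the kvm_msr_write() wrapper from the delta patch):

	if (guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
	    kvm_msr_write(vcpu, MSR_KVM_GUEST_SSP, smram->ssp))
		return X86EMUL_UNHANDLEABLE;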


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 13/19] KVM:VMX: Set up interception for CET MSRs
  2023-08-04  8:16   ` Chao Gao
@ 2023-08-06  9:22     ` Yang, Weijiang
  2023-08-07  1:16       ` Chao Gao
  0 siblings, 1 reply; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-06  9:22 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/4/2023 4:16 PM, Chao Gao wrote:
> On Thu, Aug 03, 2023 at 12:27:26AM -0400, Yang Weijiang wrote:
>> Pass through CET MSRs when the associated feature is enabled.
>> The Shadow Stack feature requires all the CET MSRs to be passed
>> through to provide architectural support in the guest. The IBT
>> feature only depends on MSR_IA32_U_CET and MSR_IA32_S_CET to
>> enable both user and supervisor IBT. Note, this MSR design
>> introduces an architectural limitation on SHSTK and IBT control
>> for the guest, i.e., when SHSTK is exposed, IBT is also available
>> to the guest at the architectural level since IBT relies on a
>> subset of the SHSTK-relevant MSRs.
>>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
>
> one nit below
Thanks!
> [...]
>> +
>> +	if (kvm_cpu_cap_has(X86_FEATURE_IBT)) {
>> +		incpt = !guest_can_use(vcpu, X86_FEATURE_IBT);
> can you use guest_can_use() or guest_cpuid_has() consistently?
Hmm, the inspiration actually came from Sean:
Re: [RFC PATCH v2 3/6] KVM: x86: SVM: Pass through shadow stack MSRs - Sean Christopherson (kernel.org) <https://lore.kernel.org/all/ZMk14YiPw9l7ZTXP@google.com/>
it would make the code more reasonable on non-CET platforms.
>> +
>> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
>> +					  MSR_TYPE_RW, incpt);
>> +		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
>> +					  MSR_TYPE_RW, incpt);
>> +	}
>> +}
>> +
>> static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>> {
>> 	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> @@ -7814,6 +7853,8 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>
>> 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
>> 	vmx_update_exception_bitmap(vcpu);
>> +
>> +	vmx_update_intercept_for_cet_msr(vcpu);
>> }
>>
>> static u64 vmx_get_perf_capabilities(void)
>> -- 
>> 2.27.0
>>


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 13/19] KVM:VMX: Set up interception for CET MSRs
  2023-08-06  9:22     ` Yang, Weijiang
@ 2023-08-07  1:16       ` Chao Gao
  2023-08-09  6:11         ` Yang, Weijiang
  0 siblings, 1 reply; 82+ messages in thread
From: Chao Gao @ 2023-08-07  1:16 UTC (permalink / raw)
  To: Yang, Weijiang
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

>> > +	if (kvm_cpu_cap_has(X86_FEATURE_IBT)) {
>> > +		incpt = !guest_can_use(vcpu, X86_FEATURE_IBT);
>> can you use guest_can_use() or guest_cpuid_has() consistently?
>Hmm, the inspiration actually came from Sean:
>Re: [RFC PATCH v2 3/6] KVM: x86: SVM: Pass through shadow stack MSRs - Sean Christopherson (kernel.org) <https://lore.kernel.org/all/ZMk14YiPw9l7ZTXP@google.com/>
>it would make the code more reasonable on non-CET platforms.

Then, can you switch to using guest_cpuid_has() for IBT here, as you do a few
lines above for SHSTK? That's why I said "consistently".
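i.e., something like this (untested) sketch, mirroring the SHSTK hunk:

	if (kvm_cpu_cap_has(X86_FEATURE_IBT)) {
		incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);

		vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
					  MSR_TYPE_RW, incpt);
		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
					  MSR_TYPE_RW, incpt);
	}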

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-08-04 18:27   ` Sean Christopherson
@ 2023-08-07  6:55     ` Paolo Bonzini
  2023-08-09  8:56     ` Yang, Weijiang
  1 sibling, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-07  6:55 UTC (permalink / raw)
  To: Sean Christopherson, Yang Weijiang
  Cc: peterz, john.allen, kvm, linux-kernel, rick.p.edgecombe,
	chao.gao, binbin.wu, Zhang Yi Z

On 8/4/23 20:27, Sean Christopherson wrote:
> I think my preference is to enforce guest CPUID for host accesses to 
> XSS, XFD, XFD_ERR, etc.  I'm pretty sure I've advocated for the exact
> opposite in the past, i.e. argued that KVM's ABI is to not enforce 
> ordering between KVM_SET_CPUID2 and KVM_SET_MSR. But this is becoming
> untenable, juggling the dependencies in KVM is complex and is going
> to result in a nasty bug at some point.

Fortunately, you are right now.  Well, almost :) but the important part 
is that indeed the dependencies are too complex.

While host-side accesses must be allowed, they should only allow the 
default value if the CPUID bit is not set.
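In code, that policy might look roughly like this for the MSR_IA32_XSS write
path (an illustrative sketch, not the final rule):

	case MSR_IA32_XSS:
		/*
		 * Host-initiated accesses are always allowed, but only the
		 * default value (0) may be written if XSAVES isn't
		 * enumerated to the guest.
		 */
		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES) &&
		    !(msr_info->host_initiated && data == 0))
			return 1;
		break;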

Paolo


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-06  8:44       ` Yang, Weijiang
@ 2023-08-07  7:00         ` Paolo Bonzini
  0 siblings, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-07  7:00 UTC (permalink / raw)
  To: Yang, Weijiang, Sean Christopherson, Chao Gao
  Cc: peterz, john.allen, kvm, linux-kernel, rick.p.edgecombe, binbin.wu

On 8/6/23 10:44, Yang, Weijiang wrote:
>> Similar to my suggestsion for XSS, I think we drop the waiver for 
>> host_initiated
>> accesses, i.e. require the feature to be enabled and exposed to the 
>> guest, even
>> for the host.
>
> I saw Paolo shares different opinion on this, so would hold on for a 
> while...

It's not *so* different: the host initiated access should be allowed, 
but it should only allow writing zero.  So, something like:

> +static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
> +                      struct msr_data *msr)
> +{

bool host_msr_reset =
	msr->host_initiated && msr->data == 0;

and then below you use host_msr_reset instead of msr->host_initiated.

> +        if (msr->index == MSR_KVM_GUEST_SSP)
> +            return msr->host_initiated;
> +
> +        return msr->host_initiated ||
> +            guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);

This can be unified like this:

return
	(host_msr_reset || guest_cpuid_has(vcpu, X86_FEATURE_SHSTK)) &&
	(msr->index != MSR_KVM_GUEST_SSP || msr->host_initiated);

> +    }
> +
> +    if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> +        !kvm_cpu_cap_has(X86_FEATURE_IBT))
> +        return false;
> +
> +    return msr->host_initiated ||
> +        guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> +        guest_cpuid_has(vcpu, X86_FEATURE_SHSTK); 

while this can simply use host_msr_reset.
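Putting the fragments together, the whole helper would then read roughly as
follows (untested):

static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
				      struct msr_data *msr)
{
	bool host_msr_reset = msr->host_initiated && msr->data == 0;

	if (is_shadow_stack_msr(msr->index)) {
		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
			return false;

		return (host_msr_reset ||
			guest_cpuid_has(vcpu, X86_FEATURE_SHSTK)) &&
		       (msr->index != MSR_KVM_GUEST_SSP ||
			msr->host_initiated);
	}

	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
		return false;

	return host_msr_reset ||
	       guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
	       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
}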

Paolo


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-04 22:21         ` Sean Christopherson
@ 2023-08-07  7:03           ` Paolo Bonzini
  0 siblings, 0 replies; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-07  7:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Chao Gao, Yang Weijiang, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/5/23 00:21, Sean Christopherson wrote:
> Oooh, the MSRs that don't exempt host_initiated are added to the list

(are *not* added)

> of MSRs to save/restore, i.e. KVM "silently" supports 
> MSR_AMD64_OSVW_ID_LENGTH and MSR_AMD64_OSVW_STATUS.
> 
> And guest_pv_has() returns true unless userspace has opted in to
> enforcement.

Two different ways of having the same bug.  The latter was introduced in 
the implementation of KVM_CAP_ENFORCE_PV_FEATURE_CPUID; it would become 
a problem if some selftests started using it.

Paolo


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-08-04 16:02   ` Sean Christopherson
  2023-08-04 21:43     ` Paolo Bonzini
@ 2023-08-08 14:20     ` Yang, Weijiang
  1 sibling, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-08 14:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu, Zhang Yi Z

On 8/5/2023 12:02 AM, Sean Christopherson wrote:
> On Thu, Aug 03, 2023, Yang Weijiang wrote:
>> Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified.
>> CPUID(EAX=0DH,ECX=1).EBX reports required storage size of
>> all enabled xstate features in XCR0 | XSS. Guest can allocate
>> sufficient xsave buffer based on the info.
> Please wrap changelogs closer to ~75 chars.  I'm pretty sure this isn't the first
> time I've made this request...
Thanks, will keep the changelog lines within 70~75 chars.
>> Note, KVM does not yet support any XSS based features, i.e.
>> supported_xss is guaranteed to be zero at this time.
>>
>> Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
>> Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  1 +
>>   arch/x86/kvm/cpuid.c            | 20 ++++++++++++++++++--
>>   arch/x86/kvm/x86.c              |  8 +++++---
>>   3 files changed, 24 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 28bd38303d70..20bbcd95511f 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -804,6 +804,7 @@ struct kvm_vcpu_arch {
>>   
>>   	u64 xcr0;
>>   	u64 guest_supported_xcr0;
>> +	u64 guest_supported_xss;
>>   
>>   	struct kvm_pio_request pio;
>>   	void *pio_data;
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index 7f4d13383cf2..0338316b827c 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -249,6 +249,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
>>   	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
>>   }
>>   
>> +static u64 cpuid_get_supported_xss(struct kvm_cpuid_entry2 *entries, int nent)
>> +{
>> +	struct kvm_cpuid_entry2 *best;
>> +
>> +	best = cpuid_entry2_find(entries, nent, 0xd, 1);
>> +	if (!best)
>> +		return 0;
>> +
>> +	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
>> +}
>> +
>>   static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
>>   				       int nent)
>>   {
>> @@ -276,8 +287,11 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
>>   
>>   	best = cpuid_entry2_find(entries, nent, 0xD, 1);
>>   	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
>> -		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
>> -		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
>> +		     cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {
>> +		u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;
> Nit, the variable should be xfeatures, not xstate.  Though I vote to avoid the
> variable entirely,
>
> 	best = cpuid_entry2_find(entries, nent, 0xD, 1);
> 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
> 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
> 		best->ebx = xstate_required_size(vcpu->arch.xcr0 |
> 						 vcpu->arch.ia32_xss, true);
>
> though it's only a slight preference, i.e. feel free to keep your approach if
> you or others feel strongly about the style.
Yes, the variable is not necessary, will remove it.
>> +	}
>>   
>>   	best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
>>   	if (kvm_hlt_in_guest(vcpu->kvm) && best &&
>> @@ -325,6 +339,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>   
>>   	vcpu->arch.guest_supported_xcr0 =
>>   		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
>> +	vcpu->arch.guest_supported_xss =
>> +		cpuid_get_supported_xss(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
> Blech.  I tried to clean up this ugliness, but Paolo disagreed[*].  Can you fold in
> the below (compile tested only) patch at the very beginning of this series?  It
> implements my suggested alternative.  And then this would become:
>
> static u64 vcpu_get_supported_xss(struct kvm_vcpu *vcpu)
> {
> 	struct kvm_cpuid_entry2 *best;
>
> 	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1);
> 	if (!best)
> 		return 0;
>
> 	return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
> }
>
> [*] https://lore.kernel.org/all/ZGfius5UkckpUyXl@google.com
Sure, will take it into my series, thanks!
>>   	/*
>>   	 * FP+SSE can always be saved/restored via KVM_{G,S}ET_XSAVE, even if
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 0b9033551d8c..5d6d6fa33e5b 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3780,10 +3780,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
>>   		 * XSAVES/XRSTORS to save/restore PT MSRs.
>>   		 */
>> -		if (data & ~kvm_caps.supported_xss)
>> +		if (data & ~vcpu->arch.guest_supported_xss)
>>   			return 1;
>> -		vcpu->arch.ia32_xss = data;
>> -		kvm_update_cpuid_runtime(vcpu);
>> +		if (vcpu->arch.ia32_xss != data) {
>> +			vcpu->arch.ia32_xss = data;
>> +			kvm_update_cpuid_runtime(vcpu);
>> +		}
> Nit, I prefer this style:
>
> 		if (vcpu->arch.ia32_xss == data)
> 			break;
>
> 		vcpu->arch.ia32_xss = data;
> 		kvm_update_cpuid_runtime(vcpu);
>
> so that the common path isn't buried in an if-statement.
Yeah, I admit I'm still a bit awkward at making code look nicer :-)
>>   		break;
>>   	case MSR_SMI_COUNT:
>>   		if (!msr_info->host_initiated)
>> -- 
>
> From: Sean Christopherson <seanjc@google.com>
> Date: Fri, 4 Aug 2023 08:48:03 -0700
> Subject: [PATCH] KVM: x86: Rework cpuid_get_supported_xcr0() to operate on
>   vCPU data
>
> Rework and rename cpuid_get_supported_xcr0() to explicitly operate on vCPU
> state, i.e. on a vCPU's CPUID state.  Prior to commit 275a87244ec8 ("KVM:
> x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)"), KVM
> incorrectly fudged guest CPUID at runtime, which in turn necessitated
> massaging the incoming CPUID state for KVM_SET_CPUID{2} so as not to run
> afoul of kvm_cpuid_check_equal().
>
> Opportunistically move the helper below kvm_update_cpuid_runtime() to make
> it harder to repeat the mistake of querying supported XCR0 for runtime
> updates.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/cpuid.c | 33 ++++++++++++++++-----------------
>   1 file changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 7f4d13383cf2..5e42846c948a 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -234,21 +234,6 @@ void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
>   		vcpu->arch.pv_cpuid.features = best->eax;
>   }
>   
> -/*
> - * Calculate guest's supported XCR0 taking into account guest CPUID data and
> - * KVM's supported XCR0 (comprised of host's XCR0 and KVM_SUPPORTED_XCR0).
> - */
> -static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
> -{
> -	struct kvm_cpuid_entry2 *best;
> -
> -	best = cpuid_entry2_find(entries, nent, 0xd, 0);
> -	if (!best)
> -		return 0;
> -
> -	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
> -}
> -
>   static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
>   				       int nent)
>   {
> @@ -299,6 +284,21 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>   }
>   EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
>   
> +/*
> + * Calculate guest's supported XCR0 taking into account guest CPUID data and
> + * KVM's supported XCR0 (comprised of host's XCR0 and KVM_SUPPORTED_XCR0).
> + */
> +static u64 vcpu_get_supported_xcr0(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpuid_entry2 *best;
> +
> +	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 0);
> +	if (!best)
> +		return 0;
> +
> +	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
> +}
> +
>   static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
>   {
>   	struct kvm_cpuid_entry2 *entry;
> @@ -323,8 +323,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   		kvm_apic_set_version(vcpu);
>   	}
>   
> -	vcpu->arch.guest_supported_xcr0 =
> -		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
> +	vcpu->arch.guest_supported_xcr0 = vcpu_get_supported_xcr0(vcpu);
>   
>   	/*
>   	 * FP+SSE can always be saved/restored via KVM_{G,S}ET_XSAVE, even if
>
> base-commit: f0147fcfab840fe9a3f03e9645d25c1326373fe6


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 05/19] KVM:x86: Initialize kvm_caps.supported_xss
  2023-08-04 18:45   ` Sean Christopherson
@ 2023-08-08 15:08     ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-08 15:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu

On 8/5/2023 2:45 AM, Sean Christopherson wrote:
> On Thu, Aug 03, 2023, Yang Weijiang wrote:
>> Set kvm_caps.supported_xss to host_xss && KVM XSS mask.
>> host_xss contains the host supported xstate feature bits for thread
>> context switch, KVM_SUPPORTED_XSS includes all KVM enabled XSS feature
>> bits, the operation result represents all KVM supported feature bits.
>> Since the result is subset of host_xss, the related XSAVE-managed MSRs
>> are automatically swapped for guest and host when vCPU exits to
>> userspace.
>>
>> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
>> ---
>>   arch/x86/kvm/vmx/vmx.c | 1 -
>>   arch/x86/kvm/x86.c     | 6 +++++-
>>   2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 0ecf4be2c6af..c8d9870cfecb 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -7849,7 +7849,6 @@ static __init void vmx_set_cpu_caps(void)
>>   		kvm_cpu_cap_set(X86_FEATURE_UMIP);
>>   
>>   	/* CPUID 0xD.1 */
>> -	kvm_caps.supported_xss = 0;
> Dropping this code in *this* patch is wrong, this belong in whatever patch(es) adds
> IBT and SHSTK support in VMX.
>
> And that does matter because it means this common patch can be carried wih SVM
> support without breaking VMX.
OK, I'll drop this line for VMX/SVM in the CET feature bits enabling patch.
>>   	if (!cpu_has_vmx_xsaves())
>>   		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
>>   
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 5d6d6fa33e5b..e9f3627d5fdd 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -225,6 +225,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
>>   				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
>>   				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
>>   
>> +#define KVM_SUPPORTED_XSS     0
>> +
>>   u64 __read_mostly host_efer;
>>   EXPORT_SYMBOL_GPL(host_efer);
>>   
>> @@ -9498,8 +9500,10 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>>   
>>   	rdmsrl_safe(MSR_EFER, &host_efer);
>>   
>> -	if (boot_cpu_has(X86_FEATURE_XSAVES))
>> +	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
>>   		rdmsrl(MSR_IA32_XSS, host_xss);
>> +		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
>> +	}
> Can you opportunistically (in this patch) hoist this above EFER so that XCR0 and
> XSS are colocated?  I.e. end up with this:
>
> 	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
> 		host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
> 		kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
> 	}
> 	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> 		rdmsrl(MSR_IA32_XSS, host_xss);
> 		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
> 	}
>
> 	rdmsrl_safe(MSR_EFER, &host_efer);
Will change it, thanks!


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-04 18:51         ` Sean Christopherson
  2023-08-04 22:01           ` Paolo Bonzini
@ 2023-08-08 15:16           ` Yang, Weijiang
  1 sibling, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-08 15:16 UTC (permalink / raw)
  To: Sean Christopherson, Chao Gao
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/5/2023 2:51 AM, Sean Christopherson wrote:
> On Fri, Aug 04, 2023, Chao Gao wrote:
>> On Fri, Aug 04, 2023 at 11:13:36AM +0800, Yang, Weijiang wrote:
>>>>> @@ -7214,6 +7217,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>>>>> 		if (!kvm_caps.supported_xss)
>>>>> 			return;
>>>>> 		break;
>>>>> +	case MSR_IA32_U_CET:
>>>>> +	case MSR_IA32_S_CET:
>>>>> +	case MSR_KVM_GUEST_SSP:
>>>>> +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>>>>> +		if (!kvm_is_cet_supported())
>>>> shall we consider the case where IBT is supported while SS isn't
>>>> (e.g., in L1 guest)?
>>> Yes, but userspace should be able to access SHSTK MSRs even when only IBT is
>>> exposed to the guest, as long as KVM can support the SHSTK MSRs.
>> Why should userspace be allowed to access SHSTK MSRs in this case? L1 may not
>> even enumerate SHSTK (qemu removes -shstk explicitly but keeps IBT), so how can
>> KVM in L1 allow its userspace to do that?
> +1.  And specifically, this isn't about SHSTK being exposed to the guest, it's about
> SHSTK being _supported by KVM_.  This is all about KVM telling userspace what MSRs
> are valid and/or need to be saved+restored.  If KVM doesn't support a feature,
> then the MSRs are invalid and there is no reason for userspace to save+restore
> the MSRs on live migration.
OK, will use kvm_cpu_cap_has() to check KVM support before adding the CET MSRs to the lists.
>>>>> +static inline bool kvm_is_cet_supported(void)
>>>>> +{
>>>>> +	return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;
>>>> why not just check if SHSTK or IBT is supported explicitly, i.e.,
>>>>
>>>> 	return kvm_cpu_cap_has(X86_FEATURE_SHSTK) ||
>>>> 	       kvm_cpu_cap_has(X86_FEATURE_IBT);
>>>>
>>>> this is straightforward. And strictly speaking, the support of a feature and
>>>> the support of managing a feature's state via XSAVE(S) are two different things.
>>> I think using the existing check implies two things:
>>> 1. Platform/KVM can support CET features.
>>> 2. CET user mode MSRs are backed by host thus are guaranteed to be valid.
>>> i.e., the purpose is to check guest CET dependencies instead of features' availability.
>> When KVM claims a feature is supported, it should ensure all its dependencies are
>> met. That is, KVM's support of a feature also implies all dependencies are met.
>> Function-wise, the two approaches have no difference. I just think checking
>> KVM's support of SHSTK/IBT is more clear because the function name is
>> kvm_is_cet_supported() rather than e.g., kvm_is_cet_state_managed_by_xsave().
> +1, one of the big reasons kvm_cpu_cap_has() came about was being KVM had a giant
> mess of one-off helpers.
I see, thanks!

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-04 18:55   ` Sean Christopherson
@ 2023-08-08 15:26     ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-08 15:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu

On 8/5/2023 2:55 AM, Sean Christopherson wrote:
> On Thu, Aug 03, 2023, Yang Weijiang wrote:
>> Add all CET MSRs including the synthesized GUEST_SSP to report list.
>> PL{0,1,2}_SSP are made independent of host XSAVE management by later
>> patches. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are XSAVE-managed on
>> host side. MSR_IA32_S_CET/MSR_IA32_INT_SSP_TAB/MSR_KVM_GUEST_SSP
>> are not XSAVE-managed.
>>
>> [...]
>>   	}
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index 82e3dafc5453..6e6292915f8c 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -362,6 +362,16 @@ static inline bool kvm_mpx_supported(void)
>>   		== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
>>   }
>>   
>> +#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
> This is funky.  As of this patch, KVM reports MSR_IA32_S_CET, a supervisor MSR,
> but does not require XFEATURE_MASK_CET_KERNEL.  That eventually comes along with
> "KVM:x86: Enable guest CET supervisor xstate bit support", but as of this patch
> KVM is busted.
>
> The whole cpuid_count() code in that patch shouldn't exist, so the easiest thing
> is to just fold the KVM_SUPPORTED_XSS and CET_XSTATE_MASK changes from that patch
> into this one.
I screwed it up when I tried to make it clearer :-/
Will do it, thanks!
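i.e., after folding, my understanding is that the masks in this patch would
cover both user and supervisor state (a sketch):

#define KVM_SUPPORTED_XSS	(XFEATURE_MASK_CET_USER | \
				 XFEATURE_MASK_CET_KERNEL)

#define CET_XSTATE_MASK		(XFEATURE_MASK_CET_USER | \
				 XFEATURE_MASK_CET_KERNEL)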

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-04 20:45       ` Sean Christopherson
  2023-08-04 20:59         ` Peter Zijlstra
  2023-08-04 21:32         ` Paolo Bonzini
@ 2023-08-09  2:39         ` Yang, Weijiang
  2023-08-10  9:29         ` Yang, Weijiang
  3 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  2:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Chao Gao, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/5/2023 4:45 AM, Sean Christopherson wrote:
> On Fri, Aug 04, 2023, Weijiang Yang wrote:
>> On 8/3/2023 7:15 PM, Chao Gao wrote:
>>> On Thu, Aug 03, 2023 at 12:27:22AM -0400, Yang Weijiang wrote:
>>>> +void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> Drop the unlikely, KVM should not speculate on the guest configuration or underlying
> hardware.
OK.
>>>> +		rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>>>> +		rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>>>> +		rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
>>>> +		/*
>>>> +		 * Omit reset to host PL{1,2}_SSP because Linux will never use
>>>> +		 * these MSRs.
>>>> +		 */
>>>> +		wrmsrl(MSR_IA32_PL0_SSP, 0);
>>> This wrmsrl() can be dropped because host doesn't support SSS yet.
>> Frankly speaking, I want to remove this line of code. But that would mess up the
>> MSRs on the host side, i.e., from the host's perspective, the MSRs could be
>> filled with garbage data, which looks awful.
> So?  :-)
>
> That's the case for all of the MSRs that KVM defers restoring until the host
> returns to userspace, i.e. running in the host with bogus values in hardware is
> nothing new.
CET PL{0,1,2}_SSP are a bit different from other MSRs: the latter are reloaded with host values
at some point after VM-Exit, but the CET MSRs are "leaked" and never handled anywhere.
>
> And as I mentioned in the other thread regarding the assertion that SSS isn't
> enabled in the host, sanitizing hardware values for something that should never
> be consumed is a fools errand.
>
>> Anyway, I can remove it.
> Yes please, though it may be a moot point.
>
>>>> +	}
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(save_cet_supervisor_ssp);
>>>> +
>>>> +void reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
>>> ditto
>> Below is to reload guest supervisor SSPs instead of resetting host ones.
>>>> +		wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>>>> +		wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>>>> +		wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
> Pulling back in the justification from v3:
>
>   the Pros:
>    - Super easy to implement for KVM.
>    - Automatically avoids saving and restoring this data when the vmexit
>      is handled within KVM.
>
>   the Cons:
>    - Unnecessarily restores XFEATURE_CET_KERNEL when switching to
>      non-KVM task's userspace.
>    - Forces allocating space for this state on all tasks, whether or not
>      they use KVM, and with likely zero users today and the near future.
>    - Complicates the FPU optimization thinking by including things that
>      can have no effect on userspace in the FPU
>
> IMO the pros far outweigh the cons.  3x RDMSR and 3x WRMSR when loading host/guest
> state is non-trivial overhead.  That can be mitigated, e.g. by utilizing the
> user return MSR framework, but it's still unpalatable.  It's unlikely many guests
> will SSS in the *near* future, but I don't want to end up with code that performs
> poorly in the future and needs to be rewritten.
> Especially because another big negative is that not utilizing XSTATE bleeds into
> KVM's ABI.  Userspace has to be told to manually save+restore MSRs instead of just
> letting KVM_{G,S}ET_XSAVE handle the state.  And that will create a bit of a
> snafu if Linux does gain support for SSS.
>
> On the other hand, the extra per-task memory is all of 24 bytes.  AFAICT, there's
> literally zero effect on guest XSTATE allocations because those are vmalloc'd and
> thus rounded up to PAGE_SIZE, i.e. the next 4KiB.  And XSTATE needs to be 64-byte
> aligned, so the 24 bytes is only actually meaningful if the current size is within
> 24 bytes of the next cache line.  And the "current" size is variable depending on
> which features are present and enabled, i.e. it's a roll of the dice as to whether
> or not using XSTATE for supervisor CET would actually increase memory usage.  And
> _if_ it does increase memory consumption, I have a very hard time believing an
> extra 64 bytes in the worst case scenario is a dealbreaker.
>
> If the performance is a concern, i.e. we don't want to eat saving/restoring the
> MSRs when switching to/from host FPU context, then I *think* that's simply a matter
> of keeping guest state resident when loading non-guest FPU state.
>
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 1015af1ae562..8e7599e3b923 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -167,6 +167,16 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
>                   */
>                  xfd_update_state(fpstate);
>   
> +               /*
> +                * Leave supervisor CET state as-is when loading host state
> +                * (kernel or userspace).  Supervisor CET state is managed via
> +                * XSTATE for KVM guests, but the host never consumes said
> +                * state (doesn't support supervisor shadow stacks), i.e. it's
> +                * safe to keep guest state loaded into hardware.
> +                */
> +               if (!fpstate->is_guest)
> +                       mask &= ~XFEATURE_MASK_CET_KERNEL;
> +
>                  /*
>                   * Restoring state always needs to modify all features
>                   * which are in @mask even if the current task cannot use
>
>
> So unless I'm missing something, NAK to this approach, at least not without trying
> the kernel FPU approach, i.e. I want someone like PeterZ or tglx to actually
> full on NAK the kernel approach before we consider shoving a hack into KVM.
I will discuss it with the stakeholders, and get back to this when it's clear. Thanks!
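For my own reference, the user-return MSR mitigation mentioned above would look
roughly like the sketch below (slot bookkeeping and error handling omitted;
pl0_ssp_slot is an illustrative variable, and PL1/PL2 would follow the same
pattern):

	/* At vendor module init: register the supervisor SSP MSR. */
	kvm_add_user_return_msr(MSR_IA32_PL0_SSP);

	/*
	 * When loading guest state: write the guest value and let the
	 * user-return framework lazily restore the host value on the
	 * next return to userspace.
	 */
	kvm_set_user_return_msr(pl0_ssp_slot, vcpu->arch.cet_s_ssp[0], -1ull);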

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-04 21:32         ` Paolo Bonzini
@ 2023-08-09  2:51           ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  2:51 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: Chao Gao, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/5/2023 5:32 AM, Paolo Bonzini wrote:
> On 8/4/23 22:45, Sean Christopherson wrote:
>>>>> +void save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +    if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
>> Drop the unlikely, KVM should not speculate on the guest configuration or underlying
>> hardware.
>
> In general unlikely() can still be a good idea if you have a fast path vs. a slow path; the extra cost of a branch will be much more visible on the fast path.  That said the compiler should already be doing that.
This was my original assumption, i.e., that the compiler can do some level of optimization with the modifier. Thanks!
>>  the Pros:
>>   - Super easy to implement for KVM.
>>   - Automatically avoids saving and restoring this data when the vmexit
>>     is handled within KVM.
>>
>>  the Cons:
>>   - Unnecessarily restores XFEATURE_CET_KERNEL when switching to
>>     non-KVM task's userspace.
>>   - Forces allocating space for this state on all tasks, whether or not
>>     they use KVM, and with likely zero users today and the near future.
>>   - Complicates the FPU optimization thinking by including things that
>>     can have no affect on userspace in the FPU
>
> I'm not sure if Linux will ever use XFEATURE_CET_KERNEL.  Linux does not use MSR_IA32_PL{1,2}_SSP; MSR_IA32_PL0_SSP probably would be per-CPU but it is not used while in ring 0 (except for SETSSBSY) and the restore can be delayed until return to userspace.  It is not unlike the SYSCALL MSRs.
>
> So I would treat the bit similar to the dynamic features even if it's not guarded by XFD, for example
>
> #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA
> #define XFEATURE_MASK_USER_OPTIONAL \
>     (XFEATURE_MASK_DYNAMIC | XFEATURE_MASK_CET_KERNEL)
>
> where XFEATURE_MASK_USER_DYNAMIC is used for xfd-related tasks but everything else uses XFEATURE_MASK_USER_OPTIONAL.
>
> Then you'd enable the feature by hand when allocating the guest fpstate.
Yes, this is another way to optimize the kernel-managed solution; I'll investigate it, thanks!
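A sketch of that direction as I understand it, using a hypothetical helper
modeled on fpu_enable_guest_xfd_features():

	/*
	 * Hypothetical helper: when SHSTK is exposed to the guest, expand
	 * the guest fpstate to carry the optional kernel CET state,
	 * analogous to how dynamic user features are enabled via XFD.
	 */
	if (guest_can_use(vcpu, X86_FEATURE_SHSTK))
		r = fpu_enable_guest_cet_state(&vcpu->arch.guest_fpu,
					       XFEATURE_MASK_CET_KERNEL);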
>> Especially because another big negative is that not utilizing XSTATE bleeds into
>> KVM's ABI.  Userspace has to be told to manually save+restore MSRs instead of just
>> letting KVM_{G,S}ET_XSAVE handle the state.  And that will create a bit of a
>> snafu if Linux does gain support for SSS.
>
> I don't think this matters, we don't have any MSRs in KVM_GET/SET_XSAVE and in fact we can't even add them since the uABI uses the non-compacted format.  MSRs should be retrieved and set via KVM_GET/SET_MSR and userspace will learn about the index automatically via KVM_GET_MSR_INDEX_LIST.
> Paolo
>


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM
  2023-08-04 21:38   ` Sean Christopherson
@ 2023-08-09  3:00     ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  3:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu

On 8/5/2023 5:38 AM, Sean Christopherson wrote:
> This is not "refinement", this is full on supporting a new nVMX feature.  Please
> phrase the shortlog accordingly, e.g. something like this (it's not very good,
> but it's a start).
>
>    KVM: nVMX: Add support for exposing "No PM H/W error code checks" to L1
>
> Regarding shortlog, please update all of them in this series to put a space after
> the colon, i.e. "KVM: VMX:" and "KVM: x86:", not "KVM:x86:".
OK, will update this part.
>>   static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
>> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
>> index 96952263b029..1884628294e4 100644
>> --- a/arch/x86/kvm/vmx/nested.h
>> +++ b/arch/x86/kvm/vmx/nested.h
>> @@ -284,6 +284,13 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
>>   	       __kvm_is_valid_cr4(vcpu, val);
>>   }
>>   
>> +static inline bool nested_cpu_has_no_hw_errcode(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>> +	return vmx->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE;
> The "CC" part of my suggestion is critical to this being sane.  As is, this reads
> "nested CPU has no hardware error code", which is not even remotely close to the
> truth.
Understood, I wasn't aware of the significance of the "CC" part.
> static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu)
> {
> 	return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
> }
>
> [*] https://lore.kernel.org/all/ZJ7vyBw1nbTBOfuf@google.com


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-04 21:40   ` Paolo Bonzini
@ 2023-08-09  3:05     ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  3:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, seanjc, peterz,
	john.allen, kvm, linux-kernel

On 8/5/2023 5:40 AM, Paolo Bonzini wrote:
> On 8/3/23 06:27, Yang Weijiang wrote:
>> +        if (msr_info->index == MSR_KVM_GUEST_SSP)
>> +            msr_info->data = vmcs_readl(GUEST_SSP);
>
> Accesses to MSR_KVM_(GUEST_)SSP must be rejected unless host-initiated.
Yes, it's kept; in v5 it's folded into:

+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr)
+{
+	if (is_shadow_stack_msr(msr->index)) {
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+			return false;
+
+		if (msr->index == MSR_KVM_GUEST_SSP)
+			return msr->host_initiated;
+
+		return msr->host_initiated ||
+		       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+	}
+
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		return false;
+
+	return msr->host_initiated ||
+	       guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
+	       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+}

> Paolo
>



* Re: [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-08-04 21:43     ` Paolo Bonzini
@ 2023-08-09  3:11       ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  3:11 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: peterz, john.allen, kvm, linux-kernel, rick.p.edgecombe,
	chao.gao, binbin.wu

On 8/5/2023 5:43 AM, Paolo Bonzini wrote:
> On 8/4/23 18:02, Sean Christopherson wrote:
>>> Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified.
>>> CPUID(EAX=0DH,ECX=1).EBX reports the required storage size of
>>> all enabled xstate features in XCR0 | XSS. The guest can allocate
>>> a sufficient xsave buffer based on the info.
>>
>> Please wrap changelogs closer to ~75 chars.  I'm pretty sure this isn't the first
>> time I've made this request...
>
> I suspect this is because of the long "word" CPUID(EAX=0DH,ECX=1).EBX. It would make the lengths less homogeneous if line 1 stayed the same but lines 2-4 became longer.
Yes, more or less, but I need to learn some "techniques" to make the wording look trimmed and tidy. Thanks!
> Paolo
>



* Re: [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved
  2023-08-04 21:47   ` Paolo Bonzini
@ 2023-08-09  3:14     ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  3:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, seanjc, peterz,
	john.allen, kvm, linux-kernel

On 8/5/2023 5:47 AM, Paolo Bonzini wrote:
> On 8/3/23 06:27, Yang Weijiang wrote:
>> Add all CET MSRs, including the synthesized GUEST_SSP, to the report list.
>> PL{0,1,2}_SSP are made independent of host XSAVE management by later
>> patches. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are XSAVE-managed on
>> the host side. MSR_IA32_S_CET/MSR_IA32_INT_SSP_TAB/MSR_KVM_GUEST_SSP
>> are not XSAVE-managed.
>
> MSR_KVM_GUEST_SSP -> MSR_KVM_SSP
>
> Also please add a comment,
>
> /*
>  * SSP can only be read via RDSSP; writing even requires
>  * destructive and potentially faulting operations such as
>  * SAVEPREVSSP/RSTORSSP or SETSSBSY/CLRSSBSY.  Let the host
>  * use a pseudo-MSR that is just a wrapper for the GUEST_SSP
>  * field of the VMCS.
>  */
>
OK,  will take it, thanks!
> Paolo
>
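Taken together with the host-initiated-only rule discussed for patch 11 above, the read side of the pseudo-MSR would reduce to a sketch like the following, with the rename to MSR_KVM_SSP assumed:

	case MSR_KVM_SSP:
		/*
		 * Sketch: SSP can only be read via RDSSP and written via
		 * destructive, potentially faulting operations, so the
		 * pseudo-MSR is just a wrapper for the GUEST_SSP field of
		 * the VMCS and is only reachable by the host.
		 */
		if (!msr_info->host_initiated)
			return 1;
		msr_info->data = vmcs_readl(GUEST_SSP);
		break;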



* Re: [PATCH v5 17/19] KVM:x86: Enable guest CET supervisor xstate bit support
  2023-08-04 22:02   ` Paolo Bonzini
@ 2023-08-09  6:07     ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  6:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: rick.p.edgecombe, chao.gao, binbin.wu, seanjc, peterz,
	john.allen, kvm, linux-kernel

On 8/5/2023 6:02 AM, Paolo Bonzini wrote:
> On 8/3/23 06:27, Yang Weijiang wrote:
>>       if (boot_cpu_has(X86_FEATURE_XSAVES)) {
>> +        u32 eax, ebx, ecx, edx;
>> +
>> +        cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx);
>>           rdmsrl(MSR_IA32_XSS, host_xss);
>>           kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
>> +        if (ecx & XFEATURE_MASK_CET_KERNEL)
>> +            kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
>>       }
>
> This is a bit hackish and makes me lean more towards adding support for XFEATURE_MASK_CET_KERNEL in host MSR_IA32_XSS (and then possibly hide it in the actual calls to XSAVE/XRSTORS for non-guest FPU).
Yes, if the kernel can support the CET_U/S bits in XSS, things would be much easier.
But if the CET_S bit cannot be enabled for some reason, we may have to emulate
it in KVM.
> Paolo
>



* Re: [PATCH v5 13/19] KVM:VMX: Set up interception for CET MSRs
  2023-08-07  1:16       ` Chao Gao
@ 2023-08-09  6:11         ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  6:11 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/7/2023 9:16 AM, Chao Gao wrote:
>>>> +	if (kvm_cpu_cap_has(X86_FEATURE_IBT)) {
>>>> +		incpt = !guest_can_use(vcpu, X86_FEATURE_IBT);
>>> can you use guest_can_use() or guest_cpuid_has() consistently?
>> Hmm, the inspiration actually came from Sean:
>> Re: [RFC PATCH v2 3/6] KVM: x86: SVM: Pass through shadow stack MSRs - Sean Christopherson (kernel.org) <https://lore.kernel.org/all/ZMk14YiPw9l7ZTXP@google.com/>
His point was that it would make the code more reasonable on non-CET platforms.
> then, can you switch to use guest_cpuid_has() for IBT here as you do a few
> lines above for the SHSTK? that's why I said "consistently".
Oh, I should use guest_cpuid_has() instead of guest_can_use() here, thanks!
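
For reference, a minimal sketch of the interception setup with the predicate made consistent for both features, assuming the existing vmx_set_intercept_for_msr() helper:

	bool incpt;

	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
		incpt = !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP,
					  MSR_TYPE_RW, incpt);
	}

	if (kvm_cpu_cap_has(X86_FEATURE_IBT)) {
		/* Same guest_cpuid_has() predicate as SHSTK above. */
		incpt = !guest_cpuid_has(vcpu, X86_FEATURE_IBT);
		vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
					  MSR_TYPE_RW, incpt);
	}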



* Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs
  2023-08-04  8:28   ` Chao Gao
@ 2023-08-09  7:12     ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  7:12 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/4/2023 4:28 PM, Chao Gao wrote:
>> +	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>> +		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>> +			return 1;
>> +		if (is_noncanonical_address(data, vcpu))
>> +			return 1;
>> +		if (!IS_ALIGNED(data, 4))
>> +			return 1;
> Why should MSR_IA32_INT_SSP_TAB be 4-byte aligned? I don't see
> this requirement in SDM.
I must have misremembered something; thanks for catching it!
> IA32_INTERRUPT_SSP_TABLE_ADDR:
>
> Linear address of a table of seven shadow
> stack pointers that are selected in IA-32e
> mode using the IST index (when not 0) from
> the interrupt gate descriptor. (R/W)
> This MSR is not present on processors that
> do not support Intel 64 architecture. This
> field cannot represent a non-canonical
> address.
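
A corrected version of the case quoted above would therefore keep the canonical check for the whole range but restrict the alignment check to the shadow stack pointer MSRs, roughly:

	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
			return 1;
		if (is_noncanonical_address(data, vcpu))
			return 1;
		/*
		 * SSPs must be 4-byte aligned; the IST table address only
		 * has to be canonical, per the SDM text quoted above.
		 */
		if (msr_info->index != MSR_IA32_INT_SSP_TAB &&
		    !IS_ALIGNED(data, 4))
			return 1;
		break;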



* Re: [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-08-04 18:27   ` Sean Christopherson
  2023-08-07  6:55     ` Paolo Bonzini
@ 2023-08-09  8:56     ` Yang, Weijiang
  2023-08-10  0:01       ` Paolo Bonzini
  1 sibling, 1 reply; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  8:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu, Zhang Yi Z

On 8/5/2023 2:27 AM, Sean Christopherson wrote:
> On Thu, Aug 03, 2023, Yang Weijiang wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 0b9033551d8c..5d6d6fa33e5b 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3780,10 +3780,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
>>   		 * XSAVES/XRSTORS to save/restore PT MSRs.
>>   		 */
>> -		if (data & ~kvm_caps.supported_xss)
>> +		if (data & ~vcpu->arch.guest_supported_xss)
> Hmm, this is arguably wrong for userspace-initiated writes, as it would prevent
> userspace from restoring MSRs before CPUID.
>
> And it would make the handling of MSR_IA32_XSS writes inconsistent just within
> this case statement.  The initial "can this MSR be written at all" check would
> *not* honor guest CPUID for host writes, but then the per-bit check *would* honor
> guest CPUID for host writes.
>
> But if we exempt host writes, then we'll end up with another mess, as exempting
> host writes for MSR_KVM_GUEST_SSP would let the guest coerce KVM into writing an
> illegal value by modifying SMRAM while in SMM.
>
> Blech.
>
> If we can get away with it, i.e. not break userspace, I think my preference is
> to enforce guest CPUID for host accesses to XSS, XFD, XFD_ERR, etc.  I'm 99%
> certain we can make that change, because there are many, many MSRs that do NOT
> exempt host writes, i.e. the only way this would be a breaking change is if
> userspace is writing things like XSS before KVM_SET_CPUID2, but other MSRs after
> KVM_SET_CPUID2.
>
> I'm pretty sure I've advocated for the exact opposite in the past, i.e. argued
> that KVM's ABI is to not enforce ordering between KVM_SET_CPUID2 and KVM_SET_MSR.
> But this is becoming untenable, juggling the dependencies in KVM is complex and
> is going to result in a nasty bug at some point.
>
> For this series, let's just tighten the rules for XSS, i.e. drop the host_initiated
> exemption.  And in a parallel/separate series, try to do a wholesale cleanup of
> all the cases that essentially allow userspace to do KVM_SET_MSR before KVM_SET_CPUID2.
OK, will do it for this series and investigate for other MSRs.
Thanks!


* Re: [PATCH v5 15/19] KVM:x86: Optimize CET supervisor SSP save/reload
  2023-08-04  8:43   ` Chao Gao
@ 2023-08-09  9:00     ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-09  9:00 UTC (permalink / raw)
  To: Chao Gao
  Cc: seanjc, pbonzini, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, binbin.wu

On 8/4/2023 4:43 PM, Chao Gao wrote:
> On Thu, Aug 03, 2023 at 12:27:28AM -0400, Yang Weijiang wrote:
>> Make PL{0,1,2}_SSP write-intercepted to detect whether the
>> guest is using these MSRs. Disable interception of the MSRs
>> if they're written with non-zero values. KVM saves/reloads
>> the MSRs only if they're used by the guest.
> What would happen if guest tries to use XRSTORS to load S_CET state from a
> xsave area without any writes to the PL0-2_SSP (i.e., at that point, writes to
> the MSRs are still intercepted)?
I need to do some experiments to get the details, but I expect some kind
of error would be seen in the guest.
>> @@ -2420,6 +2432,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> 		else
>> 			vmx->pt_desc.guest.addr_a[index / 2] = data;
>> 		break;
>> +	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
>> +		if (kvm_set_msr_common(vcpu, msr_info))
>> +			return 1;
>> +		if (data) {
>> +			vmx_disable_write_intercept_sss_msr(vcpu);
>> +			wrmsrl(msr_index, data);
> Is it necessary to do the wrmsrl()?
> It looks like the next kvm_x86_prepare_switch_to_guest() will load PL0-2_SSP
> from the cached values.
Oh, yes, it's not necessary after moving the reload logic to kvm_x86_prepare_switch_to_guest().
Thanks!
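
With the reload done from the cached values in kvm_x86_prepare_switch_to_guest(), the write path shrinks to something like:

	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
		if (kvm_set_msr_common(vcpu, msr_info))
			return 1;
		/*
		 * Lazily drop the write intercept once the guest actually
		 * uses the MSRs; no wrmsrl() is needed here because the
		 * cached value is loaded on the next
		 * prepare_switch_to_guest().
		 */
		if (data)
			vmx_disable_write_intercept_sss_msr(vcpu);
		break;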


* Re: [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-08-09  8:56     ` Yang, Weijiang
@ 2023-08-10  0:01       ` Paolo Bonzini
  2023-08-10  1:12         ` Yang, Weijiang
  0 siblings, 1 reply; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-10  0:01 UTC (permalink / raw)
  To: Yang, Weijiang
  Cc: Sean Christopherson, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu, Zhang Yi Z

On Wed, Aug 9, 2023 at 10:56 AM Yang, Weijiang <weijiang.yang@intel.com> wrote:
> > I'm pretty sure I've advocated for the exact opposite in the past, i.e. argued
> > that KVM's ABI is to not enforce ordering between KVM_SET_CPUID2 and KVM_SET_MSR.
> > But this is becoming untenable, juggling the dependencies in KVM is complex and
> > is going to result in a nasty bug at some point.
> >
> > For this series, let's just tighten the rules for XSS, i.e. drop the host_initiated
> > exemption.  And in a parallel/separate series, try to do a wholesale cleanup of
> > all the cases that essentially allow userspace to do KVM_SET_MSR before KVM_SET_CPUID2.
> OK, will do it for this series and investigate for other MSRs.
> Thanks!

Remember that, while the ordering between KVM_SET_CPUID2 and
KVM_SET_MSR must be enforced(*), the host_initiated path must allow
the default (generally 0) value.

Paolo

(*) this means that you should check guest_cpuid_has even if
host_initiated == true.
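
Put concretely, the combined rule might look like the sketch below for the MSR_IA32_XSS case, assuming guest_supported_xss is derived from guest CPUID as elsewhere in this series:

	case MSR_IA32_XSS:
		/*
		 * Check guest CPUID even for host-initiated writes, but
		 * always accept the default value so userspace can still
		 * write 0 before KVM_SET_CPUID2.
		 */
		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES) &&
		    !(msr_info->host_initiated && data == 0))
			return 1;
		if (data & ~vcpu->arch.guest_supported_xss)
			return 1;
		vcpu->arch.ia32_xss = data;
		kvm_update_cpuid_runtime(vcpu);
		break;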



* Re: [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
  2023-08-10  0:01       ` Paolo Bonzini
@ 2023-08-10  1:12         ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-10  1:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, peterz, john.allen, kvm, linux-kernel,
	rick.p.edgecombe, chao.gao, binbin.wu, Zhang Yi Z

On 8/10/2023 8:01 AM, Paolo Bonzini wrote:
> On Wed, Aug 9, 2023 at 10:56 AM Yang, Weijiang <weijiang.yang@intel.com> wrote:
>>> I'm pretty sure I've advocated for the exact opposite in the past, i.e. argued
>>> that KVM's ABI is to not enforce ordering between KVM_SET_CPUID2 and KVM_SET_MSR.
>>> But this is becoming untenable, juggling the dependencies in KVM is complex and
>>> is going to result in a nasty bug at some point.
>>>
>>> For this series, let's just tighten the rules for XSS, i.e. drop the host_initiated
>>> exemption.  And in a parallel/separate series, try to do a wholesale cleanup of
>>> all the cases that essentially allow userspace to do KVM_SET_MSR before KVM_SET_CPUID2.
>> OK, will do it for this series and investigate for other MSRs.
>> Thanks!
> Remember that, while the ordering between KVM_SET_CPUID2 and
> KVM_SET_MSR must be enforced(*), the host_initiated path must allow
> the default (generally 0) value.
Yes, will take it, thanks!
> Paolo
>
> (*) this means that you should check guest_cpuid_has even if
> host_initiated == true.
>



* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-04 20:45       ` Sean Christopherson
                           ` (2 preceding siblings ...)
  2023-08-09  2:39         ` Yang, Weijiang
@ 2023-08-10  9:29         ` Yang, Weijiang
  2023-08-10 14:29           ` Dave Hansen
  3 siblings, 1 reply; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-10  9:29 UTC (permalink / raw)
  To: dave.hansen, Thomas Gleixner, peterz, pbonzini, Sean Christopherson
  Cc: Chao Gao, john.allen, kvm, linux-kernel, rick.p.edgecombe, binbin.wu

Hi, Dave, Thomas and Peter,

I would like to loop you into this discussion about CET supervisor state support
in the kernel so that you can talk directly to the KVM maintainers, thanks!
The discussion background, problem and candidate solutions are below:

Background:
When KVM enumerates shadow stack support for the guest in CPUID(0x7, 0).ECX[bit7],
architecturally it claims that both user and supervisor mode shadow stacks are
supported. Although the latter is not supported in Linux, in the virtualization
world the guest OS could be a non-Linux system, so KVM supervisor state support
is necessary in this case.
Two solutions are on the table:
1) Enable CET supervisor support in Linux kernel like user mode support.
2) Enable support in KVM domain.

Problem:
The Pros/Cons for each solution (my individual thoughts):
In kernel solution:
Pros:
- Avoid saving/restoring 3 supervisor MSRs(PL{0,1,2}_SSP) at vCPU execution path.
- Easy for KVM to manage guest CET xstate bits for guest.
Cons:
- Unnecessary supervisor state xsaves/xrstors operation for non-vCPU thread.
- Potentially extra storage space(24 bytes) for thread context.

KVM solution:
Pros:
- Not touch current kernel FPU management framework and logic.
- No extra space and operation for non-vCPU thread.
Cons:
- Manually saving/restoring 3 supervisor MSRs is a performance burden to KVM.
- It looks more like a hack method for KVM, and some handling logic seems a bit awkward.

The KVM maintainers request that it be supported in the kernel instead of in KVM to make things streamlined.

We'd like to hear your opinion on the in-kernel solution: favor or objection?
Are there any important points we omitted?
Appreciated!

Solution:
Below is the supervisor state enabling patch for the kernel; it does not include Sean's suggestion further below.
=====================================================================
 From 53f9890c76e4163a0fead3afe198d0c17136120e Mon Sep 17 00:00:00 2001
From: Yang Weijiang <weijiang.yang@intel.com>
Date: Thu, 10 Aug 2023 00:10:55 -0400
Subject: [RFC PATCH] x86: fpu: Enable CET supervisor state support

Enable CET supervisor state support within the current FPU state management
framework. The CET shadow stack feature is enumerated by CPUID(0x7,0).ECX[bit7];
if the bit is set, architecturally both user and supervisor SHSTK should
be supported, i.e., when KVM enumerates the feature bit to the guest, it
claims both modes are supported by the VMM.

The user mode SHSTK XSAVE states comprise IA32_{U_CET,PL3_SSP},
and the supervisor mode states include IA32_PL{0,1,2}_SSP. The xstate
support for the former is included in the native user mode shadow stack
series, but the latter is not supported yet.

KVM is going to support guest shadow stacks, which means the guest's
supervisor shadow stack states should also be well managed by the VMM.

To make KVM fully support guest shadow stack states, there are at least two
approaches: one is to enable the supervisor xstate bit in the kernel, which
is straightforward and fits well in all cases. The alternative is to enable
the support within the KVM domain and manually save/restore the states per
vCPU thread, i.e., introduce additional WRMSR/RDMSR on the vCPU execution
path.

This patch doesn't optimize CET supervisor state management; it just
follows the implementation of the user mode state support.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
  arch/x86/include/asm/fpu/types.h  | 14 ++++++++++++--
  arch/x86/include/asm/fpu/xstate.h |  6 +++---
  arch/x86/kernel/fpu/xstate.c      |  6 +++++-
  3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index eb810074f1e7..c6fd13a17205 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -116,7 +116,7 @@ enum xfeature {
         XFEATURE_PKRU,
         XFEATURE_PASID,
         XFEATURE_CET_USER,
-       XFEATURE_CET_KERNEL_UNUSED,
+       XFEATURE_CET_KERNEL,
         XFEATURE_RSRVD_COMP_13,
         XFEATURE_RSRVD_COMP_14,
         XFEATURE_LBR,
@@ -139,7 +139,7 @@ enum xfeature {
  #define XFEATURE_MASK_PKRU             (1 << XFEATURE_PKRU)
  #define XFEATURE_MASK_PASID            (1 << XFEATURE_PASID)
  #define XFEATURE_MASK_CET_USER         (1 << XFEATURE_CET_USER)
-#define XFEATURE_MASK_CET_KERNEL       (1 << XFEATURE_CET_KERNEL_UNUSED)
+#define XFEATURE_MASK_CET_KERNEL       (1 << XFEATURE_CET_KERNEL)
  #define XFEATURE_MASK_LBR              (1 << XFEATURE_LBR)
  #define XFEATURE_MASK_XTILE_CFG                (1 << XFEATURE_XTILE_CFG)
  #define XFEATURE_MASK_XTILE_DATA       (1 << XFEATURE_XTILE_DATA)
@@ -264,6 +264,16 @@ struct cet_user_state {
         u64 user_ssp;
  };

+/*
+ * State component 12 is Control-flow Enforcement supervisor states
+ */
+struct cet_supervisor_state {
+       /* supervisor ssp pointers  */
+       u64 pl0_ssp;
+       u64 pl1_ssp;
+       u64 pl2_ssp;
+};
+
  /*
   * State component 15: Architectural LBR configuration state.
   * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index d4427b88ee12..3b4a038d3c57 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -51,7 +51,8 @@

  /* All currently supported supervisor features */
  #define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
-                                           XFEATURE_MASK_CET_USER)
+                                           XFEATURE_MASK_CET_USER | \
+                                           XFEATURE_MASK_CET_KERNEL)

  /*
   * A supervisor state component may not always contain valuable information,
@@ -78,8 +79,7 @@
   * Unsupported supervisor features. When a supervisor feature in this mask is
   * supported in the future, move it to the supported supervisor feature mask.
   */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
-                                             XFEATURE_MASK_CET_KERNEL)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)

  /* All supervisor states including supported and unsupported states. */
  #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 4fa4751912d9..fc346c7c6916 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -51,7 +51,7 @@ static const char *xfeature_names[] =
         "Protection Keys User registers",
         "PASID state",
         "Control-flow User registers",
-       "Control-flow Kernel registers (unused)",
+       "Control-flow Kernel registers",
         "unknown xstate feature",
         "unknown xstate feature",
         "unknown xstate feature",
@@ -74,6 +74,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
         [XFEATURE_PKRU]                         = X86_FEATURE_PKU,
         [XFEATURE_PASID]                        = X86_FEATURE_ENQCMD,
         [XFEATURE_CET_USER]                     = X86_FEATURE_SHSTK,
+       [XFEATURE_CET_KERNEL]                   = X86_FEATURE_SHSTK,
         [XFEATURE_XTILE_CFG]                    = X86_FEATURE_AMX_TILE,
         [XFEATURE_XTILE_DATA]                   = X86_FEATURE_AMX_TILE,
  };
@@ -278,6 +279,7 @@ static void __init print_xstate_features(void)
         print_xstate_feature(XFEATURE_MASK_PKRU);
         print_xstate_feature(XFEATURE_MASK_PASID);
         print_xstate_feature(XFEATURE_MASK_CET_USER);
+       print_xstate_feature(XFEATURE_MASK_CET_KERNEL);
         print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
         print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
  }
@@ -347,6 +349,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
          XFEATURE_MASK_BNDCSR |                 \
          XFEATURE_MASK_PASID |                  \
          XFEATURE_MASK_CET_USER |               \
+        XFEATURE_MASK_CET_KERNEL |             \
          XFEATURE_MASK_XTILE)

  /*
@@ -547,6 +550,7 @@ static bool __init check_xstate_against_struct(int nr)
         case XFEATURE_PASID:      return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
         case XFEATURE_XTILE_CFG:  return XCHECK_SZ(sz, nr, struct xtile_cfg);
         case XFEATURE_CET_USER:   return XCHECK_SZ(sz, nr, struct cet_user_state);
+       case XFEATURE_CET_KERNEL: return XCHECK_SZ(sz, nr, struct cet_supervisor_state);
         case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
         default:
                 XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
--
2.27.0




On 8/5/2023 4:45 AM, Sean Christopherson wrote:
> [...]
> Pulling back in the justification from v3:
>
>   the Pros:
>    - Super easy to implement for KVM.
>    - Automatically avoids saving and restoring this data when the vmexit
>      is handled within KVM.
>
>   the Cons:
>    - Unnecessarily restores XFEATURE_CET_KERNEL when switching to
>      non-KVM task's userspace.
>    - Forces allocating space for this state on all tasks, whether or not
>      they use KVM, and with likely zero users today and the near future.
>    - Complicates the FPU optimization thinking by including things that
>      can have no effect on userspace in the FPU
>
> IMO the pros far outweigh the cons.  3x RDMSR and 3x WRMSR when loading host/guest
> state is non-trivial overhead.  That can be mitigated, e.g. by utilizing the
> user return MSR framework, but it's still unpalatable.  It's unlikely many guests
> will use SSS in the *near* future, but I don't want to end up with code that performs
> poorly in the future and needs to be rewritten.
>
> Especially because another big negative is that not utilizing XSTATE bleeds into
> KVM's ABI.  Userspace has to be told to manually save+restore MSRs instead of just
> letting KVM_{G,S}ET_XSAVE handle the state.  And that will create a bit of a
> snafu if Linux does gain support for SSS.
>
> On the other hand, the extra per-task memory is all of 24 bytes.  AFAICT, there's
> literally zero effect on guest XSTATE allocations because those are vmalloc'd and
> thus rounded up to PAGE_SIZE, i.e. the next 4KiB.  And XSTATE needs to be 64-byte
> aligned, so the 24 bytes is only actually meaningful if the current size is within
> 24 bytes of the next cache line.  And the "current" size is variable depending on
> which features are present and enabled, i.e. it's a roll of the dice as to whether
> or not using XSTATE for supervisor CET would actually increase memory usage.  And
> _if_ it does increase memory consumption, I have a very hard time believing an
> extra 64 bytes in the worst case scenario is a dealbreaker.
>
> If the performance is a concern, i.e. we don't want to eat saving/restoring the
> MSRs when switching to/from host FPU context, then I *think* that's simply a matter
> of keeping guest state resident when loading non-guest FPU state.
>
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 1015af1ae562..8e7599e3b923 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -167,6 +167,16 @@ void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask)
>                   */
>                  xfd_update_state(fpstate);
>   
> +               /*
> +                * Leave supervisor CET state as-is when loading host state
> +                * (kernel or userspace).  Supervisor CET state is managed via
> +                * XSTATE for KVM guests, but the host never consumes said
> +                * state (doesn't support supervisor shadow stacks), i.e. it's
> +                * safe to keep guest state loaded into hardware.
> +                */
> +               if (!fpstate->is_guest)
> +                       mask &= ~XFEATURE_MASK_CET_KERNEL;
> +
>                  /*
>                   * Restoring state always needs to modify all features
>                   * which are in @mask even if the current task cannot use
>
>
> So unless I'm missing something, NAK to this approach, at least not without trying
> the kernel FPU approach, i.e. I want someone like PeterZ or tglx to actually
> full-on NAK the kernel approach before we consider shoving a hack into KVM.



* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-10  9:29         ` Yang, Weijiang
@ 2023-08-10 14:29           ` Dave Hansen
  2023-08-10 15:15             ` Paolo Bonzini
  0 siblings, 1 reply; 82+ messages in thread
From: Dave Hansen @ 2023-08-10 14:29 UTC (permalink / raw)
  To: Yang, Weijiang, Thomas Gleixner, peterz, pbonzini, Sean Christopherson
  Cc: Chao Gao, john.allen, kvm, linux-kernel, rick.p.edgecombe, binbin.wu

On 8/10/23 02:29, Yang, Weijiang wrote:
...
> When KVM enumerates shadow stack support for the guest in CPUID(0x7,
> 0).ECX[bit7], architecturally it claims that both user and supervisor
> mode shadow stacks are supported. Although the latter is not supported
> in Linux, in the virtualization world the guest OS could be a non-Linux
> system, so KVM supervisor state support is necessary in this case.

What actual OSes need this support?

> Two solutions are on the table:
> 1) Enable CET supervisor support in Linux kernel like user mode support.

We _will_ do this eventually, but not until FRED is merged.  The core
kernel also probably won't be managing the MSRs on non-FRED hardware.

I think what you're really talking about here is that the kernel would
enable CET_S XSAVE state management so that CET_S state could be managed
by the core kernel's FPU code.

That is, frankly, *NOT* like the user mode support at all.

> 2) Enable support in KVM domain.
> 
> Problem:
> The Pros/Cons for each solution (my individual thoughts):
> In kernel solution:
> Pros:
> - Avoid saving/restoring 3 supervisor MSRs(PL{0,1,2}_SSP) at vCPU
>   execution path.
> - Easy for KVM to manage guest CET xstate bits for guest.
> Cons:
> - Unnecessary supervisor state xsaves/xrstors operation for non-vCPU
>   thread.

What operations would be unnecessary exactly?

> - Potentially extra storage space(24 bytes) for thread context.

Yep.  This one is pretty unavoidable.  But, we've kept MPX around in
this state for a looooooong time and nobody really seemed to care.

> KVM solution:
> Pros:
> - Not touch current kernel FPU management framework and logic.
> - No extra space and operation for non-vCPU thread.
> Cons:
> - Manually saving/restoring 3 supervisor MSRs is a performance burden to
>   KVM.
> - It looks more like a hack method for KVM, and some handling logic
>   seems a bit awkward.

In a perfect world, we'd just allocate space for CET_S in the KVM
fpstates.  The core kernel fpstates would have
XSTATE_BV[13]==XCOMP_BV[13]==0.  An XRSTOR of the core kernel fpstates
would just set CET_S to its init state.

But I suspect that would be too much work to implement in practice.  It
would be akin to a new lesser kind of dynamic xstate, one that didn't
interact with XFD and *NEVER* gets allocated in the core kernel
fpstates, even on demand.

I want to hear more about who is going to use CET_S state under KVM in
practice.  I don't want to touch it if this is some kind of purely
academic exercise.  But it's also silly to hack some kind of temporary
solution into KVM that we'll rip out in a year when real supervisor
shadow stack support comes along.

If it's actually necessary, we should probably just eat the 24 bytes in
the fpstates, flip the bit in IA32_XSS and move on.  There shouldn't be
any other meaningful impact to the core kernel.



* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-10 14:29           ` Dave Hansen
@ 2023-08-10 15:15             ` Paolo Bonzini
  2023-08-10 15:37               ` Sean Christopherson
                                 ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: Paolo Bonzini @ 2023-08-10 15:15 UTC (permalink / raw)
  To: Dave Hansen, Yang, Weijiang, Thomas Gleixner, peterz,
	Sean Christopherson
  Cc: Chao Gao, john.allen, kvm, linux-kernel, rick.p.edgecombe, binbin.wu

On 8/10/23 16:29, Dave Hansen wrote:
> On 8/10/23 02:29, Yang, Weijiang wrote:
> ...
>> When KVM enumerates shadow stack support for the guest in CPUID(0x7,
>> 0).ECX[bit7], architecturally it claims that both user and supervisor
>> mode shadow stacks are supported. Although the latter is not supported
>> in Linux, in the virtualization world the guest OS could be a non-Linux
>> system, so KVM supervisor state support is necessary in this case.
> 
> What actual OSes need this support?

I think Xen could use it when running nested.  But KVM cannot expose 
support for CET in CPUID, and at the same time fake support for 
MSR_IA32_PL{0,1,2}_SSP (e.g. inject a #GP if it's ever written to a 
nonzero value).

I suppose we could invent our own paravirtualized CPUID bit for 
"supervisor IBT works but supervisor SHSTK doesn't".  Linux could check 
that but I don't think it's a good idea.

So... do, or do not.  There is no try. :)

>> Two solutions are on the table:
>> 1) Enable CET supervisor support in Linux kernel like user mode support.
> 
> We _will_ do this eventually, but not until FRED is merged.  The core
> kernel also probably won't be managing the MSRs on non-FRED hardware.
> 
> I think what you're really talking about here is that the kernel would
> enable CET_S XSAVE state management so that CET_S state could be managed
> by the core kernel's FPU code.

Yes, I understand it that way too.

> That is, frankly, *NOT* like the user mode support at all.

I agree.

>> 2) Enable support in KVM domain.
>>
>> Problem:
>> The Pros/Cons for each solution (my individual thoughts):
>> In kernel solution:
>> Pros:
>> - Avoid saving/restoring 3 supervisor MSRs(PL{0,1,2}_SSP) at vCPU
>>    execution path.
>> - Easy for KVM to manage guest CET xstate bits for guest.
>> Cons:
>> - Unnecessary supervisor state xsaves/xrstors operation for non-vCPU
>>    thread.
> 
> What operations would be unnecessary exactly?

Saving/restoring PL0/1/2_SSP when switching from one usermode task's 
fpstate to another.

>> KVM solution:
>> Pros:
>> - Not touch current kernel FPU management framework and logic.
>> - No extra space and operation for non-vCPU thread.
>> Cons:
>> - Manually saving/restoring 3 supervisor MSRs is a performance burden to
>>    KVM.
>> - It looks more like a hack method for KVM, and some handling logic
>>    seems a bit awkward.
> 
> In a perfect world, we'd just allocate space for CET_S in the KVM
> fpstates.  The core kernel fpstates would have
> XSTATE_BV[13]==XCOMP_BV[13]==0.  An XRSTOR of the core kernel fpstates
> would just set CET_S to its init state.

Yep.  I don't think it's a lot of work to implement.  The basic idea as 
you point out below is something like

#define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA
#define XFEATURE_MASK_USER_OPTIONAL \
     (XFEATURE_MASK_DYNAMIC | XFEATURE_MASK_CET_KERNEL)

where XFEATURE_MASK_USER_DYNAMIC is used for xfd-related tasks 
(including the ARCH_GET_XCOMP_SUPP arch_prctl) but everything else uses 
XFEATURE_MASK_USER_OPTIONAL.

KVM would enable the feature by hand when allocating the guest fpstate. 
Disabled features would be cleared from EDX:EAX when calling 
XSAVE/XSAVEC/XSAVES.

> But I suspect that would be too much work to implement in practice.  It
> would be akin to a new lesser kind of dynamic xstate, one that didn't
> interact with XFD and *NEVER* gets allocated in the core kernel
> fpstates, even on demand.
> 
> I want to hear more about who is going to use CET_S state under KVM in
> practice.  I don't want to touch it if this is some kind of purely
> academic exercise.  But it's also silly to hack some kind of temporary
> solution into KVM that we'll rip out in a year when real supervisor
> shadow stack support comes along.
> 
> If it's actually necessary, we should probably just eat the 24 bytes in
> the fpstates, flip the bit in IA32_XSS and move on.  There shouldn't be
> any other meaningful impact to the core kernel.

If that's good to you, why not.

Paolo
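
The clearing Paolo describes has a natural home in the existing save path: the requested-feature bitmap (EDX:EAX) handed to XSAVES already comes from the per-fpstate mask, so a component enabled only in guest fpstates is never saved for task fpstates. A lightly trimmed sketch of os_xsave() as of this series' base, with the XFD validation and debug warnings elided:

	static inline void os_xsave(struct fpstate *fpstate)
	{
		/*
		 * Only features enabled for *this* fpstate are requested,
		 * so CET_S enabled solely in guest fpstates is naturally
		 * masked out of EDX:EAX for every task fpstate.
		 */
		u64 mask = fpstate->xfeatures;
		u32 lmask = mask;
		u32 hmask = mask >> 32;
		int err;

		XSTATE_XSAVE(&fpstate->regs.xsave, lmask, hmask, err);
		WARN_ON_FPU(err);
	}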



* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-10 15:15             ` Paolo Bonzini
@ 2023-08-10 15:37               ` Sean Christopherson
  2023-08-11  3:03               ` Yang, Weijiang
  2023-08-28 21:00               ` Dave Hansen
  2 siblings, 0 replies; 82+ messages in thread
From: Sean Christopherson @ 2023-08-10 15:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Dave Hansen, Weijiang Yang, Thomas Gleixner, peterz, Chao Gao,
	john.allen, kvm, linux-kernel, rick.p.edgecombe, binbin.wu

On Thu, Aug 10, 2023, Paolo Bonzini wrote:
> On 8/10/23 16:29, Dave Hansen wrote:
> > On 8/10/23 02:29, Yang, Weijiang wrote:
> > ...
> > > When KVM enumerates shadow stack support for the guest in CPUID(0x7,
> > > 0).ECX[bit7], architecturally it claims that both user and supervisor
> > > mode shadow stacks are supported. Although the latter is not supported
> > > in Linux, in the virtualization world the guest OS could be a non-Linux
> > > system, so KVM supervisor state support is necessary in this case.
> > 
> > What actual OSes need this support?
> 
> I think Xen could use it when running nested.  But KVM cannot expose support
> for CET in CPUID, and at the same time fake support for
> MSR_IA32_PL{0,1,2}_SSP (e.g. inject a #GP if it's ever written to a nonzero
> value).
> 
> I suppose we could invent our own paravirtualized CPUID bit for "supervisor
> IBT works but supervisor SHSTK doesn't".  Linux could check that but I don't
> think it's a good idea.
> 
> So... do, or do not.  There is no try. :)

> > I want to hear more about who is going to use CET_S state under KVM in
> > practice.  I don't want to touch it if this is some kind of purely
> > academic exercise.  But it's also silly to hack some kind of temporary
> > solution into KVM that we'll rip out in a year when real supervisor
> > shadow stack support comes along.

As Paolo alluded to, this is about KVM faithfully emulating the architecture.
There is no combination of CPUID bits that allows KVM to advertise SHSTK for
userspace without advertising SHSTK for supervisor.

Whether or not there are any users in the short term is unfortunately irrelevant
from KVM's perspective.


* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-10 15:15             ` Paolo Bonzini
  2023-08-10 15:37               ` Sean Christopherson
@ 2023-08-11  3:03               ` Yang, Weijiang
  2023-08-28 21:00               ` Dave Hansen
  2 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-11  3:03 UTC (permalink / raw)
  To: Paolo Bonzini, Dave Hansen, Thomas Gleixner, peterz, Sean Christopherson
  Cc: Chao Gao, john.allen, kvm, linux-kernel, rick.p.edgecombe, binbin.wu

On 8/10/2023 11:15 PM, Paolo Bonzini wrote:
> On 8/10/23 16:29, Dave Hansen wrote:
>> On 8/10/23 02:29, Yang, Weijiang wrote:
>> ...
>>> When KVM enumerates shadow stack support for the guest in CPUID(0x7,
>>> 0).ECX[bit7], architecturally it claims that both user and supervisor
>>> mode shadow stacks are supported. Although the latter is not supported
>>> in Linux, in the virtualization world the guest OS could be a non-Linux
>>> system, so KVM supervisor state support is necessary in this case.
>>
>> What actual OSes need this support?
>
> I think Xen could use it when running nested.  But KVM cannot expose support for CET in CPUID, and at the same time fake support for MSR_IA32_PL{0,1,2}_SSP (e.g. inject a #GP if it's ever written to a nonzero value).
>
> I suppose we could invent our own paravirtualized CPUID bit for "supervisor IBT works but supervisor SHSTK doesn't".  Linux could check that but I don't think it's a good idea.
>
> So... do, or do not.  There is no try. :)
>
>>> Two solutions are on the table:
>>> 1) Enable CET supervisor support in Linux kernel like user mode support.
>>
>> We _will_ do this eventually, but not until FRED is merged.  The core
>> kernel also probably won't be managing the MSRs on non-FRED hardware.
>>
>> I think what you're really talking about here is that the kernel would
>> enable CET_S XSAVE state management so that CET_S state could be managed
>> by the core kernel's FPU code.
>
> Yes, I understand it that way too.

Sorry for the confusion; I missed the word "state" here.

>> That is, frankly, *NOT* like the user mode support at all.
>
> I agree.
>
>>> 2) Enable support in KVM domain.
>>>
>>> Problem:
>>> The Pros/Cons for each solution (my individual thoughts):
>>> In kernel solution:
>>> Pros:
>>> - Avoid saving/restoring 3 supervisor MSRs(PL{0,1,2}_SSP) at vCPU
>>>    execution path.
>>> - Easy for KVM to manage guest CET xstate bits for guest.
>>> Cons:
>>> - Unnecessary supervisor state xsaves/xrstors operation for non-vCPU
>>>    thread.
>>
>> What operations would be unnecessary exactly?
>
> Saving/restoring PL0/1/2_SSP when switching from one usermode task's fpstate to another.
>
>>> KVM solution:
>>> Pros:
>>> - Not touch current kernel FPU management framework and logic.
>>> - No extra space and operation for non-vCPU thread.
>>> Cons:
>>> - Manually saving/restoring 3 supervisor MSRs is a performance burden to
>>>    KVM.
>>> - It looks more like a hack method for KVM, and some handling logic
>>>    seems a bit awkward.
>>
>> In a perfect world, we'd just allocate space for CET_S in the KVM
>> fpstates.  The core kernel fpstates would have
>> XSTATE_BV[13]==XCOMP_BV[13]==0.  An XRSTOR of the core kernel fpstates
>> would just set CET_S to its init state.
>
> Yep.  I don't think it's a lot of work to implement.  The basic idea as you point out below is something like
>
> #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA
> #define XFEATURE_MASK_USER_OPTIONAL \
>     (XFEATURE_MASK_DYNAMIC | XFEATURE_MASK_CET_KERNEL)
>
> where XFEATURE_MASK_USER_DYNAMIC is used for xfd-related tasks (including the ARCH_GET_XCOMP_SUPP arch_prctl) but everything else uses XFEATURE_MASK_USER_OPTIONAL.
>
> KVM would enable the feature by hand when allocating the guest fpstate. Disabled features would be cleared from EDX:EAX when calling XSAVE/XSAVEC/XSAVES.

OK, I'll move ahead in that direction.

>> But I suspect that would be too much work to implement in practice.  It
>> would be akin to a new lesser kind of dynamic xstate, one that didn't
>> interact with XFD and *NEVER* gets allocated in the core kernel
>> fpstates, even on demand.
>>
>> I want to hear more about who is going to use CET_S state under KVM in
>> practice.  I don't want to touch it if this is some kind of purely
>> academic exercise.  But it's also silly to hack some kind of temporary
>> solution into KVM that we'll rip out in a year when real supervisor
>> shadow stack support comes along.
>>
>> If it's actually necessary, we should probably just eat the 24 bytes in
>> the fpstates, flip the bit in IA32_XSS and move on.  There shouldn't be
>> any other meaningful impact to the core kernel.
>
> If that's good to you, why not.

Thanks to all of you for quickly helping out!

> Paolo
>



* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-10 15:15             ` Paolo Bonzini
  2023-08-10 15:37               ` Sean Christopherson
  2023-08-11  3:03               ` Yang, Weijiang
@ 2023-08-28 21:00               ` Dave Hansen
  2023-08-29  7:05                 ` Yang, Weijiang
  2 siblings, 1 reply; 82+ messages in thread
From: Dave Hansen @ 2023-08-28 21:00 UTC (permalink / raw)
  To: Paolo Bonzini, Yang, Weijiang, Thomas Gleixner, peterz,
	Sean Christopherson
  Cc: Chao Gao, john.allen, kvm, linux-kernel, rick.p.edgecombe, binbin.wu

On 8/10/23 08:15, Paolo Bonzini wrote:
> On 8/10/23 16:29, Dave Hansen wrote:
>> What actual OSes need this support?
> 
> I think Xen could use it when running nested.  But KVM cannot expose
> support for CET in CPUID, and at the same time fake support for
> MSR_IA32_PL{0,1,2}_SSP (e.g. inject a #GP if it's ever written to a
> nonzero value).
> 
> I suppose we could invent our own paravirtualized CPUID bit for
> "supervisor IBT works but supervisor SHSTK doesn't".  Linux could check
> that but I don't think it's a good idea.
> 
> So... do, or do not.  There is no try. :)

Ahh, that makes sense.  This is needed for implementing the
*architecture*, not because some OS actually wants to _do_ it.

...
>> In a perfect world, we'd just allocate space for CET_S in the KVM
>> fpstates.  The core kernel fpstates would have
>> XSTATE_BV[13]==XCOMP_BV[13]==0.  An XRSTOR of the core kernel fpstates
>> would just set CET_S to its init state.
> 
> Yep.  I don't think it's a lot of work to implement.  The basic idea as
> you point out below is something like
> 
> #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA
> #define XFEATURE_MASK_USER_OPTIONAL \
>     (XFEATURE_MASK_DYNAMIC | XFEATURE_MASK_CET_KERNEL)
> 
> where XFEATURE_MASK_USER_DYNAMIC is used for xfd-related tasks
> (including the ARCH_GET_XCOMP_SUPP arch_prctl) but everything else uses
> XFEATURE_MASK_USER_OPTIONAL.
> 
> KVM would enable the feature by hand when allocating the guest fpstate.
> Disabled features would be cleared from EDX:EAX when calling
> XSAVE/XSAVEC/XSAVES.

OK, so let's _try_ this perfect-world solution.  KVM fpstates get
fpstate->xfeatures[13] set, but no normal task fpstates have that bit
set.  Most of the infrastructure should be there to handle this without
much fuss because it _should_ be looking at generic things like
fpstate->size and fpstate->features.

But who knows what trouble this will turn up.  It could get nasty and
not worth it, but we should at least try it.
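
A rough sketch of the experiment on the allocation side, assuming fpu_alloc_guest_fpstate() remains the single construction point for guest fpstates and reusing the existing xstate_calculate_size() helper (the XFEATURE_MASK_CET_KERNEL OR-in is the new, hypothetical part, and `compacted` reflects XSAVES use):

	/*
	 * Sketch only: in fpu_alloc_guest_fpstate(), grow the feature set
	 * and recompute the buffer size for guest fpstates only.  Task
	 * fpstates keep XSTATE_BV[13] == XCOMP_BV[13] == 0, so XRSTOR
	 * simply inits CET_S when switching back to a normal task.
	 */
	u64 xfeatures = fpu_kernel_cfg.default_features |
			XFEATURE_MASK_CET_KERNEL;

	fpstate->xfeatures	= xfeatures;
	fpstate->size		= xstate_calculate_size(xfeatures, compacted);
	fpstate->is_guest	= true;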


* Re: [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed
  2023-08-28 21:00               ` Dave Hansen
@ 2023-08-29  7:05                 ` Yang, Weijiang
  0 siblings, 0 replies; 82+ messages in thread
From: Yang, Weijiang @ 2023-08-29  7:05 UTC (permalink / raw)
  To: Dave Hansen, Paolo Bonzini, Thomas Gleixner, peterz, Sean Christopherson
  Cc: Chao Gao, john.allen, kvm, linux-kernel, rick.p.edgecombe, binbin.wu

On 8/29/2023 5:00 AM, Dave Hansen wrote:
> On 8/10/23 08:15, Paolo Bonzini wrote:
>> On 8/10/23 16:29, Dave Hansen wrote:
>>> What actual OSes need this support?
>> I think Xen could use it when running nested.  But KVM cannot expose
>> support for CET in CPUID, and at the same time fake support for
>> MSR_IA32_PL{0,1,2}_SSP (e.g. inject a #GP if it's ever written to a
>> nonzero value).
>>
>> I suppose we could invent our own paravirtualized CPUID bit for
>> "supervisor IBT works but supervisor SHSTK doesn't".  Linux could check
>> that but I don't think it's a good idea.
>>
>> So... do, or do not.  There is no try. :)
> Ahh, that makes sense.  This is needed for implementing the
> *architecture*, not because some OS actually wants to _do_ it.
>
> ...
>>> In a perfect world, we'd just allocate space for CET_S in the KVM
>>> fpstates.  The core kernel fpstates would have
>>> XSTATE_BV[13]==XCOMP_BV[13]==0.  An XRSTOR of the core kernel fpstates
>>> would just set CET_S to its init state.
>> Yep.  I don't think it's a lot of work to implement.  The basic idea as
>> you point out below is something like
>>
>> #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA
>> #define XFEATURE_MASK_USER_OPTIONAL \
>>      (XFEATURE_MASK_DYNAMIC | XFEATURE_MASK_CET_KERNEL)
>>
>> where XFEATURE_MASK_USER_DYNAMIC is used for xfd-related tasks
>> (including the ARCH_GET_XCOMP_SUPP arch_prctl) but everything else uses
>> XFEATURE_MASK_USER_OPTIONAL.
>>
>> KVM would enable the feature by hand when allocating the guest fpstate.
>> Disabled features would be cleared from EDX:EAX when calling
>> XSAVE/XSAVEC/XSAVES.
> OK, so let's _try_ this perfect-world solution.  KVM fpstates get
> fpstate->xfeatures[13] set, but no normal task fpstates have that bit
> set.  Most of the infrastructure should be there to handle this without
> much fuss because it _should_ be looking at generic things like
> fpstate->size and fpstate->features.
>
> But who knows what trouble this will turn up.  It could get nasty and
> not worth it, but we should at least try it.

Thanks, Dave, for the clarity!
I'm moving in that direction...



Thread overview: 82+ messages
2023-08-03  4:27 [PATCH v5 00/19] Enable CET Virtualization Yang Weijiang
2023-08-03  4:27 ` [PATCH v5 01/19] x86/cpufeatures: Add CPU feature flags for shadow stacks Yang Weijiang
2023-08-03  4:27 ` [PATCH v5 02/19] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Yang Weijiang
2023-08-03  4:27 ` [PATCH v5 03/19] KVM:x86: Report XSS as to-be-saved if there are supported features Yang Weijiang
2023-08-03  4:27 ` [PATCH v5 04/19] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS Yang Weijiang
2023-08-04 16:02   ` Sean Christopherson
2023-08-04 21:43     ` Paolo Bonzini
2023-08-09  3:11       ` Yang, Weijiang
2023-08-08 14:20     ` Yang, Weijiang
2023-08-04 18:27   ` Sean Christopherson
2023-08-07  6:55     ` Paolo Bonzini
2023-08-09  8:56     ` Yang, Weijiang
2023-08-10  0:01       ` Paolo Bonzini
2023-08-10  1:12         ` Yang, Weijiang
2023-08-03  4:27 ` [PATCH v5 05/19] KVM:x86: Initialize kvm_caps.supported_xss Yang Weijiang
2023-08-04 18:45   ` Sean Christopherson
2023-08-08 15:08     ` Yang, Weijiang
2023-08-03  4:27 ` [PATCH v5 06/19] KVM:x86: Load guest FPU state when access XSAVE-managed MSRs Yang Weijiang
2023-08-03  4:27 ` [PATCH v5 07/19] KVM:x86: Add fault checks for guest CR4.CET setting Yang Weijiang
2023-08-03  9:07   ` Chao Gao
2023-08-03  4:27 ` [PATCH v5 08/19] KVM:x86: Report KVM supported CET MSRs as to-be-saved Yang Weijiang
2023-08-03 10:39   ` Chao Gao
2023-08-04  3:13     ` Yang, Weijiang
2023-08-04  5:51       ` Chao Gao
2023-08-04 18:51         ` Sean Christopherson
2023-08-04 22:01           ` Paolo Bonzini
2023-08-08 15:16           ` Yang, Weijiang
2023-08-06  8:54         ` Yang, Weijiang
2023-08-04 18:55   ` Sean Christopherson
2023-08-08 15:26     ` Yang, Weijiang
2023-08-04 21:47   ` Paolo Bonzini
2023-08-09  3:14     ` Yang, Weijiang
2023-08-03  4:27 ` [PATCH v5 09/19] KVM:x86: Make guest supervisor states as non-XSAVE managed Yang Weijiang
2023-08-03 11:15   ` Chao Gao
2023-08-04  3:26     ` Yang, Weijiang
2023-08-04 20:45       ` Sean Christopherson
2023-08-04 20:59         ` Peter Zijlstra
2023-08-04 21:32         ` Paolo Bonzini
2023-08-09  2:51           ` Yang, Weijiang
2023-08-09  2:39         ` Yang, Weijiang
2023-08-10  9:29         ` Yang, Weijiang
2023-08-10 14:29           ` Dave Hansen
2023-08-10 15:15             ` Paolo Bonzini
2023-08-10 15:37               ` Sean Christopherson
2023-08-11  3:03               ` Yang, Weijiang
2023-08-28 21:00               ` Dave Hansen
2023-08-29  7:05                 ` Yang, Weijiang
2023-08-03  4:27 ` [PATCH v5 10/19] KVM:VMX: Introduce CET VMCS fields and control bits Yang Weijiang
2023-08-03  4:27 ` [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs Yang Weijiang
2023-08-04  5:14   ` Chao Gao
2023-08-04 21:27     ` Sean Christopherson
2023-08-04 21:45       ` Paolo Bonzini
2023-08-04 22:21         ` Sean Christopherson
2023-08-07  7:03           ` Paolo Bonzini
2023-08-06  8:44       ` Yang, Weijiang
2023-08-07  7:00         ` Paolo Bonzini
2023-08-04  8:28   ` Chao Gao
2023-08-09  7:12     ` Yang, Weijiang
2023-08-04 21:40   ` Paolo Bonzini
2023-08-09  3:05     ` Yang, Weijiang
2023-08-03  4:27 ` [PATCH v5 12/19] KVM:x86: Save and reload SSP to/from SMRAM Yang Weijiang
2023-08-04  7:53   ` Chao Gao
2023-08-04 15:25     ` Sean Christopherson
2023-08-06  9:14       ` Yang, Weijiang
2023-08-03  4:27 ` [PATCH v5 13/19] KVM:VMX: Set up interception for CET MSRs Yang Weijiang
2023-08-04  8:16   ` Chao Gao
2023-08-06  9:22     ` Yang, Weijiang
2023-08-07  1:16       ` Chao Gao
2023-08-09  6:11         ` Yang, Weijiang
2023-08-03  4:27 ` [PATCH v5 14/19] KVM:VMX: Set host constant supervisor states to VMCS fields Yang Weijiang
2023-08-04  8:23   ` Chao Gao
2023-08-03  4:27 ` [PATCH v5 15/19] KVM:x86: Optimize CET supervisor SSP save/reload Yang Weijiang
2023-08-04  8:43   ` Chao Gao
2023-08-09  9:00     ` Yang, Weijiang
2023-08-03  4:27 ` [PATCH v5 16/19] KVM:x86: Enable CET virtualization for VMX and advertise to userspace Yang Weijiang
2023-08-03  4:27 ` [PATCH v5 17/19] KVM:x86: Enable guest CET supervisor xstate bit support Yang Weijiang
2023-08-04 22:02   ` Paolo Bonzini
2023-08-09  6:07     ` Yang, Weijiang
2023-08-03  4:27 ` [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM Yang Weijiang
2023-08-04 21:38   ` Sean Christopherson
2023-08-09  3:00     ` Yang, Weijiang
2023-08-03  4:27 ` [PATCH v5 19/19] KVM:nVMX: Enable CET support for " Yang Weijiang
