linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 00/11] Add TDX Guest Support (Initial support)
@ 2021-09-03 17:28 Kuppuswamy Sathyanarayanan
  2021-09-03 17:28 ` [PATCH v6 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
                   ` (10 more replies)
  0 siblings, 11 replies; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Hi All,

Intel's Trust Domain Extensions (TDX) protect guest VMs from malicious
hosts and some physical attacks. This series adds the basic TDX guest
infrastructure support (including #VE handler support, and #VE support
for halt and CPUID). This is just a subset of patches in the bare minimum
TDX support patch list which is required for supporting minimal
functional TDX guest. Other basic feature features like #VE support for
IO, MMIO, boot optimization fixes and shared-mm support will be submitted
in a separate patch set. To make reviewing easier we split it into smaller
series. This series alone is not necessarily fully functional.

Also, the host-side support patches, and support for advanced TD guest
features like attestation or debug-mode will be submitted at a later time.
Also, at this point it is not secure with some known holes in drivers, and
also hasn’t been fully audited and fuzzed yet.

TDX has a lot of similarities to SEV. It enhances confidentiality and
of guest memory and state (like registers) and includes a new exception
(#VE) for the same basic reasons as SEV-ES. Like SEV-SNP (not merged
yet), TDX limits the host's ability to effect changes in the guest
physical address space. With TDX the host cannot access the guest memory,
so various functionality that would normally be done in KVM has moved
into a (paravirtualized) guest. Partially this is done using the
Virtualization Exception (#VE) and partially with direct paravirtual hooks.

The TDX architecture also includes a new CPU mode called
Secure-Arbitration Mode (SEAM). The software (TDX module) running in this
mode arbitrates interactions between host and guest and implements many of
the guarantees of the TDX architecture.

Some of the key differences between TD and regular VM is,

1. Multi CPU bring-up is done using the ACPI MADT wake-up table.
2. A new #VE exception handler is added. The TDX module injects #VE exception
   to the guest TD in cases of instructions that need to be emulated, disallowed
   MSR accesses, etc.
3. By default memory is marked as private, and TD will selectively share it with
   VMM based on need.
   
Note that the kernel will also need to be hardened against low level inputs from
the now untrusted hosts. This will be done in follow on patches.

You can find TDX related documents in the following link.

https://software.intel.com/content/www/br/pt/develop/articles/intel-trust-domain-extensions.html

This patchset has dependency on protected guest changes submitted by Tom Lendacky

https://lore.kernel.org/patchwork/cover/1468760/

Changes since v5:
 * Moved patch titled "x86/tdx: Get TD execution environment
   information via TDINFO" to the series which uses it.
 * Rest of the change log is added per patch.

Changes since v4:
 * Added a patch that adds TDX guest exception for CSTAR MSR.
 * Rebased on top of Tom Lendacky's protected guest changes.
 * Rest of the change log is added per patch.

Changes since v3:
 * Moved generic protected guest changes from patch titled "x86:
   Introduce generic protected guest abstraction" into seperate
   patch outside this patchset. Also, TDX specific changes are
   moved to patch titled "x86/tdx: Add protected guest support
   for TDX guest"
 * Rebased on top of v5.14-rc1.
 * Rest of the change log is added per patch.

Changes since v1 (v2 is partial set submission):
 * Patch titled "x86/x86: Add early_is_tdx_guest() interface" is moved
   out of this series.
 * Rest of the change log is added per patch.

Andi Kleen (1):
  x86/tdx: Don't write CSTAR MSR on Intel

Kirill A. Shutemov (6):
  x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  x86/traps: Add #VE support for TDX guest
  x86/tdx: Add HLT support for TDX guest
  x86/tdx: Wire up KVM hypercalls
  x86/tdx: Add MSR support for TDX guest
  x86/tdx: Handle CPUID via #VE

Kuppuswamy Sathyanarayanan (4):
  x86/tdx: Introduce INTEL_TDX_GUEST config option
  x86/cpufeatures: Add TDX Guest CPU feature
  x86/tdx: Add protected guest support for TDX guest
  x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper
    functions

 arch/x86/Kconfig                       |  15 ++
 arch/x86/include/asm/asm-prototypes.h  |   1 +
 arch/x86/include/asm/cpufeatures.h     |   1 +
 arch/x86/include/asm/idtentry.h        |   4 +
 arch/x86/include/asm/irqflags.h        |  48 ++--
 arch/x86/include/asm/kvm_para.h        |  22 ++
 arch/x86/include/asm/paravirt.h        |  20 +-
 arch/x86/include/asm/paravirt_types.h  |   3 +-
 arch/x86/include/asm/protected_guest.h |   9 +
 arch/x86/include/asm/tdx.h             | 109 +++++++++
 arch/x86/kernel/Makefile               |   2 +
 arch/x86/kernel/asm-offsets.c          |  23 ++
 arch/x86/kernel/cpu/common.c           |  11 +-
 arch/x86/kernel/cpu/intel.c            |  14 ++
 arch/x86/kernel/head64.c               |   3 +
 arch/x86/kernel/idt.c                  |   6 +
 arch/x86/kernel/paravirt.c             |   3 +-
 arch/x86/kernel/tdcall.S               | 314 +++++++++++++++++++++++++
 arch/x86/kernel/tdx.c                  | 237 +++++++++++++++++++
 arch/x86/kernel/traps.c                |  77 ++++++
 include/linux/protected_guest.h        |   3 +
 21 files changed, 890 insertions(+), 35 deletions(-)
 create mode 100644 arch/x86/include/asm/tdx.h
 create mode 100644 arch/x86/kernel/tdcall.S
 create mode 100644 arch/x86/kernel/tdx.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-03 18:22   ` Borislav Petkov
  2021-09-03 17:28 ` [PATCH v6 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

CONFIG_PARAVIRT_XXL is mainly defined/used by XEN PV guests. For
other VM guest types, features supported under CONFIG_PARAVIRT
are self sufficient. CONFIG_PARAVIRT mainly provides support for
TLB flush operations and time related operations.

For TDX guest as well, paravirt calls under CONFIG_PARVIRT meets
most of its requirement except the need of HLT and SAFE_HLT
paravirt calls, which is currently defined under
COFNIG_PARAVIRT_XXL.

Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest
like platforms, move HLT and SAFE_HLT paravirt calls under
CONFIG_PARAVIRT.

Moving HLT and SAFE_HLT paravirt calls are not fatal and should not
break any functionality for current users of CONFIG_PARAVIRT.

Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---

Change since v5:
 * Removed "/* Irq HLT ops. */" redundant comment.
 * Included asm/paravirt.h in asm/irqflags.h for
   CONFIG_PARAVIRT case to fix possible compilation issue.
 * Fixed typo in commit log.

Changes since v4:
 * None.

 arch/x86/include/asm/irqflags.h       | 48 ++++++++++++++++-----------
 arch/x86/include/asm/paravirt.h       | 20 +++++------
 arch/x86/include/asm/paravirt_types.h |  3 +-
 arch/x86/kernel/paravirt.c            |  3 +-
 4 files changed, 41 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index c5ce9845c999..ddc77c95adc6 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -59,27 +59,8 @@ static inline __cpuidle void native_halt(void)
 
 #endif
 
-#ifdef CONFIG_PARAVIRT_XXL
-#include <asm/paravirt.h>
-#else
+#ifndef CONFIG_PARAVIRT
 #ifndef __ASSEMBLY__
-#include <linux/types.h>
-
-static __always_inline unsigned long arch_local_save_flags(void)
-{
-	return native_save_fl();
-}
-
-static __always_inline void arch_local_irq_disable(void)
-{
-	native_irq_disable();
-}
-
-static __always_inline void arch_local_irq_enable(void)
-{
-	native_irq_enable();
-}
-
 /*
  * Used in the idle loop; sti takes one instruction cycle
  * to complete:
@@ -97,6 +78,33 @@ static inline __cpuidle void halt(void)
 {
 	native_halt();
 }
+#endif /* __ASSEMBLY__ */
+#endif /* CONFIG_PARAVIRT */
+
+#ifdef CONFIG_PARAVIRT
+#ifndef __ASSEMBLY__
+#include <asm/paravirt.h>
+#endif /* __ASSEMBLY__ */
+#endif /* CONFIG_PARAVIRT */
+
+#ifndef CONFIG_PARAVIRT_XXL
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+static __always_inline unsigned long arch_local_save_flags(void)
+{
+	return native_save_fl();
+}
+
+static __always_inline void arch_local_irq_disable(void)
+{
+	native_irq_disable();
+}
+
+static __always_inline void arch_local_irq_enable(void)
+{
+	native_irq_enable();
+}
 
 /*
  * For spinlocks, etc:
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index da3a1ac82be5..d323a626c7a8 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -97,6 +97,16 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 	PVOP_VCALL1(mmu.exit_mmap, mm);
 }
 
+static inline void arch_safe_halt(void)
+{
+	PVOP_VCALL0(irq.safe_halt);
+}
+
+static inline void halt(void)
+{
+	PVOP_VCALL0(irq.halt);
+}
+
 #ifdef CONFIG_PARAVIRT_XXL
 static inline void load_sp0(unsigned long sp0)
 {
@@ -162,16 +172,6 @@ static inline void __write_cr4(unsigned long x)
 	PVOP_VCALL1(cpu.write_cr4, x);
 }
 
-static inline void arch_safe_halt(void)
-{
-	PVOP_VCALL0(irq.safe_halt);
-}
-
-static inline void halt(void)
-{
-	PVOP_VCALL0(irq.halt);
-}
-
 static inline void wbinvd(void)
 {
 	PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index d9d6b0203ec4..40082847f314 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -150,10 +150,9 @@ struct pv_irq_ops {
 	struct paravirt_callee_save save_fl;
 	struct paravirt_callee_save irq_disable;
 	struct paravirt_callee_save irq_enable;
-
+#endif
 	void (*safe_halt)(void);
 	void (*halt)(void);
-#endif
 } __no_randomize_layout;
 
 struct pv_mmu_ops {
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 04cafc057bed..8cea6e75ba29 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -283,9 +283,10 @@ struct paravirt_patch_template pv_ops = {
 	.irq.save_fl		= __PV_IS_CALLEE_SAVE(native_save_fl),
 	.irq.irq_disable	= __PV_IS_CALLEE_SAVE(native_irq_disable),
 	.irq.irq_enable		= __PV_IS_CALLEE_SAVE(native_irq_enable),
+#endif /* CONFIG_PARAVIRT_XXL */
+
 	.irq.safe_halt		= native_safe_halt,
 	.irq.halt		= native_halt,
-#endif /* CONFIG_PARAVIRT_XXL */
 
 	/* Mmu ops. */
 	.mmu.flush_tlb_user	= native_flush_tlb_local,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
  2021-09-03 17:28 ` [PATCH v6 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-28 11:37   ` Joerg Roedel
  2021-09-03 17:28 ` [PATCH v6 03/11] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Add INTEL_TDX_GUEST config option to selectively compile
TDX guest support.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---

Changes since v5:
 * None

Changes since v4:
 * None

Changes since v3:
 * Removed PARAVIRT_XL from dependency list.
 * Fixed typo in help content.

 arch/x86/Kconfig | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index de58e0a5df50..ab0e7c346c44 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -865,6 +865,20 @@ config ACRN_GUEST
 	  IOT with small footprint and real-time features. More details can be
 	  found in https://projectacrn.org/.
 
+config INTEL_TDX_GUEST
+	bool "Intel Trusted Domain eXtensions Guest Support"
+	depends on X86_64 && CPU_SUP_INTEL && PARAVIRT
+	depends on SECURITY
+	select X86_X2APIC
+	select SECURITY_LOCKDOWN_LSM
+	help
+	  Provide support for running in a trusted domain on Intel processors
+	  equipped with Trusted Domain eXtensions. TDX is a new Intel
+	  technology that extends VMX and Memory Encryption with a new kind of
+	  virtual machine guest called Trust Domain (TD). A TD is designed to
+	  run in a CPU mode that protects the confidentiality of TD memory
+	  contents and the TD’s CPU state from other software, including VMM.
+
 endif #HYPERVISOR_GUEST
 
 source "arch/x86/Kconfig.cpu"
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
  2021-09-03 17:28 ` [PATCH v6 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
  2021-09-03 17:28 ` [PATCH v6 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-28 11:39   ` Joerg Roedel
  2021-09-03 17:28 ` [PATCH v6 04/11] x86/tdx: Add protected guest support for TDX guest Kuppuswamy Sathyanarayanan
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Add CPU feature detection for Trusted Domain Extensions support. TDX
feature adds capabilities to keep guest register state and memory
isolated from hypervisor.

For TDX guest platforms, executing CPUID(eax=0x21, ecx=0) will return
following values in EAX, EBX, ECX and EDX.

EAX:  Maximum sub-leaf number:  0
EBX/EDX/ECX:  Vendor string:

EBX =  "Inte"
EDX =  "lTDX"
ECX =  "    "

So when above condition is true, set X86_FEATURE_TDX_GUEST feature cap
bit.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v5:
 * Included KASAN_SANITIZE for tdx.c.

Changes since v4:
 * Moved tdx_early_init() below copy_bootdata() because of
   cmdline and IDT dependencies.

Changes since v3:
 * Fixed order of string cpuid_count() call.

Changes since v2:
 * Fixed debug prints as per Borislav suggestion.

Changes since v1:
 * Fixed commit log issues reported by Borislav.
 * Moved header file include to the start of tdx.h.
 * Added pr_fmt for TDX.
 * Simplified cpuid_has_tdx_guest() implementation as per
   Borislav comments.

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/tdx.h         | 20 ++++++++++++++++++++
 arch/x86/kernel/Makefile           |  2 ++
 arch/x86/kernel/head64.c           |  3 +++
 arch/x86/kernel/tdx.c              | 29 +++++++++++++++++++++++++++++
 5 files changed, 55 insertions(+)
 create mode 100644 arch/x86/include/asm/tdx.h
 create mode 100644 arch/x86/kernel/tdx.c

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..84997abeb401 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -238,6 +238,7 @@
 #define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
 #define X86_FEATURE_PVUNLOCK		( 8*32+20) /* "" PV unlock function */
 #define X86_FEATURE_VCPUPREEMPT		( 8*32+21) /* "" PV vcpu_is_preempted function */
+#define X86_FEATURE_TDX_GUEST		( 8*32+22) /* Trusted Domain Extensions Guest */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
new file mode 100644
index 000000000000..c738bde944d1
--- /dev/null
+++ b/arch/x86/include/asm/tdx.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2020 Intel Corporation */
+#ifndef _ASM_X86_TDX_H
+#define _ASM_X86_TDX_H
+
+#include <linux/cpufeature.h>
+
+#define TDX_CPUID_LEAF_ID	0x21
+
+#ifdef CONFIG_INTEL_TDX_GUEST
+
+void __init tdx_early_init(void);
+
+#else
+
+static inline void tdx_early_init(void) { };
+
+#endif /* CONFIG_INTEL_TDX_GUEST */
+
+#endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3e625c61f008..744f1316a079 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -29,6 +29,7 @@ KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
 KASAN_SANITIZE_stacktrace.o				:= n
 KASAN_SANITIZE_paravirt.o				:= n
 KASAN_SANITIZE_sev.o					:= n
+KASAN_SANITIZE_tdx.o					:= n
 
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.
@@ -127,6 +128,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
 
 obj-$(CONFIG_JAILHOUSE_GUEST)	+= jailhouse.o
+obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx.o
 
 obj-$(CONFIG_EISA)		+= eisa.o
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index cafed6456d45..10ff14a09414 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,7 @@
 #include <asm/extable.h>
 #include <asm/trapnr.h>
 #include <asm/sev.h>
+#include <asm/tdx.h>
 
 /*
  * Manage page tables very early on.
@@ -495,6 +496,8 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 
 	copy_bootdata(__va(real_mode_data));
 
+	tdx_early_init();
+
 	/*
 	 * Load microcode early on BSP.
 	 */
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
new file mode 100644
index 000000000000..39dd1515b131
--- /dev/null
+++ b/arch/x86/kernel/tdx.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2020 Intel Corporation */
+
+#undef pr_fmt
+#define pr_fmt(fmt)     "x86/tdx: " fmt
+
+#include <asm/tdx.h>
+
+static inline bool cpuid_has_tdx_guest(void)
+{
+	u32 eax, sig[3];
+
+	if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
+		return false;
+
+	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
+
+	return !memcmp("IntelTDX    ", sig, 12);
+}
+
+void __init tdx_early_init(void)
+{
+	if (!cpuid_has_tdx_guest())
+		return;
+
+	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
+
+	pr_info("Guest initialized\n");
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 04/11] x86/tdx: Add protected guest support for TDX guest
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (2 preceding siblings ...)
  2021-09-03 17:28 ` [PATCH v6 03/11] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-28 11:46   ` Joerg Roedel
  2021-09-03 17:28 ` [PATCH v6 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

TDX architecture provides a way for VM guests to be highly secure and
isolated (from untrusted VMM). To achieve this requirement, any data
coming from VMM cannot be completely trusted. TDX guest fixes this
issue by hardening the IO drivers against the attack from the VMM.
So, when adding hardening fixes to the generic drivers, to protect
custom fixes use prot_guest_has() API.

Also add TDX guest support to prot_guest_has() API to protect the
TDX specific fixes.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v5:
 * Replaced tdx_prot_guest_has() with intel_prot_guest_has() to
   keep the Intel call non TDX specific.
 * Added TDX guest support to intel_prot_guest_has().

Changes since v4:
 * Rebased on top of Tom Lendacky's protected guest changes.
 * Moved memory encryption related protected guest flags in
   tdx_prot_guest_has() to the patch that actually uses them.

 arch/x86/Kconfig                       |  1 +
 arch/x86/include/asm/protected_guest.h |  9 +++++++++
 arch/x86/kernel/cpu/intel.c            | 14 ++++++++++++++
 include/linux/protected_guest.h        |  3 +++
 4 files changed, 27 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab0e7c346c44..10f2cb51a39d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -871,6 +871,7 @@ config INTEL_TDX_GUEST
 	depends on SECURITY
 	select X86_X2APIC
 	select SECURITY_LOCKDOWN_LSM
+	select ARCH_HAS_PROTECTED_GUEST
 	help
 	  Provide support for running in a trusted domain on Intel processors
 	  equipped with Trusted Domain eXtensions. TDX is a new Intel
diff --git a/arch/x86/include/asm/protected_guest.h b/arch/x86/include/asm/protected_guest.h
index b4a267dddf93..722d11b2c5e8 100644
--- a/arch/x86/include/asm/protected_guest.h
+++ b/arch/x86/include/asm/protected_guest.h
@@ -11,13 +11,22 @@
 #define _X86_PROTECTED_GUEST_H
 
 #include <linux/mem_encrypt.h>
+#include <linux/processor.h>
 
 #ifndef __ASSEMBLY__
 
+#if defined(CONFIG_CPU_SUP_INTEL) && defined(CONFIG_ARCH_HAS_PROTECTED_GUEST)
+bool intel_prot_guest_has(unsigned int flag);
+#else
+static inline bool intel_prot_guest_has(unsigned int flag) { return false; }
+#endif
+
 static inline bool prot_guest_has(unsigned int attr)
 {
 	if (sme_me_mask)
 		return amd_prot_guest_has(attr);
+	else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return intel_prot_guest_has(attr);
 
 	return false;
 }
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8321c43554a1..134ee3984fdd 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -11,6 +11,7 @@
 #include <linux/init.h>
 #include <linux/uaccess.h>
 #include <linux/delay.h>
+#include <linux/protected_guest.h>
 
 #include <asm/cpufeature.h>
 #include <asm/msr.h>
@@ -60,6 +61,19 @@ static u64 msr_test_ctrl_cache __ro_after_init;
  */
 static bool cpu_model_supports_sld __ro_after_init;
 
+#ifdef CONFIG_ARCH_HAS_PROTECTED_GUEST
+bool intel_prot_guest_has(unsigned int flag)
+{
+	switch (flag) {
+	case PATTR_GUEST_TDX:
+		return cpu_feature_enabled(X86_FEATURE_TDX_GUEST);
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(intel_prot_guest_has);
+#endif
+
 /*
  * Processors which have self-snooping capability can handle conflicting
  * memory type across CPUs by snooping its own cache. However, there exists
diff --git a/include/linux/protected_guest.h b/include/linux/protected_guest.h
index 5ddef1b6a2ea..b6bb86bdf713 100644
--- a/include/linux/protected_guest.h
+++ b/include/linux/protected_guest.h
@@ -25,6 +25,9 @@
 #define PATTR_SEV			0x801
 #define PATTR_SEV_ES			0x802
 
+/* 0x900 - 0x9ff reserved for Intel */
+#define PATTR_GUEST_TDX			0x900
+
 #ifdef CONFIG_ARCH_HAS_PROTECTED_GUEST
 
 #include <asm/protected_guest.h>
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (3 preceding siblings ...)
  2021-09-03 17:28 ` [PATCH v6 04/11] x86/tdx: Add protected guest support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-03 17:28 ` [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Guests communicate with VMMs with hypercalls. Historically, these
are implemented using instructions that are known to cause VMEXITs
like VMCALL, VMLAUNCH, etc. However, with TDX, VMEXITs no longer
expose guest state to the host.  This prevents the old hypercall
mechanisms from working. So to communicate with VMM, TDX
specification defines a new instruction called TDCALL.

In a TDX based VM, since VMM is an untrusted entity, a intermediary
layer (TDX module) exists between host and guest to facilitate
secure communication. TDX guests communicate with the TDX module
using TDCALL instruction. 

Both TDX module and VMM communication uses TDCALL instruction. Value
of the RAX register when executing TDCALL instruction is used to
determine the TDCALL type. If the TDCALL is executed with value "0"
in RAX, then it is the service request to VMM. Any other value in
RAX means it is a TDX module related call.

Implement common helper functions to communicate with the TDX Module
and VMM (using TDCALL instruction).
   
__tdx_hypercall()    - request services from the VMM.
__tdx_module_call()  - communicate with the TDX Module.

Also define two additional wrappers, tdx_hypercall() and
tdx_hypercall_out_r11() to cover common use cases of
__tdx_hypercall() function. Since each use case of
__tdx_module_call() is different, it does not need
multiple wrappers.

Implement __tdx_module_call() and __tdx_hypercall() helper functions
in assembly.

Rationale behind choosing to use assembly over inline assembly is,
since the number of lines of instructions (with comments) in
__tdx_hypercall() implementation is over 70, using inline assembly
to implement it will make it hard to read.
   
Also, just like syscalls, not all TDVMCALL/TDCALLs use cases need to
use the same set of argument registers. The implementation here picks
the current worst-case scenario for TDCALL (4 registers). For TDCALLs
with fewer than 4 arguments, there will end up being a few superfluous
(cheap) instructions.  But, this approach maximizes code reuse. The
same argument applies to __tdx_hypercall() function as well.

For registers used by TDCALL instruction, please check TDX GHCI
specification, sec 2.4 and 3.

https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf

Originally-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v5:
 * Fixed commit log (vmcall -> VMCALL, vmlaunch -> VMLAUNCH).
 * Fixed comments typo in __tdx_hypercall().

Changes since v4:
 * None

Changes since v3:
 * Fixed ARG7_SP_OFFSET value for CONFIG_FRAME_POINTER case.
 * Fixed usage of we/you in comments.

Change since v1:
 * Fixed commit comments format issues as per review comments.
 * Fixed empty line and other typo issues found in review.
 * Removed do_tdx_hypercall() helper function and modified
   __tdx_hypercall() to include do_tdx_hypercall() implementation
   and to accept TDVMCALL type as argument.
 * Since the number of arguments in __tdx_hypercall() is more than
   6, it has been modified  to get the 7th argument from the caller stack.
 * Instead of leaving output pointer in register before making
   TDCALL, stored it in a stack.
 * Instead of triggering ud2 for TDCALL failures in __tdx_hypercall(),
   it is modified to return the TDCALL status as return value. we will let
   user add appropriate error info before triggering fatal error. Also,
   extended struct tdx_hypercall_output to store r10 register which
   contains hypercall error code.
 * Included TDCALL ABI details in __tdx_module_call() and __tdx_hypercall.
 * Removed tdx_hypercall_out_r11() helper function. Since it is not really
   useful.
 * Added _tdx_hypercall() as a wrapper for __tdx_hypercall() with BUG_ON
   check for TDCALL failure.

 arch/x86/include/asm/tdx.h    |  40 +++++
 arch/x86/kernel/Makefile      |   2 +-
 arch/x86/kernel/asm-offsets.c |  23 +++
 arch/x86/kernel/tdcall.S      | 282 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/tdx.c         |  23 +++
 5 files changed, 369 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/tdcall.S

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index c738bde944d1..7c7346fd8062 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -4,13 +4,53 @@
 #define _ASM_X86_TDX_H
 
 #include <linux/cpufeature.h>
+#include <linux/types.h>
 
 #define TDX_CPUID_LEAF_ID	0x21
+#define TDX_HYPERCALL_STANDARD  0
+
+/*
+ * Used in __tdx_module_call() helper function to gather the
+ * output registers' values of TDCALL instruction when requesting
+ * services from the TDX module. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct tdx_module_output {
+	u64 rcx;
+	u64 rdx;
+	u64 r8;
+	u64 r9;
+	u64 r10;
+	u64 r11;
+};
+
+/*
+ * Used in __tdx_hypercall() helper function to gather the
+ * output registers' values of TDCALL instruction when requesting
+ * services from the VMM. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct tdx_hypercall_output {
+	u64 r10;
+	u64 r11;
+	u64 r12;
+	u64 r13;
+	u64 r14;
+	u64 r15;
+};
 
 #ifdef CONFIG_INTEL_TDX_GUEST
 
 void __init tdx_early_init(void);
 
+/* Helper function used to communicate with the TDX module */
+u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+		      struct tdx_module_output *out);
+
+/* Helper function used to request services from VMM */
+u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14,
+		    u64 r15, struct tdx_hypercall_output *out);
+
 #else
 
 static inline void tdx_early_init(void) { };
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 744f1316a079..1ed013c216df 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -128,7 +128,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
 
 obj-$(CONFIG_JAILHOUSE_GUEST)	+= jailhouse.o
-obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx.o
+obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdcall.o tdx.o
 
 obj-$(CONFIG_EISA)		+= eisa.o
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ecd3fd6993d1..0494ec01218d 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -23,6 +23,10 @@
 #include <xen/interface/xen.h>
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+#include <asm/tdx.h>
+#endif
+
 #ifdef CONFIG_X86_32
 # include "asm-offsets_32.c"
 #else
@@ -68,6 +72,25 @@ static void __used common(void)
 	OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+	BLANK();
+	/* Offset for fields in tdx_module_output */
+	OFFSET(TDX_MODULE_rcx, tdx_module_output, rcx);
+	OFFSET(TDX_MODULE_rdx, tdx_module_output, rdx);
+	OFFSET(TDX_MODULE_r8,  tdx_module_output, r8);
+	OFFSET(TDX_MODULE_r9,  tdx_module_output, r9);
+	OFFSET(TDX_MODULE_r10, tdx_module_output, r10);
+	OFFSET(TDX_MODULE_r11, tdx_module_output, r11);
+
+	/* Offset for fields in tdx_hypercall_output */
+	OFFSET(TDX_HYPERCALL_r10, tdx_hypercall_output, r10);
+	OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_output, r11);
+	OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_output, r12);
+	OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_output, r13);
+	OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_output, r14);
+	OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_output, r15);
+#endif
+
 	BLANK();
 	OFFSET(BP_scratch, boot_params, scratch);
 	OFFSET(BP_secure_boot, boot_params, secure_boot);
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
new file mode 100644
index 000000000000..2e70133bebf2
--- /dev/null
+++ b/arch/x86/kernel/tdcall.S
@@ -0,0 +1,282 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <asm/asm-offsets.h>
+#include <asm/asm.h>
+#include <asm/frame.h>
+#include <asm/unwind_hints.h>
+
+#include <linux/linkage.h>
+#include <linux/bits.h>
+#include <linux/errno.h>
+
+#define TDG_R10		BIT(10)
+#define TDG_R11		BIT(11)
+#define TDG_R12		BIT(12)
+#define TDG_R13		BIT(13)
+#define TDG_R14		BIT(14)
+#define TDG_R15		BIT(15)
+
+#ifdef CONFIG_FRAME_POINTER
+/* Frame offset + 8 (for arg1) */
+#define ARG7_SP_OFFSET		(FRAME_OFFSET + 0x08)
+#else
+#define ARG7_SP_OFFSET		(0x08)
+#endif
+
+/*
+ * Expose registers R10-R15 to VMM. It is passed via RCX register
+ * to the TDX Module, which will be used by the TDX module to
+ * identify the list of registers exposed to VMM. Each bit in this
+ * mask represents a register ID. Bit field details can be found
+ * in TDX GHCI specification.
+ */
+#define TDVMCALL_EXPOSE_REGS_MASK	( TDG_R10 | TDG_R11 | \
+					  TDG_R12 | TDG_R13 | \
+					  TDG_R14 | TDG_R15 )
+
+/*
+ * TDX guests use the TDCALL instruction to make requests to the
+ * TDX module and hypercalls to the VMM. It is supported in
+ * Binutils >= 2.36.
+ */
+#define tdcall .byte 0x66,0x0f,0x01,0xcc
+
+/*
+ * __tdx_module_call()  - Helper function used by TDX guests to request
+ * services from the TDX module (does not include VMM services).
+ *
+ * This function serves as a wrapper to move user call arguments to the
+ * correct registers as specified by TDCALL ABI and share it with the
+ * TDX module. If the TDCALL operation is successful and a valid
+ * "struct tdx_module_output" pointer is available (in "out" argument),
+ * output from the TDX module is saved to the memory specified in the
+ * "out" pointer. Also the status of the TDCALL operation is returned
+ * back to the user as a function return value.
+ *
+ *-------------------------------------------------------------------------
+ * TDCALL ABI:
+ *-------------------------------------------------------------------------
+ * Input Registers:
+ *
+ * RAX                 - TDCALL Leaf number.
+ * RCX,RDX,R8-R9       - TDCALL Leaf specific input registers.
+ *
+ * Output Registers:
+ *
+ * RAX                 - TDCALL instruction error code.
+ * RCX,RDX,R8-R11      - TDCALL Leaf specific output registers.
+ *
+ *-------------------------------------------------------------------------
+ *
+ * __tdx_module_call() function ABI:
+ *
+ * @fn  (RDI)          - TDCALL Leaf ID,    moved to RAX
+ * @rcx (RSI)          - Input parameter 1, moved to RCX
+ * @rdx (RDX)          - Input parameter 2, moved to RDX
+ * @r8  (RCX)          - Input parameter 3, moved to R8
+ * @r9  (R8)           - Input parameter 4, moved to R9
+ *
+ * @out (R9)           - struct tdx_module_output pointer
+ *                       stored temporarily in R12 (not
+ *                       shared with the TDX module). It
+ *                       can be NULL.
+ *
+ * Return status of TDCALL via RAX.
+ */
+SYM_FUNC_START(__tdx_module_call)
+	FRAME_BEGIN
+
+	/*
+	 * R12 will be used as temporary storage for
+	 * struct tdx_module_output pointer. More
+	 * details about struct tdx_module_output can
+	 * be found in arch/x86/include/asm/tdx.h. Also
+	 * note that registers R12-R15 are not used by
+	 * TDCALL services supported by this helper
+	 * function.
+	 */
+
+	/* Callee saved, so preserve it */
+	push %r12
+
+	/*
+	 * Push output pointer to stack, after TDCALL operation,
+	 * it will be fetched into R12 register.
+	 */
+	push %r9
+
+	/* Mangle function call ABI into TDCALL ABI: */
+	/* Move TDCALL Leaf ID to RAX */
+	mov %rdi, %rax
+	/* Move input 4 to R9 */
+	mov %r8,  %r9
+	/* Move input 3 to R8 */
+	mov %rcx, %r8
+	/* Move input 1 to RCX */
+	mov %rsi, %rcx
+	/* Leave input param 2 in RDX */
+
+	tdcall
+
+	/* Fetch output pointer from stack to R12 */
+	pop %r12
+
+	/* Check for TDCALL success: 0 - Successful, otherwise failed */
+	test %rax, %rax
+	jnz 1f
+
+	/*
+	 * __tdx_module_call() can be initiated without an output pointer.
+	 * So, check if caller provided an output struct before storing
+	 * output registers.
+	 */
+	test %r12, %r12
+	jz 1f
+
+	/* Copy TDCALL result registers to output struct: */
+	movq %rcx, TDX_MODULE_rcx(%r12)
+	movq %rdx, TDX_MODULE_rdx(%r12)
+	movq %r8,  TDX_MODULE_r8(%r12)
+	movq %r9,  TDX_MODULE_r9(%r12)
+	movq %r10, TDX_MODULE_r10(%r12)
+	movq %r11, TDX_MODULE_r11(%r12)
+1:
+	/* Restore the state of R12 register */
+	pop %r12
+
+	FRAME_END
+	ret
+SYM_FUNC_END(__tdx_module_call)
+
+/*
+ * __tdx_hypercall()  - Helper function used by TDX guests to request
+ * services from the VMM. All requests are made via the TDX module
+ * using TDCALL instruction.
+ *
+ * This function serves as a wrapper to move user call arguments to the
+ * correct registers as specified by TDCALL ABI and share it with VMM
+ * via the TDX module. After TDCALL operation, output from the VMM is
+ * saved to the memory specified in the "out" (struct tdx_hypercall_output)
+ * pointer. 
+ *
+ *-------------------------------------------------------------------------
+ * TD VMCALL ABI:
+ *-------------------------------------------------------------------------
+ *
+ * Input Registers:
+ *
+ * RAX                 - TDCALL instruction leaf number (0 - TDG.VP.VMCALL)
+ * RCX                 - BITMAP which controls which part of TD Guest GPR
+ *                       is passed as-is to VMM and back.
+ * R10                 - Set 0 to indicate TDCALL follows standard TDX ABI
+ *                       specification. Non zero value indicates vendor
+ *                       specific ABI.
+ * R11                 - VMCALL sub function number
+ * RBX, RBP, RDI, RSI  - Used to pass VMCALL sub function specific arguments.
+ * R8-R9, R12–R15      - Same as above.
+ *
+ * Output Registers:
+ *
+ * RAX                 - TDCALL instruction status (Not related to hypercall
+ *                        output).
+ * R10                 - Hypercall output error code.
+ * R11-R15             - Hypercall sub function specific output values.
+ *
+ *-------------------------------------------------------------------------
+ *
+ * __tdx_hypercall() function ABI:
+ *
+ * @type  (RDI)        - TD VMCALL type, moved to R10
+ * @fn    (RSI)        - TD VMCALL sub function, moved to R11
+ * @r12   (RDX)        - Input parameter 1, moved to R12
+ * @r13   (RCX)        - Input parameter 2, moved to R13
+ * @r14   (R8)         - Input parameter 3, moved to R14
+ * @r15   (R9)         - Input parameter 4, moved to R15
+ *
+ * @out   (stack)      - struct tdx_hypercall_output pointer (cannot be NULL)
+ *
+ * On successful completion, return TDCALL status or -EINVAL for invalid
+ * inputs.
+ */
+SYM_FUNC_START(__tdx_hypercall)
+	FRAME_BEGIN
+
+	/* Move argument 7 from caller stack to RAX */
+	movq ARG7_SP_OFFSET(%rsp), %rax
+
+	/* Check if caller provided an output struct */
+	test %rax, %rax
+	/* If out pointer is NULL, return -EINVAL */
+	jz 1f
+
+	/* Save callee-saved GPRs as mandated by the x86_64 ABI */
+	push %r15
+	push %r14
+	push %r13
+	push %r12
+
+	/*
+	 * Save R9 and output pointer (rax) in stack, it will be used
+	 * again when storing the output registers after TDCALL
+	 * operation.
+	 */
+	push %r9
+	push %rax
+
+	/* Mangle function call ABI into TDCALL ABI: */
+	/* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */
+	xor %eax, %eax
+	/* Move TDVMCALL type (standard vs vendor) in R10 */
+	mov %rdi, %r10
+	/* Move TDVMCALL sub function id to R11 */
+	mov %rsi, %r11
+	/* Move input 1 to R12 */
+	mov %rdx, %r12
+	/* Move input 2 to R13 */
+	mov %rcx, %r13
+	/* Move input 3 to R14 */
+	mov %r8,  %r14
+	/* Move input 4 to R15 */
+	mov %r9,  %r15
+
+	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
+
+	tdcall
+
+	/* Restore output pointer to R9 */
+	pop  %r9
+
+	/* Copy hypercall result registers to output struct: */
+	movq %r10, TDX_HYPERCALL_r10(%r9)
+	movq %r11, TDX_HYPERCALL_r11(%r9)
+	movq %r12, TDX_HYPERCALL_r12(%r9)
+	movq %r13, TDX_HYPERCALL_r13(%r9)
+	movq %r14, TDX_HYPERCALL_r14(%r9)
+	movq %r15, TDX_HYPERCALL_r15(%r9)
+
+	/*
+	 * Zero out registers exposed to the VMM to avoid
+	 * speculative execution with VMM-controlled values.
+	 * This needs to include all registers present in
+	 * TDVMCALL_EXPOSE_REGS_MASK (except R12-R15).
+	 * R12-R15 context will be restored.
+	 */
+	xor %r10d, %r10d
+	xor %r11d, %r11d
+
+	/* Restore state of R9 register */
+	pop %r9
+
+	/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
+	pop %r12
+	pop %r13
+	pop %r14
+	pop %r15
+
+	jmp 2f
+1:
+       movq $(-EINVAL), %rax
+2:
+       FRAME_END
+
+       retq
+SYM_FUNC_END(__tdx_hypercall)
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 39dd1515b131..e6444ef75269 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -6,6 +6,29 @@
 
 #include <asm/tdx.h>
 
+/*
+ * Wrapper for standard use of __tdx_hypercall with BUG_ON() check
+ * for TDCALL error.
+ */
+static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14,
+				 u64 r15, struct tdx_hypercall_output *out)
+{
+	struct tdx_hypercall_output outl = {0};
+	u64 err;
+
+	/* __tdx_hypercall() does not accept NULL output pointer */
+	if (!out)
+		out = &outl;
+
+	err = __tdx_hypercall(TDX_HYPERCALL_STANDARD, fn, r12, r13, r14,
+			      r15, out);
+
+	/* Non zero return value indicates buggy TDX module, so panic */
+	BUG_ON(err);
+
+	return out->r10;
+}
+
 static inline bool cpuid_has_tdx_guest(void)
 {
 	u32 eax, sig[3];
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (4 preceding siblings ...)
  2021-09-03 17:28 ` [PATCH v6 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-28 12:16   ` Joerg Roedel
  2021-09-28 15:23   ` Dave Hansen
  2021-09-03 17:28 ` [PATCH v6 07/11] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Virtualization Exceptions (#VE) are delivered to TDX guests due to
specific guest actions which may happen in either user space or the kernel:

 * Specific instructions (WBINVD, for example)
 * Specific MSR accesses
 * Specific CPUID leaf accesses
 * Access to TD-shared memory, which includes MMIO

In the settings that Linux will run in, virtual exceptions are never
generated on accesses to normal, TD-private memory that has been
accepted.

The entry paths do not access TD-shared memory, MMIO regions or use
those specific MSRs, instructions, CPUID leaves that might generate #VE.
In addition, all interrupts including NMIs are blocked by the hardware
starting with #VE delivery until TDGETVEINFO is called.  This eliminates
the chance of a #VE during the syscall gap or paranoid entry paths and
simplifies #VE handling.

After TDGETVEINFO #VE could happen in theory (e.g. through an NMI),
but it is expected not to happen because TDX expects NMIs not to
trigger #VEs. Another case where they could happen is if the #VE
exception panics, but in this case there are no guarantees on anything
anyways.

If a guest kernel action which would normally cause a #VE occurs in the
interrupt-disabled region before TDGETVEINFO, a #DF is delivered to the
guest which will result in an oops (and should eventually be a panic, as
it is expected panic_on_oops is set to 1 for TDX guests).

Add basic infrastructure to handle any #VE which occurs in the kernel or
userspace.  Later patches will add handling for specific #VE scenarios.

Convert unhandled #VE's (everything, until later in this series) so that
they appear just like a #GP by calling ve_raise_fault() directly.
ve_raise_fault() is similar to #GP handler and is responsible for
sending SIGSEGV to userspace and CPU die and notifying debuggers and
other die chain users.  

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v5:
 * Fixed "We" usage in commit log and replaced cpu -> CPU.
 * Renamed "tdg_" prefix with "tdx_".
 * Removed TODO comment in tdg_handle_virtualization_exception() as
   per Boris review comments.
 * Added comments for ve_raise_fault().

Changes since v4:
 * Since ve_raise_fault() is used only by TDX code, moved it
   within #ifdef CONFIG_INTEL_TDX_GUEST.

Changes since v3:
 * None

 arch/x86/include/asm/idtentry.h |  4 ++
 arch/x86/include/asm/tdx.h      | 19 ++++++++
 arch/x86/kernel/idt.c           |  6 +++
 arch/x86/kernel/tdx.c           | 33 ++++++++++++++
 arch/x86/kernel/traps.c         | 77 +++++++++++++++++++++++++++++++++
 5 files changed, 139 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..8ccc81d653b3 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -625,6 +625,10 @@ DECLARE_IDTENTRY_XENCB(X86_TRAP_OTHER,	exc_xen_hypervisor_callback);
 DECLARE_IDTENTRY_RAW(X86_TRAP_OTHER,	exc_xen_unknown_trap);
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+DECLARE_IDTENTRY(X86_TRAP_VE,		exc_virtualization_exception);
+#endif
+
 /* Device interrupts common/spurious */
 DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER,	common_interrupt);
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 7c7346fd8062..8e7fda8cd662 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -39,6 +39,20 @@ struct tdx_hypercall_output {
 	u64 r15;
 };
 
+/*
+ * Used by #VE exception handler to gather the #VE exception
+ * info from the TDX module. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct ve_info {
+	u64 exit_reason;
+	u64 exit_qual;
+	u64 gla;	/* Guest Linear (virtual) Address */
+	u64 gpa;	/* Guest Physical (virtual) Address */
+	u32 instr_len;
+	u32 instr_info;
+};
+
 #ifdef CONFIG_INTEL_TDX_GUEST
 
 void __init tdx_early_init(void);
@@ -51,6 +65,11 @@ u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14,
 		    u64 r15, struct tdx_hypercall_output *out);
 
+unsigned long tdx_get_ve_info(struct ve_info *ve);
+
+int tdx_handle_virtualization_exception(struct pt_regs *regs,
+					struct ve_info *ve);
+
 #else
 
 static inline void tdx_early_init(void) { };
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index df0fa695bb09..a5eaae8e6c44 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -68,6 +68,9 @@ static const __initconst struct idt_data early_idts[] = {
 	 */
 	INTG(X86_TRAP_PF,		asm_exc_page_fault),
 #endif
+#ifdef CONFIG_INTEL_TDX_GUEST
+	INTG(X86_TRAP_VE,		asm_exc_virtualization_exception),
+#endif
 };
 
 /*
@@ -91,6 +94,9 @@ static const __initconst struct idt_data def_idts[] = {
 	INTG(X86_TRAP_MF,		asm_exc_coprocessor_error),
 	INTG(X86_TRAP_AC,		asm_exc_alignment_check),
 	INTG(X86_TRAP_XF,		asm_exc_simd_coprocessor_error),
+#ifdef CONFIG_INTEL_TDX_GUEST
+	INTG(X86_TRAP_VE,		asm_exc_virtualization_exception),
+#endif
 
 #ifdef CONFIG_X86_32
 	TSKG(X86_TRAP_DF,		GDT_ENTRY_DOUBLEFAULT_TSS),
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index e6444ef75269..8d29ed07af1c 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -6,6 +6,9 @@
 
 #include <asm/tdx.h>
 
+/* TDX Module call Leaf IDs */
+#define TDGETVEINFO			3
+
 /*
  * Wrapper for standard use of __tdx_hypercall with BUG_ON() check
  * for TDCALL error.
@@ -41,6 +44,36 @@ static inline bool cpuid_has_tdx_guest(void)
 	return !memcmp("IntelTDX    ", sig, 12);
 }
 
+unsigned long tdx_get_ve_info(struct ve_info *ve)
+{
+	struct tdx_module_output out = {0};
+	u64 ret;
+
+	/*
+	 * NMIs and machine checks are suppressed. Before this point any
+	 * #VE is fatal. After this point (TDGETVEINFO call), NMIs and
+	 * additional #VEs are permitted (but it is expected not to
+	 * happen unless kernel panics).
+	 */
+	ret = __tdx_module_call(TDGETVEINFO, 0, 0, 0, 0, &out);
+
+	ve->exit_reason = out.rcx;
+	ve->exit_qual   = out.rdx;
+	ve->gla         = out.r8;
+	ve->gpa         = out.r9;
+	ve->instr_len   = out.r10 & UINT_MAX;
+	ve->instr_info  = out.r10 >> 32;
+
+	return ret;
+}
+
+int tdx_handle_virtualization_exception(struct pt_regs *regs,
+					struct ve_info *ve)
+{
+	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
+	return -EFAULT;
+}
+
 void __init tdx_early_init(void)
 {
 	if (!cpuid_has_tdx_guest())
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a58800973aed..152d1d3b9dc8 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
 #include <asm/vdso.h>
+#include <asm/tdx.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -1140,6 +1141,82 @@ DEFINE_IDTENTRY(exc_device_not_available)
 	}
 }
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+#define VEFSTR "VE fault"
+static void ve_raise_fault(struct pt_regs *regs, long error_code)
+{
+	struct task_struct *tsk = current;
+
+	if (user_mode(regs)) {
+		tsk->thread.error_code = error_code;
+		tsk->thread.trap_nr = X86_TRAP_VE;
+
+		/*
+		 * Not fixing up VDSO exceptions similar to #GP handler
+		 * because it is expected that VDSO doesn't trigger #VE.
+		 */
+		show_signal(tsk, SIGSEGV, "", VEFSTR, regs, error_code);
+		force_sig(SIGSEGV);
+		return;
+	}
+
+	/*
+	 * Attempt to recover from #VE exception failure without
+	 * triggering OOPS (useful for MSR read/write failures)
+	 */
+	if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
+		return;
+
+	tsk->thread.error_code = error_code;
+	tsk->thread.trap_nr = X86_TRAP_VE;
+
+	/*
+	 * To be potentially processing a kprobe fault and to trust the result
+	 * from kprobe_running(), it should be non-preemptible.
+	 */
+	if (!preemptible() &&
+	    kprobe_running() &&
+	    kprobe_fault_handler(regs, X86_TRAP_VE))
+		return;
+
+	/* Notify about #VE handling failure, useful for debugger hooks */
+	if (notify_die(DIE_GPF, VEFSTR, regs, error_code,
+		       X86_TRAP_VE, SIGSEGV) == NOTIFY_STOP)
+		return;
+
+	/* Trigger OOPS and panic */
+	die_addr(VEFSTR, regs, error_code, 0);
+}
+
+DEFINE_IDTENTRY(exc_virtualization_exception)
+{
+	struct ve_info ve;
+	int ret;
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	/*
+	 * NMIs/Machine-checks/Interrupts will be in a disabled state
+	 * till TDGETVEINFO TDCALL is executed. This prevents #VE
+	 * nesting issue.
+	 */
+	ret = tdx_get_ve_info(&ve);
+
+	cond_local_irq_enable(regs);
+
+	if (!ret)
+		ret = tdx_handle_virtualization_exception(regs, &ve);
+	/*
+	 * If tdx_handle_virtualization_exception() could not process
+	 * it successfully, treat it as #GP(0) and handle it.
+	 */
+	if (ret)
+		ve_raise_fault(regs, 0);
+
+	cond_local_irq_disable(regs);
+}
+#endif
+
 #ifdef CONFIG_X86_32
 DEFINE_IDTENTRY_SW(iret_error)
 {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 07/11] x86/tdx: Add HLT support for TDX guest
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (5 preceding siblings ...)
  2021-09-03 17:28 ` [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-03 17:28 ` [PATCH v6 08/11] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Per Guest-Host-Communication Interface (GHCI) for Intel Trust
Domain Extensions (Intel TDX) specification, sec 3.8,
TDVMCALL[Instruction.HLT] provides HLT operation. Use it to implement
halt() and safe_halt() paravirtualization calls.

The same TDX hypercall is used to handle #VE exception due to
EXIT_REASON_HLT.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v5:
 * Replaced sti with STI in commit log and comments.
 * Added comments for _tdx_hypercall() usage in _tdx_halt().
 * Added new helper function _tdx_halt() to contain common
   code between tdx_halt() and tdx_safe_halt().
 * Renamed tdg_->tdx_.
 * Removed BUG_ON() and used WARN_ONCE() for HLT emulation
   failure.

Changes since v4:
 * Added exception for EXIT_REASON_HLT in __tdx_hypercall() to
   enable interrupts using sti.

Changes since v3:
 * None

 arch/x86/kernel/tdcall.S | 30 +++++++++++++++++++
 arch/x86/kernel/tdx.c    | 65 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
index 2e70133bebf2..1b9649ec2e29 100644
--- a/arch/x86/kernel/tdcall.S
+++ b/arch/x86/kernel/tdcall.S
@@ -40,6 +40,9 @@
  */
 #define tdcall .byte 0x66,0x0f,0x01,0xcc
 
+/* HLT TDVMCALL sub-function ID */
+#define EXIT_REASON_HLT			12
+
 /*
  * __tdx_module_call()  - Helper function used by TDX guests to request
  * services from the TDX module (does not include VMM services).
@@ -240,6 +243,33 @@ SYM_FUNC_START(__tdx_hypercall)
 
 	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
 
+	/*
+	 * For the idle loop STI needs to be called directly before
+	 * the TDCALL that enters idle (EXIT_REASON_HLT case). STI
+	 * enables interrupts only one instruction later. If there
+	 * are any instructions between the STI and the TDCALL for
+	 * HLT then an interrupt could happen in that time, but the
+	 * code would go back to sleep afterwards, which can cause
+	 * longer delays.
+	 *
+	 * This leads to significant difference in network performance
+	 * benchmarks. So add a special case for EXIT_REASON_HLT to
+	 * trigger STI before TDCALL. But this change is not required
+	 * for all HLT cases. So use R15 register value to identify the
+	 * case which needs STI. So, if R11 is EXIT_REASON_HLT and R15
+	 * is 1, then call STI before TDCALL instruction. Note that R15
+	 * register is not required by TDCALL ABI when triggering the
+	 * hypercall for EXIT_REASON_HLT case. So use it in software to
+	 * select the STI case.
+	 */
+	cmpl $EXIT_REASON_HLT, %r11d
+	jne skip_sti
+	cmpl $1, %r15d
+	jne skip_sti
+	/* Set R15 register to 0, it is unused in EXIT_REASON_HLT case */
+	xor %r15, %r15
+	sti
+skip_sti:
 	tdcall
 
 	/* Restore output pointer to R9 */
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 8d29ed07af1c..ec1008cc42d9 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -5,6 +5,7 @@
 #define pr_fmt(fmt)     "x86/tdx: " fmt
 
 #include <asm/tdx.h>
+#include <asm/vmx.h>
 
 /* TDX Module call Leaf IDs */
 #define TDGETVEINFO			3
@@ -44,6 +45,52 @@ static inline bool cpuid_has_tdx_guest(void)
 	return !memcmp("IntelTDX    ", sig, 12);
 }
 
+static __cpuidle void _tdx_halt(const bool irq_disabled, const bool do_sti)
+{
+	u64 ret;
+
+	/*
+	 * Emulate HLT operation via hypercall. More info about ABI
+	 * can be found in TDX Guest-Host-Communication Interface
+	 * (GHCI), sec 3.8.
+	 *
+	 * The VMM uses the "IRQ disabled" param to understand IRQ
+	 * enabled status (RFLAGS.IF) of TD guest and determine
+	 * whether or not it should schedule the halted vCPU if an
+	 * IRQ becomes pending. E.g. if IRQs are disabled the VMM
+	 * can keep the vCPU in virtual HLT, even if an IRQ is
+	 * pending, without hanging/breaking the guest.
+	 *
+	 * do_sti parameter is used by __tdx_hypercall() to decide
+	 * whether to call STI instruction before executing TDCALL
+	 * instruction.
+	 */
+	ret = _tdx_hypercall(EXIT_REASON_HLT, irq_disabled, 0, 0, do_sti, NULL);
+
+	/*
+	 * Use WARN_ONCE() to report the failure. Since tdx_*halt() calls
+	 * are also used in pv_ops, #VE handler error handler cannot be
+	 * used to report the failure.
+	 */
+	WARN_ONCE(ret, "HLT instruction emulation failed\n");
+}
+
+static __cpuidle void tdx_halt(void)
+{
+	const bool irq_disabled = irqs_disabled();
+	const bool do_sti = false;
+
+	_tdx_halt(irq_disabled, do_sti);
+}
+
+static __cpuidle void tdx_safe_halt(void)
+{
+	const bool irq_disabled = false; /* since sti will be called */
+	const bool do_sti = true;
+
+	_tdx_halt(irq_disabled, do_sti);
+}
+
 unsigned long tdx_get_ve_info(struct ve_info *ve)
 {
 	struct tdx_module_output out = {0};
@@ -70,8 +117,19 @@ unsigned long tdx_get_ve_info(struct ve_info *ve)
 int tdx_handle_virtualization_exception(struct pt_regs *regs,
 					struct ve_info *ve)
 {
-	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
-	return -EFAULT;
+	switch (ve->exit_reason) {
+	case EXIT_REASON_HLT:
+		tdx_halt();
+		break;
+	default:
+		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
+		return -EFAULT;
+	}
+
+	/* After successful #VE handling, move the IP */
+	regs->ip += ve->instr_len;
+
+	return 0;
 }
 
 void __init tdx_early_init(void)
@@ -81,5 +139,8 @@ void __init tdx_early_init(void)
 
 	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
 
+	pv_ops.irq.safe_halt = tdx_safe_halt;
+	pv_ops.irq.halt = tdx_halt;
+
 	pr_info("Guest initialized\n");
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 08/11] x86/tdx: Wire up KVM hypercalls
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (6 preceding siblings ...)
  2021-09-03 17:28 ` [PATCH v6 07/11] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-03 17:28 ` [PATCH v6 09/11] x86/tdx: Add MSR support for TDX guest Kuppuswamy Sathyanarayanan
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

KVM hypercalls use the VMCALL or VMMCALL instructions. Although the ABI
is similar, those instructions no longer function for TDX guests. Make
vendor-specific TDVMCALLs instead of VMCALL. This enables TDX guests to
run with KVM acting as the hypervisor. TDX guests running under other
hypervisors will continue to use those hypervisors' hypercalls.

Since KVM driver can be built as a kernel module, export
tdx_kvm_hypercall*() to make the symbols visible to kvm.ko.

Also, add asm/tdx.h to asm/asm-prototypes.h so that asm symbol's
checksum can be generated in order to support CONFIG_MODVERSIONS with
it and fix:

WARNING: modpost: EXPORT symbol "__tdx_hypercall" [vmlinux] version \
generation failed, symbol will not be versioned.

[Isaku Yamahata: proposed KVM VENDOR string]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v5:
 * Added more info about version generation failed error in commit
   log.
 * Fixed commit log as per review comments.
 * Removed CONFIG_INTEL_TDX_GUEST_KVM and used
   CONFIG_KVM_GUEST/CONFIG_INTEL_TDX_GUEST for TDX KVM hypercall
   implementation.
 * Used EXPORT_SYMBOL_GPL for __tdx_hypercall() export.

Changes since v4:
 * No functional changes.

Changes since v3:
 * Fixed ASM sysmbol generation issue in tdcall.S by including tdx.h
   in asm-prototypes.h

Changes since v1:
 * Replaced is_tdx_guest() with prot_guest_has(PR_GUEST_TDX).
 * Replaced tdx_kvm_hypercall{1-4} with single generic 
   function tdx_kvm_hypercall().
 * Removed __tdx_hypercall_vendor_kvm() and re-used __tdx_hypercall().

 arch/x86/include/asm/asm-prototypes.h |  1 +
 arch/x86/include/asm/kvm_para.h       | 22 +++++++++++++++++
 arch/x86/include/asm/tdx.h            | 34 +++++++++++++++++++++++++--
 arch/x86/kernel/tdcall.S              |  2 ++
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 4cb726c71ed8..404add7ee720 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -6,6 +6,7 @@
 #include <asm/page.h>
 #include <asm/checksum.h>
 #include <asm/mce.h>
+#include <asm/tdx.h>
 
 #include <asm-generic/asm-prototypes.h>
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 69299878b200..bd0ab7c3ae25 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -4,7 +4,9 @@
 
 #include <asm/processor.h>
 #include <asm/alternative.h>
+#include <asm/tdx.h>
 #include <linux/interrupt.h>
+#include <linux/protected_guest.h>
 #include <uapi/asm/kvm_para.h>
 
 #ifdef CONFIG_KVM_GUEST
@@ -32,6 +34,10 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 static inline long kvm_hypercall0(unsigned int nr)
 {
 	long ret;
+
+	if (prot_guest_has(PATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, 0, 0, 0, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr)
@@ -42,6 +48,10 @@ static inline long kvm_hypercall0(unsigned int nr)
 static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
 {
 	long ret;
+
+	if (prot_guest_has(PATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, 0, 0, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1)
@@ -53,6 +63,10 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
 				  unsigned long p2)
 {
 	long ret;
+
+	if (prot_guest_has(PATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, p2, 0, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1), "c"(p2)
@@ -64,6 +78,10 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
 				  unsigned long p2, unsigned long p3)
 {
 	long ret;
+
+	if (prot_guest_has(PATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, p2, p3, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
@@ -76,6 +94,10 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 				  unsigned long p4)
 {
 	long ret;
+
+	if (prot_guest_has(PATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, p2, p3, p4);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3), "S"(p4)
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 8e7fda8cd662..403aaa6efb8b 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -6,8 +6,9 @@
 #include <linux/cpufeature.h>
 #include <linux/types.h>
 
-#define TDX_CPUID_LEAF_ID	0x21
-#define TDX_HYPERCALL_STANDARD  0
+#define TDX_CPUID_LEAF_ID			0x21
+#define TDX_HYPERCALL_STANDARD			0
+#define TDX_HYPERCALL_VENDOR_KVM		0x4d564b2e584454 /* TDX.KVM */
 
 /*
  * Used in __tdx_module_call() helper function to gather the
@@ -76,4 +77,33 @@ static inline void tdx_early_init(void) { };
 
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
+#ifdef CONFIG_KVM_GUEST
+#ifdef CONFIG_INTEL_TDX_GUEST
+static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
+				     unsigned long p2, unsigned long p3,
+				     unsigned long p4)
+{
+	struct tdx_hypercall_output out;
+	u64 err;
+
+	err = __tdx_hypercall(TDX_HYPERCALL_VENDOR_KVM, nr, p1, p2,
+			      p3, p4, &out);
+
+	/*
+	 * Non zero return value means buggy TDX module (which is fatal).
+	 * So use BUG_ON() to panic.
+	 */
+	BUG_ON(err);
+
+	return out.r10;
+}
+#else
+static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
+				     unsigned long p2, unsigned long p3,
+				     unsigned long p4)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_INTEL_TDX_GUEST */
+#endif /* CONFIG_KVM_GUEST */
 #endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
index 1b9649ec2e29..fa87f5e2cf29 100644
--- a/arch/x86/kernel/tdcall.S
+++ b/arch/x86/kernel/tdcall.S
@@ -3,6 +3,7 @@
 #include <asm/asm.h>
 #include <asm/frame.h>
 #include <asm/unwind_hints.h>
+#include <asm/export.h>
 
 #include <linux/linkage.h>
 #include <linux/bits.h>
@@ -310,3 +311,4 @@ skip_sti:
 
        retq
 SYM_FUNC_END(__tdx_hypercall)
+EXPORT_SYMBOL_GPL(__tdx_hypercall);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 09/11] x86/tdx: Add MSR support for TDX guest
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (7 preceding siblings ...)
  2021-09-03 17:28 ` [PATCH v6 08/11] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-03 17:28 ` [PATCH v6 10/11] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
  2021-09-03 17:28 ` [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
  10 siblings, 0 replies; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Operations on context-switched MSRs can be run natively. The rest of
MSRs should be handled through TDVMCALLs.

TDVMCALL[Instruction.RDMSR] and TDVMCALL[Instruction.WRMSR] provide
MSR oprations.

RDMSR and WRMSR specification details can be found in
Guest-Host-Communication Interface (GHCI) for Intel Trust Domain
Extensions (Intel TDX) specification, sec 3.10, 3.11.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v5:
 * Renamed "tdg" prefix with "tdx".
 * Added comments for _tdx_hypercall() usage in MSR read/write functions.

Change since v4:
 * Removed You usage from commit log.

Changes since v3:
 * None

 arch/x86/kernel/tdx.c | 77 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 75 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index ec1008cc42d9..5c52dde4a5fd 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -91,6 +91,65 @@ static __cpuidle void tdx_safe_halt(void)
 	_tdx_halt(irq_disabled, do_sti);
 }
 
+static bool tdx_is_context_switched_msr(unsigned int msr)
+{
+	switch (msr) {
+	case MSR_EFER:
+	case MSR_IA32_CR_PAT:
+	case MSR_FS_BASE:
+	case MSR_GS_BASE:
+	case MSR_KERNEL_GS_BASE:
+	case MSR_IA32_SYSENTER_CS:
+	case MSR_IA32_SYSENTER_EIP:
+	case MSR_IA32_SYSENTER_ESP:
+	case MSR_STAR:
+	case MSR_LSTAR:
+	case MSR_SYSCALL_MASK:
+	case MSR_IA32_XSS:
+	case MSR_TSC_AUX:
+	case MSR_IA32_BNDCFGS:
+		return true;
+	}
+	return false;
+}
+
+static u64 tdx_read_msr_safe(unsigned int msr, int *err)
+{
+	struct tdx_hypercall_output out = {0};
+	u64 ret;
+
+	WARN_ON_ONCE(tdx_is_context_switched_msr(msr));
+
+	/*
+	 * Emulate the MSR read via hypercall. More info about ABI
+	 * can be found in TDX Guest-Host-Communication Interface
+	 * (GHCI), sec 3.10.
+	 */
+	ret = _tdx_hypercall(EXIT_REASON_MSR_READ, msr, 0, 0, 0, &out);
+
+	*err = ret ? -EIO : 0;
+
+	return out.r11;
+}
+
+static int tdx_write_msr_safe(unsigned int msr, unsigned int low,
+			      unsigned int high)
+{
+	u64 ret;
+
+	WARN_ON_ONCE(tdx_is_context_switched_msr(msr));
+
+	/*
+	 * Emulate the MSR write via hypercall. More info about ABI
+	 * can be found in TDX Guest-Host-Communication Interface
+	 * (GHCI) sec 3.11.
+	 */
+	ret = _tdx_hypercall(EXIT_REASON_MSR_WRITE, msr, (u64)high << 32 | low,
+			     0, 0, NULL);
+
+	return ret ? -EIO : 0;
+}
+
 unsigned long tdx_get_ve_info(struct ve_info *ve)
 {
 	struct tdx_module_output out = {0};
@@ -117,19 +176,33 @@ unsigned long tdx_get_ve_info(struct ve_info *ve)
 int tdx_handle_virtualization_exception(struct pt_regs *regs,
 					struct ve_info *ve)
 {
+	unsigned long val;
+	int ret = 0;
+
 	switch (ve->exit_reason) {
 	case EXIT_REASON_HLT:
 		tdx_halt();
 		break;
+	case EXIT_REASON_MSR_READ:
+		val = tdx_read_msr_safe(regs->cx, (unsigned int *)&ret);
+		if (!ret) {
+			regs->ax = (u32)val;
+			regs->dx = val >> 32;
+		}
+		break;
+	case EXIT_REASON_MSR_WRITE:
+		ret = tdx_write_msr_safe(regs->cx, regs->ax, regs->dx);
+		break;
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
 		return -EFAULT;
 	}
 
 	/* After successful #VE handling, move the IP */
-	regs->ip += ve->instr_len;
+	if (!ret)
+		regs->ip += ve->instr_len;
 
-	return 0;
+	return ret;
 }
 
 void __init tdx_early_init(void)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 10/11] x86/tdx: Don't write CSTAR MSR on Intel
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (8 preceding siblings ...)
  2021-09-03 17:28 ` [PATCH v6 09/11] x86/tdx: Add MSR support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-03 17:28 ` [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
  10 siblings, 0 replies; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: Andi Kleen <ak@linux.intel.com>

On Intel CPUs writing the CSTAR MSR is not really needed. Syscalls
from 32bit work using SYSENTER and 32bit SYSCALL is an illegal opcode.
But the kernel did write it anyways even though it was ignored by
the CPU. Inside a TDX guest this actually leads to a #VE which in
turn will trigger ve_raise_fault() due to failed MSR write. Inside
ve_raise_fault() before it recovers from this error, it prints an
ugly message at boot. Since such warning message is pointless for
CSTAR MSR write failure, add exception to skip CSTAR msr write on
Intel CPUs.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---

Changes since v5:
 * Fixed commit log as per review comments.

 arch/x86/kernel/cpu/common.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 64b805bd6a54..d936f0e4ec51 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1752,7 +1752,13 @@ void syscall_init(void)
 	wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
 
 #ifdef CONFIG_IA32_EMULATION
-	wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
+	/*
+	 * CSTAR is not needed on Intel because it doesn't support
+	 * 32bit SYSCALL, but only SYSENTER. On a TDX guest
+	 * it leads to a #GP.
+	 */
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
 	/*
 	 * This only works on Intel CPUs.
 	 * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
@@ -1764,7 +1770,8 @@ void syscall_init(void)
 		    (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
 	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
 #else
-	wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret);
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret);
 	wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
 	wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
 	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE
  2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (9 preceding siblings ...)
  2021-09-03 17:28 ` [PATCH v6 10/11] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
@ 2021-09-03 17:28 ` Kuppuswamy Sathyanarayanan
  2021-09-03 18:35   ` Dave Hansen
  10 siblings, 1 reply; 35+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-09-03 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

TDX has three classes of CPUID leaves: some CPUID leaves are always
handled by the CPU, others are handled by the TDX module, and some
others are handled by the VMM. Since the VMM cannot directly intercept
the instruction these are reflected with a #VE exception to the guest,
which then converts it into a hypercall to the VMM, or handled
directly.

The TDX module specification [1], sec 16.2 has a full list of CPUID
leaves which are handled natively or by the TDX module. Only unknown
CPUIDs are handled by the #VE method. In practice this typically only
applies to the hypervisor-specific CPUIDs unknown to the native CPU.

Therefore there is no risk of causing this in early CPUID code which
runs before the #VE handler is set up because it will never access
those exotic CPUID leaves.

[1] - https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1.0-public-spec-v0.931.pdf

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v5:
 * Fixed commit log as per review comments.
 * Removed WARN_ON() in tdx_handle_cpuid().
 * Renamed "tdg" prefix with "tdx".

Changes since v4:
 * None

Changes since v3:
 * None
 arch/x86/kernel/tdx.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 5c52dde4a5fd..c65c117aff5f 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -150,6 +150,21 @@ static int tdx_write_msr_safe(unsigned int msr, unsigned int low,
 	return ret ? -EIO : 0;
 }
 
+static u64 tdx_handle_cpuid(struct pt_regs *regs)
+{
+	struct tdx_hypercall_output out = {0};
+	u64 ret;
+
+	ret = _tdx_hypercall(EXIT_REASON_CPUID, regs->ax, regs->cx, 0, 0, &out);
+
+	regs->ax = out.r12;
+	regs->bx = out.r13;
+	regs->cx = out.r14;
+	regs->dx = out.r15;
+
+	return ret;
+}
+
 unsigned long tdx_get_ve_info(struct ve_info *ve)
 {
 	struct tdx_module_output out = {0};
@@ -193,6 +208,9 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
 	case EXIT_REASON_MSR_WRITE:
 		ret = tdx_write_msr_safe(regs->cx, regs->ax, regs->dx);
 		break;
+	case EXIT_REASON_CPUID:
+		ret = tdx_handle_cpuid(regs);
+		break;
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
 		return -EFAULT;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-09-03 17:28 ` [PATCH v6 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
@ 2021-09-03 18:22   ` Borislav Petkov
  2021-09-03 19:03     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 35+ messages in thread
From: Borislav Petkov @ 2021-09-03 18:22 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini, Juergen Gross,
	Deep Shah, VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov, Sean Christopherson,
	Kuppuswamy Sathyanarayanan, linux-kernel

On Fri, Sep 03, 2021 at 10:28:02AM -0700, Kuppuswamy Sathyanarayanan wrote:
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index c5ce9845c999..ddc77c95adc6 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -59,27 +59,8 @@ static inline __cpuidle void native_halt(void)
>  
>  #endif
>  
> -#ifdef CONFIG_PARAVIRT_XXL
> -#include <asm/paravirt.h>
> -#else
> +#ifndef CONFIG_PARAVIRT
>  #ifndef __ASSEMBLY__
> -#include <linux/types.h>
> -
> -static __always_inline unsigned long arch_local_save_flags(void)
> -{
> -	return native_save_fl();
> -}
> -
> -static __always_inline void arch_local_irq_disable(void)
> -{
> -	native_irq_disable();
> -}
> -
> -static __always_inline void arch_local_irq_enable(void)
> -{
> -	native_irq_enable();
> -}
> -
>  /*
>   * Used in the idle loop; sti takes one instruction cycle
>   * to complete:
> @@ -97,6 +78,33 @@ static inline __cpuidle void halt(void)
>  {
>  	native_halt();
>  }
> +#endif /* __ASSEMBLY__ */
> +#endif /* CONFIG_PARAVIRT */
> +
> +#ifdef CONFIG_PARAVIRT
> +#ifndef __ASSEMBLY__
> +#include <asm/paravirt.h>
> +#endif /* __ASSEMBLY__ */
> +#endif /* CONFIG_PARAVIRT */

I think the way we write those is like this:

#ifdef CONFIG_PARAVIRT

# ifndef __ASSEMBLY__
# include <asm/paravirt.h>
# endif

#else /* ! CONFIG_PARAVIRT */

# ifndef __ASSEMBLY__
/*
 * Used in the idle loop; sti takes one instruction cycle
 * to complete:
 */
static inline __cpuidle void arch_safe_halt(void)
{
	native_safe_halt();
}

/*
 * Used when interrupts are already enabled or to
 * shutdown the processor:
 */
static inline __cpuidle void halt(void)
{
	native_halt();
}
# endif /* __ASSEMBLY__ */

#endif /* CONFIG_PARAVIRT */

Note the empty space after the '#' of the inner ifdef to show that it is
an inner one.

Also, this header has clearly too many #if*def __ASSEMBLY__ things
sprinkled around. Lemme see if I can get rid of them so that it is at
least a bit readable.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE
  2021-09-03 17:28 ` [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
@ 2021-09-03 18:35   ` Dave Hansen
  2021-09-03 19:14     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Hansen @ 2021-09-03 18:35 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel
  Cc: Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On 9/3/21 10:28 AM, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> TDX has three classes of CPUID leaves: some CPUID leaves are always
> handled by the CPU, others are handled by the TDX module, and some
> others are handled by the VMM. Since the VMM cannot directly intercept
> the instruction these are reflected with a #VE exception to the guest,
> which then converts it into a hypercall to the VMM, or handled
> directly.

Does this patch do any of the "handled directly" leaves?  If not, why
mention it?

It would also be nice to mention that this applies to both kernel and
userspace use of CPUID.  It talks a bit about early kernel use, which
makes it seem like this is kernel-only.

I also think it's a mistake to talk about TDX-module handling.  For
*this* patch, it doesn't matter.

Here's a reformatted replacement changelog:

--

When running virtualized, the CPUID instruction is handled differently
based on the leaf being accessed.  The behavior depends only on on the
leaf and applies equally to both kernel/ring-0 and userspace/ring-3
execution of CPUID. Historically, there are two basic classes:

 * Leaves handled transparently to the guest
 * Leaves handled by the VMM

In a typical guest without TDX, "handled by the VMM" leaves cause a
VMEXIT.  TDX replaces these VMEXITs with a #VE exception in the guest.
The guest typically handles the #VE by making a hypercall to the VMM.

The TDX module spec talks about a few more classes of CPUID handling.
But, for the purposes of this patch, the "handled transparently" CPUID
leaves are all lumped together because the guest handling is the same.

--

> The TDX module specification [1], sec 16.2 has a full list of CPUID

				     ^ I think we can spare the extra four bytes to make "sec" ->
"section".

I also opened up the pdf from [1] an searched for "16.2". I found:

	16.2. Branch Prediction Side Channel Attacks Mitigation
	Mechanisms

There is, however, a:

	18.2. CPUID Virtualization

section.  Did you, perhaps, mean to reference that instead?

Which kinda argues for not using these section numbers at *all*.
Perhaps you should just mention the section titles, as they're obviously
less volatile.  That's probably a comment that applies to *ALL* of your
changelogs across *ALL* TDX patches.  This just proves that the section
numbers are worthless.

> leaves which are handled natively or by the TDX module. Only unknown

Just in terms of nice writing, it would be great to use the same
language when you refer to the same concept.  Earlier you called this
"handled by the CPU".  But, here you refer to it as being "handled
natively".  Neither is wrong, but this puts a burden on the reader to
make a connection rather than doing it for them as the writer.

> CPUIDs are handled by the #VE method. In practice this typically only
> applies to the hypervisor-specific CPUIDs unknown to the native CPU.
> 
> Therefore there is no risk of causing this in early CPUID code which
> runs before the #VE handler is set up because it will never access
> those exotic CPUID leaves.

This never actually makes a transition from background to telling what
the patch does.  I think this needs at least this:

	Allow the the #VE code to handle any CPUID leaves which cause a
	#VE.  Unconditionally make a TDCALL to the VMM.

> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 5c52dde4a5fd..c65c117aff5f 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -150,6 +150,21 @@ static int tdx_write_msr_safe(unsigned int msr, unsigned int low,
>  	return ret ? -EIO : 0;
>  }
>  
> +static u64 tdx_handle_cpuid(struct pt_regs *regs)
> +{
> +	struct tdx_hypercall_output out = {0};
> +	u64 ret;
> +
> +	ret = _tdx_hypercall(EXIT_REASON_CPUID, regs->ax, regs->cx, 0, 0, &out);
> +
> +	regs->ax = out.r12;
> +	regs->bx = out.r13;
> +	regs->cx = out.r14;
> +	regs->dx = out.r15;

This probably needs a comment about why this is shuffling registers
around like this.

> +	return ret;
> +}
> +
>  unsigned long tdx_get_ve_info(struct ve_info *ve)
>  {
>  	struct tdx_module_output out = {0};
> @@ -193,6 +208,9 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
>  	case EXIT_REASON_MSR_WRITE:
>  		ret = tdx_write_msr_safe(regs->cx, regs->ax, regs->dx);
>  		break;
> +	case EXIT_REASON_CPUID:
> +		ret = tdx_handle_cpuid(regs);
> +		break;
>  	default:
>  		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
>  		return -EFAULT;
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-09-03 18:22   ` Borislav Petkov
@ 2021-09-03 19:03     ` Kuppuswamy, Sathyanarayanan
  2021-09-03 20:33       ` Borislav Petkov
  0 siblings, 1 reply; 35+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-09-03 19:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini, Juergen Gross,
	Deep Shah, VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov, Sean Christopherson,
	Kuppuswamy Sathyanarayanan, linux-kernel



On 9/3/21 11:22 AM, Borislav Petkov wrote:
> On Fri, Sep 03, 2021 at 10:28:02AM -0700, Kuppuswamy Sathyanarayanan wrote:
>> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
>> index c5ce9845c999..ddc77c95adc6 100644
>> --- a/arch/x86/include/asm/irqflags.h
>> +++ b/arch/x86/include/asm/irqflags.h
>> @@ -59,27 +59,8 @@ static inline __cpuidle void native_halt(void)
>>   
>>   #endif
>>   
>> -#ifdef CONFIG_PARAVIRT_XXL
>> -#include <asm/paravirt.h>
>> -#else
>> +#ifndef CONFIG_PARAVIRT
>>   #ifndef __ASSEMBLY__
>> -#include <linux/types.h>
>> -
>> -static __always_inline unsigned long arch_local_save_flags(void)
>> -{
>> -	return native_save_fl();
>> -}
>> -
>> -static __always_inline void arch_local_irq_disable(void)
>> -{
>> -	native_irq_disable();
>> -}
>> -
>> -static __always_inline void arch_local_irq_enable(void)
>> -{
>> -	native_irq_enable();
>> -}
>> -
>>   /*
>>    * Used in the idle loop; sti takes one instruction cycle
>>    * to complete:
>> @@ -97,6 +78,33 @@ static inline __cpuidle void halt(void)
>>   {
>>   	native_halt();
>>   }
>> +#endif /* __ASSEMBLY__ */
>> +#endif /* CONFIG_PARAVIRT */
>> +
>> +#ifdef CONFIG_PARAVIRT
>> +#ifndef __ASSEMBLY__
>> +#include <asm/paravirt.h>
>> +#endif /* __ASSEMBLY__ */
>> +#endif /* CONFIG_PARAVIRT */
> 
> I think the way we write those is like this:
> 
> #ifdef CONFIG_PARAVIRT
> 
> # ifndef __ASSEMBLY__
> # include <asm/paravirt.h>
> # endif
> 
> #else /* ! CONFIG_PARAVIRT */
> 
> # ifndef __ASSEMBLY__
> /*
>   * Used in the idle loop; sti takes one instruction cycle
>   * to complete:
>   */
> static inline __cpuidle void arch_safe_halt(void)
> {
> 	native_safe_halt();
> }
> 
> /*
>   * Used when interrupts are already enabled or to
>   * shutdown the processor:
>   */
> static inline __cpuidle void halt(void)
> {
> 	native_halt();
> }
> # endif /* __ASSEMBLY__ */
> 
> #endif /* CONFIG_PARAVIRT */

Yes, I can combine the #ifdef as you have mentioned. I can fix this
in next version.

> 
> Note the empty space after the '#' of the inner ifdef to show that it is
> an inner one.

Is this new convention? #ifdefs used in this file does not follow this
format.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE
  2021-09-03 18:35   ` Dave Hansen
@ 2021-09-03 19:14     ` Kuppuswamy, Sathyanarayanan
  2021-09-03 23:43       ` Sean Christopherson
  0 siblings, 1 reply; 35+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-09-03 19:14 UTC (permalink / raw)
  To: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel



On 9/3/21 11:35 AM, Dave Hansen wrote:
> On 9/3/21 10:28 AM, Kuppuswamy Sathyanarayanan wrote:
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>
>> TDX has three classes of CPUID leaves: some CPUID leaves are always
>> handled by the CPU, others are handled by the TDX module, and some
>> others are handled by the VMM. Since the VMM cannot directly intercept
>> the instruction these are reflected with a #VE exception to the guest,
>> which then converts it into a hypercall to the VMM, or handled
>> directly.
> 
> Does this patch do any of the "handled directly" leaves?  If not, why
> mention it?

It was added to give more information about CPUID leaves handling. Since
it has nothing to do with this patch, I can remove it.

> 
> It would also be nice to mention that this applies to both kernel and
> userspace use of CPUID.  It talks a bit about early kernel use, which
> makes it seem like this is kernel-only.
> 
> I also think it's a mistake to talk about TDX-module handling.  For
> *this* patch, it doesn't matter.
> 
> Here's a reformatted replacement changelog:
> 
> --
> 
> When running virtualized, the CPUID instruction is handled differently
> based on the leaf being accessed.  The behavior depends only on on the
> leaf and applies equally to both kernel/ring-0 and userspace/ring-3
> execution of CPUID. Historically, there are two basic classes:
> 
>   * Leaves handled transparently to the guest
>   * Leaves handled by the VMM
> 
> In a typical guest without TDX, "handled by the VMM" leaves cause a
> VMEXIT.  TDX replaces these VMEXITs with a #VE exception in the guest.
> The guest typically handles the #VE by making a hypercall to the VMM.
> 
> The TDX module spec talks about a few more classes of CPUID handling.
> But, for the purposes of this patch, the "handled transparently" CPUID
> leaves are all lumped together because the guest handling is the same.
> 
> --

Thanks. I will use above commit log in next version.

> 
>> The TDX module specification [1], sec 16.2 has a full list of CPUID
> 
> 				     ^ I think we can spare the extra four bytes to make "sec" ->
> "section".
> 
> I also opened up the pdf from [1] an searched for "16.2". I found:
> 
> 	16.2. Branch Prediction Side Channel Attacks Mitigation
> 	Mechanisms
> 
> There is, however, a:
> 
> 	18.2. CPUID Virtualization
> 
> section.  Did you, perhaps, mean to reference that instead?

It looks like I have been using previous version of the TDX module spec
(Sep 2020). In the newer version, it is changed to 18.2.

To avoid confusion I will use the section title for reference.
> 
> Which kinda argues for not using these section numbers at *all*.
> Perhaps you should just mention the section titles, as they're obviously
> less volatile.  That's probably a comment that applies to *ALL* of your
> changelogs across *ALL* TDX patches.  This just proves that the section
> numbers are worthless.

Makes sense. I will fix it in all TDX patch series.

> 
>> leaves which are handled natively or by the TDX module. Only unknown
> 
> Just in terms of nice writing, it would be great to use the same
> language when you refer to the same concept.  Earlier you called this
> "handled by the CPU".  But, here you refer to it as being "handled
> natively".  Neither is wrong, but this puts a burden on the reader to
> make a connection rather than doing it for them as the writer.

Ok. I will keep this in mind for future submissions. For this patch
your commit log refactor fixes this issue.

> 
>> CPUIDs are handled by the #VE method. In practice this typically only
>> applies to the hypervisor-specific CPUIDs unknown to the native CPU.
>>
>> Therefore there is no risk of causing this in early CPUID code which
>> runs before the #VE handler is set up because it will never access
>> those exotic CPUID leaves.
> 
> This never actually makes a transition from background to telling what
> the patch does.  I think this needs at least this:
> 
> 	Allow the the #VE code to handle any CPUID leaves which cause a
> 	#VE.  Unconditionally make a TDCALL to the VMM.
> 
>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>> index 5c52dde4a5fd..c65c117aff5f 100644
>> --- a/arch/x86/kernel/tdx.c
>> +++ b/arch/x86/kernel/tdx.c
>> @@ -150,6 +150,21 @@ static int tdx_write_msr_safe(unsigned int msr, unsigned int low,
>>   	return ret ? -EIO : 0;
>>   }
>>   
>> +static u64 tdx_handle_cpuid(struct pt_regs *regs)
>> +{
>> +	struct tdx_hypercall_output out = {0};
>> +	u64 ret;
>> +
>> +	ret = _tdx_hypercall(EXIT_REASON_CPUID, regs->ax, regs->cx, 0, 0, &out);
>> +
>> +	regs->ax = out.r12;
>> +	regs->bx = out.r13;
>> +	regs->cx = out.r14;
>> +	regs->dx = out.r15;
> 
> This probably needs a comment about why this is shuffling registers
> around like this.

I will add the ABI details here and also spell out what we are getting
in R12-R15 registers.


> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-09-03 19:03     ` Kuppuswamy, Sathyanarayanan
@ 2021-09-03 20:33       ` Borislav Petkov
  0 siblings, 0 replies; 35+ messages in thread
From: Borislav Petkov @ 2021-09-03 20:33 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini, Juergen Gross,
	Deep Shah, VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov, Sean Christopherson,
	Kuppuswamy Sathyanarayanan, linux-kernel

On Fri, Sep 03, 2021 at 12:03:56PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Is this new convention?

No, not new:

git grep -E "^#\s+define" *.h

> #ifdefs used in this file does not follow this format.

That's why we fix them so that they do.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE
  2021-09-03 19:14     ` Kuppuswamy, Sathyanarayanan
@ 2021-09-03 23:43       ` Sean Christopherson
  2021-09-03 23:54         ` Andi Kleen
  0 siblings, 1 reply; 35+ messages in thread
From: Sean Christopherson @ 2021-09-03 23:43 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Kuppuswamy Sathyanarayanan, linux-kernel

On Fri, Sep 03, 2021, Kuppuswamy, Sathyanarayanan wrote:
> 
> 
> On 9/3/21 11:35 AM, Dave Hansen wrote:
> > On 9/3/21 10:28 AM, Kuppuswamy Sathyanarayanan wrote:
> > > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > > 
> > > TDX has three classes of CPUID leaves: some CPUID leaves are always
> > > handled by the CPU, others are handled by the TDX module, and some
> > > others are handled by the VMM. Since the VMM cannot directly intercept
> > > the instruction these are reflected with a #VE exception to the guest,
> > > which then converts it into a hypercall to the VMM, or handled
> > > directly.
> > 
> > Does this patch do any of the "handled directly" leaves?  If not, why
> > mention it?
> 
> It was added to give more information about CPUID leaves handling. Since
> it has nothing to do with this patch, I can remove it.

What leaves are "always handled by the CPU"?  VTx doesn't allow disabling
CPUID exiting, let alone conditionally exiting on a specific CPUID leaf.  I don't
see anything in the TDX specs that suggests that's any different in SEAM non-root
mode.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE
  2021-09-03 23:43       ` Sean Christopherson
@ 2021-09-03 23:54         ` Andi Kleen
  2021-09-04  0:00           ` Dave Hansen
  0 siblings, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2021-09-03 23:54 UTC (permalink / raw)
  To: Sean Christopherson, Kuppuswamy, Sathyanarayanan
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Peter H Anvin, Tony Luck, Dan Williams, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, linux-kernel


On 9/3/2021 4:43 PM, Sean Christopherson wrote:
> On Fri, Sep 03, 2021, Kuppuswamy, Sathyanarayanan wrote:
>>
>> On 9/3/21 11:35 AM, Dave Hansen wrote:
>>> On 9/3/21 10:28 AM, Kuppuswamy Sathyanarayanan wrote:
>>>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>>>
>>>> TDX has three classes of CPUID leaves: some CPUID leaves are always
>>>> handled by the CPU, others are handled by the TDX module, and some
>>>> others are handled by the VMM. Since the VMM cannot directly intercept
>>>> the instruction these are reflected with a #VE exception to the guest,
>>>> which then converts it into a hypercall to the VMM, or handled
>>>> directly.
>>> Does this patch do any of the "handled directly" leaves?  If not, why
>>> mention it?
>> It was added to give more information about CPUID leaves handling. Since
>> it has nothing to do with this patch, I can remove it.
> What leaves are "always handled by the CPU"?  VTx doesn't allow disabling
> CPUID exiting, let alone conditionally exiting on a specific CPUID leaf.  I don't
> see anything in the TDX specs that suggests that's any different in SEAM non-root
> mode.

It means they are handled by the TDX module, but always have the same 
contents as a native CPU would.

As opposed to leaves that are modified by the TDX module.

-Andi


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE
  2021-09-03 23:54         ` Andi Kleen
@ 2021-09-04  0:00           ` Dave Hansen
  2021-09-04  0:05             ` Andi Kleen
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Hansen @ 2021-09-04  0:00 UTC (permalink / raw)
  To: Andi Kleen, Sean Christopherson, Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Peter H Anvin, Tony Luck, Dan Williams, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, linux-kernel

On 9/3/21 4:54 PM, Andi Kleen wrote:
> It means they are handled by the TDX module, but always have the same
> contents as a native CPU would.
> 
> As opposed to leaves that are modified by the TDX module.

Is that distinction important for this patch?

Or, should we stick to the two-class taxonomy (#VE-inducing and not)
that I suggested in my replacement changelog?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE
  2021-09-04  0:00           ` Dave Hansen
@ 2021-09-04  0:05             ` Andi Kleen
  0 siblings, 0 replies; 35+ messages in thread
From: Andi Kleen @ 2021-09-04  0:05 UTC (permalink / raw)
  To: Dave Hansen, Sean Christopherson, Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Peter H Anvin, Tony Luck, Dan Williams, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, linux-kernel


On 9/3/2021 5:00 PM, Dave Hansen wrote:
> On 9/3/21 4:54 PM, Andi Kleen wrote:
>> It means they are handled by the TDX module, but always have the same
>> contents as a native CPU would.
>>
>> As opposed to leaves that are modified by the TDX module.
> Is that distinction important for this patch?
It's not.
>
> Or, should we stick to the two-class taxonomy (#VE-inducing and not)
> that I suggested in my replacement changelog?


Seems fine.

-Andi


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option
  2021-09-03 17:28 ` [PATCH v6 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
@ 2021-09-28 11:37   ` Joerg Roedel
  2021-09-28 12:52     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2021-09-28 11:37 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Sep 03, 2021 at 10:28:03AM -0700, Kuppuswamy Sathyanarayanan wrote:
> +config INTEL_TDX_GUEST
> +	bool "Intel Trusted Domain eXtensions Guest Support"
> +	depends on X86_64 && CPU_SUP_INTEL && PARAVIRT
> +	depends on SECURITY
> +	select X86_X2APIC
> +	select SECURITY_LOCKDOWN_LSM
> +	help
> +	  Provide support for running in a trusted domain on Intel processors
> +	  equipped with Trusted Domain eXtensions. TDX is a new Intel

Nit: the word 'new' in this description will not age well, you should
consider removing it.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-09-03 17:28 ` [PATCH v6 03/11] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
@ 2021-09-28 11:39   ` Joerg Roedel
  2021-09-28 12:53     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2021-09-28 11:39 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Sep 03, 2021 at 10:28:04AM -0700, Kuppuswamy Sathyanarayanan wrote:
> +++ b/arch/x86/kernel/tdx.c
> @@ -0,0 +1,29 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2020 Intel Corporation */
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt)     "x86/tdx: " fmt

The 'x86' is implicit already and don't need to be in the printk prefix.
How about just 'tdx' or 'intel-tdx' instead?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 04/11] x86/tdx: Add protected guest support for TDX guest
  2021-09-03 17:28 ` [PATCH v6 04/11] x86/tdx: Add protected guest support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-09-28 11:46   ` Joerg Roedel
  2021-09-28 12:56     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2021-09-28 11:46 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Sep 03, 2021 at 10:28:05AM -0700, Kuppuswamy Sathyanarayanan wrote:
>  static inline bool prot_guest_has(unsigned int attr)
>  {
>  	if (sme_me_mask)
>  		return amd_prot_guest_has(attr);
> +	else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> +		return intel_prot_guest_has(attr);

This causes a function call on every Intel machine this code runs. is
there an easier to check whether TDX is enabled, like the sme_me_mask
check on AMD?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest
  2021-09-03 17:28 ` [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-09-28 12:16   ` Joerg Roedel
  2021-09-28 14:05     ` Dave Hansen
  2021-09-28 15:23   ` Dave Hansen
  1 sibling, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2021-09-28 12:16 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Sep 03, 2021 at 10:28:07AM -0700, Kuppuswamy Sathyanarayanan wrote:
> In the settings that Linux will run in, virtual exceptions are never
> generated on accesses to normal, TD-private memory that has been
> accepted.

Does this also hold true when the Hypervisor does unexpected things that
cause previously accepted pages to become unaccepted again? This means
pages like the entry code pages or other memory that is touched before
the syscall entry path switched stacks.

Can you sched some light on what happens in such a situation?

> +DEFINE_IDTENTRY(exc_virtualization_exception)
> +{
> +	struct ve_info ve;
> +	int ret;
> +
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +	/*
> +	 * NMIs/Machine-checks/Interrupts will be in a disabled state
> +	 * till TDGETVEINFO TDCALL is executed. This prevents #VE
> +	 * nesting issue.
> +	 */
> +	ret = tdx_get_ve_info(&ve);
> +
> +	cond_local_irq_enable(regs);

Potentially enabling IRQs here means that TDX does not have a shared
per-cpu data structure (like the GHCB on AMD). Is that right?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option
  2021-09-28 11:37   ` Joerg Roedel
@ 2021-09-28 12:52     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 35+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-09-28 12:52 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel



On 9/28/21 4:37 AM, Joerg Roedel wrote:
> Nit: the word 'new' in this description will not age well, you should
> consider removing it.

Yes. I will remove it.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-09-28 11:39   ` Joerg Roedel
@ 2021-09-28 12:53     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 35+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-09-28 12:53 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel



On 9/28/21 4:39 AM, Joerg Roedel wrote:
> On Fri, Sep 03, 2021 at 10:28:04AM -0700, Kuppuswamy Sathyanarayanan wrote:
>> +++ b/arch/x86/kernel/tdx.c
>> @@ -0,0 +1,29 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (C) 2020 Intel Corporation */
>> +
>> +#undef pr_fmt
>> +#define pr_fmt(fmt)     "x86/tdx: " fmt
> 
> The 'x86' is implicit already and don't need to be in the printk prefix.
> How about just 'tdx' or 'intel-tdx' instead?

I am fine with tdx. I will change it in next version.

> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 04/11] x86/tdx: Add protected guest support for TDX guest
  2021-09-28 11:46   ` Joerg Roedel
@ 2021-09-28 12:56     ` Kuppuswamy, Sathyanarayanan
  2021-09-28 13:04       ` Joerg Roedel
  0 siblings, 1 reply; 35+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-09-28 12:56 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel



On 9/28/21 4:46 AM, Joerg Roedel wrote:
> On Fri, Sep 03, 2021 at 10:28:05AM -0700, Kuppuswamy Sathyanarayanan wrote:
>>   static inline bool prot_guest_has(unsigned int attr)
>>   {
>>   	if (sme_me_mask)
>>   		return amd_prot_guest_has(attr);
>> +	else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
>> +		return intel_prot_guest_has(attr);
> 
> This causes a function call on every Intel machine this code runs. is
> there an easier to check whether TDX is enabled, like the sme_me_mask
> check on AMD?

This will only be called when CONFIG_ARCH_HAS_CC_PLATFORM is set by a platform.
So it won't be called for all platforms.

Also, intel_prot_guest_has() is a generic Intel platform branch call (so we can't
directly check for TDX here).

> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 04/11] x86/tdx: Add protected guest support for TDX guest
  2021-09-28 12:56     ` Kuppuswamy, Sathyanarayanan
@ 2021-09-28 13:04       ` Joerg Roedel
  2021-09-28 13:18         ` Borislav Petkov
  0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2021-09-28 13:04 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Tue, Sep 28, 2021 at 05:56:03AM -0700, Kuppuswamy, Sathyanarayanan wrote:
> This will only be called when CONFIG_ARCH_HAS_CC_PLATFORM is set by a platform.
> So it won't be called for all platforms.

Yes, but distributions will enable this in their default configs, so the
function call will happen on every Intel machine, even on bare-metal.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 04/11] x86/tdx: Add protected guest support for TDX guest
  2021-09-28 13:04       ` Joerg Roedel
@ 2021-09-28 13:18         ` Borislav Petkov
  0 siblings, 0 replies; 35+ messages in thread
From: Borislav Petkov @ 2021-09-28 13:18 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Tue, Sep 28, 2021 at 03:04:36PM +0200, Joerg Roedel wrote:
> Yes, but distributions will enable this in their default configs, so the
> function call will happen on every Intel machine, even on bare-metal.

The result of whether the guest is a TDX needs to be cached and queried
instead anyway:

https://lkml.kernel.org/r/YVL4ZUGhfsh1QfRX@zn.tnic

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest
  2021-09-28 12:16   ` Joerg Roedel
@ 2021-09-28 14:05     ` Dave Hansen
  2021-09-28 15:22       ` Joerg Roedel
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Hansen @ 2021-09-28 14:05 UTC (permalink / raw)
  To: Joerg Roedel, Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Peter H Anvin,
	Tony Luck, Dan Williams, Andi Kleen, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, linux-kernel

On 9/28/21 5:16 AM, Joerg Roedel wrote:
> On Fri, Sep 03, 2021 at 10:28:07AM -0700, Kuppuswamy Sathyanarayanan wrote:
>> In the settings that Linux will run in, virtual exceptions are never
>> generated on accesses to normal, TD-private memory that has been
>> accepted.
> 
> Does this also hold true when the Hypervisor does unexpected things that
> cause previously accepted pages to become unaccepted again? This means
> pages like the entry code pages or other memory that is touched before
> the syscall entry path switched stacks.
> 
> Can you sched some light on what happens in such a situation?

"Shared" pages can cause #VE's.  The guest must be careful not to touch
them in the syscall entry path, for example.  But, shared pages are
untrusted so they're not use for stacks.

"Private" pages can cause #VE's.  But, only *some* of them.  Before a
page is accepted, it is in the SEPT_PENDING and a reference would cause
a #VE.  But, after acceptance, page references either succeed or a TD
Exit and the hypervisor gets to handle the situation.

Basically, if the hypervisor does unexpected things, the hypervisor gets
to handle it.

The "Intel® TDX Module 1.0 Specification" from

https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html

has more details:

> 10.10.2. #VE Injection by the CPU due to EPT Violations
> #VE is enabled unconditionally for TDX non-root operation. The Intel TDX module sets the TD VMCS EPT-violation #VE
> VM-execution control to 1.
> For shared memory accesses (i.e., when GPA.SHARED == 1), as with legacy VMX, the VMM can choose which pages are
> eligible for #VE mutation based on the value of the Shared EPTE bit 63.
> For private memory accesses (GPA.SHARED == 0), an EPT Violation causes a TD Exit in most cases, except when the Secure
> EPT entry state is SEPT_PENDING (an exception to this is described in 10.11.1 below). The Intel TDX module sets the
> Secure EPT entry’s Suppress VE bit (63) to 0 if the entry’s state is SEPT_PENDING. It sets that bit to 1 for all other entry
> states.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest
  2021-09-28 14:05     ` Dave Hansen
@ 2021-09-28 15:22       ` Joerg Roedel
  2021-09-28 15:25         ` Dave Hansen
  0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2021-09-28 15:22 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Tue, Sep 28, 2021 at 07:05:40AM -0700, Dave Hansen wrote:
> "Shared" pages can cause #VE's.  The guest must be careful not to touch
> them in the syscall entry path, for example.  But, shared pages are
> untrusted so they're not use for stacks.
> 
> "Private" pages can cause #VE's.  But, only *some* of them.  Before a
> page is accepted, it is in the SEPT_PENDING and a reference would cause
> a #VE.  But, after acceptance, page references either succeed or a TD
> Exit and the hypervisor gets to handle the situation.

Okay, and there is no way for the VMM to replace an already accepted
page with a page in SEPT_PENDING state?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest
  2021-09-03 17:28 ` [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
  2021-09-28 12:16   ` Joerg Roedel
@ 2021-09-28 15:23   ` Dave Hansen
  2021-09-28 16:59     ` Kuppuswamy, Sathyanarayanan
  1 sibling, 1 reply; 35+ messages in thread
From: Dave Hansen @ 2021-09-28 15:23 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel
  Cc: Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On 9/3/21 10:28 AM, Kuppuswamy Sathyanarayanan wrote:
> Virtualization Exceptions (#VE) are delivered to TDX guests due to
> specific guest actions which may happen in either user space or the kernel:
> 
>  * Specific instructions (WBINVD, for example)
>  * Specific MSR accesses
>  * Specific CPUID leaf accesses
>  * Access to TD-shared memory, which includes MMIO
> 
> In the settings that Linux will run in, virtual exceptions are never

					    ^ virtualization

> generated on accesses to normal, TD-private memory that has been
> accepted.

We've gone over this at least half a dozen times.  Sathya, please add
this to your cover letter and also to the TDX documentation if it's not
there already:

In the settings that Linux will run in, virtualization exceptions are
never generated on accesses to normal kernel memory (see #VE on Memory
Access below).

...

== #VE on Memory Accesses ==

A TD guest is in control of whether its memory accesses are treated as
private or shared.  It selects the behavior with a bit in its page table
entries.

=== #VE on Shared Pages ===

Accesses to shared mappings can cause #VE's.  The hypervisor is in
control of when a #VE might occur, so the guest must be careful to only
reference shared pages when it is in a context that can safely handle a #VE.

However, shared mapping content can not be trusted since shared page
content is writable by the hypervisor.  This means that shared mappings
are never used for sensitive memory contents like stacks or kernel text.
 This means that the shared mapping property of inducing #VEs requires
essentially no special kernel handling in sensitive contexts like
syscall entry or NMIs.

=== #VE on Private Pages ===

Some accesses to private mappings may cause #VEs.  Before a mapping is
accepted (aka. in the SEPT_PENDING state), a reference would cause
a #VE.  But, after acceptance, references typically succeed.

The hypervisor can cause a private page reference to fail if it chooses
to move an accepted page to a "blocked" state.  However, if it does
this, a page access will not generate a #VE.  It will, instead, cause a
"TD Exit" where the hypervisor is required to handle the exception.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest
  2021-09-28 15:22       ` Joerg Roedel
@ 2021-09-28 15:25         ` Dave Hansen
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Hansen @ 2021-09-28 15:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On 9/28/21 8:22 AM, Joerg Roedel wrote:
> On Tue, Sep 28, 2021 at 07:05:40AM -0700, Dave Hansen wrote:
>> "Shared" pages can cause #VE's.  The guest must be careful not to touch
>> them in the syscall entry path, for example.  But, shared pages are
>> untrusted so they're not use for stacks.
>>
>> "Private" pages can cause #VE's.  But, only *some* of them.  Before a
>> page is accepted, it is in the SEPT_PENDING and a reference would cause
>> a #VE.  But, after acceptance, page references either succeed or a TD
>> Exit and the hypervisor gets to handle the situation.
> Okay, and there is no way for the VMM to replace an already accepted
> page with a page in SEPT_PENDING state?

I hope not.

There would be lots of other problems if that were permitted.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest
  2021-09-28 15:23   ` Dave Hansen
@ 2021-09-28 16:59     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 35+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-09-28 16:59 UTC (permalink / raw)
  To: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel



On 9/28/21 8:23 AM, Dave Hansen wrote:
> We've gone over this at least half a dozen times.  Sathya, please add
> this to your cover letter and also to the TDX documentation if it's not
> there already:

Sure. I will add it in next submission.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2021-09-28 16:59 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-03 17:28 [PATCH v6 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
2021-09-03 17:28 ` [PATCH v6 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
2021-09-03 18:22   ` Borislav Petkov
2021-09-03 19:03     ` Kuppuswamy, Sathyanarayanan
2021-09-03 20:33       ` Borislav Petkov
2021-09-03 17:28 ` [PATCH v6 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
2021-09-28 11:37   ` Joerg Roedel
2021-09-28 12:52     ` Kuppuswamy, Sathyanarayanan
2021-09-03 17:28 ` [PATCH v6 03/11] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
2021-09-28 11:39   ` Joerg Roedel
2021-09-28 12:53     ` Kuppuswamy, Sathyanarayanan
2021-09-03 17:28 ` [PATCH v6 04/11] x86/tdx: Add protected guest support for TDX guest Kuppuswamy Sathyanarayanan
2021-09-28 11:46   ` Joerg Roedel
2021-09-28 12:56     ` Kuppuswamy, Sathyanarayanan
2021-09-28 13:04       ` Joerg Roedel
2021-09-28 13:18         ` Borislav Petkov
2021-09-03 17:28 ` [PATCH v6 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
2021-09-03 17:28 ` [PATCH v6 06/11] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
2021-09-28 12:16   ` Joerg Roedel
2021-09-28 14:05     ` Dave Hansen
2021-09-28 15:22       ` Joerg Roedel
2021-09-28 15:25         ` Dave Hansen
2021-09-28 15:23   ` Dave Hansen
2021-09-28 16:59     ` Kuppuswamy, Sathyanarayanan
2021-09-03 17:28 ` [PATCH v6 07/11] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
2021-09-03 17:28 ` [PATCH v6 08/11] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
2021-09-03 17:28 ` [PATCH v6 09/11] x86/tdx: Add MSR support for TDX guest Kuppuswamy Sathyanarayanan
2021-09-03 17:28 ` [PATCH v6 10/11] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
2021-09-03 17:28 ` [PATCH v6 11/11] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
2021-09-03 18:35   ` Dave Hansen
2021-09-03 19:14     ` Kuppuswamy, Sathyanarayanan
2021-09-03 23:43       ` Sean Christopherson
2021-09-03 23:54         ` Andi Kleen
2021-09-04  0:00           ` Dave Hansen
2021-09-04  0:05             ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).