linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/12] Add TDX Guest Support (Initial support)
@ 2021-08-04 18:13 Kuppuswamy Sathyanarayanan
  2021-08-04 18:13 ` [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
                   ` (11 more replies)
  0 siblings, 12 replies; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

Hi All,

Intel's Trust Domain Extensions (TDX) protect guest VMs from malicious
hosts and some physical attacks. This series adds the basic TDX guest
infrastructure support (including #VE handler support, and #VE support
for halt and CPUID). This is just a subset of patches in the bare minimum
TDX support patch list which is required for supporting minimal
functional TDX guest. Other basic feature features like #VE support for
IO, MMIO, boot optimization fixes and shared-mm support will be submitted
in a separate patch set. To make reviewing easier we split it into smaller
series. This series alone is not necessarily fully functional.

Also, the host-side support patches, and support for advanced TD guest
features like attestation or debug-mode will be submitted at a later time.
Also, at this point it is not secure with some known holes in drivers, and
also hasn’t been fully audited and fuzzed yet.

TDX has a lot of similarities to SEV. It enhances confidentiality and
of guest memory and state (like registers) and includes a new exception
(#VE) for the same basic reasons as SEV-ES. Like SEV-SNP (not merged
yet), TDX limits the host's ability to effect changes in the guest
physical address space. With TDX the host cannot access the guest memory,
so various functionality that would normally be done in KVM has moved
into a (paravirtualized) guest. Partially this is done using the
Virtualization Exception (#VE) and partially with direct paravirtual hooks.

The TDX architecture also includes a new CPU mode called
Secure-Arbitration Mode (SEAM). The software (TDX module) running in this
mode arbitrates interactions between host and guest and implements many of
the guarantees of the TDX architecture.

Some of the key differences between TD and regular VM is,

1. Multi CPU bring-up is done using the ACPI MADT wake-up table.
2. A new #VE exception handler is added. The TDX module injects #VE exception
   to the guest TD in cases of instructions that need to be emulated, disallowed
   MSR accesses, etc.
3. By default memory is marked as private, and TD will selectively share it with
   VMM based on need.
   
Note that the kernel will also need to be hardened against low level inputs from
the now untrusted hosts. This will be done in follow on patches.

You can find TDX related documents in the following link.

https://software.intel.com/content/www/br/pt/develop/articles/intel-trust-domain-extensions.html

This patchset has dependency on protected guest changes submitted by Tom Lendacky

https://lore.kernel.org/patchwork/cover/1468760/

Changes since v4:
 * Added a patch that adds TDX guest exception for CSTAR MSR.
 * Rebased on top of Tom Lendacky's protected guest changes.
 * Rest of the change log is added per patch.

Changes since v3:
 * Moved generic protected guest changes from patch titled "x86:
   Introduce generic protected guest abstraction" into seperate
   patch outside this patchset. Also, TDX specific changes are
   moved to patch titled "x86/tdx: Add protected guest support
   for TDX guest"
 * Rebased on top of v5.14-rc1.
 * Rest of the change log is added per patch.

Changes since v1 (v2 is partial set submission):
 * Patch titled "x86/x86: Add early_is_tdx_guest() interface" is moved
   out of this series.
 * Rest of the change log is added per patch.

Andi Kleen (1):
  x86/tdx: Don't write CSTAR MSR on Intel

Kirill A. Shutemov (7):
  x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  x86/tdx: Get TD execution environment information via TDINFO
  x86/traps: Add #VE support for TDX guest
  x86/tdx: Add HLT support for TDX guest
  x86/tdx: Wire up KVM hypercalls
  x86/tdx: Add MSR support for TDX guest
  x86/tdx: Handle CPUID via #VE

Kuppuswamy Sathyanarayanan (4):
  x86/tdx: Introduce INTEL_TDX_GUEST config option
  x86/cpufeatures: Add TDX Guest CPU feature
  x86/tdx: Add protected guest support for TDX guest
  x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper
    functions

 arch/x86/Kconfig                       |  20 ++
 arch/x86/include/asm/asm-prototypes.h  |   4 +
 arch/x86/include/asm/cpufeatures.h     |   1 +
 arch/x86/include/asm/idtentry.h        |   4 +
 arch/x86/include/asm/irqflags.h        |  40 ++--
 arch/x86/include/asm/kvm_para.h        |  22 ++
 arch/x86/include/asm/paravirt.h        |  20 +-
 arch/x86/include/asm/paravirt_types.h  |   3 +-
 arch/x86/include/asm/protected_guest.h |   5 +
 arch/x86/include/asm/tdx.h             | 109 +++++++++
 arch/x86/kernel/Makefile               |   1 +
 arch/x86/kernel/asm-offsets.c          |  23 ++
 arch/x86/kernel/cpu/common.c           |  11 +-
 arch/x86/kernel/head64.c               |   3 +
 arch/x86/kernel/idt.c                  |   6 +
 arch/x86/kernel/paravirt.c             |   4 +-
 arch/x86/kernel/tdcall.S               | 313 +++++++++++++++++++++++++
 arch/x86/kernel/tdx.c                  | 242 +++++++++++++++++++
 arch/x86/kernel/traps.c                |  69 ++++++
 include/linux/protected_guest.h        |   3 +
 20 files changed, 870 insertions(+), 33 deletions(-)
 create mode 100644 arch/x86/include/asm/tdx.h
 create mode 100644 arch/x86/kernel/tdcall.S
 create mode 100644 arch/x86/kernel/tdx.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-12  7:18   ` Borislav Petkov
  2021-08-04 18:13 ` [PATCH v5 02/12] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

CONFIG_PARAVIRT_XXL is mainly defined/used by XEN PV guests. For
other VM guest types, features supported under CONFIG_PARAVIRT
are self sufficient. CONFIG_PARAVIRT mainly provides support for
TLB flush operations and time related operations.

For TDX guest as well, paravirt calls under CONFIG_PARVIRT meets
most of its requirement except the need of HLT and SAFE_HLT
paravirt calls, which is currently defined under
COFNIG_PARAVIRT_XXL.

Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest
like platforms, move HLT and SAFE_HLT paravirt calls under
CONFIG_PARAVIRT.

Moving HLT and SAFE_HLT paravirt calls are not fatal and should not
break any functionality for current users of CONFIG_PARAVIRT.

Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---

Changes since v4:
 * None.

 arch/x86/include/asm/irqflags.h       | 40 +++++++++++++++------------
 arch/x86/include/asm/paravirt.h       | 20 +++++++-------
 arch/x86/include/asm/paravirt_types.h |  3 +-
 arch/x86/kernel/paravirt.c            |  4 ++-
 4 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index c5ce9845c999..f3bb33b1715d 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -59,6 +59,28 @@ static inline __cpuidle void native_halt(void)
 
 #endif
 
+#ifndef CONFIG_PARAVIRT
+#ifndef __ASSEMBLY__
+/*
+ * Used in the idle loop; sti takes one instruction cycle
+ * to complete:
+ */
+static inline __cpuidle void arch_safe_halt(void)
+{
+	native_safe_halt();
+}
+
+/*
+ * Used when interrupts are already enabled or to
+ * shutdown the processor:
+ */
+static inline __cpuidle void halt(void)
+{
+	native_halt();
+}
+#endif /* __ASSEMBLY__ */
+#endif /* CONFIG_PARAVIRT */
+
 #ifdef CONFIG_PARAVIRT_XXL
 #include <asm/paravirt.h>
 #else
@@ -80,24 +102,6 @@ static __always_inline void arch_local_irq_enable(void)
 	native_irq_enable();
 }
 
-/*
- * Used in the idle loop; sti takes one instruction cycle
- * to complete:
- */
-static inline __cpuidle void arch_safe_halt(void)
-{
-	native_safe_halt();
-}
-
-/*
- * Used when interrupts are already enabled or to
- * shutdown the processor:
- */
-static inline __cpuidle void halt(void)
-{
-	native_halt();
-}
-
 /*
  * For spinlocks, etc:
  */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index da3a1ac82be5..d323a626c7a8 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -97,6 +97,16 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 	PVOP_VCALL1(mmu.exit_mmap, mm);
 }
 
+static inline void arch_safe_halt(void)
+{
+	PVOP_VCALL0(irq.safe_halt);
+}
+
+static inline void halt(void)
+{
+	PVOP_VCALL0(irq.halt);
+}
+
 #ifdef CONFIG_PARAVIRT_XXL
 static inline void load_sp0(unsigned long sp0)
 {
@@ -162,16 +172,6 @@ static inline void __write_cr4(unsigned long x)
 	PVOP_VCALL1(cpu.write_cr4, x);
 }
 
-static inline void arch_safe_halt(void)
-{
-	PVOP_VCALL0(irq.safe_halt);
-}
-
-static inline void halt(void)
-{
-	PVOP_VCALL0(irq.halt);
-}
-
 static inline void wbinvd(void)
 {
 	PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index d9d6b0203ec4..40082847f314 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -150,10 +150,9 @@ struct pv_irq_ops {
 	struct paravirt_callee_save save_fl;
 	struct paravirt_callee_save irq_disable;
 	struct paravirt_callee_save irq_enable;
-
+#endif
 	void (*safe_halt)(void);
 	void (*halt)(void);
-#endif
 } __no_randomize_layout;
 
 struct pv_mmu_ops {
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 04cafc057bed..124e0f6c5d1c 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -283,9 +283,11 @@ struct paravirt_patch_template pv_ops = {
 	.irq.save_fl		= __PV_IS_CALLEE_SAVE(native_save_fl),
 	.irq.irq_disable	= __PV_IS_CALLEE_SAVE(native_irq_disable),
 	.irq.irq_enable		= __PV_IS_CALLEE_SAVE(native_irq_enable),
+#endif /* CONFIG_PARAVIRT_XXL */
+
+	/* Irq HLT ops. */
 	.irq.safe_halt		= native_safe_halt,
 	.irq.halt		= native_halt,
-#endif /* CONFIG_PARAVIRT_XXL */
 
 	/* Mmu ops. */
 	.mmu.flush_tlb_user	= native_flush_tlb_local,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 02/12] x86/tdx: Introduce INTEL_TDX_GUEST config option
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
  2021-08-04 18:13 ` [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-04 18:13 ` [PATCH v5 03/12] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

Add INTEL_TDX_GUEST config option to selectively compile
TDX guest support.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---

Changes since v4:
 * None

Changes since v3:
 * Removed PARAVIRT_XL from dependency list.
 * Fixed typo in help content.

 arch/x86/Kconfig | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index de58e0a5df50..ab0e7c346c44 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -865,6 +865,20 @@ config ACRN_GUEST
 	  IOT with small footprint and real-time features. More details can be
 	  found in https://projectacrn.org/.
 
+config INTEL_TDX_GUEST
+	bool "Intel Trusted Domain eXtensions Guest Support"
+	depends on X86_64 && CPU_SUP_INTEL && PARAVIRT
+	depends on SECURITY
+	select X86_X2APIC
+	select SECURITY_LOCKDOWN_LSM
+	help
+	  Provide support for running in a trusted domain on Intel processors
+	  equipped with Trusted Domain eXtensions. TDX is a new Intel
+	  technology that extends VMX and Memory Encryption with a new kind of
+	  virtual machine guest called Trust Domain (TD). A TD is designed to
+	  run in a CPU mode that protects the confidentiality of TD memory
+	  contents and the TD’s CPU state from other software, including VMM.
+
 endif #HYPERVISOR_GUEST
 
 source "arch/x86/Kconfig.cpu"
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 03/12] x86/cpufeatures: Add TDX Guest CPU feature
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
  2021-08-04 18:13 ` [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
  2021-08-04 18:13 ` [PATCH v5 02/12] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-12  7:39   ` Borislav Petkov
  2021-08-04 18:13 ` [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest Kuppuswamy Sathyanarayanan
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

Add CPU feature detection for Trusted Domain Extensions support. TDX
feature adds capabilities to keep guest register state and memory
isolated from hypervisor.

For TDX guest platforms, executing CPUID(eax=0x21, ecx=0) will return
following values in EAX, EBX, ECX and EDX.

EAX:  Maximum sub-leaf number:  0
EBX/EDX/ECX:  Vendor string:

EBX =  "Inte"
EDX =  "lTDX"
ECX =  "    "

So when above condition is true, set X86_FEATURE_TDX_GUEST feature cap
bit.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v4:
 * Moved tdx_early_init() below copy_bootdata() because of
   cmdline and IDT dependencies.

Changes since v3:
 * Fixed order of string cpuid_count() call.

Changes since v2:
 * Fixed debug prints as per Borislav suggestion.

Changes since v1:
 * Fixed commit log issues reported by Borislav.
 * Moved header file include to the start of tdx.h.
 * Added pr_fmt for TDX.
 * Simplified cpuid_has_tdx_guest() implementation as per
   Borislav comments.

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/tdx.h         | 20 ++++++++++++++++++++
 arch/x86/kernel/Makefile           |  1 +
 arch/x86/kernel/head64.c           |  3 +++
 arch/x86/kernel/tdx.c              | 29 +++++++++++++++++++++++++++++
 5 files changed, 54 insertions(+)
 create mode 100644 arch/x86/include/asm/tdx.h
 create mode 100644 arch/x86/kernel/tdx.c

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..84997abeb401 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -238,6 +238,7 @@
 #define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
 #define X86_FEATURE_PVUNLOCK		( 8*32+20) /* "" PV unlock function */
 #define X86_FEATURE_VCPUPREEMPT		( 8*32+21) /* "" PV vcpu_is_preempted function */
+#define X86_FEATURE_TDX_GUEST		( 8*32+22) /* Trusted Domain Extensions Guest */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
new file mode 100644
index 000000000000..c738bde944d1
--- /dev/null
+++ b/arch/x86/include/asm/tdx.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2020 Intel Corporation */
+#ifndef _ASM_X86_TDX_H
+#define _ASM_X86_TDX_H
+
+#include <linux/cpufeature.h>
+
+#define TDX_CPUID_LEAF_ID	0x21
+
+#ifdef CONFIG_INTEL_TDX_GUEST
+
+void __init tdx_early_init(void);
+
+#else
+
+static inline void tdx_early_init(void) { };
+
+#endif /* CONFIG_INTEL_TDX_GUEST */
+
+#endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3e625c61f008..b49110bd50d9 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -127,6 +127,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
 
 obj-$(CONFIG_JAILHOUSE_GUEST)	+= jailhouse.o
+obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx.o
 
 obj-$(CONFIG_EISA)		+= eisa.o
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..f41f3dac1318 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,7 @@
 #include <asm/extable.h>
 #include <asm/trapnr.h>
 #include <asm/sev.h>
+#include <asm/tdx.h>
 
 /*
  * Manage page tables very early on.
@@ -495,6 +496,8 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 
 	copy_bootdata(__va(real_mode_data));
 
+	tdx_early_init();
+
 	/*
 	 * Load microcode early on BSP.
 	 */
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
new file mode 100644
index 000000000000..39dd1515b131
--- /dev/null
+++ b/arch/x86/kernel/tdx.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2020 Intel Corporation */
+
+#undef pr_fmt
+#define pr_fmt(fmt)     "x86/tdx: " fmt
+
+#include <asm/tdx.h>
+
+static inline bool cpuid_has_tdx_guest(void)
+{
+	u32 eax, sig[3];
+
+	if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
+		return false;
+
+	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
+
+	return !memcmp("IntelTDX    ", sig, 12);
+}
+
+void __init tdx_early_init(void)
+{
+	if (!cpuid_has_tdx_guest())
+		return;
+
+	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
+
+	pr_info("Guest initialized\n");
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (2 preceding siblings ...)
  2021-08-04 18:13 ` [PATCH v5 03/12] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-04 21:59   ` Sean Christopherson
  2021-08-04 18:13 ` [PATCH v5 05/12] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

TDX architecture provides a way for VM guests to be highly secure and
isolated (from untrusted VMM). To achieve this requirement, we can't
completely trust any data coming from VMM. TDX guest fixes this issue
by hardening the IO drivers against the attack from the VMM. Since we
have a requirement to modify the generic drivers, we need to use the
generic prot_guest_has() API to add TDX specific code in generic
drivers.

So add TDX guest support in prot_guest_has() API.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Change since v4:
 * Rebased on top of Tom Lendacky's protected guest changes.
 * Moved memory encryption related protected guest flags in
   tdx_prot_guest_has() to the patch that actually uses them.

 arch/x86/Kconfig                       |  1 +
 arch/x86/include/asm/protected_guest.h |  5 +++++
 arch/x86/include/asm/tdx.h             |  4 ++++
 arch/x86/kernel/tdx.c                  | 13 +++++++++++++
 include/linux/protected_guest.h        |  3 +++
 5 files changed, 26 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab0e7c346c44..10f2cb51a39d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -871,6 +871,7 @@ config INTEL_TDX_GUEST
 	depends on SECURITY
 	select X86_X2APIC
 	select SECURITY_LOCKDOWN_LSM
+	select ARCH_HAS_PROTECTED_GUEST
 	help
 	  Provide support for running in a trusted domain on Intel processors
 	  equipped with Trusted Domain eXtensions. TDX is a new Intel
diff --git a/arch/x86/include/asm/protected_guest.h b/arch/x86/include/asm/protected_guest.h
index b4a267dddf93..c67bf13c8ad3 100644
--- a/arch/x86/include/asm/protected_guest.h
+++ b/arch/x86/include/asm/protected_guest.h
@@ -12,12 +12,17 @@
 
 #include <linux/mem_encrypt.h>
 
+#include <asm/processor.h>
+#include <asm/tdx.h>
+
 #ifndef __ASSEMBLY__
 
 static inline bool prot_guest_has(unsigned int attr)
 {
 	if (sme_me_mask)
 		return amd_prot_guest_has(attr);
+	else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return tdx_prot_guest_has(attr);
 
 	return false;
 }
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index c738bde944d1..eee226e4b3b4 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -11,10 +11,14 @@
 
 void __init tdx_early_init(void);
 
+bool tdx_prot_guest_has(unsigned long flag);
+
 #else
 
 static inline void tdx_early_init(void) { };
 
+static inline bool tdx_prot_guest_has(unsigned long flag) { return false; }
+
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
 #endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 39dd1515b131..1a032d700f51 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -4,6 +4,8 @@
 #undef pr_fmt
 #define pr_fmt(fmt)     "x86/tdx: " fmt
 
+#include <linux/protected_guest.h>
+
 #include <asm/tdx.h>
 
 static inline bool cpuid_has_tdx_guest(void)
@@ -18,6 +20,17 @@ static inline bool cpuid_has_tdx_guest(void)
 	return !memcmp("IntelTDX    ", sig, 12);
 }
 
+bool tdx_prot_guest_has(unsigned long flag)
+{
+	switch (flag) {
+	case PATTR_GUEST_TDX:
+		return cpu_feature_enabled(X86_FEATURE_TDX_GUEST);
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(tdx_prot_guest_has);
+
 void __init tdx_early_init(void)
 {
 	if (!cpuid_has_tdx_guest())
diff --git a/include/linux/protected_guest.h b/include/linux/protected_guest.h
index 7a7120abbb62..9085f5dd834c 100644
--- a/include/linux/protected_guest.h
+++ b/include/linux/protected_guest.h
@@ -22,6 +22,9 @@
 #define PATTR_SEV			0x801
 #define PATTR_SEV_ES			0x802
 
+/* 0x900 - 0x9ff reserved for Intel */
+#define PATTR_GUEST_TDX			0x900
+
 #ifdef CONFIG_ARCH_HAS_PROTECTED_GUEST
 
 #include <asm/protected_guest.h>
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 05/12] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (3 preceding siblings ...)
  2021-08-04 18:13 ` [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-20 15:16   ` Borislav Petkov
  2021-08-04 18:13 ` [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO Kuppuswamy Sathyanarayanan
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

Guests communicate with VMMs with hypercalls. Historically, these
are implemented using instructions that are known to cause VMEXITs
like vmcall, vmlaunch, etc. However, with TDX, VMEXITs no longer
expose guest state to the host.  This prevents the old hypercall
mechanisms from working. So to communicate with VMM, TDX
specification defines a new instruction called TDCALL.

In a TDX based VM, since VMM is an untrusted entity, a intermediary
layer (TDX module) exists between host and guest to facilitate
secure communication. TDX guests communicate with the TDX module
using TDCALL instruction. 

Both TDX module and VMM communication uses TDCALL instruction. Value
of the RAX register when executing TDCALL instruction is used to
determine the TDCALL type. If the TDCALL is executed with value "0"
in RAX, then it is the service request to VMM. Any other value in
RAX means it is a TDX module related call.

Implement common helper functions to communicate with the TDX Module
and VMM (using TDCALL instruction).
   
__tdx_hypercall()    - request services from the VMM.
__tdx_module_call()  - communicate with the TDX Module.

Also define two additional wrappers, tdx_hypercall() and
tdx_hypercall_out_r11() to cover common use cases of
__tdx_hypercall() function. Since each use case of
__tdx_module_call() is different, it does not need
multiple wrappers.

Implement __tdx_module_call() and __tdx_hypercall() helper functions
in assembly.

Rationale behind choosing to use assembly over inline assembly is,
since the number of lines of instructions (with comments) in
__tdx_hypercall() implementation is over 70, using inline assembly
to implement it will make it hard to read.
   
Also, just like syscalls, not all TDVMCALL/TDCALLs use cases need to
use the same set of argument registers. The implementation here picks
the current worst-case scenario for TDCALL (4 registers). For TDCALLs
with fewer than 4 arguments, there will end up being a few superfluous
(cheap) instructions.  But, this approach maximizes code reuse. The
same argument applies to __tdx_hypercall() function as well.

For registers used by TDCALL instruction, please check TDX GHCI
specification, sec 2.4 and 3.

https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf

Originally-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v4:
 * None

Changes since v3:
 * Fixed ARG7_SP_OFFSET value for CONFIG_FRAME_POINTER case.
 * Fixed usage of we/you in comments.

Change since v1:
 * Fixed commit comments format issues as per review comments.
 * Fixed empty line and other typo issues found in review.
 * Removed do_tdx_hypercall() helper function and modified
   __tdx_hypercall() to include do_tdx_hypercall() implementation
   and to accept TDVMCALL type as argument.
 * Since the number of arguments in __tdx_hypercall() is more than
   6, it has been modified  to get the 7th argument from the caller stack.
 * Instead of leaving output pointer in register before making
   TDCALL, stored it in a stack.
 * Instead of triggering ud2 for TDCALL failures in __tdx_hypercall(),
   it is modified to return the TDCALL status as return value. we will let
   user add appropriate error info before triggering fatal error. Also,
   extended struct tdx_hypercall_output to store r10 register which
   contains hypercall error code.
 * Included TDCALL ABI details in __tdx_module_call() and __tdx_hypercall.
 * Removed tdx_hypercall_out_r11() helper function. Since it is not really
   useful.
 * Added _tdx_hypercall() as a wrapper for __tdx_hypercall() with BUG_ON
   check for TDCALL failure.

 arch/x86/include/asm/tdx.h    |  40 +++++
 arch/x86/kernel/Makefile      |   2 +-
 arch/x86/kernel/asm-offsets.c |  23 +++
 arch/x86/kernel/tdcall.S      | 282 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/tdx.c         |  23 +++
 5 files changed, 369 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/tdcall.S

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index eee226e4b3b4..72a6a719ce37 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -4,8 +4,40 @@
 #define _ASM_X86_TDX_H
 
 #include <linux/cpufeature.h>
+#include <linux/types.h>
 
 #define TDX_CPUID_LEAF_ID	0x21
+#define TDX_HYPERCALL_STANDARD  0
+
+/*
+ * Used in __tdx_module_call() helper function to gather the
+ * output registers' values of TDCALL instruction when requesting
+ * services from the TDX module. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct tdx_module_output {
+	u64 rcx;
+	u64 rdx;
+	u64 r8;
+	u64 r9;
+	u64 r10;
+	u64 r11;
+};
+
+/*
+ * Used in __tdx_hypercall() helper function to gather the
+ * output registers' values of TDCALL instruction when requesting
+ * services from the VMM. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct tdx_hypercall_output {
+	u64 r10;
+	u64 r11;
+	u64 r12;
+	u64 r13;
+	u64 r14;
+	u64 r15;
+};
 
 #ifdef CONFIG_INTEL_TDX_GUEST
 
@@ -13,6 +45,14 @@ void __init tdx_early_init(void);
 
 bool tdx_prot_guest_has(unsigned long flag);
 
+/* Helper function used to communicate with the TDX module */
+u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+		      struct tdx_module_output *out);
+
+/* Helper function used to request services from VMM */
+u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14,
+		    u64 r15, struct tdx_hypercall_output *out);
+
 #else
 
 static inline void tdx_early_init(void) { };
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b49110bd50d9..247511b6a63d 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -127,7 +127,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
 
 obj-$(CONFIG_JAILHOUSE_GUEST)	+= jailhouse.o
-obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx.o
+obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdcall.o tdx.o
 
 obj-$(CONFIG_EISA)		+= eisa.o
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ecd3fd6993d1..0494ec01218d 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -23,6 +23,10 @@
 #include <xen/interface/xen.h>
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+#include <asm/tdx.h>
+#endif
+
 #ifdef CONFIG_X86_32
 # include "asm-offsets_32.c"
 #else
@@ -68,6 +72,25 @@ static void __used common(void)
 	OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+	BLANK();
+	/* Offset for fields in tdx_module_output */
+	OFFSET(TDX_MODULE_rcx, tdx_module_output, rcx);
+	OFFSET(TDX_MODULE_rdx, tdx_module_output, rdx);
+	OFFSET(TDX_MODULE_r8,  tdx_module_output, r8);
+	OFFSET(TDX_MODULE_r9,  tdx_module_output, r9);
+	OFFSET(TDX_MODULE_r10, tdx_module_output, r10);
+	OFFSET(TDX_MODULE_r11, tdx_module_output, r11);
+
+	/* Offset for fields in tdx_hypercall_output */
+	OFFSET(TDX_HYPERCALL_r10, tdx_hypercall_output, r10);
+	OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_output, r11);
+	OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_output, r12);
+	OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_output, r13);
+	OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_output, r14);
+	OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_output, r15);
+#endif
+
 	BLANK();
 	OFFSET(BP_scratch, boot_params, scratch);
 	OFFSET(BP_secure_boot, boot_params, secure_boot);
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
new file mode 100644
index 000000000000..c9c91c1bf99d
--- /dev/null
+++ b/arch/x86/kernel/tdcall.S
@@ -0,0 +1,282 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <asm/asm-offsets.h>
+#include <asm/asm.h>
+#include <asm/frame.h>
+#include <asm/unwind_hints.h>
+
+#include <linux/linkage.h>
+#include <linux/bits.h>
+#include <linux/errno.h>
+
+#define TDG_R10		BIT(10)
+#define TDG_R11		BIT(11)
+#define TDG_R12		BIT(12)
+#define TDG_R13		BIT(13)
+#define TDG_R14		BIT(14)
+#define TDG_R15		BIT(15)
+
+#ifdef CONFIG_FRAME_POINTER
+/* Frame offset + 8 (for arg1) */
+#define ARG7_SP_OFFSET		(FRAME_OFFSET + 0x08)
+#else
+#define ARG7_SP_OFFSET		(0x08)
+#endif
+
+/*
+ * Expose registers R10-R15 to VMM. It is passed via RCX register
+ * to the TDX Module, which will be used by the TDX module to
+ * identify the list of registers exposed to VMM. Each bit in this
+ * mask represents a register ID. Bit field details can be found
+ * in TDX GHCI specification.
+ */
+#define TDVMCALL_EXPOSE_REGS_MASK	( TDG_R10 | TDG_R11 | \
+					  TDG_R12 | TDG_R13 | \
+					  TDG_R14 | TDG_R15 )
+
+/*
+ * TDX guests use the TDCALL instruction to make requests to the
+ * TDX module and hypercalls to the VMM. It is supported in
+ * Binutils >= 2.36.
+ */
+#define tdcall .byte 0x66,0x0f,0x01,0xcc
+
+/*
+ * __tdx_module_call()  - Helper function used by TDX guests to request
+ * services from the TDX module (does not include VMM services).
+ *
+ * This function serves as a wrapper to move user call arguments to the
+ * correct registers as specified by TDCALL ABI and share it with the
+ * TDX module. If the TDCALL operation is successful and a valid
+ * "struct tdx_module_output" pointer is available (in "out" argument),
+ * output from the TDX module is saved to the memory specified in the
+ * "out" pointer. Also the status of the TDCALL operation is returned
+ * back to the user as a function return value.
+ *
+ *-------------------------------------------------------------------------
+ * TDCALL ABI:
+ *-------------------------------------------------------------------------
+ * Input Registers:
+ *
+ * RAX                 - TDCALL Leaf number.
+ * RCX,RDX,R8-R9       - TDCALL Leaf specific input registers.
+ *
+ * Output Registers:
+ *
+ * RAX                 - TDCALL instruction error code.
+ * RCX,RDX,R8-R11      - TDCALL Leaf specific output registers.
+ *
+ *-------------------------------------------------------------------------
+ *
+ * __tdx_module_call() function ABI:
+ *
+ * @fn  (RDI)          - TDCALL Leaf ID,    moved to RAX
+ * @rcx (RSI)          - Input parameter 1, moved to RCX
+ * @rdx (RDX)          - Input parameter 2, moved to RDX
+ * @r8  (RCX)          - Input parameter 3, moved to R8
+ * @r9  (R8)           - Input parameter 4, moved to R9
+ *
+ * @out (R9)           - struct tdx_module_output pointer
+ *                       stored temporarily in R12 (not
+ *                       shared with the TDX module). It
+ *                       can be NULL.
+ *
+ * Return status of TDCALL via RAX.
+ */
+SYM_FUNC_START(__tdx_module_call)
+	FRAME_BEGIN
+
+	/*
+	 * R12 will be used as temporary storage for
+	 * struct tdx_module_output pointer. More
+	 * details about struct tdx_module_output can
+	 * be found in arch/x86/include/asm/tdx.h. Also
+	 * note that registers R12-R15 are not used by
+	 * TDCALL services supported by this helper
+	 * function.
+	 */
+
+	/* Callee saved, so preserve it */
+	push %r12
+
+	/*
+	 * Push output pointer to stack, after TDCALL operation,
+	 * it will be fetched into R12 register.
+	 */
+	push %r9
+
+	/* Mangle function call ABI into TDCALL ABI: */
+	/* Move TDCALL Leaf ID to RAX */
+	mov %rdi, %rax
+	/* Move input 4 to R9 */
+	mov %r8,  %r9
+	/* Move input 3 to R8 */
+	mov %rcx, %r8
+	/* Move input 1 to RCX */
+	mov %rsi, %rcx
+	/* Leave input param 2 in RDX */
+
+	tdcall
+
+	/* Fetch output pointer from stack to R12 */
+	pop %r12
+
+	/* Check for TDCALL success: 0 - Successful, otherwise failed */
+	test %rax, %rax
+	jnz 1f
+
+	/*
+	 * __tdx_module_call() can be initiated without an output pointer.
+	 * So, check if caller provided an output struct before storing
+	 * output registers.
+	 */
+	test %r12, %r12
+	jz 1f
+
+	/* Copy TDCALL result registers to output struct: */
+	movq %rcx, TDX_MODULE_rcx(%r12)
+	movq %rdx, TDX_MODULE_rdx(%r12)
+	movq %r8,  TDX_MODULE_r8(%r12)
+	movq %r9,  TDX_MODULE_r9(%r12)
+	movq %r10, TDX_MODULE_r10(%r12)
+	movq %r11, TDX_MODULE_r11(%r12)
+1:
+	/* Restore the state of R12 register */
+	pop %r12
+
+	FRAME_END
+	ret
+SYM_FUNC_END(__tdx_module_call)
+
+/*
+ * __tdx_hypercall()  - Helper function used by TDX guests to request
+ * services from the VMM. All requests are made via the TDX module
+ * using TDCALL instruction.
+ *
+ * This function serves as a wrapper to move user call arguments to the
+ * correct registers as specified by TDCALL ABI and share it with VMM
+ * via the TDX module. After TDCALL operation, output from the VMM is
+ * saved to the memory specified in the "out" (struct tdx_hypercall_output)
+ * pointer. 
+ *
+ *-------------------------------------------------------------------------
+ * TD VMCALL ABI:
+ *-------------------------------------------------------------------------
+ *
+ * Input Registers:
+ *
+ * RAX                 - TDCALL instruction leaf number (0 - TDG.VP.VMCALL)
+ * RCX                 - BITMAP which controls which part of TD Guest GPR
+ *                       is passed as-is to VMM and back.
+ * R10                 - Set 0 to indicate TDCALL follows standard TDX ABI
+ *                       specification. Non zero value indicates vendor
+ *                       specific ABI.
+ * R11                 - VMCALL sub function number
+ * RBX, RBP, RDI, RSI  - Used to pass VMCALL sub function specific arguments.
+ * R8-R9, R12–R15      - Same as above.
+ *
+ * Output Registers:
+ *
+ * RAX                 - TDCALL instruction status (Not related to hypercall
+ *                        output).
+ * R10                 - Hypercall output error code.
+ * R11-R15             - Hypercall sub function specific output values.
+ *
+ *-------------------------------------------------------------------------
+ *
+ * __tdx_hypercall() function ABI:
+ *
+ * @type  (RDI)        - TD VMCALL type, moved to R10
+ * @fn    (RSI)        - TD VMCALL sub function, moved to R11
+ * @r12   (RDX)        - Input parameter 1, moved to R12
+ * @r13   (RCX)        - Input parameter 2, moved to R13
+ * @r14   (R8)         - Input parameter 3, moved to R14
+ * @r15   (R9)         - Input parameter 4, moved to R15
+ *
+ * @out   (stack)      - struct tdx_hypercall_output pointer (cannot be NULL)
+ *
+ * On successful completion, return TDCALL status or -EINVAL for invalid
+ * inputs.
+ */
+SYM_FUNC_START(__tdx_hypercall)
+	FRAME_BEGIN
+
+	/* Move argument 7 from caller stack to RAX */
+	movq ARG7_SP_OFFSET(%rsp), %rax
+
+	/* Check if caller provided an output struct */
+	test %rax, %rax
+	/* If out pointer is NULL, return -EINVAL */
+	jz 1f
+
+	/* Save callee-s ved GPRs as mandated by the x86_64 ABI */
+	push %r15
+	push %r14
+	push %r13
+	push %r12
+
+	/*
+	 * Save R9 and output pointer (rax) in stack, it will be used
+	 * again when storing the output registers after TDCALL
+	 * operation.
+	 */
+	push %r9
+	push %rax
+
+	/* Mangle function call ABI into TDCALL ABI: */
+	/* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */
+	xor %eax, %eax
+	/* Move TDVMCALL type (standard vs vendor) in R10 */
+	mov %rdi, %r10
+	/* Move TDVMCALL sub function id to R11 */
+	mov %rsi, %r11
+	/* Move input 1 to R12 */
+	mov %rdx, %r12
+	/* Move input 2 to R13 */
+	mov %rcx, %r13
+	/* Move input 3 to R14 */
+	mov %r8,  %r14
+	/* Move input 4 to R15 */
+	mov %r9,  %r15
+
+	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
+
+	tdcall
+
+	/* Restore output pointer to R9 */
+	pop  %r9
+
+	/* Copy hypercall result registers to output struct: */
+	movq %r10, TDX_HYPERCALL_r10(%r9)
+	movq %r11, TDX_HYPERCALL_r11(%r9)
+	movq %r12, TDX_HYPERCALL_r12(%r9)
+	movq %r13, TDX_HYPERCALL_r13(%r9)
+	movq %r14, TDX_HYPERCALL_r14(%r9)
+	movq %r15, TDX_HYPERCALL_r15(%r9)
+
+	/*
+	 * Zero out registers exposed to the VMM to avoid
+	 * speculative execution with VMM-controlled values.
+	 * This needs to include all registers present in
+	 * TDVMCALL_EXPOSE_REGS_MASK (except R12-R15).
+	 * R12-R15 context will be restored.
+	 */
+	xor %r10d, %r10d
+	xor %r11d, %r11d
+
+	/* Restore state of R9 register */
+	pop %r9
+
+	/* Restore callee-s ved GPRs as mandated by the x86_64 ABI */
+	pop %r12
+	pop %r13
+	pop %r14
+	pop %r15
+
+	jmp 2f
+1:
+       movq $(-EINVAL), %rax
+2:
+       FRAME_END
+
+       retq
+SYM_FUNC_END(__tdx_hypercall)
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 1a032d700f51..287564990f21 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -8,6 +8,29 @@
 
 #include <asm/tdx.h>
 
+/*
+ * Wrapper for standard use of __tdx_hypercall with BUG_ON() check
+ * for TDCALL error.
+ */
+static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14,
+				 u64 r15, struct tdx_hypercall_output *out)
+{
+	struct tdx_hypercall_output outl = {0};
+	u64 err;
+
+	/* __tdx_hypercall() does not accept NULL output pointer */
+	if (!out)
+		out = &outl;
+
+	err = __tdx_hypercall(TDX_HYPERCALL_STANDARD, fn, r12, r13, r14,
+			      r15, out);
+
+	/* Non zero return value indicates buggy TDX module, so panic */
+	BUG_ON(err);
+
+	return out->r10;
+}
+
 static inline bool cpuid_has_tdx_guest(void)
 {
 	u32 eax, sig[3];
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (4 preceding siblings ...)
  2021-08-04 18:13 ` [PATCH v5 05/12] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-04 22:38   ` Sean Christopherson
  2021-08-20 17:13   ` Borislav Petkov
  2021-08-04 18:13 ` [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Per Guest-Host-Communication Interface (GHCI) for Intel Trust
Domain Extensions (Intel TDX) specification, sec 2.4.2,
TDCALL[TDINFO] provides basic TD execution environment information, not
provided by CPUID.

Call TDINFO during early boot to be used for following system
initialization.

The call provides info on which bit in pfn is used to indicate that the
page is shared with the host and attributes of the TD, such as debug.

Information about the number of CPUs need not be saved because there are
no users so far for it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v4:
 * None

Changes since v3:
 * None

 arch/x86/kernel/tdx.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 287564990f21..3973e81751ba 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -8,6 +8,14 @@
 
 #include <asm/tdx.h>
 
+/* TDX Module call Leaf IDs */
+#define TDINFO				1
+
+static struct {
+	unsigned int gpa_width;
+	unsigned long attributes;
+} td_info __ro_after_init;
+
 /*
  * Wrapper for standard use of __tdx_hypercall with BUG_ON() check
  * for TDCALL error.
@@ -54,6 +62,19 @@ bool tdx_prot_guest_has(unsigned long flag)
 }
 EXPORT_SYMBOL_GPL(tdx_prot_guest_has);
 
+static void tdg_get_info(void)
+{
+	u64 ret;
+	struct tdx_module_output out = {0};
+
+	ret = __tdx_module_call(TDINFO, 0, 0, 0, 0, &out);
+
+	BUG_ON(ret);
+
+	td_info.gpa_width = out.rcx & GENMASK(5, 0);
+	td_info.attributes = out.rdx;
+}
+
 void __init tdx_early_init(void)
 {
 	if (!cpuid_has_tdx_guest())
@@ -61,5 +82,7 @@ void __init tdx_early_init(void)
 
 	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
 
+	tdg_get_info();
+
 	pr_info("Guest initialized\n");
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (5 preceding siblings ...)
  2021-08-04 18:13 ` [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-24 10:17   ` Borislav Petkov
  2021-08-04 18:13 ` [PATCH v5 08/12] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Virtualization Exceptions (#VE) are delivered to TDX guests due to
specific guest actions which may happen in either user space or the kernel:

 * Specific instructions (WBINVD, for example)
 * Specific MSR accesses
 * Specific CPUID leaf accesses
 * Access to TD-shared memory, which includes MMIO

In the settings that Linux will run in, virtual exceptions are never
generated on accesses to normal, TD-private memory that has been
accepted.

The entry paths do not access TD-shared memory, MMIO regions or use
those specific MSRs, instructions, CPUID leaves that might generate #VE.
In addition, all interrupts including NMIs are blocked by the hardware
starting with #VE delivery until TDGETVEINFO is called.  This eliminates
the chance of a #VE during the syscall gap or paranoid entry paths and
simplifies #VE handling.

After TDGETVEINFO #VE could happen in theory (e.g. through an NMI),
but it is expected not to happen because TDX expects NMIs not to
trigger #VEs. Another case where they could happen is if the #VE
exception panics, but in this case there are no guarantees on anything
anyways.

If a guest kernel action which would normally cause a #VE occurs in the
interrupt-disabled region before TDGETVEINFO, a #DF is delivered to the
guest which will result in an oops (and should eventually be a panic, as
we would like to set panic_on_oops to 1 for TDX guests).

Add basic infrastructure to handle any #VE which occurs in the kernel or
userspace.  Later patches will add handling for specific #VE scenarios.

Convert unhandled #VE's (everything, until later in this series) so that
they appear just like a #GP by calling ve_raise_fault() directly.
ve_raise_fault() is similar to #GP handler and is responsible for
sending SIGSEGV to userspace and cpu die and notifying debuggers and
other die chain users.  

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v4:
 * Since ve_raise_fault() is used only by TDX code, moved it
   within #ifdef CONFIG_INTEL_TDX_GUEST.

Changes since v3:
 * None

 arch/x86/include/asm/idtentry.h |  4 ++
 arch/x86/include/asm/tdx.h      | 19 +++++++++
 arch/x86/kernel/idt.c           |  6 +++
 arch/x86/kernel/tdx.c           | 36 +++++++++++++++++
 arch/x86/kernel/traps.c         | 69 +++++++++++++++++++++++++++++++++
 5 files changed, 134 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..8ccc81d653b3 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -625,6 +625,10 @@ DECLARE_IDTENTRY_XENCB(X86_TRAP_OTHER,	exc_xen_hypervisor_callback);
 DECLARE_IDTENTRY_RAW(X86_TRAP_OTHER,	exc_xen_unknown_trap);
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+DECLARE_IDTENTRY(X86_TRAP_VE,		exc_virtualization_exception);
+#endif
+
 /* Device interrupts common/spurious */
 DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER,	common_interrupt);
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 72a6a719ce37..846fe58f0426 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -39,6 +39,20 @@ struct tdx_hypercall_output {
 	u64 r15;
 };
 
+/*
+ * Used by #VE exception handler to gather the #VE exception
+ * info from the TDX module. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct ve_info {
+	u64 exit_reason;
+	u64 exit_qual;
+	u64 gla;	/* Guest Linear (virtual) Address */
+	u64 gpa;	/* Guest Physical (virtual) Address */
+	u32 instr_len;
+	u32 instr_info;
+};
+
 #ifdef CONFIG_INTEL_TDX_GUEST
 
 void __init tdx_early_init(void);
@@ -53,6 +67,11 @@ u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14,
 		    u64 r15, struct tdx_hypercall_output *out);
 
+unsigned long tdg_get_ve_info(struct ve_info *ve);
+
+int tdg_handle_virtualization_exception(struct pt_regs *regs,
+					struct ve_info *ve);
+
 #else
 
 static inline void tdx_early_init(void) { };
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index df0fa695bb09..a5eaae8e6c44 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -68,6 +68,9 @@ static const __initconst struct idt_data early_idts[] = {
 	 */
 	INTG(X86_TRAP_PF,		asm_exc_page_fault),
 #endif
+#ifdef CONFIG_INTEL_TDX_GUEST
+	INTG(X86_TRAP_VE,		asm_exc_virtualization_exception),
+#endif
 };
 
 /*
@@ -91,6 +94,9 @@ static const __initconst struct idt_data def_idts[] = {
 	INTG(X86_TRAP_MF,		asm_exc_coprocessor_error),
 	INTG(X86_TRAP_AC,		asm_exc_alignment_check),
 	INTG(X86_TRAP_XF,		asm_exc_simd_coprocessor_error),
+#ifdef CONFIG_INTEL_TDX_GUEST
+	INTG(X86_TRAP_VE,		asm_exc_virtualization_exception),
+#endif
 
 #ifdef CONFIG_X86_32
 	TSKG(X86_TRAP_DF,		GDT_ENTRY_DOUBLEFAULT_TSS),
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 3973e81751ba..6169f9c740b2 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -10,6 +10,7 @@
 
 /* TDX Module call Leaf IDs */
 #define TDINFO				1
+#define TDGETVEINFO			3
 
 static struct {
 	unsigned int gpa_width;
@@ -75,6 +76,41 @@ static void tdg_get_info(void)
 	td_info.attributes = out.rdx;
 }
 
+unsigned long tdg_get_ve_info(struct ve_info *ve)
+{
+	u64 ret;
+	struct tdx_module_output out = {0};
+
+	/*
+	 * NMIs and machine checks are suppressed. Before this point any
+	 * #VE is fatal. After this point (TDGETVEINFO call), NMIs and
+	 * additional #VEs are permitted (but we don't expect them to
+	 * happen unless you panic).
+	 */
+	ret = __tdx_module_call(TDGETVEINFO, 0, 0, 0, 0, &out);
+
+	ve->exit_reason = out.rcx;
+	ve->exit_qual   = out.rdx;
+	ve->gla         = out.r8;
+	ve->gpa         = out.r9;
+	ve->instr_len   = out.r10 & UINT_MAX;
+	ve->instr_info  = out.r10 >> 32;
+
+	return ret;
+}
+
+int tdg_handle_virtualization_exception(struct pt_regs *regs,
+					struct ve_info *ve)
+{
+	/*
+	 * TODO: Add handler support for various #VE exit
+	 * reasons. It will be added by other patches in
+	 * the series.
+	 */
+	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
+	return -EFAULT;
+}
+
 void __init tdx_early_init(void)
 {
 	if (!cpuid_has_tdx_guest())
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a58800973aed..be56f0281cb5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
 #include <asm/vdso.h>
+#include <asm/tdx.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -1140,6 +1141,74 @@ DEFINE_IDTENTRY(exc_device_not_available)
 	}
 }
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+#define VEFSTR "VE fault"
+static void ve_raise_fault(struct pt_regs *regs, long error_code)
+{
+	struct task_struct *tsk = current;
+
+	if (user_mode(regs)) {
+		tsk->thread.error_code = error_code;
+		tsk->thread.trap_nr = X86_TRAP_VE;
+
+		/*
+		 * Not fixing up VDSO exceptions similar to #GP handler
+		 * because we don't expect the VDSO to trigger #VE.
+		 */
+		show_signal(tsk, SIGSEGV, "", VEFSTR, regs, error_code);
+		force_sig(SIGSEGV);
+		return;
+	}
+
+	if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
+		return;
+
+	tsk->thread.error_code = error_code;
+	tsk->thread.trap_nr = X86_TRAP_VE;
+
+	/*
+	 * To be potentially processing a kprobe fault and to trust the result
+	 * from kprobe_running(), we have to be non-preemptible.
+	 */
+	if (!preemptible() &&
+	    kprobe_running() &&
+	    kprobe_fault_handler(regs, X86_TRAP_VE))
+		return;
+
+	notify_die(DIE_GPF, VEFSTR, regs, error_code, X86_TRAP_VE, SIGSEGV);
+
+	die_addr(VEFSTR, regs, error_code, 0);
+}
+
+DEFINE_IDTENTRY(exc_virtualization_exception)
+{
+	struct ve_info ve;
+	int ret;
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	/*
+	 * NMIs/Machine-checks/Interrupts will be in a disabled state
+	 * till TDGETVEINFO TDCALL is executed. This prevents #VE
+	 * nesting issue.
+	 */
+	ret = tdg_get_ve_info(&ve);
+
+	cond_local_irq_enable(regs);
+
+	if (!ret)
+		ret = tdg_handle_virtualization_exception(regs, &ve);
+	/*
+	 * If tdg_handle_virtualization_exception() could not process
+	 * it successfully, treat it as #GP(0) and handle it.
+	 */
+	if (ret)
+		ve_raise_fault(regs, 0);
+
+	cond_local_irq_disable(regs);
+}
+#endif
+
 #ifdef CONFIG_X86_32
 DEFINE_IDTENTRY_SW(iret_error)
 {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (6 preceding siblings ...)
  2021-08-04 18:13 ` [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-24 16:10   ` Borislav Petkov
  2021-08-04 18:13 ` [PATCH v5 09/12] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Per Guest-Host-Communication Interface (GHCI) for Intel Trust
Domain Extensions (Intel TDX) specification, sec 3.8,
TDVMCALL[Instruction.HLT] provides HLT operation. Use it to implement
halt() and safe_halt() paravirtualization calls.

The same TDX hypercall is used to handle #VE exception due to
EXIT_REASON_HLT.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v4:
 * Added exception for EXIT_REASON_HLT in __tdx_hypercall() to
   enable interrupts using sti.

Changes since v3:
 * None

 arch/x86/kernel/tdcall.S | 29 +++++++++++++++++++++++
 arch/x86/kernel/tdx.c    | 51 ++++++++++++++++++++++++++++++++++------
 2 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
index c9c91c1bf99d..9df94f87465d 100644
--- a/arch/x86/kernel/tdcall.S
+++ b/arch/x86/kernel/tdcall.S
@@ -40,6 +40,9 @@
  */
 #define tdcall .byte 0x66,0x0f,0x01,0xcc
 
+/* HLT TDVMCALL sub-function ID */
+#define EXIT_REASON_HLT			12
+
 /*
  * __tdx_module_call()  - Helper function used by TDX guests to request
  * services from the TDX module (does not include VMM services).
@@ -240,6 +243,32 @@ SYM_FUNC_START(__tdx_hypercall)
 
 	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
 
+	/*
+	 * For the idle loop STI needs to be called directly before
+	 * the TDCALL that enters idle (EXIT_REASON_HLT case). STI
+	 * enables interrupts only one instruction later. If there
+	 * are any instructions between the STI and the TDCALL for
+	 * HLT then an interrupt could happen in that time, but the
+	 * code would go back to sleep afterwards, which can cause
+	 * longer delays. There leads to significant difference in
+	 * network performance benchmarks. So add a special case for
+	 * EXIT_REASON_HLT to trigger sti before TDCALL. But this
+	 * change is not required for all HLT cases. So use R15
+	 * register value to identify the case which needs sti. So,
+	 * if R11 is EXIT_REASON_HLT and R15 is 1, then call sti
+	 * before TDCALL instruction. Note that R15 register is not
+	 * required by TDCALL ABI when triggering the hypercall for
+	 * EXIT_REASON_HLT case. So use it in software to select the
+	 * sti case.
+	 */
+	cmpl $EXIT_REASON_HLT, %r11d
+	jne skip_sti
+	cmpl $1, %r15d
+	jne skip_sti
+	/* Set R15 register to 0, it is unused in EXIT_REASON_HLT case */
+	xor %r15, %r15
+	sti
+skip_sti:
 	tdcall
 
 	/* Restore output pointer to R9 */
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 6169f9c740b2..bdd041c4c509 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -7,6 +7,7 @@
 #include <linux/protected_guest.h>
 
 #include <asm/tdx.h>
+#include <asm/vmx.h>
 
 /* TDX Module call Leaf IDs */
 #define TDINFO				1
@@ -76,6 +77,33 @@ static void tdg_get_info(void)
 	td_info.attributes = out.rdx;
 }
 
+static __cpuidle void tdg_halt(void)
+{
+	u64 ret;
+
+	ret = _tdx_hypercall(EXIT_REASON_HLT, irqs_disabled(), 0, 0, 0, NULL);
+
+	/* It should never fail */
+	BUG_ON(ret);
+}
+
+static __cpuidle void tdg_safe_halt(void)
+{
+	u64 ret;
+
+	/*
+	 * Enable interrupts next to the TDVMCALL to avoid
+	 * performance degradation.
+	 */
+	local_irq_enable();
+
+	/* IRQ is enabled, So set R12 as 0 */
+	ret = _tdx_hypercall(EXIT_REASON_HLT, 0, 0, 0, 1, NULL);
+
+	/* It should never fail */
+	BUG_ON(ret);
+}
+
 unsigned long tdg_get_ve_info(struct ve_info *ve)
 {
 	u64 ret;
@@ -102,13 +130,19 @@ unsigned long tdg_get_ve_info(struct ve_info *ve)
 int tdg_handle_virtualization_exception(struct pt_regs *regs,
 					struct ve_info *ve)
 {
-	/*
-	 * TODO: Add handler support for various #VE exit
-	 * reasons. It will be added by other patches in
-	 * the series.
-	 */
-	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
-	return -EFAULT;
+	switch (ve->exit_reason) {
+	case EXIT_REASON_HLT:
+		tdg_halt();
+		break;
+	default:
+		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
+		return -EFAULT;
+	}
+
+	/* After successful #VE handling, move the IP */
+	regs->ip += ve->instr_len;
+
+	return 0;
 }
 
 void __init tdx_early_init(void)
@@ -120,5 +154,8 @@ void __init tdx_early_init(void)
 
 	tdg_get_info();
 
+	pv_ops.irq.safe_halt = tdg_safe_halt;
+	pv_ops.irq.halt = tdg_halt;
+
 	pr_info("Guest initialized\n");
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 09/12] x86/tdx: Wire up KVM hypercalls
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (7 preceding siblings ...)
  2021-08-04 18:13 ` [PATCH v5 08/12] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-24 16:34   ` Borislav Petkov
  2021-08-04 18:13 ` [PATCH v5 10/12] x86/tdx: Add MSR support for TDX guest Kuppuswamy Sathyanarayanan
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

KVM hypercalls use the "vmcall" or "vmmcall" instructions.
Although the ABI is similar, those instructions no longer
function for TDX guests. Make vendor-specific TDVMCALLs
instead of VMCALL. This enables TDX guests to run with KVM
acting as the hypervisor. TDX guests running under other
hypervisors will continue to use those hypervisors'
hypercalls.

Since KVM driver can be built as a kernel module, export
tdx_kvm_hypercall*() to make the symbols visible to kvm.ko.

[Isaku Yamahata: proposed KVM VENDOR string]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v4:
 * No functional changes.

Changes since v3:
 * Fixed ASM sysmbol generation issue in tdcall.S by including tdx.h
   in asm-prototypes.h

Changes since v1:
 * Replaced is_tdx_guest() with prot_guest_has(PR_GUEST_TDX).
 * Replaced tdx_kvm_hypercall{1-4} with single generic 
   function tdx_kvm_hypercall().
 * Removed __tdx_hypercall_vendor_kvm() and re-used __tdx_hypercall().

 arch/x86/Kconfig                      |  5 +++++
 arch/x86/include/asm/asm-prototypes.h |  4 ++++
 arch/x86/include/asm/kvm_para.h       | 22 ++++++++++++++++++++
 arch/x86/include/asm/tdx.h            | 30 +++++++++++++++++++++++++--
 arch/x86/kernel/tdcall.S              |  2 ++
 5 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 10f2cb51a39d..b500f2afacce 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -880,6 +880,11 @@ config INTEL_TDX_GUEST
 	  run in a CPU mode that protects the confidentiality of TD memory
 	  contents and the TD’s CPU state from other software, including VMM.
 
+# This option enables KVM specific hypercalls in TDX guest.
+config INTEL_TDX_GUEST_KVM
+	def_bool y
+	depends on KVM_GUEST && INTEL_TDX_GUEST
+
 endif #HYPERVISOR_GUEST
 
 source "arch/x86/Kconfig.cpu"
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 4cb726c71ed8..9855a9ff2924 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -17,6 +17,10 @@
 extern void cmpxchg8b_emu(void);
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+#include <asm/tdx.h>
+#endif
+
 #ifdef CONFIG_RETPOLINE
 
 #undef GEN
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 69299878b200..bd0ab7c3ae25 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -4,7 +4,9 @@
 
 #include <asm/processor.h>
 #include <asm/alternative.h>
+#include <asm/tdx.h>
 #include <linux/interrupt.h>
+#include <linux/protected_guest.h>
 #include <uapi/asm/kvm_para.h>
 
 #ifdef CONFIG_KVM_GUEST
@@ -32,6 +34,10 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 static inline long kvm_hypercall0(unsigned int nr)
 {
 	long ret;
+
+	if (prot_guest_has(PATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, 0, 0, 0, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr)
@@ -42,6 +48,10 @@ static inline long kvm_hypercall0(unsigned int nr)
 static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
 {
 	long ret;
+
+	if (prot_guest_has(PATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, 0, 0, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1)
@@ -53,6 +63,10 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
 				  unsigned long p2)
 {
 	long ret;
+
+	if (prot_guest_has(PATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, p2, 0, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1), "c"(p2)
@@ -64,6 +78,10 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
 				  unsigned long p2, unsigned long p3)
 {
 	long ret;
+
+	if (prot_guest_has(PATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, p2, p3, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
@@ -76,6 +94,10 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 				  unsigned long p4)
 {
 	long ret;
+
+	if (prot_guest_has(PATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, p2, p3, p4);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3), "S"(p4)
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 846fe58f0426..8fa33e2c98db 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -6,8 +6,9 @@
 #include <linux/cpufeature.h>
 #include <linux/types.h>
 
-#define TDX_CPUID_LEAF_ID	0x21
-#define TDX_HYPERCALL_STANDARD  0
+#define TDX_CPUID_LEAF_ID			0x21
+#define TDX_HYPERCALL_STANDARD			0
+#define TDX_HYPERCALL_VENDOR_KVM		0x4d564b2e584454
 
 /*
  * Used in __tdx_module_call() helper function to gather the
@@ -80,4 +81,29 @@ static inline bool tdx_prot_guest_has(unsigned long flag) { return false; }
 
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
+#ifdef CONFIG_INTEL_TDX_GUEST_KVM
+
+static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
+				     unsigned long p2, unsigned long p3,
+				     unsigned long p4)
+{
+	struct tdx_hypercall_output out;
+	u64 err;
+
+	err = __tdx_hypercall(TDX_HYPERCALL_VENDOR_KVM, nr, p1, p2,
+			      p3, p4, &out);
+
+	BUG_ON(err);
+
+	return out.r10;
+}
+#else
+static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
+				     unsigned long p2, unsigned long p3,
+				     unsigned long p4)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_INTEL_TDX_GUEST_KVM */
+
 #endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
index 9df94f87465d..1823bac4542d 100644
--- a/arch/x86/kernel/tdcall.S
+++ b/arch/x86/kernel/tdcall.S
@@ -3,6 +3,7 @@
 #include <asm/asm.h>
 #include <asm/frame.h>
 #include <asm/unwind_hints.h>
+#include <asm/export.h>
 
 #include <linux/linkage.h>
 #include <linux/bits.h>
@@ -309,3 +310,4 @@ skip_sti:
 
        retq
 SYM_FUNC_END(__tdx_hypercall)
+EXPORT_SYMBOL(__tdx_hypercall);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 10/12] x86/tdx: Add MSR support for TDX guest
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (8 preceding siblings ...)
  2021-08-04 18:13 ` [PATCH v5 09/12] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-24 16:55   ` Borislav Petkov
  2021-08-04 18:13 ` [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
  2021-08-04 18:13 ` [PATCH v5 12/12] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
  11 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Operations on context-switched MSRs can be run natively. The rest of
MSRs should be handled through TDVMCALLs.

TDVMCALL[Instruction.RDMSR] and TDVMCALL[Instruction.WRMSR] provide
MSR oprations.

RDMSR and WRMSR specification details can be found in
Guest-Host-Communication Interface (GHCI) for Intel Trust Domain
Extensions (Intel TDX) specification, sec 3.10, 3.11.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Change since v4:
 * Removed You usage from commit log.

Changes since v3:
 * None

 arch/x86/kernel/tdx.c | 67 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index bdd041c4c509..d16c7f8759ea 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -104,6 +104,55 @@ static __cpuidle void tdg_safe_halt(void)
 	BUG_ON(ret);
 }
 
+static bool tdg_is_context_switched_msr(unsigned int msr)
+{
+	switch (msr) {
+	case MSR_EFER:
+	case MSR_IA32_CR_PAT:
+	case MSR_FS_BASE:
+	case MSR_GS_BASE:
+	case MSR_KERNEL_GS_BASE:
+	case MSR_IA32_SYSENTER_CS:
+	case MSR_IA32_SYSENTER_EIP:
+	case MSR_IA32_SYSENTER_ESP:
+	case MSR_STAR:
+	case MSR_LSTAR:
+	case MSR_SYSCALL_MASK:
+	case MSR_IA32_XSS:
+	case MSR_TSC_AUX:
+	case MSR_IA32_BNDCFGS:
+		return true;
+	}
+	return false;
+}
+
+static u64 tdg_read_msr_safe(unsigned int msr, int *err)
+{
+	u64 ret;
+	struct tdx_hypercall_output out = {0};
+
+	WARN_ON_ONCE(tdg_is_context_switched_msr(msr));
+
+	ret = _tdx_hypercall(EXIT_REASON_MSR_READ, msr, 0, 0, 0, &out);
+
+	*err = ret ? -EIO : 0;
+
+	return out.r11;
+}
+
+static int tdg_write_msr_safe(unsigned int msr, unsigned int low,
+			      unsigned int high)
+{
+	u64 ret;
+
+	WARN_ON_ONCE(tdg_is_context_switched_msr(msr));
+
+	ret = _tdx_hypercall(EXIT_REASON_MSR_WRITE, msr, (u64)high << 32 | low,
+			     0, 0, NULL);
+
+	return ret ? -EIO : 0;
+}
+
 unsigned long tdg_get_ve_info(struct ve_info *ve)
 {
 	u64 ret;
@@ -130,19 +179,33 @@ unsigned long tdg_get_ve_info(struct ve_info *ve)
 int tdg_handle_virtualization_exception(struct pt_regs *regs,
 					struct ve_info *ve)
 {
+	unsigned long val;
+	int ret = 0;
+
 	switch (ve->exit_reason) {
 	case EXIT_REASON_HLT:
 		tdg_halt();
 		break;
+	case EXIT_REASON_MSR_READ:
+		val = tdg_read_msr_safe(regs->cx, (unsigned int *)&ret);
+		if (!ret) {
+			regs->ax = val & UINT_MAX;
+			regs->dx = val >> 32;
+		}
+		break;
+	case EXIT_REASON_MSR_WRITE:
+		ret = tdg_write_msr_safe(regs->cx, regs->ax, regs->dx);
+		break;
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
 		return -EFAULT;
 	}
 
 	/* After successful #VE handling, move the IP */
-	regs->ip += ve->instr_len;
+	if (!ret)
+		regs->ip += ve->instr_len;
 
-	return 0;
+	return ret;
 }
 
 void __init tdx_early_init(void)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (9 preceding siblings ...)
  2021-08-04 18:13 ` [PATCH v5 10/12] x86/tdx: Add MSR support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-04 18:31   ` Sean Christopherson
  2021-08-04 18:13 ` [PATCH v5 12/12] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
  11 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

From: Andi Kleen <ak@linux.intel.com>

On Intel CPUs writing the CSTAR MSR is not really needed. Syscalls
from 32bit work using SYSENTER and 32bit SYSCALL is an illegal opcode.
But the kernel did write it anyways even though it was ignored by
the CPU. Inside a TDX guest this actually leads to a #GP. While the #GP
is caught and recovered from, it prints an ugly message at boot.
Do not write the CSTAR MSR on Intel CPUs.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/common.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 64b805bd6a54..d936f0e4ec51 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1752,7 +1752,13 @@ void syscall_init(void)
 	wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
 
 #ifdef CONFIG_IA32_EMULATION
-	wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
+	/*
+	 * CSTAR is not needed on Intel because it doesn't support
+	 * 32bit SYSCALL, but only SYSENTER. On a TDX guest
+	 * it leads to a #GP.
+	 */
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
 	/*
 	 * This only works on Intel CPUs.
 	 * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
@@ -1764,7 +1770,8 @@ void syscall_init(void)
 		    (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
 	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
 #else
-	wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret);
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret);
 	wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
 	wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
 	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 12/12] x86/tdx: Handle CPUID via #VE
  2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (10 preceding siblings ...)
  2021-08-04 18:13 ` [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:13 ` Kuppuswamy Sathyanarayanan
  2021-08-24 17:48   ` Borislav Petkov
  11 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-08-04 18:13 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

TDX has three classes of CPUID leaves: some CPUID leaves
are always handled by the CPU, others are handled by the TDX module,
and some others are handled by the VMM. Since the VMM cannot directly
intercept the instruction these are reflected with a #VE exception
to the guest, which then converts it into a hypercall to the VMM,
or handled directly.

The TDX module EAS has a full list of CPUID leaves which are handled
natively or by the TDX module in 16.2. Only unknown CPUIDs are handled by
the #VE method. In practice this typically only applies to the
hypervisor specific CPUIDs unknown to the native CPU.

Therefore there is no risk of causing this in early CPUID code which
runs before the #VE handler is set up because it will never access
those exotic CPUID leaves.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v4:
 * None

Changes since v3:
 * None

 arch/x86/kernel/tdx.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index d16c7f8759ea..5d2fd6c8b01c 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -153,6 +153,21 @@ static int tdg_write_msr_safe(unsigned int msr, unsigned int low,
 	return ret ? -EIO : 0;
 }
 
+static void tdg_handle_cpuid(struct pt_regs *regs)
+{
+	u64 ret;
+	struct tdx_hypercall_output out = {0};
+
+	ret = _tdx_hypercall(EXIT_REASON_CPUID, regs->ax, regs->cx, 0, 0, &out);
+
+	WARN_ON(ret);
+
+	regs->ax = out.r12;
+	regs->bx = out.r13;
+	regs->cx = out.r14;
+	regs->dx = out.r15;
+}
+
 unsigned long tdg_get_ve_info(struct ve_info *ve)
 {
 	u64 ret;
@@ -196,6 +211,9 @@ int tdg_handle_virtualization_exception(struct pt_regs *regs,
 	case EXIT_REASON_MSR_WRITE:
 		ret = tdg_write_msr_safe(regs->cx, regs->ax, regs->dx);
 		break;
+	case EXIT_REASON_CPUID:
+		tdg_handle_cpuid(regs);
+		break;
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
 		return -EFAULT;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel
  2021-08-04 18:13 ` [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
@ 2021-08-04 18:31   ` Sean Christopherson
  2021-08-04 21:03     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2021-08-04 18:31 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Wed, Aug 04, 2021, Kuppuswamy Sathyanarayanan wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> On Intel CPUs writing the CSTAR MSR is not really needed. Syscalls
> from 32bit work using SYSENTER and 32bit SYSCALL is an illegal opcode.
> But the kernel did write it anyways even though it was ignored by
> the CPU. Inside a TDX guest this actually leads to a #GP. While the #GP
> is caught and recovered from, it prints an ugly message at boot.
> Do not write the CSTAR MSR on Intel CPUs.

Not that it really matters, but...

Is #GP the actual TDX-Module behavior?  If so, isn't that a contradiction with
respect to the TDX-Module architecture?  It says:

  guest TD access violations to MSRs can cause a #GP(0) in most cases where the
  MSR is enumerated as inaccessible by the Intel TDX module via CPUID
  virtualization.  In other cases, guest TD access violations to MSRs can cause
  a #VE.

Given that there is no dedicated CPUID flag for CSTAR and CSTAR obviously exists
on Intel CPUs, I don't see how the TDX-Module can possible enumerate CSTAR as
being inaccessible.

Regardless of #GP versus #VE, "Table 16.2 MSR Virtualization" needs to state the
actual behavior.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel
  2021-08-04 18:31   ` Sean Christopherson
@ 2021-08-04 21:03     ` Kuppuswamy, Sathyanarayanan
  2021-08-04 21:44       ` Sean Christopherson
  2021-08-04 21:48       ` Dave Hansen
  0 siblings, 2 replies; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-04 21:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel



On 8/4/21 11:31 AM, Sean Christopherson wrote:
>> On Intel CPUs writing the CSTAR MSR is not really needed. Syscalls
>> from 32bit work using SYSENTER and 32bit SYSCALL is an illegal opcode.
>> But the kernel did write it anyways even though it was ignored by
>> the CPU. Inside a TDX guest this actually leads to a #GP. While the #GP
>> is caught and recovered from, it prints an ugly message at boot.
>> Do not write the CSTAR MSR on Intel CPUs.
> Not that it really matters, but...
> 
> Is #GP the actual TDX-Module behavior?  If so, isn't that a contradiction with

No, #GP is triggered by guest.

> respect to the TDX-Module architecture?  It says:
> 
>    guest TD access violations to MSRs can cause a #GP(0) in most cases where the
>    MSR is enumerated as inaccessible by the Intel TDX module via CPUID
>    virtualization.  In other cases, guest TD access violations to MSRs can cause
>    a #VE.
> 
> Given that there is no dedicated CPUID flag for CSTAR and CSTAR obviously exists
> on Intel CPUs, I don't see how the TDX-Module can possible enumerate CSTAR as
> being inaccessible.
> 
> Regardless of #GP versus #VE, "Table 16.2 MSR Virtualization" needs to state the
> actual behavior.

Even in this case, it will trigger #VE. But since CSTAR MSR is not supported, write
to it will fail and leads to #VE fault.

File: arch/x86/kernel/traps.c

1183 DEFINE_IDTENTRY(exc_virtualization_exception)
1201         if (!ret)
1202                 ret = tdg_handle_virtualization_exception(regs, &ve);
1203         /*
1204          * If tdg_handle_virtualization_exception() could not process
1205          * it successfully, treat it as #GP(0) and handle it.
1206          */
1207         if (ret)
1208                 ve_raise_fault(regs, 0);

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel
  2021-08-04 21:03     ` Kuppuswamy, Sathyanarayanan
@ 2021-08-04 21:44       ` Sean Christopherson
  2021-08-04 21:48       ` Dave Hansen
  1 sibling, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2021-08-04 21:44 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Wed, Aug 04, 2021, Kuppuswamy, Sathyanarayanan wrote:
> 
> On 8/4/21 11:31 AM, Sean Christopherson wrote:
> > > On Intel CPUs writing the CSTAR MSR is not really needed. Syscalls
> > > from 32bit work using SYSENTER and 32bit SYSCALL is an illegal opcode.
> > > But the kernel did write it anyways even though it was ignored by
> > > the CPU. Inside a TDX guest this actually leads to a #GP. While the #GP
> > > is caught and recovered from, it prints an ugly message at boot.
> > > Do not write the CSTAR MSR on Intel CPUs.
> > Not that it really matters, but...
> > 
> > Is #GP the actual TDX-Module behavior?  If so, isn't that a contradiction with
> 
> No, #GP is triggered by guest.

#GP is not triggered by the guest, it's not even reported by the guest.  From
patch 7, the #VE handler escalates unhandled #VEs "similar to #GP handler", but
it still reports #VE as the actual vector.

Now, that particular behavior could change, e.g. setting tsk->thread.trap_nr to
#VE might confuse userspace, but at no point does this "trigger" a #GP.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel
  2021-08-04 21:03     ` Kuppuswamy, Sathyanarayanan
  2021-08-04 21:44       ` Sean Christopherson
@ 2021-08-04 21:48       ` Dave Hansen
  2021-08-04 22:23         ` Kuppuswamy, Sathyanarayanan
  1 sibling, 1 reply; 72+ messages in thread
From: Dave Hansen @ 2021-08-04 21:48 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan, Sean Christopherson
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, Peter H Anvin, Tony Luck, Dan Williams,
	Andi Kleen, Kirill Shutemov, Kuppuswamy Sathyanarayanan, x86,
	linux-kernel

On 8/4/21 2:03 PM, Kuppuswamy, Sathyanarayanan wrote:
>> Is #GP the actual TDX-Module behavior?  If so, isn't that a
>> contradiction with
> 
> No, #GP is triggered by guest.
...
>> Regardless of #GP versus #VE, "Table 16.2 MSR Virtualization" needs
>> to state the actual behavior.
> 
> Even in this case, it will trigger #VE. But since CSTAR MSR is not 
> supported, write to it will fail and leads to #VE fault.

Sathya, I think there might be a mixup of terminology here that's
confusing.  I'm confused by this exchange.

In general, we refer to hardware exceptions by their architecture names:
#GP for general protection fault, #PF for page fault, #VE for
Virtualization Exception.

Those hardware exceptions are wired up to software handlers:
#GP lands in asm_exc_general_protection
#PF ends up in exc_page_fault
#VE ends up in exc_virtualization_exception
... and more of course

But, to add to the confusion, the #VE handler
(exc_virtualization_exception()) itself calls (or did once upon a time
call) do_general_protection() when it can't handle something.
do_general_protection() is (was?) *ALSO* called by the #GP handler.

So, is that what you meant?  By "#GP is triggered by guest", you mean
that a write to the CSTAR MSR and the resulting #VE will end up being
handled in a way that is similar to how a #GP hardware exception would
have been handled?

If that's what you meant, I'm not _sure_ that's totally accurate.  Could
you elaborate on this a bit?  It also would be really handy if you were
able to adopt the terminology I talked about above.  It will really make
things less confusing.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-04 18:13 ` [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-08-04 21:59   ` Sean Christopherson
  2021-08-04 22:03     ` Dave Hansen
  0 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2021-08-04 21:59 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Wed, Aug 04, 2021, Kuppuswamy Sathyanarayanan wrote:
> TDX architecture provides a way for VM guests to be highly secure and
> isolated (from untrusted VMM). To achieve this requirement, we can't
> completely trust any data coming from VMM. TDX guest fixes this issue
> by hardening the IO drivers against the attack from the VMM. Since we
> have a requirement to modify the generic drivers, we need to use the
> generic prot_guest_has() API to add TDX specific code in generic
> drivers.
> 
> So add TDX guest support in prot_guest_has() API.
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---

...

> diff --git a/arch/x86/include/asm/protected_guest.h b/arch/x86/include/asm/protected_guest.h
> index b4a267dddf93..c67bf13c8ad3 100644
> --- a/arch/x86/include/asm/protected_guest.h
> +++ b/arch/x86/include/asm/protected_guest.h
> @@ -12,12 +12,17 @@
>  
>  #include <linux/mem_encrypt.h>
>  
> +#include <asm/processor.h>
> +#include <asm/tdx.h>
> +
>  #ifndef __ASSEMBLY__
>  
>  static inline bool prot_guest_has(unsigned int attr)
>  {
>  	if (sme_me_mask)
>  		return amd_prot_guest_has(attr);
> +	else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)

Why not "boot_cpu_has(X86_FEATURE_TDX_GUEST)"?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-04 21:59   ` Sean Christopherson
@ 2021-08-04 22:03     ` Dave Hansen
  2021-08-04 22:26       ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Dave Hansen @ 2021-08-04 22:03 UTC (permalink / raw)
  To: Sean Christopherson, Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, Peter H Anvin, Tony Luck, Dan Williams,
	Andi Kleen, Kirill Shutemov, Kuppuswamy Sathyanarayanan, x86,
	linux-kernel

On 8/4/21 2:59 PM, Sean Christopherson wrote:
>> +#include <asm/processor.h>
>> +#include <asm/tdx.h>
>> +
>>  #ifndef __ASSEMBLY__
>>  
>>  static inline bool prot_guest_has(unsigned int attr)
>>  {
>>  	if (sme_me_mask)
>>  		return amd_prot_guest_has(attr);
>> +	else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> Why not "boot_cpu_has(X86_FEATURE_TDX_GUEST)"?

Even better: cpu_feature_enabled(X86_FEATURE_TDX_GUEST).  That gets you
both static patching *and* compile-time optimization if you hook
X86_FEATURE_TDX_GUEST into disabled-features.h.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel
  2021-08-04 21:48       ` Dave Hansen
@ 2021-08-04 22:23         ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-04 22:23 UTC (permalink / raw)
  To: Dave Hansen, Sean Christopherson
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, Peter H Anvin, Tony Luck, Dan Williams,
	Andi Kleen, Kirill Shutemov, Kuppuswamy Sathyanarayanan, x86,
	linux-kernel



On 8/4/21 2:48 PM, Dave Hansen wrote:
>> No, #GP is triggered by guest.
> ...
>>> Regardless of #GP versus #VE, "Table 16.2 MSR Virtualization" needs
>>> to state the actual behavior.
>> Even in this case, it will trigger #VE. But since CSTAR MSR is not
>> supported, write to it will fail and leads to #VE fault.
> Sathya, I think there might be a mixup of terminology here that's
> confusing.  I'm confused by this exchange.
> 
> In general, we refer to hardware exceptions by their architecture names:
> #GP for general protection fault, #PF for page fault, #VE for
> Virtualization Exception.
> 
> Those hardware exceptions are wired up to software handlers:
> #GP lands in asm_exc_general_protection
> #PF ends up in exc_page_fault
> #VE ends up in exc_virtualization_exception
> ... and more of course
> 
> But, to add to the confusion, the #VE handler
> (exc_virtualization_exception()) itself calls (or did once upon a time
> call) do_general_protection() when it can't handle something.
> do_general_protection() is (was?)*ALSO*  called by the #GP handler.
> 
> So, is that what you meant?  By "#GP is triggered by guest", you mean
> that a write to the CSTAR MSR and the resulting #VE will end up being
> handled in a way that is similar to how a #GP hardware exception would
> have been handled?
> 
> If that's what you meant, I'm not_sure_  that's totally accurate.  Could
> you elaborate on this a bit?  It also would be really handy if you were
> able to adopt the terminology I talked about above.  It will really make
> things less confusing.


In TDX guest, MSR write will trigger #VE which will be handled by
exc_virtualization_exception()->tdg_handle_virtualization_exception().
Internally this exception handler emulates the "MSR write" using
hypercalls. But if the hypercall returns failure, then it means we
failed to handle the #VE exception. In such cases,
exc_virtualization_exception() handler will trigger #GP like behavior
using ve_raise_fault(). ve_raise_fault() is the customized version of
do_general_protection(). This what I meant by guest triggers #GP(0).

Since CSTAR_MSR is not supported/used in Intel platforms, instead of
going through all these processes before triggering the failure, we
have added the exception for it before it is used.

Following are the implementation details:

static void ve_raise_fault(struct pt_regs *regs, long error_code)
{
         struct task_struct *tsk = current;

         if (user_mode(regs)) {
                 tsk->thread.error_code = error_code;
                 tsk->thread.trap_nr = X86_TRAP_VE;

                 /*
                  * Not fixing up VDSO exceptions similar to #GP handler
                  * because we don't expect the VDSO to trigger #VE.
                  */
                 show_signal(tsk, SIGSEGV, "", VEFSTR, regs, error_code);
                 force_sig(SIGSEGV);
                 return;
         }

         if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
                 return;

         tsk->thread.error_code = error_code;
         tsk->thread.trap_nr = X86_TRAP_VE;

         /*
          * To be potentially processing a kprobe fault and to trust the result
          * from kprobe_running(), we have to be non-preemptible.
          */
         if (!preemptible() &&
             kprobe_running() &&
             kprobe_fault_handler(regs, X86_TRAP_VE))
                 return;

         notify_die(DIE_GPF, VEFSTR, regs, error_code, X86_TRAP_VE, SIGSEGV);

         die_addr(VEFSTR, regs, error_code, 0);
}


DEFINE_IDTENTRY(exc_virtualization_exception)
{
         struct ve_info ve;
         int ret;

         RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");

         inc_irq_stat(tdg_ve_count);

         /*
          * NMIs/Machine-checks/Interrupts will be in a disabled state
          * till TDGETVEINFO TDCALL is executed. This prevents #VE
          * nesting issue.
          */
         ret = tdg_get_ve_info(&ve);

         cond_local_irq_enable(regs);

         if (!ret)
                 ret = tdg_handle_virtualization_exception(regs, &ve);
         /*
          * If tdg_handle_virtualization_exception() could not process
          * it successfully, treat it as #GP(0) and handle it.
          */
         if (ret)
                 ve_raise_fault(regs, 0);

         cond_local_irq_disable(regs);

}
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-04 22:03     ` Dave Hansen
@ 2021-08-04 22:26       ` Kuppuswamy, Sathyanarayanan
  2021-08-04 22:42         ` Sean Christopherson
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-04 22:26 UTC (permalink / raw)
  To: Dave Hansen, Sean Christopherson
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, Peter H Anvin, Tony Luck, Dan Williams,
	Andi Kleen, Kirill Shutemov, Kuppuswamy Sathyanarayanan, x86,
	linux-kernel



On 8/4/21 3:03 PM, Dave Hansen wrote:
>>> +#include <asm/processor.h>
>>> +#include <asm/tdx.h>
>>> +
>>>   #ifndef __ASSEMBLY__
>>>   
>>>   static inline bool prot_guest_has(unsigned int attr)
>>>   {
>>>   	if (sme_me_mask)
>>>   		return amd_prot_guest_has(attr);
>>> +	else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
>> Why not "boot_cpu_has(X86_FEATURE_TDX_GUEST)"?
> Even better: cpu_feature_enabled(X86_FEATURE_TDX_GUEST).  That gets you
> both static patching*and*  compile-time optimization if you hook
> X86_FEATURE_TDX_GUEST into disabled-features.h.

This is how Borislav preferred it. tdx_prot_guest_has() internally uses
cpu_feature_enabled(X86_FEATURE_TDX_GUEST) to return the status.

I think the intention is to keep the first call generic (non TDX
specific). So that it can be extended for other use cases.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO
  2021-08-04 18:13 ` [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO Kuppuswamy Sathyanarayanan
@ 2021-08-04 22:38   ` Sean Christopherson
  2021-08-20 17:13   ` Borislav Petkov
  1 sibling, 0 replies; 72+ messages in thread
From: Sean Christopherson @ 2021-08-04 22:38 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Andy Lutomirski, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Wed, Aug 04, 2021, Kuppuswamy Sathyanarayanan wrote:
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 287564990f21..3973e81751ba 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -8,6 +8,14 @@
>  
>  #include <asm/tdx.h>
>  
> +/* TDX Module call Leaf IDs */
> +#define TDINFO				1
> +
> +static struct {
> +	unsigned int gpa_width;
> +	unsigned long attributes;
> +} td_info __ro_after_init;
> +
>  /*
>   * Wrapper for standard use of __tdx_hypercall with BUG_ON() check
>   * for TDCALL error.
> @@ -54,6 +62,19 @@ bool tdx_prot_guest_has(unsigned long flag)
>  }
>  EXPORT_SYMBOL_GPL(tdx_prot_guest_has);
>  
> +static void tdg_get_info(void)

I've already made my dislike of "tdg" abundantly clear, but I will keep on complaining
so long as y'all keep sending patches with "tdx" and "tdg" interspersed without any
obvious method to the madness.

Also, a function with "get" in the name that returns a void and fills in a global
struct that's is a bit misleading.  

> +{
> +	u64 ret;
> +	struct tdx_module_output out = {0};
> +
> +	ret = __tdx_module_call(TDINFO, 0, 0, 0, 0, &out);
> +
> +	BUG_ON(ret);
> +
> +	td_info.gpa_width = out.rcx & GENMASK(5, 0);
> +	td_info.attributes = out.rdx;
> +}
> +
>  void __init tdx_early_init(void)
>  {
>  	if (!cpuid_has_tdx_guest())
> @@ -61,5 +82,7 @@ void __init tdx_early_init(void)
>  
>  	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
>  
> +	tdg_get_info();
> +
>  	pr_info("Guest initialized\n");
>  }
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-04 22:26       ` Kuppuswamy, Sathyanarayanan
@ 2021-08-04 22:42         ` Sean Christopherson
  2021-08-04 23:00           ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2021-08-04 22:42 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Wed, Aug 04, 2021, Kuppuswamy, Sathyanarayanan wrote:
> 
> On 8/4/21 3:03 PM, Dave Hansen wrote:
> > > > +#include <asm/processor.h>
> > > > +#include <asm/tdx.h>
> > > > +
> > > >   #ifndef __ASSEMBLY__
> > > >   static inline bool prot_guest_has(unsigned int attr)
> > > >   {
> > > >   	if (sme_me_mask)
> > > >   		return amd_prot_guest_has(attr);
> > > > +	else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> > > Why not "boot_cpu_has(X86_FEATURE_TDX_GUEST)"?
> > Even better: cpu_feature_enabled(X86_FEATURE_TDX_GUEST).  That gets you
> > both static patching*and*  compile-time optimization if you hook

Ah, I keep forgetting it can be compiled out.

> > X86_FEATURE_TDX_GUEST into disabled-features.h.
> 
> This is how Borislav preferred it. tdx_prot_guest_has() internally uses
> cpu_feature_enabled(X86_FEATURE_TDX_GUEST) to return the status.
> 
> I think the intention is to keep the first call generic (non TDX
> specific). So that it can be extended for other use cases.

What other possible use case is there for invoking tdx_prot_guest_has() beyond
running as a TDX guest?  If it were e.g. intel_prot_guest_has() then I would at
least understand the code, if not agree with the sub-optimal approach, but as is
it makes no sense.

Given amd_prot_guest_has(), my guess is Boris intended intel_prot_guest_has()...

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-04 22:42         ` Sean Christopherson
@ 2021-08-04 23:00           ` Kuppuswamy, Sathyanarayanan
  2021-08-12  7:53             ` Borislav Petkov
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-04 23:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel



On 8/4/21 3:42 PM, Sean Christopherson wrote:
> What other possible use case is there for invoking tdx_prot_guest_has() beyond
> running as a TDX guest?  If it were e.g. intel_prot_guest_has() then I would at
> least understand the code, if not agree with the sub-optimal approach, but as is
> it makes no sense.

Yes. I think his original intention is to implement intel_prot_guest_has() in
arch/x86/kernel/cpu/intel.c and then call tdx_prot_guest_has() from it. But since
we have no other use cases, we have called tdx_prot_*() directly here.

Boris, how do you want to proceed? Do you want me to implement intel_prot_guest_has()
or change comparison log to TDX specific?


> 
> Given amd_prot_guest_has(), my guess is Boris intended intel_prot_guest_has()...

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-08-04 18:13 ` [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
@ 2021-08-12  7:18   ` Borislav Petkov
  2021-08-12 17:17     ` Kuppuswamy, Sathyanarayanan
  2021-08-17 12:50     ` Juergen Gross
  0 siblings, 2 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-08-12  7:18 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel, Juergen Gross, Deep Shah, VMware, Inc.

On Wed, Aug 04, 2021 at 11:13:18AM -0700, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> CONFIG_PARAVIRT_XXL is mainly defined/used by XEN PV guests. For
> other VM guest types, features supported under CONFIG_PARAVIRT
> are self sufficient. CONFIG_PARAVIRT mainly provides support for
> TLB flush operations and time related operations.
> 
> For TDX guest as well, paravirt calls under CONFIG_PARVIRT meets
> most of its requirement except the need of HLT and SAFE_HLT
> paravirt calls, which is currently defined under
> COFNIG_PARAVIRT_XXL.
> 
> Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest
> like platforms, move HLT and SAFE_HLT paravirt calls under
> CONFIG_PARAVIRT.
> 
> Moving HLT and SAFE_HLT paravirt calls are not fatal and should not
> break any functionality for current users of CONFIG_PARAVIRT.
> 
> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---

You need to do this before sending your patches:

./scripts/get_maintainer.pl /tmp/tdx.01
Thomas Gleixner <tglx@linutronix.de> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),commit_signer:1/6=17%)
Ingo Molnar <mingo@redhat.com> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Borislav Petkov <bp@alien8.de> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),commit_signer:6/6=100%)
x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
"H. Peter Anvin" <hpa@zytor.com> (reviewer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Juergen Gross <jgross@suse.com> (supporter:PARAVIRT_OPS INTERFACE,commit_signer:5/6=83%,authored:5/6=83%,added_lines:15/16=94%,removed_lines:38/39=97%)
Deep Shah <sdeep@vmware.com> (supporter:PARAVIRT_OPS INTERFACE)
"VMware, Inc." <pv-drivers@vmware.com> (supporter:PARAVIRT_OPS INTERFACE)
...

and CC also the supporters - I'm pretty sure at least Juergen would like
to be kept up-to-date on pv changes. I'll CC him and the others now and
leave in the whole diff but make sure you do that in the future please.

>  arch/x86/include/asm/irqflags.h       | 40 +++++++++++++++------------
>  arch/x86/include/asm/paravirt.h       | 20 +++++++-------
>  arch/x86/include/asm/paravirt_types.h |  3 +-
>  arch/x86/kernel/paravirt.c            |  4 ++-
>  4 files changed, 36 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index c5ce9845c999..f3bb33b1715d 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -59,6 +59,28 @@ static inline __cpuidle void native_halt(void)
>  
>  #endif
>  
> +#ifndef CONFIG_PARAVIRT
> +#ifndef __ASSEMBLY__
> +/*
> + * Used in the idle loop; sti takes one instruction cycle
> + * to complete:
> + */
> +static inline __cpuidle void arch_safe_halt(void)
> +{
> +	native_safe_halt();
> +}
> +
> +/*
> + * Used when interrupts are already enabled or to
> + * shutdown the processor:
> + */
> +static inline __cpuidle void halt(void)
> +{
> +	native_halt();
> +}
> +#endif /* __ASSEMBLY__ */
> +#endif /* CONFIG_PARAVIRT */
> +
>  #ifdef CONFIG_PARAVIRT_XXL
>  #include <asm/paravirt.h>
>  #else
> @@ -80,24 +102,6 @@ static __always_inline void arch_local_irq_enable(void)
>  	native_irq_enable();
>  }
>  
> -/*
> - * Used in the idle loop; sti takes one instruction cycle
> - * to complete:
> - */
> -static inline __cpuidle void arch_safe_halt(void)
> -{
> -	native_safe_halt();
> -}
> -
> -/*
> - * Used when interrupts are already enabled or to
> - * shutdown the processor:
> - */
> -static inline __cpuidle void halt(void)
> -{
> -	native_halt();
> -}
> -
>  /*
>   * For spinlocks, etc:
>   */
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index da3a1ac82be5..d323a626c7a8 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -97,6 +97,16 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
>  	PVOP_VCALL1(mmu.exit_mmap, mm);
>  }
>  
> +static inline void arch_safe_halt(void)
> +{
> +	PVOP_VCALL0(irq.safe_halt);
> +}
> +
> +static inline void halt(void)
> +{
> +	PVOP_VCALL0(irq.halt);
> +}
> +
>  #ifdef CONFIG_PARAVIRT_XXL
>  static inline void load_sp0(unsigned long sp0)
>  {
> @@ -162,16 +172,6 @@ static inline void __write_cr4(unsigned long x)
>  	PVOP_VCALL1(cpu.write_cr4, x);
>  }
>  
> -static inline void arch_safe_halt(void)
> -{
> -	PVOP_VCALL0(irq.safe_halt);
> -}
> -
> -static inline void halt(void)
> -{
> -	PVOP_VCALL0(irq.halt);
> -}
> -
>  static inline void wbinvd(void)
>  {
>  	PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index d9d6b0203ec4..40082847f314 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -150,10 +150,9 @@ struct pv_irq_ops {
>  	struct paravirt_callee_save save_fl;
>  	struct paravirt_callee_save irq_disable;
>  	struct paravirt_callee_save irq_enable;
> -
> +#endif
>  	void (*safe_halt)(void);
>  	void (*halt)(void);
> -#endif
>  } __no_randomize_layout;
>  
>  struct pv_mmu_ops {
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 04cafc057bed..124e0f6c5d1c 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -283,9 +283,11 @@ struct paravirt_patch_template pv_ops = {
>  	.irq.save_fl		= __PV_IS_CALLEE_SAVE(native_save_fl),
>  	.irq.irq_disable	= __PV_IS_CALLEE_SAVE(native_irq_disable),
>  	.irq.irq_enable		= __PV_IS_CALLEE_SAVE(native_irq_enable),
> +#endif /* CONFIG_PARAVIRT_XXL */
> +
> +	/* Irq HLT ops. */

What's that comment for?

>  	.irq.safe_halt		= native_safe_halt,
>  	.irq.halt		= native_halt,
> -#endif /* CONFIG_PARAVIRT_XXL */
>  
>  	/* Mmu ops. */
>  	.mmu.flush_tlb_user	= native_flush_tlb_local,
> -- 
> 2.25.1
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 03/12] x86/cpufeatures: Add TDX Guest CPU feature
  2021-08-04 18:13 ` [PATCH v5 03/12] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
@ 2021-08-12  7:39   ` Borislav Petkov
  0 siblings, 0 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-08-12  7:39 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Wed, Aug 04, 2021 at 11:13:20AM -0700, Kuppuswamy Sathyanarayanan wrote:
> +static inline bool cpuid_has_tdx_guest(void)
> +{
> +	u32 eax, sig[3];
> +
> +	if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
> +		return false;
> +
> +	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);

Ok, good, in the meantime you've caught that the order was wrong - you
had:

	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[1], &sig[2]);

but that would've been "Inte    lTDX"

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-04 23:00           ` Kuppuswamy, Sathyanarayanan
@ 2021-08-12  7:53             ` Borislav Petkov
  2021-08-12 17:18               ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Borislav Petkov @ 2021-08-12  7:53 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Sean Christopherson, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Wed, Aug 04, 2021 at 04:00:18PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Boris, how do you want to proceed? Do you want me to implement intel_prot_guest_has()
> or change comparison log to TDX specific?

Yes, the idea was to have the generic call prot_guest_has() call
into the vendor one. This way each vendor is free to do whatever it
desires in its own version and callers simply need to call the generic
interface.

> > Given amd_prot_guest_has(), my guess is Boris intended intel_prot_guest_has()...

Yap, exactly.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-08-12  7:18   ` Borislav Petkov
@ 2021-08-12 17:17     ` Kuppuswamy, Sathyanarayanan
  2021-08-17 12:50     ` Juergen Gross
  1 sibling, 0 replies; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-12 17:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel, Juergen Gross, Deep Shah, VMware, Inc.



On 8/12/21 12:18 AM, Borislav Petkov wrote:
> and CC also the supporters - I'm pretty sure at least Juergen would like
> to be kept up-to-date on pv changes. I'll CC him and the others now and
> leave in the whole diff but make sure you do that in the future please.

Sure. I will do so in future submissions.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-12  7:53             ` Borislav Petkov
@ 2021-08-12 17:18               ` Kuppuswamy, Sathyanarayanan
  2021-08-20 14:28                 ` Borislav Petkov
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-12 17:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel



On 8/12/21 12:53 AM, Borislav Petkov wrote:
>>> Given amd_prot_guest_has(), my guess is Boris intended intel_prot_guest_has()...
> Yap, exactly.

I can implement intel_prot_guest_has() in arch/x86/kernel/cpu/intel.c. And call
tdx_prot_guest_has() from it.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-08-12  7:18   ` Borislav Petkov
  2021-08-12 17:17     ` Kuppuswamy, Sathyanarayanan
@ 2021-08-17 12:50     ` Juergen Gross
  2021-08-17 13:16       ` Kuppuswamy, Sathyanarayanan
  1 sibling, 1 reply; 72+ messages in thread
From: Juergen Gross @ 2021-08-17 12:50 UTC (permalink / raw)
  To: Borislav Petkov, Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel, Deep Shah, VMware, Inc.


[-- Attachment #1.1.1: Type: text/plain, Size: 6673 bytes --]

On 12.08.21 09:18, Borislav Petkov wrote:
> On Wed, Aug 04, 2021 at 11:13:18AM -0700, Kuppuswamy Sathyanarayanan wrote:
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>
>> CONFIG_PARAVIRT_XXL is mainly defined/used by XEN PV guests. For
>> other VM guest types, features supported under CONFIG_PARAVIRT
>> are self sufficient. CONFIG_PARAVIRT mainly provides support for
>> TLB flush operations and time related operations.
>>
>> For TDX guest as well, paravirt calls under CONFIG_PARVIRT meets
>> most of its requirement except the need of HLT and SAFE_HLT
>> paravirt calls, which is currently defined under
>> COFNIG_PARAVIRT_XXL.
>>
>> Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest
>> like platforms, move HLT and SAFE_HLT paravirt calls under
>> CONFIG_PARAVIRT.
>>
>> Moving HLT and SAFE_HLT paravirt calls are not fatal and should not
>> break any functionality for current users of CONFIG_PARAVIRT.
>>
>> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>> ---
> 
> You need to do this before sending your patches:
> 
> ./scripts/get_maintainer.pl /tmp/tdx.01
> Thomas Gleixner <tglx@linutronix.de> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),commit_signer:1/6=17%)
> Ingo Molnar <mingo@redhat.com> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
> Borislav Petkov <bp@alien8.de> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),commit_signer:6/6=100%)
> x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
> "H. Peter Anvin" <hpa@zytor.com> (reviewer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
> Juergen Gross <jgross@suse.com> (supporter:PARAVIRT_OPS INTERFACE,commit_signer:5/6=83%,authored:5/6=83%,added_lines:15/16=94%,removed_lines:38/39=97%)
> Deep Shah <sdeep@vmware.com> (supporter:PARAVIRT_OPS INTERFACE)
> "VMware, Inc." <pv-drivers@vmware.com> (supporter:PARAVIRT_OPS INTERFACE)
> ...
> 
> and CC also the supporters - I'm pretty sure at least Juergen would like
> to be kept up-to-date on pv changes. I'll CC him and the others now and
> leave in the whole diff but make sure you do that in the future please.
> 
>>   arch/x86/include/asm/irqflags.h       | 40 +++++++++++++++------------
>>   arch/x86/include/asm/paravirt.h       | 20 +++++++-------
>>   arch/x86/include/asm/paravirt_types.h |  3 +-
>>   arch/x86/kernel/paravirt.c            |  4 ++-
>>   4 files changed, 36 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
>> index c5ce9845c999..f3bb33b1715d 100644
>> --- a/arch/x86/include/asm/irqflags.h
>> +++ b/arch/x86/include/asm/irqflags.h
>> @@ -59,6 +59,28 @@ static inline __cpuidle void native_halt(void)
>>   
>>   #endif
>>   
>> +#ifndef CONFIG_PARAVIRT
>> +#ifndef __ASSEMBLY__
>> +/*
>> + * Used in the idle loop; sti takes one instruction cycle
>> + * to complete:
>> + */
>> +static inline __cpuidle void arch_safe_halt(void)
>> +{
>> +	native_safe_halt();
>> +}
>> +
>> +/*
>> + * Used when interrupts are already enabled or to
>> + * shutdown the processor:
>> + */
>> +static inline __cpuidle void halt(void)
>> +{
>> +	native_halt();
>> +}
>> +#endif /* __ASSEMBLY__ */
>> +#endif /* CONFIG_PARAVIRT */
>> +
>>   #ifdef CONFIG_PARAVIRT_XXL
>>   #include <asm/paravirt.h>

Did you test this with CONFIG_PARAVIRT enabled and CONFIG_PARAVIRT_XXL
disabled?

I'm asking because in this case I don't see where halt() and
arch_safe_halt() would be defined in case someone is including
asm/irqflags.h and not asm/paravirt.h.

>>   #else
>> @@ -80,24 +102,6 @@ static __always_inline void arch_local_irq_enable(void)
>>   	native_irq_enable();
>>   }
>>   
>> -/*
>> - * Used in the idle loop; sti takes one instruction cycle
>> - * to complete:
>> - */
>> -static inline __cpuidle void arch_safe_halt(void)
>> -{
>> -	native_safe_halt();
>> -}
>> -
>> -/*
>> - * Used when interrupts are already enabled or to
>> - * shutdown the processor:
>> - */
>> -static inline __cpuidle void halt(void)
>> -{
>> -	native_halt();
>> -}
>> -
>>   /*
>>    * For spinlocks, etc:
>>    */
>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
>> index da3a1ac82be5..d323a626c7a8 100644
>> --- a/arch/x86/include/asm/paravirt.h
>> +++ b/arch/x86/include/asm/paravirt.h
>> @@ -97,6 +97,16 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
>>   	PVOP_VCALL1(mmu.exit_mmap, mm);
>>   }
>>   
>> +static inline void arch_safe_halt(void)
>> +{
>> +	PVOP_VCALL0(irq.safe_halt);
>> +}
>> +
>> +static inline void halt(void)
>> +{
>> +	PVOP_VCALL0(irq.halt);
>> +}
>> +
>>   #ifdef CONFIG_PARAVIRT_XXL
>>   static inline void load_sp0(unsigned long sp0)
>>   {
>> @@ -162,16 +172,6 @@ static inline void __write_cr4(unsigned long x)
>>   	PVOP_VCALL1(cpu.write_cr4, x);
>>   }
>>   
>> -static inline void arch_safe_halt(void)
>> -{
>> -	PVOP_VCALL0(irq.safe_halt);
>> -}
>> -
>> -static inline void halt(void)
>> -{
>> -	PVOP_VCALL0(irq.halt);
>> -}
>> -
>>   static inline void wbinvd(void)
>>   {
>>   	PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
>> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
>> index d9d6b0203ec4..40082847f314 100644
>> --- a/arch/x86/include/asm/paravirt_types.h
>> +++ b/arch/x86/include/asm/paravirt_types.h
>> @@ -150,10 +150,9 @@ struct pv_irq_ops {
>>   	struct paravirt_callee_save save_fl;
>>   	struct paravirt_callee_save irq_disable;
>>   	struct paravirt_callee_save irq_enable;
>> -
>> +#endif
>>   	void (*safe_halt)(void);
>>   	void (*halt)(void);
>> -#endif
>>   } __no_randomize_layout;
>>   
>>   struct pv_mmu_ops {
>> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
>> index 04cafc057bed..124e0f6c5d1c 100644
>> --- a/arch/x86/kernel/paravirt.c
>> +++ b/arch/x86/kernel/paravirt.c
>> @@ -283,9 +283,11 @@ struct paravirt_patch_template pv_ops = {
>>   	.irq.save_fl		= __PV_IS_CALLEE_SAVE(native_save_fl),
>>   	.irq.irq_disable	= __PV_IS_CALLEE_SAVE(native_irq_disable),
>>   	.irq.irq_enable		= __PV_IS_CALLEE_SAVE(native_irq_enable),
>> +#endif /* CONFIG_PARAVIRT_XXL */
>> +
>> +	/* Irq HLT ops. */
> 
> What's that comment for?

I agree, please drop it.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-08-17 12:50     ` Juergen Gross
@ 2021-08-17 13:16       ` Kuppuswamy, Sathyanarayanan
  2021-08-17 13:28         ` Juergen Gross
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-17 13:16 UTC (permalink / raw)
  To: Juergen Gross, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel, Deep Shah, VMware, Inc.



On 8/17/21 5:50 AM, Juergen Gross wrote:
> On 12.08.21 09:18, Borislav Petkov wrote:
>> On Wed, Aug 04, 2021 at 11:13:18AM -0700, Kuppuswamy Sathyanarayanan wrote:
>>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>>
>>> CONFIG_PARAVIRT_XXL is mainly defined/used by XEN PV guests. For
>>> other VM guest types, features supported under CONFIG_PARAVIRT
>>> are self sufficient. CONFIG_PARAVIRT mainly provides support for
>>> TLB flush operations and time related operations.
>>>
>>> For TDX guest as well, paravirt calls under CONFIG_PARVIRT meets
>>> most of its requirement except the need of HLT and SAFE_HLT
>>> paravirt calls, which is currently defined under
>>> COFNIG_PARAVIRT_XXL.
>>>
>>> Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest
>>> like platforms, move HLT and SAFE_HLT paravirt calls under
>>> CONFIG_PARAVIRT.
>>>
>>> Moving HLT and SAFE_HLT paravirt calls are not fatal and should not
>>> break any functionality for current users of CONFIG_PARAVIRT.
>>>
>>> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>>> ---
>>
>> You need to do this before sending your patches:
>>
>> ./scripts/get_maintainer.pl /tmp/tdx.01
>> Thomas Gleixner <tglx@linutronix.de> (maintainer:X86 ARCHITECTURE (32-BIT AND 
>> 64-BIT),commit_signer:1/6=17%)
>> Ingo Molnar <mingo@redhat.com> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
>> Borislav Petkov <bp@alien8.de> (maintainer:X86 ARCHITECTURE (32-BIT AND 
>> 64-BIT),commit_signer:6/6=100%)
>> x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
>> "H. Peter Anvin" <hpa@zytor.com> (reviewer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
>> Juergen Gross <jgross@suse.com> (supporter:PARAVIRT_OPS 
>> INTERFACE,commit_signer:5/6=83%,authored:5/6=83%,added_lines:15/16=94%,removed_lines:38/39=97%)
>> Deep Shah <sdeep@vmware.com> (supporter:PARAVIRT_OPS INTERFACE)
>> "VMware, Inc." <pv-drivers@vmware.com> (supporter:PARAVIRT_OPS INTERFACE)
>> ...
>>
>> and CC also the supporters - I'm pretty sure at least Juergen would like
>> to be kept up-to-date on pv changes. I'll CC him and the others now and
>> leave in the whole diff but make sure you do that in the future please.
>>
>>>   arch/x86/include/asm/irqflags.h       | 40 +++++++++++++++------------
>>>   arch/x86/include/asm/paravirt.h       | 20 +++++++-------
>>>   arch/x86/include/asm/paravirt_types.h |  3 +-
>>>   arch/x86/kernel/paravirt.c            |  4 ++-
>>>   4 files changed, 36 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
>>> index c5ce9845c999..f3bb33b1715d 100644
>>> --- a/arch/x86/include/asm/irqflags.h
>>> +++ b/arch/x86/include/asm/irqflags.h
>>> @@ -59,6 +59,28 @@ static inline __cpuidle void native_halt(void)
>>>   #endif
>>> +#ifndef CONFIG_PARAVIRT
>>> +#ifndef __ASSEMBLY__
>>> +/*
>>> + * Used in the idle loop; sti takes one instruction cycle
>>> + * to complete:
>>> + */
>>> +static inline __cpuidle void arch_safe_halt(void)
>>> +{
>>> +    native_safe_halt();
>>> +}
>>> +
>>> +/*
>>> + * Used when interrupts are already enabled or to
>>> + * shutdown the processor:
>>> + */
>>> +static inline __cpuidle void halt(void)
>>> +{
>>> +    native_halt();
>>> +}
>>> +#endif /* __ASSEMBLY__ */
>>> +#endif /* CONFIG_PARAVIRT */
>>> +
>>>   #ifdef CONFIG_PARAVIRT_XXL
>>>   #include <asm/paravirt.h>
> 
> Did you test this with CONFIG_PARAVIRT enabled and CONFIG_PARAVIRT_XXL
> disabled?
> 
> I'm asking because in this case I don't see where halt() and
> arch_safe_halt() would be defined in case someone is including
> asm/irqflags.h and not asm/paravirt.h.

We have tested both cases and did not hit any issues.

1. CONFIG_PARAVIRT=y and  CONFIG_PARAVIRT_XXL=y
2. CONFIG_PARAVIRT=y and  CONFIG_PARAVIRT_XXL=n


>>> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
>>> index 04cafc057bed..124e0f6c5d1c 100644
>>> --- a/arch/x86/kernel/paravirt.c
>>> +++ b/arch/x86/kernel/paravirt.c
>>> @@ -283,9 +283,11 @@ struct paravirt_patch_template pv_ops = {
>>>       .irq.save_fl        = __PV_IS_CALLEE_SAVE(native_save_fl),
>>>       .irq.irq_disable    = __PV_IS_CALLEE_SAVE(native_irq_disable),
>>>       .irq.irq_enable        = __PV_IS_CALLEE_SAVE(native_irq_enable),
>>> +#endif /* CONFIG_PARAVIRT_XXL */
>>> +
>>> +    /* Irq HLT ops. */
>>
>> What's that comment for?
> 
> I agree, please drop it.

Yes. I will drop it.

> 
> 
> Juergen

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-08-17 13:16       ` Kuppuswamy, Sathyanarayanan
@ 2021-08-17 13:28         ` Juergen Gross
  2021-08-17 13:39           ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Juergen Gross @ 2021-08-17 13:28 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel, Deep Shah, VMware, Inc.


[-- Attachment #1.1.1: Type: text/plain, Size: 4776 bytes --]

On 17.08.21 15:16, Kuppuswamy, Sathyanarayanan wrote:
> 
> 
> On 8/17/21 5:50 AM, Juergen Gross wrote:
>> On 12.08.21 09:18, Borislav Petkov wrote:
>>> On Wed, Aug 04, 2021 at 11:13:18AM -0700, Kuppuswamy Sathyanarayanan 
>>> wrote:
>>>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>>>
>>>> CONFIG_PARAVIRT_XXL is mainly defined/used by XEN PV guests. For
>>>> other VM guest types, features supported under CONFIG_PARAVIRT
>>>> are self sufficient. CONFIG_PARAVIRT mainly provides support for
>>>> TLB flush operations and time related operations.
>>>>
>>>> For TDX guest as well, paravirt calls under CONFIG_PARVIRT meets
>>>> most of its requirement except the need of HLT and SAFE_HLT
>>>> paravirt calls, which is currently defined under
>>>> COFNIG_PARAVIRT_XXL.
>>>>
>>>> Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest
>>>> like platforms, move HLT and SAFE_HLT paravirt calls under
>>>> CONFIG_PARAVIRT.
>>>>
>>>> Moving HLT and SAFE_HLT paravirt calls are not fatal and should not
>>>> break any functionality for current users of CONFIG_PARAVIRT.
>>>>
>>>> Co-developed-by: Kuppuswamy Sathyanarayanan 
>>>> <sathyanarayanan.kuppuswamy@linux.intel.com>
>>>> Signed-off-by: Kuppuswamy Sathyanarayanan 
>>>> <sathyanarayanan.kuppuswamy@linux.intel.com>
>>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>>>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>>>> ---
>>>
>>> You need to do this before sending your patches:
>>>
>>> ./scripts/get_maintainer.pl /tmp/tdx.01
>>> Thomas Gleixner <tglx@linutronix.de> (maintainer:X86 ARCHITECTURE 
>>> (32-BIT AND 64-BIT),commit_signer:1/6=17%)
>>> Ingo Molnar <mingo@redhat.com> (maintainer:X86 ARCHITECTURE (32-BIT 
>>> AND 64-BIT))
>>> Borislav Petkov <bp@alien8.de> (maintainer:X86 ARCHITECTURE (32-BIT 
>>> AND 64-BIT),commit_signer:6/6=100%)
>>> x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
>>> "H. Peter Anvin" <hpa@zytor.com> (reviewer:X86 ARCHITECTURE (32-BIT 
>>> AND 64-BIT))
>>> Juergen Gross <jgross@suse.com> (supporter:PARAVIRT_OPS 
>>> INTERFACE,commit_signer:5/6=83%,authored:5/6=83%,added_lines:15/16=94%,removed_lines:38/39=97%) 
>>>
>>> Deep Shah <sdeep@vmware.com> (supporter:PARAVIRT_OPS INTERFACE)
>>> "VMware, Inc." <pv-drivers@vmware.com> (supporter:PARAVIRT_OPS 
>>> INTERFACE)
>>> ...
>>>
>>> and CC also the supporters - I'm pretty sure at least Juergen would like
>>> to be kept up-to-date on pv changes. I'll CC him and the others now and
>>> leave in the whole diff but make sure you do that in the future please.
>>>
>>>>   arch/x86/include/asm/irqflags.h       | 40 
>>>> +++++++++++++++------------
>>>>   arch/x86/include/asm/paravirt.h       | 20 +++++++-------
>>>>   arch/x86/include/asm/paravirt_types.h |  3 +-
>>>>   arch/x86/kernel/paravirt.c            |  4 ++-
>>>>   4 files changed, 36 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/irqflags.h 
>>>> b/arch/x86/include/asm/irqflags.h
>>>> index c5ce9845c999..f3bb33b1715d 100644
>>>> --- a/arch/x86/include/asm/irqflags.h
>>>> +++ b/arch/x86/include/asm/irqflags.h
>>>> @@ -59,6 +59,28 @@ static inline __cpuidle void native_halt(void)
>>>>   #endif
>>>> +#ifndef CONFIG_PARAVIRT
>>>> +#ifndef __ASSEMBLY__
>>>> +/*
>>>> + * Used in the idle loop; sti takes one instruction cycle
>>>> + * to complete:
>>>> + */
>>>> +static inline __cpuidle void arch_safe_halt(void)
>>>> +{
>>>> +    native_safe_halt();
>>>> +}
>>>> +
>>>> +/*
>>>> + * Used when interrupts are already enabled or to
>>>> + * shutdown the processor:
>>>> + */
>>>> +static inline __cpuidle void halt(void)
>>>> +{
>>>> +    native_halt();
>>>> +}
>>>> +#endif /* __ASSEMBLY__ */
>>>> +#endif /* CONFIG_PARAVIRT */
>>>> +
>>>>   #ifdef CONFIG_PARAVIRT_XXL
>>>>   #include <asm/paravirt.h>
>>
>> Did you test this with CONFIG_PARAVIRT enabled and CONFIG_PARAVIRT_XXL
>> disabled?
>>
>> I'm asking because in this case I don't see where halt() and
>> arch_safe_halt() would be defined in case someone is including
>> asm/irqflags.h and not asm/paravirt.h.
> 
> We have tested both cases and did not hit any issues.
> 
> 1. CONFIG_PARAVIRT=y and  CONFIG_PARAVIRT_XXL=y
> 2. CONFIG_PARAVIRT=y and  CONFIG_PARAVIRT_XXL=n

I guess you have been lucky and all users of arch_safe_halt() and halt()
are directly or indirectly including asm/paravirt.h by other means.

There might be configs where this is not true, though.

In any case I believe you should fix your patch to let asm/irqflags.h
include asm/paravirt.h for the CONFIG_PARAVIRT case.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-08-17 13:28         ` Juergen Gross
@ 2021-08-17 13:39           ` Kuppuswamy, Sathyanarayanan
  2021-08-17 13:47             ` Juergen Gross
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-17 13:39 UTC (permalink / raw)
  To: Juergen Gross, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel, Deep Shah, VMware, Inc.



On 8/17/21 6:28 AM, Juergen Gross wrote:
> I guess you have been lucky and all users of arch_safe_halt() and halt()
> are directly or indirectly including asm/paravirt.h by other means.
> 
> There might be configs where this is not true, though.
> 
> In any case I believe you should fix your patch to let asm/irqflags.h
> include asm/paravirt.h for the CONFIG_PARAVIRT case.

Ok. I will include it.

#if defined(CONFIG_PARAVIRT) && !defined(CONFIG_PARAVIRT_XXL)
#include <asm/paravirt.h>
#endif

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-08-17 13:39           ` Kuppuswamy, Sathyanarayanan
@ 2021-08-17 13:47             ` Juergen Gross
  2021-08-17 13:50               ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Juergen Gross @ 2021-08-17 13:47 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel, Deep Shah, VMware, Inc.


[-- Attachment #1.1.1: Type: text/plain, Size: 860 bytes --]

On 17.08.21 15:39, Kuppuswamy, Sathyanarayanan wrote:
> 
> 
> On 8/17/21 6:28 AM, Juergen Gross wrote:
>> I guess you have been lucky and all users of arch_safe_halt() and halt()
>> are directly or indirectly including asm/paravirt.h by other means.
>>
>> There might be configs where this is not true, though.
>>
>> In any case I believe you should fix your patch to let asm/irqflags.h
>> include asm/paravirt.h for the CONFIG_PARAVIRT case.
> 
> Ok. I will include it.
> 
> #if defined(CONFIG_PARAVIRT) && !defined(CONFIG_PARAVIRT_XXL)
> #include <asm/paravirt.h>
> #endif
> 

I don't see a reason to have two "#include <asm/paravirt.h>" lines in
one file. Why don't you use:

#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else
#ifndef __ASSEMBLY___
...
#endif
#endif

#ifndef CONFIG_PARAVIRT_XXL
...
#endif


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-08-17 13:47             ` Juergen Gross
@ 2021-08-17 13:50               ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-17 13:50 UTC (permalink / raw)
  To: Juergen Gross, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel, Deep Shah, VMware, Inc.



On 8/17/21 6:47 AM, Juergen Gross wrote:
> I don't see a reason to have two "#include <asm/paravirt.h>" lines in
> one file. Why don't you use:
> 
> #ifdef CONFIG_PARAVIRT
> #include <asm/paravirt.h>
> #else
> #ifndef __ASSEMBLY___
> ...
> #endif
> #endif
> 
> #ifndef CONFIG_PARAVIRT_XXL
> ...
> #endif

Ok. I will use your format.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-12 17:18               ` Kuppuswamy, Sathyanarayanan
@ 2021-08-20 14:28                 ` Borislav Petkov
  2021-08-20 16:42                   ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Borislav Petkov @ 2021-08-20 14:28 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Sean Christopherson, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Thu, Aug 12, 2021 at 10:18:39AM -0700, Kuppuswamy, Sathyanarayanan wrote:
> I can implement intel_prot_guest_has() in arch/x86/kernel/cpu/intel.c.
> And call tdx_prot_guest_has() from it.

No, you should simply implement intel_prot_guest_has() or whatever we
end up calling it and have the generic routine call it. Not two routines
- tdx_* and intel_*

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 05/12] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions
  2021-08-04 18:13 ` [PATCH v5 05/12] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
@ 2021-08-20 15:16   ` Borislav Petkov
  0 siblings, 0 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-08-20 15:16 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Wed, Aug 04, 2021 at 11:13:22AM -0700, Kuppuswamy Sathyanarayanan wrote:
> +SYM_FUNC_START(__tdx_hypercall)
> +	FRAME_BEGIN
> +
> +	/* Move argument 7 from caller stack to RAX */
> +	movq ARG7_SP_OFFSET(%rsp), %rax
> +
> +	/* Check if caller provided an output struct */
> +	test %rax, %rax
> +	/* If out pointer is NULL, return -EINVAL */
> +	jz 1f
> +
> +	/* Save callee-s ved GPRs as mandated by the x86_64 ABI */

That should be "callee-saved" ofc. "s ved" is not a word. :-)

...

> +	/*
> +	 * Zero out registers exposed to the VMM to avoid
> +	 * speculative execution with VMM-controlled values.
> +	 * This needs to include all registers present in
> +	 * TDVMCALL_EXPOSE_REGS_MASK (except R12-R15).
> +	 * R12-R15 context will be restored.
> +	 */
> +	xor %r10d, %r10d
> +	xor %r11d, %r11d
> +
> +	/* Restore state of R9 register */
> +	pop %r9
> +
> +	/* Restore callee-s ved GPRs as mandated by the x86_64 ABI */

Here too.

Otherwise, LGTM. Thanks for documenting the ABI - looks good.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-20 14:28                 ` Borislav Petkov
@ 2021-08-20 16:42                   ` Kuppuswamy, Sathyanarayanan
  2021-08-20 16:59                     ` Borislav Petkov
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-20 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel



On 8/20/21 7:28 AM, Borislav Petkov wrote:
> On Thu, Aug 12, 2021 at 10:18:39AM -0700, Kuppuswamy, Sathyanarayanan wrote:
>> I can implement intel_prot_guest_has() in arch/x86/kernel/cpu/intel.c.
>> And call tdx_prot_guest_has() from it.
> 
> No, you should simply implement intel_prot_guest_has() or whatever we
> end up calling it and have the generic routine call it. Not two routines
> - tdx_* and intel_*
> 

Reason for suggesting seperate function for tdx_* specific protected guest
check is, we will be adding some exceptions for TDX features (like command
line option used to override the default flags or when device filter
support is disabled). Our current final version looks like below. Such
customization are not good in generic intel_* function right?

bool tdx_prot_guest_has(unsigned long flag)
{
         bool tdx_guest_enabled = cpu_feature_enabled(X86_FEATURE_TDX_GUEST);

         if (flag == tdg_disable_prot)
                 return false;

         switch (flag) {
         case PATTR_GUEST_TDX:
         case PATTR_GUEST_UNROLL_STRING_IO:
         case PATTR_GUEST_MEM_ENCRYPT:
         case PATTR_GUEST_SHARED_MAPPING_INIT:
         case PATTR_MEM_ENCRYPT:
         case PATTR_GUEST_SECURE_TIME:
         case PATTR_GUEST_CPUID_FILTER:
         case PATTR_GUEST_RAND_LOOP:
                 return tdx_guest_enabled;
         case PATTR_GUEST_DRIVER_FILTER:
                 return tdg_filter_enabled() && tdx_guest_enabled;
         }

         return false;
}


-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-20 16:42                   ` Kuppuswamy, Sathyanarayanan
@ 2021-08-20 16:59                     ` Borislav Petkov
  2021-08-20 17:11                       ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Borislav Petkov @ 2021-08-20 16:59 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Sean Christopherson, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Fri, Aug 20, 2021 at 09:42:55AM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Reason for suggesting seperate function for tdx_* specific protected guest
> check is, we will be adding some exceptions for TDX features (like command
> line option used to override the default flags or when device filter
> support is disabled). Our current final version looks like below. Such
> customization are not good in generic intel_* function right?

Err, why?

TDX is Intel technology. That's like asking to have

sev_prot_guest_has() and amd_prot_guest_has() on AMD.

Maybe I still don't get what you're trying to achieve but from where I'm
standing that sounds wrong.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest
  2021-08-20 16:59                     ` Borislav Petkov
@ 2021-08-20 17:11                       ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-20 17:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel



On 8/20/21 9:59 AM, Borislav Petkov wrote:
> Err, why?
> 
> TDX is Intel technology. That's like asking to have
> 
> sev_prot_guest_has() and amd_prot_guest_has() on AMD.
> 
> Maybe I still don't get what you're trying to achieve but from where I'm
> standing that sounds wrong.

My intention was to keep intel_* function clean and hide all TDX specific
customization in tdx_* function (within in tdx.c) and call it from
intel_* function (within cpu/intel.c).

But I understand your point as well. I am fine with moving TDX specific checks to
intel_* function.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO
  2021-08-04 18:13 ` [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO Kuppuswamy Sathyanarayanan
  2021-08-04 22:38   ` Sean Christopherson
@ 2021-08-20 17:13   ` Borislav Petkov
  2021-08-20 17:31     ` Kuppuswamy, Sathyanarayanan
  1 sibling, 1 reply; 72+ messages in thread
From: Borislav Petkov @ 2021-08-20 17:13 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Wed, Aug 04, 2021 at 11:13:23AM -0700, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> Per Guest-Host-Communication Interface (GHCI) for Intel Trust
> Domain Extensions (Intel TDX) specification, sec 2.4.2,
> TDCALL[TDINFO] provides basic TD execution environment information, not
> provided by CPUID.
> 
> Call TDINFO during early boot to be used for following system
> initialization.
> 
> The call provides info on which bit in pfn is used to indicate that the
> page is shared with the host and attributes of the TD, such as debug.
> 
> Information about the number of CPUs need not be saved because there are
> no users so far for it.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
> 
> Changes since v4:
>  * None
> 
> Changes since v3:
>  * None
> 
>  arch/x86/kernel/tdx.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 287564990f21..3973e81751ba 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -8,6 +8,14 @@
>  
>  #include <asm/tdx.h>
>  
> +/* TDX Module call Leaf IDs */
> +#define TDINFO				1
> +
> +static struct {
> +	unsigned int gpa_width;
> +	unsigned long attributes;
> +} td_info __ro_after_init;

Where is that thing even used? I don't see it in the whole patchset.

> +
>  /*
>   * Wrapper for standard use of __tdx_hypercall with BUG_ON() check
>   * for TDCALL error.
> @@ -54,6 +62,19 @@ bool tdx_prot_guest_has(unsigned long flag)
>  }
>  EXPORT_SYMBOL_GPL(tdx_prot_guest_has);
>  
> +static void tdg_get_info(void)

Also, what Sean said: "tdx_" please. Unless there's a real reason to
have a different prefix - then state that reason.

> +{
> +	u64 ret;
> +	struct tdx_module_output out = {0};

The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::

	struct long_struct_name *descriptive_name;
	unsigned long foo, bar;
	unsigned int tmp;
	int ret;

The above is faster to parse than the reverse ordering::

	int ret;
	unsigned int tmp;
	unsigned long foo, bar;
	struct long_struct_name *descriptive_name;

And even more so than random ordering::

	unsigned long foo, bar;
	int ret;
	struct long_struct_name *descriptive_name;
	unsigned int tmp;

> +
> +	ret = __tdx_module_call(TDINFO, 0, 0, 0, 0, &out);
> +
> +	BUG_ON(ret);

WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
#121: FILE: arch/x86/kernel/tdx.c:72:
+	BUG_ON(ret);

Have I already told you about checkpatch?

If not, here it is:

Please integrate scripts/checkpatch.pl into your patch creation
workflow. Some of the warnings/errors *actually* make sense.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO
  2021-08-20 17:13   ` Borislav Petkov
@ 2021-08-20 17:31     ` Kuppuswamy, Sathyanarayanan
  2021-08-20 17:35       ` Borislav Petkov
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-20 17:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel



On 8/20/21 10:13 AM, Borislav Petkov wrote:
> On Wed, Aug 04, 2021 at 11:13:23AM -0700, Kuppuswamy Sathyanarayanan wrote:
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>
>> Per Guest-Host-Communication Interface (GHCI) for Intel Trust
>> Domain Extensions (Intel TDX) specification, sec 2.4.2,
>> TDCALL[TDINFO] provides basic TD execution environment information, not
>> provided by CPUID.
>>
>> Call TDINFO during early boot to be used for following system
>> initialization.
>>
>> The call provides info on which bit in pfn is used to indicate that the
>> page is shared with the host and attributes of the TD, such as debug.
>>
>> Information about the number of CPUs need not be saved because there are
>> no users so far for it.
>>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> ---
>>
>> Changes since v4:
>>   * None
>>
>> Changes since v3:
>>   * None
>>
>>   arch/x86/kernel/tdx.c | 23 +++++++++++++++++++++++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>> index 287564990f21..3973e81751ba 100644
>> --- a/arch/x86/kernel/tdx.c
>> +++ b/arch/x86/kernel/tdx.c
>> @@ -8,6 +8,14 @@
>>   
>>   #include <asm/tdx.h>
>>   
>> +/* TDX Module call Leaf IDs */
>> +#define TDINFO				1
>> +
>> +static struct {
>> +	unsigned int gpa_width;
>> +	unsigned long attributes;
>> +} td_info __ro_after_init;
> 
> Where is that thing even used? I don't see it in the whole patchset.

It is used in different patch set. If you prefer to move it there, I can
move it to that patch set.

patch: https://lore.kernel.org/patchwork/patch/1472343/
series: https://lore.kernel.org/patchwork/project/lkml/list/?series=510836


> 
>> +
>>   /*
>>    * Wrapper for standard use of __tdx_hypercall with BUG_ON() check
>>    * for TDCALL error.
>> @@ -54,6 +62,19 @@ bool tdx_prot_guest_has(unsigned long flag)
>>   }
>>   EXPORT_SYMBOL_GPL(tdx_prot_guest_has);
>>   
>> +static void tdg_get_info(void)
> 
> Also, what Sean said: "tdx_" please. Unless there's a real reason to
> have a different prefix - then state that reason.
> 
>> +{
>> +	u64 ret;
>> +	struct tdx_module_output out = {0};
> 
> The tip-tree preferred ordering of variable declarations at the
> beginning of a function is reverse fir tree order::
> 
> 	struct long_struct_name *descriptive_name;
> 	unsigned long foo, bar;
> 	unsigned int tmp;
> 	int ret;
> 
> The above is faster to parse than the reverse ordering::
> 
> 	int ret;
> 	unsigned int tmp;
> 	unsigned long foo, bar;
> 	struct long_struct_name *descriptive_name;
> 
> And even more so than random ordering::
> 
> 	unsigned long foo, bar;
> 	int ret;
> 	struct long_struct_name *descriptive_name;
> 	unsigned int tmp;

I will re-check the TDX patchset and fix the ordering issues.

> 
>> +
>> +	ret = __tdx_module_call(TDINFO, 0, 0, 0, 0, &out);
>> +
>> +	BUG_ON(ret);
> 
> WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
> #121: FILE: arch/x86/kernel/tdx.c:72:
> +	BUG_ON(ret);

I have already fixed reasonable check-patch issues. For this case, we
want to use BUG_ON(). Failure in tdx_module_call means buggy TDX
module. So it is safer to crash the kernel.

> 
> Have I already told you about checkpatch?
> 
> If not, here it is:
> 
> Please integrate scripts/checkpatch.pl into your patch creation
> workflow. Some of the warnings/errors *actually* make sense.
> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO
  2021-08-20 17:31     ` Kuppuswamy, Sathyanarayanan
@ 2021-08-20 17:35       ` Borislav Petkov
  2021-08-20 18:29         ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Borislav Petkov @ 2021-08-20 17:35 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Fri, Aug 20, 2021 at 10:31:10AM -0700, Kuppuswamy, Sathyanarayanan wrote:
> It is used in different patch set. If you prefer to move it there, I can
> move it to that patch set.

Yes please.

> I have already fixed reasonable check-patch issues. For this case, we
> want to use BUG_ON(). Failure in tdx_module_call means buggy TDX
> module. So it is safer to crash the kernel.

Ok, put that as a comment above it to explain why it cannot continue.
Also, make sure you issue an error message before it explodes so that
the user knows.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO
  2021-08-20 17:35       ` Borislav Petkov
@ 2021-08-20 18:29         ` Kuppuswamy, Sathyanarayanan
  2021-08-20 18:58           ` Andi Kleen
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-20 18:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel



On 8/20/21 10:35 AM, Borislav Petkov wrote:
> Ok, put that as a comment above it to explain why it cannot continue.
> Also, make sure you issue an error message before it explodes so that
> the user knows.

Ok. I will fix this in next version.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO
  2021-08-20 18:29         ` Kuppuswamy, Sathyanarayanan
@ 2021-08-20 18:58           ` Andi Kleen
  2021-08-20 19:01             ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Andi Kleen @ 2021-08-20 18:58 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel


On 8/20/2021 11:29 AM, Kuppuswamy, Sathyanarayanan wrote:
>
>
> On 8/20/21 10:35 AM, Borislav Petkov wrote:
>> Ok, put that as a comment above it to explain why it cannot continue.
>> Also, make sure you issue an error message before it explodes so that
>> the user knows.
>
> Ok. I will fix this in next version.


Without working TDCALLs the error message won't appear anywhere. The 
only practical way to debug such a problem is a kernel debugger.

Also printing an error message might end up recursing because the 
console write would trigger TDCALL again, or eventually stop because the 
console lock is already taken. In any case it won't work.


-Andi


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO
  2021-08-20 18:58           ` Andi Kleen
@ 2021-08-20 19:01             ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-20 19:01 UTC (permalink / raw)
  To: Andi Kleen, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel



On 8/20/21 11:58 AM, Andi Kleen wrote:
> 
> 
> Without working TDCALLs the error message won't appear anywhere. The only practical way to debug 
> such a problem is a kernel debugger.
> 
> Also printing an error message might end up recursing because the console write would trigger TDCALL 
> again, or eventually stop because the console lock is already taken. In any case it won't work.

Yes, good point. In this case, adding debug print will not work. But I
can add more comments to the code.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest
  2021-08-04 18:13 ` [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-08-24 10:17   ` Borislav Petkov
  2021-08-24 17:32     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 10:17 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Wed, Aug 04, 2021 at 11:13:24AM -0700, Kuppuswamy Sathyanarayanan wrote:
> If a guest kernel action which would normally cause a #VE occurs in the
> interrupt-disabled region before TDGETVEINFO, a #DF is delivered to the
> guest which will result in an oops (and should eventually be a panic, as
> we would like to set panic_on_oops to 1 for TDX guests).

Who's "we"?

Please use passive voice in your commit message and comments: no "we"
or "I", etc. Personal pronouns are ambiguous in text, especially with
so many parties/companies/etc developing the kernel so let's avoid them.

Audit all your patchsets pls.

> Add basic infrastructure to handle any #VE which occurs in the kernel or
> userspace.  Later patches will add handling for specific #VE scenarios.
> 
> Convert unhandled #VE's (everything, until later in this series) so that
> they appear just like a #GP by calling ve_raise_fault() directly.
> ve_raise_fault() is similar to #GP handler and is responsible for
> sending SIGSEGV to userspace and cpu die and notifying debuggers and

In all your text:

s/cpu/CPU/g

Audit all your patchsets pls.

> @@ -53,6 +67,11 @@ u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
>  u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14,
>  		    u64 r15, struct tdx_hypercall_output *out);
>  
> +unsigned long tdg_get_ve_info(struct ve_info *ve);
> +
> +int tdg_handle_virtualization_exception(struct pt_regs *regs,

There's that "tdg" prefix again. Please fix all your patchsets.

> +					struct ve_info *ve);
> +
>  #else
>  
>  static inline void tdx_early_init(void) { };
> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> index df0fa695bb09..a5eaae8e6c44 100644
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -68,6 +68,9 @@ static const __initconst struct idt_data early_idts[] = {
>  	 */
>  	INTG(X86_TRAP_PF,		asm_exc_page_fault),
>  #endif
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +	INTG(X86_TRAP_VE,		asm_exc_virtualization_exception),
> +#endif
>  };
>  
>  /*
> @@ -91,6 +94,9 @@ static const __initconst struct idt_data def_idts[] = {
>  	INTG(X86_TRAP_MF,		asm_exc_coprocessor_error),
>  	INTG(X86_TRAP_AC,		asm_exc_alignment_check),
>  	INTG(X86_TRAP_XF,		asm_exc_simd_coprocessor_error),
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +	INTG(X86_TRAP_VE,		asm_exc_virtualization_exception),
> +#endif
>  
>  #ifdef CONFIG_X86_32
>  	TSKG(X86_TRAP_DF,		GDT_ENTRY_DOUBLEFAULT_TSS),
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 3973e81751ba..6169f9c740b2 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -10,6 +10,7 @@
>  
>  /* TDX Module call Leaf IDs */
>  #define TDINFO				1
> +#define TDGETVEINFO			3
>  
>  static struct {
>  	unsigned int gpa_width;
> @@ -75,6 +76,41 @@ static void tdg_get_info(void)
>  	td_info.attributes = out.rdx;
>  }
>  
> +unsigned long tdg_get_ve_info(struct ve_info *ve)
> +{
> +	u64 ret;
> +	struct tdx_module_output out = {0};

The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::

	struct long_struct_name *descriptive_name;
	unsigned long foo, bar;
	unsigned int tmp;
	int ret;

The above is faster to parse than the reverse ordering::

	int ret;
	unsigned int tmp;
	unsigned long foo, bar;
	struct long_struct_name *descriptive_name;

And even more so than random ordering::

	unsigned long foo, bar;
	int ret;
	struct long_struct_name *descriptive_name;
	unsigned int tmp;

> +
> +	/*
> +	 * NMIs and machine checks are suppressed. Before this point any
> +	 * #VE is fatal. After this point (TDGETVEINFO call), NMIs and
> +	 * additional #VEs are permitted (but we don't expect them to
> +	 * happen unless you panic).
> +	 */
> +	ret = __tdx_module_call(TDGETVEINFO, 0, 0, 0, 0, &out);
> +
> +	ve->exit_reason = out.rcx;
> +	ve->exit_qual   = out.rdx;
> +	ve->gla         = out.r8;
> +	ve->gpa         = out.r9;
> +	ve->instr_len   = out.r10 & UINT_MAX;
> +	ve->instr_info  = out.r10 >> 32;
> +
> +	return ret;
> +}
> +
> +int tdg_handle_virtualization_exception(struct pt_regs *regs,
> +					struct ve_info *ve)
> +{
> +	/*
> +	 * TODO: Add handler support for various #VE exit
> +	 * reasons. It will be added by other patches in
> +	 * the series.
> +	 */

That comment needs to go.

> +	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
> +	return -EFAULT;
> +}
> +
>  void __init tdx_early_init(void)
>  {
>  	if (!cpuid_has_tdx_guest())
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index a58800973aed..be56f0281cb5 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -61,6 +61,7 @@
>  #include <asm/insn.h>
>  #include <asm/insn-eval.h>
>  #include <asm/vdso.h>
> +#include <asm/tdx.h>
>  
>  #ifdef CONFIG_X86_64
>  #include <asm/x86_init.h>
> @@ -1140,6 +1141,74 @@ DEFINE_IDTENTRY(exc_device_not_available)
>  	}
>  }
>  
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +#define VEFSTR "VE fault"
> +static void ve_raise_fault(struct pt_regs *regs, long error_code)
> +{
> +	struct task_struct *tsk = current;
> +
> +	if (user_mode(regs)) {
> +		tsk->thread.error_code = error_code;
> +		tsk->thread.trap_nr = X86_TRAP_VE;
> +
> +		/*
> +		 * Not fixing up VDSO exceptions similar to #GP handler
> +		 * because we don't expect the VDSO to trigger #VE.
> +		 */
> +		show_signal(tsk, SIGSEGV, "", VEFSTR, regs, error_code);
> +		force_sig(SIGSEGV);
> +		return;
> +	}
> +
> +	if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))

There are exception table entries which can trigger a #VE?

> +		return;
> +
> +	tsk->thread.error_code = error_code;
> +	tsk->thread.trap_nr = X86_TRAP_VE;
> +
> +	/*
> +	 * To be potentially processing a kprobe fault and to trust the result
> +	 * from kprobe_running(), we have to be non-preemptible.
> +	 */
> +	if (!preemptible() &&
> +	    kprobe_running() &&
> +	    kprobe_fault_handler(regs, X86_TRAP_VE))
> +		return;
> +
> +	notify_die(DIE_GPF, VEFSTR, regs, error_code, X86_TRAP_VE, SIGSEGV);

Other handlers check that retval.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-04 18:13 ` [PATCH v5 08/12] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
@ 2021-08-24 16:10   ` Borislav Petkov
  2021-08-24 17:06     ` Sean Christopherson
  2021-08-24 17:35     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 2 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 16:10 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Wed, Aug 04, 2021 at 11:13:25AM -0700, Kuppuswamy Sathyanarayanan wrote:
> @@ -240,6 +243,32 @@ SYM_FUNC_START(__tdx_hypercall)
>  
>  	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>  
> +	/*
> +	 * For the idle loop STI needs to be called directly before
> +	 * the TDCALL that enters idle (EXIT_REASON_HLT case). STI
> +	 * enables interrupts only one instruction later. If there
> +	 * are any instructions between the STI and the TDCALL for
> +	 * HLT then an interrupt could happen in that time, but the
> +	 * code would go back to sleep afterwards, which can cause
> +	 * longer delays.

<-- newline in the comment here for better readability.

> There leads to significant difference in

"There leads..." ?

> +	 * network performance benchmarks. So add a special case for
> +	 * EXIT_REASON_HLT to trigger sti before TDCALL. But this
> +	 * change is not required for all HLT cases. So use R15
> +	 * register value to identify the case which needs sti. So,

s/sti/STI/g

> +	 * if R11 is EXIT_REASON_HLT and R15 is 1, then call sti
> +	 * before TDCALL instruction. Note that R15 register is not
> +	 * required by TDCALL ABI when triggering the hypercall for
> +	 * EXIT_REASON_HLT case. So use it in software to select the
> +	 * sti case.
> +	 */
> +	cmpl $EXIT_REASON_HLT, %r11d
> +	jne skip_sti
> +	cmpl $1, %r15d
> +	jne skip_sti
> +	/* Set R15 register to 0, it is unused in EXIT_REASON_HLT case */
> +	xor %r15, %r15
> +	sti
> +skip_sti:
>  	tdcall

...

> +static __cpuidle void tdg_safe_halt(void)
> +{
> +	u64 ret;
> +
> +	/*
> +	 * Enable interrupts next to the TDVMCALL to avoid
> +	 * performance degradation.

That comment needs some more love to say exactly what the problem is.

> +	 */
> +	local_irq_enable();
> +
> +	/* IRQ is enabled, So set R12 as 0 */
> +	ret = _tdx_hypercall(EXIT_REASON_HLT, 0, 0, 0, 1, NULL);
> +
> +	/* It should never fail */
> +	BUG_ON(ret);
> +}

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 09/12] x86/tdx: Wire up KVM hypercalls
  2021-08-04 18:13 ` [PATCH v5 09/12] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
@ 2021-08-24 16:34   ` Borislav Petkov
  2021-08-24 18:11     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 16:34 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Wed, Aug 04, 2021 at 11:13:26AM -0700, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> KVM hypercalls use the "vmcall" or "vmmcall" instructions.

Write instruction mnemonics in all caps pls.

> +# This option enables KVM specific hypercalls in TDX guest.
> +config INTEL_TDX_GUEST_KVM

What is that config option really for? IOW, can't you use
CONFIG_KVM_GUEST instead?

> +	def_bool y
> +	depends on KVM_GUEST && INTEL_TDX_GUEST
> +
>  endif #HYPERVISOR_GUEST
>  
>  source "arch/x86/Kconfig.cpu"
> diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
> index 4cb726c71ed8..9855a9ff2924 100644
> --- a/arch/x86/include/asm/asm-prototypes.h
> +++ b/arch/x86/include/asm/asm-prototypes.h
> @@ -17,6 +17,10 @@
>  extern void cmpxchg8b_emu(void);
>  #endif
>  
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +#include <asm/tdx.h>
> +#endif

What "ASM sysmbol generation issue" forced this?

...

> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 846fe58f0426..8fa33e2c98db 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -6,8 +6,9 @@
>  #include <linux/cpufeature.h>
>  #include <linux/types.h>
>  
> -#define TDX_CPUID_LEAF_ID	0x21
> -#define TDX_HYPERCALL_STANDARD  0
> +#define TDX_CPUID_LEAF_ID			0x21
> +#define TDX_HYPERCALL_STANDARD			0
> +#define TDX_HYPERCALL_VENDOR_KVM		0x4d564b2e584454

						"TDX.KVM"

Yeah, you can put it in a comment so that people don't have to do the
CTRL-V game in vim insert mode, i.e., ":help i_CTRL-V_digit" :-)

>  /*
>   * Used in __tdx_module_call() helper function to gather the
> @@ -80,4 +81,29 @@ static inline bool tdx_prot_guest_has(unsigned long flag) { return false; }
>  
>  #endif /* CONFIG_INTEL_TDX_GUEST */
>  
> +#ifdef CONFIG_INTEL_TDX_GUEST_KVM

I don't think that even needs the ifdeffery. If it is not used, the
inline will simply get discarded so why bother?

> +
> +static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
> +				     unsigned long p2, unsigned long p3,
> +				     unsigned long p4)
> +{
> +	struct tdx_hypercall_output out;
> +	u64 err;
> +
> +	err = __tdx_hypercall(TDX_HYPERCALL_VENDOR_KVM, nr, p1, p2,
> +			      p3, p4, &out);
> +
> +	BUG_ON(err);
> +
> +	return out.r10;
> +}
> +#else
> +static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
> +				     unsigned long p2, unsigned long p3,
> +				     unsigned long p4)
> +{
> +	return -ENODEV;
> +}
> +#endif /* CONFIG_INTEL_TDX_GUEST_KVM */
> +
>  #endif /* _ASM_X86_TDX_H */
> diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
> index 9df94f87465d..1823bac4542d 100644
> --- a/arch/x86/kernel/tdcall.S
> +++ b/arch/x86/kernel/tdcall.S
> @@ -3,6 +3,7 @@
>  #include <asm/asm.h>
>  #include <asm/frame.h>
>  #include <asm/unwind_hints.h>
> +#include <asm/export.h>
>  
>  #include <linux/linkage.h>
>  #include <linux/bits.h>
> @@ -309,3 +310,4 @@ skip_sti:
>  
>         retq
>  SYM_FUNC_END(__tdx_hypercall)
> +EXPORT_SYMBOL(__tdx_hypercall);

EXPORT_SYMBOL_GPL, of course.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 10/12] x86/tdx: Add MSR support for TDX guest
  2021-08-04 18:13 ` [PATCH v5 10/12] x86/tdx: Add MSR support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-08-24 16:55   ` Borislav Petkov
  2021-08-24 18:12     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 16:55 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Wed, Aug 04, 2021 at 11:13:27AM -0700, Kuppuswamy Sathyanarayanan wrote:
>  int tdg_handle_virtualization_exception(struct pt_regs *regs,
>  					struct ve_info *ve)
>  {
> +	unsigned long val;
> +	int ret = 0;
> +
>  	switch (ve->exit_reason) {
>  	case EXIT_REASON_HLT:
>  		tdg_halt();
>  		break;
> +	case EXIT_REASON_MSR_READ:
> +		val = tdg_read_msr_safe(regs->cx, (unsigned int *)&ret);
> +		if (!ret) {
> +			regs->ax = val & UINT_MAX;

			regs->ax = (u32)val;

> +			regs->dx = val >> 32;
> +		}
> +		break;
> +	case EXIT_REASON_MSR_WRITE:
> +		ret = tdg_write_msr_safe(regs->cx, regs->ax, regs->dx);
> +		break;
>  	default:
>  		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
>  		return -EFAULT;
>  	}
>  
>  	/* After successful #VE handling, move the IP */
> -	regs->ip += ve->instr_len;
> +	if (!ret)
> +		regs->ip += ve->instr_len;
>  
> -	return 0;
> +	return ret;
>  }
>  
>  void __init tdx_early_init(void)
> -- 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-24 16:10   ` Borislav Petkov
@ 2021-08-24 17:06     ` Sean Christopherson
  2021-08-24 17:25       ` Andi Kleen
                         ` (2 more replies)
  2021-08-24 17:35     ` Kuppuswamy, Sathyanarayanan
  1 sibling, 3 replies; 72+ messages in thread
From: Sean Christopherson @ 2021-08-24 17:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Dave Hansen,
	Tony Luck, Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Tue, Aug 24, 2021, Borislav Petkov wrote:
> On Wed, Aug 04, 2021 at 11:13:25AM -0700, Kuppuswamy Sathyanarayanan wrote:
> > +static __cpuidle void tdg_safe_halt(void)
> > +{
> > +	u64 ret;
> > +
> > +	/*
> > +	 * Enable interrupts next to the TDVMCALL to avoid
> > +	 * performance degradation.
> 
> That comment needs some more love to say exactly what the problem is.

LOL, I guess hanging the vCPU counts as degraded performance.  But this comment
can and should go away entirely...

> > +	 */
> > +	local_irq_enable();

...because this is broken.  It's also disturbing because it suggests that these
patches are not being tested.

The STI _must_ immediately precede TDCALL, and it _must_ execute with interrupts
disabled.  The whole point of the STI blocking shadow is to ensure interrupts are
blocked until _after_ the HLT completes so that a wake event is not recongized
before the HLT, in which case the vCPU will get stuck in HLT because its wake
event alreadyfired.  Enabling IRQs well before the TDCALL defeats the purpose of
the STI dance in __tdx_hypercall().

There's even a massive comment in __tdx_hypercall() explaining all this...

> > +
> > +	/* IRQ is enabled, So set R12 as 0 */

It would be helpful to use local variables to document what's up, e.g.

 	const bool irqs_enabled = true;
	const bool do_sti = true;

	ret = _tdx_hypercall(EXIT_REASON_HLT, irqs_enabled0, 0, 0, do_sti, NULL);
	
> > +	ret = _tdx_hypercall(EXIT_REASON_HLT, 0, 0, 0, 1, NULL);
> > +
> > +	/* It should never fail */
> > +	BUG_ON(ret);
> > +}
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-24 17:06     ` Sean Christopherson
@ 2021-08-24 17:25       ` Andi Kleen
  2021-08-24 17:27       ` Borislav Petkov
  2021-08-24 18:18       ` Kuppuswamy, Sathyanarayanan
  2 siblings, 0 replies; 72+ messages in thread
From: Andi Kleen @ 2021-08-24 17:25 UTC (permalink / raw)
  To: Sean Christopherson, Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Dave Hansen,
	Tony Luck, Dan Williams, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel


>>> +	 */
>>> +	local_irq_enable();
> ...because this is broken.  It's also disturbing because it suggests that these
> patches are not being tested.

This is already fixed in the github tree. Yes it took some time for the 
fix to trickle out.


-Andi



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-24 17:06     ` Sean Christopherson
  2021-08-24 17:25       ` Andi Kleen
@ 2021-08-24 17:27       ` Borislav Petkov
  2021-08-24 17:47         ` Sean Christopherson
  2021-08-31 20:49         ` Kuppuswamy, Sathyanarayanan
  2021-08-24 18:18       ` Kuppuswamy, Sathyanarayanan
  2 siblings, 2 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 17:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Dave Hansen,
	Tony Luck, Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Tue, Aug 24, 2021 at 05:06:21PM +0000, Sean Christopherson wrote:
> On Tue, Aug 24, 2021, Borislav Petkov wrote:
> > On Wed, Aug 04, 2021 at 11:13:25AM -0700, Kuppuswamy Sathyanarayanan wrote:
> > > +static __cpuidle void tdg_safe_halt(void)
> > > +{
> > > +	u64 ret;
> > > +
> > > +	/*
> > > +	 * Enable interrupts next to the TDVMCALL to avoid
> > > +	 * performance degradation.
> > 
> > That comment needs some more love to say exactly what the problem is.
> 
> LOL, I guess hanging the vCPU counts as degraded performance.  But this comment
> can and should go away entirely...
> 
> > > +	 */
> > > +	local_irq_enable();
> 
> ...because this is broken.  It's also disturbing because it suggests that these
> patches are not being tested.

My complaint since '88.

> The STI _must_ immediately precede TDCALL, and it _must_ execute with interrupts
> disabled.  The whole point of the STI blocking shadow is to ensure interrupts are
> blocked until _after_ the HLT completes so that a wake event is not recongized
> before the HLT, in which case the vCPU will get stuck in HLT because its wake
> event alreadyfired.  Enabling IRQs well before the TDCALL defeats the purpose of
> the STI dance in __tdx_hypercall().

Wait, whaaaat?!

So tdg_halt() does that but tdg_safe_halt() goes to great lengths not to
do it. And it looks all legit and all, like it really wanted to do it
differently. WTF?

> There's even a massive comment in __tdx_hypercall() explaining all this...
> 
> > > +
> > > +	/* IRQ is enabled, So set R12 as 0 */
> 
> It would be helpful to use local variables to document what's up, e.g.
> 
>  	const bool irqs_enabled = true;
> 	const bool do_sti = true;
> 
> 	ret = _tdx_hypercall(EXIT_REASON_HLT, irqs_enabled0, 0, 0, do_sti, NULL);

Wait, is this do_sti thing supposed to be:

	 * ... But this
         * change is not required for all HLT cases. So use R15
         * register value to identify the case which needs sti. So,
         * if R11 is EXIT_REASON_HLT and R15 is 1, then call sti
         * before TDCALL instruction.

?


> > > +	ret = _tdx_hypercall(EXIT_REASON_HLT, 0, 0, 0, 1, NULL);
						      ^^^
Yeah, it must be it - the 1 there.

And what's with the irqs_enabled first parameter?

Is that used by the TDX module?

I think in the next version all those _tdx_hypercall() wrappers should
spell it out what the parameters they pass are used for.

Hohumm.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest
  2021-08-24 10:17   ` Borislav Petkov
@ 2021-08-24 17:32     ` Kuppuswamy, Sathyanarayanan
  2021-08-24 17:36       ` Dave Hansen
  2021-08-24 17:46       ` Borislav Petkov
  0 siblings, 2 replies; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-24 17:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel



On 8/24/21 3:17 AM, Borislav Petkov wrote:
> On Wed, Aug 04, 2021 at 11:13:24AM -0700, Kuppuswamy Sathyanarayanan wrote:
>> If a guest kernel action which would normally cause a #VE occurs in the
>> interrupt-disabled region before TDGETVEINFO, a #DF is delivered to the
>> guest which will result in an oops (and should eventually be a panic, as
>> we would like to set panic_on_oops to 1 for TDX guests).
> 
> Who's "we"?
> 
> Please use passive voice in your commit message and comments: no "we"
> or "I", etc. Personal pronouns are ambiguous in text, especially with
> so many parties/companies/etc developing the kernel so let's avoid them.
> 
> Audit all your patchsets pls.

Sorry. I will fix this in next version.

> 
>> Add basic infrastructure to handle any #VE which occurs in the kernel or
>> userspace.  Later patches will add handling for specific #VE scenarios.
>>
>> Convert unhandled #VE's (everything, until later in this series) so that
>> they appear just like a #GP by calling ve_raise_fault() directly.
>> ve_raise_fault() is similar to #GP handler and is responsible for
>> sending SIGSEGV to userspace and cpu die and notifying debuggers and
> 
> In all your text:
> 
> s/cpu/CPU/g
> 
> Audit all your patchsets pls.

Yes. I will fix this in next version.

> 
>> @@ -53,6 +67,11 @@ u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
>>   u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14,
>>   		    u64 r15, struct tdx_hypercall_output *out);
>>   
>> +unsigned long tdg_get_ve_info(struct ve_info *ve);
>> +
>> +int tdg_handle_virtualization_exception(struct pt_regs *regs,
> 
> There's that "tdg" prefix again. Please fix all your patchsets.

Mainly chose it avoid future name conflicts with KVM (tdx) calls. But
if you don't like "tdg", I can change it back to "tdx" and resolve the
naming issues when it occurs.


>>   static struct {
>>   	unsigned int gpa_width;
>> @@ -75,6 +76,41 @@ static void tdg_get_info(void)
>>   	td_info.attributes = out.rdx;
>>   }
>>   
>> +unsigned long tdg_get_ve_info(struct ve_info *ve)
>> +{
>> +	u64 ret;
>> +	struct tdx_module_output out = {0};
> 
> The tip-tree preferred ordering of variable declarations at the
> beginning of a function is reverse fir tree order::
> 
> 	struct long_struct_name *descriptive_name;
> 	unsigned long foo, bar;
> 	unsigned int tmp;
> 	int ret;
> 
> The above is faster to parse than the reverse ordering::
> 
> 	int ret;
> 	unsigned int tmp;
> 	unsigned long foo, bar;
> 	struct long_struct_name *descriptive_name;
> 
> And even more so than random ordering::
> 
> 	unsigned long foo, bar;
> 	int ret;
> 	struct long_struct_name *descriptive_name;
> 	unsigned int tmp;

Yes. I will fix this in next version.


>> +int tdg_handle_virtualization_exception(struct pt_regs *regs,
>> +					struct ve_info *ve)
>> +{
>> +	/*
>> +	 * TODO: Add handler support for various #VE exit
>> +	 * reasons. It will be added by other patches in
>> +	 * the series.
>> +	 */
> 
> That comment needs to go.

Ok. I will remove it.

>> +#ifdef CONFIG_INTEL_TDX_GUEST
>> +#define VEFSTR "VE fault"
>> +static void ve_raise_fault(struct pt_regs *regs, long error_code)
>> +{
>> +	struct task_struct *tsk = current;
>> +
>> +	if (user_mode(regs)) {
>> +		tsk->thread.error_code = error_code;
>> +		tsk->thread.trap_nr = X86_TRAP_VE;
>> +
>> +		/*
>> +		 * Not fixing up VDSO exceptions similar to #GP handler
>> +		 * because we don't expect the VDSO to trigger #VE.
>> +		 */
>> +		show_signal(tsk, SIGSEGV, "", VEFSTR, regs, error_code);
>> +		force_sig(SIGSEGV);
>> +		return;
>> +	}
>> +
>> +	if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
> 
> There are exception table entries which can trigger a #VE?

It is required to handle #VE exceptions raised by unhandled MSR
read/writes.

> 
>> +		return;
>> +
>> +	tsk->thread.error_code = error_code;
>> +	tsk->thread.trap_nr = X86_TRAP_VE;
>> +
>> +	/*
>> +	 * To be potentially processing a kprobe fault and to trust the result
>> +	 * from kprobe_running(), we have to be non-preemptible.
>> +	 */
>> +	if (!preemptible() &&
>> +	    kprobe_running() &&
>> +	    kprobe_fault_handler(regs, X86_TRAP_VE))
>> +		return;
>> +
>> +	notify_die(DIE_GPF, VEFSTR, regs, error_code, X86_TRAP_VE, SIGSEGV);
> 
> Other handlers check that retval.

Ok. I can check it. But there is only one statement after this call. So it
may not be very helpful.

> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-24 16:10   ` Borislav Petkov
  2021-08-24 17:06     ` Sean Christopherson
@ 2021-08-24 17:35     ` Kuppuswamy, Sathyanarayanan
  2021-08-24 17:48       ` Borislav Petkov
  1 sibling, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-24 17:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel



On 8/24/21 9:10 AM, Borislav Petkov wrote:
> On Wed, Aug 04, 2021 at 11:13:25AM -0700, Kuppuswamy Sathyanarayanan wrote:
>> @@ -240,6 +243,32 @@ SYM_FUNC_START(__tdx_hypercall)
>>   
>>   	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>>   
>> +	/*
>> +	 * For the idle loop STI needs to be called directly before
>> +	 * the TDCALL that enters idle (EXIT_REASON_HLT case). STI
>> +	 * enables interrupts only one instruction later. If there
>> +	 * are any instructions between the STI and the TDCALL for
>> +	 * HLT then an interrupt could happen in that time, but the
>> +	 * code would go back to sleep afterwards, which can cause
>> +	 * longer delays.
> 
> <-- newline in the comment here for better readability.

Ok. I will add it in next version.

> 
>> There leads to significant difference in
> 
> "There leads..." ?

Will fix this in next version. "This leads"

> 
>> +	 * network performance benchmarks. So add a special case for
>> +	 * EXIT_REASON_HLT to trigger sti before TDCALL. But this
>> +	 * change is not required for all HLT cases. So use R15
>> +	 * register value to identify the case which needs sti. So,
> 
> s/sti/STI/g

will fix this in next version.

> 
>> +	 * if R11 is EXIT_REASON_HLT and R15 is 1, then call sti
>> +	 * before TDCALL instruction. Note that R15 register is not
>> +	 * required by TDCALL ABI when triggering the hypercall for
>> +	 * EXIT_REASON_HLT case. So use it in software to select the
>> +	 * sti case.
>> +	 */
>> +	cmpl $EXIT_REASON_HLT, %r11d
>> +	jne skip_sti
>> +	cmpl $1, %r15d
>> +	jne skip_sti
>> +	/* Set R15 register to 0, it is unused in EXIT_REASON_HLT case */
>> +	xor %r15, %r15
>> +	sti
>> +skip_sti:
>>   	tdcall
> 
> ...
> 
>> +static __cpuidle void tdg_safe_halt(void)
>> +{
>> +	u64 ret;
>> +
>> +	/*
>> +	 * Enable interrupts next to the TDVMCALL to avoid
>> +	 * performance degradation.
> 
> That comment needs some more love to say exactly what the problem is.

It is a bug in this submission. After adding STI fix, this local_irq_enable()
had to be removed. Somehow I have missed to do it. I will fix this
in next version.

> 
>> +	 */
>> +	local_irq_enable();
>> +
>> +	/* IRQ is enabled, So set R12 as 0 */
>> +	ret = _tdx_hypercall(EXIT_REASON_HLT, 0, 0, 0, 1, NULL);
>> +
>> +	/* It should never fail */
>> +	BUG_ON(ret);
>> +}
> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest
  2021-08-24 17:32     ` Kuppuswamy, Sathyanarayanan
@ 2021-08-24 17:36       ` Dave Hansen
  2021-08-24 17:46       ` Borislav Petkov
  1 sibling, 0 replies; 72+ messages in thread
From: Dave Hansen @ 2021-08-24 17:36 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On 8/24/21 10:32 AM, Kuppuswamy, Sathyanarayanan wrote:
>>> +    if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
>>
>> There are exception table entries which can trigger a #VE?
> 
> It is required to handle #VE exceptions raised by unhandled MSR
> read/writes.

Is that really the *complete* set of reasons that a #VE can be triggered
from the kernel?

Just off the top of my head, I could imagine the kernel doing a
copy_{to,from}_user() which touched user-mapped MMIO causing a #VE.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest
  2021-08-24 17:32     ` Kuppuswamy, Sathyanarayanan
  2021-08-24 17:36       ` Dave Hansen
@ 2021-08-24 17:46       ` Borislav Petkov
  2021-09-02 15:24         ` Kuppuswamy, Sathyanarayanan
  1 sibling, 1 reply; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 17:46 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Tue, Aug 24, 2021 at 10:32:13AM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Mainly chose it avoid future name conflicts with KVM (tdx) calls. But

What name conflicts with KVM calls? Please explain.

> It is required to handle #VE exceptions raised by unhandled MSR
> read/writes.

Example? Please elaborate.

> Ok. I can check it. But there is only one statement after this call.
> So it may not be very helpful.

Looking at die_addr(), that calls the die notifier too. So do you
even *have* to call it here with VEFSTR? As yo say, there's only one
statement after that call and box is dead in the water after that so why
even bother...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-24 17:27       ` Borislav Petkov
@ 2021-08-24 17:47         ` Sean Christopherson
  2021-08-24 17:50           ` Borislav Petkov
  2021-08-31 20:49         ` Kuppuswamy, Sathyanarayanan
  1 sibling, 1 reply; 72+ messages in thread
From: Sean Christopherson @ 2021-08-24 17:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Dave Hansen,
	Tony Luck, Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Tue, Aug 24, 2021, Borislav Petkov wrote:
> On Tue, Aug 24, 2021 at 05:06:21PM +0000, Sean Christopherson wrote:
> > It would be helpful to use local variables to document what's up, e.g.
> > 
> >  	const bool irqs_enabled = true;
> > 	const bool do_sti = true;
> > 
> > 	ret = _tdx_hypercall(EXIT_REASON_HLT, irqs_enabled0, 0, 0, do_sti, NULL);
> 
> Wait, is this do_sti thing supposed to be:
> 
> 	 * ... But this
>          * change is not required for all HLT cases. So use R15
>          * register value to identify the case which needs sti. So,
>          * if R11 is EXIT_REASON_HLT and R15 is 1, then call sti
>          * before TDCALL instruction.
> 
> ?
> 
> 
> > > > +	ret = _tdx_hypercall(EXIT_REASON_HLT, 0, 0, 0, 1, NULL);
> 						      ^^^
> Yeah, it must be it - the 1 there.
> 
> And what's with the irqs_enabled first parameter?
> 
> Is that used by the TDX module?

It's passed to the (untrusted) VMM.  The TDX Module has direct access to the guest's
entire FLAGS via the VMCS.

The VMM uses the "IRQs enabled" param to understand whether or not it should
schedule the halted vCPU if an IRQ becomes pending.  E.g. if IRQs are disabled
the VMM can keep the vCPU in virtual HLT, even if an IRQ is pending, without
hanging/breaking the guest.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-24 17:35     ` Kuppuswamy, Sathyanarayanan
@ 2021-08-24 17:48       ` Borislav Petkov
  0 siblings, 0 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 17:48 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Tue, Aug 24, 2021 at 10:35:33AM -0700, Kuppuswamy, Sathyanarayanan wrote:
> It is a bug in this submission. After adding STI fix, this local_irq_enable()
> had to be removed. Somehow I have missed to do it. I will fix this
> in next version.

Thanks, and also do this pls:

"I think in the next version all those _tdx_hypercall() wrappers should
spell it out what the parameters they pass are used for."

See Sean's mail for more info.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 12/12] x86/tdx: Handle CPUID via #VE
  2021-08-04 18:13 ` [PATCH v5 12/12] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
@ 2021-08-24 17:48   ` Borislav Petkov
  0 siblings, 0 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 17:48 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Wed, Aug 04, 2021 at 11:13:29AM -0700, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> TDX has three classes of CPUID leaves: some CPUID leaves
> are always handled by the CPU, others are handled by the TDX module,
> and some others are handled by the VMM. Since the VMM cannot directly
> intercept the instruction these are reflected with a #VE exception
> to the guest, which then converts it into a hypercall to the VMM,
> or handled directly.
> 
> The TDX module EAS has a full list of CPUID leaves which are handled

EAS?

> natively or by the TDX module in 16.2. Only unknown CPUIDs are handled by

16.2?

I believe that commit message was forgotten to be converted to
outside-Intel speak. Please do so.

> the #VE method. In practice this typically only applies to the
> hypervisor specific CPUIDs unknown to the native CPU.

hypervisor-specific

> Therefore there is no risk of causing this in early CPUID code which
> runs before the #VE handler is set up because it will never access
> those exotic CPUID leaves.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
> 
> Changes since v4:
>  * None
> 
> Changes since v3:
>  * None
> 
>  arch/x86/kernel/tdx.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index d16c7f8759ea..5d2fd6c8b01c 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -153,6 +153,21 @@ static int tdg_write_msr_safe(unsigned int msr, unsigned int low,
>  	return ret ? -EIO : 0;
>  }
>  
> +static void tdg_handle_cpuid(struct pt_regs *regs)
> +{
> +	u64 ret;
> +	struct tdx_hypercall_output out = {0};
> +
> +	ret = _tdx_hypercall(EXIT_REASON_CPUID, regs->ax, regs->cx, 0, 0, &out);
> +
> +	WARN_ON(ret);

Why warn and not forward the error, instead, so that it lands in
ve_raise_fault() ?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-24 17:47         ` Sean Christopherson
@ 2021-08-24 17:50           ` Borislav Petkov
  0 siblings, 0 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 17:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Dave Hansen,
	Tony Luck, Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Tue, Aug 24, 2021 at 05:47:25PM +0000, Sean Christopherson wrote:
> It's passed to the (untrusted) VMM.  The TDX Module has direct access to the guest's
> entire FLAGS via the VMCS.
> 
> The VMM uses the "IRQs enabled" param to understand whether or not it should
> schedule the halted vCPU if an IRQ becomes pending.  E.g. if IRQs are disabled
> the VMM can keep the vCPU in virtual HLT, even if an IRQ is pending, without
> hanging/breaking the guest.

See, explanations like that need to be over those _tdx_hypercall()
wrappers. Otherwise it is just random magic.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 09/12] x86/tdx: Wire up KVM hypercalls
  2021-08-24 16:34   ` Borislav Petkov
@ 2021-08-24 18:11     ` Kuppuswamy, Sathyanarayanan
  2021-08-24 18:29       ` Borislav Petkov
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-24 18:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel



On 8/24/21 9:34 AM, Borislav Petkov wrote:
> On Wed, Aug 04, 2021 at 11:13:26AM -0700, Kuppuswamy Sathyanarayanan wrote:
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>
>> KVM hypercalls use the "vmcall" or "vmmcall" instructions.
> 
> Write instruction mnemonics in all caps pls.

Ok. I will change it in next submission.

> 
>> +# This option enables KVM specific hypercalls in TDX guest.
>> +config INTEL_TDX_GUEST_KVM
> 
> What is that config option really for? IOW, can't you use
> CONFIG_KVM_GUEST instead?
Since TDX code can be used by other hypervisor (non KVM case) we
want to have a config to differentiate the KVM related calls.

> 
>> +	def_bool y
>> +	depends on KVM_GUEST && INTEL_TDX_GUEST
>> +
>>   endif #HYPERVISOR_GUEST
>>   
>>   source "arch/x86/Kconfig.cpu"
>> diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
>> index 4cb726c71ed8..9855a9ff2924 100644
>> --- a/arch/x86/include/asm/asm-prototypes.h
>> +++ b/arch/x86/include/asm/asm-prototypes.h
>> @@ -17,6 +17,10 @@
>>   extern void cmpxchg8b_emu(void);
>>   #endif
>>   
>> +#ifdef CONFIG_INTEL_TDX_GUEST
>> +#include <asm/tdx.h>
>> +#endif
> 
> What "ASM sysmbol generation issue" forced this?

Compiler raised version generation issue for __tdx_hypercall

> 
> ...
> 
>> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
>> index 846fe58f0426..8fa33e2c98db 100644
>> --- a/arch/x86/include/asm/tdx.h
>> +++ b/arch/x86/include/asm/tdx.h
>> @@ -6,8 +6,9 @@
>>   #include <linux/cpufeature.h>
>>   #include <linux/types.h>
>>   
>> -#define TDX_CPUID_LEAF_ID	0x21
>> -#define TDX_HYPERCALL_STANDARD  0
>> +#define TDX_CPUID_LEAF_ID			0x21
>> +#define TDX_HYPERCALL_STANDARD			0
>> +#define TDX_HYPERCALL_VENDOR_KVM		0x4d564b2e584454
> 
> 						"TDX.KVM"
> 
> Yeah, you can put it in a comment so that people don't have to do the
> CTRL-V game in vim insert mode, i.e., ":help i_CTRL-V_digit" :-)

Got it. I will add it in comment.

> 
>>   /*
>>    * Used in __tdx_module_call() helper function to gather the
>> @@ -80,4 +81,29 @@ static inline bool tdx_prot_guest_has(unsigned long flag) { return false; }
>>   
>>   #endif /* CONFIG_INTEL_TDX_GUEST */
>>   
>> +#ifdef CONFIG_INTEL_TDX_GUEST_KVM
> 
> I don't think that even needs the ifdeffery. If it is not used, the
> inline will simply get discarded so why bother?

It makes sense. I can remove it.

> 
>> +
>> +static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>> +				     unsigned long p2, unsigned long p3,
>> +				     unsigned long p4)
>> +{
>> +	struct tdx_hypercall_output out;
>> +	u64 err;
>> +
>> +	err = __tdx_hypercall(TDX_HYPERCALL_VENDOR_KVM, nr, p1, p2,
>> +			      p3, p4, &out);
>> +
>> +	BUG_ON(err);
>> +
>> +	return out.r10;
>> +}
>> +#else
>> +static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>> +				     unsigned long p2, unsigned long p3,
>> +				     unsigned long p4)
>> +{
>> +	return -ENODEV;
>> +}
>> +#endif /* CONFIG_INTEL_TDX_GUEST_KVM */
>> +
>>   #endif /* _ASM_X86_TDX_H */
>> diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
>> index 9df94f87465d..1823bac4542d 100644
>> --- a/arch/x86/kernel/tdcall.S
>> +++ b/arch/x86/kernel/tdcall.S
>> @@ -3,6 +3,7 @@
>>   #include <asm/asm.h>
>>   #include <asm/frame.h>
>>   #include <asm/unwind_hints.h>
>> +#include <asm/export.h>
>>   
>>   #include <linux/linkage.h>
>>   #include <linux/bits.h>
>> @@ -309,3 +310,4 @@ skip_sti:
>>   
>>          retq
>>   SYM_FUNC_END(__tdx_hypercall)
>> +EXPORT_SYMBOL(__tdx_hypercall);
> 
> EXPORT_SYMBOL_GPL, of course.

Yes. I will fix this in next version.

> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 10/12] x86/tdx: Add MSR support for TDX guest
  2021-08-24 16:55   ` Borislav Petkov
@ 2021-08-24 18:12     ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-24 18:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel



On 8/24/21 9:55 AM, Borislav Petkov wrote:
>> +			regs->ax = val & UINT_MAX;
> 			regs->ax = (u32)val;
> 

Ok. I will use your version in next submission.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-24 17:06     ` Sean Christopherson
  2021-08-24 17:25       ` Andi Kleen
  2021-08-24 17:27       ` Borislav Petkov
@ 2021-08-24 18:18       ` Kuppuswamy, Sathyanarayanan
  2021-08-24 18:28         ` Andi Kleen
  2 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-24 18:18 UTC (permalink / raw)
  To: Sean Christopherson, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Kuppuswamy Sathyanarayanan, x86, linux-kernel



On 8/24/21 10:06 AM, Sean Christopherson wrote:
> On Tue, Aug 24, 2021, Borislav Petkov wrote:
>> On Wed, Aug 04, 2021 at 11:13:25AM -0700, Kuppuswamy Sathyanarayanan wrote:
>>> +static __cpuidle void tdg_safe_halt(void)
>>> +{
>>> +	u64 ret;
>>> +
>>> +	/*
>>> +	 * Enable interrupts next to the TDVMCALL to avoid
>>> +	 * performance degradation.
>>
>> That comment needs some more love to say exactly what the problem is.
> 
> LOL, I guess hanging the vCPU counts as degraded performance.  But this comment
> can and should go away entirely...
> 
>>> +	 */
>>> +	local_irq_enable();
> 
> ...because this is broken.  It's also disturbing because it suggests that these
> patches are not being tested.

Sorry, some how we missed this issue before our submission.

We do usual boot test before submission. Since this fix does not block the
boot process, it did not get caught. But we already found this in full functional
testing and also fixed it in github tree.

I will remove this in next submission.

> 
> The STI _must_ immediately precede TDCALL, and it _must_ execute with interrupts
> disabled.  The whole point of the STI blocking shadow is to ensure interrupts are
> blocked until _after_ the HLT completes so that a wake event is not recongized
> before the HLT, in which case the vCPU will get stuck in HLT because its wake
> event alreadyfired.  Enabling IRQs well before the TDCALL defeats the purpose of
> the STI dance in __tdx_hypercall().
> 
> There's even a massive comment in __tdx_hypercall() explaining all this...
> 
>>> +
>>> +	/* IRQ is enabled, So set R12 as 0 */
> 
> It would be helpful to use local variables to document what's up, e.g.
> 
>   	const bool irqs_enabled = true;
> 	const bool do_sti = true;
> 
> 	ret = _tdx_hypercall(EXIT_REASON_HLT, irqs_enabled0, 0, 0, do_sti, NULL);

Ok. I can follow your suggestion in next submission.

> 	
>>> +	ret = _tdx_hypercall(EXIT_REASON_HLT, 0, 0, 0, 1, NULL);
>>> +
>>> +	/* It should never fail */
>>> +	BUG_ON(ret);
>>> +}
>>
>> -- 
>> Regards/Gruss,
>>      Boris.
>>
>> https://people.kernel.org/tglx/notes-about-netiquette

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-24 18:18       ` Kuppuswamy, Sathyanarayanan
@ 2021-08-24 18:28         ` Andi Kleen
  0 siblings, 0 replies; 72+ messages in thread
From: Andi Kleen @ 2021-08-24 18:28 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan, Sean Christopherson, Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams,
	Kirill Shutemov, Kuppuswamy Sathyanarayanan, x86, linux-kernel


>> It would be helpful to use local variables to document what's up, e.g.
>>
>>       const bool irqs_enabled = true;
>>     const bool do_sti = true;
>>
>>     ret = _tdx_hypercall(EXIT_REASON_HLT, irqs_enabled0, 0, 0, 
>> do_sti, NULL);
>
> Ok. I can follow your suggestion in next submission.


I would use defines for the argument values, then it's self documenting.


-Andi



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 09/12] x86/tdx: Wire up KVM hypercalls
  2021-08-24 18:11     ` Kuppuswamy, Sathyanarayanan
@ 2021-08-24 18:29       ` Borislav Petkov
  2021-08-24 19:11         ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 18:29 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Tue, Aug 24, 2021 at 11:11:43AM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Since TDX code can be used by other hypervisor (non KVM case) we
> want to have a config to differentiate the KVM related calls.

You need to start explaining yourself better. WTH does "to differentiate
the KVM related calls" even mean? Differentiate for what?!

Our CONFIG space is a huuge mess. Adding another option better be
properly justified.

> Compiler raised version generation issue for __tdx_hypercall

-ENOTENOUGHINFO.

Try again.

> Yes. I will fix this in next version.

And here audit all your patchsets. All exports better be _GPL.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 09/12] x86/tdx: Wire up KVM hypercalls
  2021-08-24 18:29       ` Borislav Petkov
@ 2021-08-24 19:11         ` Kuppuswamy, Sathyanarayanan
  2021-08-24 19:39           ` Borislav Petkov
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-24 19:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel



On 8/24/21 11:29 AM, Borislav Petkov wrote:
> On Tue, Aug 24, 2021 at 11:11:43AM -0700, Kuppuswamy, Sathyanarayanan wrote:
>> Since TDX code can be used by other hypervisor (non KVM case) we
>> want to have a config to differentiate the KVM related calls.
> 
> You need to start explaining yourself better. WTH does "to differentiate
> the KVM related calls" even mean? Differentiate for what?!

tdx_kvm_hypercall() function and its usage in arch/x86/include/asm/kvm_para.h
is only required for KVM hypervisor.

  static inline long kvm_hypercall0(unsigned int nr)
  {
         long ret;
+
+       if (prot_guest_has(PATTR_GUEST_TDX))
+               return tdx_kvm_hypercall(nr, 0, 0, 0, 0);

If the TDX code is complied for another hypervior, we need some config to
disable above the above code. CONFIG_INTEL_TDX_GUEST_KVM is added
for this purpose. If you think there is no sufficient reason, I can
use defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST) to protect
the implementation of tdx_kvm_hypercall()


> 
> Our CONFIG space is a huuge mess. Adding another option better be
> properly justified.
> 
>> Compiler raised version generation issue for __tdx_hypercall
> 
> -ENOTENOUGHINFO.

Following is the error info.

WARNING: modpost: EXPORT symbol "__tdx_hypercall" [vmlinux] version generation failed, symbol will 
not be versioned.

So to fix the above issue, added tdx.h in arch/x86/include/asm/asm-prototypes.h

> 
> Try again.
> 
>> Yes. I will fix this in next version.
> 
> And here audit all your patchsets. All exports better be _GPL.

Ok. I will check it before my next submission.

> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 09/12] x86/tdx: Wire up KVM hypercalls
  2021-08-24 19:11         ` Kuppuswamy, Sathyanarayanan
@ 2021-08-24 19:39           ` Borislav Petkov
  0 siblings, 0 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-08-24 19:39 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Tue, Aug 24, 2021 at 12:11:00PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> If the TDX code is complied for another hypervior, we need some config to
> disable above the above code.

Isn't that what CONFIG_KVM_GUEST is for?

Also, if they don't get used anywhere, the compiler will simply discard
them. I still don't see the need for the ifdeffery.

> Following is the error info.
> 
> WARNING: modpost: EXPORT symbol "__tdx_hypercall" [vmlinux] version
> generation failed, symbol will not be versioned.
> 
> So to fix the above issue, added tdx.h in arch/x86/include/asm/asm-prototypes.h

You need the C-style declaration of __tdx_hypercall, see

334bb7738764 ("x86/kbuild: enable modversions for symbols exported from asm")

and you can do the include without the ifdeffery.

And also state in the commit message why you're including it.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-24 17:27       ` Borislav Petkov
  2021-08-24 17:47         ` Sean Christopherson
@ 2021-08-31 20:49         ` Kuppuswamy, Sathyanarayanan
  2021-09-01  7:42           ` Borislav Petkov
  1 sibling, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-08-31 20:49 UTC (permalink / raw)
  To: Borislav Petkov, Sean Christopherson
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Kuppuswamy Sathyanarayanan, x86, linux-kernel

Hi Boris,

On 8/24/21 10:27 AM, Borislav Petkov wrote:
> I think in the next version all those _tdx_hypercall() wrappers should
> spell it out what the parameters they pass are used for.
> 
> Hohumm.

Regarding details about _tdx_hypercall() wrapper usage, do you want me
to document the ABI details?

For example for MSR read,

static u64 tdx_read_msr_safe(unsigned int msr, int *err)
{
         struct tdx_hypercall_output out = {0};
         u64 ret;

         WARN_ON_ONCE(tdx_is_context_switched_msr(msr));

         /*
          * TDX MSR READ Hypercall ABI:
          *
          * Input Registers:
          *
          * R11(EXIT_REASON_MSR_READ) - hypercall sub function id
          * R12(msr) - MSR index
          *
          * Output Registers:
          *
          * R10(out.r10) - Hypercall return error code
          * R11(out.r11) - MSR read value
          * RAX(ret) - TDCALL error code
          */
         ret = _tdx_hypercall(EXIT_REASON_MSR_READ, msr, 0, 0, 0, &out);

Let me know if you agree with above format?

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 08/12] x86/tdx: Add HLT support for TDX guest
  2021-08-31 20:49         ` Kuppuswamy, Sathyanarayanan
@ 2021-09-01  7:42           ` Borislav Petkov
  0 siblings, 0 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-09-01  7:42 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Andy Lutomirski, Peter H Anvin, Dave Hansen,
	Tony Luck, Dan Williams, Andi Kleen, Kirill Shutemov,
	Kuppuswamy Sathyanarayanan, x86, linux-kernel

On Tue, Aug 31, 2021 at 01:49:54PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Regarding details about _tdx_hypercall() wrapper usage, do you want me
> to document the ABI details?
> 
> For example for MSR read,
> 
> static u64 tdx_read_msr_safe(unsigned int msr, int *err)
> {
>         struct tdx_hypercall_output out = {0};
>         u64 ret;
> 
>         WARN_ON_ONCE(tdx_is_context_switched_msr(msr));
> 
>         /*
>          * TDX MSR READ Hypercall ABI:
>          *
>          * Input Registers:
>          *
>          * R11(EXIT_REASON_MSR_READ) - hypercall sub function id
>          * R12(msr) - MSR index
>          *
>          * Output Registers:
>          *
>          * R10(out.r10) - Hypercall return error code
>          * R11(out.r11) - MSR read value
>          * RAX(ret) - TDCALL error code
>          */
>         ret = _tdx_hypercall(EXIT_REASON_MSR_READ, msr, 0, 0, 0, &out);
> 
> Let me know if you agree with above format?

Yes, that's nice, thanks.

Alternatively, you could simply point at the relevant place in the TDX
documentation so that people can go look at it and find out how, for
example, a MSR read is done, ABI-wise.

With all those confidential computing solutions and the amount of specs
out there, one can get lost pretty easily so having doc references
should be very helpful.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest
  2021-08-24 17:46       ` Borislav Petkov
@ 2021-09-02 15:24         ` Kuppuswamy, Sathyanarayanan
  2021-09-03 10:17           ` Borislav Petkov
  0 siblings, 1 reply; 72+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-09-02 15:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel



On 8/24/21 10:46 AM, Borislav Petkov wrote:
> On Tue, Aug 24, 2021 at 10:32:13AM -0700, Kuppuswamy, Sathyanarayanan wrote:
>> Mainly chose it avoid future name conflicts with KVM (tdx) calls. But
> 
> What name conflicts with KVM calls? Please explain.

Currently there are no name conflicts. But in our initial submissions (RFC v?)
we had some conflicts in functions like (tdx_get_tdreport() and
tdx_get_quote()).

Since it is no longer true and "tdg" is not a favorite prefix, I will
rename tdg -> tdx in next submission.

> 
>> It is required to handle #VE exceptions raised by unhandled MSR
>> read/writes.
> 
> Example? Please elaborate.

If MSR read/write failed in tdx_handle_virtualization_exception(), it will
return non zero return value which in turn will trigger ve_raise_fault().

If we don't call fixup_exception() for such case, it will trigger oops
and eventually panic in TDX. For MSR read/write failures we don't want
to panic.

#VE MSR read/write
  -> exc_virtualization_exception()
     -> tdx_handle_virtualization_exception()
        ->tdx_write_msr_safe()
     -> ve_raise_fault
        -> fixup_exception()

> 
>> Ok. I can check it. But there is only one statement after this call.
>> So it may not be very helpful.
> 
> Looking at die_addr(), that calls the die notifier too. So do you
> even *have* to call it here with VEFSTR? As yo say, there's only one
> statement after that call and box is dead in the water after that so why
> even bother...

Reason for calling die_addr() is to trigger oops for failed #VE handling, which
is desirable for TDX. Also sending die notification may be useful for debuggers.

This sequence of calls are similar to exc_general_protection().

> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest
  2021-09-02 15:24         ` Kuppuswamy, Sathyanarayanan
@ 2021-09-03 10:17           ` Borislav Petkov
  0 siblings, 0 replies; 72+ messages in thread
From: Borislav Petkov @ 2021-09-03 10:17 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Andy Lutomirski,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	x86, linux-kernel

On Thu, Sep 02, 2021 at 08:24:53AM -0700, Kuppuswamy, Sathyanarayanan wrote:
> If MSR read/write failed in tdx_handle_virtualization_exception(), it will
> return non zero return value which in turn will trigger ve_raise_fault().
> 
> If we don't call fixup_exception() for such case, it will trigger oops
> and eventually panic in TDX. For MSR read/write failures we don't want
> to panic.
> 
> #VE MSR read/write
>  -> exc_virtualization_exception()
>     -> tdx_handle_virtualization_exception()
>        ->tdx_write_msr_safe()
>     -> ve_raise_fault
>        -> fixup_exception()

Lemme see if I understand this correctly: you're relying on the kernel
exception handling fixup to end up in

	ex_handler_{rd,wr}msr_unsafe()

which would warn but succeed so that you return from ve_raise_fault()
before die()ing?

If so, I could use a comment in ve_raise_fault() in case we touch the
fixup exception machinery, like we're currently doing.

> Reason for calling die_addr() is to trigger oops for failed #VE handling, which
> is desirable for TDX. Also sending die notification may be useful for debuggers.
> 
> This sequence of calls are similar to exc_general_protection().

Ok.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2021-09-03 10:17 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-04 18:13 [PATCH v5 00/12] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
2021-08-04 18:13 ` [PATCH v5 01/12] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
2021-08-12  7:18   ` Borislav Petkov
2021-08-12 17:17     ` Kuppuswamy, Sathyanarayanan
2021-08-17 12:50     ` Juergen Gross
2021-08-17 13:16       ` Kuppuswamy, Sathyanarayanan
2021-08-17 13:28         ` Juergen Gross
2021-08-17 13:39           ` Kuppuswamy, Sathyanarayanan
2021-08-17 13:47             ` Juergen Gross
2021-08-17 13:50               ` Kuppuswamy, Sathyanarayanan
2021-08-04 18:13 ` [PATCH v5 02/12] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
2021-08-04 18:13 ` [PATCH v5 03/12] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
2021-08-12  7:39   ` Borislav Petkov
2021-08-04 18:13 ` [PATCH v5 04/12] x86/tdx: Add protected guest support for TDX guest Kuppuswamy Sathyanarayanan
2021-08-04 21:59   ` Sean Christopherson
2021-08-04 22:03     ` Dave Hansen
2021-08-04 22:26       ` Kuppuswamy, Sathyanarayanan
2021-08-04 22:42         ` Sean Christopherson
2021-08-04 23:00           ` Kuppuswamy, Sathyanarayanan
2021-08-12  7:53             ` Borislav Petkov
2021-08-12 17:18               ` Kuppuswamy, Sathyanarayanan
2021-08-20 14:28                 ` Borislav Petkov
2021-08-20 16:42                   ` Kuppuswamy, Sathyanarayanan
2021-08-20 16:59                     ` Borislav Petkov
2021-08-20 17:11                       ` Kuppuswamy, Sathyanarayanan
2021-08-04 18:13 ` [PATCH v5 05/12] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
2021-08-20 15:16   ` Borislav Petkov
2021-08-04 18:13 ` [PATCH v5 06/12] x86/tdx: Get TD execution environment information via TDINFO Kuppuswamy Sathyanarayanan
2021-08-04 22:38   ` Sean Christopherson
2021-08-20 17:13   ` Borislav Petkov
2021-08-20 17:31     ` Kuppuswamy, Sathyanarayanan
2021-08-20 17:35       ` Borislav Petkov
2021-08-20 18:29         ` Kuppuswamy, Sathyanarayanan
2021-08-20 18:58           ` Andi Kleen
2021-08-20 19:01             ` Kuppuswamy, Sathyanarayanan
2021-08-04 18:13 ` [PATCH v5 07/12] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
2021-08-24 10:17   ` Borislav Petkov
2021-08-24 17:32     ` Kuppuswamy, Sathyanarayanan
2021-08-24 17:36       ` Dave Hansen
2021-08-24 17:46       ` Borislav Petkov
2021-09-02 15:24         ` Kuppuswamy, Sathyanarayanan
2021-09-03 10:17           ` Borislav Petkov
2021-08-04 18:13 ` [PATCH v5 08/12] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
2021-08-24 16:10   ` Borislav Petkov
2021-08-24 17:06     ` Sean Christopherson
2021-08-24 17:25       ` Andi Kleen
2021-08-24 17:27       ` Borislav Petkov
2021-08-24 17:47         ` Sean Christopherson
2021-08-24 17:50           ` Borislav Petkov
2021-08-31 20:49         ` Kuppuswamy, Sathyanarayanan
2021-09-01  7:42           ` Borislav Petkov
2021-08-24 18:18       ` Kuppuswamy, Sathyanarayanan
2021-08-24 18:28         ` Andi Kleen
2021-08-24 17:35     ` Kuppuswamy, Sathyanarayanan
2021-08-24 17:48       ` Borislav Petkov
2021-08-04 18:13 ` [PATCH v5 09/12] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
2021-08-24 16:34   ` Borislav Petkov
2021-08-24 18:11     ` Kuppuswamy, Sathyanarayanan
2021-08-24 18:29       ` Borislav Petkov
2021-08-24 19:11         ` Kuppuswamy, Sathyanarayanan
2021-08-24 19:39           ` Borislav Petkov
2021-08-04 18:13 ` [PATCH v5 10/12] x86/tdx: Add MSR support for TDX guest Kuppuswamy Sathyanarayanan
2021-08-24 16:55   ` Borislav Petkov
2021-08-24 18:12     ` Kuppuswamy, Sathyanarayanan
2021-08-04 18:13 ` [PATCH v5 11/12] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
2021-08-04 18:31   ` Sean Christopherson
2021-08-04 21:03     ` Kuppuswamy, Sathyanarayanan
2021-08-04 21:44       ` Sean Christopherson
2021-08-04 21:48       ` Dave Hansen
2021-08-04 22:23         ` Kuppuswamy, Sathyanarayanan
2021-08-04 18:13 ` [PATCH v5 12/12] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
2021-08-24 17:48   ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).