All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v10 00/11] Add TDX Guest Support (Initial support)
@ 2021-10-09  5:37 Kuppuswamy Sathyanarayanan
  2021-10-09  5:37 ` [PATCH v10 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
                   ` (11 more replies)
  0 siblings, 12 replies; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Hi All,

Intel's Trust Domain Extensions (TDX) protect guest VMs from malicious
hosts and some physical attacks. This series adds the basic TDX guest
infrastructure support (including #VE handler support, and #VE support
for halt and CPUID). This is just a subset of patches in the bare minimum
TDX support patch list which is required for supporting minimal
functional TDX guest. Other basic feature features like #VE support for
IO, MMIO, boot optimization fixes and shared-mm support will be submitted
in a separate patch set. To make reviewing easier we split it into smaller
series. This series alone is not necessarily fully functional.

Also, the host-side support patches, and support for advanced TD guest
features like attestation or debug-mode will be submitted at a later time.
Also, at this point it is not secure with some known holes in drivers, and
also hasn’t been fully audited and fuzzed yet.

TDX has a lot of similarities to SEV. It enhances confidentiality and
of guest memory and state (like registers) and includes a new exception
(#VE) for the same basic reasons as SEV-ES. Like SEV-SNP (not merged
yet), TDX limits the host's ability to effect changes in the guest
physical address space. With TDX the host cannot access the guest memory,
so various functionality that would normally be done in KVM has moved
into a (paravirtualized) guest. Partially this is done using the
Virtualization Exception (#VE) and partially with direct paravirtual hooks.

The TDX architecture also includes a new CPU mode called
Secure-Arbitration Mode (SEAM). The software (TDX module) running in this
mode arbitrates interactions between host and guest and implements many of
the guarantees of the TDX architecture.

Some of the key differences between TD and regular VM is,

1. Multi CPU bring-up is done using the ACPI MADT wake-up table.
2. A new #VE exception handler is added. The TDX module injects #VE exception
   to the guest TD in cases of instructions that need to be emulated, disallowed
   MSR accesses, etc.
3. By default memory is marked as private, and TD will selectively share it with
   VMM based on need.
   
Note that the kernel will also need to be hardened against low level inputs from
the now untrusted hosts. This will be done in follow on patches.

Please check the following info for more details about #VE on memory access:

== #VE on Memory Accesses ==

A TD guest is in control of whether its memory accesses are treated as
private or shared.  It selects the behavior with a bit in its page table
entries.

=== #VE on Shared Pages ===

Accesses to shared mappings can cause #VE's.  The hypervisor is in
control of when a #VE might occur, so the guest must be careful to only
reference shared pages when it is in a context that can safely handle a #VE.

However, shared mapping content can not be trusted since shared page
content is writable by the hypervisor.  This means that shared mappings
are never used for sensitive memory contents like stacks or kernel text.
 This means that the shared mapping property of inducing #VEs requires
essentially no special kernel handling in sensitive contexts like
syscall entry or NMIs.

=== #VE on Private Pages ===

Some accesses to private mappings may cause #VEs.  Before a mapping is
accepted (aka. in the SEPT_PENDING state), a reference would cause
a #VE.  But, after acceptance, references typically succeed.

The hypervisor can cause a private page reference to fail if it chooses
to move an accepted page to a "blocked" state.  However, if it does
this, a page access will not generate a #VE.  It will, instead, cause a
"TD Exit" where the hypervisor is required to handle the exception.

You can find TDX related documents in the following link.

https://software.intel.com/content/www/br/pt/develop/articles/intel-trust-domain-extensions.html

Changes since v9:
 * Added inline definition of is_tdx_guest() for !CONFIG_INTEL_TDX_GUEST case

Changes since v8:
 * * Change logs are include per patch.

Changes since v7:
 * Included #VE details on memory access in cover letter.
 * Change log is include per patch.

Changes since v6
 * Fixed #ifdef format as per Boris comment in patch titled
   "x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT".
 * Fixed commit log of "x86/tdx: Handle CPUID via #VE" as per Dave
   Hansen review comments.
 * Removed TDX changes from "x86/tdx: Add protected guest support for
   TDX guest" and created "x86/tdx: Add Intel version of
   cc_platform_has()"
 * Rebased on top of Tom Landecky's CC platform support patch series
   https://lore.kernel.org/linux-iommu/f9951644147e27772bf4512325e8ba6472e363b7.1631141919.git.thomas.lendacky@amd.com/T/
 * Rebased on top of v5.15-rc1.

Changes since v5:
 * Moved patch titled "x86/tdx: Get TD execution environment
   information via TDINFO" to the series which uses it.
 * Rest of the change log is added per patch.

Changes since v4:
 * Added a patch that adds TDX guest exception for CSTAR MSR.
 * Rebased on top of Tom Lendacky's protected guest changes.
 * Rest of the change log is added per patch.

Changes since v3:
 * Moved generic protected guest changes from patch titled "x86:
   Introduce generic protected guest abstraction" into seperate
   patch outside this patchset. Also, TDX specific changes are
   moved to patch titled "x86/tdx: Add protected guest support
   for TDX guest"
 * Rebased on top of v5.14-rc1.
 * Rest of the change log is added per patch.

Changes since v1 (v2 is partial set submission):
 * Patch titled "x86/x86: Add early_is_tdx_guest() interface" is moved
   out of this series.
 * Rest of the change log is added per patch.

Andi Kleen (1):
  x86/tdx: Don't write CSTAR MSR on Intel

Kirill A. Shutemov (6):
  x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  x86/traps: Add #VE support for TDX guest
  x86/tdx: Add HLT support for TDX guest
  x86/tdx: Wire up KVM hypercalls
  x86/tdx: Add MSR support for TDX guest
  x86/tdx: Handle CPUID via #VE

Kuppuswamy Sathyanarayanan (4):
  x86/tdx: Introduce INTEL_TDX_GUEST config option
  x86/cpufeatures: Add TDX Guest CPU feature
  x86/tdx: Add TDX support to intel_cc_platform_has()
  x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper
    functions

 arch/x86/Kconfig                      |  15 ++
 arch/x86/include/asm/asm-prototypes.h |   1 +
 arch/x86/include/asm/cpufeatures.h    |   1 +
 arch/x86/include/asm/idtentry.h       |   4 +
 arch/x86/include/asm/irqflags.h       |  44 ++--
 arch/x86/include/asm/kvm_para.h       |  22 ++
 arch/x86/include/asm/paravirt.h       |  20 +-
 arch/x86/include/asm/paravirt_types.h |   3 +-
 arch/x86/include/asm/tdx.h            | 109 +++++++++
 arch/x86/kernel/Makefile              |   4 +
 arch/x86/kernel/asm-offsets.c         |  23 ++
 arch/x86/kernel/cc_platform.c         |  12 +-
 arch/x86/kernel/cpu/common.c          |  11 +-
 arch/x86/kernel/head64.c              |   9 +
 arch/x86/kernel/idt.c                 |   3 +
 arch/x86/kernel/paravirt.c            |   3 +-
 arch/x86/kernel/tdcall.S              | 306 ++++++++++++++++++++++++++
 arch/x86/kernel/tdx.c                 | 245 +++++++++++++++++++++
 arch/x86/kernel/traps.c               |  77 +++++++
 include/linux/cc_platform.h           |   9 +
 20 files changed, 884 insertions(+), 37 deletions(-)
 create mode 100644 arch/x86/include/asm/tdx.h
 create mode 100644 arch/x86/kernel/tdcall.S
 create mode 100644 arch/x86/kernel/tdx.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v10 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-15 16:59   ` David Hildenbrand
  2021-10-09  5:37 ` [PATCH v10 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

CONFIG_PARAVIRT_XXL is mainly defined/used by XEN PV guests. For
other VM guest types, features supported under CONFIG_PARAVIRT
are self sufficient. CONFIG_PARAVIRT mainly provides support for
TLB flush operations and time related operations.

For TDX guest as well, paravirt calls under CONFIG_PARVIRT meets
most of its requirement except the need of HLT and SAFE_HLT
paravirt calls, which is currently defined under
COFNIG_PARAVIRT_XXL.

Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest
like platforms, move HLT and SAFE_HLT paravirt calls under
CONFIG_PARAVIRT.

Moving HLT and SAFE_HLT paravirt calls are not fatal and should not
break any functionality for current users of CONFIG_PARAVIRT.

Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---

Changes since v9:
 * None.

Changes since v8:
 * Modified the code to reduce the __ASSEMBLY__ usage (as per
   Josh Poimboeuf comments).

Changes since v7:
 * None.

 arch/x86/include/asm/irqflags.h       | 44 +++++++++++++++------------
 arch/x86/include/asm/paravirt.h       | 20 ++++++------
 arch/x86/include/asm/paravirt_types.h |  3 +-
 arch/x86/kernel/paravirt.c            |  3 +-
 4 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index c5ce9845c999..b794b6da3214 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -57,28 +57,11 @@ static inline __cpuidle void native_halt(void)
 	asm volatile("hlt": : :"memory");
 }
 
-#endif
-
-#ifdef CONFIG_PARAVIRT_XXL
-#include <asm/paravirt.h>
-#else
-#ifndef __ASSEMBLY__
-#include <linux/types.h>
-
-static __always_inline unsigned long arch_local_save_flags(void)
-{
-	return native_save_fl();
-}
+# ifdef CONFIG_PARAVIRT
 
-static __always_inline void arch_local_irq_disable(void)
-{
-	native_irq_disable();
-}
+#  include <asm/paravirt.h>
 
-static __always_inline void arch_local_irq_enable(void)
-{
-	native_irq_enable();
-}
+# else /* ! CONFIG_PARAVIRT */
 
 /*
  * Used in the idle loop; sti takes one instruction cycle
@@ -97,6 +80,27 @@ static inline __cpuidle void halt(void)
 {
 	native_halt();
 }
+#endif /* CONFIG_PARAVIRT */
+# endif /* __ASSEMBLY__ */
+
+#ifndef CONFIG_PARAVIRT_XXL
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+static __always_inline unsigned long arch_local_save_flags(void)
+{
+	return native_save_fl();
+}
+
+static __always_inline void arch_local_irq_disable(void)
+{
+	native_irq_disable();
+}
+
+static __always_inline void arch_local_irq_enable(void)
+{
+	native_irq_enable();
+}
 
 /*
  * For spinlocks, etc:
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index da3a1ac82be5..d323a626c7a8 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -97,6 +97,16 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 	PVOP_VCALL1(mmu.exit_mmap, mm);
 }
 
+static inline void arch_safe_halt(void)
+{
+	PVOP_VCALL0(irq.safe_halt);
+}
+
+static inline void halt(void)
+{
+	PVOP_VCALL0(irq.halt);
+}
+
 #ifdef CONFIG_PARAVIRT_XXL
 static inline void load_sp0(unsigned long sp0)
 {
@@ -162,16 +172,6 @@ static inline void __write_cr4(unsigned long x)
 	PVOP_VCALL1(cpu.write_cr4, x);
 }
 
-static inline void arch_safe_halt(void)
-{
-	PVOP_VCALL0(irq.safe_halt);
-}
-
-static inline void halt(void)
-{
-	PVOP_VCALL0(irq.halt);
-}
-
 static inline void wbinvd(void)
 {
 	PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index d9d6b0203ec4..40082847f314 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -150,10 +150,9 @@ struct pv_irq_ops {
 	struct paravirt_callee_save save_fl;
 	struct paravirt_callee_save irq_disable;
 	struct paravirt_callee_save irq_enable;
-
+#endif
 	void (*safe_halt)(void);
 	void (*halt)(void);
-#endif
 } __no_randomize_layout;
 
 struct pv_mmu_ops {
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 04cafc057bed..8cea6e75ba29 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -283,9 +283,10 @@ struct paravirt_patch_template pv_ops = {
 	.irq.save_fl		= __PV_IS_CALLEE_SAVE(native_save_fl),
 	.irq.irq_disable	= __PV_IS_CALLEE_SAVE(native_irq_disable),
 	.irq.irq_enable		= __PV_IS_CALLEE_SAVE(native_irq_enable),
+#endif /* CONFIG_PARAVIRT_XXL */
+
 	.irq.safe_halt		= native_safe_halt,
 	.irq.halt		= native_halt,
-#endif /* CONFIG_PARAVIRT_XXL */
 
 	/* Mmu ops. */
 	.mmu.flush_tlb_user	= native_flush_tlb_local,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v10 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
  2021-10-09  5:37 ` [PATCH v10 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-11 18:19   ` Josh Poimboeuf
  2021-10-09  5:37 ` [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Add INTEL_TDX_GUEST config option to selectively compile
TDX guest support.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---

Changes since v9:
 * None.

Changes since v8:
 * Used Extensions in place of eXtensions to maintain uniformity.
 * Removed "select SECURITY_LOCKDOWN_LSM" (it is no longer needed).
 * Replaced "select X86_X2APIC" with "depends X86_X2APIC".

Changes since v7:
 * None

Changes since v6:
 * None

Changes since v5:
 * None

Changes since v4:
 * None

Changes since v3:
 * Removed PARAVIRT_XL from dependency list.
 * Fixed typo in help content.

 arch/x86/Kconfig | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2b2a9639d8ae..eab7f2911f94 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -865,6 +865,20 @@ config ACRN_GUEST
 	  IOT with small footprint and real-time features. More details can be
 	  found in https://projectacrn.org/.
 
+config INTEL_TDX_GUEST
+	bool "Intel Trusted Domain Extensions (TDX) Guest Support"
+	depends on X86_64 && CPU_SUP_INTEL && PARAVIRT
+	depends on SECURITY
+	depends on X86_X2APIC
+	help
+	  Provide support for running in a trusted domain on Intel processors
+	  equipped with Trusted Domain Extensions. TDX is a Intel technology
+	  that extends VMX and Memory Encryption with a new kind of virtual
+	  machine guest called Trust Domain (TD). A TD is designed to run in
+	  a CPU mode that protects the confidentiality of TD memory contents
+	  and the TD’s CPU state from other software, including VMM. TDX guest
+	  uses virtual X2APIC for interrupt management.
+
 endif #HYPERVISOR_GUEST
 
 source "arch/x86/Kconfig.cpu"
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
  2021-10-09  5:37 ` [PATCH v10 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
  2021-10-09  5:37 ` [PATCH v10 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-13  8:18   ` Borislav Petkov
  2021-10-13 20:44   ` Thomas Gleixner
  2021-10-09  5:37 ` [PATCH v10 04/11] x86/tdx: Add TDX support to intel_cc_platform_has() Kuppuswamy Sathyanarayanan
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Add CPU feature detection for Trusted Domain Extensions support. TDX
feature adds capabilities to keep guest register state and memory
isolated from hypervisor.

For TDX guest platforms, executing CPUID(eax=0x21, ecx=0) will return
following values in EAX, EBX, ECX and EDX.

EAX:  Maximum sub-leaf number:  0
EBX/EDX/ECX:  Vendor string:

EBX =  "Inte"
EDX =  "lTDX"
ECX =  "    "

So when above condition is true, set X86_FEATURE_TDX_GUEST feature cap
bit.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v9:
 * Added inline definition of is_tdx_guest() for !CONFIG_INTEL_TDX_GUEST case.

Changes since v8:
 * Changed is_tdx_guest variable to is_tdx_guest() function.
 * Added -fno-stack-protector and -pg for tdx.c
 * Fixed tdx_early_init() related comments in x86_64_start_kernel()
   as per review suggestion.

Changes since v7:
 * Add comments for the order of tdx_early_init() call.
 * Similar to AMD (sme_me_mask) added a global variable
   is_tdx_guest for Intel TDX related checks.
 * Changed pr_fmt from "x86/tdx: " to "tdx:"

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/tdx.h         | 22 +++++++++++++++++
 arch/x86/kernel/Makefile           |  4 ++++
 arch/x86/kernel/head64.c           |  9 +++++++
 arch/x86/kernel/tdx.c              | 38 ++++++++++++++++++++++++++++++
 5 files changed, 74 insertions(+)
 create mode 100644 arch/x86/include/asm/tdx.h
 create mode 100644 arch/x86/kernel/tdx.c

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..84997abeb401 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -238,6 +238,7 @@
 #define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
 #define X86_FEATURE_PVUNLOCK		( 8*32+20) /* "" PV unlock function */
 #define X86_FEATURE_VCPUPREEMPT		( 8*32+21) /* "" PV vcpu_is_preempted function */
+#define X86_FEATURE_TDX_GUEST		( 8*32+22) /* Trusted Domain Extensions Guest */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
new file mode 100644
index 000000000000..96ffce7bc530
--- /dev/null
+++ b/arch/x86/include/asm/tdx.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2020 Intel Corporation */
+#ifndef _ASM_X86_TDX_H
+#define _ASM_X86_TDX_H
+
+#include <linux/cpufeature.h>
+
+#define TDX_CPUID_LEAF_ID	0x21
+
+#ifdef CONFIG_INTEL_TDX_GUEST
+
+bool is_tdx_guest(void);
+void __init tdx_early_init(void);
+
+#else
+
+static inline bool is_tdx_guest(void) { return false; }
+static inline void tdx_early_init(void) { };
+
+#endif /* CONFIG_INTEL_TDX_GUEST */
+
+#endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2ff3e600f426..8c9a9214dd34 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -22,6 +22,7 @@ CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 CFLAGS_REMOVE_sev.o = -pg
 CFLAGS_REMOVE_cc_platform.o = -pg
+CFLAGS_REMOVE_tdx.o = -pg
 endif
 
 KASAN_SANITIZE_head$(BITS).o				:= n
@@ -31,6 +32,7 @@ KASAN_SANITIZE_stacktrace.o				:= n
 KASAN_SANITIZE_paravirt.o				:= n
 KASAN_SANITIZE_sev.o					:= n
 KASAN_SANITIZE_cc_platform.o				:= n
+KASAN_SANITIZE_tdx.o					:= n
 
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.
@@ -50,6 +52,7 @@ KCOV_INSTRUMENT		:= n
 
 CFLAGS_head$(BITS).o	+= -fno-stack-protector
 CFLAGS_cc_platform.o	+= -fno-stack-protector
+CFLAGS_tdx.o		+= -fno-stack-protector
 
 CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
 
@@ -130,6 +133,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
 
 obj-$(CONFIG_JAILHOUSE_GUEST)	+= jailhouse.o
+obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx.o
 
 obj-$(CONFIG_EISA)		+= eisa.o
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index fc5371a7e9d1..a5ee90df2aba 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,7 @@
 #include <asm/extable.h>
 #include <asm/trapnr.h>
 #include <asm/sev.h>
+#include <asm/tdx.h>
 
 /*
  * Manage page tables very early on.
@@ -500,6 +501,14 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 
 	copy_bootdata(__va(real_mode_data));
 
+	/*
+	 * A future dependency on cmdline parameters is expected (for
+	 * adding debug options). So the order of calling it should be
+	 * after copy_bootdata() (in which command line parameter is
+	 * initialized).
+	 */
+	tdx_early_init();
+
 	/*
 	 * Load microcode early on BSP.
 	 */
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
new file mode 100644
index 000000000000..88bf12788684
--- /dev/null
+++ b/arch/x86/kernel/tdx.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2020 Intel Corporation */
+
+#undef pr_fmt
+#define pr_fmt(fmt)     "tdx: " fmt
+
+#include <asm/tdx.h>
+
+bool is_tdx_guest(void)
+{
+	static int tdx_guest = -1;
+	u32 eax, sig[3];
+
+	if (tdx_guest >= 0)
+		goto done;
+
+	if (cpuid_eax(0) < TDX_CPUID_LEAF_ID) {
+		tdx_guest = 0;
+		goto done;
+	}
+
+	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
+
+	tdx_guest = !memcmp("IntelTDX    ", sig, 12);
+
+done:
+	return !!tdx_guest;
+}
+
+void __init tdx_early_init(void)
+{
+	if (!is_tdx_guest())
+		return;
+
+	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
+
+	pr_info("Guest initialized\n");
+}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v10 04/11] x86/tdx: Add TDX support to intel_cc_platform_has()
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (2 preceding siblings ...)
  2021-10-09  5:37 ` [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-13 15:57   ` Borislav Petkov
  2021-10-14  7:12   ` Thomas Gleixner
  2021-10-09  5:37 ` [PATCH v10 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

cc_platform_has() can be used to check for specific active confidential
computing attributes, like memory encryption. For Intel platform like
Trusted Domain Extensions (TDX) guest has need for using this function
to protect the TDX specific changes made in generic drivers.

So add TDX guest support to intel_cc_platform_has().

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v9:
 * None

Changes since v8:
 * Rebased this patch on top of x86/tip/master branch.

Changes since v7:
 * Merged patches titled "x86/tdx: Add Intel ARCH support to cc_platform_has()" and
   "x86/tdx: Add TDX guest support to intel_cc_platform_has()" into one patch.
 * Used cpuid_has_tdx_guest() when adding Intel support to cc_platform_has().

Change since v6:
 * Used cc_platform_has() in place of prot_guest_has().
 * Rebased on top of Tom Landecky's CC platform support patch series.
   https://lore.kernel.org/linux-iommu/f9951644147e27772bf4512325e8ba6472e363b7.1631141919.git.thomas.lendacky@amd.com/T/

Changes since v5:
 * Replaced tdx_prot_guest_has() with intel_prot_guest_has() to
   keep the Intel call non TDX specific.
 * Added TDX guest support to intel_prot_guest_has().

Changes since v4:
 * Rebased on top of Tom Lendacky's protected guest changes.
 * Moved memory encryption related protected guest flags in
   tdx_prot_guest_has() to the patch that actually uses them.

 arch/x86/Kconfig              |  1 +
 arch/x86/kernel/cc_platform.c | 12 ++++++++++--
 include/linux/cc_platform.h   |  9 +++++++++
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eab7f2911f94..af49ad084919 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -870,6 +870,7 @@ config INTEL_TDX_GUEST
 	depends on X86_64 && CPU_SUP_INTEL && PARAVIRT
 	depends on SECURITY
 	depends on X86_X2APIC
+	select ARCH_HAS_CC_PLATFORM
 	help
 	  Provide support for running in a trusted domain on Intel processors
 	  equipped with Trusted Domain Extensions. TDX is a Intel technology
diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..589c003d8954 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,12 +11,18 @@
 #include <linux/cc_platform.h>
 #include <linux/mem_encrypt.h>
 
+#include <asm/tdx.h>
 #include <asm/processor.h>
 
-static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
+static bool intel_cc_platform_has(enum cc_attr attr)
 {
 #ifdef CONFIG_INTEL_TDX_GUEST
-	return false;
+	switch (attr) {
+	case CC_ATTR_GUEST_TDX:
+		return is_tdx_guest();
+	default:
+		return false;
+	}
 #else
 	return false;
 #endif
@@ -63,6 +69,8 @@ bool cc_platform_has(enum cc_attr attr)
 {
 	if (sme_me_mask)
 		return amd_cc_platform_has(attr);
+	else if (is_tdx_guest())
+		return intel_cc_platform_has(attr);
 
 	return false;
 }
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index a075b70b9a70..6124527a0423 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -61,6 +61,15 @@ enum cc_attr {
 	 * Examples include SEV-ES.
 	 */
 	CC_ATTR_GUEST_STATE_ENCRYPT,
+
+	/**
+	 * @CC_ATTR_GUEST_TDX: Trusted Domain Extension Support
+	 *
+	 * The platform/OS is running as a TDX guest/virtual machine.
+	 *
+	 * Examples include Intel TDX.
+	 */
+	CC_ATTR_GUEST_TDX,
 };
 
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v10 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (3 preceding siblings ...)
  2021-10-09  5:37 ` [PATCH v10 04/11] x86/tdx: Add TDX support to intel_cc_platform_has() Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-14  7:28   ` Thomas Gleixner
  2021-10-09  5:37 ` [PATCH v10 06/11] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Guests communicate with VMMs with hypercalls. Historically, these
are implemented using instructions that are known to cause VMEXITs
like VMCALL, VMLAUNCH, etc. However, with TDX, VMEXITs no longer
expose guest state to the host.  This prevents the old hypercall
mechanisms from working. So to communicate with VMM, TDX
specification defines a new instruction called TDCALL.

In a TDX based VM, since VMM is an untrusted entity, a intermediary
layer (TDX module) exists between host and guest to facilitate
secure communication. TDX guests communicate with the TDX module
using TDCALL instruction. 

Both TDX module and VMM communication uses TDCALL instruction. Value
of the RAX register when executing TDCALL instruction is used to
determine the TDCALL type. If the TDCALL is executed with value "0"
in RAX, then it is the service request to VMM. Any other value in
RAX means it is a TDX module related call.

Implement common helper functions to communicate with the TDX Module
and VMM (using TDCALL instruction).
   
__tdx_hypercall()    - request services from the VMM.
__tdx_module_call()  - communicate with the TDX Module.

Also define two additional wrappers, tdx_hypercall() and
tdx_hypercall_out_r11() to cover common use cases of
__tdx_hypercall() function. Since each use case of
__tdx_module_call() is different, it does not need
multiple wrappers.

Implement __tdx_module_call() and __tdx_hypercall() helper functions
in assembly.

Rationale behind choosing to use assembly over inline assembly is,
since the number of lines of instructions (with comments) in
__tdx_hypercall() implementation is over 70, using inline assembly
to implement it will make it hard to read.
   
Also, just like syscalls, not all TDVMCALL/TDCALLs use cases need to
use the same set of argument registers. The implementation here picks
the current worst-case scenario for TDCALL (4 registers). For TDCALLs
with fewer than 4 arguments, there will end up being a few superfluous
(cheap) instructions.  But, this approach maximizes code reuse. The
same argument applies to __tdx_hypercall() function as well.

For registers used by TDCALL instruction, please check TDX GHCI
specification, sec 2.4 and 3.

https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf

Originally-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v9:
 * None

Changes since v8:
 * Removed ifdef CONFIG_FRAME_POINTER for ARG7_SP_OFFSET.
 * Removed push/pop for R9 register in__tdx_hypercall().

Changes since v7:
 * None

Changes since v6:
 * None

Changes since v5:
 * Fixed commit log (vmcall -> VMCALL, vmlaunch -> VMLAUNCH).
 * Fixed comments typo in __tdx_hypercall().

Changes since v4:
 * None

Changes since v3:
 * Fixed ARG7_SP_OFFSET value for CONFIG_FRAME_POINTER case.
 * Fixed usage of we/you in comments.

Change since v1:
 * Fixed commit comments format issues as per review comments.
 * Fixed empty line and other typo issues found in review.
 * Removed do_tdx_hypercall() helper function and modified
   __tdx_hypercall() to include do_tdx_hypercall() implementation
   and to accept TDVMCALL type as argument.
 * Since the number of arguments in __tdx_hypercall() is more than
   6, it has been modified  to get the 7th argument from the caller stack.
 * Instead of leaving output pointer in register before making
   TDCALL, stored it in a stack.
 * Instead of triggering ud2 for TDCALL failures in __tdx_hypercall(),
   it is modified to return the TDCALL status as return value. we will let
   user add appropriate error info before triggering fatal error. Also,
   extended struct tdx_hypercall_output to store r10 register which
   contains hypercall error code.
 * Included TDCALL ABI details in __tdx_module_call() and __tdx_hypercall.
 * Removed tdx_hypercall_out_r11() helper function. Since it is not really
   useful.
 * Added _tdx_hypercall() as a wrapper for __tdx_hypercall() with BUG_ON
   check for TDCALL failure.

 arch/x86/include/asm/tdx.h    |  40 +++++
 arch/x86/kernel/Makefile      |   2 +-
 arch/x86/kernel/asm-offsets.c |  23 +++
 arch/x86/kernel/tdcall.S      | 274 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/tdx.c         |  23 +++
 5 files changed, 361 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/tdcall.S

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 96ffce7bc530..5f352c5d4d1f 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -4,14 +4,54 @@
 #define _ASM_X86_TDX_H
 
 #include <linux/cpufeature.h>
+#include <linux/types.h>
 
 #define TDX_CPUID_LEAF_ID	0x21
+#define TDX_HYPERCALL_STANDARD  0
+
+/*
+ * Used in __tdx_module_call() helper function to gather the
+ * output registers' values of TDCALL instruction when requesting
+ * services from the TDX module. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct tdx_module_output {
+	u64 rcx;
+	u64 rdx;
+	u64 r8;
+	u64 r9;
+	u64 r10;
+	u64 r11;
+};
+
+/*
+ * Used in __tdx_hypercall() helper function to gather the
+ * output registers' values of TDCALL instruction when requesting
+ * services from the VMM. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct tdx_hypercall_output {
+	u64 r10;
+	u64 r11;
+	u64 r12;
+	u64 r13;
+	u64 r14;
+	u64 r15;
+};
 
 #ifdef CONFIG_INTEL_TDX_GUEST
 
 bool is_tdx_guest(void);
 void __init tdx_early_init(void);
 
+/* Helper function used to communicate with the TDX module */
+u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+		      struct tdx_module_output *out);
+
+/* Helper function used to request services from VMM */
+u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14,
+		    u64 r15, struct tdx_hypercall_output *out);
+
 #else
 
 static inline bool is_tdx_guest(void) { return false; }
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8c9a9214dd34..59578a6488ad 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -133,7 +133,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
 
 obj-$(CONFIG_JAILHOUSE_GUEST)	+= jailhouse.o
-obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx.o
+obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdcall.o tdx.o
 
 obj-$(CONFIG_EISA)		+= eisa.o
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ecd3fd6993d1..0494ec01218d 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -23,6 +23,10 @@
 #include <xen/interface/xen.h>
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+#include <asm/tdx.h>
+#endif
+
 #ifdef CONFIG_X86_32
 # include "asm-offsets_32.c"
 #else
@@ -68,6 +72,25 @@ static void __used common(void)
 	OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+	BLANK();
+	/* Offset for fields in tdx_module_output */
+	OFFSET(TDX_MODULE_rcx, tdx_module_output, rcx);
+	OFFSET(TDX_MODULE_rdx, tdx_module_output, rdx);
+	OFFSET(TDX_MODULE_r8,  tdx_module_output, r8);
+	OFFSET(TDX_MODULE_r9,  tdx_module_output, r9);
+	OFFSET(TDX_MODULE_r10, tdx_module_output, r10);
+	OFFSET(TDX_MODULE_r11, tdx_module_output, r11);
+
+	/* Offset for fields in tdx_hypercall_output */
+	OFFSET(TDX_HYPERCALL_r10, tdx_hypercall_output, r10);
+	OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_output, r11);
+	OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_output, r12);
+	OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_output, r13);
+	OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_output, r14);
+	OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_output, r15);
+#endif
+
 	BLANK();
 	OFFSET(BP_scratch, boot_params, scratch);
 	OFFSET(BP_secure_boot, boot_params, secure_boot);
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
new file mode 100644
index 000000000000..c7a5b8d79552
--- /dev/null
+++ b/arch/x86/kernel/tdcall.S
@@ -0,0 +1,274 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <asm/asm-offsets.h>
+#include <asm/asm.h>
+#include <asm/frame.h>
+#include <asm/unwind_hints.h>
+
+#include <linux/linkage.h>
+#include <linux/bits.h>
+#include <linux/errno.h>
+
+#define TDG_R10		BIT(10)
+#define TDG_R11		BIT(11)
+#define TDG_R12		BIT(12)
+#define TDG_R13		BIT(13)
+#define TDG_R14		BIT(14)
+#define TDG_R15		BIT(15)
+
+/* Frame offset + 8 (for arg1) */
+#define ARG7_SP_OFFSET		(FRAME_OFFSET + 0x08)
+
+/*
+ * Expose registers R10-R15 to VMM. It is passed via RCX register
+ * to the TDX Module, which will be used by the TDX module to
+ * identify the list of registers exposed to VMM. Each bit in this
+ * mask represents a register ID. Bit field details can be found
+ * in TDX GHCI specification.
+ */
+#define TDVMCALL_EXPOSE_REGS_MASK	( TDG_R10 | TDG_R11 | \
+					  TDG_R12 | TDG_R13 | \
+					  TDG_R14 | TDG_R15 )
+
+/*
+ * TDX guests use the TDCALL instruction to make requests to the
+ * TDX module and hypercalls to the VMM. It is supported in
+ * Binutils >= 2.36.
+ */
+#define tdcall .byte 0x66,0x0f,0x01,0xcc
+
+/*
+ * __tdx_module_call()  - Helper function used by TDX guests to request
+ * services from the TDX module (does not include VMM services).
+ *
+ * This function serves as a wrapper to move user call arguments to the
+ * correct registers as specified by TDCALL ABI and share it with the
+ * TDX module. If the TDCALL operation is successful and a valid
+ * "struct tdx_module_output" pointer is available (in "out" argument),
+ * output from the TDX module is saved to the memory specified in the
+ * "out" pointer. Also the status of the TDCALL operation is returned
+ * back to the user as a function return value.
+ *
+ *-------------------------------------------------------------------------
+ * TDCALL ABI:
+ *-------------------------------------------------------------------------
+ * Input Registers:
+ *
+ * RAX                 - TDCALL Leaf number.
+ * RCX,RDX,R8-R9       - TDCALL Leaf specific input registers.
+ *
+ * Output Registers:
+ *
+ * RAX                 - TDCALL instruction error code.
+ * RCX,RDX,R8-R11      - TDCALL Leaf specific output registers.
+ *
+ *-------------------------------------------------------------------------
+ *
+ * __tdx_module_call() function ABI:
+ *
+ * @fn  (RDI)          - TDCALL Leaf ID,    moved to RAX
+ * @rcx (RSI)          - Input parameter 1, moved to RCX
+ * @rdx (RDX)          - Input parameter 2, moved to RDX
+ * @r8  (RCX)          - Input parameter 3, moved to R8
+ * @r9  (R8)           - Input parameter 4, moved to R9
+ *
+ * @out (R9)           - struct tdx_module_output pointer
+ *                       stored temporarily in R12 (not
+ *                       shared with the TDX module). It
+ *                       can be NULL.
+ *
+ * Return status of TDCALL via RAX.
+ */
+SYM_FUNC_START(__tdx_module_call)
+	FRAME_BEGIN
+
+	/*
+	 * R12 will be used as temporary storage for
+	 * struct tdx_module_output pointer. More
+	 * details about struct tdx_module_output can
+	 * be found in arch/x86/include/asm/tdx.h. Also
+	 * note that registers R12-R15 are not used by
+	 * TDCALL services supported by this helper
+	 * function.
+	 */
+
+	/* Callee saved, so preserve it */
+	push %r12
+
+	/*
+	 * Push output pointer to stack, after TDCALL operation,
+	 * it will be fetched into R12 register.
+	 */
+	push %r9
+
+	/* Mangle function call ABI into TDCALL ABI: */
+	/* Move TDCALL Leaf ID to RAX */
+	mov %rdi, %rax
+	/* Move input 4 to R9 */
+	mov %r8,  %r9
+	/* Move input 3 to R8 */
+	mov %rcx, %r8
+	/* Move input 1 to RCX */
+	mov %rsi, %rcx
+	/* Leave input param 2 in RDX */
+
+	tdcall
+
+	/* Fetch output pointer from stack to R12 */
+	pop %r12
+
+	/* Check for TDCALL success: 0 - Successful, otherwise failed */
+	test %rax, %rax
+	jnz 1f
+
+	/*
+	 * __tdx_module_call() can be initiated without an output pointer.
+	 * So, check if caller provided an output struct before storing
+	 * output registers.
+	 */
+	test %r12, %r12
+	jz 1f
+
+	/* Copy TDCALL result registers to output struct: */
+	movq %rcx, TDX_MODULE_rcx(%r12)
+	movq %rdx, TDX_MODULE_rdx(%r12)
+	movq %r8,  TDX_MODULE_r8(%r12)
+	movq %r9,  TDX_MODULE_r9(%r12)
+	movq %r10, TDX_MODULE_r10(%r12)
+	movq %r11, TDX_MODULE_r11(%r12)
+1:
+	/* Restore the state of R12 register */
+	pop %r12
+
+	FRAME_END
+	ret
+SYM_FUNC_END(__tdx_module_call)
+
+/*
+ * __tdx_hypercall()  - Helper function used by TDX guests to request
+ * services from the VMM. All requests are made via the TDX module
+ * using TDCALL instruction.
+ *
+ * This function serves as a wrapper to move user call arguments to the
+ * correct registers as specified by TDCALL ABI and share it with VMM
+ * via the TDX module. After TDCALL operation, output from the VMM is
+ * saved to the memory specified in the "out" (struct tdx_hypercall_output)
+ * pointer. 
+ *
+ *-------------------------------------------------------------------------
+ * TD VMCALL ABI:
+ *-------------------------------------------------------------------------
+ *
+ * Input Registers:
+ *
+ * RAX                 - TDCALL instruction leaf number (0 - TDG.VP.VMCALL)
+ * RCX                 - BITMAP which controls which part of TD Guest GPR
+ *                       is passed as-is to VMM and back.
+ * R10                 - Set 0 to indicate TDCALL follows standard TDX ABI
+ *                       specification. Non zero value indicates vendor
+ *                       specific ABI.
+ * R11                 - VMCALL sub function number
+ * RBX, RBP, RDI, RSI  - Used to pass VMCALL sub function specific arguments.
+ * R8-R9, R12–R15      - Same as above.
+ *
+ * Output Registers:
+ *
+ * RAX                 - TDCALL instruction status (Not related to hypercall
+ *                        output).
+ * R10                 - Hypercall output error code.
+ * R11-R15             - Hypercall sub function specific output values.
+ *
+ *-------------------------------------------------------------------------
+ *
+ * __tdx_hypercall() function ABI:
+ *
+ * @type  (RDI)        - TD VMCALL type, moved to R10
+ * @fn    (RSI)        - TD VMCALL sub function, moved to R11
+ * @r12   (RDX)        - Input parameter 1, moved to R12
+ * @r13   (RCX)        - Input parameter 2, moved to R13
+ * @r14   (R8)         - Input parameter 3, moved to R14
+ * @r15   (R9)         - Input parameter 4, moved to R15
+ *
+ * @out   (stack)      - struct tdx_hypercall_output pointer (cannot be NULL)
+ *
+ * On successful completion, return TDCALL status or -EINVAL for invalid
+ * inputs.
+ */
+SYM_FUNC_START(__tdx_hypercall)
+	FRAME_BEGIN
+
+	/* Move argument 7 from caller stack to RAX */
+	movq ARG7_SP_OFFSET(%rsp), %rax
+
+	/* Check if caller provided an output struct */
+	test %rax, %rax
+	/* If out pointer is NULL, return -EINVAL */
+	jz 1f
+
+	/* Save callee-saved GPRs as mandated by the x86_64 ABI */
+	push %r15
+	push %r14
+	push %r13
+	push %r12
+
+	/*
+	 * Save output pointer (rax) in stack, it will be used
+	 * again when storing the output registers after TDCALL
+	 * operation.
+	 */
+	push %rax
+
+	/* Mangle function call ABI into TDCALL ABI: */
+	/* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */
+	xor %eax, %eax
+	/* Move TDVMCALL type (standard vs vendor) in R10 */
+	mov %rdi, %r10
+	/* Move TDVMCALL sub function id to R11 */
+	mov %rsi, %r11
+	/* Move input 1 to R12 */
+	mov %rdx, %r12
+	/* Move input 2 to R13 */
+	mov %rcx, %r13
+	/* Move input 3 to R14 */
+	mov %r8,  %r14
+	/* Move input 4 to R15 */
+	mov %r9,  %r15
+
+	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
+
+	tdcall
+
+	/* Restore output pointer to R9 */
+	pop  %r9
+
+	/* Copy hypercall result registers to output struct: */
+	movq %r10, TDX_HYPERCALL_r10(%r9)
+	movq %r11, TDX_HYPERCALL_r11(%r9)
+	movq %r12, TDX_HYPERCALL_r12(%r9)
+	movq %r13, TDX_HYPERCALL_r13(%r9)
+	movq %r14, TDX_HYPERCALL_r14(%r9)
+	movq %r15, TDX_HYPERCALL_r15(%r9)
+
+	/*
+	 * Zero out registers exposed to the VMM to avoid
+	 * speculative execution with VMM-controlled values.
+	 * This needs to include all registers present in
+	 * TDVMCALL_EXPOSE_REGS_MASK (except R12-R15).
+	 * R12-R15 context will be restored.
+	 */
+	xor %r10d, %r10d
+	xor %r11d, %r11d
+
+	/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
+	pop %r12
+	pop %r13
+	pop %r14
+	pop %r15
+
+	jmp 2f
+1:
+       movq $(-EINVAL), %rax
+2:
+       FRAME_END
+
+       retq
+SYM_FUNC_END(__tdx_hypercall)
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 88bf12788684..3246ffb3606b 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -27,6 +27,29 @@ bool is_tdx_guest(void)
 	return !!tdx_guest;
 }
 
+/*
+ * Wrapper for standard use of __tdx_hypercall with BUG_ON() check
+ * for TDCALL error.
+ */
+static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14,
+				 u64 r15, struct tdx_hypercall_output *out)
+{
+	struct tdx_hypercall_output outl;
+	u64 err;
+
+	/* __tdx_hypercall() does not accept NULL output pointer */
+	if (!out)
+		out = &outl;
+
+	err = __tdx_hypercall(TDX_HYPERCALL_STANDARD, fn, r12, r13, r14,
+			      r15, out);
+
+	/* Non zero return value indicates buggy TDX module, so panic */
+	BUG_ON(err);
+
+	return out->r10;
+}
+
 void __init tdx_early_init(void)
 {
 	if (!is_tdx_guest())
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v10 06/11] x86/traps: Add #VE support for TDX guest
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (4 preceding siblings ...)
  2021-10-09  5:37 ` [PATCH v10 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-14  8:30   ` Thomas Gleixner
  2021-10-09  5:37 ` [PATCH v10 07/11] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Virtualization Exceptions (#VE) are delivered to TDX guests due to
specific guest actions which may happen in either user space or the kernel:

 * Specific instructions (WBINVD, for example)
 * Specific MSR accesses
 * Specific CPUID leaf accesses
 * Access to TD-shared memory, which includes MMIO

In the settings that Linux will run in, virtual exceptions are never
generated on accesses to normal, TD-private memory that has been
accepted.

The entry paths do not access TD-shared memory, MMIO regions or use
those specific MSRs, instructions, CPUID leaves that might generate #VE.
In addition, all interrupts including NMIs are blocked by the hardware
starting with #VE delivery until TDGETVEINFO is called.  This eliminates
the chance of a #VE during the syscall gap or paranoid entry paths and
simplifies #VE handling.

After TDGETVEINFO #VE could happen in theory (e.g. through an NMI),
but it is expected not to happen because TDX expects NMIs not to
trigger #VEs. Another case where they could happen is if the #VE
exception panics, but in this case there are no guarantees on anything
anyways.

If a guest kernel action which would normally cause a #VE occurs in the
interrupt-disabled region before TDGETVEINFO, a #DF is delivered to the
guest which will result in an oops (and should eventually be a panic, as
it is expected panic_on_oops is set to 1 for TDX guests).

Add basic infrastructure to handle any #VE which occurs in the kernel or
userspace.  Later patches will add handling for specific #VE scenarios.

Convert unhandled #VE's (everything, until later in this series) so that
they appear just like a #GP by calling ve_raise_fault() directly.
ve_raise_fault() is similar to #GP handler and is responsible for
sending SIGSEGV to userspace and CPU die and notifying debuggers and
other die chain users.  

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v9:
 * None

Changes since v8:
 * Changed tdx_handle_virtualization_exception() return type to bool.
 * Changed tdx_get_ve_info() return type to bool.
 * Modified the code to adapt to above return type changes.
 * In tdx_get_ve_info() added logic to prevent ve update for tdcall
   failure case.
 * Renamed TDGETVEINFO to TDX_GET_VEINFO

Changes since v7:
 * None

Changes since v6:
 * None

Changes since v5:
 * Fixed "We" usage in commit log and replaced cpu -> CPU.
 * Renamed "tdg_" prefix with "tdx_".
 * Removed TODO comment in tdg_handle_virtualization_exception() as
   per Boris review comments.
 * Added comments for ve_raise_fault().

Changes since v4:
 * Since ve_raise_fault() is used only by TDX code, moved it
   within #ifdef CONFIG_INTEL_TDX_GUEST.

Changes since v3:
 * None

 arch/x86/include/asm/idtentry.h |  4 ++
 arch/x86/include/asm/tdx.h      | 19 ++++++++
 arch/x86/kernel/idt.c           |  3 ++
 arch/x86/kernel/tdx.c           | 38 ++++++++++++++++
 arch/x86/kernel/traps.c         | 77 +++++++++++++++++++++++++++++++++
 5 files changed, 141 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..8ccc81d653b3 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -625,6 +625,10 @@ DECLARE_IDTENTRY_XENCB(X86_TRAP_OTHER,	exc_xen_hypervisor_callback);
 DECLARE_IDTENTRY_RAW(X86_TRAP_OTHER,	exc_xen_unknown_trap);
 #endif
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+DECLARE_IDTENTRY(X86_TRAP_VE,		exc_virtualization_exception);
+#endif
+
 /* Device interrupts common/spurious */
 DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER,	common_interrupt);
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 5f352c5d4d1f..8031d4bfe9ed 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -39,6 +39,20 @@ struct tdx_hypercall_output {
 	u64 r15;
 };
 
+/*
+ * Used by #VE exception handler to gather the #VE exception
+ * info from the TDX module. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct ve_info {
+	u64 exit_reason;
+	u64 exit_qual;
+	u64 gla;	/* Guest Linear (virtual) Address */
+	u64 gpa;	/* Guest Physical (virtual) Address */
+	u32 instr_len;
+	u32 instr_info;
+};
+
 #ifdef CONFIG_INTEL_TDX_GUEST
 
 bool is_tdx_guest(void);
@@ -52,6 +66,11 @@ u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 u64 __tdx_hypercall(u64 type, u64 fn, u64 r12, u64 r13, u64 r14,
 		    u64 r15, struct tdx_hypercall_output *out);
 
+bool tdx_get_ve_info(struct ve_info *ve);
+
+bool tdx_handle_virtualization_exception(struct pt_regs *regs,
+					 struct ve_info *ve);
+
 #else
 
 static inline bool is_tdx_guest(void) { return false; }
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index df0fa695bb09..1da074123c16 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -68,6 +68,9 @@ static const __initconst struct idt_data early_idts[] = {
 	 */
 	INTG(X86_TRAP_PF,		asm_exc_page_fault),
 #endif
+#ifdef CONFIG_INTEL_TDX_GUEST
+	INTG(X86_TRAP_VE,		asm_exc_virtualization_exception),
+#endif
 };
 
 /*
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 3246ffb3606b..1ef979008fe8 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -6,6 +6,9 @@
 
 #include <asm/tdx.h>
 
+/* TDX Module call Leaf IDs */
+#define TDX_GET_VEINFO			3
+
 bool is_tdx_guest(void)
 {
 	static int tdx_guest = -1;
@@ -50,6 +53,41 @@ static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14,
 	return out->r10;
 }
 
+bool tdx_get_ve_info(struct ve_info *ve)
+{
+	struct tdx_module_output out;
+	u64 ret;
+
+	if (!ve)
+		return false;
+
+	/*
+	 * NMIs and machine checks are suppressed. Before this point any
+	 * #VE is fatal. After this point (TDGETVEINFO call), NMIs and
+	 * additional #VEs are permitted (but it is expected not to
+	 * happen unless kernel panics).
+	 */
+	ret = __tdx_module_call(TDX_GET_VEINFO, 0, 0, 0, 0, &out);
+	if (ret)
+		return false;
+
+	ve->exit_reason = out.rcx;
+	ve->exit_qual   = out.rdx;
+	ve->gla         = out.r8;
+	ve->gpa         = out.r9;
+	ve->instr_len   = out.r10 & UINT_MAX;
+	ve->instr_info  = out.r10 >> 32;
+
+	return true;
+}
+
+bool tdx_handle_virtualization_exception(struct pt_regs *regs,
+					 struct ve_info *ve)
+{
+	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
+	return false;
+}
+
 void __init tdx_early_init(void)
 {
 	if (!is_tdx_guest())
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a58800973aed..70d76c3a548f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
 #include <asm/vdso.h>
+#include <asm/tdx.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -1140,6 +1141,82 @@ DEFINE_IDTENTRY(exc_device_not_available)
 	}
 }
 
+#ifdef CONFIG_INTEL_TDX_GUEST
+#define VE_FAULT_STR "VE fault"
+static void ve_raise_fault(struct pt_regs *regs, long error_code)
+{
+	struct task_struct *tsk = current;
+
+	if (user_mode(regs)) {
+		tsk->thread.error_code = error_code;
+		tsk->thread.trap_nr = X86_TRAP_VE;
+
+		/*
+		 * Not fixing up VDSO exceptions similar to #GP handler
+		 * because it is expected that VDSO doesn't trigger #VE.
+		 */
+		show_signal(tsk, SIGSEGV, "", VE_FAULT_STR, regs, error_code);
+		force_sig(SIGSEGV);
+		return;
+	}
+
+	/*
+	 * Attempt to recover from #VE exception failure without
+	 * triggering OOPS (useful for MSR read/write failures)
+	 */
+	if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
+		return;
+
+	tsk->thread.error_code = error_code;
+	tsk->thread.trap_nr = X86_TRAP_VE;
+
+	/*
+	 * To be potentially processing a kprobe fault and to trust the result
+	 * from kprobe_running(), it should be non-preemptible.
+	 */
+	if (!preemptible() &&
+	    kprobe_running() &&
+	    kprobe_fault_handler(regs, X86_TRAP_VE))
+		return;
+
+	/* Notify about #VE handling failure, useful for debugger hooks */
+	if (notify_die(DIE_GPF, VE_FAULT_STR, regs, error_code,
+		       X86_TRAP_VE, SIGSEGV) == NOTIFY_STOP)
+		return;
+
+	/* Trigger OOPS and panic */
+	die_addr(VE_FAULT_STR, regs, error_code, 0);
+}
+
+DEFINE_IDTENTRY(exc_virtualization_exception)
+{
+	struct ve_info ve;
+	bool ret;
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	/*
+	 * NMIs/Machine-checks/Interrupts will be in a disabled state
+	 * till TDGETVEINFO TDCALL is executed. This prevents #VE
+	 * nesting issue.
+	 */
+	ret = tdx_get_ve_info(&ve);
+
+	cond_local_irq_enable(regs);
+
+	if (ret)
+		ret = tdx_handle_virtualization_exception(regs, &ve);
+	/*
+	 * If tdx_handle_virtualization_exception() could not process
+	 * it successfully, treat it as #GP(0) and handle it.
+	 */
+	if (!ret)
+		ve_raise_fault(regs, 0);
+
+	cond_local_irq_disable(regs);
+}
+#endif
+
 #ifdef CONFIG_X86_32
 DEFINE_IDTENTRY_SW(iret_error)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v10 07/11] x86/tdx: Add HLT support for TDX guest
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (5 preceding siblings ...)
  2021-10-09  5:37 ` [PATCH v10 06/11] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-14  9:30   ` Thomas Gleixner
  2021-10-09  5:37 ` [PATCH v10 08/11] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Per Guest-Host-Communication Interface (GHCI) for Intel Trust
Domain Extensions (Intel TDX) specification, sec 3.8,
TDVMCALL[Instruction.HLT] provides HLT operation. Use it to implement
halt() and safe_halt() paravirtualization calls.

The same TDX hypercall is used to handle #VE exception due to
EXIT_REASON_HLT.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v9:
 * None

Changes since v8:
 * Moved tdx_halt() comments near variable declaration.

Changes since v7:
 * Added section title to spec reference in commit log and comments.
 * Added extra comments as per review suggestion.

Changes since v6:
 * None

Changes since v5:
 * Replaced sti with STI in commit log and comments.
 * Added comments for _tdx_hypercall() usage in _tdx_halt().
 * Added new helper function _tdx_halt() to contain common
   code between tdx_halt() and tdx_safe_halt().
 * Renamed tdg_->tdx_.
 * Removed BUG_ON() and used WARN_ONCE() for HLT emulation
   failure.

Changes since v4:
 * Added exception for EXIT_REASON_HLT in __tdx_hypercall() to
   enable interrupts using sti.

Changes since v3:
 * None

 arch/x86/kernel/tdcall.S | 30 ++++++++++++++++
 arch/x86/kernel/tdx.c    | 75 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
index c7a5b8d79552..4b1fd1dd2ab5 100644
--- a/arch/x86/kernel/tdcall.S
+++ b/arch/x86/kernel/tdcall.S
@@ -36,6 +36,9 @@
  */
 #define tdcall .byte 0x66,0x0f,0x01,0xcc
 
+/* HLT TDVMCALL sub-function ID */
+#define EXIT_REASON_HLT			12
+
 /*
  * __tdx_module_call()  - Helper function used by TDX guests to request
  * services from the TDX module (does not include VMM services).
@@ -235,6 +238,33 @@ SYM_FUNC_START(__tdx_hypercall)
 
 	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
 
+	/*
+	 * For the idle loop STI needs to be called directly before
+	 * the TDCALL that enters idle (EXIT_REASON_HLT case). STI
+	 * enables interrupts only one instruction later. If there
+	 * are any instructions between the STI and the TDCALL for
+	 * HLT then an interrupt could happen in that time, but the
+	 * code would go back to sleep afterwards, which can cause
+	 * longer delays.
+	 *
+	 * This leads to significant difference in network performance
+	 * benchmarks. So add a special case for EXIT_REASON_HLT to
+	 * trigger STI before TDCALL. But this change is not required
+	 * for all HLT cases. So use R15 register value to identify the
+	 * case which needs STI. So, if R11 is EXIT_REASON_HLT and R15
+	 * is 1, then call STI before TDCALL instruction. Note that R15
+	 * register is not required by TDCALL ABI when triggering the
+	 * hypercall for EXIT_REASON_HLT case. So use it in software to
+	 * select the STI case.
+	 */
+	cmpl $EXIT_REASON_HLT, %r11d
+	jne skip_sti
+	cmpl $1, %r15d
+	jne skip_sti
+	/* Set R15 register to 0, it is unused in EXIT_REASON_HLT case */
+	xor %r15, %r15
+	sti
+skip_sti:
 	tdcall
 
 	/* Restore output pointer to R9 */
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 1ef979008fe8..c05e8824e5e0 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -5,6 +5,7 @@
 #define pr_fmt(fmt)     "tdx: " fmt
 
 #include <asm/tdx.h>
+#include <asm/vmx.h>
 
 /* TDX Module call Leaf IDs */
 #define TDX_GET_VEINFO			3
@@ -53,6 +54,62 @@ static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14,
 	return out->r10;
 }
 
+static __cpuidle void _tdx_halt(const bool irq_disabled, const bool do_sti)
+{
+	u64 ret;
+
+	/*
+	 * Emulate HLT operation via hypercall. More info about ABI
+	 * can be found in TDX Guest-Host-Communication Interface
+	 * (GHCI), sec 3.8 TDG.VP.VMCALL<Instruction.HLT>.
+	 *
+	 * The VMM uses the "IRQ disabled" param to understand IRQ
+	 * enabled status (RFLAGS.IF) of TD guest and determine
+	 * whether or not it should schedule the halted vCPU if an
+	 * IRQ becomes pending. E.g. if IRQs are disabled the VMM
+	 * can keep the vCPU in virtual HLT, even if an IRQ is
+	 * pending, without hanging/breaking the guest.
+	 *
+	 * do_sti parameter is used by __tdx_hypercall() to decide
+	 * whether to call STI instruction before executing TDCALL
+	 * instruction.
+	 */
+	ret = _tdx_hypercall(EXIT_REASON_HLT, irq_disabled, 0, 0, do_sti, NULL);
+
+	/*
+	 * Use WARN_ONCE() to report the failure. Since tdx_*halt() calls
+	 * are also used in pv_ops, #VE error handler cannot be used to
+	 * report the failure.
+	 */
+	WARN_ONCE(ret, "HLT instruction emulation failed\n");
+}
+
+static __cpuidle void tdx_halt(void)
+{
+	/*
+	 * Non safe halt is mainly used in CPU offlining and
+	 * the guest will stay in halt state. So, STI
+	 * instruction call is not required (set do_sti as
+	 * false).
+	 */
+	const bool irq_disabled = irqs_disabled();
+	const bool do_sti = false;
+
+	_tdx_halt(irq_disabled, do_sti);
+}
+
+static __cpuidle void tdx_safe_halt(void)
+{
+	 /*
+	  * Since STI instruction will be called in __tdx_hypercall()
+	  * set irq_disabled as false.
+	  */
+	const bool irq_disabled = false;
+	const bool do_sti = true;
+
+	_tdx_halt(irq_disabled, do_sti);
+}
+
 bool tdx_get_ve_info(struct ve_info *ve)
 {
 	struct tdx_module_output out;
@@ -84,8 +141,19 @@ bool tdx_get_ve_info(struct ve_info *ve)
 bool tdx_handle_virtualization_exception(struct pt_regs *regs,
 					 struct ve_info *ve)
 {
-	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
-	return false;
+	switch (ve->exit_reason) {
+	case EXIT_REASON_HLT:
+		tdx_halt();
+		break;
+	default:
+		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
+		return false;
+	}
+
+	/* After successful #VE handling, move the IP */
+	regs->ip += ve->instr_len;
+
+	return true;
 }
 
 void __init tdx_early_init(void)
@@ -95,5 +163,8 @@ void __init tdx_early_init(void)
 
 	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
 
+	pv_ops.irq.safe_halt = tdx_safe_halt;
+	pv_ops.irq.halt = tdx_halt;
+
 	pr_info("Guest initialized\n");
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v10 08/11] x86/tdx: Wire up KVM hypercalls
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (6 preceding siblings ...)
  2021-10-09  5:37 ` [PATCH v10 07/11] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-14 10:21   ` Thomas Gleixner
  2021-10-09  5:37 ` [PATCH v10 09/11] x86/tdx: Add MSR support for TDX guest Kuppuswamy Sathyanarayanan
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

KVM hypercalls use the VMCALL or VMMCALL instructions. Although the ABI
is similar, those instructions no longer function for TDX guests. Make
vendor-specific TDVMCALLs instead of VMCALL. This enables TDX guests to
run with KVM acting as the hypervisor. TDX guests running under other
hypervisors will continue to use those hypervisors' hypercalls.

Since KVM driver can be built as a kernel module, export
tdx_kvm_hypercall*() to make the symbols visible to kvm.ko.

Also, add asm/tdx.h to asm/asm-prototypes.h so that asm symbol's
checksum can be generated in order to support CONFIG_MODVERSIONS with
it and fix:

WARNING: modpost: EXPORT symbol "__tdx_hypercall" [vmlinux] version \
generation failed, symbol will not be versioned.

[Isaku Yamahata: proposed KVM VENDOR string]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v9:
 * None

Changes since v8:
 * None

Changes since v7:
 * None

Changes since v6:
 * Used cc_platform_has() in place of prot_guest_has().

Changes since v5:
 * Added more info about version generation failed error in commit
   log.
 * Fixed commit log as per review comments.
 * Removed CONFIG_INTEL_TDX_GUEST_KVM and used
   CONFIG_KVM_GUEST/CONFIG_INTEL_TDX_GUEST for TDX KVM hypercall
   implementation.
 * Used EXPORT_SYMBOL_GPL for __tdx_hypercall() export.

Changes since v4:
 * No functional changes.

Changes since v3:
 * Fixed ASM sysmbol generation issue in tdcall.S by including tdx.h
   in asm-prototypes.h

Changes since v1:
 * Replaced is_tdx_guest() with prot_guest_has(PR_GUEST_TDX).
 * Replaced tdx_kvm_hypercall{1-4} with single generic 
   function tdx_kvm_hypercall().
 * Removed __tdx_hypercall_vendor_kvm() and re-used __tdx_hypercall().

 arch/x86/include/asm/asm-prototypes.h |  1 +
 arch/x86/include/asm/kvm_para.h       | 22 ++++++++++++++++++
 arch/x86/include/asm/tdx.h            | 32 +++++++++++++++++++++++++--
 arch/x86/kernel/tdcall.S              |  2 ++
 4 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 4cb726c71ed8..404add7ee720 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -6,6 +6,7 @@
 #include <asm/page.h>
 #include <asm/checksum.h>
 #include <asm/mce.h>
+#include <asm/tdx.h>
 
 #include <asm-generic/asm-prototypes.h>
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 69299878b200..e51977e77590 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -4,7 +4,9 @@
 
 #include <asm/processor.h>
 #include <asm/alternative.h>
+#include <asm/tdx.h>
 #include <linux/interrupt.h>
+#include <linux/cc_platform.h>
 #include <uapi/asm/kvm_para.h>
 
 #ifdef CONFIG_KVM_GUEST
@@ -32,6 +34,10 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 static inline long kvm_hypercall0(unsigned int nr)
 {
 	long ret;
+
+	if (cc_platform_has(CC_ATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, 0, 0, 0, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr)
@@ -42,6 +48,10 @@ static inline long kvm_hypercall0(unsigned int nr)
 static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
 {
 	long ret;
+
+	if (cc_platform_has(CC_ATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, 0, 0, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1)
@@ -53,6 +63,10 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
 				  unsigned long p2)
 {
 	long ret;
+
+	if (cc_platform_has(CC_ATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, p2, 0, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1), "c"(p2)
@@ -64,6 +78,10 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
 				  unsigned long p2, unsigned long p3)
 {
 	long ret;
+
+	if (cc_platform_has(CC_ATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, p2, p3, 0);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
@@ -76,6 +94,10 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 				  unsigned long p4)
 {
 	long ret;
+
+	if (cc_platform_has(CC_ATTR_GUEST_TDX))
+		return tdx_kvm_hypercall(nr, p1, p2, p3, p4);
+
 	asm volatile(KVM_HYPERCALL
 		     : "=a"(ret)
 		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3), "S"(p4)
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 8031d4bfe9ed..54a4daceaa1b 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -6,8 +6,9 @@
 #include <linux/cpufeature.h>
 #include <linux/types.h>
 
-#define TDX_CPUID_LEAF_ID	0x21
-#define TDX_HYPERCALL_STANDARD  0
+#define TDX_CPUID_LEAF_ID			0x21
+#define TDX_HYPERCALL_STANDARD			0
+#define TDX_HYPERCALL_VENDOR_KVM		0x4d564b2e584454 /* TDX.KVM */
 
 /*
  * Used in __tdx_module_call() helper function to gather the
@@ -78,4 +79,31 @@ static inline void tdx_early_init(void) { };
 
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
+#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST)
+static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
+				     unsigned long p2, unsigned long p3,
+				     unsigned long p4)
+{
+	struct tdx_hypercall_output out;
+	u64 err;
+
+	err = __tdx_hypercall(TDX_HYPERCALL_VENDOR_KVM, nr, p1, p2,
+			      p3, p4, &out);
+
+	/*
+	 * Non zero return value means buggy TDX module (which is fatal).
+	 * So use BUG_ON() to panic.
+	 */
+	BUG_ON(err);
+
+	return out.r10;
+}
+#else
+static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
+				     unsigned long p2, unsigned long p3,
+				     unsigned long p4)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
 #endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
index 4b1fd1dd2ab5..5ce04f6a34ce 100644
--- a/arch/x86/kernel/tdcall.S
+++ b/arch/x86/kernel/tdcall.S
@@ -3,6 +3,7 @@
 #include <asm/asm.h>
 #include <asm/frame.h>
 #include <asm/unwind_hints.h>
+#include <asm/export.h>
 
 #include <linux/linkage.h>
 #include <linux/bits.h>
@@ -302,3 +303,4 @@ skip_sti:
 
        retq
 SYM_FUNC_END(__tdx_hypercall)
+EXPORT_SYMBOL_GPL(__tdx_hypercall);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v10 09/11] x86/tdx: Add MSR support for TDX guest
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (7 preceding siblings ...)
  2021-10-09  5:37 ` [PATCH v10 08/11] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-09  5:37 ` [PATCH v10 10/11] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Use hypercall to emulate MSR read/write for TDX platform.

TDVMCALL[Instruction.RDMSR] and TDVMCALL[Instruction.WRMSR] provide MSR
oprations.

RDMSR and WRMSR specification details can be found in
Guest-Host-Communication Interface (GHCI) for Intel Trust Domain
Extensions (Intel TDX) specification, sec titled "TDG.VP.
VMCALL<Instruction.RDMSR>" and "TDG.VP.VMCALL<Instruction.WRMSR>".

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v9:
 * None

Changes since v8:
 * Changed tdx_read_msr_safe() return type to bool.
 * Changed tdx_write_msr_safe() return type to bool.
 * Changed second argument of tdx_read_msr_safe() from err to val.
 * Fixed compliation error due to use of tdx_is_context_switched_msr().

Changes since v7:
 * Removed tdx_is_context_switched_msr() support (since the list
   is incomplete).
 * Added section title to spec reference.

Changes since v6:
 * None

Changes since v5:
 * Renamed "tdg" prefix with "tdx".
 * Added comments for _tdx_hypercall() usage in MSR read/write functions.

Change since v4:
 * Removed You usage from commit log.

Changes since v3:
 * None

 arch/x86/kernel/tdx.c | 51 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index c05e8824e5e0..7c6c84015302 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -110,6 +110,39 @@ static __cpuidle void tdx_safe_halt(void)
 	_tdx_halt(irq_disabled, do_sti);
 }
 
+static bool tdx_read_msr_safe(unsigned int msr, u64 *val)
+{
+	struct tdx_hypercall_output out;
+
+	/*
+	 * Emulate the MSR read via hypercall. More info about ABI
+	 * can be found in TDX Guest-Host-Communication Interface
+	 * (GHCI), sec titled "TDG.VP.VMCALL<Instruction.RDMSR>".
+	 */
+	if (_tdx_hypercall(EXIT_REASON_MSR_READ, msr, 0, 0, 0, &out))
+		return false;
+
+	*val = out.r11;
+
+	return true;
+}
+
+static bool tdx_write_msr_safe(unsigned int msr, unsigned int low,
+			       unsigned int high)
+{
+	u64 ret;
+
+	/*
+	 * Emulate the MSR write via hypercall. More info about ABI
+	 * can be found in TDX Guest-Host-Communication Interface
+	 * (GHCI) sec titled "TDG.VP.VMCALL<Instruction.WRMSR>".
+	 */
+	ret = _tdx_hypercall(EXIT_REASON_MSR_WRITE, msr, (u64)high << 32 | low,
+			     0, 0, NULL);
+
+	return ret ? false : true;
+}
+
 bool tdx_get_ve_info(struct ve_info *ve)
 {
 	struct tdx_module_output out;
@@ -141,19 +174,33 @@ bool tdx_get_ve_info(struct ve_info *ve)
 bool tdx_handle_virtualization_exception(struct pt_regs *regs,
 					 struct ve_info *ve)
 {
+	bool ret = true;
+	u64 val;
+
 	switch (ve->exit_reason) {
 	case EXIT_REASON_HLT:
 		tdx_halt();
 		break;
+	case EXIT_REASON_MSR_READ:
+		ret = tdx_read_msr_safe(regs->cx, &val);
+		if (ret) {
+			regs->ax = (u32)val;
+			regs->dx = val >> 32;
+		}
+		break;
+	case EXIT_REASON_MSR_WRITE:
+		ret = tdx_write_msr_safe(regs->cx, regs->ax, regs->dx);
+		break;
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
 		return false;
 	}
 
 	/* After successful #VE handling, move the IP */
-	regs->ip += ve->instr_len;
+	if (ret)
+		regs->ip += ve->instr_len;
 
-	return true;
+	return ret;
 }
 
 void __init tdx_early_init(void)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v10 10/11] x86/tdx: Don't write CSTAR MSR on Intel
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (8 preceding siblings ...)
  2021-10-09  5:37 ` [PATCH v10 09/11] x86/tdx: Add MSR support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-14 10:47   ` Thomas Gleixner
  2021-10-09  5:37 ` [PATCH v10 11/11] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
  2021-10-09  7:38 ` [PATCH v10 00/11] Add TDX Guest Support (Initial support) Borislav Petkov
  11 siblings, 1 reply; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: Andi Kleen <ak@linux.intel.com>

On Intel CPUs writing the CSTAR MSR is not really needed. Syscalls
from 32bit work using SYSENTER and 32bit SYSCALL is an illegal opcode.
But the kernel did write it anyways even though it was ignored by
the CPU. Inside a TDX guest this actually leads to a #VE which in
turn will trigger ve_raise_fault() due to failed MSR write. Inside
ve_raise_fault() before it recovers from this error, it prints an
ugly message at boot. Since such warning message is pointless for
CSTAR MSR write failure, add exception to skip CSTAR msr write on
Intel CPUs.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---

Changes since v9:
 * None

Changes since v8:
 * None

Changes since v7:
 * None.

Changes since v6:
 * None.

Changes since v5:
 * Fixed commit log as per review comments.

 arch/x86/kernel/cpu/common.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0f8885949e8c..fd10f1044157 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1758,7 +1758,13 @@ void syscall_init(void)
 	wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
 
 #ifdef CONFIG_IA32_EMULATION
-	wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
+	/*
+	 * CSTAR is not needed on Intel because it doesn't support
+	 * 32bit SYSCALL, but only SYSENTER. On a TDX guest
+	 * it leads to a #GP.
+	 */
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
 	/*
 	 * This only works on Intel CPUs.
 	 * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
@@ -1770,7 +1776,8 @@ void syscall_init(void)
 		    (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
 	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
 #else
-	wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret);
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret);
 	wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
 	wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
 	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v10 11/11] x86/tdx: Handle CPUID via #VE
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (9 preceding siblings ...)
  2021-10-09  5:37 ` [PATCH v10 10/11] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
@ 2021-10-09  5:37 ` Kuppuswamy Sathyanarayanan
  2021-10-14 12:01   ` Thomas Gleixner
  2021-10-09  7:38 ` [PATCH v10 00/11] Add TDX Guest Support (Initial support) Borislav Petkov
  11 siblings, 1 reply; 58+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2021-10-09  5:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

When running virtualized, the CPUID instruction is handled differently
based on the leaf being accessed.  The behavior depends only on the
leaf and applies equally to both kernel/ring-0 and userspace/ring-3
execution of CPUID. Historically, there are two basic classes:

 * Leaves handled transparently to the guest
 * Leaves handled by the VMM

In a typical guest without TDX, "handled by the VMM" leaves cause a
VMEXIT.  TDX replaces these VMEXITs with a #VE exception in the guest.
The guest typically handles the #VE by making a hypercall to the VMM.

The TDX module specification [1], section titled "CPUID Virtualization"
talks about a few more classes of CPUID handling. But, for the purposes
of this patch, the "handled transparently" CPUID leaves are all lumped
together because the guest handling is the same.

[1] - https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1.0-public-spec-v0.931.pdf

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---

Changes since v9:
 * None

Changes since v8:
 * Removed zero initialization of struct tdx_hypercall_output in
   tdx_handle_cpuid().
 * Changed return type of tdx_handle_cpuid() to bool.

Changes since v7: 
 * None

Changes since v6:
 * Fixed commit log as per Dave Hansen comments.
 * Added extra comments for _tdx_hypercall() usage.

Changes since v5:
 * Fixed commit log as per review comments.
 * Removed WARN_ON() in tdx_handle_cpuid().
 * Renamed "tdg" prefix with "tdx".

Changes since v4:
 * None

Changes since v3:
 * None

 arch/x86/kernel/tdx.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 7c6c84015302..f8f242215dbc 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -143,6 +143,31 @@ static bool tdx_write_msr_safe(unsigned int msr, unsigned int low,
 	return ret ? false : true;
 }
 
+static bool tdx_handle_cpuid(struct pt_regs *regs)
+{
+	struct tdx_hypercall_output out;
+
+	/*
+	 * Emulate CPUID instruction via hypercall. More info about
+	 * ABI can be found in TDX Guest-Host-Communication Interface
+	 * (GHCI), section titled "VP.VMCALL<Instruction.CPUID>".
+	 */
+	if (_tdx_hypercall(EXIT_REASON_CPUID, regs->ax, regs->cx, 0, 0, &out))
+		return false;
+
+	/*
+	 * As per TDX GHCI CPUID ABI, r12-r15 registers contains contents of
+	 * EAX, EBX, ECX, EDX registers after CPUID instruction execution.
+	 * So copy the register contents back to pt_regs.
+	 */
+	regs->ax = out.r12;
+	regs->bx = out.r13;
+	regs->cx = out.r14;
+	regs->dx = out.r15;
+
+	return true;
+}
+
 bool tdx_get_ve_info(struct ve_info *ve)
 {
 	struct tdx_module_output out;
@@ -191,6 +216,9 @@ bool tdx_handle_virtualization_exception(struct pt_regs *regs,
 	case EXIT_REASON_MSR_WRITE:
 		ret = tdx_write_msr_safe(regs->cx, regs->ax, regs->dx);
 		break;
+	case EXIT_REASON_CPUID:
+		ret = tdx_handle_cpuid(regs);
+		break;
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
 		return false;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 00/11] Add TDX Guest Support (Initial support)
  2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
                   ` (10 preceding siblings ...)
  2021-10-09  5:37 ` [PATCH v10 11/11] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
@ 2021-10-09  7:38 ` Borislav Petkov
  2021-10-09 20:56   ` Kuppuswamy, Sathyanarayanan
  11 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2021-10-09  7:38 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08, 2021 at 10:37:36PM -0700, Kuppuswamy Sathyanarayanan wrote:
> Hi All,

Now let's see: you sent this particular patchset on Monday, 4th. The
usual process is that you wait at least a week for review comments,
incorporate them into your next revision and then you send it. We were
still reviewing v8...

But I see already a v9 in my mbox from yesterday and *also* a v10. v9
you probably didn't build-test enough so you had to hastily do a v10. 4
days later!

And because that's not enough, there are a bunch of other TDX patchsets
from you flying in constantly.

Now, please explain to me how you imagine this whole review thing is
supposed to work?

You hammer people with patchsets until they go in? Forget proper review?

Or people should drop the other things they have to do for their jobs
and deal only with your patchsets?

How about we trade places: you review and try to get sh*t to work and I
hammer you with patchsets every 3-4 days?

For chrissakes, please calm down with that constant hammering and try to
put yourself in the maintainer's shoes for once. Also, try to realize
that hammering people with patchsets will get you the *opposite* of what
you're trying to achieve - you will get ignored.

Geez.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 00/11] Add TDX Guest Support (Initial support)
  2021-10-09  7:38 ` [PATCH v10 00/11] Add TDX Guest Support (Initial support) Borislav Petkov
@ 2021-10-09 20:56   ` Kuppuswamy, Sathyanarayanan
  2021-10-11 13:03     ` Borislav Petkov
  0 siblings, 1 reply; 58+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-10-09 20:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel



On 10/9/21 12:38 AM, Borislav Petkov wrote:
> On Fri, Oct 08, 2021 at 10:37:36PM -0700, Kuppuswamy Sathyanarayanan wrote:
>> Hi All,
> 
> Now let's see: you sent this particular patchset on Monday, 4th. The
> usual process is that you wait at least a week for review comments,
> incorporate them into your next revision and then you send it. We were
> still reviewing v8...

Sorry for the quick re-submissions. But the main reason for sending v9
within a week is,

1. Following compilation error found in v8.

This fails to build:

arch/x86/kernel/tdx.c: In function ‘tdx_write_msr_safe’:
arch/x86/kernel/tdx.c:135:22: error: implicit declaration of function ‘tdx_is_context_switched_msr’ 
[-Werror=implicit-function-declaration]
135 | WARN_ON_ONCE(tdx_is_context_switched_msr(msr));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/bug.h:104:32: note: in definition of macro ‘WARN_ON_ONCE’
104 | int __ret_warn_on = !!(condition);


2. I had to rebase my patches on your tip tree again to adapt to the latest
version of CC patches.
3. Also to address your comment about using is_tdx_guest() in
cc_platform_has()

I thought the above issues warranted a re-submission. I know that it is a mistake
from my end. But I did not want you to review code changes that might go away
due to rebase.


> 
> But I see already a v9 in my mbox from yesterday and *also* a v10. v9
> you probably didn't build-test enough so you had to hastily do a v10. 4
> days later!

I have sent v10 within few hours of v9 submission to fix a static inline issue.

I did not catch it my compilation test because, it happens only with a
TDX disabled config.

Sorry for the trouble again. Please ignore the v9 version. I will try not to repeat
this  in future.

> 
> And because that's not enough, there are a bunch of other TDX patchsets
> from you flying in constantly.
> 
> Now, please explain to me how you imagine this whole review thing is
> supposed to work?
> 
> You hammer people with patchsets until they go in? Forget proper review?
> 
> Or people should drop the other things they have to do for their jobs
> and deal only with your patchsets?
> 
> How about we trade places: you review and try to get sh*t to work and I
> hammer you with patchsets every 3-4 days?
> 
> For chrissakes, please calm down with that constant hammering and try to
> put yourself in the maintainer's shoes for once. Also, try to realize
> that hammering people with patchsets will get you the *opposite* of what
> you're trying to achieve - you will get ignored.
> 
> Geez.
> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 00/11] Add TDX Guest Support (Initial support)
  2021-10-09 20:56   ` Kuppuswamy, Sathyanarayanan
@ 2021-10-11 13:03     ` Borislav Petkov
  2021-10-11 16:33       ` Dave Hansen
  0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2021-10-11 13:03 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Sat, Oct 09, 2021 at 01:56:12PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> I thought the above issues warranted a re-submission. I know that it is a mistake
> from my end. But I did not want you to review code changes that might go away
> due to rebase.

That's fine - most of the time I do notice when code is changing and I
even warn submitters about it.

> I have sent v10 within few hours of v9 submission to fix a static inline issue.
> 
> I did not catch it my compilation test because, it happens only with a
> TDX disabled config.

For that you need to write yourself a script which does:

ARCHES=('i386' 'x86_64')

	...

        for a in "${ARCHES[@]}"
        do
                for cfg in "allnoconfig" "defconfig" "allmodconfig" "allyesconfig"
                do
                        build_kernel $a $cfg
                done
        done

then find a big fat machine at Intel - I don't think that would post as
a particularly hard problem :-) - and run it in tmpfs, on your patchset.

Then you collect build logs and grep them for errors. And for additional
coverage, when you're done with the above configs, you do randconfigs.
That's what I do all the time with patchsets and it catches pretty much
every build error.

> Sorry for the trouble again. Please ignore the v9 version. I will try
> not to repeat this in future.

Thanks for that - very much appreciated.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 00/11] Add TDX Guest Support (Initial support)
  2021-10-11 13:03     ` Borislav Petkov
@ 2021-10-11 16:33       ` Dave Hansen
  2021-10-11 16:48         ` Dave Hansen
  0 siblings, 1 reply; 58+ messages in thread
From: Dave Hansen @ 2021-10-11 16:33 UTC (permalink / raw)
  To: Borislav Petkov, Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov, Sean Christopherson,
	Kuppuswamy Sathyanarayanan, linux-kernel

On 10/11/21 6:03 AM, Borislav Petkov wrote:
> For that you need to write yourself a script which does:
> 
> ARCHES=('i386' 'x86_64')
> 
> 	...
> 
>         for a in "${ARCHES[@]}"
>         do
>                 for cfg in "allnoconfig" "defconfig" "allmodconfig" "allyesconfig"
>                 do
>                         build_kernel $a $cfg
>                 done
>         done
> 
> then find a big fat machine at Intel - I don't think that would post as
> a particularly hard problem :-) - and run it in tmpfs, on your patchset.
> 
> Then you collect build logs and grep them for errors. And for additional
> coverage, when you're done with the above configs, you do randconfigs.
> That's what I do all the time with patchsets and it catches pretty much
> every build error.

FWIW, if you're in search of funky Kconfigs for compiling x86, I've got
a big bundle of them.  In addition, I suggest grepping through any
Kconfig options you add and then toggling *those* options specifically.

The three that come to mind that tend to find bugs the fastest are a
defconfig, plus removing (one at a time):

	CONFIG_CPU_SUP_INTEL
	CONFIG_SMP
	CONFIG_NUMA

In your case, I'd make sure you also flip SECURITY and X86_X2APIC, because:

> +	depends on X86_64 && CPU_SUP_INTEL && PARAVIRT
> +	depends on SECURITY
> +	depends on X86_X2APIC


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 00/11] Add TDX Guest Support (Initial support)
  2021-10-11 16:33       ` Dave Hansen
@ 2021-10-11 16:48         ` Dave Hansen
  2021-10-11 17:04           ` Borislav Petkov
  0 siblings, 1 reply; 58+ messages in thread
From: Dave Hansen @ 2021-10-11 16:48 UTC (permalink / raw)
  To: Borislav Petkov, Kuppuswamy, Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov, Sean Christopherson,
	Kuppuswamy Sathyanarayanan, linux-kernel

On 10/11/21 9:33 AM, Dave Hansen wrote:
> FWIW, if you're in search of funky Kconfigs for compiling x86, I've got
> a big bundle of them. 

In case anyone's interested, here's a bundle of them:

	https://sr71.net/~dave/intel/configs-5.15.tar.gz

Nothing magic here, just a list of configs that have caused me problems
over the years.  These tend to be tweaked a *bit* away from their
namesake.  For instance, "allnoconfig" is typically a "make
allnoconfig", with a few things added back so it has a chance of booting
somewhere.

These, once untarred, are in a directory structure so you can easily:

	make O=/path/to/build/64bit menuconfig

or whatever.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 00/11] Add TDX Guest Support (Initial support)
  2021-10-11 16:48         ` Dave Hansen
@ 2021-10-11 17:04           ` Borislav Petkov
  0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2021-10-11 17:04 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kuppuswamy, Sathyanarayanan, Thomas Gleixner, Ingo Molnar, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Mon, Oct 11, 2021 at 09:48:40AM -0700, Dave Hansen wrote:
> On 10/11/21 9:33 AM, Dave Hansen wrote:
> > FWIW, if you're in search of funky Kconfigs for compiling x86, I've got
> > a big bundle of them. 
> 
> In case anyone's interested, here's a bundle of them:
> 
> 	https://sr71.net/~dave/intel/configs-5.15.tar.gz
> 
> Nothing magic here, just a list of configs that have caused me problems
> over the years.  These tend to be tweaked a *bit* away from their
> namesake.  For instance, "allnoconfig" is typically a "make
> allnoconfig", with a few things added back so it has a chance of booting
> somewhere.
> 
> These, once untarred, are in a directory structure so you can easily:
> 
> 	make O=/path/to/build/64bit menuconfig
> 
> or whatever.

Yap, looks good. Lemme add them to my pile of build-test machinery.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option
  2021-10-09  5:37 ` [PATCH v10 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
@ 2021-10-11 18:19   ` Josh Poimboeuf
  2021-10-11 18:38     ` Andi Kleen
  0 siblings, 1 reply; 58+ messages in thread
From: Josh Poimboeuf @ 2021-10-11 18:19 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08, 2021 at 10:37:38PM -0700, Kuppuswamy Sathyanarayanan wrote:
> +config INTEL_TDX_GUEST
> +	bool "Intel Trusted Domain Extensions (TDX) Guest Support"
> +	depends on X86_64 && CPU_SUP_INTEL && PARAVIRT
> +	depends on SECURITY
> +	depends on X86_X2APIC
> +	help
> +	  Provide support for running in a trusted domain on Intel processors
> +	  equipped with Trusted Domain Extensions. TDX is a Intel technology
> +	  that extends VMX and Memory Encryption with a new kind of virtual
> +	  machine guest called Trust Domain (TD). A TD is designed to run in
> +	  a CPU mode that protects the confidentiality of TD memory contents
> +	  and the TD’s CPU state from other software, including VMM. TDX guest
> +	  uses virtual X2APIC for interrupt management.

Why does it depend on SECURITY?  It should at least be explained in the
commit message.

-- 
Josh


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option
  2021-10-11 18:19   ` Josh Poimboeuf
@ 2021-10-11 18:38     ` Andi Kleen
  2021-10-11 18:47       ` Kuppuswamy, Sathyanarayanan
  0 siblings, 1 reply; 58+ messages in thread
From: Andi Kleen @ 2021-10-11 18:38 UTC (permalink / raw)
  To: Josh Poimboeuf, Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, linux-kernel


On 10/11/2021 11:19 AM, Josh Poimboeuf wrote:
> On Fri, Oct 08, 2021 at 10:37:38PM -0700, Kuppuswamy Sathyanarayanan wrote:
>> +config INTEL_TDX_GUEST
>> +	bool "Intel Trusted Domain Extensions (TDX) Guest Support"
>> +	depends on X86_64 && CPU_SUP_INTEL && PARAVIRT
>> +	depends on SECURITY
>> +	depends on X86_X2APIC
>> +	help
>> +	  Provide support for running in a trusted domain on Intel processors
>> +	  equipped with Trusted Domain Extensions. TDX is a Intel technology
>> +	  that extends VMX and Memory Encryption with a new kind of virtual
>> +	  machine guest called Trust Domain (TD). A TD is designed to run in
>> +	  a CPU mode that protects the confidentiality of TD memory contents
>> +	  and the TD’s CPU state from other software, including VMM. TDX guest
>> +	  uses virtual X2APIC for interrupt management.
> Why does it depend on SECURITY?  It should at least be explained in the
> commit message.

It can be dropped, it was only needed in an earlier version that used a LSM.

-Andi

>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option
  2021-10-11 18:38     ` Andi Kleen
@ 2021-10-11 18:47       ` Kuppuswamy, Sathyanarayanan
  0 siblings, 0 replies; 58+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2021-10-11 18:47 UTC (permalink / raw)
  To: Andi Kleen, Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Kirill Shutemov,
	Sean Christopherson, Kuppuswamy Sathyanarayanan, linux-kernel



On 10/11/21 11:38 AM, Andi Kleen wrote:
>> Why does it depend on SECURITY?  It should at least be explained in the
>> commit message.
> 
> It can be dropped, it was only needed in an earlier version that used a LSM.

I will remove it next version.

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-09  5:37 ` [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
@ 2021-10-13  8:18   ` Borislav Petkov
  2021-10-13 13:32     ` Sathyanarayanan Kuppuswamy
  2021-10-13 19:42     ` Josh Poimboeuf
  2021-10-13 20:44   ` Thomas Gleixner
  1 sibling, 2 replies; 58+ messages in thread
From: Borislav Petkov @ 2021-10-13  8:18 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08, 2021 at 10:37:39PM -0700, Kuppuswamy Sathyanarayanan wrote:
> @@ -500,6 +501,14 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
>  
>  	copy_bootdata(__va(real_mode_data));
>  
> +	/*
> +	 * A future dependency on cmdline parameters is expected (for
> +	 * adding debug options). So the order of calling it should be
> +	 * after copy_bootdata() (in which command line parameter is
> +	 * initialized).
> +	 */

Plain and simple:

        /*
         * Keep this after copy_bootdata() so that TDX cmdline options can take
         * effect.
         */


> +	tdx_early_init();
> +
>  	/*
>  	 * Load microcode early on BSP.
>  	 */
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> new file mode 100644
> index 000000000000..88bf12788684
> --- /dev/null
> +++ b/arch/x86/kernel/tdx.c
> @@ -0,0 +1,38 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2020 Intel Corporation */
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt)     "tdx: " fmt
> +
> +#include <asm/tdx.h>
> +
> +bool is_tdx_guest(void)
> +{
> +	static int tdx_guest = -1;

Put that one at the top of the file because such static variables do not
belong among the automatic function vars.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13  8:18   ` Borislav Petkov
@ 2021-10-13 13:32     ` Sathyanarayanan Kuppuswamy
  2021-10-13 19:42     ` Josh Poimboeuf
  1 sibling, 0 replies; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-13 13:32 UTC (permalink / raw)
  To: Borislav Petkov, Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/13/21 1:18 AM, Borislav Petkov wrote:
> On Fri, Oct 08, 2021 at 10:37:39PM -0700, Kuppuswamy Sathyanarayanan wrote:
>> @@ -500,6 +501,14 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
>>   
>>   	copy_bootdata(__va(real_mode_data));
>>   
>> +	/*
>> +	 * A future dependency on cmdline parameters is expected (for
>> +	 * adding debug options). So the order of calling it should be
>> +	 * after copy_bootdata() (in which command line parameter is
>> +	 * initialized).
>> +	 */
> Plain and simple:
>
>          /*
>           * Keep this after copy_bootdata() so that TDX cmdline options can take
>           * effect.
>           */
ok. I will fix this in next version.
>
>
>> +	tdx_early_init();
>> +
>>   	/*
>>   	 * Load microcode early on BSP.
>>   	 */
>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>> new file mode 100644
>> index 000000000000..88bf12788684
>> --- /dev/null
>> +++ b/arch/x86/kernel/tdx.c
>> @@ -0,0 +1,38 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (C) 2020 Intel Corporation */
>> +
>> +#undef pr_fmt
>> +#define pr_fmt(fmt)     "tdx: " fmt
>> +
>> +#include <asm/tdx.h>
>> +
>> +bool is_tdx_guest(void)
>> +{
>> +	static int tdx_guest = -1;
> Put that one at the top of the file because such static variables do not
> belong among the automatic function vars.
ok. I will fix this in next version.
>
> Thx.
>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 04/11] x86/tdx: Add TDX support to intel_cc_platform_has()
  2021-10-09  5:37 ` [PATCH v10 04/11] x86/tdx: Add TDX support to intel_cc_platform_has() Kuppuswamy Sathyanarayanan
@ 2021-10-13 15:57   ` Borislav Petkov
  2021-10-14  7:12   ` Thomas Gleixner
  1 sibling, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2021-10-13 15:57 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan
  Cc: Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08, 2021 at 10:37:40PM -0700, Kuppuswamy Sathyanarayanan wrote:
> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> index a075b70b9a70..6124527a0423 100644
> --- a/include/linux/cc_platform.h
> +++ b/include/linux/cc_platform.h
> @@ -61,6 +61,15 @@ enum cc_attr {
>  	 * Examples include SEV-ES.
>  	 */
>  	CC_ATTR_GUEST_STATE_ENCRYPT,
> +
> +	/**
> +	 * @CC_ATTR_GUEST_TDX: Trusted Domain Extension Support
> +	 *
> +	 * The platform/OS is running as a TDX guest/virtual machine.
> +	 *
> +	 * Examples include Intel TDX.

It already has "TDX" in the name - no need for "Examples ..."

> +	 */
> +	CC_ATTR_GUEST_TDX,

Yeah, we talked about putting those vendor-specific enums in a block so
that they're all consecutive, so please do

	CC_ATTR_GUEST_TDX	= 0x200

Brijesh already took the 0x100 range for SNP.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13  8:18   ` Borislav Petkov
  2021-10-13 13:32     ` Sathyanarayanan Kuppuswamy
@ 2021-10-13 19:42     ` Josh Poimboeuf
  2021-10-13 23:19       ` Thomas Gleixner
       [not found]       ` <1a6220a5-3abd-dea1-4b2f-2acade311236@linux.intel.com>
  1 sibling, 2 replies; 58+ messages in thread
From: Josh Poimboeuf @ 2021-10-13 19:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Wed, Oct 13, 2021 at 10:18:18AM +0200, Borislav Petkov wrote:
> On Fri, Oct 08, 2021 at 10:37:39PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > @@ -500,6 +501,14 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
> >  
> >  	copy_bootdata(__va(real_mode_data));
> >  
> > +	/*
> > +	 * A future dependency on cmdline parameters is expected (for
> > +	 * adding debug options). So the order of calling it should be
> > +	 * after copy_bootdata() (in which command line parameter is
> > +	 * initialized).
> > +	 */
> 
> Plain and simple:
> 
>         /*
>          * Keep this after copy_bootdata() so that TDX cmdline options can take
>          * effect.
>          */

But there are no actual TDX cmdline options in this patch set, which is
why I originally asked for clarification in the comment.

> > +	tdx_early_init();
> > +
> >  	/*
> >  	 * Load microcode early on BSP.
> >  	 */
> > diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> > new file mode 100644
> > index 000000000000..88bf12788684
> > --- /dev/null
> > +++ b/arch/x86/kernel/tdx.c
> > @@ -0,0 +1,38 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (C) 2020 Intel Corporation */
> > +
> > +#undef pr_fmt
> > +#define pr_fmt(fmt)     "tdx: " fmt
> > +
> > +#include <asm/tdx.h>
> > +
> > +bool is_tdx_guest(void)
> > +{
> > +	static int tdx_guest = -1;
> 
> Put that one at the top of the file because such static variables do not
> belong among the automatic function vars.

I disagree, this prevents confusion and misuse by making it clear that
the scope of the static variable is limited to this function.

-- 
Josh


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-09  5:37 ` [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
  2021-10-13  8:18   ` Borislav Petkov
@ 2021-10-13 20:44   ` Thomas Gleixner
  2021-10-13 21:05     ` Sathyanarayanan Kuppuswamy
  2021-10-13 21:07     ` Borislav Petkov
  1 sibling, 2 replies; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-13 20:44 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
> +
> +bool is_tdx_guest(void)
> +{
> +	static int tdx_guest = -1;
> +	u32 eax, sig[3];
> +
> +	if (tdx_guest >= 0)
> +		goto done;
> +
> +	if (cpuid_eax(0) < TDX_CPUID_LEAF_ID) {
> +		tdx_guest = 0;
> +		goto done;
> +	}
> +
> +	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
> +
> +	tdx_guest = !memcmp("IntelTDX    ", sig, 12);
> +
> +done:
> +	return !!tdx_guest;
> +}

No. This is tasteless garbage, really.

tdx_early_init() is invoked from x86_64_start_kernel() very early in the
boot process __before__ is_tdx_guest() is invoked.

So why on earth is it requried to keep those conditionals and cpuid()
muck around after init?

> +void __init tdx_early_init(void)
> +{
> +	if (!is_tdx_guest())
> +		return;
> +
> +	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
> +
> +	pr_info("Guest initialized\n");
> +}

What's wrong with:

static bool tdx_guest_detected __ro_after_init;

void __init tdx_early_init(void)
{
	u32 eax, sig[3];

	if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
        	return;

        cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2],  &sig[1]);

        if (memcmp("IntelTDX    ", sig, 12))
        	return;

        tdx_guest_detected = true;
	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);

	pr_info("Guest initialized\n");
}

bool is_tdx_guest(void)
{
        return tdx_guest_detected;
}

That's simple and obvious and discards all the detection gunk completely
after init and uses the proper data type, right?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13 20:44   ` Thomas Gleixner
@ 2021-10-13 21:05     ` Sathyanarayanan Kuppuswamy
  2021-10-13 21:35       ` Thomas Gleixner
  2021-10-13 21:07     ` Borislav Petkov
  1 sibling, 1 reply; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-13 21:05 UTC (permalink / raw)
  To: Thomas Gleixner, Kuppuswamy Sathyanarayanan, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, David Hildenbrand,
	Andrea Arcangeli, Josh Poimboeuf, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/13/21 1:44 PM, Thomas Gleixner wrote:
> On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
>> +
>> +bool is_tdx_guest(void)
>> +{
>> +	static int tdx_guest = -1;
>> +	u32 eax, sig[3];
>> +
>> +	if (tdx_guest >= 0)
>> +		goto done;
>> +
>> +	if (cpuid_eax(0) < TDX_CPUID_LEAF_ID) {
>> +		tdx_guest = 0;
>> +		goto done;
>> +	}
>> +
>> +	cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
>> +
>> +	tdx_guest = !memcmp("IntelTDX    ", sig, 12);
>> +
>> +done:
>> +	return !!tdx_guest;
>> +}
> No. This is tasteless garbage, really.
>
> tdx_early_init() is invoked from x86_64_start_kernel() very early in the
> boot process __before__ is_tdx_guest() is invoked.
>
> So why on earth is it requried to keep those conditionals and cpuid()
> muck around after init?

is_tdx_guest() is also used by cc_platform_has() API. Please check
the following patch [1]. cc_platform_has() will be called much earlier
than x86_64_start_kernel() (like __startup_64() [2]).

[1] - 
https://lore.kernel.org/lkml/20211009053747.1694419-5-sathyanarayanan.kuppuswamy@linux.intel.com/
[2] - 
https://lore.kernel.org/lkml/46a18427dc4e9dda985b10e472965e3e4c769f1d.1631141919.git.thomas.lendacky@amd.com/

>
>> +void __init tdx_early_init(void)
>> +{
>> +	if (!is_tdx_guest())
>> +		return;
>> +
>> +	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
>> +
>> +	pr_info("Guest initialized\n");
>> +}
> What's wrong with:
>
> static bool tdx_guest_detected __ro_after_init;
>
> void __init tdx_early_init(void)
> {
> 	u32 eax, sig[3];
>
> 	if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
>          	return;
>
>          cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2],  &sig[1]);
>
>          if (memcmp("IntelTDX    ", sig, 12))
>          	return;
>
>          tdx_guest_detected = true;
> 	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
>
> 	pr_info("Guest initialized\n");
> }
>
> bool is_tdx_guest(void)
> {
>          return tdx_guest_detected;
> }
>
> That's simple and obvious and discards all the detection gunk completely
> after init and uses the proper data type, right?
>
> Thanks,
>
>          tglx

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13 20:44   ` Thomas Gleixner
  2021-10-13 21:05     ` Sathyanarayanan Kuppuswamy
@ 2021-10-13 21:07     ` Borislav Petkov
  2021-10-13 21:25       ` Thomas Gleixner
  1 sibling, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2021-10-13 21:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kuppuswamy Sathyanarayanan, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Wed, Oct 13, 2021 at 10:44:37PM +0200, Thomas Gleixner wrote:
> No. This is tasteless garbage, really.
> 
> tdx_early_init() is invoked from x86_64_start_kernel() very early in the
> boot process __before__ is_tdx_guest() is invoked.
> 
> So why on earth is it requried to keep those conditionals and cpuid()
> muck around after init?

Yah, reportedly, they wanna parse cmdline options so it has to be after
copy_bootdata() but copy_bootdata() has a cc_platform_has() call which
ends up in is_tdx_guest() on Intel and there you have the catch 22
because CPUID hasn't happened yet in tdx_early_init().

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13 21:07     ` Borislav Petkov
@ 2021-10-13 21:25       ` Thomas Gleixner
  2021-10-13 21:37         ` Borislav Petkov
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-13 21:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Wed, Oct 13 2021 at 23:07, Borislav Petkov wrote:
> On Wed, Oct 13, 2021 at 10:44:37PM +0200, Thomas Gleixner wrote:
>> No. This is tasteless garbage, really.
>> 
>> tdx_early_init() is invoked from x86_64_start_kernel() very early in the
>> boot process __before__ is_tdx_guest() is invoked.
>> 
>> So why on earth is it requried to keep those conditionals and cpuid()
>> muck around after init?
>
> Yah, reportedly, they wanna parse cmdline options so it has to be after
> copy_bootdata() but copy_bootdata() has a cc_platform_has() call which
> ends up in is_tdx_guest() on Intel and there you have the catch 22
> because CPUID hasn't happened yet in tdx_early_init().

Seriously?

So this ends up in doing:

   use();
   init();

Can you spot what's wrong with that?

That's a clear violation of common sense and is simply not going to
happen. Why? If you think about deep defensive programming then use()
will look like this:

use()
{
        assert(initialized);
}

which is not something made up. It's a fundamental principle of
programming and some languages enforce that for very good reasons.

Just because it can be done in C is no justification.

What's wrong with:

x86_64_start_kernel()

    tdx_early_init();

    copy_bootdata();
    
    tdx_late_init();

Absolutely nothing. It's clear, simple and well defined.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13 21:05     ` Sathyanarayanan Kuppuswamy
@ 2021-10-13 21:35       ` Thomas Gleixner
  0 siblings, 0 replies; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-13 21:35 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy, Kuppuswamy Sathyanarayanan,
	Ingo Molnar, Borislav Petkov, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

Sathyanarayanan,

On Wed, Oct 13 2021 at 14:05, Sathyanarayanan Kuppuswamy wrote:
> On 10/13/21 1:44 PM, Thomas Gleixner wrote:
>> tdx_early_init() is invoked from x86_64_start_kernel() very early in the
>> boot process __before__ is_tdx_guest() is invoked.
>>
>> So why on earth is it requried to keep those conditionals and cpuid()
>> muck around after init?
>
> is_tdx_guest() is also used by cc_platform_has() API. Please check
> the following patch [1]. cc_platform_has() will be called much earlier
> than x86_64_start_kernel() (like __startup_64() [2]).

That's just fundamentally wrong as I pointed out to Borislav already.

AMD has done it the right way.

__startup_64()
  sme_enable()

  x86_64_start_kernel()
    sme_early_init()
    copy_bootdata()
    ...

It's not that hard and we are not going to make this fundamentally
different just because Intel.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13 21:25       ` Thomas Gleixner
@ 2021-10-13 21:37         ` Borislav Petkov
  2021-10-13 22:28           ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2021-10-13 21:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kuppuswamy Sathyanarayanan, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Wed, Oct 13, 2021 at 11:25:35PM +0200, Thomas Gleixner wrote:
> So this ends up in doing:
> 
>    use();
>    init();
> 
> Can you spot what's wrong with that?
> 
> That's a clear violation of common sense and is simply not going to
> happen. Why? If you think about deep defensive programming then use()
> will look like this:
> 
> use()
> {
>         assert(initialized);
> }
> 
> which is not something made up. It's a fundamental principle of
> programming and some languages enforce that for very good reasons.
> 
> Just because it can be done in C is no justification.

Oh, I heartily agree.
 
> What's wrong with:
> 
> x86_64_start_kernel()
> 
>     tdx_early_init();
> 
>     copy_bootdata();
>     
>     tdx_late_init();
> 
> Absolutely nothing. It's clear, simple and well defined.

I like simple more than anyone, so sure, I'd prefer that a lot more.

And so the options parsing would need to happen early using, say,
cmdline_find_option() or so, like sme_enable() does.

Hmmm.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13 21:37         ` Borislav Petkov
@ 2021-10-13 22:28           ` Sathyanarayanan Kuppuswamy
  2021-10-13 23:02             ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-13 22:28 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Kuppuswamy Sathyanarayanan, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/13/21 2:37 PM, Borislav Petkov wrote:
> On Wed, Oct 13, 2021 at 11:25:35PM +0200, Thomas Gleixner wrote:
>> So this ends up in doing:
>>
>>     use();
>>     init();
>>
>> Can you spot what's wrong with that?
>>
>> That's a clear violation of common sense and is simply not going to
>> happen. Why? If you think about deep defensive programming then use()
>> will look like this:
>>
>> use()
>> {
>>          assert(initialized);
>> }
>>
>> which is not something made up. It's a fundamental principle of
>> programming and some languages enforce that for very good reasons.
>>
>> Just because it can be done in C is no justification.
> Oh, I heartily agree.
>   
>> What's wrong with:
>>
>> x86_64_start_kernel()
>>
>>      tdx_early_init();
>>
>>      copy_bootdata();
>>      
>>      tdx_late_init();
>>
>> Absolutely nothing. It's clear, simple and well defined.
> I like simple more than anyone, so sure, I'd prefer that a lot more.
>
> And so the options parsing would need to happen early using, say,
> cmdline_find_option() or so, like sme_enable() does.

Since in tdx_early_init() all we are going to do is to initialize
"tdx_guest_detected" using cpuid call, shall we name it
tdx_guest_cpuid_init()? (similar to sme_enable call in AMD)

>
> Hmmm.
>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13 22:28           ` Sathyanarayanan Kuppuswamy
@ 2021-10-13 23:02             ` Thomas Gleixner
  2021-10-14 17:28               ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-13 23:02 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy, Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Wed, Oct 13 2021 at 15:28, Sathyanarayanan Kuppuswamy wrote:
> On 10/13/21 2:37 PM, Borislav Petkov wrote:
>> On Wed, Oct 13, 2021 at 11:25:35PM +0200, Thomas Gleixner wrote:
>>> So this ends up in doing:
>>>
>>>     use();
>>>     init();
>>>
>>> Can you spot what's wrong with that?
>>>
>>> That's a clear violation of common sense and is simply not going to
>>> happen. Why? If you think about deep defensive programming then use()
>>> will look like this:
>>>
>>> use()
>>> {
>>>          assert(initialized);
>>> }
>>>
>>> which is not something made up. It's a fundamental principle of
>>> programming and some languages enforce that for very good reasons.
>>>
>>> Just because it can be done in C is no justification.
>> Oh, I heartily agree.
>>   
>>> What's wrong with:
>>>
>>> x86_64_start_kernel()
>>>
>>>      tdx_early_init();
>>>
>>>      copy_bootdata();
>>>      
>>>      tdx_late_init();
>>>
>>> Absolutely nothing. It's clear, simple and well defined.
>> I like simple more than anyone, so sure, I'd prefer that a lot more.
>>
>> And so the options parsing would need to happen early using, say,
>> cmdline_find_option() or so, like sme_enable() does.
>
> Since in tdx_early_init() all we are going to do is to initialize
> "tdx_guest_detected" using cpuid call, shall we name it
> tdx_guest_cpuid_init()? (similar to sme_enable call in AMD)

How is that similar?

Just chose a name which makes sense in the overall scheme. I surely care
about naming convetions, but what I care more about is correctness.

Whether it ends up being named

        tdx_enable() - to match the SME muck

or

        tdx_detect()

or whatever makes sense does not really matter. As long as it makes
sense. That's bikeshed painting realm.

Coming back to your suggestion 'tdx_guest_cpuid_init()'. Just sit back
and think about what that name says:

    tdx_guest_cpuid_init()

For the uniformed reader this says:

    If tdx guest then initialize CPUID

which is obviously not what you want to express, right?

So, naming matters but you are free to chose something which makes
sense.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13 19:42     ` Josh Poimboeuf
@ 2021-10-13 23:19       ` Thomas Gleixner
  2021-10-14  0:25         ` Josh Poimboeuf
       [not found]       ` <1a6220a5-3abd-dea1-4b2f-2acade311236@linux.intel.com>
  1 sibling, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-13 23:19 UTC (permalink / raw)
  To: Josh Poimboeuf, Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov, Sean Christopherson,
	Kuppuswamy Sathyanarayanan, linux-kernel

On Wed, Oct 13 2021 at 12:42, Josh Poimboeuf wrote:
> On Wed, Oct 13, 2021 at 10:18:18AM +0200, Borislav Petkov wrote:
>> > +#include <asm/tdx.h>
>> > +
>> > +bool is_tdx_guest(void)
>> > +{
>> > +	static int tdx_guest = -1;
>> 
>> Put that one at the top of the file because such static variables do not
>> belong among the automatic function vars.
>
> I disagree, this prevents confusion and misuse by making it clear that
> the scope of the static variable is limited to this function.

I kinda agree under the assumption that a static variable is only used
in a particular context and there is a reasonable requirement to do so.

The above does not qualify as I pointed out in my other reply. Just
because it looks like strict local usage at the first glance does not
make an argument, really.

I'm amazed that it's so hard to see that this

    use()
    init()

pattern is broken to begin with.

So why are you arguing about the placement of this variable in the first
place instead of actually looking at the code, wondering about the
obscenity and then asking about the call ordering?

In case that I might miss something important here due to my lack of CS
education, please let me know.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13 23:19       ` Thomas Gleixner
@ 2021-10-14  0:25         ` Josh Poimboeuf
  2021-10-14  7:57           ` Borislav Petkov
  0 siblings, 1 reply; 58+ messages in thread
From: Josh Poimboeuf @ 2021-10-14  0:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, Kuppuswamy Sathyanarayanan, Ingo Molnar, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Thu, Oct 14, 2021 at 01:19:23AM +0200, Thomas Gleixner wrote:
> I'm amazed that it's so hard to see that this
> 
>     use()
>     init()
> 
> pattern is broken to begin with.
> 
> So why are you arguing about the placement of this variable in the first
> place instead of actually looking at the code, wondering about the
> obscenity and then asking about the call ordering?
> 
> In case that I might miss something important here due to my lack of CS
> education, please let me know.

I agree that's better.

I'd suggested doing setup_force_cpu_cap(X86_FEATURE_TDX_GUEST) early,
and then just check that instead of needing this new static variable.  I
think Boris said that's not possible because of some ordering reasons
which are eluding me (and I didn't have time to investigate).

-- 
Josh


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 04/11] x86/tdx: Add TDX support to intel_cc_platform_has()
  2021-10-09  5:37 ` [PATCH v10 04/11] x86/tdx: Add TDX support to intel_cc_platform_has() Kuppuswamy Sathyanarayanan
  2021-10-13 15:57   ` Borislav Petkov
@ 2021-10-14  7:12   ` Thomas Gleixner
  2021-10-14 17:31     ` Sathyanarayanan Kuppuswamy
  1 sibling, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-14  7:12 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
>  
> -static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
> +static bool intel_cc_platform_has(enum cc_attr attr)
>  {
>  #ifdef CONFIG_INTEL_TDX_GUEST
> -	return false;
> +	switch (attr) {
> +	case CC_ATTR_GUEST_TDX:
> +		return is_tdx_guest();

This function is only called when is_tdx_guest() is true. So
is_tdx_guest() has to be called again to make sure?

Also the ifdeffery can just go away simply because the compiler will
discard this function when CONFIG_INTEL_TDX_GUEST=n due to:

> +#ifdef CONFIG_INTEL_TDX_GUEST
> +
> +bool is_tdx_guest(void);
> +void __init tdx_early_init(void);
> +
> +#else
> +
> +static inline bool is_tdx_guest(void) { return false; }
> +static inline void tdx_early_init(void) { };
> +
> +#endif /* CONFIG_INTEL_TDX_GUEST */

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions
  2021-10-09  5:37 ` [PATCH v10 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
@ 2021-10-14  7:28   ` Thomas Gleixner
  2021-10-15  0:19     ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-14  7:28 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
>  
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +#include <asm/tdx.h>
> +#endif

Please get rid of the #ifdef and make sure that tdx.h can be included unconditionally.

> +	/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
> +	pop %r12
> +	pop %r13
> +	pop %r14
> +	pop %r15
> +
> +	jmp 2f
> +1:

ASM supports named labels. 

> +       movq $(-EINVAL), %rax
> +2:
> +       FRAME_END
> +
> +       retq
> +SYM_FUNC_END(__tdx_hypercall)


> +/*
> + * Wrapper for standard use of __tdx_hypercall with BUG_ON() check
> + * for TDCALL error.
> + */
> +static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14,
> +				 u64 r15, struct tdx_hypercall_output *out)
> +{
> +	struct tdx_hypercall_output outl;
> +	u64 err;
> +
> +	/* __tdx_hypercall() does not accept NULL output pointer */
> +	if (!out)
> +		out = &outl;
> +
> +	err = __tdx_hypercall(TDX_HYPERCALL_STANDARD, fn, r12, r13, r14,
> +			      r15, out);
> +
> +	/* Non zero return value indicates buggy TDX module, so panic */
> +	BUG_ON(err);

BUG() does not necessarily panic. If you want to panic in then invoke
the function which does that, i.e. panic().

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-14  0:25         ` Josh Poimboeuf
@ 2021-10-14  7:57           ` Borislav Petkov
  0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2021-10-14  7:57 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Kuppuswamy Sathyanarayanan, Ingo Molnar, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Wed, Oct 13, 2021 at 05:25:06PM -0700, Josh Poimboeuf wrote:
> I think Boris said that's not possible because of some ordering
> reasons which are eluding me (and I didn't have time to investigate).

The obvious one being that CPU caps aren't set yet. And then there was
this wild goose chase about fixup_pointer() and whether the compiler
generates rip-relative addressing to global vars in early boot code
which runs out of 1:1 mapping and could be loaded somewhere else...

So I'm not opposed to doing cap flags - that would get rid of the new
flags but it needs to be done carefully because with early boot, the
devil's in the detail.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 06/11] x86/traps: Add #VE support for TDX guest
  2021-10-09  5:37 ` [PATCH v10 06/11] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
@ 2021-10-14  8:30   ` Thomas Gleixner
  2021-10-17  2:45     ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-14  8:30 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
>  
> +/*
> + * Used by #VE exception handler to gather the #VE exception
> + * info from the TDX module. This is software only structure
> + * and not related to TDX module/VMM.
> + */
> +struct ve_info {
> +	u64 exit_reason;
> +	u64 exit_qual;
> +	u64 gla;	/* Guest Linear (virtual) Address */
> +	u64 gpa;	/* Guest Physical (virtual) Address */

Please do not use tail comments and with a tab between type and name
this becomes more readable:

	/* Guest Linear (virtual) Address */
	u64	gla;

        /* Guest Physical (virtual) Address */
	u64	gpa;

Hmm?

  
> +bool tdx_get_ve_info(struct ve_info *ve)
> +{
> +	struct tdx_module_output out;
> +	u64 ret;
> +
> +	if (!ve)
> +		return false;

This should be WARN_ON_ONCE() if at all.

> +	/*
> +	 * NMIs and machine checks are suppressed. Before this point any
> +	 * #VE is fatal. After this point (TDGETVEINFO call), NMIs and
> +	 * additional #VEs are permitted (but it is expected not to
> +	 * happen unless kernel panics).
> +	 */
> +	ret = __tdx_module_call(TDX_GET_VEINFO, 0, 0, 0, 0, &out);
> +	if (ret)
> +		return false;

         if (__tdx...())
         	return false;

> +	ve->exit_reason = out.rcx;
> +	ve->exit_qual   = out.rdx;
> +	ve->gla         = out.r8;
> +	ve->gpa         = out.r9;
> +	ve->instr_len   = out.r10 & UINT_MAX;
> +	ve->instr_info  = out.r10 >> 32;
> +
> +	return true;
> +}
> +
> +bool tdx_handle_virtualization_exception(struct pt_regs *regs,
> +					 struct ve_info *ve)
> +{
> +	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
> +	return false;
> +}
> +
>  void __init tdx_early_init(void)
>  {
>  	if (!is_tdx_guest())
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index a58800973aed..70d76c3a548f 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -61,6 +61,7 @@
>  #include <asm/insn.h>
>  #include <asm/insn-eval.h>
>  #include <asm/vdso.h>
> +#include <asm/tdx.h>
>  
>  #ifdef CONFIG_X86_64
>  #include <asm/x86_init.h>
> @@ -1140,6 +1141,82 @@ DEFINE_IDTENTRY(exc_device_not_available)
>  	}
>  }
>  
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +#define VE_FAULT_STR "VE fault"
> +static void ve_raise_fault(struct pt_regs *regs, long error_code)

Please do not glue the #define and the function definition
together. Newlines exist for a reaon.

> +{
> +	struct task_struct *tsk = current;
> +
> +	if (user_mode(regs)) {
> +		tsk->thread.error_code = error_code;
> +		tsk->thread.trap_nr = X86_TRAP_VE;
> +
> +		/*
> +		 * Not fixing up VDSO exceptions similar to #GP handler
> +		 * because it is expected that VDSO doesn't trigger #VE.

Expected?

> +		 */
> +		show_signal(tsk, SIGSEGV, "", VE_FAULT_STR, regs, error_code);
> +		force_sig(SIGSEGV);
> +		return;
> +	}
> +
> +	/*
> +	 * Attempt to recover from #VE exception failure without
> +	 * triggering OOPS (useful for MSR read/write failures)
> +	 */
> +	if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
> +		return;
> +
> +	tsk->thread.error_code = error_code;
> +	tsk->thread.trap_nr = X86_TRAP_VE;
> +
> +	/*
> +	 * To be potentially processing a kprobe fault and to trust the result
> +	 * from kprobe_running(), it should be non-preemptible.
> +	 */
> +	if (!preemptible() &&
> +	    kprobe_running() &&

	if (!preemptible() && kprobe_running() &&
> +	    kprobe_fault_handler(regs, X86_TRAP_VE))

perhaps? 

> +
> +DEFINE_IDTENTRY(exc_virtualization_exception)
> +{
> +	struct ve_info ve;
> +	bool ret;
> +
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");

Please remove that. The idtentry code is already taking care of that.

> +	/*
> +	 * NMIs/Machine-checks/Interrupts will be in a disabled state
> +	 * till TDGETVEINFO TDCALL is executed. This prevents #VE
> +	 * nesting issue.

s/This prevents.../This ensures that VE info cannot be overwritten by a
nested #VE/

Or something like that perhaps?

Also a some comment about #VE in general above the DEFINE_IDTENTRY()
would be appreciated.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 07/11] x86/tdx: Add HLT support for TDX guest
  2021-10-09  5:37 ` [PATCH v10 07/11] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
@ 2021-10-14  9:30   ` Thomas Gleixner
  2021-10-15  1:33     ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-14  9:30 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
> +/* HLT TDVMCALL sub-function ID */
> +#define EXIT_REASON_HLT			12

arch/x86/include/uapi/asm/vmx.h:#define EXIT_REASON_HLT                 12

Is there a _good_ reason why this can't be reused?

>  /*
>   * __tdx_module_call()  - Helper function used by TDX guests to request
>   * services from the TDX module (does not include VMM services).
> @@ -235,6 +238,33 @@ SYM_FUNC_START(__tdx_hypercall)
>  
>  	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>  
> +	/*
> +	 * For the idle loop STI needs to be called directly before
> +	 * the TDCALL that enters idle (EXIT_REASON_HLT case). STI
> +	 * enables interrupts only one instruction later. If there
> +	 * are any instructions between the STI and the TDCALL for
> +	 * HLT then an interrupt could happen in that time, but the
> +	 * code would go back to sleep afterwards, which can cause
> +	 * longer delays.
> +	 *
> +	 * This leads to significant difference in network performance
> +	 * benchmarks. So add a special case for EXIT_REASON_HLT to
> +	 * trigger STI before TDCALL. But this change is not required
> +	 * for all HLT cases. So use R15 register value to identify the
> +	 * case which needs STI. So, if R11 is EXIT_REASON_HLT and R15
> +	 * is 1, then call STI before TDCALL instruction. Note that R15
> +	 * register is not required by TDCALL ABI when triggering the
> +	 * hypercall for EXIT_REASON_HLT case. So use it in software to
> +	 * select the STI case.
> +	 */
> +	cmpl $EXIT_REASON_HLT, %r11d
> +	jne skip_sti
> +	cmpl $1, %r15d

You already have a nice define for EXIT_REASON_HLT. Please add one for this
constant as well.

> +	jne skip_sti
> +	/* Set R15 register to 0, it is unused in EXIT_REASON_HLT case */
> +	xor %r15, %r15
> +	sti
> +skip_sti:
>  	tdcall

>  bool tdx_get_ve_info(struct ve_info *ve)
>  {
>  	struct tdx_module_output out;
> @@ -84,8 +141,19 @@ bool tdx_get_ve_info(struct ve_info *ve)
>  bool tdx_handle_virtualization_exception(struct pt_regs *regs,
>  					 struct ve_info *ve)
>  {
> -	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
> -	return false;
> +	switch (ve->exit_reason) {
> +	case EXIT_REASON_HLT:
> +		tdx_halt();
> +		break;
> +	default:
> +		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
> +		return false;
> +	}
> +
> +	/* After successful #VE handling, move the IP */
> +	regs->ip += ve->instr_len;
> +
> +	return true;
>  }
>  
>  void __init tdx_early_init(void)
> @@ -95,5 +163,8 @@ void __init tdx_early_init(void)
>  
>  	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
>  
> +	pv_ops.irq.safe_halt = tdx_safe_halt;
> +	pv_ops.irq.halt = tdx_halt;

Colour me confused, but why do we end up in #VE with EXIT_REASON_HLT
when halt/safe_halt is paravirtualized?

There may be a valid reason. If so then this lacks a comment.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 08/11] x86/tdx: Wire up KVM hypercalls
  2021-10-09  5:37 ` [PATCH v10 08/11] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
@ 2021-10-14 10:21   ` Thomas Gleixner
  2021-10-15  3:03     ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-14 10:21 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>  
>  #ifdef CONFIG_KVM_GUEST
> @@ -32,6 +34,10 @@ static inline bool kvm_check_and_clear_guest_paused(void)
>  static inline long kvm_hypercall0(unsigned int nr)
>  {
>  	long ret;
> +
> +	if (cc_platform_has(CC_ATTR_GUEST_TDX))
> +		return tdx_kvm_hypercall(nr, 0, 0, 0, 0);

So if TDX is not enabled in Kconfig this cannot be optimized out unless
CC_PLATFORM is disabled as well. But what's worse is that every
hypercall needs to call into cc_platform_has().

None of the hypercalls is used before the early TDX detection. So we can
simply use

       if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))

here, right? Then you add X86_FEATURE_TDX_GUEST to the disabled feature
bits correctly and all of the above is solved.

Hmm?
 
> +#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST)
> +static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
> +				     unsigned long p2, unsigned long p3,
> +				     unsigned long p4)
> +{
> +	struct tdx_hypercall_output out;
> +	u64 err;
> +
> +	err = __tdx_hypercall(TDX_HYPERCALL_VENDOR_KVM, nr, p1, p2,
> +			      p3, p4, &out);
> +
> +	/*
> +	 * Non zero return value means buggy TDX module (which is fatal).
> +	 * So use BUG_ON() to panic.
> +	 */
> +	BUG_ON(err);
> +
> +	return out.r10;
> +}

Can we make that a proper exported function (instead of
tdx_kvm_hypercall) so we don't end up with the very same code inlined
all over the place?

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 10/11] x86/tdx: Don't write CSTAR MSR on Intel
  2021-10-09  5:37 ` [PATCH v10 10/11] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
@ 2021-10-14 10:47   ` Thomas Gleixner
  2021-10-14 13:47     ` Andi Kleen
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-14 10:47 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> On Intel CPUs writing the CSTAR MSR is not really needed. Syscalls
> from 32bit work using SYSENTER and 32bit SYSCALL is an illegal opcode.
> But the kernel did write it anyways even though it was ignored by
> the CPU. Inside a TDX guest this actually leads to a #VE which in
> turn will trigger ve_raise_fault() due to failed MSR write. Inside
> ve_raise_fault() before it recovers from this error, it prints an
> ugly message at boot. Since such warning message is pointless for
> CSTAR MSR write failure, add exception to skip CSTAR msr write on
> Intel CPUs.

Ugly messages are a technical reason? The above is word salad.

   Intel CPUs do not support SYSCALL in 32-bit mode, but the kernel
   initializes MSR_CSTAR unconditionally. That MSR write is normaly
   ignored by the CPU, but in a TDX guest it raises a #VE trap.

   Exlude Intel CPUs from the MSR_CSTAR initialization.

>  #ifdef CONFIG_IA32_EMULATION
> -	wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
> +	/*
> +	 * CSTAR is not needed on Intel because it doesn't support
> +	 * 32bit SYSCALL, but only SYSENTER. On a TDX guest
> +	 * it leads to a #GP.

Sigh. Above you write it raises #VE, but now it's #GP !?!

        Intel CPUs do not support 32-bit SYSCALL. Writing to MSR_CSTAR
        is normaly ignored by the CPU, but raises a #VE trap in a TDX
        guest.

Hmm?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 11/11] x86/tdx: Handle CPUID via #VE
  2021-10-09  5:37 ` [PATCH v10 11/11] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
@ 2021-10-14 12:01   ` Thomas Gleixner
  2021-10-14 13:25     ` Dave Hansen
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-14 12:01 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> When running virtualized, the CPUID instruction is handled differently
> based on the leaf being accessed.  The behavior depends only on the
> leaf and applies equally to both kernel/ring-0 and userspace/ring-3
> execution of CPUID. Historically, there are two basic classes:
>
>  * Leaves handled transparently to the guest
>  * Leaves handled by the VMM
>
> In a typical guest without TDX, "handled by the VMM" leaves cause a
> VMEXIT.  TDX replaces these VMEXITs with a #VE exception in the guest.
> The guest typically handles the #VE by making a hypercall to the VMM.
>
> The TDX module specification [1], section titled "CPUID Virtualization"
> talks about a few more classes of CPUID handling. But, for the purposes
> of this patch, the "handled transparently" CPUID leaves are all lumped
> together because the guest handling is the same.

What means 'for the purposes of this patch'? And I have no idea what's
lumped together means either.

#VE is either raised on CPUID leaf/sub-leaf combinations which are not
part of the CPUID virtualization table or on request of the guest for
all CPUID invocations (either Ring0 or Ring3 or both).

So this patch implements the #VE handling for EXIT_REASON_CPUID by
handing it through to the hypercall, which in turn lets the TDX module
handle it by invoking the host VMM.

So unless the guest requested #VE on all CPUID invocations it won't see
a #VE for the transparent leaf/sub-leaf combinations. #VE is raised
for the VMM handled ones which goes through the hypercall, right?

I must be missing something, but that last paragraph does not make any
sense to me.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 11/11] x86/tdx: Handle CPUID via #VE
  2021-10-14 12:01   ` Thomas Gleixner
@ 2021-10-14 13:25     ` Dave Hansen
  0 siblings, 0 replies; 58+ messages in thread
From: Dave Hansen @ 2021-10-14 13:25 UTC (permalink / raw)
  To: Thomas Gleixner, Kuppuswamy Sathyanarayanan, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, David Hildenbrand,
	Andrea Arcangeli, Josh Poimboeuf, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel
  Cc: Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On 10/14/21 5:01 AM, Thomas Gleixner wrote:
>> The TDX module specification [1], section titled "CPUID Virtualization"
>> talks about a few more classes of CPUID handling. But, for the purposes
>> of this patch, the "handled transparently" CPUID leaves are all lumped
>> together because the guest handling is the same.
> What means 'for the purposes of this patch'? And I have no idea what's
> lumped together means either.

The TDX spec talks about a several classes of CPUID bit fields.  These
matter in terms of trust because some of the bits can be supplied by the
VMM.  The original changelog here tried to draw a distinction between
bits which can be supplied by the VMM ("as configured" in the spec) and
those that come directly from the hardware ("native" in the spec).

"Lumped together" means that although these cases are differentiated in
the spec, their differences are immaterial for the purposes of this patch.

I think I wrote this.  I think I was trying to write it at someone who
might be puzzling over the TDX spec and its table of CPUID leaves
wondering why the #VE handler doesn't have cases for lots of the classes
called out in the spec.

> #VE is either raised on CPUID leaf/sub-leaf combinations which are not
> part of the CPUID virtualization table or on request of the guest for
> all CPUID invocations (either Ring0 or Ring3 or both).

Yes.  This changelog should also include a note about the unconditional
#VE for all CPUID executions feature, noting that it is expected to be
disabled for ring0.

> So this patch implements the #VE handling for EXIT_REASON_CPUID by
> handing it through to the hypercall, which in turn lets the TDX module
> handle it by invoking the host VMM.
> 
> So unless the guest requested #VE on all CPUID invocations it won't see
> a #VE for the transparent leaf/sub-leaf combinations. #VE is raised
> for the VMM handled ones which goes through the hypercall, right?

Yep, that's a pretty succinct way to put it.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 10/11] x86/tdx: Don't write CSTAR MSR on Intel
  2021-10-14 10:47   ` Thomas Gleixner
@ 2021-10-14 13:47     ` Andi Kleen
  2021-10-14 14:27       ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: Andi Kleen @ 2021-10-14 13:47 UTC (permalink / raw)
  To: Thomas Gleixner, Kuppuswamy Sathyanarayanan, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, David Hildenbrand,
	Andrea Arcangeli, Josh Poimboeuf, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


>> -	wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
>> +	/*
>> +	 * CSTAR is not needed on Intel because it doesn't support
>> +	 * 32bit SYSCALL, but only SYSENTER. On a TDX guest
>> +	 * it leads to a #GP.
> Sigh. Above you write it raises #VE, but now it's #GP !?!


The unhandled #VE trap is handled like a #GP, which is then caught by 
the kernel wrmsr code.

So both are correct.

>
>          Intel CPUs do not support 32-bit SYSCALL. Writing to MSR_CSTAR
>          is normaly ignored by the CPU, but raises a #VE trap in a TDX
>          guest.
>
> Hmm?


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 10/11] x86/tdx: Don't write CSTAR MSR on Intel
  2021-10-14 13:47     ` Andi Kleen
@ 2021-10-14 14:27       ` Thomas Gleixner
  0 siblings, 0 replies; 58+ messages in thread
From: Thomas Gleixner @ 2021-10-14 14:27 UTC (permalink / raw)
  To: Andi Kleen, Kuppuswamy Sathyanarayanan, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, David Hildenbrand,
	Andrea Arcangeli, Josh Poimboeuf, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On Thu, Oct 14 2021 at 06:47, Andi Kleen wrote:

>>> -	wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat);
>>> +	/*
>>> +	 * CSTAR is not needed on Intel because it doesn't support
>>> +	 * 32bit SYSCALL, but only SYSENTER. On a TDX guest
>>> +	 * it leads to a #GP.
>> Sigh. Above you write it raises #VE, but now it's #GP !?!
>
>
> The unhandled #VE trap is handled like a #GP, which is then caught by 
> the kernel wrmsr code.

That's completely irrelevant because that's an implementation detail of
the #VE handler. It raises #VE in the first place and that's unwanted no
matter what the #VE handler does with it. It could just pretent that
it's fine and move on.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-13 23:02             ` Thomas Gleixner
@ 2021-10-14 17:28               ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-14 17:28 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf,
	Juergen Gross, Deep Shah, VMware Inc, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Peter H Anvin,
	Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/13/21 4:02 PM, Thomas Gleixner wrote:
> On Wed, Oct 13 2021 at 15:28, Sathyanarayanan Kuppuswamy wrote:
>> On 10/13/21 2:37 PM, Borislav Petkov wrote:
>>> On Wed, Oct 13, 2021 at 11:25:35PM +0200, Thomas Gleixner wrote:
>>>> So this ends up in doing:
>>>>
>>>>      use();
>>>>      init();
>>>>
>>>> Can you spot what's wrong with that?
>>>>
>>>> That's a clear violation of common sense and is simply not going to
>>>> happen. Why? If you think about deep defensive programming then use()
>>>> will look like this:
>>>>
>>>> use()
>>>> {
>>>>           assert(initialized);
>>>> }
>>>>
>>>> which is not something made up. It's a fundamental principle of
>>>> programming and some languages enforce that for very good reasons.
>>>>
>>>> Just because it can be done in C is no justification.
>>> Oh, I heartily agree.
>>>    
>>>> What's wrong with:
>>>>
>>>> x86_64_start_kernel()
>>>>
>>>>       tdx_early_init();
>>>>
>>>>       copy_bootdata();
>>>>       
>>>>       tdx_late_init();
>>>>
>>>> Absolutely nothing. It's clear, simple and well defined.
>>> I like simple more than anyone, so sure, I'd prefer that a lot more.
>>>
>>> And so the options parsing would need to happen early using, say,
>>> cmdline_find_option() or so, like sme_enable() does.
>> Since in tdx_early_init() all we are going to do is to initialize
>> "tdx_guest_detected" using cpuid call, shall we name it
>> tdx_guest_cpuid_init()? (similar to sme_enable call in AMD)
> How is that similar?
>
> Just chose a name which makes sense in the overall scheme. I surely care
> about naming convetions, but what I care more about is correctness.
>
> Whether it ends up being named
>
>          tdx_enable() - to match the SME muck
>
> or
>
>          tdx_detect()
>
> or whatever makes sense does not really matter. As long as it makes
> sense. That's bikeshed painting realm.
>
> Coming back to your suggestion 'tdx_guest_cpuid_init()'. Just sit back
> and think about what that name says:
>
>      tdx_guest_cpuid_init()
>
> For the uniformed reader this says:
>
>      If tdx guest then initialize CPUID
>
> which is obviously not what you want to express, right?
>
> So, naming matters but you are free to chose something which makes
> sense.

Makes sense. I agree tdx_guest_cpuid_init() name is bit confusing.
I will use tdx_detect as you have mentioned.

>
> Thanks,
>
>          tglx

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 04/11] x86/tdx: Add TDX support to intel_cc_platform_has()
  2021-10-14  7:12   ` Thomas Gleixner
@ 2021-10-14 17:31     ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-14 17:31 UTC (permalink / raw)
  To: Thomas Gleixner, Kuppuswamy Sathyanarayanan, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, David Hildenbrand,
	Andrea Arcangeli, Josh Poimboeuf, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/14/21 12:12 AM, Thomas Gleixner wrote:
> On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
>>   
>> -static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
>> +static bool intel_cc_platform_has(enum cc_attr attr)
>>   {
>>   #ifdef CONFIG_INTEL_TDX_GUEST
>> -	return false;
>> +	switch (attr) {
>> +	case CC_ATTR_GUEST_TDX:
>> +		return is_tdx_guest();
> This function is only called when is_tdx_guest() is true. So
> is_tdx_guest() has to be called again to make sure?


Agree. In future, if intel_cc_platform_has() is used in non-tdx case then
we can add is_tdx_guest() back there. For now it is redundant
as you have mentioned. I will remove it in next version.

>
> Also the ifdeffery can just go away simply because the compiler will
> discard this function when CONFIG_INTEL_TDX_GUEST=n due to:
Agree. I will remove it.
>
>> +#ifdef CONFIG_INTEL_TDX_GUEST
>> +
>> +bool is_tdx_guest(void);
>> +void __init tdx_early_init(void);
>> +
>> +#else
>> +
>> +static inline bool is_tdx_guest(void) { return false; }
>> +static inline void tdx_early_init(void) { };
>> +
>> +#endif /* CONFIG_INTEL_TDX_GUEST */
> Thanks,
>
>          tglx

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions
  2021-10-14  7:28   ` Thomas Gleixner
@ 2021-10-15  0:19     ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-15  0:19 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/14/21 12:28 AM, Thomas Gleixner wrote:
> On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
>>   
>> +#ifdef CONFIG_INTEL_TDX_GUEST
>> +#include <asm/tdx.h>
>> +#endif
> Please get rid of the #ifdef and make sure that tdx.h can be included unconditionally.


It can be included unconditionally. I will remove it in next version.

>
>> +	/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
>> +	pop %r12
>> +	pop %r13
>> +	pop %r14
>> +	pop %r15
>> +
>> +	jmp 2f
>> +1:
> ASM supports named labels.

I will use a meaningful label instead of 1 or 2. I will fix this in next 
version.

>
>> +       movq $(-EINVAL), %rax
>> +2:
>> +       FRAME_END
>> +
>> +       retq
>> +SYM_FUNC_END(__tdx_hypercall)
>
>> +/*
>> + * Wrapper for standard use of __tdx_hypercall with BUG_ON() check
>> + * for TDCALL error.
>> + */
>> +static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14,
>> +				 u64 r15, struct tdx_hypercall_output *out)
>> +{
>> +	struct tdx_hypercall_output outl;
>> +	u64 err;
>> +
>> +	/* __tdx_hypercall() does not accept NULL output pointer */
>> +	if (!out)
>> +		out = &outl;
>> +
>> +	err = __tdx_hypercall(TDX_HYPERCALL_STANDARD, fn, r12, r13, r14,
>> +			      r15, out);
>> +
>> +	/* Non zero return value indicates buggy TDX module, so panic */
>> +	BUG_ON(err);
> BUG() does not necessarily panic. If you want to panic in then invoke
> the function which does that, i.e. panic().


Yes, we want to panic here. I will use panic() in next version.


>
> Thanks,
>
>          tglx

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 07/11] x86/tdx: Add HLT support for TDX guest
  2021-10-14  9:30   ` Thomas Gleixner
@ 2021-10-15  1:33     ` Sathyanarayanan Kuppuswamy
  2021-10-15 15:03       ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-15  1:33 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/14/21 2:30 AM, Thomas Gleixner wrote:
> On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
>> +/* HLT TDVMCALL sub-function ID */
>> +#define EXIT_REASON_HLT			12
> arch/x86/include/uapi/asm/vmx.h:#define EXIT_REASON_HLT                 12
>
> Is there a _good_ reason why this can't be reused?

As per current use case we can re-use it. Out of all TDX hypercall sub 
function
IDs, only Instruction.PCONFIG (65) exit reason id is missing in vmx.h. 
But currently
we are not handling it. So we can ignore it for now.

I will fix this in next version.

>>   /*
>>    * __tdx_module_call()  - Helper function used by TDX guests to request
>>    * services from the TDX module (does not include VMM services).
>> @@ -235,6 +238,33 @@ SYM_FUNC_START(__tdx_hypercall)
>>   
>>   	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>>   
>> +	/*
>> +	 * For the idle loop STI needs to be called directly before
>> +	 * the TDCALL that enters idle (EXIT_REASON_HLT case). STI
>> +	 * enables interrupts only one instruction later. If there
>> +	 * are any instructions between the STI and the TDCALL for
>> +	 * HLT then an interrupt could happen in that time, but the
>> +	 * code would go back to sleep afterwards, which can cause
>> +	 * longer delays.
>> +	 *
>> +	 * This leads to significant difference in network performance
>> +	 * benchmarks. So add a special case for EXIT_REASON_HLT to
>> +	 * trigger STI before TDCALL. But this change is not required
>> +	 * for all HLT cases. So use R15 register value to identify the
>> +	 * case which needs STI. So, if R11 is EXIT_REASON_HLT and R15
>> +	 * is 1, then call STI before TDCALL instruction. Note that R15
>> +	 * register is not required by TDCALL ABI when triggering the
>> +	 * hypercall for EXIT_REASON_HLT case. So use it in software to
>> +	 * select the STI case.
>> +	 */
>> +	cmpl $EXIT_REASON_HLT, %r11d
>> +	jne skip_sti
>> +	cmpl $1, %r15d
> You already have a nice define for EXIT_REASON_HLT. Please add one for this
> constant as well.

Ok. I will add it.

>> +	jne skip_sti
>> +	/* Set R15 register to 0, it is unused in EXIT_REASON_HLT case */
>> +	xor %r15, %r15
>> +	sti
>> +skip_sti:
>>   	tdcall
>>   bool tdx_get_ve_info(struct ve_info *ve)
>>   {
>>   	struct tdx_module_output out;
>> @@ -84,8 +141,19 @@ bool tdx_get_ve_info(struct ve_info *ve)
>>   bool tdx_handle_virtualization_exception(struct pt_regs *regs,
>>   					 struct ve_info *ve)
>>   {
>> -	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
>> -	return false;
>> +	switch (ve->exit_reason) {
>> +	case EXIT_REASON_HLT:
>> +		tdx_halt();
>> +		break;
>> +	default:
>> +		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
>> +		return false;
>> +	}
>> +
>> +	/* After successful #VE handling, move the IP */
>> +	regs->ip += ve->instr_len;
>> +
>> +	return true;
>>   }
>>   
>>   void __init tdx_early_init(void)
>> @@ -95,5 +163,8 @@ void __init tdx_early_init(void)
>>   
>>   	setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
>>   
>> +	pv_ops.irq.safe_halt = tdx_safe_halt;
>> +	pv_ops.irq.halt = tdx_halt;
> Colour me confused, but why do we end up in #VE with EXIT_REASON_HLT
> when halt/safe_halt is paravirtualized?
>
> There may be a valid reason. If so then this lacks a comment.

No, with halt/safe_halt para-virtualized, we should never get
#VE for it. I think this is a redundant handler code. This was
added before we have added the pv_ops support. I will remove
it in next version.

>
> Thanks,
>
>          tglx

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 08/11] x86/tdx: Wire up KVM hypercalls
  2021-10-14 10:21   ` Thomas Gleixner
@ 2021-10-15  3:03     ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-15  3:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/14/21 3:21 AM, Thomas Gleixner wrote:
> On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>   
>>   #ifdef CONFIG_KVM_GUEST
>> @@ -32,6 +34,10 @@ static inline bool kvm_check_and_clear_guest_paused(void)
>>   static inline long kvm_hypercall0(unsigned int nr)
>>   {
>>   	long ret;
>> +
>> +	if (cc_platform_has(CC_ATTR_GUEST_TDX))
>> +		return tdx_kvm_hypercall(nr, 0, 0, 0, 0);
> So if TDX is not enabled in Kconfig this cannot be optimized out unless
> CC_PLATFORM is disabled as well. But what's worse is that every
> hypercall needs to call into cc_platform_has().
>
> None of the hypercalls is used before the early TDX detection. So we can
> simply use
>
>         if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
>
> here, right? Then you add X86_FEATURE_TDX_GUEST to the disabled feature
> bits correctly and all of the above is solved.
>
> Hmm?


Make sense. Since this will only be used after tdx_early_init() call,
and X86_FEATURE_TDX_GUEST is also set in that call, we can just use
cpu_feature_enabled(X86_FEATURE_TDX_GUEST) as you have mentioned.

I will fix this in next version.

>   
>> +#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST)
>> +static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>> +				     unsigned long p2, unsigned long p3,
>> +				     unsigned long p4)
>> +{
>> +	struct tdx_hypercall_output out;
>> +	u64 err;
>> +
>> +	err = __tdx_hypercall(TDX_HYPERCALL_VENDOR_KVM, nr, p1, p2,
>> +			      p3, p4, &out);
>> +
>> +	/*
>> +	 * Non zero return value means buggy TDX module (which is fatal).
>> +	 * So use BUG_ON() to panic.
>> +	 */
>> +	BUG_ON(err);
>> +
>> +	return out.r10;
>> +}
> Can we make that a proper exported function (instead of
> tdx_kvm_hypercall) so we don't end up with the very same code inlined
> all over the place?


Initially it was an exported function. But we made it inline in tdx.h
to simplify the implementation. But if exported function is preferred,
I will fix it in next version.

>
> Thanks,
>
>          tglx
>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 07/11] x86/tdx: Add HLT support for TDX guest
  2021-10-15  1:33     ` Sathyanarayanan Kuppuswamy
@ 2021-10-15 15:03       ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2021-10-15 15:03 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Kuppuswamy Sathyanarayanan, linux-kernel

On Thu, Oct 14, 2021, Sathyanarayanan Kuppuswamy wrote:
> 
> On 10/14/21 2:30 AM, Thomas Gleixner wrote:
> > On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
> > > +/* HLT TDVMCALL sub-function ID */
> > > +#define EXIT_REASON_HLT			12
> > arch/x86/include/uapi/asm/vmx.h:#define EXIT_REASON_HLT                 12
> > 
> > Is there a _good_ reason why this can't be reused?

So, no.  :-D

> As per current use case we can re-use it. Out of all TDX hypercall sub
> function IDs, only Instruction.PCONFIG (65) exit reason id is missing in
> vmx.h. But currently we are not handling it. So we can ignore it for now.

If the kernel proper needs EXIT_REASON_PCONFIG before KVM, just send a patch...

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
  2021-10-09  5:37 ` [PATCH v10 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
@ 2021-10-15 16:59   ` David Hildenbrand
  0 siblings, 0 replies; 58+ messages in thread
From: David Hildenbrand @ 2021-10-15 16:59 UTC (permalink / raw)
  To: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On 09.10.21 07:37, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> CONFIG_PARAVIRT_XXL is mainly defined/used by XEN PV guests. For
> other VM guest types, features supported under CONFIG_PARAVIRT
> are self sufficient. CONFIG_PARAVIRT mainly provides support for
> TLB flush operations and time related operations.
> 
> For TDX guest as well, paravirt calls under CONFIG_PARVIRT meets
> most of its requirement except the need of HLT and SAFE_HLT
> paravirt calls, which is currently defined under
> COFNIG_PARAVIRT_XXL.
> 
> Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest
> like platforms, move HLT and SAFE_HLT paravirt calls under
> CONFIG_PARAVIRT.
> 
> Moving HLT and SAFE_HLT paravirt calls are not fatal and should not
> break any functionality for current users of CONFIG_PARAVIRT.
> 
> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
> 

Reviewed-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 06/11] x86/traps: Add #VE support for TDX guest
  2021-10-14  8:30   ` Thomas Gleixner
@ 2021-10-17  2:45     ` Sathyanarayanan Kuppuswamy
  2021-10-17  3:18       ` Dave Hansen
  0 siblings, 1 reply; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-17  2:45 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Dave Hansen, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/14/21 1:30 AM, Thomas Gleixner wrote:
> On Fri, Oct 08 2021 at 22:37, Kuppuswamy Sathyanarayanan wrote:
>>   
>> +/*
>> + * Used by #VE exception handler to gather the #VE exception
>> + * info from the TDX module. This is software only structure
>> + * and not related to TDX module/VMM.
>> + */
>> +struct ve_info {
>> +	u64 exit_reason;
>> +	u64 exit_qual;
>> +	u64 gla;	/* Guest Linear (virtual) Address */
>> +	u64 gpa;	/* Guest Physical (virtual) Address */
> Please do not use tail comments and with a tab between type and name
> this becomes more readable:
>
> 	/* Guest Linear (virtual) Address */
> 	u64	gla;
>
>          /* Guest Physical (virtual) Address */
> 	u64	gpa;
>
> Hmm?

Agree. I will fix this in next version. I will make sure to fix similar 
issues
in other TDX patch series as well.

>    
>> +bool tdx_get_ve_info(struct ve_info *ve)
>> +{
>> +	struct tdx_module_output out;
>> +	u64 ret;
>> +
>> +	if (!ve)
>> +		return false;
> This should be WARN_ON_ONCE() if at all.

This is an input validation. Since we need to de-reference "ve" in the 
following code,
we want to validate it to avoid NULL pointer exception. As per current 
usage of this
function, "ve" will not be NULL. But we have added this check as a extra 
precaution
against future use cases.

If you think this check is not required now, I can remove it. Please let 
me know.

>> +	/*
>> +	 * NMIs and machine checks are suppressed. Before this point any
>> +	 * #VE is fatal. After this point (TDGETVEINFO call), NMIs and
>> +	 * additional #VEs are permitted (but it is expected not to
>> +	 * happen unless kernel panics).
>> +	 */
>> +	ret = __tdx_module_call(TDX_GET_VEINFO, 0, 0, 0, 0, &out);
>> +	if (ret)
>> +		return false;
>           if (__tdx...())
>           	return false;

Agree. I will simplify it as you have mentioned in next version.

>> +	ve->exit_reason = out.rcx;
>> +	ve->exit_qual   = out.rdx;
>> +	ve->gla         = out.r8;
>> +	ve->gpa         = out.r9;
>> +	ve->instr_len   = out.r10 & UINT_MAX;
>> +	ve->instr_info  = out.r10 >> 32;
>> +
>> +	return true;
>> +}
>> +
>> +bool tdx_handle_virtualization_exception(struct pt_regs *regs,
>> +					 struct ve_info *ve)
>> +{
>> +	pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
>> +	return false;
>> +}
>> +
>>   void __init tdx_early_init(void)
>>   {
>>   	if (!is_tdx_guest())
>> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>> index a58800973aed..70d76c3a548f 100644
>> --- a/arch/x86/kernel/traps.c
>> +++ b/arch/x86/kernel/traps.c
>> @@ -61,6 +61,7 @@
>>   #include <asm/insn.h>
>>   #include <asm/insn-eval.h>
>>   #include <asm/vdso.h>
>> +#include <asm/tdx.h>
>>   
>>   #ifdef CONFIG_X86_64
>>   #include <asm/x86_init.h>
>> @@ -1140,6 +1141,82 @@ DEFINE_IDTENTRY(exc_device_not_available)
>>   	}
>>   }
>>   
>> +#ifdef CONFIG_INTEL_TDX_GUEST
>> +#define VE_FAULT_STR "VE fault"
>> +static void ve_raise_fault(struct pt_regs *regs, long error_code)
> Please do not glue the #define and the function definition
> together. Newlines exist for a reaon.

Got it. I will fix this issue in next version.

>> +{
>> +	struct task_struct *tsk = current;
>> +
>> +	if (user_mode(regs)) {
>> +		tsk->thread.error_code = error_code;
>> +		tsk->thread.trap_nr = X86_TRAP_VE;
>> +
>> +		/*
>> +		 * Not fixing up VDSO exceptions similar to #GP handler
>> +		 * because it is expected that VDSO doesn't trigger #VE.
> Expected?


It will never happen for TDX. Since this handler is a trimmed out version
of  #GP handler, I want to add some comments about why we don't have
VDSO related handling code. Since this is irrelevant for #VE handler, may
be I remove this comment completely, any comment?

>> +		 */
>> +		show_signal(tsk, SIGSEGV, "", VE_FAULT_STR, regs, error_code);
>> +		force_sig(SIGSEGV);
>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * Attempt to recover from #VE exception failure without
>> +	 * triggering OOPS (useful for MSR read/write failures)
>> +	 */
>> +	if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
>> +		return;
>> +
>> +	tsk->thread.error_code = error_code;
>> +	tsk->thread.trap_nr = X86_TRAP_VE;
>> +
>> +	/*
>> +	 * To be potentially processing a kprobe fault and to trust the result
>> +	 * from kprobe_running(), it should be non-preemptible.
>> +	 */
>> +	if (!preemptible() &&
>> +	    kprobe_running() &&
> 	if (!preemptible() && kprobe_running() &&
>> +	    kprobe_fault_handler(regs, X86_TRAP_VE))
> perhaps?

Agree. I will fix it in next version.

>> +
>> +DEFINE_IDTENTRY(exc_virtualization_exception)
>> +{
>> +	struct ve_info ve;
>> +	bool ret;
>> +
>> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> Please remove that. The idtentry code is already taking care of that.

Ok. I will remove it in next version.

>> +	/*
>> +	 * NMIs/Machine-checks/Interrupts will be in a disabled state
>> +	 * till TDGETVEINFO TDCALL is executed. This prevents #VE
>> +	 * nesting issue.
> s/This prevents.../This ensures that VE info cannot be overwritten by a
> nested #VE/
>
> Or something like that perhaps?


It is a better way to put it. I will use your version.

> Also a some comment about #VE in general above the DEFINE_IDTENTRY()
> would be appreciated.

I will add more info in next version.

> Thanks,
>
>          tglx

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 06/11] x86/traps: Add #VE support for TDX guest
  2021-10-17  2:45     ` Sathyanarayanan Kuppuswamy
@ 2021-10-17  3:18       ` Dave Hansen
  2021-10-17  3:49         ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 58+ messages in thread
From: Dave Hansen @ 2021-10-17  3:18 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Paolo Bonzini, David Hildenbrand,
	Andrea Arcangeli, Josh Poimboeuf, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel
  Cc: Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel

On 10/16/21 7:45 PM, Sathyanarayanan Kuppuswamy wrote:
>>> +bool tdx_get_ve_info(struct ve_info *ve)
>>> +{
>>> +    struct tdx_module_output out;
>>> +    u64 ret;
>>> +
>>> +    if (!ve)
>>> +        return false;
>> This should be WARN_ON_ONCE() if at all.
> 
> This is an input validation. Since we need to de-reference "ve" in
> the following code, we want to validate it to avoid NULL pointer
> exception. As per current usage of this function, "ve" will not be
> NULL. But we have added this check as a extra precaution against
> future use cases.
Input validation, eh?

It's one thing if this argument comes from userspace, or is even open
for modules to call.  You *might* have an argument that it should be
checked in case something in the kernel goes insane.

But, there's a single call site.  It looks like this:

> +DEFINE_IDTENTRY(exc_virtualization_exception)
> +{
> +	struct ve_info ve;
...
> +	ret = tdx_get_ve_info(&ve);

Could you please explain, given the existing kernel code, how !ve could
ever possibly happen?  Or, how tdx_get_ve_info() might conceivably ever
be called from another path which is not extremely well controlled?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 06/11] x86/traps: Add #VE support for TDX guest
  2021-10-17  3:18       ` Dave Hansen
@ 2021-10-17  3:49         ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-17  3:49 UTC (permalink / raw)
  To: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Paolo Bonzini, David Hildenbrand, Andrea Arcangeli,
	Josh Poimboeuf, Juergen Gross, Deep Shah, VMware Inc,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: Peter H Anvin, Tony Luck, Dan Williams, Andi Kleen,
	Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan,
	linux-kernel


On 10/16/21 8:18 PM, Dave Hansen wrote:
> On 10/16/21 7:45 PM, Sathyanarayanan Kuppuswamy wrote:
>>>> +bool tdx_get_ve_info(struct ve_info *ve)
>>>> +{
>>>> +    struct tdx_module_output out;
>>>> +    u64 ret;
>>>> +
>>>> +    if (!ve)
>>>> +        return false;
>>> This should be WARN_ON_ONCE() if at all.
>> This is an input validation. Since we need to de-reference "ve" in
>> the following code, we want to validate it to avoid NULL pointer
>> exception. As per current usage of this function, "ve" will not be
>> NULL. But we have added this check as a extra precaution against
>> future use cases.
> Input validation, eh?
>
> It's one thing if this argument comes from userspace, or is even open
> for modules to call.  You *might* have an argument that it should be
> checked in case something in the kernel goes insane.
>
> But, there's a single call site.  It looks like this:


As per current use cases (exc_virtualization_exception() &
tdx_early_handle_ve()), it will never happen. As I have mentioned,
it was added as a precaution against the future use case or any
misuse of this function in kernel. We did not have this check
initially. But was added later due to review suggestion.

But I am fine with removing it if it is required.

>
>> +DEFINE_IDTENTRY(exc_virtualization_exception)
>> +{
>> +	struct ve_info ve;
> ...
>> +	ret = tdx_get_ve_info(&ve);
> Could you please explain, given the existing kernel code, how !ve could
> ever possibly happen?  Or, how tdx_get_ve_info() might conceivably ever
> be called from another path which is not extremely well controlled?

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
       [not found]       ` <1a6220a5-3abd-dea1-4b2f-2acade311236@linux.intel.com>
@ 2021-10-18 21:59         ` Borislav Petkov
  2021-10-18 22:04           ` Sathyanarayanan Kuppuswamy
  0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2021-10-18 21:59 UTC (permalink / raw)
  To: Sathyanarayanan Kuppuswamy
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov, Sean Christopherson,
	Kuppuswamy Sathyanarayanan, linux-kernel

On Mon, Oct 18, 2021 at 02:05:10PM -0700, Sathyanarayanan Kuppuswamy wrote:
> Any consensus on this?

Well, you can simply not put any comment there now because this patchset
doesn't add those debug options.

The patch which adds them, should add that comment explaining why that
order needs to be the way it is.

In general, I get the impression from review that you add stuff for
future patchsets which only confuses reviewers and instead, you should
simply not do that but do only the required changes, only for the
current aspect of functionality being added.

> I think SME code also talks about future use case in its comment.

No, this has nothing to do with a future use case: you should look at
what that function does and then read that comment again.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature
  2021-10-18 21:59         ` Borislav Petkov
@ 2021-10-18 22:04           ` Sathyanarayanan Kuppuswamy
  0 siblings, 0 replies; 58+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2021-10-18 22:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, x86, Paolo Bonzini,
	David Hildenbrand, Andrea Arcangeli, Juergen Gross, Deep Shah,
	VMware Inc, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter H Anvin, Dave Hansen, Tony Luck,
	Dan Williams, Andi Kleen, Kirill Shutemov, Sean Christopherson,
	Kuppuswamy Sathyanarayanan, linux-kernel


On 10/18/21 2:59 PM, Borislav Petkov wrote:
> On Mon, Oct 18, 2021 at 02:05:10PM -0700, Sathyanarayanan Kuppuswamy wrote:
>> Any consensus on this?
> Well, you can simply not put any comment there now because this patchset
> doesn't add those debug options.
>
> The patch which adds them, should add that comment explaining why that
> order needs to be the way it is.

For now I will remove the comment as you have suggested.

>
> In general, I get the impression from review that you add stuff for
> future patchsets which only confuses reviewers and instead, you should
> simply not do that but do only the required changes, only for the
> current aspect of functionality being added.

I will check the patch list again and re-organize the patch set, If I
find any such changes.

>
>> I think SME code also talks about future use case in its comment.
> No, this has nothing to do with a future use case: you should look at
> what that function does and then read that comment again.

Sorry, I will check the details correctly.

>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2021-10-18 22:04 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-09  5:37 [PATCH v10 00/11] Add TDX Guest Support (Initial support) Kuppuswamy Sathyanarayanan
2021-10-09  5:37 ` [PATCH v10 01/11] x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT Kuppuswamy Sathyanarayanan
2021-10-15 16:59   ` David Hildenbrand
2021-10-09  5:37 ` [PATCH v10 02/11] x86/tdx: Introduce INTEL_TDX_GUEST config option Kuppuswamy Sathyanarayanan
2021-10-11 18:19   ` Josh Poimboeuf
2021-10-11 18:38     ` Andi Kleen
2021-10-11 18:47       ` Kuppuswamy, Sathyanarayanan
2021-10-09  5:37 ` [PATCH v10 03/11] x86/cpufeatures: Add TDX Guest CPU feature Kuppuswamy Sathyanarayanan
2021-10-13  8:18   ` Borislav Petkov
2021-10-13 13:32     ` Sathyanarayanan Kuppuswamy
2021-10-13 19:42     ` Josh Poimboeuf
2021-10-13 23:19       ` Thomas Gleixner
2021-10-14  0:25         ` Josh Poimboeuf
2021-10-14  7:57           ` Borislav Petkov
     [not found]       ` <1a6220a5-3abd-dea1-4b2f-2acade311236@linux.intel.com>
2021-10-18 21:59         ` Borislav Petkov
2021-10-18 22:04           ` Sathyanarayanan Kuppuswamy
2021-10-13 20:44   ` Thomas Gleixner
2021-10-13 21:05     ` Sathyanarayanan Kuppuswamy
2021-10-13 21:35       ` Thomas Gleixner
2021-10-13 21:07     ` Borislav Petkov
2021-10-13 21:25       ` Thomas Gleixner
2021-10-13 21:37         ` Borislav Petkov
2021-10-13 22:28           ` Sathyanarayanan Kuppuswamy
2021-10-13 23:02             ` Thomas Gleixner
2021-10-14 17:28               ` Sathyanarayanan Kuppuswamy
2021-10-09  5:37 ` [PATCH v10 04/11] x86/tdx: Add TDX support to intel_cc_platform_has() Kuppuswamy Sathyanarayanan
2021-10-13 15:57   ` Borislav Petkov
2021-10-14  7:12   ` Thomas Gleixner
2021-10-14 17:31     ` Sathyanarayanan Kuppuswamy
2021-10-09  5:37 ` [PATCH v10 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kuppuswamy Sathyanarayanan
2021-10-14  7:28   ` Thomas Gleixner
2021-10-15  0:19     ` Sathyanarayanan Kuppuswamy
2021-10-09  5:37 ` [PATCH v10 06/11] x86/traps: Add #VE support for TDX guest Kuppuswamy Sathyanarayanan
2021-10-14  8:30   ` Thomas Gleixner
2021-10-17  2:45     ` Sathyanarayanan Kuppuswamy
2021-10-17  3:18       ` Dave Hansen
2021-10-17  3:49         ` Sathyanarayanan Kuppuswamy
2021-10-09  5:37 ` [PATCH v10 07/11] x86/tdx: Add HLT " Kuppuswamy Sathyanarayanan
2021-10-14  9:30   ` Thomas Gleixner
2021-10-15  1:33     ` Sathyanarayanan Kuppuswamy
2021-10-15 15:03       ` Sean Christopherson
2021-10-09  5:37 ` [PATCH v10 08/11] x86/tdx: Wire up KVM hypercalls Kuppuswamy Sathyanarayanan
2021-10-14 10:21   ` Thomas Gleixner
2021-10-15  3:03     ` Sathyanarayanan Kuppuswamy
2021-10-09  5:37 ` [PATCH v10 09/11] x86/tdx: Add MSR support for TDX guest Kuppuswamy Sathyanarayanan
2021-10-09  5:37 ` [PATCH v10 10/11] x86/tdx: Don't write CSTAR MSR on Intel Kuppuswamy Sathyanarayanan
2021-10-14 10:47   ` Thomas Gleixner
2021-10-14 13:47     ` Andi Kleen
2021-10-14 14:27       ` Thomas Gleixner
2021-10-09  5:37 ` [PATCH v10 11/11] x86/tdx: Handle CPUID via #VE Kuppuswamy Sathyanarayanan
2021-10-14 12:01   ` Thomas Gleixner
2021-10-14 13:25     ` Dave Hansen
2021-10-09  7:38 ` [PATCH v10 00/11] Add TDX Guest Support (Initial support) Borislav Petkov
2021-10-09 20:56   ` Kuppuswamy, Sathyanarayanan
2021-10-11 13:03     ` Borislav Petkov
2021-10-11 16:33       ` Dave Hansen
2021-10-11 16:48         ` Dave Hansen
2021-10-11 17:04           ` Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.