linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] TDX host: kexec() support
@ 2024-01-31 11:31 Huang, Kai
  2024-01-31 11:31 ` [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec Huang, Kai
                   ` (4 more replies)
  0 siblings, 5 replies; 42+ messages in thread
From: Huang, Kai @ 2024-01-31 11:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, dave.hansen, kirill.shutemov, tglx, bp, mingo, hpa, luto,
	peterz, thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini

Currently kexec() support and TDX host support are mutually exclusive in
the Kconfig.  This series adds TDX host kexec support so that both can
work together and be enabled at the same time in the Kconfig.

This follows Dave's suggestion [1] to add a CC_ATTR_HOST_MEM_INCOHERENT
attribute to unify Intel and AMD, instead of sprinkling vendor-specific
checks around.

Hi Tom,

I've tested this on my TDX testing machine, but I don't have an AMD
machine to test on.  I would highly appreciate it if you or anyone from
AMD could help review and/or test this series to make sure I didn't
break anything.

Thanks a lot!

[1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/

Kai Huang (4):
  x86/coco: Add a new CC attribute to unify cache flush during kexec
  x86/virt/tdx: Advertise the CC_ATTR_HOST_MEM_INCOHERENT for TDX host
  x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  x86/virt/tdx: Remove the !KEXEC_CORE dependency

 arch/x86/Kconfig                   |   2 +-
 arch/x86/coco/core.c               |  34 +++++++++-
 arch/x86/include/asm/tdx.h         |   2 +
 arch/x86/kernel/machine_kexec_64.c |  18 ++++-
 arch/x86/kernel/process.c          |  14 +---
 arch/x86/mm/mem_encrypt_identity.c |  11 +++-
 arch/x86/virt/vmx/tdx/tdx.c        | 101 +++++++++++++++++++++++++++++
 include/linux/cc_platform.h        |  16 +++++
 8 files changed, 183 insertions(+), 15 deletions(-)


base-commit: a6f0b57202b0ee50dc042bae16494008dc6dc992
-- 
2.34.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-01-31 11:31 [PATCH 0/4] TDX host: kexec() support Huang, Kai
@ 2024-01-31 11:31 ` Huang, Kai
  2024-02-19 16:16   ` Borislav Petkov
  2024-01-31 11:31 ` [PATCH 2/4] x86/virt/tdx: Advertise the CC_ATTR_HOST_MEM_INCOHERENT for TDX host Huang, Kai
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Huang, Kai @ 2024-01-31 11:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, dave.hansen, kirill.shutemov, tglx, bp, mingo, hpa, luto,
	peterz, thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini

From: Kai Huang <kai.huang@intel.com>

Currently on AMD SME platforms, caches are flushed manually during
kexec() before jumping to the new kernel, due to memory encryption.
Intel TDX also needs to flush cachelines of TDX private memory before
jumping to the second kernel; otherwise they may silently corrupt the
new kernel's memory.

Instead of sprinkling AMD- and Intel-specific checks around, introduce
a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both vendors and
simplify the logic:

	Could the old kernel leave incoherent caches around?
	If so, do WBINVD.

Convert AMD SME to use this new CC attribute.  A later patch will
utilize this new attribute for Intel TDX too.

Specifically, AMD SME flushes caches in two places: 1) stop_this_cpu();
2) relocate_kernel().  stop_this_cpu() checks the CPUID directly to
decide whether to do WBINVD, because the current kernel's SME enabling
status may not match the new kernel's choice.  relocate_kernel(),
however, only does the WBINVD when the current kernel has enabled SME,
because the new kernel is always placed in an "unencrypted" area.

To simplify the logic, change AMD SME to always use the approach taken
in stop_this_cpu().  This will cause an additional WBINVD in
relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
when SME was disabled on the kernel command line), but this is
acceptable for the sake of less complicated code (see [1] for the
relevant discussion).

Note that currently the kernel only advertises the CC vendor for AMD
SME when SME is actually enabled by the kernel.  To always advertise
the new CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME
enabling status, change to set the CC vendor as long as the hardware
has SME enabled.

Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
enabled SME" is still different from "checking the CPUID" (the way it
is done in stop_this_cpu()), but technically the former also serves the
purpose and is actually more accurate.

This change allows sme_me_mask to be 0 while the CC vendor reports as
AMD.  But it doesn't impact other CC attributes on AMD platforms, nor
does it impact cc_mkdec()/cc_mkenc().

[1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/coco/core.c               | 13 +++++++++++++
 arch/x86/kernel/machine_kexec_64.c |  2 +-
 arch/x86/kernel/process.c          | 14 +++-----------
 arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
 include/linux/cc_platform.h        | 15 +++++++++++++++
 5 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index eeec9986570e..8d6d727e6e18 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
 	case CC_ATTR_HOST_MEM_ENCRYPT:
 		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
 
+	case CC_ATTR_HOST_MEM_INCOHERENT:
+		/*
+		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
+		 * been enabled on the platform, regardless of whether the
+		 * kernel has actually enabled SME.
+		 */
+		return !(sev_status & MSR_AMD64_SEV_ENABLED);
+
+	/*
+	 * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
+	 * as it must be true when there's any SEV enable bit set in
+	 * sev_status.
+	 */
 	case CC_ATTR_GUEST_MEM_ENCRYPT:
 		return sev_status & MSR_AMD64_SEV_ENABLED;
 
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index bc0a5348b4a6..c9c6974e2e9c 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
 				       (unsigned long)page_list,
 				       image->start,
 				       image->preserve_context,
-				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
+				       cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
 
 #ifdef CONFIG_KEXEC_JUMP
 	if (image->preserve_context)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ab49ade31b0d..2c7e8d9889c0 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
 	mcheck_cpu_clear(c);
 
 	/*
-	 * Use wbinvd on processors that support SME. This provides support
-	 * for performing a successful kexec when going from SME inactive
-	 * to SME active (or vice-versa). The cache must be cleared so that
-	 * if there are entries with the same physical address, both with and
-	 * without the encryption bit, they don't race each other when flushed
-	 * and potentially end up with the wrong entry being committed to
-	 * memory.
-	 *
-	 * Test the CPUID bit directly because the machine might've cleared
-	 * X86_FEATURE_SME due to cmdline options.
+	 * Use wbinvd on processors where the first kernel *could*
+	 * potentially have left incoherent cachelines.
 	 */
-	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
+	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
 		native_wbinvd();
 
 	/*
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 7f72472a34d6..87e4fddab770 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
 		msr = __rdmsr(MSR_AMD64_SYSCFG);
 		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
 			return;
+
+		/*
+		 * Always set the CC vendor when the platform has SME enabled,
+		 * regardless of whether the kernel will actually activate
+		 * SME.  This makes CC_ATTR_HOST_MEM_INCOHERENT report true
+		 * as long as the platform has SME enabled, so that
+		 * stop_this_cpu() can do the necessary WBINVD during kexec().
+		 */
+		cc_vendor = CC_VENDOR_AMD;
 	} else {
 		/* SEV state cannot be controlled by a command line option */
 		sme_me_mask = me_mask;
+		cc_vendor = CC_VENDOR_AMD;
 		goto out;
 	}
 
@@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
 out:
 	if (sme_me_mask) {
 		physical_mask &= ~sme_me_mask;
-		cc_vendor = CC_VENDOR_AMD;
 		cc_set_mask(sme_me_mask);
 	}
 }
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index cb0d6cd1c12f..2f7273596102 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -42,6 +42,21 @@ enum cc_attr {
 	 */
 	CC_ATTR_HOST_MEM_ENCRYPT,
 
+	/**
+	 * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
+	 * incoherent
+	 *
+	 * The platform/OS is running as a bare-metal system or a hypervisor.
+	 * The memory encryption engine might have left non-cache-coherent
+	 * data in the caches that needs to be flushed.
+	 *
+	 * Use this in places where the cache coherency of the memory matters
+	 * but the encryption status does not.
+	 *
+	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
+	 */
+	CC_ATTR_HOST_MEM_INCOHERENT,
+
 	/**
 	 * @CC_ATTR_GUEST_MEM_ENCRYPT: Guest memory encryption is active
 	 *
-- 
2.34.1



* [PATCH 2/4] x86/virt/tdx: Advertise the CC_ATTR_HOST_MEM_INCOHERENT for TDX host
  2024-01-31 11:31 [PATCH 0/4] TDX host: kexec() support Huang, Kai
  2024-01-31 11:31 ` [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec Huang, Kai
@ 2024-01-31 11:31 ` Huang, Kai
  2024-01-31 17:11   ` Dave Hansen
  2024-01-31 11:31 ` [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum Huang, Kai
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Huang, Kai @ 2024-01-31 11:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, dave.hansen, kirill.shutemov, tglx, bp, mingo, hpa, luto,
	peterz, thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini

From: Kai Huang <kai.huang@intel.com>

On the TDX capable platform, during kexec() the old kernel needs to
flush dirty cachelines of all TDX private memory otherwise they may
silently corrupt the new kernel's memory.

Advertise the newly introduced CC_ATTR_HOST_MEM_INCOHERENT attribute
for the TDX host platform so the cache will be flushed during kexec().

Note theoretically cache flush is only needed when TDX module is
initialized, but the module initialization is done at runtime so just
advertise the CC attribute when the platform has TDX enabled.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/Kconfig            |  1 +
 arch/x86/coco/core.c        | 21 ++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx.c |  3 +++
 include/linux/cc_platform.h |  3 ++-
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 502986237cb6..ac3b32149a77 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1975,6 +1975,7 @@ config INTEL_TDX_HOST
 	depends on CONTIG_ALLOC
 	depends on !KEXEC_CORE
 	depends on X86_MCE
+	select ARCH_HAS_CC_PLATFORM
 	help
 	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
 	  host and certain physical attacks.  This option enables necessary TDX
diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index 8d6d727e6e18..ecb15852b69d 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -12,11 +12,12 @@
 
 #include <asm/coco.h>
 #include <asm/processor.h>
+#include <asm/cpufeature.h>
 
 enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
 static u64 cc_mask __ro_after_init;
 
-static bool noinstr intel_cc_platform_has(enum cc_attr attr)
+static bool noinstr intel_cc_platform_guest_has(enum cc_attr attr)
 {
 	switch (attr) {
 	case CC_ATTR_GUEST_UNROLL_STRING_IO:
@@ -29,6 +30,24 @@ static bool noinstr intel_cc_platform_has(enum cc_attr attr)
 	}
 }
 
+static bool noinstr intel_cc_platform_host_has(enum cc_attr attr)
+{
+	switch (attr) {
+	case CC_ATTR_HOST_MEM_INCOHERENT:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool noinstr intel_cc_platform_has(enum cc_attr attr)
+{
+	if (boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
+		return intel_cc_platform_host_has(attr);
+
+	return intel_cc_platform_guest_has(attr);
+}
+
 /*
  * Handle the SEV-SNP vTOM case where sme_me_mask is zero, and
  * the other levels of SME/SEV functionality, including C-bit
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4d6826a76f78..9f1fed458a32 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -37,6 +37,7 @@
 #include <asm/intel-family.h>
 #include <asm/processor.h>
 #include <asm/mce.h>
+#include <asm/coco.h>
 #include "tdx.h"
 
 static u32 tdx_global_keyid __ro_after_init;
@@ -1488,5 +1489,7 @@ void __init tdx_init(void)
 
 	setup_force_cpu_cap(X86_FEATURE_TDX_HOST_PLATFORM);
 
+	cc_vendor = CC_VENDOR_INTEL;
+
 	check_tdx_erratum();
 }
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index 2f7273596102..654777d64dc0 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -53,7 +53,8 @@ enum cc_attr {
 	 * Use this in places where the cache coherency of the memory matters
 	 * but the encryption status does not.
 	 *
-	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
+	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT, but
+	 * additionally adds TDX hosts.
 	 */
 	CC_ATTR_HOST_MEM_INCOHERENT,
 
-- 
2.34.1



* [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-01-31 11:31 [PATCH 0/4] TDX host: kexec() support Huang, Kai
  2024-01-31 11:31 ` [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec Huang, Kai
  2024-01-31 11:31 ` [PATCH 2/4] x86/virt/tdx: Advertise the CC_ATTR_HOST_MEM_INCOHERENT for TDX host Huang, Kai
@ 2024-01-31 11:31 ` Huang, Kai
  2024-01-31 21:21   ` Dave Hansen
  2024-02-02  0:54   ` Edgecombe, Rick P
  2024-01-31 11:31 ` [PATCH 4/4] x86/virt/tdx: Remove the !KEXEC_CORE dependency Huang, Kai
  2024-02-01 18:28 ` [PATCH 0/4] TDX host: kexec() support Tom Lendacky
  4 siblings, 2 replies; 42+ messages in thread
From: Huang, Kai @ 2024-01-31 11:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, dave.hansen, kirill.shutemov, tglx, bp, mingo, hpa, luto,
	peterz, thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini

From: Kai Huang <kai.huang@intel.com>

The first few generations of TDX hardware have an erratum.  A partial
write to a TDX private memory cacheline will silently "poison" the
line.  Subsequent reads will consume the poison and generate a machine
check.  According to the TDX hardware spec, neither of these things
should have happened.

== Background ==

Virtually all kernel memory access operations happen in full
cachelines.  In practice, writing a "byte" of memory usually reads a
64-byte cacheline of memory, modifies it, then writes the whole line
back.  Those operations do not trigger this problem.

This problem is triggered by "partial" writes, where a write
transaction of less than a cacheline lands at the memory controller.
The CPU does these via non-temporal write instructions (like MOVNTI),
or through UC/WC memory mappings.  The issue can also be triggered away
from the CPU by devices doing partial writes via DMA.

== Problem ==

A fast warm reset doesn't reset TDX private memory.  Kexec() can also
boot into the new kernel directly.  Thus if the old kernel has enabled
TDX on a platform with this erratum, the new kernel may get an
unexpected machine check.

Note that without this erratum, kernel reads/writes on TDX private
memory never cause a machine check, so it's OK for the old kernel to
leave TDX private pages as-is.

== Solution ==

In short, with this erratum, the kernel needs to explicitly convert all
TDX private pages back to normal to give the new kernel a clean slate
after kexec().  The BIOS is also expected to disable fast warm reset as
a workaround for this erratum, so this implementation doesn't try to
reset TDX private memory for the reboot case in the kernel but depends
on the BIOS to enable the workaround.

Convert TDX private pages back to normal after all remote CPUs have
been stopped and caches have been flushed on all CPUs, when no further
TDX activity can happen.  Do it in machine_kexec() to avoid additional
overhead in the normal reboot/shutdown path, as the kernel depends on
the BIOS to disable fast warm reset for the reboot case.

For now, TDX private memory can only be PAMT pages.  It would be ideal
to cover all types of TDX private memory here, but there are practical
problems in doing so:

1) There's no existing infrastructure to track TDX private pages;
2) It's not feasible to query the TDX module about page type because VMX
   has already been stopped when KVM receives the reboot notifier, plus
   the result from the TDX module may not be accurate (e.g., the remote
   CPU could be stopped right before MOVDIR64B).

One temporary solution is to blindly convert all memory pages, but that
is problematic too, because not all pages are mapped as writable in the
direct mapping.  It could be done by switching to the identity mapping
created for kexec() or to a new page table, but the complexity looks
like overkill.

Therefore, rather than doing something dramatic, only reset PAMT pages
here.  Other kernel components which use TDX need to do the conversion
on their own by intercepting the rebooting/shutdown notifier (KVM
already does that).

Note kexec() can happen at any time, including while the TDX module is
being initialized.  Register a TDX reboot notifier callback to stop
further TDX module initialization.  If there's any ongoing module
initialization, wait until it finishes.  This makes sure the TDX module
status is stable after the reboot notifier callback, and the later
kexec() code can read the module status to decide whether the PAMTs
are stable and available.

Also stop further TDX module initialization on machine shutdown and
halt, not just kexec(), as there's no reason to allow it in those
cases either.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/tdx.h         |  2 +
 arch/x86/kernel/machine_kexec_64.c | 16 +++++
 arch/x86/virt/vmx/tdx/tdx.c        | 98 ++++++++++++++++++++++++++++++
 3 files changed, 116 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index eba178996d84..ed3ac9a8a079 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -116,11 +116,13 @@ static inline u64 sc_retry(sc_func_t func, u64 fn,
 int tdx_cpu_enable(void);
 int tdx_enable(void);
 const char *tdx_dump_mce_info(struct mce *m);
+void tdx_reset_memory(void);
 #else
 static inline void tdx_init(void) { }
 static inline int tdx_cpu_enable(void) { return -ENODEV; }
 static inline int tdx_enable(void)  { return -ENODEV; }
 static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; }
+static inline void tdx_reset_memory(void) { }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index c9c6974e2e9c..b2279a3f6976 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -28,6 +28,7 @@
 #include <asm/setup.h>
 #include <asm/set_memory.h>
 #include <asm/cpu.h>
+#include <asm/tdx.h>
 
 #ifdef CONFIG_ACPI
 /*
@@ -298,9 +299,24 @@ void machine_kexec(struct kimage *image)
 	void *control_page;
 	int save_ftrace_enabled;
 
+	/*
+	 * For platforms with TDX "partial write machine check" erratum,
+	 * all TDX private pages need to be converted back to normal
+	 * before booting to the new kernel, otherwise the new kernel
+	 * may get unexpected machine check.
+	 *
+	 * But skip this when preserve_context is on.  The second kernel
+	 * shouldn't write to the first kernel's memory anyway.  Skipping
+	 * this also avoids killing TDX in the first kernel, which would
+	 * require more complicated handling.
+	 */
 #ifdef CONFIG_KEXEC_JUMP
 	if (image->preserve_context)
 		save_processor_state();
+	else
+		tdx_reset_memory();
+#else
+	tdx_reset_memory();
 #endif
 
 	save_ftrace_enabled = __ftrace_enabled_save();
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 9f1fed458a32..0537b1b76c2b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -28,6 +28,7 @@
 #include <linux/acpi.h>
 #include <linux/suspend.h>
 #include <linux/acpi.h>
+#include <linux/reboot.h>
 #include <asm/page.h>
 #include <asm/special_insns.h>
 #include <asm/msr-index.h>
@@ -54,6 +55,8 @@ static DEFINE_MUTEX(tdx_module_lock);
 /* All TDX-usable memory regions.  Protected by mem_hotplug_lock. */
 static LIST_HEAD(tdx_memlist);
 
+static bool tdx_rebooting;
+
 typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
 
 static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *args)
@@ -1187,6 +1190,9 @@ static int __tdx_enable(void)
 {
 	int ret;
 
+	if (tdx_rebooting)
+		return -EAGAIN;
+
 	ret = init_tdx_module();
 	if (ret) {
 		pr_err("module initialization failed (%d)\n", ret);
@@ -1420,6 +1426,90 @@ static struct notifier_block tdx_memory_nb = {
 	.notifier_call = tdx_memory_notifier,
 };
 
+/*
+ * Convert TDX private pages back to normal on platforms with
+ * "partial write machine check" erratum.
+ *
+ * Called from machine_kexec() before booting to the new kernel.
+ */
+void tdx_reset_memory(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
+		return;
+
+	/*
+	 * Kernel read/write to TDX private memory doesn't
+	 * cause machine check on hardware w/o this erratum.
+	 */
+	if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
+		return;
+
+	/* Called from kexec() when only rebooting cpu is alive */
+	WARN_ON_ONCE(num_online_cpus() != 1);
+
+	/*
+	 * tdx_reboot_notifier() waits until ongoing TDX module
+	 * initialization to finish, and module initialization is
+	 * rejected after that.  Therefore @tdx_module_status is
+	 * stable here and can be read w/o holding lock.
+	 */
+	if (tdx_module_status != TDX_MODULE_INITIALIZED)
+		return;
+
+	/*
+	 * Flush cache of all TDX private memory _before_ converting
+	 * them back to avoid silent memory corruption.
+	 */
+	native_wbinvd();
+
+	/*
+	 * Convert PAMTs back to normal.  All other cpus are already
+	 * dead and TDMRs/PAMTs are stable.
+	 *
+	 * Ideally it's better to cover all types of TDX private pages
+	 * here, but it's impractical:
+	 *
+	 *  - There's no existing infrastructure to tell whether a page
+	 *    is TDX private memory or not.
+	 *
+	 *  - Using SEAMCALL to query TDX module isn't feasible either:
+	 *    - VMX has been turned off by reaching here so SEAMCALL
+	 *      cannot be made;
+	 *    - Even if SEAMCALLs could be made, the result from the TDX
+	 *      module may not be accurate (e.g., a remote CPU can be
+	 *      stopped while the kernel is in the middle of reclaiming
+	 *      a TDX private page and doing MOVDIR64B).
+	 *
+	 * One temporary solution could be just converting all memory
+	 * pages, but it's problematic too, because not all pages are
+	 * mapped as writable in the direct mapping.  It could be done
+	 * by switching to the identity mapping for kexec() or to a new
+	 * page table which maps all pages as writable, but the
+	 * complexity is overkill.
+	 *
+	 * Thus instead of doing something dramatic to convert all pages,
+	 * only convert PAMTs here.  Other kernel components which use
+	 * TDX need to do the conversion on their own by intercepting the
+	 * rebooting/shutdown notifier (KVM already does that).
+	 */
+	tdmrs_reset_pamt_all(&tdx_tdmr_list);
+}
+
+static int tdx_reboot_notifier(struct notifier_block *nb, unsigned long mode,
+			       void *unused)
+{
+	/* Wait for ongoing TDX module initialization to finish */
+	mutex_lock(&tdx_module_lock);
+	tdx_rebooting = true;
+	mutex_unlock(&tdx_module_lock);
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block tdx_reboot_nb = {
+	.notifier_call = tdx_reboot_notifier,
+};
+
 static void __init check_tdx_erratum(void)
 {
 	/*
@@ -1474,6 +1564,14 @@ void __init tdx_init(void)
 		return;
 	}
 
+	err = register_reboot_notifier(&tdx_reboot_nb);
+	if (err) {
+		pr_err("initialization failed: register_reboot_notifier() failed (%d)\n",
+				err);
+		unregister_memory_notifier(&tdx_memory_nb);
+		return;
+	}
+
 #if defined(CONFIG_ACPI) && defined(CONFIG_SUSPEND)
 	pr_info("Disable ACPI S3. Turn off TDX in the BIOS to use ACPI S3.\n");
 	acpi_suspend_lowlevel = NULL;
-- 
2.34.1



* [PATCH 4/4] x86/virt/tdx: Remove the !KEXEC_CORE dependency
  2024-01-31 11:31 [PATCH 0/4] TDX host: kexec() support Huang, Kai
                   ` (2 preceding siblings ...)
  2024-01-31 11:31 ` [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum Huang, Kai
@ 2024-01-31 11:31 ` Huang, Kai
  2024-02-01 18:28 ` [PATCH 0/4] TDX host: kexec() support Tom Lendacky
  4 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-01-31 11:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, dave.hansen, kirill.shutemov, tglx, bp, mingo, hpa, luto,
	peterz, thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini

From: Kai Huang <kai.huang@intel.com>

Now TDX host can work with kexec().  Remove the !KEXEC_CORE dependency.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ac3b32149a77..5225f8f3eade 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1973,7 +1973,6 @@ config INTEL_TDX_HOST
 	depends on X86_X2APIC
 	select ARCH_KEEP_MEMBLOCK
 	depends on CONTIG_ALLOC
-	depends on !KEXEC_CORE
 	depends on X86_MCE
 	select ARCH_HAS_CC_PLATFORM
 	help
-- 
2.34.1



* Re: [PATCH 2/4] x86/virt/tdx: Advertise the CC_ATTR_HOST_MEM_INCOHERENT for TDX host
  2024-01-31 11:31 ` [PATCH 2/4] x86/virt/tdx: Advertise the CC_ATTR_HOST_MEM_INCOHERENT for TDX host Huang, Kai
@ 2024-01-31 17:11   ` Dave Hansen
  2024-02-01 14:42     ` Huang, Kai
  0 siblings, 1 reply; 42+ messages in thread
From: Dave Hansen @ 2024-01-31 17:11 UTC (permalink / raw)
  To: Huang, Kai, linux-kernel
  Cc: x86, kirill.shutemov, tglx, bp, mingo, hpa, luto, peterz,
	thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini

On 1/31/24 03:31, Huang, Kai wrote:
> From: Kai Huang <kai.huang@intel.com>
> 
> On the TDX capable platform, during kexec() the old kernel needs to
> flush dirty cachelines of all TDX private memory otherwise they may
> silently corrupt the new kernel's memory.
> 
> Advertise the newly introduced CC_ATTR_HOST_MEM_INCOHERENT attribute
> for the TDX host platform so the cache will be flushed during kexec().

So, you're setting a new bit, CC_ATTR_HOST_MEM_INCOHERENT.  The way I
like to deal with these is to go back and look at the definition of
CC_ATTR_HOST_MEM_INCOHERENT and see whether the changelog here convinces
me that CC_ATTR_HOST_MEM_INCOHERENT is being set appropriately.  Bonus
points if this changelog uses the same nomenclature as the comment
describing CC_ATTR_HOST_MEM_INCOHERENT.

How well does this match the comment above CC_ATTR_HOST_MEM_INCOHERENT?

> Note theoretically cache flush is only needed when TDX module is
> initialized, but the module initialization is done at runtime so just
> advertise the CC attribute when the platform has TDX enabled.

I find this really hard to parse.  Here's a rewrite, as usual:

	A TDX-host-capable system might not actually have any incoherent
	memory.  This can occur if a TDX module was never initialized or
	if the caches have been flushed since the last time TDX was
	used.  Ignore that case.  Eliminate the need for any locking and
	assume that any TDX-host-capable system might have incoherent
	memory by always setting CC_ATTR_HOST_MEM_INCOHERENT.


* Re: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-01-31 11:31 ` [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum Huang, Kai
@ 2024-01-31 21:21   ` Dave Hansen
  2024-01-31 22:03     ` Kirill A. Shutemov
  2024-02-01 14:35     ` Huang, Kai
  2024-02-02  0:54   ` Edgecombe, Rick P
  1 sibling, 2 replies; 42+ messages in thread
From: Dave Hansen @ 2024-01-31 21:21 UTC (permalink / raw)
  To: Huang, Kai, linux-kernel
  Cc: x86, kirill.shutemov, tglx, bp, mingo, hpa, luto, peterz,
	thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini

On 1/31/24 03:31, Huang, Kai wrote:
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -28,6 +28,7 @@
>  #include <asm/setup.h>
>  #include <asm/set_memory.h>
>  #include <asm/cpu.h>
> +#include <asm/tdx.h>
>  
>  #ifdef CONFIG_ACPI
>  /*
> @@ -298,9 +299,24 @@ void machine_kexec(struct kimage *image)
>  	void *control_page;
>  	int save_ftrace_enabled;
>  
> +	/*
> +	 * For platforms with TDX "partial write machine check" erratum,
> +	 * all TDX private pages need to be converted back to normal
> +	 * before booting to the new kernel, otherwise the new kernel
> +	 * may get unexpected machine check.
> +	 *
> +	 * But skip this when preserve_context is on.  The second kernel
> +	 * shouldn't write to the first kernel's memory anyway.  Skipping
> +	 * this also avoids killing TDX in the first kernel, which would
> +	 * require more complicated handling.
> +	 */

This is waaaaaaaaaaaaaaay too much information to stick in a generic
function.  Just call tdx_reset_memory() and make the argument about

>  #ifdef CONFIG_KEXEC_JUMP
>  	if (image->preserve_context)
>  		save_processor_state();
> +	else
> +		tdx_reset_memory();
> +#else
> +	tdx_reset_memory();
>  #endif

Wow, that's awfully hard to read.  I really wish folks' gag reflex would
kick in when they see stuff like this to get them to spend an additional
15 seconds to turn this into:

	if (IS_ENABLED(CONFIG_KEXEC_JUMP) && image->preserve_context)
		save_processor_state();
	else
		tdx_reset_memory();

>  	save_ftrace_enabled = __ftrace_enabled_save();
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 9f1fed458a32..0537b1b76c2b 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -28,6 +28,7 @@
>  #include <linux/acpi.h>
>  #include <linux/suspend.h>
>  #include <linux/acpi.h>
> +#include <linux/reboot.h>
>  #include <asm/page.h>
>  #include <asm/special_insns.h>
>  #include <asm/msr-index.h>
> @@ -54,6 +55,8 @@ static DEFINE_MUTEX(tdx_module_lock);
>  /* All TDX-usable memory regions.  Protected by mem_hotplug_lock. */
>  static LIST_HEAD(tdx_memlist);
>  
> +static bool tdx_rebooting;
> +
>  typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
>  
>  static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *args)
> @@ -1187,6 +1190,9 @@ static int __tdx_enable(void)
>  {
>  	int ret;
>  
> +	if (tdx_rebooting)
> +		return -EAGAIN;

This whole 'tdx_rebooting' deserves to at least be in its own patch.
Also -EAGAIN seems to be a really odd return code for a permanent failure.

>  	ret = init_tdx_module();
>  	if (ret) {
>  		pr_err("module initialization failed (%d)\n", ret);
> @@ -1420,6 +1426,90 @@ static struct notifier_block tdx_memory_nb = {
>  	.notifier_call = tdx_memory_notifier,
>  };
>  
> +/*
> + * Convert TDX private pages back to normal on platforms with
> + * "partial write machine check" erratum.
> + *
> + * Called from machine_kexec() before booting to the new kernel.
> + */
> +void tdx_reset_memory(void)
> +{
> +	if (!boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
> +		return;
> +
> +	/*
> +	 * Kernel read/write to TDX private memory doesn't
> +	 * cause machine check on hardware w/o this erratum.
> +	 */
> +	if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
> +		return;
> +
> +	/* Called from kexec() when only rebooting cpu is alive */
> +	WARN_ON_ONCE(num_online_cpus() != 1);
> +
> +	/*
> +	 * tdx_reboot_notifier() waits until ongoing TDX module
> +	 * initialization to finish, and module initialization is
> +	 * rejected after that.  Therefore @tdx_module_status is
> +	 * stable here and can be read w/o holding lock.
> +	 */
> +	if (tdx_module_status != TDX_MODULE_INITIALIZED)
> +		return;

kexec() can't happen until after reboot notifiers are called.
tdx_reboot_notifier() makes 'tdx_module_status' stable, so no lock is
needed.

Right?

> +	/*
> +	 * Flush cache of all TDX private memory _before_ converting
> +	 * them back to avoid silent memory corruption.
> +	 */
> +	native_wbinvd();

Since this is single-threaded, it also needs to mention where all the
other CPU caches got flushed.

> +	/*
> +	 * Convert PAMTs back to normal.  All other cpus are already
> +	 * dead and TDMRs/PAMTs are stable.
> +	 *
> +	 * Ideally it's better to cover all types of TDX private pages
> +	 * here, but it's impractical:
> +	 *
> +	 *  - There's no existing infrastructure to tell whether a page
> +	 *    is TDX private memory or not.
> +	 *
> +	 *  - Using SEAMCALL to query TDX module isn't feasible either:
> +	 *    - VMX has been turned off by reaching here so SEAMCALL
> +	 *      cannot be made;
> +	 *    - Even SEAMCALL can be made the result from TDX module may

		    ^ if     ^ a          ^,

> +	 *      not be accurate (e.g., remote CPU can be stopped while
> +	 *      the kernel is in the middle of reclaiming TDX private
> +	 *      page and doing MOVDIR64B).
> +	 *
> +	 * One temporary solution could be just converting all memory
> +	 * pages, but it's problematic too, because not all pages are
> +	 * mapped as writable in direct mapping.  It can be done by
> +	 * switching to the identical mapping for kexec() or a new page
> +	 * table which maps all pages as writable, but the complexity is
> +	 * overkill.
> +	 *
> +	 * Thus instead of doing something dramatic to convert all pages,
> +	 * only convert PAMTs here.  Other kernel components which use
> +	 * TDX need to do the conversion on their own by intercepting the
> +	 * rebooting/shutdown notifier (KVM already does that).
> +	 */

I'd leave the extended alternatives discussion in the changelog, not here.

Focus on what _this_ is doing and why it is imperfect:

 1. Just reset the PAMTs
 2. This leaves A, B, and C undealt with
 3. The risk of leaving those is ...


> +	tdmrs_reset_pamt_all(&tdx_tdmr_list);
> +}
> +
> +static int tdx_reboot_notifier(struct notifier_block *nb, unsigned long mode,
> +			       void *unused)
> +{
> +	/* Wait ongoing TDX initialization to finish */
> +	mutex_lock(&tdx_module_lock);
> +	tdx_rebooting = true;
> +	mutex_unlock(&tdx_module_lock);
> +
> +	return NOTIFY_OK;
> +}

Why is 'tdx_rebooting' a new variable instead of a new state in
tdx_module_status?


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-01-31 21:21   ` Dave Hansen
@ 2024-01-31 22:03     ` Kirill A. Shutemov
  2024-02-01 14:22       ` Huang, Kai
  2024-02-01 14:35     ` Huang, Kai
  1 sibling, 1 reply; 42+ messages in thread
From: Kirill A. Shutemov @ 2024-01-31 22:03 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Huang, Kai, linux-kernel, x86, tglx, bp, mingo, hpa, luto,
	peterz, thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini

On Wed, Jan 31, 2024 at 01:21:39PM -0800, Dave Hansen wrote:
> >  #ifdef CONFIG_KEXEC_JUMP
> >  	if (image->preserve_context)
> >  		save_processor_state();
> > +	else
> > +		tdx_reset_memory();
> > +#else
> > +	tdx_reset_memory();
> >  #endif
> 
> Wow, that's awfully hard to read.  I really wish folks' gag reflex would
> kick in when they see stuff like this to get them to spend an additional
> 15 seconds to turn this into:
> 
> 	if (IS_ENABLED(CONFIG_KEXEC_JUMP) && image->preserve_context)
> 		save_processor_state();
> 	else
> 		tdx_reset_memory();

save_processor_state() is declared under CONFIG_PM_SLEEP, so I guess your
variant might break the build in some cases without an updated suspend.h.
-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-01-31 22:03     ` Kirill A. Shutemov
@ 2024-02-01 14:22       ` Huang, Kai
  2024-02-01 14:39         ` Kirill A. Shutemov
  0 siblings, 1 reply; 42+ messages in thread
From: Huang, Kai @ 2024-02-01 14:22 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Hansen
  Cc: linux-kernel, x86, tglx, bp, mingo, hpa, luto, peterz,
	thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini



On 1/02/2024 6:03 am, Kirill A. Shutemov wrote:
> On Wed, Jan 31, 2024 at 01:21:39PM -0800, Dave Hansen wrote:
>>>   #ifdef CONFIG_KEXEC_JUMP
>>>   	if (image->preserve_context)
>>>   		save_processor_state();
>>> +	else
>>> +		tdx_reset_memory();
>>> +#else
>>> +	tdx_reset_memory();
>>>   #endif
>>
>> Wow, that's awfully hard to read.  I really wish folks' gag reflex would
>> kick in when they see stuff like this to get them to spend an additional
>> 15 seconds to turn this into:
>>
>> 	if (IS_ENABLED(CONFIG_KEXEC_JUMP) && image->preserve_context)
>> 		save_processor_state();
>> 	else
>> 		tdx_reset_memory();
> 
> save_processor_state() is declared under CONFIG_PM_SLEEP, so I guess your
> variant might break the build in some cases without an updated suspend.h.

I tried.  If I turn off both SUSPEND and HIBERNATION in the Kconfig I 
got a build error:

arch/x86/kernel/machine_kexec_64.c:325:17: error: implicit declaration 
of function ‘save_processor_state’ [-Werror=implicit-function-declaration]
   325 |                 save_processor_state();
       |                 ^~~~~~~~~~~~~~~~~~~~


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-01-31 21:21   ` Dave Hansen
  2024-01-31 22:03     ` Kirill A. Shutemov
@ 2024-02-01 14:35     ` Huang, Kai
  1 sibling, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-02-01 14:35 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: x86, kirill.shutemov, tglx, bp, mingo, hpa, luto, peterz,
	thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini



On 1/02/2024 5:21 am, Dave Hansen wrote:
> On 1/31/24 03:31, Huang, Kai wrote:
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -28,6 +28,7 @@
>>   #include <asm/setup.h>
>>   #include <asm/set_memory.h>
>>   #include <asm/cpu.h>
>> +#include <asm/tdx.h>
>>   
>>   #ifdef CONFIG_ACPI
>>   /*
>> @@ -298,9 +299,24 @@ void machine_kexec(struct kimage *image)
>>   	void *control_page;
>>   	int save_ftrace_enabled;
>>   
>> +	/*
>> +	 * For platforms with TDX "partial write machine check" erratum,
>> +	 * all TDX private pages need to be converted back to normal
>> +	 * before booting to the new kernel, otherwise the new kernel
>> +	 * may get unexpected machine check.
>> +	 *
>> +	 * But skip this when preserve_context is on.  The second kernel
>> +	 * shouldn't write to the first kernel's memory anyway.  Skipping
>> +	 * this also avoids killing TDX in the first kernel, which would
>> +	 * require more complicated handling.
>> +	 */
> 
> This is waaaaaaaaaaaaaaay too much information to stick in a generic
> function.  Just call tdx_reset_memory() and make the argument about

Hi Dave,

Thanks for review.

Agreed, too much info here.  But I don't quite understand your second 
sentence.  Could you share your suggestion again?

> 
>>   #ifdef CONFIG_KEXEC_JUMP
>>   	if (image->preserve_context)
>>   		save_processor_state();
>> +	else
>> +		tdx_reset_memory();
>> +#else
>> +	tdx_reset_memory();
>>   #endif
> 
> Wow, that's awfully hard to read.  I really wish folks' gag reflex would
> kick in when they see stuff like this to get them to spend an additional
> 15 seconds to turn this into:
> 
> 	if (IS_ENABLED(CONFIG_KEXEC_JUMP) && image->preserve_context)
> 		save_processor_state();
> 	else
> 		tdx_reset_memory();
> 

Yeah, this is way better, but as Kirill pointed out we will get a build 
error when both SUSPEND and HIBERNATION are off.

Maybe below?

#ifdef CONFIG_KEXEC_JUMP
	if (image->preserve_context)
		save_processor_state();
	else
#endif
	tdx_reset_memory();


>>   	save_ftrace_enabled = __ftrace_enabled_save();
>> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
>> index 9f1fed458a32..0537b1b76c2b 100644
>> --- a/arch/x86/virt/vmx/tdx/tdx.c
>> +++ b/arch/x86/virt/vmx/tdx/tdx.c
>> @@ -28,6 +28,7 @@
>>   #include <linux/acpi.h>
>>   #include <linux/suspend.h>
>>   #include <linux/acpi.h>
>> +#include <linux/reboot.h>
>>   #include <asm/page.h>
>>   #include <asm/special_insns.h>
>>   #include <asm/msr-index.h>
>> @@ -54,6 +55,8 @@ static DEFINE_MUTEX(tdx_module_lock);
>>   /* All TDX-usable memory regions.  Protected by mem_hotplug_lock. */
>>   static LIST_HEAD(tdx_memlist);
>>   
>> +static bool tdx_rebooting;
>> +
>>   typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
>>   
>>   static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *args)
>> @@ -1187,6 +1190,9 @@ static int __tdx_enable(void)
>>   {
>>   	int ret;
>>   
>> +	if (tdx_rebooting)
>> +		return -EAGAIN;
> 
> This whole 'tdx_rebooting' deserves to at least be in its own patch.
> Also -EAGAIN seems to be a really odd return code for a permanent failure.

Will move to a separate patch.

How about just -EINVAL?

> 
>>   	ret = init_tdx_module();
>>   	if (ret) {
>>   		pr_err("module initialization failed (%d)\n", ret);
>> @@ -1420,6 +1426,90 @@ static struct notifier_block tdx_memory_nb = {
>>   	.notifier_call = tdx_memory_notifier,
>>   };
>>   
>> +/*
>> + * Convert TDX private pages back to normal on platforms with
>> + * "partial write machine check" erratum.
>> + *
>> + * Called from machine_kexec() before booting to the new kernel.
>> + */
>> +void tdx_reset_memory(void)
>> +{
>> +	if (!boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
>> +		return;
>> +
>> +	/*
>> +	 * Kernel read/write to TDX private memory doesn't
>> +	 * cause machine check on hardware w/o this erratum.
>> +	 */
>> +	if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
>> +		return;
>> +
>> +	/* Called from kexec() when only rebooting cpu is alive */
>> +	WARN_ON_ONCE(num_online_cpus() != 1);
>> +
>> +	/*
>> +	 * tdx_reboot_notifier() waits until ongoing TDX module
>> +	 * initialization to finish, and module initialization is
>> +	 * rejected after that.  Therefore @tdx_module_status is
>> +	 * stable here and can be read w/o holding lock.
>> +	 */
>> +	if (tdx_module_status != TDX_MODULE_INITIALIZED)
>> +		return;
> 
> kexec() can't happen until after reboot notifiers are called.
> tdx_reboot_notifier() makes 'tdx_module_status' stable, so no lock is
> needed.
> 
> Right?

Yes.  I can replace the comment with your words.

> 
>> +	/*
>> +	 * Flush cache of all TDX private memory _before_ converting
>> +	 * them back to avoid silent memory corruption.
>> +	 */
>> +	native_wbinvd();
> 
> Since this is single-threaded, it also needs to mention where all the
> other CPU caches got flushed.

OK will point out.

> 
>> +	/*
>> +	 * Convert PAMTs back to normal.  All other cpus are already
>> +	 * dead and TDMRs/PAMTs are stable.
>> +	 *
>> +	 * Ideally it's better to cover all types of TDX private pages
>> +	 * here, but it's impractical:
>> +	 *
>> +	 *  - There's no existing infrastructure to tell whether a page
>> +	 *    is TDX private memory or not.
>> +	 *
>> +	 *  - Using SEAMCALL to query TDX module isn't feasible either:
>> +	 *    - VMX has been turned off by reaching here so SEAMCALL
>> +	 *      cannot be made;
>> +	 *    - Even SEAMCALL can be made the result from TDX module may
> 
> 		    ^ if     ^ a          ^,
> 

Thanks.  Will add.

>> +	 *      not be accurate (e.g., remote CPU can be stopped while
>> +	 *      the kernel is in the middle of reclaiming TDX private
>> +	 *      page and doing MOVDIR64B).
>> +	 *
>> +	 * One temporary solution could be just converting all memory
>> +	 * pages, but it's problematic too, because not all pages are
>> +	 * mapped as writable in direct mapping.  It can be done by
>> +	 * switching to the identical mapping for kexec() or a new page
>> +	 * table which maps all pages as writable, but the complexity is
>> +	 * overkill.
>> +	 *
>> +	 * Thus instead of doing something dramatic to convert all pages,
>> +	 * only convert PAMTs here.  Other kernel components which use
>> +	 * TDX need to do the conversion on their own by intercepting the
>> +	 * rebooting/shutdown notifier (KVM already does that).
>> +	 */
> 
> I'd leave the extended alternatives discussion in the changelog, not here.
> 
> Focus on what _this_ is doing and why it is imperfect:
> 
>   1. Just reset the PAMTs
>   2. This leaves A, B, and C undealt with
>   3. The risk of leaving those is ...
> 

Agreed. Will do.

> 
>> +	tdmrs_reset_pamt_all(&tdx_tdmr_list);
>> +}
>> +
>> +static int tdx_reboot_notifier(struct notifier_block *nb, unsigned long mode,
>> +			       void *unused)
>> +{
>> +	/* Wait ongoing TDX initialization to finish */
>> +	mutex_lock(&tdx_module_lock);
>> +	tdx_rebooting = true;
>> +	mutex_unlock(&tdx_module_lock);
>> +
>> +	return NOTIFY_OK;
>> +}
> 
> Why is 'tdx_rebooting' a new variable instead of a new state in
> tdx_module_status?
> 

I think we could, but I believe a separate tdx_rebooting is clearer.

For instance, if we add a new state TDX_MODULE_ABORT for this case, 
then when tdx_reboot_notifier() is called after module initialization, 
tdx_module_status (which is already TDX_MODULE_INITIALIZED) would be 
overwritten and we would lose the information that module 
initialization completed successfully.

We could avoid that with more complicated logic, but I think using a 
separate tdx_rebooting is more straightforward.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-02-01 14:22       ` Huang, Kai
@ 2024-02-01 14:39         ` Kirill A. Shutemov
  2024-02-01 14:47           ` Huang, Kai
  2024-02-01 16:57           ` Dave Hansen
  0 siblings, 2 replies; 42+ messages in thread
From: Kirill A. Shutemov @ 2024-02-01 14:39 UTC (permalink / raw)
  To: Huang, Kai
  Cc: Dave Hansen, linux-kernel, x86, tglx, bp, mingo, hpa, luto,
	peterz, thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini

On Thu, Feb 01, 2024 at 10:22:18PM +0800, Huang, Kai wrote:
> 
> 
> On 1/02/2024 6:03 am, Kirill A. Shutemov wrote:
> > On Wed, Jan 31, 2024 at 01:21:39PM -0800, Dave Hansen wrote:
> > > >   #ifdef CONFIG_KEXEC_JUMP
> > > >   	if (image->preserve_context)
> > > >   		save_processor_state();
> > > > +	else
> > > > +		tdx_reset_memory();
> > > > +#else
> > > > +	tdx_reset_memory();
> > > >   #endif
> > > 
> > > Wow, that's awfully hard to read.  I really wish folks' gag reflex would
> > > kick in when they see stuff like this to get them to spend an additional
> > > 15 seconds to turn this into:
> > > 
> > > 	if (IS_ENABLED(CONFIG_KEXEC_JUMP) && image->preserve_context)
> > > 		save_processor_state();
> > > 	else
> > > 		tdx_reset_memory();
> > 
> > save_processor_state() is declared under CONFIG_PM_SLEEP, so I guess your
> > variant might break the build in some cases without an updated suspend.h.
> 
> I tried.  If I turn off both SUSPEND and HIBERNATION in the Kconfig I got
> a build error:
> 
> arch/x86/kernel/machine_kexec_64.c:325:17: error: implicit declaration of
> function ‘save_processor_state’ [-Werror=implicit-function-declaration]
>   325 |                 save_processor_state();
>       |                 ^~~~~~~~~~~~~~~~~~~~

Moving save_processor_state() declaration outside all defines would do the
trick.

Something like the patch below.

But finding the right spot for the move can be tricky.  I don't 
particularly like where I moved it.

diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index ef503088942d..1c17059df039 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -201,6 +201,9 @@ struct platform_s2idle_ops {
 	void (*end)(void);
 };
 
+void save_processor_state(void);
+void restore_processor_state(void);
+
 #ifdef CONFIG_SUSPEND
 extern suspend_state_t pm_suspend_target_state;
 extern suspend_state_t mem_sleep_current;
@@ -491,8 +494,6 @@ static inline int is_hibernate_resume_dev(dev_t dev) { return 0; }
 extern struct mutex system_transition_mutex;
 
 #ifdef CONFIG_PM_SLEEP
-void save_processor_state(void);
-void restore_processor_state(void);
 
 /* kernel/power/main.c */
 extern int register_pm_notifier(struct notifier_block *nb);
-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/4] x86/virt/tdx: Advertise the CC_ATTR_HOST_MEM_INCOHERENT for TDX host
  2024-01-31 17:11   ` Dave Hansen
@ 2024-02-01 14:42     ` Huang, Kai
  0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-02-01 14:42 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: x86, kirill.shutemov, tglx, bp, mingo, hpa, luto, peterz,
	thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini



On 1/02/2024 1:11 am, Dave Hansen wrote:
> On 1/31/24 03:31, Huang, Kai wrote:
>> From: Kai Huang <kai.huang@intel.com>
>>
>> On the TDX capable platform, during kexec() the old kernel needs to
>> flush dirty cachelines of all TDX private memory otherwise they may
>> silently corrupt the new kernel's memory.
>>
>> Advertise the new introduced CC_ATTR_HOST_MEM_INCOHERENT attribute for
>> TDX host platform so the cache will be flushed during kexec().
> 
> So, you're setting a new bit, CC_ATTR_HOST_MEM_INCOHERENT.  The way I
> like to deal with these is to go back and look at the definition of
> CC_ATTR_HOST_MEM_INCOHERENT and see whether the changelog here convinces
> me that CC_ATTR_HOST_MEM_INCOHERENT is being set appropriately.  Bonus
> points if this changelog uses the same nomenclature as the comment
> describing CC_ATTR_HOST_MEM_INCOHERENT.
> 
> How well does this match the comment above CC_ATTR_HOST_MEM_INCOHERENT?

I will try to improve the changelog: explain what 
CC_ATTR_HOST_MEM_INCOHERENT means, and why it is suitable for the TDX 
host.

Please let me know if you have more comments.  Thanks.

> 
>> Note theoretically cache flush is only needed when TDX module is
>> initialized, but the module initialization is done at runtime so just
>> advertise the CC attribute when the platform has TDX enabled.
> 
> I find this really hard to parse.  Here's a rewrite, as usual:
> 
> 	A TDX-host-capable system might not actually have any incoherent
> 	memory.  This can occur if a TDX module was never initialized or
> 	if the caches have been flushed since the last time TDX was
> 	used.  Ignore that case.  Eliminate the need for any locking and
> 	assume that any TDX-host-capable system might have incoherent
> 	memory by always setting CC_ATTR_HOST_MEM_INCOHERENT.

I appreciate, as usual.  :-)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-02-01 14:39         ` Kirill A. Shutemov
@ 2024-02-01 14:47           ` Huang, Kai
  2024-02-01 16:57           ` Dave Hansen
  1 sibling, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-02-01 14:47 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, linux-kernel, x86, tglx, bp, mingo, hpa, luto,
	peterz, thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini



On 1/02/2024 10:39 pm, Kirill A. Shutemov wrote:
> On Thu, Feb 01, 2024 at 10:22:18PM +0800, Huang, Kai wrote:
>>
>>
>> On 1/02/2024 6:03 am, Kirill A. Shutemov wrote:
>>> On Wed, Jan 31, 2024 at 01:21:39PM -0800, Dave Hansen wrote:
>>>>>    #ifdef CONFIG_KEXEC_JUMP
>>>>>    	if (image->preserve_context)
>>>>>    		save_processor_state();
>>>>> +	else
>>>>> +		tdx_reset_memory();
>>>>> +#else
>>>>> +	tdx_reset_memory();
>>>>>    #endif
>>>>
>>>> Wow, that's awfully hard to read.  I really wish folks' gag reflex would
>>>> kick in when they see stuff like this to get them to spend an additional
>>>> 15 seconds to turn this into:
>>>>
>>>> 	if (IS_ENABLED(CONFIG_KEXEC_JUMP) && image->preserve_context)
>>>> 		save_processor_state();
>>>> 	else
>>>> 		tdx_reset_memory();
>>>
>>> save_processor_state() is declared under CONFIG_PM_SLEEP, so I guess your
>>> variant might break the build in some cases without an updated suspend.h.
>>
>> I tried.  If I turn off both SUSPEND and HIBERNATION in the Kconfig I got
>> a build error:
>>
>> arch/x86/kernel/machine_kexec_64.c:325:17: error: implicit declaration of
>> function ‘save_processor_state’ [-Werror=implicit-function-declaration]
>>    325 |                 save_processor_state();
>>        |                 ^~~~~~~~~~~~~~~~~~~~
> 
> Moving save_processor_state() declaration outside all defines would do the
> trick.
> 
> Something like the patch below.
> 
> But finding the right spot for the move can be tricky. I don't particular
> like where I moved it.
> 

I don't feel I have enough justification to make such a change.

As I replied in another email, how about:

#ifdef CONFIG_KEXEC_JUMP
     if (image->preserve_context)
         save_processor_state();
     else
#endif
     tdx_reset_memory();

Not ideal, but looks slightly better?

But I'll leave to Dave to comment.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-02-01 14:39         ` Kirill A. Shutemov
  2024-02-01 14:47           ` Huang, Kai
@ 2024-02-01 16:57           ` Dave Hansen
  2024-02-05  6:49             ` Huang, Kai
  1 sibling, 1 reply; 42+ messages in thread
From: Dave Hansen @ 2024-02-01 16:57 UTC (permalink / raw)
  To: Kirill A. Shutemov, Huang, Kai
  Cc: linux-kernel, x86, tglx, bp, mingo, hpa, luto, peterz,
	thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini

On 2/1/24 06:39, Kirill A. Shutemov wrote:
>> On 1/02/2024 6:03 am, Kirill A. Shutemov wrote:
>>> On Wed, Jan 31, 2024 at 01:21:39PM -0800, Dave Hansen wrote:
>>>>>   #ifdef CONFIG_KEXEC_JUMP
>>>>>   	if (image->preserve_context)
>>>>>   		save_processor_state();
>>>>> +	else
>>>>> +		tdx_reset_memory();
>>>>> +#else
>>>>> +	tdx_reset_memory();
>>>>>   #endif
...
> +void save_processor_state(void);
> +void restore_processor_state(void);
> +
>  #ifdef CONFIG_SUSPEND
>  extern suspend_state_t pm_suspend_target_state;
>  extern suspend_state_t mem_sleep_current;
> @@ -491,8 +494,6 @@ static inline int is_hibernate_resume_dev(dev_t dev) { return 0; }
>  extern struct mutex system_transition_mutex;
>  
>  #ifdef CONFIG_PM_SLEEP
> -void save_processor_state(void);
> -void restore_processor_state(void);

It's a little funky that we've got a #ifdef CONFIG_KEXEC_JUMP in the .c
file and then we're dodging around an #ifdef CONFIG_PM_SLEEP in the
header.  This is one of the reasons we shouldn't be putting #ifdefs in
.c files in the first place.  But I digress...

Either way, if you focus on getting the dang #ifdef out of the main code
flow, the rest will fall in place easily.  Heck, if you even do this in
the x86 kexec code:

static void kexec_save_processor_start(image)
{
#ifdef CONFIG_KEXEC_JUMP
	if (image->preserve_context)
		save_processor_state();
#endif
}

it'll leave you with:

	kexec_save_processor_start(image);

	/* Give a good reason here */
	if (!image->preserve_context)
		tdx_reset_memory();

which is *FINE*.  No funky #ifdefs, indentation or else's dangling about.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/4] TDX host: kexec() support
  2024-01-31 11:31 [PATCH 0/4] TDX host: kexec() support Huang, Kai
                   ` (3 preceding siblings ...)
  2024-01-31 11:31 ` [PATCH 4/4] x86/virt/tdx: Remove the !KEXEC_CORE dependency Huang, Kai
@ 2024-02-01 18:28 ` Tom Lendacky
  2024-02-05  6:50   ` Huang, Kai
  4 siblings, 1 reply; 42+ messages in thread
From: Tom Lendacky @ 2024-02-01 18:28 UTC (permalink / raw)
  To: Huang, Kai, linux-kernel, Kalra, Ashish
  Cc: x86, dave.hansen, kirill.shutemov, tglx, bp, mingo, hpa, luto,
	peterz, chao.gao, bhe, nik.borisov, pbonzini

On 1/31/24 05:31, Huang, Kai wrote:
> Currently kexec() support and TDX host are mutually exclusive in the
> Kconfig.  This series adds the TDX host kexec support so that they can
> work together and can be enabled at the same time in the Kconfig.
> 
> This follows Dave's suggestion to add the CC_ATTR_HOST_MEM_INCOHERENT
> attribute to unify both Intel and AMD, instead of having Intel/AMD
> specific checks around [1].
> 
> Hi Tom,
> 
> I've tested on my TDX testing machine but I don't have an AMD machine to
> test.  I highly appreciate if you or any AMD guy can help to review
> and/or test this series to make sure I didn't break anything.

Hi Kai,

I'm adding Ashish to the thread to take a look at this as he's been 
focusing on kexec-related things recently.

Thanks,
Tom

> 
> Thanks a lot!
> 
> [1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/
> 
> Kai Huang (4):
>    x86/coco: Add a new CC attribute to unify cache flush during kexec
>    x86/virt/tdx: Advertise the CC_ATTR_HOST_MEM_INCOHERENT for TDX host
>    x86/kexec(): Reset TDX private memory on platforms with TDX erratum
>    x86/virt/tdx: Remove the !KEXEC_CORE dependency
> 
>   arch/x86/Kconfig                   |   2 +-
>   arch/x86/coco/core.c               |  34 +++++++++-
>   arch/x86/include/asm/tdx.h         |   2 +
>   arch/x86/kernel/machine_kexec_64.c |  18 ++++-
>   arch/x86/kernel/process.c          |  14 +---
>   arch/x86/mm/mem_encrypt_identity.c |  11 +++-
>   arch/x86/virt/vmx/tdx/tdx.c        | 101 +++++++++++++++++++++++++++++
>   include/linux/cc_platform.h        |  16 +++++
>   8 files changed, 183 insertions(+), 15 deletions(-)
> 
> 
> base-commit: a6f0b57202b0ee50dc042bae16494008dc6dc992

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-01-31 11:31 ` [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum Huang, Kai
  2024-01-31 21:21   ` Dave Hansen
@ 2024-02-02  0:54   ` Edgecombe, Rick P
  2024-02-05  6:44     ` Huang, Kai
  1 sibling, 1 reply; 42+ messages in thread
From: Edgecombe, Rick P @ 2024-02-02  0:54 UTC (permalink / raw)
  To: linux-kernel, Huang, Kai
  Cc: Gao, Chao, Hansen, Dave, luto, bp, x86, peterz, hpa, mingo,
	kirill.shutemov, tglx, bhe, thomas.lendacky, nik.borisov,
	pbonzini

On Wed, 2024-01-31 at 11:31 +0000, Huang, Kai wrote:
> Note kexec() can happen at anytime, including when TDX module is
> being
> initialized.  Register TDX reboot notifier callback to stop further
> TDX
> module initialization.  If there's any ongoing module initialization,
> wait until it finishes.  This makes sure the TDX module status is
> stable
> after the reboot notifier callback, and the later kexec() code can
> read
> module status to decide whether PAMTs are stable and available.

I don't see how this works with the crash kernel flavor of kexec. Did
you look at that scenario?



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-02-02  0:54   ` Edgecombe, Rick P
@ 2024-02-05  6:44     ` Huang, Kai
  0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-02-05  6:44 UTC (permalink / raw)
  To: Edgecombe, Rick P, linux-kernel
  Cc: Gao, Chao, Hansen, Dave, luto, bp, x86, peterz, hpa, mingo,
	kirill.shutemov, tglx, bhe, thomas.lendacky, nik.borisov,
	pbonzini



On 2/02/2024 8:54 am, Edgecombe, Rick P wrote:
> On Wed, 2024-01-31 at 11:31 +0000, Huang, Kai wrote:
>> Note kexec() can happen at any time, including when TDX module is being
>> initialized.  Register TDX reboot notifier callback to stop further TDX
>> module initialization.  If there's any ongoing module initialization,
>> wait until it finishes.  This makes sure the TDX module status is stable
>> after the reboot notifier callback, and the later kexec() code can read
>> module status to decide whether PAMTs are stable and available.
> 
> I don't see how this works with the crash kernel flavor of kexec. Did
> you look at that scenario?
> 
> 
Hmm, right, this doesn't work for crash kexec.  Thanks for pointing 
that out.

We need a way that doesn't depend on the reboot notifier.  Previously we 
used a variable to indicate the point where it's possible to have any 
TDX private pages.  I'll switch back to using that.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum
  2024-02-01 16:57           ` Dave Hansen
@ 2024-02-05  6:49             ` Huang, Kai
  0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-02-05  6:49 UTC (permalink / raw)
  To: Dave Hansen, Kirill A. Shutemov
  Cc: linux-kernel, x86, tglx, bp, mingo, hpa, luto, peterz,
	thomas.lendacky, chao.gao, bhe, nik.borisov, pbonzini



On 2/02/2024 12:57 am, Dave Hansen wrote:
> On 2/1/24 06:39, Kirill A. Shutemov wrote:
>>> On 1/02/2024 6:03 am, Kirill A. Shutemov wrote:
>>>> On Wed, Jan 31, 2024 at 01:21:39PM -0800, Dave Hansen wrote:
>>>>>>    #ifdef CONFIG_KEXEC_JUMP
>>>>>>    	if (image->preserve_context)
>>>>>>    		save_processor_state();
>>>>>> +	else
>>>>>> +		tdx_reset_memory();
>>>>>> +#else
>>>>>> +	tdx_reset_memory();
>>>>>>    #endif
> ...
>> +void save_processor_state(void);
>> +void restore_processor_state(void);
>> +
>>   #ifdef CONFIG_SUSPEND
>>   extern suspend_state_t pm_suspend_target_state;
>>   extern suspend_state_t mem_sleep_current;
>> @@ -491,8 +494,6 @@ static inline int is_hibernate_resume_dev(dev_t dev) { return 0; }
>>   extern struct mutex system_transition_mutex;
>>   
>>   #ifdef CONFIG_PM_SLEEP
>> -void save_processor_state(void);
>> -void restore_processor_state(void);
> 
> It's a little funky that we've got a #ifdef CONFIG_KEXEC_JUMP in the .c
> file and then we're dodging around an #ifdef CONFIG_PM_SLEEP in the
> header.  This is one of the reasons we shouldn't be putting #ifdefs in
> .c files in the first place.  But I digress...
> 
> Either way, if you focus on getting the dang #ifdef out of the main code
> flow, the rest will fall in place easily.  Heck, if you even do this in
> the x86 kexec code:
> 
> static void kexec_save_processor_start(image)
> {
> #ifdef CONFIG_KEXEC_JUMP
> 	if (image->preserve_context)
> 		save_processor_state();
> #endif
> }
> 
> it'll leave you with:
> 
> 	kexec_save_processor_start(image);
> 
> 	/* Give a good reason here */
> 	if (!image->preserve_context)
> 		tdx_reset_memory();
> 
> which is *FINE*.  No funky #ifdefs, indentation or else's dangling about.
> 
> 

Thanks for the insight and explanation!  I'll use this in the new version.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/4] TDX host: kexec() support
  2024-02-01 18:28 ` [PATCH 0/4] TDX host: kexec() support Tom Lendacky
@ 2024-02-05  6:50   ` Huang, Kai
  2024-02-06 18:56     ` Kalra, Ashish
  0 siblings, 1 reply; 42+ messages in thread
From: Huang, Kai @ 2024-02-05  6:50 UTC (permalink / raw)
  To: Tom Lendacky, linux-kernel, Kalra, Ashish
  Cc: x86, dave.hansen, kirill.shutemov, tglx, bp, mingo, hpa, luto,
	peterz, chao.gao, bhe, nik.borisov, pbonzini



On 2/02/2024 2:28 am, Tom Lendacky wrote:
> On 1/31/24 05:31, Huang, Kai wrote:
>> Currently kexec() support and TDX host are mutually exclusive in the
>> Kconfig.  This series adds the TDX host kexec support so that they can
>> work together and can be enabled at the same time in the Kconfig.
>>
>> This follows Dave's suggestion to add the CC_ATTR_HOST_MEM_INCOHERENT
>> attribute to unify both Intel and AMD, instead of having Intel/AMD
>> specific checks around [1].
>>
>> Hi Tom,
>>
>> I've tested on my TDX testing machine but I don't have an AMD machine to
>> test on.  I would highly appreciate it if you or anyone from AMD could
>> help review and/or test this series to make sure I didn't break anything.
> 
> Hi Kai,
> 
> I'm adding Ashish to the thread to take a look at this as he's been 
> focusing on kexec related things recently.
> 

Thanks Tom.

Hi Ashish,

I would appreciate it if you could help review and test the first patch.  Thanks!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/4] TDX host: kexec() support
  2024-02-05  6:50   ` Huang, Kai
@ 2024-02-06 18:56     ` Kalra, Ashish
  2024-02-07  1:43       ` Huang, Kai
  0 siblings, 1 reply; 42+ messages in thread
From: Kalra, Ashish @ 2024-02-06 18:56 UTC (permalink / raw)
  To: Huang, Kai, Tom Lendacky, linux-kernel
  Cc: x86, dave.hansen, kirill.shutemov, tglx, bp, mingo, hpa, luto,
	peterz, chao.gao, bhe, nik.borisov, pbonzini

On 2/5/2024 12:50 AM, Huang, Kai wrote:

>
>
> On 2/02/2024 2:28 am, Tom Lendacky wrote:
>> On 1/31/24 05:31, Huang, Kai wrote:
>>> Currently kexec() support and TDX host are mutually exclusive in the
>>> Kconfig.  This series adds the TDX host kexec support so that they can
>>> work together and can be enabled at the same time in the Kconfig.
>>>
>>> This follows Dave's suggestion to add the CC_ATTR_HOST_MEM_INCOHERENT
>>> attribute to unify both Intel and AMD, instead of having Intel/AMD
>>> specific checks around [1].
>>>
>>> Hi Tom,
>>>
>>> I've tested on my TDX testing machine but I don't have an AMD machine to
>>> test on.  I would highly appreciate it if you or anyone from AMD could
>>> help review and/or test this series to make sure I didn't break anything.
>>
>> Hi Kai,
>>
>> I'm adding Ashish to the thread to take a look at this as he's been 
>> focusing on kexec related things recently.
>>
>
> Thanks Tom.
>
> Hi Ashish,
>
> I would appreciate it if you could help review and test the first patch. Thanks!

Hello Kai,

Yes, I am currently working on SNP guest kexec support on top of
Kirill's TDX guest kexec patches.  I will have a look at your patch set
and test and review it.

Thanks, Ashish


^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 0/4] TDX host: kexec() support
  2024-02-06 18:56     ` Kalra, Ashish
@ 2024-02-07  1:43       ` Huang, Kai
  0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-02-07  1:43 UTC (permalink / raw)
  To: Kalra, Ashish, Tom Lendacky, linux-kernel
  Cc: x86, Hansen, Dave, kirill.shutemov, tglx, bp, mingo, hpa, luto,
	peterz, Gao, Chao, bhe, nik.borisov, pbonzini

> Hello Kai,
> 
> Yes, I am currently working on SNP guest kexec support on top of Kirill's TDX
> guest kexec patches. I will have a look at your patch set and test and review it.
> 
> Thanks, Ashish

Hi Ashish,

I appreciate it, thanks!


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-01-31 11:31 ` [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec Huang, Kai
@ 2024-02-19 16:16   ` Borislav Petkov
  2024-02-19 19:45     ` Tom Lendacky
  2024-02-20  3:12     ` Huang, Kai
  0 siblings, 2 replies; 42+ messages in thread
From: Borislav Petkov @ 2024-02-19 16:16 UTC (permalink / raw)
  To: Huang, Kai
  Cc: linux-kernel, x86, dave.hansen, kirill.shutemov, tglx, mingo,
	hpa, luto, peterz, thomas.lendacky, chao.gao, bhe, nik.borisov,
	pbonzini

On Wed, Jan 31, 2024 at 11:31:53AM +0000, Huang, Kai wrote:
> From: Kai Huang <kai.huang@intel.com>
> 
> Currently on AMD SME platforms, during kexec() caches are flushed
> manually before jumping to the new kernel due to memory encryption.
> Intel TDX needs to flush cachelines of TDX private memory before
> jumping to the second kernel too, otherwise they may silently corrupt
> the new kernel.
> 
> Instead of sprinkling both AMD and Intel's specific checks around,
> introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both
> Intel and AMD, and simplify the logic:
> 
> 	Could the old kernel leave incoherent caches around?

	"Is it possible that the kernel could leave caches in incoherent state?"

> 	If so, do WBINVD.
> 
> Convert the AMD SME to use this new CC attribute.

> A later patch will
> utilize this new attribute for Intel TDX too.

Yeah, once those are all in git, the concept of "subsequent patch"
becomes ambiguous depending on how you're sorting them. So try to read
that commit message out of the context of all those "subsequent patches"
and see if it still makes sense.

IOW, you should strive for your commit messages to make sense on their
own, without referencing other patches.

In this particular case, that "later patch" can go.

> Specifically, AMD SME flushes caches at two places: 1) stop_this_cpu();
> 2) relocate_kernel().  stop_this_cpu() checks the CPUID directly to do
> WBINVD for the reason that the current kernel's SME enabling status may
> not match the new kernel's choice.  However the relocate_kernel() only
> does the WBINVD when the current kernel has enabled SME for the reason
> that the new kernel is always placed in an "unencrypted" area.
> 
> To simplify the logic, for AMD SME change to always use the way that is
> done in stop_this_cpu().  This will cause an additional WBINVD in
> relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
> disabled by kernel command line), but this is acceptable for the sake of
> having less complicated code (see [1] for the relevant discussion).
> 
> Note currently the kernel only advertises CC vendor for AMD SME when SME
> is actually enabled by the kernel.  To always advertise the new
> CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
> status, change to set CC vendor as long as the hardware has enabled SME.
> 
> Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
> enabled SME" is still different from "checking the CPUID" (the way that
> is done in stop_this_cpu()), but technically the former also serves the
> purpose and is actually more accurate.
> 
> Such change allows sme_me_mask to be 0 while CC vendor reports as AMD.
> But this doesn't impact other CC attributes on AMD platforms, nor does
> it impact the cc_mkdec()/cc_mkenc().
> 
> [1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/
> 
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/coco/core.c               | 13 +++++++++++++
>  arch/x86/kernel/machine_kexec_64.c |  2 +-
>  arch/x86/kernel/process.c          | 14 +++-----------
>  arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
>  include/linux/cc_platform.h        | 15 +++++++++++++++
>  5 files changed, 42 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index eeec9986570e..8d6d727e6e18 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>  	case CC_ATTR_HOST_MEM_ENCRYPT:
>  		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
>  
> +	case CC_ATTR_HOST_MEM_INCOHERENT:
> +		/*
> +		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> +		 * enabled on the platform regardless whether the kernel
> +		 * has actually enabled the SME.
> +

"represents whether SME has [been] enabled ... regardless whether the
kernel has enabled SME"?!?

I think this needs to be:

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index d07be9d05cd0..e3653465e532 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -67,6 +67,13 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
 
        switch (attr) {
        case CC_ATTR_MEM_ENCRYPT:
+
+               /*
+                * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
+                * enabled on the platform regardless whether the kernel
+                * has actually enabled the SME.
+                */
+       case CC_ATTR_HOST_MEM_INCOHERENT:
                return sme_me_mask;
 
        case CC_ATTR_HOST_MEM_ENCRYPT:


> +		return !(sev_status & MSR_AMD64_SEV_ENABLED);
> +
> +	/*
> +	 * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
> +	 * as it must be true when there's any SEV enable bit set in
> +	 * sev_status.
> +	 */

Superfluous comment.

>  	case CC_ATTR_GUEST_MEM_ENCRYPT:
>  		return sev_status & MSR_AMD64_SEV_ENABLED;
>  
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index bc0a5348b4a6..c9c6974e2e9c 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
>  				       (unsigned long)page_list,
>  				       image->start,
>  				       image->preserve_context,
> -				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
> +				       cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
>  
>  #ifdef CONFIG_KEXEC_JUMP
>  	if (image->preserve_context)
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index ab49ade31b0d..2c7e8d9889c0 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
>  	mcheck_cpu_clear(c);
>  
>  	/*
> -	 * Use wbinvd on processors that support SME. This provides support
> -	 * for performing a successful kexec when going from SME inactive
> -	 * to SME active (or vice-versa). The cache must be cleared so that
> -	 * if there are entries with the same physical address, both with and
> -	 * without the encryption bit, they don't race each other when flushed
> -	 * and potentially end up with the wrong entry being committed to
> -	 * memory.
> -	 *
> -	 * Test the CPUID bit directly because the machine might've cleared
> -	 * X86_FEATURE_SME due to cmdline options.
> +	 * Use wbinvd on processors that the first kernel *could*
> +	 * potentially leave incoherent cachelines.

No need for that comment anymore - people can grep for
CC_ATTR_HOST_MEM_INCOHERENT's definition simply.

>  	 */
> -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> +	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
>  		native_wbinvd();
>  
>  	/*
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 7f72472a34d6..87e4fddab770 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
>  		msr = __rdmsr(MSR_AMD64_SYSCFG);
>  		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
>  			return;
> +
> +		/*
> +		 * Always set CC vendor when the platform has SME enabled
> +		 * regardless whether the kernel will actually activates the
> +		 * SME or not.  This reports the CC_ATTR_HOST_MEM_INCOHERENT
> +		 * being true as long as the platform has SME enabled so that
> +		 * stop_this_cpu() can do necessary WBINVD during kexec().
> +		 */
> +		cc_vendor = CC_VENDOR_AMD;
>  	} else {
>  		/* SEV state cannot be controlled by a command line option */
>  		sme_me_mask = me_mask;
> +		cc_vendor = CC_VENDOR_AMD;
>  		goto out;
>  	}
>  

So you can't put it before the if - just slap it in both branches. Geez!

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 0166ab1780cc..1e4566cc233f 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -549,6 +549,8 @@ void __init sme_enable(struct boot_params *bp)
        if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
                snp_abort();
 
+       cc_vendor = CC_VENDOR_AMD;
+
        /* Check if memory encryption is enabled */
        if (feature_mask == AMD_SME_BIT) {
                /*
@@ -597,6 +599,5 @@ void __init sme_enable(struct boot_params *bp)
 out:
        RIP_REL_REF(sme_me_mask) = me_mask;
        physical_mask &= ~me_mask;
-       cc_vendor = CC_VENDOR_AMD;
        cc_set_mask(me_mask);
 }

Btw, pls do your patches ontop of tip/master as other patches in there
are touching this file.

> @@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
>  out:
>  	if (sme_me_mask) {
>  		physical_mask &= ~sme_me_mask;
> -		cc_vendor = CC_VENDOR_AMD;
>  		cc_set_mask(sme_me_mask);
>  	}
>  }
> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> index cb0d6cd1c12f..2f7273596102 100644
> --- a/include/linux/cc_platform.h
> +++ b/include/linux/cc_platform.h
> @@ -42,6 +42,21 @@ enum cc_attr {
>  	 */
>  	CC_ATTR_HOST_MEM_ENCRYPT,
>  

This goes to the end of the enum.

> +	/**
> +	 * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
> +	 * incoherent

"...can leave caches in an incoherent state."

> +	 *
> +	 * The platform/OS is running as a bare-metal system or a hypervisor.
> +	 * The memory encryption engine might have left non-cache-coherent
> +	 * data in the caches that needs to be flushed.
> +	 *
> +	 * Use this in places where the cache coherency of the memory matters
> +	 * but the encryption status does not.
> +	 *
> +	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.

If that is the case, why do you even need a new CC_ATTR define?

Might as well do:

	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
		native_wbinvd();

?

> +	 */
> +	CC_ATTR_HOST_MEM_INCOHERENT,
> +


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-19 16:16   ` Borislav Petkov
@ 2024-02-19 19:45     ` Tom Lendacky
  2024-02-19 20:32       ` Borislav Petkov
  2024-02-20  3:12     ` Huang, Kai
  1 sibling, 1 reply; 42+ messages in thread
From: Tom Lendacky @ 2024-02-19 19:45 UTC (permalink / raw)
  To: Borislav Petkov, Huang, Kai
  Cc: linux-kernel, x86, dave.hansen, kirill.shutemov, tglx, mingo,
	hpa, luto, peterz, chao.gao, bhe, nik.borisov, pbonzini

On 2/19/24 10:16, Borislav Petkov wrote:
> On Wed, Jan 31, 2024 at 11:31:53AM +0000, Huang, Kai wrote:
>> From: Kai Huang <kai.huang@intel.com>
>>
>> Currently on AMD SME platforms, during kexec() caches are flushed
>> manually before jumping to the new kernel due to memory encryption.
>> Intel TDX needs to flush cachelines of TDX private memory before
>> jumping to the second kernel too, otherwise they may silently corrupt
>> the new kernel.
>>
>> Instead of sprinkling both AMD and Intel's specific checks around,
>> introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both
>> Intel and AMD, and simplify the logic:
>>
>> 	Could the old kernel leave incoherent caches around?
> 
> 	"Is it possible that the kernel could leave caches in incoherent state?"
> 
>> 	If so, do WBINVD.
>>
>> Convert the AMD SME to use this new CC attribute.
> 
>> A later patch will
>> utilize this new attribute for Intel TDX too.
> 
> Yeah, once those are all in git, the concept of "subsequent patch"
> becomes ambiguous depending on how you're sorting them. So try to read
> that commit message out of the context of all those "subsequent patches"
> and see if it still makes sense.
> 
> IOW, you should strive for your commit messages to make sense on their
> own, without referencing other patches.
> 
> In this particular case, that "later patch" can go.
> 
>> Specifically, AMD SME flushes caches at two places: 1) stop_this_cpu();
>> 2) relocate_kernel().  stop_this_cpu() checks the CPUID directly to do
>> WBINVD for the reason that the current kernel's SME enabling status may
>> not match the new kernel's choice.  However the relocate_kernel() only
>> does the WBINVD when the current kernel has enabled SME for the reason
>> that the new kernel is always placed in an "unencrypted" area.
>>
>> To simplify the logic, for AMD SME change to always use the way that is
>> done in stop_this_cpu().  This will cause an additional WBINVD in
>> relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
>> disabled by kernel command line), but this is acceptable for the sake of
>> having less complicated code (see [1] for the relevant discussion).
>>
>> Note currently the kernel only advertises CC vendor for AMD SME when SME
>> is actually enabled by the kernel.  To always advertise the new
>> CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
>> status, change to set CC vendor as long as the hardware has enabled SME.
>>
>> Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
>> enabled SME" is still different from "checking the CPUID" (the way that
>> is done in stop_this_cpu()), but technically the former also serves the
>> purpose and is actually more accurate.
>>
>> Such change allows sme_me_mask to be 0 while CC vendor reports as AMD.
>> But this doesn't impact other CC attributes on AMD platforms, nor does
>> it impact the cc_mkdec()/cc_mkenc().
>>
>> [1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/
>>
>> Suggested-by: Dave Hansen <dave.hansen@intel.com>
>> Signed-off-by: Kai Huang <kai.huang@intel.com>
>> ---
>>   arch/x86/coco/core.c               | 13 +++++++++++++
>>   arch/x86/kernel/machine_kexec_64.c |  2 +-
>>   arch/x86/kernel/process.c          | 14 +++-----------
>>   arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
>>   include/linux/cc_platform.h        | 15 +++++++++++++++
>>   5 files changed, 42 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
>> index eeec9986570e..8d6d727e6e18 100644
>> --- a/arch/x86/coco/core.c
>> +++ b/arch/x86/coco/core.c
>> @@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>>   	case CC_ATTR_HOST_MEM_ENCRYPT:
>>   		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
>>   
>> +	case CC_ATTR_HOST_MEM_INCOHERENT:
>> +		/*
>> +		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
>> +		 * enabled on the platform regardless whether the kernel
>> +		 * has actually enabled the SME.
>> +
> 
> "represents whether SME has [been] enabled ... regardless whether the
> kernel has enabled SME"?!?
> 
> I think this needs to be:
> 
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index d07be9d05cd0..e3653465e532 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -67,6 +67,13 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>   
>          switch (attr) {
>          case CC_ATTR_MEM_ENCRYPT:
> +
> +               /*
> +                * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> +                * enabled on the platform regardless whether the kernel
> +                * has actually enabled the SME.
> +                */
> +       case CC_ATTR_HOST_MEM_INCOHERENT:

This change won't return the correct answer. The check needs to remain 
against the sev_status value.

>                  return sme_me_mask;
>   
>          case CC_ATTR_HOST_MEM_ENCRYPT:
> 
> 
>> +		return !(sev_status & MSR_AMD64_SEV_ENABLED);
>> +
>> +	/*
>> +	 * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
>> +	 * as it must be true when there's any SEV enable bit set in
>> +	 * sev_status.
>> +	 */
> 
> Superfluous comment.
> 
>>   	case CC_ATTR_GUEST_MEM_ENCRYPT:
>>   		return sev_status & MSR_AMD64_SEV_ENABLED;
>>   
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index bc0a5348b4a6..c9c6974e2e9c 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
>>   				       (unsigned long)page_list,
>>   				       image->start,
>>   				       image->preserve_context,
>> -				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
>> +				       cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
>>   
>>   #ifdef CONFIG_KEXEC_JUMP
>>   	if (image->preserve_context)
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index ab49ade31b0d..2c7e8d9889c0 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
>>   	mcheck_cpu_clear(c);
>>   
>>   	/*
>> -	 * Use wbinvd on processors that support SME. This provides support
>> -	 * for performing a successful kexec when going from SME inactive
>> -	 * to SME active (or vice-versa). The cache must be cleared so that
>> -	 * if there are entries with the same physical address, both with and
>> -	 * without the encryption bit, they don't race each other when flushed
>> -	 * and potentially end up with the wrong entry being committed to
>> -	 * memory.
>> -	 *
>> -	 * Test the CPUID bit directly because the machine might've cleared
>> -	 * X86_FEATURE_SME due to cmdline options.
>> +	 * Use wbinvd on processors that the first kernel *could*
>> +	 * potentially leave incoherent cachelines.
> 
> No need for that comment anymore - people can grep for
> CC_ATTR_HOST_MEM_INCOHERENT's definition simply.
> 
>>   	 */
>> -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
>> +	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
>>   		native_wbinvd();
>>   
>>   	/*
>> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
>> index 7f72472a34d6..87e4fddab770 100644
>> --- a/arch/x86/mm/mem_encrypt_identity.c
>> +++ b/arch/x86/mm/mem_encrypt_identity.c
>> @@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
>>   		msr = __rdmsr(MSR_AMD64_SYSCFG);
>>   		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
>>   			return;
>> +
>> +		/*
>> +		 * Always set CC vendor when the platform has SME enabled
>> +		 * regardless whether the kernel will actually activates the
>> +		 * SME or not.  This reports the CC_ATTR_HOST_MEM_INCOHERENT
>> +		 * being true as long as the platform has SME enabled so that
>> +		 * stop_this_cpu() can do necessary WBINVD during kexec().
>> +		 */
>> +		cc_vendor = CC_VENDOR_AMD;
>>   	} else {
>>   		/* SEV state cannot be controlled by a command line option */
>>   		sme_me_mask = me_mask;
>> +		cc_vendor = CC_VENDOR_AMD;
>>   		goto out;
>>   	}
>>   
> 
> So you can't put it before the if - just slap it in both branches. Geez!

I think that will still work because sme_me_mask and sev_status will both 
be 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't 
evaluate to true. However, that will cause any platform that hasn't 
enabled memory encryption (see SYS_CFG MSR) to also perform the WBINVD.

The idea looks like it was to keep existing behavior where cc_vendor 
wasn't set if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' is false.

> 
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 0166ab1780cc..1e4566cc233f 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -549,6 +549,8 @@ void __init sme_enable(struct boot_params *bp)
>          if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
>                  snp_abort();
>   
> +       cc_vendor = CC_VENDOR_AMD;
> +
>          /* Check if memory encryption is enabled */
>          if (feature_mask == AMD_SME_BIT) {
>                  /*
> @@ -597,6 +599,5 @@ void __init sme_enable(struct boot_params *bp)
>   out:
>          RIP_REL_REF(sme_me_mask) = me_mask;
>          physical_mask &= ~me_mask;
> -       cc_vendor = CC_VENDOR_AMD;
>          cc_set_mask(me_mask);
>   }
> 
> Btw, pls do your patches ontop of tip/master as other patches in there
> are touching this file.
> 
>> @@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
>>   out:
>>   	if (sme_me_mask) {
>>   		physical_mask &= ~sme_me_mask;
>> -		cc_vendor = CC_VENDOR_AMD;
>>   		cc_set_mask(sme_me_mask);
>>   	}
>>   }
>> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
>> index cb0d6cd1c12f..2f7273596102 100644
>> --- a/include/linux/cc_platform.h
>> +++ b/include/linux/cc_platform.h
>> @@ -42,6 +42,21 @@ enum cc_attr {
>>   	 */
>>   	CC_ATTR_HOST_MEM_ENCRYPT,
>>   
> 
> This goes to the end of the enum.
> 
>> +	/**
>> +	 * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
>> +	 * incoherent
> 
> "...can leave caches in an incoherent state."
> 
>> +	 *
>> +	 * The platform/OS is running as a bare-metal system or a hypervisor.
>> +	 * The memory encryption engine might have left non-cache-coherent
>> +	 * data in the caches that needs to be flushed.
>> +	 *
>> +	 * Use this in places where the cache coherency of the memory matters
>> +	 * but the encryption status does not.
>> +	 *
>> +	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
> 
> If that is the case, why do you even need a new CC_ATTR define?
> 
> Might as well do:
> 
> 	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> 		native_wbinvd();

That won't work, because the current system may not have SME active. The 
cases that need to be caught are kexec'ing from a mem_encrypt=off to a 
mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.

For the first case, you need to determine if the system is SME capable, 
because cc_platform_has(CC_ATTR_MEM_ENCRYPT) will return false if SME is 
not active.

Thanks,
Tom

> 
> ?
> 
>> +	 */
>> +	CC_ATTR_HOST_MEM_INCOHERENT,
>> +
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-19 19:45     ` Tom Lendacky
@ 2024-02-19 20:32       ` Borislav Petkov
  2024-02-19 22:09         ` Tom Lendacky
  0 siblings, 1 reply; 42+ messages in thread
From: Borislav Petkov @ 2024-02-19 20:32 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Huang, Kai, linux-kernel, x86, dave.hansen, kirill.shutemov,
	tglx, mingo, hpa, luto, peterz, chao.gao, bhe, nik.borisov,
	pbonzini

On Mon, Feb 19, 2024 at 01:45:37PM -0600, Tom Lendacky wrote:
> This change won't return the correct answer. The check needs to remain
> against the sev_status value.

Feel free to explain because this patch is confusing me.

> > So you can't put it before the if - just slap it in both branches. Geez!
> 
> I think that will still work because sme_me_mask and sev_status will both be
> 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't evaluate to
> true. However, that will cause any platform that hasn't enabled memory
> encryption (see SYS_CFG MSR), to also perform the WBINVD.

If it keeps the code simpler I don't mind. That's so not a fast path.

> That won't work, because the current system may not have SME active. The
> cases that need to be caught are kexec'ing from a mem_encrypt=off to a
> mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.

And I'm saying, we should keep it simple and simply WBINVD on SME
capable machines, regardless of the encryption setting.

Any downsides to that which are actually real and noticeable?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-19 20:32       ` Borislav Petkov
@ 2024-02-19 22:09         ` Tom Lendacky
  2024-02-20  2:57           ` Huang, Kai
  2024-02-20 14:28           ` Borislav Petkov
  0 siblings, 2 replies; 42+ messages in thread
From: Tom Lendacky @ 2024-02-19 22:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang, Kai, linux-kernel, x86, dave.hansen, kirill.shutemov,
	tglx, mingo, hpa, luto, peterz, chao.gao, bhe, nik.borisov,
	pbonzini

On 2/19/24 14:32, Borislav Petkov wrote:
> On Mon, Feb 19, 2024 at 01:45:37PM -0600, Tom Lendacky wrote:
>> This change won't return the correct answer. The check needs to remain
>> against the sev_status value.
> 
> Feel free to explain because this patch is confusing me.

In your previous email, you want to put the CC_ATTR_HOST_MEM_INCOHERENT 
case statement with the CC_ATTR_MEM_ENCRYPT case which is returning 
sme_me_mask. That will be zero/false if SME is not active, skipping the 
WBINVD. But, in reality you still need to perform the WBINVD in case the 
kexec target is doing mem_encrypt=on.

That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here. 
Basically, if you are bare-metal, it will return true. And it will only 
return true for machines that support SME and have the 
MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the 
'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the 
'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have 
the WBINVD called for any machine that supports SME, even if SME is not 
possible because the proper bit in the SYS_CFG MSR hasn't been set.

I know what I'm trying to say, let me know if it is making sense...

> 
>>> So you can't put it before the if - just slap it in both branches. Geez!
>>
>> I think that will still work because sme_me_mask and sev_status will both be
>> 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't evaluate to
>> true. However, that will cause any platform that hasn't enabled memory
>> encryption (see SYS_CFG MSR), to also perform the WBINVD.
> 
> If it keeps the code simpler I don't mind. That's so not a fast path.
> 
>> That won't work, because the current system may not have SME active. The
>> cases that need to be caught are kexec'ing from a mem_encrypt=off to a
>> mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.
> 
> And I'm saying, we should keep it simple and simply WBINVD on SME
> capable machines, regardless of the encryption setting.

In that case, CC_ATTR_HOST_MEM_INCOHERENT needs to be separate from 
CC_ATTR_MEM_ENCRYPT as the original patch has it. The comment might make 
more sense as:

	 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME is possible
	 * on the platform, regardless of whether mem_encrypt=on has been
	 * used to make SME active.

Thanks,
Tom

> 
> Any downsides to that which are actually real and noticeable?
> 
> Thx.
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-19 22:09         ` Tom Lendacky
@ 2024-02-20  2:57           ` Huang, Kai
  2024-02-20 14:40             ` Tom Lendacky
  2024-02-20 14:28           ` Borislav Petkov
  1 sibling, 1 reply; 42+ messages in thread
From: Huang, Kai @ 2024-02-20  2:57 UTC (permalink / raw)
  To: thomas.lendacky, bp
  Cc: Gao, Chao, Hansen, Dave, luto, x86, peterz, hpa, mingo,
	kirill.shutemov, tglx, linux-kernel, pbonzini, nik.borisov, bhe

On Mon, 2024-02-19 at 16:09 -0600, Tom Lendacky wrote:
> On 2/19/24 14:32, Borislav Petkov wrote:
> > On Mon, Feb 19, 2024 at 01:45:37PM -0600, Tom Lendacky wrote:
> > > This change won't return the correct answer. The check needs to remain
> > > against the sev_status value.
> > 
> > Feel free to explain because this patch is confusing me.
> 
> In your previous email, you want to put the CC_ATTR_HOST_MEM_INCOHERENT 
> case statement with the CC_ATTR_MEM_ENCRYPT case which is returning 
> sme_me_mask. That will be zero/false if SME is not active, skipping the 
> WBINVD. But, in reality you still need to perform the WBINVD in case the 
> kexec target is doing mem_encrypt=on.
> 
> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here. 
> Basically, if you are bare-metal, it will return true. And it will only 
> return true for machines that support SME and have the 
> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the 
> 'cc_vendor = CC_VENDOR_AMD' assignment is. 
> 

[...]

> However, if you move the 
> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have 
> the WBINVD called for any machine that supports SME, even if SME is not 
> possible because the proper bit in the SYS_CFG MSR hasn't been set.

Hi Tom,

Thanks for clarifying.  However, it seems to me that this is already the
behaviour of the current upstream code.  stop_this_cpu() checks the CPUID
directly without checking the SYS_CFG MSR:

        if (c->extended_cpuid_level >= 0x8000001f &&
            (cpuid_eax(0x8000001f) & BIT(0)))
                native_wbinvd();

I believe BIT(0) of that CPUID leaf, which is "Secure Memory Encryption
Support", will always be true regardless of whether MSR_AMD64_SYSCFG_MEM_ENCRYPT
is set in the SYS_CFG MSR?

If so, IIUC, moving the 'cc_vendor = CC_VENDOR_AMD' to the place right before
the if statement as suggested by Boris simply follows the current behaviour of
the upstream code.

Of course we need to always return true for CC_ATTR_HOST_MEM_INCOHERENT rather
than querying sme_me_mask.
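To make the distinction concrete, here is a toy userspace model of the two
checks under discussion.  The function names, the parameters, and the values
passed in are hypothetical stand-ins for the kernel's sme_me_mask and sev_status
globals; nothing here reads real hardware state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_AMD64_SEV_ENABLED 0x1ULL

/*
 * CC_ATTR_MEM_ENCRYPT as in the existing code: true only when the
 * kernel actually activated SME (sme_me_mask is non-zero).
 */
static bool attr_mem_encrypt(uint64_t sme_me_mask)
{
	return sme_me_mask != 0;
}

/*
 * CC_ATTR_HOST_MEM_INCOHERENT as in the patch: true on any bare-metal
 * host (no SEV bit set in sev_status), even when mem_encrypt=off left
 * sme_me_mask at zero.
 */
static bool attr_host_mem_incoherent(uint64_t sev_status)
{
	return !(sev_status & MSR_AMD64_SEV_ENABLED);
}
```

On a bare-metal SME-capable box booted with mem_encrypt=off, both globals are
zero: attr_mem_encrypt() is false (the WBINVD would be skipped), while
attr_host_mem_incoherent() is still true, which is exactly the gap being
discussed.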

> 
> I know what I'm trying to say, let me know if it is making sense...
> 
> > 
> > > > So you can't put it before the if - just slap it in both branches. Geez!
> > > 
> > > I think that will still work because sme_me_mask and sev_status will both be
> > > 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't evaluate to
> > > true. However, that will cause any platform that hasn't enabled memory
> > > encryption (see SYS_CFG MSR), to also perform the WBINVD.
> > 
> > If it keeps the code simpler I don't mind. That's so not a fast path.
> > 
> > > That won't work, because the current system may not have SME active. The
> > > cases that needs to be caught are kexec'ing from a mem_encrypt=off to a
> > > mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.
> > 
> > And I'm saying, we should keep it simple and simply WBINVD on SME
> > capable machines, regardless of the encryption setting.
> 
> In that case, CC_ATTR_HOST_MEM_INCOHERENT needs to be separate from 
> CC_ATTR_MEM_ENCRYPT as the original patch has it. The comment might make 
> more sense as:
> 
> 	 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME is possible
> 	 * on the platform, regardless of whether mem_encrypt=on has been
> 	 * used to make SME active.
> 
> Thanks,
> Tom

This looks good to me.  Thanks!



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-19 16:16   ` Borislav Petkov
  2024-02-19 19:45     ` Tom Lendacky
@ 2024-02-20  3:12     ` Huang, Kai
  1 sibling, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-02-20  3:12 UTC (permalink / raw)
  To: bp
  Cc: Gao, Chao, Hansen, Dave, luto, x86, pbonzini, peterz, hpa, mingo,
	kirill.shutemov, tglx, linux-kernel, thomas.lendacky,
	nik.borisov, bhe

On Mon, 2024-02-19 at 17:16 +0100, Borislav Petkov wrote:
> On Wed, Jan 31, 2024 at 11:31:53AM +0000, Huang, Kai wrote:
> > From: Kai Huang <kai.huang@intel.com>
> > 
> > Currently on AMD SME platforms, during kexec() caches are flushed
> > manually before jumping to the new kernel due to memory encryption.
> > Intel TDX needs to flush cachelines of TDX private memory before
> > jumping to the second kernel too, otherwise they may silently corrupt
> > the new kernel.
> > 
> > Instead of sprinkling both AMD and Intel's specific checks around,
> > introduce a new CC_ATTR_HOST_MEM_INCOHERENT attribute to unify both
> > Intel and AMD, and simplify the logic:
> > 
> > 	Could the old kernel leave incoherent caches around?
> 
> 	"Is it possible that the kernel could leave caches in incoherent state?"

Will do.  Thanks.

> 
> > 	If so, do WBINVD.
> > 
> > Convert the AMD SME to use this new CC attribute.
> 
> > A later patch will
> > utilize this new attribute for Intel TDX too.
> 
> Yeah, once those are all in git, the concept of "subsequent patch"
> becomes ambiguous depending on how you're sorting them. So try to read
> that commit message out of the context of all those "subsequent patches"
> and see if it still makes sense.

Right.  Makes sense.

> 
> IOW, you should strive for your commit messages to make sense on their
> own, without referencing other patches.
> 
> In this particular case, that "later patch" can go.

Will remove the "later patch" sentence.  Thanks.

> 
> > Specifically, AMD SME flushes caches at two places: 1) stop_this_cpu();
> > 2) relocate_kernel().  stop_this_cpu() checks the CPUID directly to do
> > WBINVD for the reason that the current kernel's SME enabling status may
> > not match the new kernel's choice.  However the relocate_kernel() only
> > does the WBINVD when the current kernel has enabled SME for the reason
> > that the new kernel is always placed in an "unencrypted" area.
> > 
> > To simplify the logic, for AMD SME change to always use the way that is
> > done in stop_this_cpu().  This will cause an additional WBINVD in
> > relocate_kernel() when the current kernel hasn't enabled SME (e.g.,
> > disabled by kernel command line), but this is acceptable for the sake of
> > having less complicated code (see [1] for the relevant discussion).
> > 
> > Note currently the kernel only advertises CC vendor for AMD SME when SME
> > is actually enabled by the kernel.  To always advertise the new
> > CC_ATTR_HOST_MEM_INCOHERENT regardless of the kernel's SME enabling
> > status, change to set CC vendor as long as the hardware has enabled SME.
> > 
> > Note "advertising CC_ATTR_HOST_MEM_INCOHERENT when the hardware has
> > enabled SME" is still different from "checking the CPUID" (the way that
> > is done in stop_this_cpu()), but technically the former also serves the
> > purpose and is actually more accurate.
> > 
> > Such change allows sme_me_mask to be 0 while CC vendor reports as AMD.
> > But this doesn't impact other CC attributes on AMD platforms, nor does
> > it impact the cc_mkdec()/cc_mkenc().
> > 
> > [1] https://lore.kernel.org/lkml/cbc9c527-17e5-4a63-80fe-85451394cc7c@amd.com/
> > 
> > Suggested-by: Dave Hansen <dave.hansen@intel.com>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >  arch/x86/coco/core.c               | 13 +++++++++++++
> >  arch/x86/kernel/machine_kexec_64.c |  2 +-
> >  arch/x86/kernel/process.c          | 14 +++-----------
> >  arch/x86/mm/mem_encrypt_identity.c | 11 ++++++++++-
> >  include/linux/cc_platform.h        | 15 +++++++++++++++
> >  5 files changed, 42 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > index eeec9986570e..8d6d727e6e18 100644
> > --- a/arch/x86/coco/core.c
> > +++ b/arch/x86/coco/core.c
> > @@ -72,6 +72,19 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
> >  	case CC_ATTR_HOST_MEM_ENCRYPT:
> >  		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
> >  
> > +	case CC_ATTR_HOST_MEM_INCOHERENT:
> > +		/*
> > +		 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> > +		 * enabled on the platform regardless whether the kernel
> > +		 * has actually enabled the SME.
> > +
> 
> "represents whether SME has [been] enabled ... regardless whether the
> kernel has enabled SME"?!?
> 
> I think this needs to be:
> 
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index d07be9d05cd0..e3653465e532 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -67,6 +67,13 @@ static bool noinstr amd_cc_platform_has(enum cc_attr attr)
>  
>         switch (attr) {
>         case CC_ATTR_MEM_ENCRYPT:
> +
> +               /*
> +                * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME has
> +                * enabled on the platform regardless whether the kernel
> +                * has actually enabled the SME.
> +                */
> +       case CC_ATTR_HOST_MEM_INCOHERENT:
>                 return sme_me_mask;

As Tom pointed out, this will return false when the kernel doesn't activate SME
due to the command line, and the WBINVD won't be done.

>  
>         case CC_ATTR_HOST_MEM_ENCRYPT:
> 
> 
> > +		return !(sev_status & MSR_AMD64_SEV_ENABLED);
> > +
> > +	/*
> > +	 * For all CC_ATTR_GUEST_* there's no need to check sme_me_mask
> > +	 * as it must be true when there's any SEV enable bit set in
> > +	 * sev_status.
> > +	 */
> 
> Superfluous comment.

Will remove.

> 
> >  	case CC_ATTR_GUEST_MEM_ENCRYPT:
> >  		return sev_status & MSR_AMD64_SEV_ENABLED;
> >  
> > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> > index bc0a5348b4a6..c9c6974e2e9c 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -358,7 +358,7 @@ void machine_kexec(struct kimage *image)
> >  				       (unsigned long)page_list,
> >  				       image->start,
> >  				       image->preserve_context,
> > -				       cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
> > +				       cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT));
> >  
> >  #ifdef CONFIG_KEXEC_JUMP
> >  	if (image->preserve_context)
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index ab49ade31b0d..2c7e8d9889c0 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -813,18 +813,10 @@ void __noreturn stop_this_cpu(void *dummy)
> >  	mcheck_cpu_clear(c);
> >  
> >  	/*
> > -	 * Use wbinvd on processors that support SME. This provides support
> > -	 * for performing a successful kexec when going from SME inactive
> > -	 * to SME active (or vice-versa). The cache must be cleared so that
> > -	 * if there are entries with the same physical address, both with and
> > -	 * without the encryption bit, they don't race each other when flushed
> > -	 * and potentially end up with the wrong entry being committed to
> > -	 * memory.
> > -	 *
> > -	 * Test the CPUID bit directly because the machine might've cleared
> > -	 * X86_FEATURE_SME due to cmdline options.
> > +	 * Use wbinvd on processors that the first kernel *could*
> > +	 * potentially leave incoherent cachelines.
> 
> No need for that comment anymore - people can grep for
> CC_ATTR_HOST_MEM_INCOHERENT's definition simply.

Will remove.  Thanks.

> 
> >  	 */
> > -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> > +	if (cc_platform_has(CC_ATTR_HOST_MEM_INCOHERENT))
> >  		native_wbinvd();
> >  
> >  	/*
> > diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> > index 7f72472a34d6..87e4fddab770 100644
> > --- a/arch/x86/mm/mem_encrypt_identity.c
> > +++ b/arch/x86/mm/mem_encrypt_identity.c
> > @@ -570,9 +570,19 @@ void __init sme_enable(struct boot_params *bp)
> >  		msr = __rdmsr(MSR_AMD64_SYSCFG);
> >  		if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
> >  			return;
> > +
> > +		/*
> > +		 * Always set CC vendor when the platform has SME enabled
> > +		 * regardless whether the kernel will actually activates the
> > +		 * SME or not.  This reports the CC_ATTR_HOST_MEM_INCOHERENT
> > +		 * being true as long as the platform has SME enabled so that
> > +		 * stop_this_cpu() can do necessary WBINVD during kexec().
> > +		 */
> > +		cc_vendor = CC_VENDOR_AMD;
> >  	} else {
> >  		/* SEV state cannot be controlled by a command line option */
> >  		sme_me_mask = me_mask;
> > +		cc_vendor = CC_VENDOR_AMD;
> >  		goto out;
> >  	}
> >  
> 
> So you can't put it before the if - just slap it in both branches. Geez!

I think we can.  Please see my reply to Tom's email.

> 
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 0166ab1780cc..1e4566cc233f 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -549,6 +549,8 @@ void __init sme_enable(struct boot_params *bp)
>         if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
>                 snp_abort();
>  
> +       cc_vendor = CC_VENDOR_AMD;
> +
>         /* Check if memory encryption is enabled */
>         if (feature_mask == AMD_SME_BIT) {
>                 /*
> @@ -597,6 +599,5 @@ void __init sme_enable(struct boot_params *bp)
>  out:
>         RIP_REL_REF(sme_me_mask) = me_mask;
>         physical_mask &= ~me_mask;
> -       cc_vendor = CC_VENDOR_AMD;
>         cc_set_mask(me_mask);
>  }
> 
> Btw, pls do your patches ontop of tip/master as other patches in there
> are touching this file.

Yeah, will do.  I believe this series was generated against tip/master, but
that was 3 weeks ago.

> 
> > @@ -608,7 +618,6 @@ void __init sme_enable(struct boot_params *bp)
> >  out:
> >  	if (sme_me_mask) {
> >  		physical_mask &= ~sme_me_mask;
> > -		cc_vendor = CC_VENDOR_AMD;
> >  		cc_set_mask(sme_me_mask);
> >  	}
> >  }
> > diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> > index cb0d6cd1c12f..2f7273596102 100644
> > --- a/include/linux/cc_platform.h
> > +++ b/include/linux/cc_platform.h
> > @@ -42,6 +42,21 @@ enum cc_attr {
> >  	 */
> >  	CC_ATTR_HOST_MEM_ENCRYPT,
> >  
> 
> This goes to the end of the enum.
> 
> > +	/**
> > +	 * @CC_ATTR_HOST_MEM_INCOHERENT: Host memory encryption can be
> > +	 * incoherent
> 
> "...can leave caches in an incoherent state."

Will do.  And I'll move this CC_ATTR_HOST_MEM_INCOHERENT to the end of the enum
if I understand you correctly.

> 
> > +	 *
> > +	 * The platform/OS is running as a bare-metal system or a hypervisor.
> > +	 * The memory encryption engine might have left non-cache-coherent
> > +	 * data in the caches that needs to be flushed.
> > +	 *
> > +	 * Use this in places where the cache coherency of the memory matters
> > +	 * but the encryption status does not.
> > +	 *
> > +	 * Includes all systems that set CC_ATTR_HOST_MEM_ENCRYPT.
> 
> If that is the case, why do you even need a new CC_ATTR define?
> 
> Might as well do:
> 
> 	if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> 		native_wbinvd();
> 
> ?

As Tom pointed out, using CC_ATTR_MEM_ENCRYPT will miss the WBINVD when the
kernel disables SME via the command line.

> 
> > +	 */
> > +	CC_ATTR_HOST_MEM_INCOHERENT,
> > +
> 
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-19 22:09         ` Tom Lendacky
  2024-02-20  2:57           ` Huang, Kai
@ 2024-02-20 14:28           ` Borislav Petkov
  2024-02-20 14:47             ` Tom Lendacky
  1 sibling, 1 reply; 42+ messages in thread
From: Borislav Petkov @ 2024-02-20 14:28 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Huang, Kai, linux-kernel, x86, dave.hansen, kirill.shutemov,
	tglx, mingo, hpa, luto, peterz, chao.gao, bhe, nik.borisov,
	pbonzini

On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.

I would've never figured that out just from staring at the test. :-\

> Basically, if you are bare-metal, it will return true. And it will only
> return true for machines that support SME and have the
> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
> 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
> the WBINVD called for any machine that supports SME, even if SME is not
> possible because the proper bit in the SYS_CFG MSR hasn't been set.
> 
> I know what I'm trying to say, let me know if it is making sense...

Yah, thanks for taking the time to explain.

Here's an even more radical idea:

Why not do WBINVD *unconditionally* on the CPU down path?

- it is the opposite of a fast path, i.e., no one cares

- it'll take care of every possible configuration without ugly logic

- it wouldn't hurt to have the caches nice and coherent before going
  down

Hmmm.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-20  2:57           ` Huang, Kai
@ 2024-02-20 14:40             ` Tom Lendacky
  0 siblings, 0 replies; 42+ messages in thread
From: Tom Lendacky @ 2024-02-20 14:40 UTC (permalink / raw)
  To: Huang, Kai, bp
  Cc: Gao, Chao, Hansen, Dave, luto, x86, peterz, hpa, mingo,
	kirill.shutemov, tglx, linux-kernel, pbonzini, nik.borisov, bhe

On 2/19/24 20:57, Huang, Kai wrote:
> On Mon, 2024-02-19 at 16:09 -0600, Tom Lendacky wrote:
>> On 2/19/24 14:32, Borislav Petkov wrote:
>>> On Mon, Feb 19, 2024 at 01:45:37PM -0600, Tom Lendacky wrote:
>>>> This change won't return the correct answer. The check needs to remain
>>>> against the sev_status value.
>>>
>>> Feel free to explain because this patch is confusing me.
>>
>> In your previous email, you want to put the CC_ATTR_HOST_MEM_INCOHERENT
>> case statement with the CC_ATTR_MEM_ENCRYPT case which is returning
>> sme_me_mask. That will be zero/false if SME is not active, skipping the
>> WBINVD. But, in reality you still need to perform the WBINVD in case the
>> kexec target is doing mem_encrypt=on.
>>
>> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
>> Basically, if you are bare-metal, it will return true. And it will only
>> return true for machines that support SME and have the
>> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
>> 'cc_vendor = CC_VENDOR_AMD' assignment is.
>>
> 
> [...]
> 
>> However, if you move the
>> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
>> the WBINVD called for any machine that supports SME, even if SME is not
>> possible because the proper bit in the SYS_CFG MSR hasn't been set.
> 
> Hi Tom,
> 
> Thanks for clarifying.  However it seems to me that this is the behaviour in the
> current upstream code.  The stop_this_cpu() checks CPUID directly w/o checking
> the SYS_CFG MSR:

Correct, it will match the upstream behavior this way. It would have been 
improved slightly with your original patch by avoiding the WBINVD if the 
MSR_AMD64_SYSCFG_MEM_ENCRYPT wasn't set.

> 
>          if (c->extended_cpuid_level >= 0x8000001f &&
> 			(cpuid_eax(0x8000001f) & BIT(0)))
>                  native_wbinvd();
> 
> I believe the BIT(0) in CPUID, which is "Secure Memory Encryption Support", will
> always be true regardless of whether the MSR_AMD64_SYSCFG_MEM_ENCRYPT is set in
> SYS_CFG MSR?
> 
> If so, IIUC moving the 'cc_vendor = CC_VENDOR_AMD' to the place right before the
> if statement as suggested by Boris seems just follows the current behaviour in
> the upstream code.

Yep, that's how I see it, too.

Thanks,
Tom

> 
> Of course we need to always return true for CC_ATTR_HOST_MEM_INCOHERENT but not
> querying sme_me_mask.
> 
>>
>> I know what I'm trying to say, let me know if it is making sense...
>>
>>>
>>>>> So you can't put it before the if - just slap it in both branches. Geez!
>>>>
>>>> I think that will still work because sme_me_mask and sev_status will both be
>>>> 0 on bare-metal if 'msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT' doesn't evaluate to
>>>> true. However, that will cause any platform that hasn't enabled memory
>>>> encryption (see SYS_CFG MSR), to also perform the WBINVD.
>>>
>>> If it keeps the code simpler I don't mind. That's so not a fast path.
>>>
>>>> That won't work, because the current system may not have SME active. The
>>>> cases that needs to be caught are kexec'ing from a mem_encrypt=off to a
>>>> mem_encrypt=on or from a mem_encrypt=on to a mem_encrypt=off.
>>>
>>> And I'm saying, we should keep it simple and simply WBINVD on SME
>>> capable machines, regardless of the encryption setting.
>>
>> In that case, CC_ATTR_HOST_MEM_INCOHERENT needs to be separate from
>> CC_ATTR_MEM_ENCRYPT as the original patch has it. The comment might make
>> more sense as:
>>
>> 	 * CC_ATTR_HOST_MEM_INCOHERENT represents whether SME is possible
>> 	 * on the platform, regardless of whether mem_encrypt=on has been
>> 	 * used to make SME active.
>>
>> Thanks,
>> Tom
> 
> This looks good to me.  Thanks!
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-20 14:28           ` Borislav Petkov
@ 2024-02-20 14:47             ` Tom Lendacky
  2024-02-20 20:07               ` Huang, Kai
  0 siblings, 1 reply; 42+ messages in thread
From: Tom Lendacky @ 2024-02-20 14:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang, Kai, linux-kernel, x86, dave.hansen, kirill.shutemov,
	tglx, mingo, hpa, luto, peterz, chao.gao, bhe, nik.borisov,
	pbonzini

On 2/20/24 08:28, Borislav Petkov wrote:
> On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
>> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
> 
> I would've never figured that out just from staring at the test. :-\
> 
>> Basically, if you are bare-metal, it will return true. And it will only
>> return true for machines that support SME and have the
>> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
>> 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
>> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
>> the WBINVD called for any machine that supports SME, even if SME is not
>> possible because the proper bit in the SYS_CFG MSR hasn't been set.
>>
>> I know what I'm trying to say, let me know if it is making sense...
> 
> Yah, thanks for taking the time to explain.
> 
> Here's an even more radical idea:
> 
> Why not do WBINVD *unconditionally* on the CPU down path?
> 
> - it is the opposite of a fast path, i.e., no one cares
> 
> - it'll take care of every possible configuration without ugly logic
> 
> - it wouldn't hurt to have the caches nice and coherent before going
>    down
> 
> Hmmm.

That's what I initially did, but errors were reported, see commit:
   f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")

But that may be because of the issue solved by commit:
   1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")

So...

Thanks,
Tom

> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-20 14:47             ` Tom Lendacky
@ 2024-02-20 20:07               ` Huang, Kai
  2024-02-20 22:30                 ` Tom Lendacky
  0 siblings, 1 reply; 42+ messages in thread
From: Huang, Kai @ 2024-02-20 20:07 UTC (permalink / raw)
  To: thomas.lendacky, bp
  Cc: Gao, Chao, Hansen, Dave, luto, x86, peterz, hpa, mingo,
	kirill.shutemov, tglx, linux-kernel, pbonzini, nik.borisov, bhe

On Tue, 2024-02-20 at 08:47 -0600, Tom Lendacky wrote:
> On 2/20/24 08:28, Borislav Petkov wrote:
> > On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
> > > That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
> > 
> > I would've never figured that out just from staring at the test. :-\
> > 
> > > Basically, if you are bare-metal, it will return true. And it will only
> > > return true for machines that support SME and have the
> > > MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
> > > 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
> > > 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
> > > the WBINVD called for any machine that supports SME, even if SME is not
> > > possible because the proper bit in the SYS_CFG MSR hasn't been set.
> > > 
> > > I know what I'm trying to say, let me know if it is making sense...
> > 
> > Yah, thanks for taking the time to explain.
> > 
> > Here's an even more radical idea:
> > 
> > Why not do WBINVD *unconditionally* on the CPU down path?
> > 
> > - it is the opposite of a fast path, i.e., no one cares
> > 
> > - it'll take care of every possible configuration without ugly logic
> > 
> > - it wouldn't hurt to have the caches nice and coherent before going
> >    down
> > 
> > Hmmm.
> 
> That's what I initially did, but errors were reported, see commit:
>    f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")

This changelog only mentions "Some issues".  Do you know exactly what kind of
issues you saw?  Did these issues appear only on SME-enabled systems, or on
non-SME-capable systems too?

Because ...

> 
> But that may be because of the issue solved by commit:
>    1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")

... the issue resolved in this commit seems to be a hang.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-20 20:07               ` Huang, Kai
@ 2024-02-20 22:30                 ` Tom Lendacky
  2024-02-21  1:38                   ` Huang, Kai
  2024-02-21  9:28                   ` Borislav Petkov
  0 siblings, 2 replies; 42+ messages in thread
From: Tom Lendacky @ 2024-02-20 22:30 UTC (permalink / raw)
  To: Huang, Kai, bp
  Cc: Gao, Chao, Hansen, Dave, luto, x86, peterz, hpa, mingo,
	kirill.shutemov, tglx, linux-kernel, pbonzini, nik.borisov, bhe

On 2/20/24 14:07, Huang, Kai wrote:
> On Tue, 2024-02-20 at 08:47 -0600, Tom Lendacky wrote:
>> On 2/20/24 08:28, Borislav Petkov wrote:
>>> On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
>>>> That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
>>>
>>> I would've never figured that out just from staring at the test. :-\
>>>
>>>> Basically, if you are bare-metal, it will return true. And it will only
>>>> return true for machines that support SME and have the
>>>> MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
>>>> 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
>>>> 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
>>>> the WBINVD called for any machine that supports SME, even if SME is not
>>>> possible because the proper bit in the SYS_CFG MSR hasn't been set.
>>>>
>>>> I know what I'm trying to say, let me know if it is making sense...
>>>
>>> Yah, thanks for taking the time to explain.
>>>
>>> Here's an even more radical idea:
>>>
>>> Why not do WBINVD *unconditionally* on the CPU down path?
>>>
>>> - it is the opposite of a fast path, i.e., no one cares
>>>
>>> - it'll take care of every possible configuration without ugly logic
>>>
>>> - it wouldn't hurt to have the caches nice and coherent before going
>>>     down
>>>
>>> Hmmm.
>>
>> That's what I initially did, but errors were reported, see commit:
>>     f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
> 
> This changelog only mentions "Some issues".  Do you know exactly what kind
> issues did you see?  Are these issues only appeared on SME enabled system or
> other non-SME-capable systems too?

I believe the issues were that different Intel systems would hang or reset, and
it was bisected to the commit that added the WBINVD. It was a while ago, but I
remember they were similar to what the 1f5e7eb7868e commit ended up fixing.
That was debugged because the WBINVD was still occasionally issued, resulting
in the following patch:

   9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")

It just means that if we go to an unconditional WBINVD, then we need to be 
careful.
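The care needed is largely the guard that 9b040453d444 enforces: never read an
extended CPUID leaf beyond the CPU's reported maximum.  A minimal sketch of
that check, with the two parameters as hypothetical stand-ins for
c->extended_cpuid_level and cpuid_eax(0x8000001f) (real code executes the CPUID
instruction instead):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the guarded SME CPUID check in stop_this_cpu().
 * max_ext_leaf models c->extended_cpuid_level; leaf_8000001f_eax
 * models the EAX output of CPUID leaf 0x8000001f.
 */
static bool sme_cpuid_wants_wbinvd(uint32_t max_ext_leaf,
				   uint32_t leaf_8000001f_eax)
{
	/* Guard first: leaf 0x8000001f may not exist on this CPU. */
	if (max_ext_leaf < 0x8000001f)
		return false;

	return leaf_8000001f_eax & 0x1;	/* BIT(0): SME supported */
}
```

Without the first check, a CPU whose maximum extended leaf is below 0x8000001f
would have that leaf read anyway, returning undefined data that could falsely
trigger (or suppress) the WBINVD.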

Thanks,
Tom

> 
> Because ...
> 
>>
>> But that may be because of the issue solved by commit:
>>     1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")
> 
> ... the issue resolved in this commit seems to be hang.
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-20 22:30                 ` Tom Lendacky
@ 2024-02-21  1:38                   ` Huang, Kai
  2024-02-21  9:28                   ` Borislav Petkov
  1 sibling, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-02-21  1:38 UTC (permalink / raw)
  To: thomas.lendacky, bp
  Cc: Gao, Chao, luto, Hansen, Dave, x86, peterz, hpa, mingo,
	kirill.shutemov, tglx, linux-kernel, pbonzini, nik.borisov, bhe

On Tue, 2024-02-20 at 16:30 -0600, Tom Lendacky wrote:
> On 2/20/24 14:07, Huang, Kai wrote:
> > On Tue, 2024-02-20 at 08:47 -0600, Tom Lendacky wrote:
> > > On 2/20/24 08:28, Borislav Petkov wrote:
> > > > On Mon, Feb 19, 2024 at 04:09:47PM -0600, Tom Lendacky wrote:
> > > > > That's why the '!(sev_status & MSR_AMD64_SEV_ENABLED)' works here.
> > > > 
> > > > I would've never figured that out just from staring at the test. :-\
> > > > 
> > > > > Basically, if you are bare-metal, it will return true. And it will only
> > > > > return true for machines that support SME and have the
> > > > > MSR_AMD64_SYSCFG_MEM_ENCRYPT bit set in SYS_CFG MSR because of where the
> > > > > 'cc_vendor = CC_VENDOR_AMD' assignment is. However, if you move the
> > > > > 'cc_vendor = CC_VENDOR_AMD' to before the if statement, then you will have
> > > > > the WBINVD called for any machine that supports SME, even if SME is not
> > > > > possible because the proper bit in the SYS_CFG MSR hasn't been set.
> > > > > 
> > > > > I know what I'm trying to say, let me know if it is making sense...
> > > > 
> > > > Yah, thanks for taking the time to explain.
> > > > 
> > > > Here's an even more radical idea:
> > > > 
> > > > Why not do WBINVD *unconditionally* on the CPU down path?
> > > > 
> > > > - it is the opposite of a fast path, i.e., no one cares
> > > > 
> > > > - it'll take care of every possible configuration without ugly logic
> > > > 
> > > > - it wouldn't hurt to have the caches nice and coherent before going
> > > >     down
> > > > 
> > > > Hmmm.
> > > 
> > > That's what I initially did, but errors were reported, see commit:
> > >     f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
> > 
> > This changelog only mentions "Some issues".  Do you know exactly what kind
> > issues did you see?  Are these issues only appeared on SME enabled system or
> > other non-SME-capable systems too?
> 
> I believe the issues were that different Intel systems would hang or reset 
> and it was bisected to that commit that added the WBINVD. It was a while 
> ago, but I remember that they were similar to what the 1f5e7eb7868e commit 
> ended up fixing, which was debugged because sometimes the WBINVD was still 
> occasionally issued resulting in the following patch
> 
>    9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> 
> It just means that if we go to an unconditional WBINVD, then we need to be 
> careful.
> 
> 

Thanks Tom for the info.  That helps a lot.

Hi Boris, Dave,

I think I still prefer keeping the existing SME kexec behaviour, that is,
having the new CC_ATTR_HOST_MEM_INCOHERENT attribute, because that way there is
no risk.

However, based on the information above, I believe the risk is small if we
switch to an unconditional WBINVD, in which case we don't need the new
attribute and there's also no new code needed for TDX to do the cache flush.

Btw, I want to point out that stop_this_cpu() is not the only place that needs
to do WBINVD for SME/TDX; the relocate_kernel() assembly also needs to:

        image->start = relocate_kernel((unsigned long)image->head,
                                     (unsigned long)page_list,
                                     image->start,
                                     image->preserve_context,
                                     cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));

The last function argument, cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT), is for
SME.  The relocate_kernel() assembly checks the last argument and does WBINVD
if it is true.  If we go with an unconditional WBINVD, I think we can just
change the assembly to always do WBINVD and remove the last function parameter.

Please let me know your preference.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-20 22:30                 ` Tom Lendacky
  2024-02-21  1:38                   ` Huang, Kai
@ 2024-02-21  9:28                   ` Borislav Petkov
  2024-02-22 11:49                     ` Huang, Kai
  2024-02-23 10:41                     ` Dave Young
  1 sibling, 2 replies; 42+ messages in thread
From: Borislav Petkov @ 2024-02-21  9:28 UTC (permalink / raw)
  To: Tom Lendacky, Dave Young
  Cc: Huang, Kai, Gao, Chao, Hansen, Dave, luto, x86, peterz, hpa,
	mingo, kirill.shutemov, tglx, linux-kernel, pbonzini,
	nik.borisov, bhe

On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> I believe the issues were that different Intel systems would hang or reset
> and it was bisected to that commit that added the WBINVD. It was a while
> ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> ended up fixing, which was debugged because sometimes the WBINVD was still
> occasionally issued resulting in the following patch
> 
>   9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> 
> It just means that if we go to an unconditional WBINVD, then we need to be
> careful.

Let's try it.

Dave, do you remember what issues

  f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")

fixed?

If so, can you try the below diff ontop of latest tip/master to see if
those issues would reappear?

Thx.

---

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ab49ade31b0d..ec4dcc9f70ca 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -824,8 +824,7 @@ void __noreturn stop_this_cpu(void *dummy)
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
 	 */
-	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
-		native_wbinvd();
+	native_wbinvd();
 
 	/*
 	 * This brings a cache line back and dirties it, but

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-21  9:28                   ` Borislav Petkov
@ 2024-02-22 11:49                     ` Huang, Kai
  2024-02-23  3:13                       ` Dave Young
  2024-02-23 10:41                     ` Dave Young
  1 sibling, 1 reply; 42+ messages in thread
From: Huang, Kai @ 2024-02-22 11:49 UTC (permalink / raw)
  To: thomas.lendacky, dyoung, bp
  Cc: Gao, Chao, luto, Hansen, Dave, x86, peterz, hpa, mingo,
	kirill.shutemov, tglx, linux-kernel, pbonzini, nik.borisov, bhe

On Wed, 2024-02-21 at 10:28 +0100, Borislav Petkov wrote:
> On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> > I believe the issues were that different Intel systems would hang or reset
> > and it was bisected to that commit that added the WBINVD. It was a while
> > ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> > ended up fixing, which was debugged because sometimes the WBINVD was still
> > occasionally issued resulting in the following patch
> > 
> >   9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> > 
> > It just means that if we go to an unconditional WBINVD, then we need to be
> > careful.
> 
> Let's try it.
> 
> Dave, do you remember what issues
> 
>   f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
> 
> fixed?
> 
> If so, can you try the below diff ontop of latest tip/master to see if
> those issues would reappear?
> 
> Thx.
> 
> ---
> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index ab49ade31b0d..ec4dcc9f70ca 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -824,8 +824,7 @@ void __noreturn stop_this_cpu(void *dummy)
>  	 * Test the CPUID bit directly because the machine might've cleared
>  	 * X86_FEATURE_SME due to cmdline options.
>  	 */
> -	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> -		native_wbinvd();
> +	native_wbinvd();
>  
>  	/*
>  	 * This brings a cache line back and dirties it, but
> 

I would really appreciate it if Dave could help here.

I will also reach out to see whether anyone at Intel has seen this before.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-22 11:49                     ` Huang, Kai
@ 2024-02-23  3:13                       ` Dave Young
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Young @ 2024-02-23  3:13 UTC (permalink / raw)
  To: Huang, Kai
  Cc: thomas.lendacky, bp, Gao, Chao, luto, Hansen, Dave, x86, peterz,
	hpa, mingo, kirill.shutemov, tglx, linux-kernel, pbonzini,
	nik.borisov, bhe

Hi,

On Thu, 22 Feb 2024 at 19:50, Huang, Kai <kai.huang@intel.com> wrote:
>
> On Wed, 2024-02-21 at 10:28 +0100, Borislav Petkov wrote:
> > On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> > > I believe the issues were that different Intel systems would hang or reset
> > > and it was bisected to that commit that added the WBINVD. It was a while
> > > ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> > > ended up fixing, which was debugged because sometimes the WBINVD was still
> > > occasionally issued resulting in the following patch
> > >
> > >   9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> > >
> > > It just means that if we go to an unconditional WBINVD, then we need to be
> > > careful.
> >
> > Let's try it.
> >
> > Dave, do you remember what issues
> >
> >   f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
> >
> > fixed?
> >
> > If so, can you try the below diff ontop of latest tip/master to see if
> > those issues would reappear?
> >
> > Thx.
> >
> > ---
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index ab49ade31b0d..ec4dcc9f70ca 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -824,8 +824,7 @@ void __noreturn stop_this_cpu(void *dummy)
> >        * Test the CPUID bit directly because the machine might've cleared
> >        * X86_FEATURE_SME due to cmdline options.
> >        */
> > -     if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> > -             native_wbinvd();
> > +     native_wbinvd();
> >
> >       /*
> >        * This brings a cache line back and dirties it, but
> >
>
> I really appreciate if Dave can help here.
>
> I will also reach out to see whether there's anyone in Intel met this before.

I forget the details now; let me recall and try the patches if possible.

Thanks
Dave


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-21  9:28                   ` Borislav Petkov
  2024-02-22 11:49                     ` Huang, Kai
@ 2024-02-23 10:41                     ` Dave Young
  2024-02-28  2:54                       ` Dave Young
  1 sibling, 1 reply; 42+ messages in thread
From: Dave Young @ 2024-02-23 10:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tom Lendacky, Huang, Kai, Gao, Chao, Hansen, Dave, luto, x86,
	peterz, hpa, mingo, kirill.shutemov, tglx, linux-kernel,
	pbonzini, nik.borisov, bhe

On Wed, 21 Feb 2024 at 17:33, Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> > I believe the issues were that different Intel systems would hang or reset
> > and it was bisected to that commit that added the WBINVD. It was a while
> > ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> > ended up fixing, which was debugged because sometimes the WBINVD was still
> > occasionally issued resulting in the following patch
> >
> >   9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> >
> > It just means that if we go to an unconditional WBINVD, then we need to be
> > careful.
>
> Let's try it.
>
> Dave, do you remember what issues
>
>   f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
>
> fixed?

It should be the kexec reboot failure described in the thread below:
https://lore.kernel.org/lkml/20180117072123.GA1866@dhcp-128-65.nay.redhat.com/

>
> If so, can you try the below diff ontop of latest tip/master to see if
> those issues would reappear?

It was reproduced on an old laptop (a ThinkPad T440s or T480s, I can't
remember which), but I have since replaced it with a newer one.  I tried the
latest tip/master with the if condition commented out; kexec reboot works
fine.

Let me try to find an old laptop to see if I can do more tests; I will get
back to you next week.

>
> Thx.
>
> ---
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index ab49ade31b0d..ec4dcc9f70ca 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -824,8 +824,7 @@ void __noreturn stop_this_cpu(void *dummy)
>          * Test the CPUID bit directly because the machine might've cleared
>          * X86_FEATURE_SME due to cmdline options.
>          */
> -       if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> -               native_wbinvd();
> +       native_wbinvd();
>
>         /*
>          * This brings a cache line back and dirties it, but
>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-23 10:41                     ` Dave Young
@ 2024-02-28  2:54                       ` Dave Young
  2024-02-28  9:21                         ` Huang, Kai
  2024-02-28 10:44                         ` Borislav Petkov
  0 siblings, 2 replies; 42+ messages in thread
From: Dave Young @ 2024-02-28  2:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tom Lendacky, Huang, Kai, Gao, Chao, Hansen, Dave, luto, x86,
	peterz, hpa, mingo, kirill.shutemov, tglx, linux-kernel,
	pbonzini, nik.borisov, bhe

On Fri, 23 Feb 2024 at 18:41, Dave Young <dyoung@redhat.com> wrote:
>
> On Wed, 21 Feb 2024 at 17:33, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> > > I believe the issues were that different Intel systems would hang or reset
> > > and it was bisected to that commit that added the WBINVD. It was a while
> > > ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> > > ended up fixing, which was debugged because sometimes the WBINVD was still
> > > occasionally issued resulting in the following patch
> > >
> > >   9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> > >
> > > It just means that if we go to an unconditional WBINVD, then we need to be
> > > careful.
> >
> > Let's try it.
> >
> > Dave, do you remember what issues
> >
> >   f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
> >
> > fixed?
>
> It should be a kexec reboot failure describe in below thread:
> https://lore.kernel.org/lkml/20180117072123.GA1866@dhcp-128-65.nay.redhat.com/
>
> >
> > If so, can you try the below diff ontop of latest tip/master to see if
> > those issues would reappear?
>
> It was reproduced on an old laptop (Thinkpad t440s or t480s, I can not
> remember), but I have replaced them with a new different one.  I tried
> the latest tip-master with the if condition commented out, kexec
> reboot works fine.
>
> Let me try to find an old laptop to see if I can do more tests, will
> get back later next week.

Update: I tested on an old laptop as well and did not find any problems with
unconditional native_wbinvd(); kexec and kdump work fine.

>
> >
> > Thx.
> >
> > ---
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index ab49ade31b0d..ec4dcc9f70ca 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -824,8 +824,7 @@ void __noreturn stop_this_cpu(void *dummy)
> >          * Test the CPUID bit directly because the machine might've cleared
> >          * X86_FEATURE_SME due to cmdline options.
> >          */
> > -       if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
> > -               native_wbinvd();
> > +       native_wbinvd();
> >
> >         /*
> >          * This brings a cache line back and dirties it, but
> >
> > --
> > Regards/Gruss,
> >     Boris.
> >
> > https://people.kernel.org/tglx/notes-about-netiquette
> >


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-28  2:54                       ` Dave Young
@ 2024-02-28  9:21                         ` Huang, Kai
  2024-02-28 11:02                           ` Borislav Petkov
  2024-02-28 10:44                         ` Borislav Petkov
  1 sibling, 1 reply; 42+ messages in thread
From: Huang, Kai @ 2024-02-28  9:21 UTC (permalink / raw)
  To: dyoung, bp
  Cc: Gao, Chao, luto, Hansen, Dave, x86, bhe, peterz, hpa, mingo,
	thomas.lendacky, kirill.shutemov, linux-kernel, tglx,
	nik.borisov, pbonzini

On Wed, 2024-02-28 at 10:54 +0800, Dave Young wrote:
> On Fri, 23 Feb 2024 at 18:41, Dave Young <dyoung@redhat.com> wrote:
> > 
> > On Wed, 21 Feb 2024 at 17:33, Borislav Petkov <bp@alien8.de> wrote:
> > > 
> > > On Tue, Feb 20, 2024 at 04:30:13PM -0600, Tom Lendacky wrote:
> > > > I believe the issues were that different Intel systems would hang or reset
> > > > and it was bisected to that commit that added the WBINVD. It was a while
> > > > ago, but I remember that they were similar to what the 1f5e7eb7868e commit
> > > > ended up fixing, which was debugged because sometimes the WBINVD was still
> > > > occasionally issued resulting in the following patch
> > > > 
> > > >   9b040453d444 ("x86/smp: Dont access non-existing CPUID leaf")
> > > > 
> > > > It just means that if we go to an unconditional WBINVD, then we need to be
> > > > careful.
> > > 
> > > Let's try it.
> > > 
> > > Dave, do you remember what issues
> > > 
> > >   f23d74f6c66c ("x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()")
> > > 
> > > fixed?
> > 
> > It should be a kexec reboot failure describe in below thread:
> > https://lore.kernel.org/lkml/20180117072123.GA1866@dhcp-128-65.nay.redhat.com/
> > 
> > > 
> > > If so, can you try the below diff ontop of latest tip/master to see if
> > > those issues would reappear?
> > 
> > It was reproduced on an old laptop (Thinkpad t440s or t480s, I can not
> > remember), but I have replaced them with a new different one.  I tried
> > the latest tip-master with the if condition commented out, kexec
> > reboot works fine.
> > 
> > Let me try to find an old laptop to see if I can do more tests, will
> > get back later next week.
> 
> Update: I tested on an old laptop as well and did not find any problems
> with unconditional native_wbinvd(); kexec and kdump work fine.
> 
> 

Hi DaveY,

Thanks for getting back to me; I appreciate your help.

Hi Boris/DaveH,

Based on this I'll change to doing unconditional WBINVD during kexec(), in
both stop_this_cpu() and relocate_kernel().  Please let me know if you have
any comments.

Thanks all!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-28  2:54                       ` Dave Young
  2024-02-28  9:21                         ` Huang, Kai
@ 2024-02-28 10:44                         ` Borislav Petkov
  1 sibling, 0 replies; 42+ messages in thread
From: Borislav Petkov @ 2024-02-28 10:44 UTC (permalink / raw)
  To: Dave Young
  Cc: Tom Lendacky, Huang, Kai, Gao, Chao, Hansen, Dave, luto, x86,
	peterz, hpa, mingo, kirill.shutemov, tglx, linux-kernel,
	pbonzini, nik.borisov, bhe

On Wed, Feb 28, 2024 at 10:54:22AM +0800, Dave Young wrote:
> Update: I tested on an old laptop as well and did not find any problems
> with unconditional native_wbinvd(); kexec and kdump work fine.

Thanks a lot for testing it!

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-28  9:21                         ` Huang, Kai
@ 2024-02-28 11:02                           ` Borislav Petkov
  2024-02-28 22:21                             ` Huang, Kai
  0 siblings, 1 reply; 42+ messages in thread
From: Borislav Petkov @ 2024-02-28 11:02 UTC (permalink / raw)
  To: Huang, Kai
  Cc: dyoung, Gao, Chao, luto, Hansen, Dave, x86, bhe, peterz, hpa,
	mingo, thomas.lendacky, kirill.shutemov, linux-kernel, tglx,
	nik.borisov, pbonzini

On Wed, Feb 28, 2024 at 09:21:00AM +0000, Huang, Kai wrote:
> Based on this I'll change to doing unconditional WBINVD during kexec(), in
> both stop_this_cpu() and relocate_kernel().  Please let me know if you have
> any comments.

Yes, pls make sure to summarize in the commit message what this thread
figured out.  But basically, we want to try the simple approach and WBINVD
unconditionally on the CPU stopping path so that caches are clean and there
are no potential issues...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec
  2024-02-28 11:02                           ` Borislav Petkov
@ 2024-02-28 22:21                             ` Huang, Kai
  0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-02-28 22:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: dyoung, Gao, Chao, luto, Hansen, Dave, x86, bhe, peterz, hpa,
	mingo, thomas.lendacky, kirill.shutemov, linux-kernel, tglx,
	nik.borisov, pbonzini



On 29/02/2024 12:02 am, Borislav Petkov wrote:
> On Wed, Feb 28, 2024 at 09:21:00AM +0000, Huang, Kai wrote:
>> Based on this I'll change to doing unconditional WBINVD during kexec(), in
>> both stop_this_cpu() and relocate_kernel().  Please let me know if you have
>> any comments.
> 
> Yes, pls make sure to summarize in the commit message what this thread
> figured out. But basically, we want to try the simple approach and
> WBINVD unconditionally on the CPU stopping path so that caches are clean
> and there's no potential issues...
> 

Yes will do.  Thanks for the feedback.

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2024-02-28 22:21 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-31 11:31 [PATCH 0/4] TDX host: kexec() support Huang, Kai
2024-01-31 11:31 ` [PATCH 1/4] x86/coco: Add a new CC attribute to unify cache flush during kexec Huang, Kai
2024-02-19 16:16   ` Borislav Petkov
2024-02-19 19:45     ` Tom Lendacky
2024-02-19 20:32       ` Borislav Petkov
2024-02-19 22:09         ` Tom Lendacky
2024-02-20  2:57           ` Huang, Kai
2024-02-20 14:40             ` Tom Lendacky
2024-02-20 14:28           ` Borislav Petkov
2024-02-20 14:47             ` Tom Lendacky
2024-02-20 20:07               ` Huang, Kai
2024-02-20 22:30                 ` Tom Lendacky
2024-02-21  1:38                   ` Huang, Kai
2024-02-21  9:28                   ` Borislav Petkov
2024-02-22 11:49                     ` Huang, Kai
2024-02-23  3:13                       ` Dave Young
2024-02-23 10:41                     ` Dave Young
2024-02-28  2:54                       ` Dave Young
2024-02-28  9:21                         ` Huang, Kai
2024-02-28 11:02                           ` Borislav Petkov
2024-02-28 22:21                             ` Huang, Kai
2024-02-28 10:44                         ` Borislav Petkov
2024-02-20  3:12     ` Huang, Kai
2024-01-31 11:31 ` [PATCH 2/4] x86/virt/tdx: Advertise the CC_ATTR_HOST_MEM_INCOHERENT for TDX host Huang, Kai
2024-01-31 17:11   ` Dave Hansen
2024-02-01 14:42     ` Huang, Kai
2024-01-31 11:31 ` [PATCH 3/4] x86/kexec(): Reset TDX private memory on platforms with TDX erratum Huang, Kai
2024-01-31 21:21   ` Dave Hansen
2024-01-31 22:03     ` Kirill A. Shutemov
2024-02-01 14:22       ` Huang, Kai
2024-02-01 14:39         ` Kirill A. Shutemov
2024-02-01 14:47           ` Huang, Kai
2024-02-01 16:57           ` Dave Hansen
2024-02-05  6:49             ` Huang, Kai
2024-02-01 14:35     ` Huang, Kai
2024-02-02  0:54   ` Edgecombe, Rick P
2024-02-05  6:44     ` Huang, Kai
2024-01-31 11:31 ` [PATCH 4/4] x86/virt/tdx: Remove the !KEXEC_CORE dependency Huang, Kai
2024-02-01 18:28 ` [PATCH 0/4] TDX host: kexec() support Tom Lendacky
2024-02-05  6:50   ` Huang, Kai
2024-02-06 18:56     ` Kalra, Ashish
2024-02-07  1:43       ` Huang, Kai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).