linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: x86@kernel.org, dave.hansen@intel.com,
	kirill.shutemov@linux.intel.com, peterz@infradead.org,
	tony.luck@intel.com, tglx@linutronix.de, bp@alien8.de,
	mingo@redhat.com, hpa@zytor.com, seanjc@google.com,
	pbonzini@redhat.com, rafael@kernel.org, david@redhat.com,
	dan.j.williams@intel.com, len.brown@intel.com,
	ak@linux.intel.com, isaku.yamahata@intel.com,
	ying.huang@intel.com, chao.gao@intel.com,
	sathyanarayanan.kuppuswamy@linux.intel.com, nik.borisov@suse.com,
	bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com,
	kai.huang@intel.com
Subject: [PATCH v14 17/23] x86/kexec: Flush cache of TDX private memory
Date: Tue, 17 Oct 2023 23:14:41 +1300	[thread overview]
Message-ID: <da8c976d1fd2530d55ab4164429b513aca1b9efc.1697532085.git.kai.huang@intel.com> (raw)
In-Reply-To: <cover.1697532085.git.kai.huang@intel.com>

There are two problems in terms of using kexec() to boot to a new kernel
when the old kernel has enabled TDX: 1) Part of the memory pages are
still TDX private pages; 2) There might be dirty cachelines associated
with TDX private pages.

The first problem doesn't matter on the platforms w/o the "partial write
machine check" erratum.  KeyID 0 doesn't have integrity check.  If the
new kernel wants to use any non-zero KeyID, it needs to convert the
memory to that KeyID and such conversion would work from any KeyID.

However the old kernel needs to guarantee there's no dirty cacheline
left behind before booting to the new kernel to avoid silent corruption
from later cacheline writeback (Intel hardware doesn't guarantee cache
coherency across different KeyIDs).

There are two things that the old kernel needs to do to achieve that:

1) Stop accessing TDX private memory mappings:
   a. Stop making TDX module SEAMCALLs (TDX global KeyID);
   b. Stop TDX guests from running (per-guest TDX KeyID).
2) Flush any cachelines from previous TDX private KeyID writes.

For 2), use wbinvd() to flush cache in stop_this_cpu(), following SME
support.  And in this way 1) happens for free as there's no TDX activity
between wbinvd() and the native_halt().

Flushing cache in stop_this_cpu() only flushes cache on remote cpus.  On
the rebooting cpu which does kexec(), unlike SME which does the cache
flush in relocate_kernel(), flush the cache right after stopping remote
cpus in machine_shutdown().

There are two reasons to do so: 1) For TDX there's no need to defer
cache flush to relocate_kernel() because all TDX activities have been
stopped.  2) On the platforms with the above erratum the kernel must
convert all TDX private pages back to normal before booting to the new
kernel in kexec(), and flushing cache early allows the kernel to convert
memory early rather than having to muck with the relocate_kernel()
assembly.

Theoretically, cache flush is only needed when the TDX module has been
initialized.  However initializing the TDX module is done on demand at
runtime, and it takes a mutex to read the module status.  Just check
whether TDX is enabled by the BIOS instead to flush cache.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---

v13 -> v14:
 - No change

---
 arch/x86/kernel/process.c |  8 +++++++-
 arch/x86/kernel/reboot.c  | 15 +++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f0909142a0a..c197be03ea06 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -830,8 +830,14 @@ void __noreturn stop_this_cpu(void *dummy)
 	 *
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
+	 *
+	 * The TDX module or guests might have left dirty cachelines
+	 * behind.  Flush them to avoid corruption from later writeback.
+	 * Note that this flushes on all systems where TDX is possible,
+	 * but does not actually check that TDX was in use.
 	 */
-	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
+	if ((c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
+			|| platform_tdx_enabled())
 		native_wbinvd();
 
 	/*
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 830425e6d38e..e1a4fa8de11d 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -31,6 +31,7 @@
 #include <asm/realmode.h>
 #include <asm/x86_init.h>
 #include <asm/efi.h>
+#include <asm/tdx.h>
 
 /*
  * Power off function, if any
@@ -741,6 +742,20 @@ void native_machine_shutdown(void)
 	local_irq_disable();
 	stop_other_cpus();
 #endif
+	/*
+	 * stop_other_cpus() has flushed all dirty cachelines of TDX
+	 * private memory on remote cpus.  Unlike SME, which does the
+	 * cache flush on _this_ cpu in the relocate_kernel(), flush
+	 * the cache for _this_ cpu here.  This is because on the
+	 * platforms with "partial write machine check" erratum the
+	 * kernel needs to convert all TDX private pages back to normal
+	 * before booting to the new kernel in kexec(), and the cache
+	 * flush must be done before that.  If the kernel took SME's way,
+	 * it would have to muck with the relocate_kernel() assembly to
+	 * do memory conversion.
+	 */
+	if (platform_tdx_enabled())
+		native_wbinvd();
 
 	lapic_shutdown();
 	restore_boot_irq_mode();
-- 
2.41.0


  parent reply	other threads:[~2023-10-17 10:18 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-17 10:14 [PATCH v14 00/23] TDX host kernel support Kai Huang
2023-10-17 10:14 ` [PATCH v14 01/23] x86/virt/tdx: Detect TDX during kernel boot Kai Huang
2023-10-17 13:24   ` Kuppuswamy Sathyanarayanan
2023-10-17 10:14 ` [PATCH v14 02/23] x86/tdx: Define TDX supported page sizes as macros Kai Huang
2023-10-17 10:14 ` [PATCH v14 03/23] x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC Kai Huang
2023-10-17 10:14 ` [PATCH v14 04/23] x86/cpu: Detect TDX partial write machine check erratum Kai Huang
2023-10-17 10:14 ` [PATCH v14 05/23] x86/virt/tdx: Handle SEAMCALL no entropy error in common code Kai Huang
2023-10-17 13:34   ` Kuppuswamy Sathyanarayanan
2023-10-17 10:14 ` [PATCH v14 06/23] x86/virt/tdx: Add SEAMCALL error printing for module initialization Kai Huang
2023-10-17 13:37   ` Kuppuswamy Sathyanarayanan
2023-10-18  6:27     ` Huang, Kai
2023-10-18  7:40   ` Nikolay Borisov
2023-10-18  8:26     ` Huang, Kai
2023-10-18 14:17   ` Kuppuswamy Sathyanarayanan
2023-10-17 10:14 ` [PATCH v14 07/23] x86/virt/tdx: Add skeleton to enable TDX on demand Kai Huang
2023-10-17 14:24   ` Kuppuswamy Sathyanarayanan
2023-10-18  6:51     ` Huang, Kai
2023-10-18 13:56       ` Dave Hansen
2023-10-18 19:55         ` Huang, Kai
2023-10-18  7:57   ` Nikolay Borisov
2023-10-18  8:29     ` Huang, Kai
2023-10-18  8:39       ` Nikolay Borisov
2023-10-18  8:57         ` Huang, Kai
2023-10-18  9:14   ` Nikolay Borisov
2023-10-18  9:17     ` Huang, Kai
2023-10-17 10:14 ` [PATCH v14 08/23] x86/virt/tdx: Get information about TDX module and TDX-capable memory Kai Huang
2023-10-17 10:14 ` [PATCH v14 09/23] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory Kai Huang
2023-10-17 10:14 ` [PATCH v14 10/23] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions Kai Huang
2023-10-17 10:14 ` [PATCH v14 11/23] x86/virt/tdx: Fill out " Kai Huang
2023-10-17 10:14 ` [PATCH v14 12/23] x86/virt/tdx: Allocate and set up PAMTs for TDMRs Kai Huang
2023-10-24  5:53   ` Nikolay Borisov
2023-10-24 10:49     ` Huang, Kai
2023-10-24 13:31       ` Nikolay Borisov
2023-10-17 10:14 ` [PATCH v14 13/23] x86/virt/tdx: Designate reserved areas for all TDMRs Kai Huang
2023-10-17 10:14 ` [PATCH v14 14/23] x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID Kai Huang
2023-10-17 10:14 ` [PATCH v14 15/23] x86/virt/tdx: Configure global KeyID on all packages Kai Huang
2023-10-17 10:14 ` [PATCH v14 16/23] x86/virt/tdx: Initialize all TDMRs Kai Huang
2023-10-17 10:14 ` Kai Huang [this message]
2023-10-17 10:14 ` [PATCH v14 18/23] x86/virt/tdx: Keep TDMRs when module initialization is successful Kai Huang
2023-10-17 10:14 ` [PATCH v14 19/23] x86/virt/tdx: Improve readability of module initialization error handling Kai Huang
2023-10-17 10:14 ` [PATCH v14 20/23] x86/kexec(): Reset TDX private memory on platforms with TDX erratum Kai Huang
2023-10-17 10:14 ` [PATCH v14 21/23] x86/virt/tdx: Handle TDX interaction with ACPI S3 and deeper states Kai Huang
2023-10-17 10:53   ` Rafael J. Wysocki
2023-10-18  3:22     ` Huang, Kai
2023-10-18 10:15       ` Rafael J. Wysocki
2023-10-18 10:51         ` Huang, Kai
2023-10-18 10:53           ` Rafael J. Wysocki
2023-10-19 20:45             ` Huang, Kai
2023-10-24 10:46               ` Huang, Kai
2023-10-17 10:14 ` [PATCH v14 22/23] x86/mce: Improve error log of kernel space TDX #MC due to erratum Kai Huang
2023-10-17 10:14 ` [PATCH v14 23/23] Documentation/x86: Add documentation for TDX host support Kai Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da8c976d1fd2530d55ab4164429b513aca1b9efc.1697532085.git.kai.huang@intel.com \
    --to=kai.huang@intel.com \
    --cc=ak@linux.intel.com \
    --cc=bagasdotme@gmail.com \
    --cc=bp@alien8.de \
    --cc=chao.gao@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=imammedo@redhat.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nik.borisov@suse.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=sagis@google.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).