From: "Huang, Kai" <kai.huang@intel.com>
To: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Hansen, Dave" <dave.hansen@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>,
"bagasdotme@gmail.com" <bagasdotme@gmail.com>,
"ak@linux.intel.com" <ak@linux.intel.com>,
"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
"kirill.shutemov@linux.intel.com"
<kirill.shutemov@linux.intel.com>, "Christopherson,,
Sean" <seanjc@google.com>,
"Chatre, Reinette" <reinette.chatre@intel.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Yamahata, Isaku" <isaku.yamahata@intel.com>,
"peterz@infradead.org" <peterz@infradead.org>,
"Shahar, Sagi" <sagis@google.com>,
"imammedo@redhat.com" <imammedo@redhat.com>,
"Gao, Chao" <chao.gao@intel.com>,
"Brown, Len" <len.brown@intel.com>,
"sathyanarayanan.kuppuswamy@linux.intel.com"
<sathyanarayanan.kuppuswamy@linux.intel.com>,
"Huang, Ying" <ying.huang@intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>
Subject: Re: [PATCH v8 15/16] x86/virt/tdx: Flush cache in kexec() when TDX is enabled
Date: Tue, 10 Jan 2023 11:29:49 +0000 [thread overview]
Message-ID: <6f959f494f0fb3dedfa963c3d6a0ce7f395b745d.camel@intel.com> (raw)
In-Reply-To: <ba0fdee9-148b-b0b9-ecde-2610eff02ba1@intel.com>
On Fri, 2023-01-06 at 16:35 -0800, Dave Hansen wrote:
> On 12/8/22 22:52, Kai Huang wrote:
> > There are two problems in terms of using kexec() to boot to a new kernel
> > when the old kernel has enabled TDX: 1) Part of the memory pages are
> > still TDX private pages (i.e. metadata used by the TDX module, and any
> > TDX guest memory if kexec() happens when there's any TDX guest alive).
> > 2) There might be dirty cachelines associated with TDX private pages.
> >
> > Because the hardware doesn't guarantee cache coherency among different
> > KeyIDs, the old kernel needs to flush cache (of those TDX private pages)
> > before booting to the new kernel. Also, reading TDX private page using
> > any shared non-TDX KeyID with integrity-check enabled can trigger #MC.
> > Therefore ideally, the kernel should convert all TDX private pages back
> > to normal before booting to the new kernel.
>
> This is just talking about way too many things that just don't apply.
>
> Let's focus on the *ACTUAL* problem that's being addressed instead of
> the 15 problems that aren't actual practical problems.
Will get rid of those.
>
> > However, this implementation doesn't convert TDX private pages back to
> > normal in kexec() because of below considerations:
> >
> > 1) Neither the kernel nor the TDX module has existing infrastructure to
> > track which pages are TDX private pages.
> > 2) The number of TDX private pages can be large, and converting all of
> > them (cache flush + using MOVDIR64B to clear the page) in kexec() can
> > be time consuming.
> > 3) The new kernel will almost only use KeyID 0 to access memory. KeyID
> > 0 doesn't support integrity-check, so it's OK.
> > 4) The kernel doesn't (and may never) support MKTME. If any 3rd party
> > kernel ever supports MKTME, it can/should do MOVDIR64B to clear the
> > page with the new MKTME KeyID (just like TDX does) before using it.
>
> Yeah, why are we getting all worked up about MKTME when there is not
> support?
I am not sure whether we need to consider 3rd party kernel case?
>
> The only thing that matters here is dirty cacheline writeback. There
> are two things the kernel needs to do to mitigate that:
>
> 1. Stop accessing TDX private memory mappings
> 1a. Stop making TDX module calls (uses global private KeyID)
> 1b. Stop TDX guests from running (uses per-guest KeyID)
> 2. Flush any cachelines from previous private KeyID writes
>
> There are a couple of ways we can do #2. We do *NOT* need to convert
> *ANYTHING* back to KeyID 0. Page conversion doesn't even come into play
> in any way as far as I can tell.
May I ask why? When I was writing this patch I was not sure whether kexec()
should give the new kernel a clean slate. SGX driver doesn't EREMOVE all EPC
during kexec() but depends on the new kernel to do that too, but I don't know
what's the general guide of supporting kexec().
>
> I think you're also saying that since all CPUs go through this path and
> there is no TDX activity between the WBINVD and the native_halt() that
> 1a and 1b basically happen for "free" without needing to do theme
> explicitly.
Yes. Should we mention this part in changelog?
AMD SME kexec() support patch bba4ed011a52d ("x86/mm, kexec: Allow kexec to be
used with SME") seems doesn't mention anything similar (SME and TDX may be
different, though).
>
> > Therefore, this implementation just flushes cache to make sure there are
> > no stale dirty cachelines associated with any TDX private KeyIDs before
> > booting to the new kernel, otherwise they may silently corrupt the new
> > kernel.
>
> That's fine. So, this patch kinda happens to land in the right spot
> even after thrashing about about a while.
>
> > Following SME support, use wbinvd() to flush cache in stop_this_cpu().
> > Theoretically, cache flush is only needed when the TDX module has been
> > initialized. However initializing the TDX module is done on demand at
> > runtime, and it takes a mutex to read the module status. Just check
> > whether TDX is enabled by BIOS instead to flush cache.
>
> Yeah, close enough.
>
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index c21b7347a26d..0cc84977dc62 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -765,8 +765,14 @@ void __noreturn stop_this_cpu(void *dummy)
> > *
> > * Test the CPUID bit directly because the machine might've cleared
> > * X86_FEATURE_SME due to cmdline options.
> > + *
> > + * Similar to SME, if the TDX module is ever initialized, the
> > + * cachelines associated with any TDX private KeyID must be flushed
> > + * before transiting to the new kernel. The TDX module is initialized
> > + * on demand, and it takes the mutex to read its status. Just check
> > + * whether TDX is enabled by BIOS instead to flush cache.
> > */
>
> There's too much detail here. Let's up-level it a bit. We don't need
> to be talking about TDX locking here.
Sure will do. Thanks!
>
> /*
> * The TDX module or guests might have left dirty cachelines
> * behind. Flush them to avoid corruption from later writeback.
> * Note that this flushes on all systems where TDX is possible,
> * but does not actually check that TDX was in use.
> */
>
> > - if (cpuid_eax(0x8000001f) & BIT(0))
> > + if (cpuid_eax(0x8000001f) & BIT(0) || platform_tdx_enabled())
> > native_wbinvd();
> > for (;;) {
> > /*
>
>
next prev parent reply other threads:[~2023-01-10 11:30 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-09 6:52 [PATCH v8 00/16] TDX host kernel support Kai Huang
2022-12-09 6:52 ` [PATCH v8 01/16] x86/tdx: Define TDX supported page sizes as macros Kai Huang
2023-01-06 19:04 ` Dave Hansen
2022-12-09 6:52 ` [PATCH v8 02/16] x86/virt/tdx: Detect TDX during kernel boot Kai Huang
2023-01-06 17:09 ` Dave Hansen
2023-01-08 22:25 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 03/16] x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC Kai Huang
2023-01-06 19:04 ` Dave Hansen
2022-12-09 6:52 ` [PATCH v8 04/16] x86/virt/tdx: Add skeleton to initialize TDX on demand Kai Huang
2023-01-06 17:14 ` Dave Hansen
2023-01-08 22:26 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 05/16] x86/virt/tdx: Implement functions to make SEAMCALL Kai Huang
2023-01-06 17:29 ` Dave Hansen
2023-01-09 10:30 ` Huang, Kai
2023-01-09 19:54 ` Dave Hansen
2023-01-09 22:10 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 06/16] x86/virt/tdx: Get information about TDX module and TDX-capable memory Kai Huang
2023-01-06 17:46 ` Dave Hansen
2023-01-09 10:25 ` Huang, Kai
2023-01-09 19:52 ` Dave Hansen
2023-01-09 22:07 ` Huang, Kai
2023-01-09 22:11 ` Dave Hansen
2022-12-09 6:52 ` [PATCH v8 07/16] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory Kai Huang
2023-01-06 18:18 ` Dave Hansen
2023-01-09 11:48 ` Huang, Kai
2023-01-09 16:51 ` Dave Hansen
2023-01-10 12:09 ` Huang, Kai
2023-01-10 16:18 ` Dave Hansen
2023-01-11 10:00 ` Huang, Kai
2023-01-12 0:56 ` Huang, Ying
2023-01-12 1:18 ` Huang, Kai
2023-01-12 1:59 ` Huang, Ying
2023-01-12 2:22 ` Huang, Kai
2023-01-12 11:33 ` Huang, Kai
2023-01-18 11:08 ` Huang, Kai
2023-01-18 13:57 ` David Hildenbrand
2023-01-18 19:38 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 08/16] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions Kai Huang
2023-01-06 19:24 ` Dave Hansen
2023-01-10 0:40 ` Huang, Kai
2023-01-10 0:47 ` Dave Hansen
2023-01-10 2:23 ` Huang, Kai
2023-01-10 19:12 ` Dave Hansen
2023-01-11 9:23 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 09/16] x86/virt/tdx: Fill out " Kai Huang
2023-01-06 19:36 ` Dave Hansen
2023-01-10 0:45 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 10/16] x86/virt/tdx: Allocate and set up PAMTs for TDMRs Kai Huang
2023-01-06 21:53 ` Dave Hansen
2023-01-10 0:49 ` Huang, Kai
2023-01-07 0:47 ` Dave Hansen
2023-01-10 0:47 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 11/16] x86/virt/tdx: Designate reserved areas for all TDMRs Kai Huang
2023-01-06 22:07 ` Dave Hansen
2023-01-10 1:19 ` Huang, Kai
2023-01-10 1:22 ` Dave Hansen
2023-01-10 11:01 ` Huang, Kai
2023-01-10 15:19 ` Dave Hansen
2023-01-11 10:57 ` Huang, Kai
2023-01-11 16:16 ` Dave Hansen
2023-01-11 22:10 ` Huang, Kai
2023-01-10 11:01 ` Huang, Kai
2023-01-10 15:17 ` Dave Hansen
2022-12-09 6:52 ` [PATCH v8 12/16] x86/virt/tdx: Designate the global KeyID and configure the TDX module Kai Huang
2023-01-06 22:21 ` Dave Hansen
2023-01-10 10:48 ` Huang, Kai
2023-01-10 16:25 ` Dave Hansen
2023-01-10 23:33 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 13/16] x86/virt/tdx: Configure global KeyID on all packages Kai Huang
2023-01-06 22:49 ` Dave Hansen
2023-01-10 10:15 ` Huang, Kai
2023-01-10 16:53 ` Dave Hansen
2023-01-11 0:06 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 14/16] x86/virt/tdx: Initialize all TDMRs Kai Huang
2023-01-07 0:17 ` Dave Hansen
2023-01-10 10:23 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 15/16] x86/virt/tdx: Flush cache in kexec() when TDX is enabled Kai Huang
2023-01-07 0:35 ` Dave Hansen
2023-01-10 11:29 ` Huang, Kai [this message]
2023-01-10 15:27 ` Dave Hansen
2023-01-11 0:13 ` Huang, Kai
2023-01-11 0:30 ` Dave Hansen
2023-01-11 1:58 ` Huang, Kai
2022-12-09 6:52 ` [PATCH v8 16/16] Documentation/x86: Add documentation for TDX host support Kai Huang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6f959f494f0fb3dedfa963c3d6a0ce7f395b745d.camel@intel.com \
--to=kai.huang@intel.com \
--cc=ak@linux.intel.com \
--cc=bagasdotme@gmail.com \
--cc=chao.gao@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@intel.com \
--cc=imammedo@redhat.com \
--cc=isaku.yamahata@intel.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=len.brown@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=rafael.j.wysocki@intel.com \
--cc=reinette.chatre@intel.com \
--cc=sagis@google.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).