From: Kai Huang <kai.huang@intel.com>
To: Dave Hansen <dave.hansen@intel.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, len.brown@intel.com,
tony.luck@intel.com, rafael.j.wysocki@intel.com,
reinette.chatre@intel.com, dan.j.williams@intel.com,
peterz@infradead.org, ak@linux.intel.com,
kirill.shutemov@linux.intel.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
isaku.yamahata@intel.com
Subject: Re: [PATCH v3 00/21] TDX host kernel support
Date: Wed, 27 Apr 2022 13:15:02 +1200 [thread overview]
Message-ID: <ecf718abf864bbb2366209f00d4315ada090aedc.camel@intel.com> (raw)
In-Reply-To: <522e37eb-68fc-35db-44d5-479d0088e43f@intel.com>
On Tue, 2022-04-26 at 13:13 -0700, Dave Hansen wrote:
> On 4/5/22 21:49, Kai Huang wrote:
> > SEAM VMX root operation is designed to host a CPU-attested, software
> > module called the 'TDX module' which implements functions to manage
> > crypto protected VMs called Trust Domains (TD). SEAM VMX root is also
>
> "crypto protected"? What the heck is that?
How about "crypto-protected"? I googled and it seems it is used by someone
else.
>
> > designed to host a CPU-attested, software module called the 'Intel
> > Persistent SEAMLDR (Intel P-SEAMLDR)' to load and update the TDX module.
> >
> > Host kernel transits to either the P-SEAMLDR or the TDX module via a new
>
> ^ The
Thanks.
>
> > SEAMCALL instruction. SEAMCALLs are host-side interface functions
> > defined by the P-SEAMLDR and the TDX module around the new SEAMCALL
> > instruction. They are similar to a hypercall, except they are made by
> > host kernel to the SEAM software modules.
>
> This is still missing some important high-level things, like that the
> TDX module is protected from the untrusted VMM. Heck, it forgets to
> mention that the VMM itself is untrusted and the TDX module replaces
> things that the VMM usually does.
>
> It would also be nice to mention here how this compares with SEV-SNP.
> Where is the TDX module in that design? Why doesn't SEV need all this code?
>
> > TDX leverages Intel Multi-Key Total Memory Encryption (MKTME) to crypto
> > protect TD guests. TDX reserves part of MKTME KeyID space as TDX private
> > KeyIDs, which can only be used by software runs in SEAM. The physical
>
> ^ which
Thanks.
>
> > address bits for encoding TDX private KeyID are treated as reserved bits
> > when not in SEAM operation. The partitioning of MKTME KeyIDs and TDX
> > private KeyIDs is configured by BIOS.
> >
> > Before being able to manage TD guests, the TDX module must be loaded
> > and properly initialized using SEAMCALLs defined by TDX architecture.
> > This series assumes both the P-SEAMLDR and the TDX module are loaded by
> > BIOS before the kernel boots.
> >
> > There's no CPUID or MSR to detect either the P-SEAMLDR or the TDX module.
> > Instead, detecting them can be done by using P-SEAMLDR's SEAMLDR.INFO
> > SEAMCALL to detect P-SEAMLDR. The success of this SEAMCALL means the
> > P-SEAMLDR is loaded. The P-SEAMLDR information returned by this
> > SEAMCALL further tells whether TDX module is loaded.
>
> There's a bit of information missing here. The kernel might not know
> the state of things being loaded. A previous kernel might have loaded
> it and left it in an unknown state.
>
> > The TDX module is initialized in multiple steps:
> >
> > 1) Global initialization;
> > 2) Logical-CPU scope initialization;
> > 3) Enumerate the TDX module capabilities;
> > 4) Configure the TDX module about usable memory ranges and
> > global KeyID information;
> > 5) Package-scope configuration for the global KeyID;
> > 6) Initialize TDX metadata for usable memory ranges based on 4).
> >
> > Step 2) requires calling some SEAMCALL on all "BIOS-enabled" (in MADT
> > table) logical cpus, otherwise step 4) will fail. Step 5) requires
> > calling SEAMCALL on at least one cpu on all packages.
> >
> > TDX module can also be shut down at any time during module's lifetime, by
> > calling SEAMCALL on all "BIOS-enabled" logical cpus.
> >
> > == Design Considerations ==
> >
> > 1. Lazy TDX module initialization on-demand by caller
>
> This doesn't really tell us what "lazy" is or what the alternatives are.
>
> There are basically two ways the TDX module could be loaded. Either:
> * In early boot
> or
> * At runtime just before the first TDX guest is run
>
> This series implements the runtime loading.
OK will do.
>
> > None of the steps in the TDX module initialization process must be done
> > during kernel boot. This series doesn't initialize TDX at boot time, but
> > instead, provides two functions to allow caller to detect and initialize
> > TDX on demand:
> >
> > if (tdx_detect())
> > goto no_tdx;
> > if (tdx_init())
> > goto no_tdx;
> >
> > This approach has below pros:
> >
> > 1) Initializing the TDX module requires to reserve ~1/256th system RAM as
> > metadata. Enabling TDX on demand allows only to consume this memory when
> > TDX is truly needed (i.e. when KVM wants to create TD guests).
> >
> > 2) Both detecting and initializing the TDX module require calling
> > SEAMCALL. However, SEAMCALL requires CPU being already in VMX operation
> > (VMXON has been done). So far, KVM is the only user of TDX, and it
> > already handles VMXON/VMXOFF. Therefore, letting KVM to initialize TDX
> > on-demand avoids handling VMXON/VMXOFF (which is not that trivial) in
> > core-kernel. Also, in long term, likely a reference based VMXON/VMXOFF
> > approach is needed since more kernel components will need to handle
> > VMXON/VMXONFF.
> >
> > 3) It is more flexible to support "TDX module runtime update" (not in
> > this series). After updating to the new module at runtime, kernel needs
> > to go through the initialization process again. For the new module,
> > it's possible the metadata allocated for the old module cannot be reused
> > for the new module, and needs to be re-allocated again.
> >
> > 2. Kernel policy on TDX memory
> >
> > Host kernel is responsible for choosing which memory regions can be used
> > as TDX memory, and configuring those memory regions to the TDX module by
> > using an array of "TD Memory Regions" (TDMR), which is a data structure
> > defined by TDX architecture.
>
>
> This is putting the cart before the horse. Don't define the details up
> front.
>
> The TDX architecture allows the VMM to designate specific memory
> as usable for TDX private memory. This series chooses to
> designate _all_ system RAM as TDX to avoid having to modify the
> page allocator to distinguish TDX and non-TDX-capable memory
>
> ... then go on to explain the details.
Thanks. Will update.
>
> > The first generation of TDX essentially guarantees that all system RAM
> > memory regions (excluding the memory below 1MB) can be used as TDX
> > memory. To avoid having to modify the page allocator to distinguish TDX
> > and non-TDX allocation, this series chooses to use all system RAM as TDX
> > memory.
> >
> > E820 table is used to find all system RAM entries. Following
> > e820__memblock_setup(), both E820_TYPE_RAM and E820_TYPE_RESERVED_KERN
> > types are treated as TDX memory, and contiguous ranges in the same NUMA
> > node are merged together (similar to memblock_add()) before trimming the
> > non-page-aligned part.
>
> This e820 cruft is too much detail for a cover letter. In general, once
> you start talking about individual functions, you've gone too far in the
> cover letter.
Will remove.
>
> > 3. Memory hotplug
> >
> > The first generation of TDX architecturally doesn't support memory
> > hotplug. And the first generation of TDX-capable platforms don't support
> > physical memory hotplug. Since it physically cannot happen, this series
> > doesn't add any check in ACPI memory hotplug code path to disable it.
> >
> > A special case of memory hotplug is adding NVDIMM as system RAM using
> > kmem driver. However the first generation of TDX-capable platforms
> > cannot enable TDX and NVDIMM simultaneously, so in practice this cannot
> > happen either.
>
> What prevents this code from today's code being run on tomorrow's
> platforms and breaking these assumptions?
I forgot to add below (which is in the documentation patch):
"This can be enhanced when future generation of TDX starts to support ACPI
memory hotplug, or NVDIMM and TDX can be enabled simultaneously on the
same platform."
Is this acceptable?
>
> > Another case is admin can use 'memmap' kernel command line to create
> > legacy PMEMs and use them as TD guest memory, or theoretically, can use
> > kmem driver to add them as system RAM. To avoid having to change memory
> > hotplug code to prevent this from happening, this series always include
> > legacy PMEMs when constructing TDMRs so they are also TDX memory.
> >
> > 4. CPU hotplug
> >
> > The first generation of TDX architecturally doesn't support ACPI CPU
> > hotplug. All logical cpus are enabled by BIOS in MADT table. Also, the
> > first generation of TDX-capable platforms don't support ACPI CPU hotplug
> > either. Since this physically cannot happen, this series doesn't add any
> > check in ACPI CPU hotplug code path to disable it.
> >
> > Also, only TDX module initialization requires all BIOS-enabled cpus are
> > online. After the initialization, any logical cpu can be brought down
> > and brought up to online again later. Therefore this series doesn't
> > change logical CPU hotplug either.
> >
> > 5. TDX interaction with kexec()
> >
> > If TDX is ever enabled and/or used to run any TD guests, the cachelines
> > of TDX private memory, including PAMTs, used by TDX module need to be
> > flushed before transiting to the new kernel otherwise they may silently
> > corrupt the new kernel. Similar to SME, this series flushes cache in
> > stop_this_cpu().
>
> What does this have to do with kexec()? What's a PAMT?
The point is the dirty cachelines of TDX private memory must be flushed
otherwise they may slightly corrupt the new kexec()-ed kernel.
Will use "TDX metadata" instead of "PAMT". The former has already been
mentioned above.
>
> > The TDX module can be initialized only once during its lifetime. The
> > first generation of TDX doesn't have interface to reset TDX module to
>
> ^ an
Thanks.
>
> > uninitialized state so it can be initialized again.
> >
> > This implies:
> >
> > - If the old kernel fails to initialize TDX, the new kernel cannot
> > use TDX too unless the new kernel fixes the bug which leads to
> > initialization failure in the old kernel and can resume from where
> > the old kernel stops. This requires certain coordination between
> > the two kernels.
>
> OK, but what does this *MEAN*?
This means we need to extend the information which the old kernel passes to the
new kernel. But I don't think it's feasible. I'll refine this kexec() section
to make it more concise next version.
>
> > - If the old kernel has initialized TDX successfully, the new kernel
> > may be able to use TDX if the two kernels have the exactly same
> > configurations on the TDX module. It further requires the new kernel
> > to reserve the TDX metadata pages (allocated by the old kernel) in
> > its page allocator. It also requires coordination between the two
> > kernels. Furthermore, if kexec() is done when there are active TD
> > guests running, the new kernel cannot use TDX because it's extremely
> > hard for the old kernel to pass all TDX private pages to the new
> > kernel.
> >
> > Given that, this series doesn't support TDX after kexec() (except the
> > old kernel doesn't attempt to initialize TDX at all).
> >
> > And this series doesn't shut down TDX module but leaves it open during
> > kexec(). It is because shutting down TDX module requires CPU being in
> > VMX operation but there's no guarantee of this during kexec(). Leaving
> > the TDX module open is not the best case, but it is OK since the new
> > kernel won't be able to use TDX anyway (therefore TDX module won't run
> > at all).
>
> tl;dr: kexec() doesn't work with this code.
>
> Right?
>
> That doesn't seem good.
It can work in my understanding. We just need to flush cache before booting to
the new kernel.
--
Thanks,
-Kai
next prev parent reply other threads:[~2022-04-27 1:15 UTC|newest]
Thread overview: 156+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-06 4:49 [PATCH v3 00/21] TDX host kernel support Kai Huang
2022-04-06 4:49 ` [PATCH v3 01/21] x86/virt/tdx: Detect SEAM Kai Huang
2022-04-18 22:29 ` Sathyanarayanan Kuppuswamy
2022-04-18 22:50 ` Sean Christopherson
2022-04-19 3:38 ` Kai Huang
2022-04-26 20:21 ` Dave Hansen
2022-04-26 23:12 ` Kai Huang
2022-04-26 23:28 ` Dave Hansen
2022-04-26 23:49 ` Kai Huang
2022-04-27 0:22 ` Sean Christopherson
2022-04-27 0:44 ` Kai Huang
2022-04-27 14:22 ` Dave Hansen
2022-04-27 22:39 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 02/21] x86/virt/tdx: Detect TDX private KeyIDs Kai Huang
2022-04-19 5:39 ` Sathyanarayanan Kuppuswamy
2022-04-19 9:41 ` Kai Huang
2022-04-19 5:42 ` Sathyanarayanan Kuppuswamy
2022-04-19 10:07 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 03/21] x86/virt/tdx: Implement the SEAMCALL base function Kai Huang
2022-04-19 14:07 ` Sathyanarayanan Kuppuswamy
2022-04-20 4:16 ` Kai Huang
2022-04-20 7:29 ` Sathyanarayanan Kuppuswamy
2022-04-20 10:39 ` Kai Huang
2022-04-26 20:37 ` Dave Hansen
2022-04-26 23:29 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 04/21] x86/virt/tdx: Add skeleton for detecting and initializing TDX on demand Kai Huang
2022-04-19 14:53 ` Sathyanarayanan Kuppuswamy
2022-04-20 4:37 ` Kai Huang
2022-04-20 5:21 ` Dave Hansen
2022-04-20 14:30 ` Sathyanarayanan Kuppuswamy
2022-04-20 22:35 ` Kai Huang
2022-04-26 20:53 ` Dave Hansen
2022-04-27 0:43 ` Kai Huang
2022-04-27 14:49 ` Dave Hansen
2022-04-28 0:00 ` Kai Huang
2022-04-28 14:27 ` Dave Hansen
2022-04-28 23:44 ` Kai Huang
2022-04-28 23:53 ` Dave Hansen
2022-04-29 0:11 ` Kai Huang
2022-04-29 0:26 ` Dave Hansen
2022-04-29 0:59 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 05/21] x86/virt/tdx: Detect P-SEAMLDR and TDX module Kai Huang
2022-04-26 20:56 ` Dave Hansen
2022-04-27 0:01 ` Kai Huang
2022-04-27 14:24 ` Dave Hansen
2022-04-27 21:30 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 06/21] x86/virt/tdx: Shut down TDX module in case of error Kai Huang
2022-04-23 15:39 ` Sathyanarayanan Kuppuswamy
2022-04-25 23:41 ` Kai Huang
2022-04-26 1:48 ` Sathyanarayanan Kuppuswamy
2022-04-26 2:12 ` Kai Huang
2022-04-26 20:59 ` Dave Hansen
2022-04-27 0:06 ` Kai Huang
2022-05-18 16:19 ` Sagi Shahar
2022-05-18 23:51 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 07/21] x86/virt/tdx: Do TDX module global initialization Kai Huang
2022-04-20 22:27 ` Sathyanarayanan Kuppuswamy
2022-04-20 22:37 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 08/21] x86/virt/tdx: Do logical-cpu scope TDX module initialization Kai Huang
2022-04-24 1:27 ` Sathyanarayanan Kuppuswamy
2022-04-25 23:55 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 09/21] x86/virt/tdx: Get information about TDX module and convertible memory Kai Huang
2022-04-25 2:58 ` Sathyanarayanan Kuppuswamy
2022-04-26 0:05 ` Kai Huang
2022-04-27 22:15 ` Dave Hansen
2022-04-28 0:15 ` Kai Huang
2022-04-28 14:06 ` Dave Hansen
2022-04-28 23:14 ` Kai Huang
2022-04-29 17:47 ` Dave Hansen
2022-05-02 5:04 ` Kai Huang
2022-05-25 4:47 ` Kai Huang
2022-05-25 4:57 ` Kai Huang
2022-05-25 16:00 ` Kai Huang
2022-05-18 22:30 ` Sagi Shahar
2022-05-18 23:56 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 10/21] x86/virt/tdx: Add placeholder to coveret all system RAM as TDX memory Kai Huang
2022-04-20 20:48 ` Isaku Yamahata
2022-04-20 22:38 ` Kai Huang
2022-04-27 22:24 ` Dave Hansen
2022-04-28 0:53 ` Kai Huang
2022-04-28 1:07 ` Dave Hansen
2022-04-28 1:35 ` Kai Huang
2022-04-28 3:40 ` Dave Hansen
2022-04-28 3:55 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 11/21] x86/virt/tdx: Choose to use " Kai Huang
2022-04-20 20:55 ` Isaku Yamahata
2022-04-20 22:39 ` Kai Huang
2022-04-28 15:54 ` Dave Hansen
2022-04-29 7:32 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 12/21] x86/virt/tdx: Create TDMRs to cover all system RAM Kai Huang
2022-04-28 16:22 ` Dave Hansen
2022-04-29 7:24 ` Kai Huang
2022-04-29 13:52 ` Dave Hansen
2022-04-06 4:49 ` [PATCH v3 13/21] x86/virt/tdx: Allocate and set up PAMTs for TDMRs Kai Huang
2022-04-28 17:12 ` Dave Hansen
2022-04-29 7:46 ` Kai Huang
2022-04-29 14:20 ` Dave Hansen
2022-04-29 14:30 ` Sean Christopherson
2022-04-29 17:46 ` Dave Hansen
2022-04-29 18:19 ` Sean Christopherson
2022-04-29 18:32 ` Dave Hansen
2022-05-02 5:59 ` Kai Huang
2022-05-02 14:17 ` Dave Hansen
2022-05-02 21:55 ` Kai Huang
2022-04-06 4:49 ` [PATCH v3 14/21] x86/virt/tdx: Set up reserved areas for all TDMRs Kai Huang
2022-04-06 4:49 ` [PATCH v3 15/21] x86/virt/tdx: Reserve TDX module global KeyID Kai Huang
2022-04-06 4:49 ` [PATCH v3 16/21] x86/virt/tdx: Configure TDX module with TDMRs and " Kai Huang
2022-04-06 4:49 ` [PATCH v3 17/21] x86/virt/tdx: Configure global KeyID on all packages Kai Huang
2022-04-06 4:49 ` [PATCH v3 18/21] x86/virt/tdx: Initialize all TDMRs Kai Huang
2022-04-06 4:49 ` [PATCH v3 19/21] x86: Flush cache of TDX private memory during kexec() Kai Huang
2022-04-06 4:49 ` [PATCH v3 20/21] x86/virt/tdx: Add kernel command line to opt-in TDX host support Kai Huang
2022-04-28 17:25 ` Dave Hansen
2022-04-06 4:49 ` [PATCH v3 21/21] Documentation/x86: Add documentation for " Kai Huang
2022-04-14 10:19 ` [PATCH v3 00/21] TDX host kernel support Kai Huang
2022-04-26 20:13 ` Dave Hansen
2022-04-27 1:15 ` Kai Huang [this message]
2022-04-27 21:59 ` Dave Hansen
2022-04-28 0:37 ` Kai Huang
2022-04-28 0:50 ` Dave Hansen
2022-04-28 0:58 ` Kai Huang
2022-04-29 1:40 ` Kai Huang
2022-04-29 3:04 ` Dan Williams
2022-04-29 5:35 ` Kai Huang
2022-05-03 23:59 ` Kai Huang
2022-05-04 0:25 ` Dave Hansen
2022-05-04 1:15 ` Kai Huang
2022-05-05 9:54 ` Kai Huang
2022-05-05 13:51 ` Dan Williams
2022-05-05 22:14 ` Kai Huang
2022-05-06 0:22 ` Dan Williams
2022-05-06 0:45 ` Kai Huang
2022-05-06 1:15 ` Dan Williams
2022-05-06 1:46 ` Kai Huang
2022-05-06 15:57 ` Dan Williams
2022-05-09 2:46 ` Kai Huang
2022-05-10 10:25 ` Kai Huang
2022-05-07 0:09 ` Mike Rapoport
2022-05-08 10:00 ` Kai Huang
2022-05-09 10:33 ` Mike Rapoport
2022-05-09 23:27 ` Kai Huang
2022-05-04 14:31 ` Dan Williams
2022-05-04 22:50 ` Kai Huang
2022-04-28 1:01 ` Dan Williams
2022-04-28 1:21 ` Kai Huang
2022-04-29 2:58 ` Dan Williams
2022-04-29 5:43 ` Kai Huang
2022-04-29 14:39 ` Dave Hansen
2022-04-29 15:18 ` Dan Williams
2022-04-29 17:18 ` Dave Hansen
2022-04-29 17:48 ` Dan Williams
2022-04-29 18:34 ` Dave Hansen
2022-04-29 18:47 ` Dan Williams
2022-04-29 19:20 ` Dave Hansen
2022-04-29 21:20 ` Dan Williams
2022-04-29 21:27 ` Dave Hansen
2022-05-02 10:18 ` Kai Huang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ecf718abf864bbb2366209f00d4315ada090aedc.camel@intel.com \
--to=kai.huang@intel.com \
--cc=ak@linux.intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@intel.com \
--cc=isaku.yamahata@intel.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=len.brown@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=rafael.j.wysocki@intel.com \
--cc=reinette.chatre@intel.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=seanjc@google.com \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).