From: Dave Hansen <dave.hansen@intel.com>
To: Kai Huang <kai.huang@intel.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, len.brown@intel.com,
	tony.luck@intel.com, rafael.j.wysocki@intel.com,
	reinette.chatre@intel.com, dan.j.williams@intel.com,
	peterz@infradead.org, ak@linux.intel.com,
	kirill.shutemov@linux.intel.com,
	sathyanarayanan.kuppuswamy@linux.intel.com,
	isaku.yamahata@intel.com
Subject: Re: [PATCH v3 00/21] TDX host kernel support
Date: Tue, 26 Apr 2022 13:13:00 -0700
Message-ID: <522e37eb-68fc-35db-44d5-479d0088e43f@intel.com>
In-Reply-To: <cover.1649219184.git.kai.huang@intel.com>

On 4/5/22 21:49, Kai Huang wrote:
> SEAM VMX root operation is designed to host a CPU-attested, software
> module called the 'TDX module' which implements functions to manage
> crypto protected VMs called Trust Domains (TD).  SEAM VMX root is also

"crypto protected"?  What the heck is that?

> designed to host a CPU-attested, software module called the 'Intel
> Persistent SEAMLDR (Intel P-SEAMLDR)' to load and update the TDX module.
> 
> Host kernel transits to either the P-SEAMLDR or the TDX module via a new

 ^ The

> SEAMCALL instruction.  SEAMCALLs are host-side interface functions
> defined by the P-SEAMLDR and the TDX module around the new SEAMCALL
> instruction.  They are similar to a hypercall, except they are made by
> host kernel to the SEAM software modules.

This is still missing some important high-level things, like that the
TDX module is protected from the untrusted VMM.  Heck, it forgets to
mention that the VMM itself is untrusted and the TDX module replaces
things that the VMM usually does.

It would also be nice to mention here how this compares with SEV-SNP.
Where is the TDX module in that design?  Why doesn't SEV need all this code?

> TDX leverages Intel Multi-Key Total Memory Encryption (MKTME) to crypto
> protect TD guests.  TDX reserves part of MKTME KeyID space as TDX private
> KeyIDs, which can only be used by software runs in SEAM.  The physical

					    ^ which

> address bits for encoding TDX private KeyID are treated as reserved bits
> when not in SEAM operation.  The partitioning of MKTME KeyIDs and TDX
> private KeyIDs is configured by BIOS.
> 
> Before being able to manage TD guests, the TDX module must be loaded
> and properly initialized using SEAMCALLs defined by TDX architecture.
> This series assumes both the P-SEAMLDR and the TDX module are loaded by
> BIOS before the kernel boots.
> 
> There's no CPUID or MSR to detect either the P-SEAMLDR or the TDX module.
> Instead, detecting them can be done by using P-SEAMLDR's SEAMLDR.INFO
> SEAMCALL to detect P-SEAMLDR.  The success of this SEAMCALL means the
> P-SEAMLDR is loaded.  The P-SEAMLDR information returned by this
> SEAMCALL further tells whether TDX module is loaded.

There's a bit of information missing here.  The kernel might not know
the state of things being loaded.  A previous kernel might have loaded
it and left it in an unknown state.

> The TDX module is initialized in multiple steps:
> 
>         1) Global initialization;
>         2) Logical-CPU scope initialization;
>         3) Enumerate the TDX module capabilities;
>         4) Configure the TDX module about usable memory ranges and
>            global KeyID information;
>         5) Package-scope configuration for the global KeyID;
>         6) Initialize TDX metadata for usable memory ranges based on 4).
> 
> Step 2) requires calling some SEAMCALL on all "BIOS-enabled" (in MADT
> table) logical cpus, otherwise step 4) will fail.  Step 5) requires
> calling SEAMCALL on at least one cpu on all packages.
> 
> TDX module can also be shut down at any time during module's lifetime, by
> calling SEAMCALL on all "BIOS-enabled" logical cpus.
> 
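
The per-package requirement in step 5) can be sketched as follows.  This
is a minimal userspace illustration, not code from the series:
pick_one_cpu_per_package(), the cpu_to_pkg[] and online[] arrays and
MAX_PKGS are hypothetical names, and a real kernel would rely on its own
topology helpers instead.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PKGS 64	/* hypothetical upper bound, for illustration only */

/*
 * Pick the first online CPU found in each package.
 *
 * cpu_to_pkg[i] is the package ID of logical CPU i and online[i] is
 * nonzero if CPU i is online.  The chosen CPU numbers are written to
 * out[], which must have room for MAX_PKGS entries.
 *
 * Returns the number of packages that got a CPU assigned.
 */
static size_t pick_one_cpu_per_package(const int *cpu_to_pkg,
				       const int *online, size_t nr_cpus,
				       int *out)
{
	int seen[MAX_PKGS] = { 0 };
	size_t nr_out = 0;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		int pkg = cpu_to_pkg[cpu];

		assert(pkg >= 0 && pkg < MAX_PKGS);

		/* Skip offline CPUs and packages that are already covered. */
		if (!online[cpu] || seen[pkg])
			continue;

		seen[pkg] = 1;
		out[nr_out++] = (int)cpu;
	}

	return nr_out;
}
```

If the returned count is smaller than the number of packages, some
package has no online CPU and the per-package step could not be
completed.
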
> == Design Considerations ==
> 
> 1. Lazy TDX module initialization on-demand by caller

This doesn't really tell us what "lazy" is or what the alternatives are.

There are basically two ways the TDX module could be loaded.  Either:
  * In early boot
or
  * At runtime just before the first TDX guest is run

This series implements the runtime loading.
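
As a rough illustration of the runtime option, here is a minimal
single-threaded sketch.  The bodies of tdx_detect() and tdx_init() are
hypothetical stubs that only count calls, and tdx_enable_on_demand() is
an invented wrapper name; real code would also need locking.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the series' tdx_detect() and tdx_init(). */
static int nr_detect_calls, nr_init_calls;

static int tdx_detect(void) { nr_detect_calls++; return 0; }
static int tdx_init(void)   { nr_init_calls++;   return 0; }

static bool tdx_tried;	/* has initialization been attempted? */
static int tdx_status;	/* result of that attempt, 0 on success */

/*
 * Initialize TDX the first time a caller needs it and cache the result,
 * so that later callers neither repeat the work nor retry a failed
 * attempt.
 */
static int tdx_enable_on_demand(void)
{
	if (tdx_tried)
		return tdx_status;

	tdx_tried = true;
	tdx_status = tdx_detect();
	if (!tdx_status)
		tdx_status = tdx_init();

	return tdx_status;
}
```

With this shape, nothing TDX-related runs at boot; the cost is paid by
whichever caller asks first.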

> None of the steps in the TDX module initialization process must be done
> during kernel boot.  This series doesn't initialize TDX at boot time, but
> instead, provides two functions to allow caller to detect and initialize
> TDX on demand:
> 
>         if (tdx_detect())
>                 goto no_tdx;
>         if (tdx_init())
>                 goto no_tdx;
> 
> This approach has below pros:
> 
> 1) Initializing the TDX module requires to reserve ~1/256th system RAM as
> metadata.  Enabling TDX on demand allows only to consume this memory when
> TDX is truly needed (i.e. when KVM wants to create TD guests).
> 
> 2) Both detecting and initializing the TDX module require calling
> SEAMCALL.  However, SEAMCALL requires CPU being already in VMX operation
> (VMXON has been done).  So far, KVM is the only user of TDX, and it
> already handles VMXON/VMXOFF.  Therefore, letting KVM to initialize TDX
> on-demand avoids handling VMXON/VMXOFF (which is not that trivial) in
> core-kernel.  Also, in long term, likely a reference based VMXON/VMXOFF
> approach is needed since more kernel components will need to handle
> VMXON/VMXOFF.
> 
> 3) It is more flexible to support "TDX module runtime update" (not in
> this series).  After updating to the new module at runtime, kernel needs
> to go through the initialization process again.  For the new module,
> it's possible the metadata allocated for the old module cannot be reused
> for the new module, and needs to be re-allocated again.
> 
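
To put the ~1/256th figure from pro 1) into perspective, the arithmetic
can be written out as below.  tdx_metadata_estimate() is a hypothetical
helper, and the result is only a ballpark because the exact metadata
size depends on the TDX module.

```c
#include <stdint.h>

/*
 * Rough estimate of the TDX metadata cost: about 1/256th of system RAM,
 * per the figure quoted above.
 */
static uint64_t tdx_metadata_estimate(uint64_t ram_bytes)
{
	return ram_bytes / 256;
}
```

For a host with 512 GiB of RAM this comes to roughly 2 GiB, which is the
memory that on-demand initialization avoids consuming until TDX is
actually used.
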
> 2. Kernel policy on TDX memory
> 
> Host kernel is responsible for choosing which memory regions can be used
> as TDX memory, and configuring those memory regions to the TDX module by
> using an array of "TD Memory Regions" (TDMR), which is a data structure
> defined by TDX architecture.


This is putting the cart before the horse.  Don't define the details up
front.

	The TDX architecture allows the VMM to designate specific memory
	as usable for TDX private memory.  This series chooses to
	designate _all_ system RAM as TDX to avoid having to modify the
	page allocator to distinguish TDX and non-TDX-capable memory.

... then go on to explain the details.

> The first generation of TDX essentially guarantees that all system RAM
> memory regions (excluding the memory below 1MB) can be used as TDX
> memory.  To avoid having to modify the page allocator to distinguish TDX
> and non-TDX allocation, this series chooses to use all system RAM as TDX
> memory.
> 
> E820 table is used to find all system RAM entries.  Following
> e820__memblock_setup(), both E820_TYPE_RAM and E820_TYPE_RESERVED_KERN
> types are treated as TDX memory, and contiguous ranges in the same NUMA
> node are merged together (similar to memblock_add()) before trimming the
> non-page-aligned part.

This e820 cruft is too much detail for a cover letter.  In general, once
you start talking about individual functions, you've gone too far in the
cover letter.
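
For readers who do want the mechanics, the merge-then-trim step
described in the quoted paragraph can be sketched in userspace C.  This
is an illustration under assumptions, not the series' code: struct
mem_range, merge_and_trim() and SKETCH_PAGE_SIZE are hypothetical names,
the input is assumed to be sorted by start address and non-overlapping,
and addresses are assumed to be far enough from the 64-bit limit that
rounding up cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed 4 KiB pages */

struct mem_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
	int nid;	/* NUMA node ID */
};

/*
 * Merge ranges that touch and sit on the same NUMA node, then trim each
 * merged range to page boundaries.  Works in place on r[0..n-1] and
 * returns the number of ranges left.
 */
static size_t merge_and_trim(struct mem_range *r, size_t n)
{
	size_t out = 0;

	/* Pass 1: fold each range into its predecessor when possible. */
	for (size_t i = 0; i < n; i++) {
		if (out && r[out - 1].end == r[i].start &&
		    r[out - 1].nid == r[i].nid)
			r[out - 1].end = r[i].end;
		else
			r[out++] = r[i];
	}

	/* Pass 2: round start up and end down, drop ranges that vanish. */
	n = out;
	out = 0;
	for (size_t i = 0; i < n; i++) {
		uint64_t start = (r[i].start + SKETCH_PAGE_SIZE - 1) &
				 ~(SKETCH_PAGE_SIZE - 1);
		uint64_t end = r[i].end & ~(SKETCH_PAGE_SIZE - 1);

		if (start >= end)
			continue;

		r[out].start = start;
		r[out].end = end;
		r[out].nid = r[i].nid;
		out++;
	}

	return out;
}
```

Merging before trimming matters: two ranges that meet at a
non-page-aligned address keep the page they share, which trimming each
range separately would lose.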

> 3. Memory hotplug
> 
> The first generation of TDX architecturally doesn't support memory
> hotplug.  And the first generation of TDX-capable platforms don't support
> physical memory hotplug.  Since it physically cannot happen, this series
> doesn't add any check in ACPI memory hotplug code path to disable it.
> 
> A special case of memory hotplug is adding NVDIMM as system RAM using
> kmem driver.  However the first generation of TDX-capable platforms
> cannot enable TDX and NVDIMM simultaneously, so in practice this cannot
> happen either.

What prevents today's code from being run on tomorrow's platforms and
breaking these assumptions?

> Another case is admin can use 'memmap' kernel command line to create
> legacy PMEMs and use them as TD guest memory, or theoretically, can use
> kmem driver to add them as system RAM.  To avoid having to change memory
> hotplug code to prevent this from happening, this series always include
> legacy PMEMs when constructing TDMRs so they are also TDX memory.
> 
> 4. CPU hotplug
> 
> The first generation of TDX architecturally doesn't support ACPI CPU
> hotplug.  All logical cpus are enabled by BIOS in MADT table.  Also, the
> first generation of TDX-capable platforms don't support ACPI CPU hotplug
> either.  Since this physically cannot happen, this series doesn't add any
> check in ACPI CPU hotplug code path to disable it.
> 
> Also, only TDX module initialization requires all BIOS-enabled cpus are
> online.  After the initialization, any logical cpu can be brought down
> and brought up to online again later.  Therefore this series doesn't
> change logical CPU hotplug either.
> 
> 5. TDX interaction with kexec()
> 
> If TDX is ever enabled and/or used to run any TD guests, the cachelines
> of TDX private memory, including PAMTs, used by TDX module need to be
> flushed before transiting to the new kernel otherwise they may silently
> corrupt the new kernel.  Similar to SME, this series flushes cache in
> stop_this_cpu().

What does this have to do with kexec()?  What's a PAMT?

> The TDX module can be initialized only once during its lifetime.  The
> first generation of TDX doesn't have interface to reset TDX module to

				      ^ an

> uninitialized state so it can be initialized again.
> 
> This implies:
> 
>   - If the old kernel fails to initialize TDX, the new kernel cannot
>     use TDX too unless the new kernel fixes the bug which leads to
>     initialization failure in the old kernel and can resume from where
>     the old kernel stops. This requires certain coordination between
>     the two kernels.

OK, but what does this *MEAN*?

>   - If the old kernel has initialized TDX successfully, the new kernel
>     may be able to use TDX if the two kernels have the exactly same
>     configurations on the TDX module. It further requires the new kernel
>     to reserve the TDX metadata pages (allocated by the old kernel) in
>     its page allocator. It also requires coordination between the two
>     kernels.  Furthermore, if kexec() is done when there are active TD
>     guests running, the new kernel cannot use TDX because it's extremely
>     hard for the old kernel to pass all TDX private pages to the new
>     kernel.
> 
> Given that, this series doesn't support TDX after kexec() (except the
> old kernel doesn't attempt to initialize TDX at all).
> 
> And this series doesn't shut down TDX module but leaves it open during
> kexec().  It is because shutting down TDX module requires CPU being in
> VMX operation but there's no guarantee of this during kexec().  Leaving
> the TDX module open is not the best case, but it is OK since the new
> kernel won't be able to use TDX anyway (therefore TDX module won't run
> at all).

tl;dr: kexec() doesn't work with this code.

Right?

That doesn't seem good.

