kvm.vger.kernel.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Kai Huang <kai.huang@intel.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@intel.com, peterz@infradead.org,
	tglx@linutronix.de, seanjc@google.com, pbonzini@redhat.com,
	dan.j.williams@intel.com, rafael.j.wysocki@intel.com,
	kirill.shutemov@linux.intel.com, ying.huang@intel.com,
	reinette.chatre@intel.com, len.brown@intel.com,
	tony.luck@intel.com, ak@linux.intel.com,
	isaku.yamahata@intel.com, chao.gao@intel.com,
	sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
	sagis@google.com, imammedo@redhat.com
Subject: Re: [PATCH v10 00/16] TDX host kernel support
Date: Thu, 16 Mar 2023 13:35:35 +0100
Message-ID: <12597014-f920-df75-d516-db871aedbc8c@redhat.com>
In-Reply-To: <cover.1678111292.git.kai.huang@intel.com>

On 06.03.23 15:13, Kai Huang wrote:
> Intel Trusted Domain Extensions (TDX) protects guest VMs from malicious
> host and certain physical attacks.  TDX specs are available in [1].

I'm afraid there is no [1], probably got lost while resending :)

> 
> This series is the initial support to enable TDX with minimal code to
> allow KVM to create and run TDX guests.  KVM support for TDX is being
> developed separately[2].  A new "userspace inaccessible memfd" approach
> to support TDX private memory is also being developed[3].  KVM will
> only support the new "userspace inaccessible memfd" as TDX guest memory.

Same with [2].

> 
> This series doesn't aim to support all functionalities, and doesn't aim
> to resolve all things perfectly.  For example, memory hotplug is handled
> in a simple way (please refer to the "Kernel policy on TDX memory" and
> "Memory hotplug" sections below).
> 
> (For memory hotplug, sorry for broadcasting widely but I cc'ed the
> linux-mm@kvack.org following Kirill's suggestion so MM experts can also
> help to provide comments.)
> 
> And TDX module metadata allocation just uses alloc_contig_pages() to
> allocate a large chunk at runtime, thus it can fail.  It is imperfect now
> but _will_ be improved in the future.

Good enough for now I guess. Reserving it via memblock might be better, 
though.

> 
> Also, the patch to add the new kernel command-line option tdx="force"
> isn't included in this initial version, as Dave suggested it isn't
> mandatory.  But I _will_ add one once this initial version gets merged.

What would be the main purpose of that option?

> 
> All other optimizations will be posted as follow-up once this initial
> TDX support is upstreamed.
> 


[...]

> == Background ==
> 
> TDX introduces a new CPU mode called Secure Arbitration Mode (SEAM)
> and a new isolated range pointed to by the SEAM Range Register (SEAMRR).
> A CPU-attested software module called 'the TDX module' runs in the new
> isolated region as a trusted hypervisor to create/run protected VMs.
> 
> TDX also leverages Intel Multi-Key Total Memory Encryption (MKTME) to
> provide crypto-protection to the VMs.  TDX reserves part of MKTME KeyIDs
> as TDX private KeyIDs, which are only accessible within the SEAM mode.
> 
> TDX is different from AMD SEV/SEV-ES/SEV-SNP, which uses a dedicated
> secure processor to provide crypto-protection.  The firmware running on
> the secure processor plays a role similar to that of the TDX module.
> 
> The host kernel communicates with SEAM software via a new SEAMCALL
> instruction.  This is conceptually similar to a guest->host hypercall,
> except it is made from the host to SEAM software instead.
> 
> Before being able to manage TD guests, the TDX module must be loaded
> and properly initialized.  This series assumes the TDX module is loaded
> by BIOS before the kernel boots.
> 
> How to initialize the TDX module is described in the TDX module 1.0
> specification, chapter "13.Intel TDX Module Lifecycle: Enumeration,
> Initialization and Shutdown".
> 
> == Design Considerations ==
> 
> 1. Initialize the TDX module at runtime
> 
> There are basically two ways the TDX module could be initialized: either
> in early boot, or at runtime before the first TDX guest is run.  This
> series implements the runtime initialization.
> 
> This series adds a function tdx_enable() to allow the caller to initialize
> TDX at runtime:
> 
>          if (tdx_enable())
>                  goto no_tdx;
>          // TDX is ready to create TD guests.
> 
> This approach has the following advantages:
> 
> 1) Initializing the TDX module requires reserving ~1/256th of system RAM
> as metadata.  Enabling TDX on demand consumes this memory only when TDX
> is truly needed (i.e. when KVM wants to create TD guests).

Let's be clear: nobody is going to run encrypted VMs "out of the blue".

You can expect a certain hypervisor setup to be required, for example, 
enabling it on the cmdline and then allocating that metadata from 
memblock during boot.

IIRC s390x handles it similarly with protected VMs and required metadata.

> 
> 2) SEAMCALL requires the CPU to already be in VMX operation (VMXON has
> been done).  So far, KVM is the only user of TDX, and it already handles
> VMXON.  Letting KVM initialize TDX avoids handling VMXON in the core
> kernel.
> 
> 3) It makes "TDX module runtime update" (not in this series) easier to
> support.  After updating to the new module at runtime, the kernel needs
> to go through the initialization process again.
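
A minimal sketch of the on-demand pattern described in the quoted text,
for illustration only.  The state names, the stub init_module_once(), and
the tdx_enable_sketch() wrapper are hypothetical and are not taken from
the series; the real tdx_enable() would also need locking, which is left
out here.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical module states; not the names used by the series. */
enum tdx_state { TDX_UNKNOWN, TDX_READY, TDX_FAILED };

static enum tdx_state state = TDX_UNKNOWN;
static int init_calls;          /* how often the expensive path has run */
static bool fake_hw_ok = true;  /* stand-in for "module init succeeded" */

/* Stand-in for the expensive part: metadata allocation plus SEAMCALLs. */
static int init_module_once(void)
{
	init_calls++;
	return fake_hw_ok ? 0 : -ENODEV;
}

/*
 * Returns 0 when TDX is usable.  The first caller pays for the
 * initialization; later callers only see the cached result.
 * Single-threaded sketch: no locking.
 */
int tdx_enable_sketch(void)
{
	if (state == TDX_UNKNOWN)
		state = init_module_once() ? TDX_FAILED : TDX_READY;
	return state == TDX_READY ? 0 : -ENODEV;
}
```

With this shape nothing is allocated until the first call, which is the
point of advantage 1): on a host with 1 TiB of RAM, ~1/256th is roughly
4 GiB of metadata that stays free until a TD guest is actually wanted.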
> 
> 2. CPU hotplug
> 
> The TDX module requires that the per-cpu initialization SEAMCALL
> (TDH.SYS.LP.INIT) be done on a cpu before any other SEAMCALL can be made
> on that cpu, including those involved in module initialization.
> 
> The kernel provides tdx_cpu_enable() so that the user of TDX can do this
> when it wants to use a new cpu for a TDX task.
> 
> TDX doesn't support physical (ACPI) CPU hotplug.  A non-buggy BIOS should
> never support hotpluggable CPU devices and/or deliver ACPI CPU hotplug
> events to the kernel.  This series doesn't handle physical (ACPI) CPU
> hotplug at all but depends on the BIOS to behave correctly.
> 
> Note TDX works with logical CPU online/offline, thus this series still
> allows logical CPU online/offline.
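
The per-cpu ordering rule quoted above could be modelled roughly as
below.  This is only a sketch under stated assumptions: the fixed cpu
count, the lp_initialized[] array, both function names, and the error
values are made up for illustration and say nothing about how the series
or the TDX module actually reports the condition.

```c
#include <stdbool.h>
#include <errno.h>

#define NR_CPUS_SKETCH 4   /* hypothetical, fixed-size for illustration */

/* One flag per logical cpu: has the per-cpu init step been done? */
static bool lp_initialized[NR_CPUS_SKETCH];

/* Stand-in for the per-cpu TDH.SYS.LP.INIT step; safe to repeat. */
int tdx_cpu_enable_sketch(int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS_SKETCH)
		return -EINVAL;
	lp_initialized[cpu] = true;
	return 0;
}

/* Stand-in for any other SEAMCALL: refused until the cpu is initialized. */
int seamcall_sketch(int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS_SKETCH)
		return -EINVAL;
	/* Error value chosen for illustration only. */
	return lp_initialized[cpu] ? 0 : -ENODEV;
}
```

A caller bringing a new cpu into TDX use would call the enable step once
on that cpu and only then issue further calls there.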
> 
> 3. Kernel policy on TDX memory
> 
> The TDX module reports a list of "Convertible Memory Regions" (CMRs) to
> indicate which memory regions are TDX-capable.  The TDX architecture
> allows the VMM to designate specific convertible memory regions as usable
> for TDX private memory.
> 
> The initial support of TDX guests will only allocate TDX private memory
> from the global page allocator.  This series chooses to designate _all_
> system RAM in the core-mm at the time of initializing TDX module as TDX
> memory to guarantee all pages in the page allocator are TDX pages.
> 
> 4. Memory Hotplug
> 
> After the kernel passes all "TDX-usable" memory regions to the TDX
> module, the set of "TDX-usable" memory regions is fixed for the module's
> runtime.  No more "TDX-usable" memory can be added to the TDX module
> after that.
> 
> To achieve the above "guarantee all pages in the page allocator are TDX
> pages", this series simply chooses to reject any non-TDX-usable memory in
> memory hotplug.
> 
> This _will_ be enhanced in the future after the first submission.

What's the primary reason to enhance that? Are there reasonable use 
cases? Why would we expect to have other (!TDX capable) memory in the
system?
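
For reference, the hotplug policy quoted above (reject any hot-added
range that is not "TDX-usable") amounts to a containment check against
the regions recorded at module initialization.  A minimal sketch,
assuming a hypothetical fixed table of ranges and made-up function names;
the addresses are arbitrary examples, not values from any real machine:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One range handed to the TDX module at initialization, [start, end). */
struct tdx_range_sketch {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical snapshot taken when the module was initialized. */
static const struct tdx_range_sketch tdx_ranges[] = {
	{ 0x00100000ULL,  0x80000000ULL },
	{ 0x100000000ULL, 0x200000000ULL },
};

/* True only if [start, end) lies entirely inside one recorded range. */
bool range_is_tdx_usable(uint64_t start, uint64_t end)
{
	size_t i;

	if (start >= end)
		return false;
	for (i = 0; i < sizeof(tdx_ranges) / sizeof(tdx_ranges[0]); i++) {
		if (start >= tdx_ranges[i].start && end <= tdx_ranges[i].end)
			return true;
	}
	return false;
}

/* Hotplug hook sketch: 0 accepts the range, -1 rejects it. */
int memory_going_online_sketch(uint64_t start, uint64_t end)
{
	return range_is_tdx_usable(start, end) ? 0 : -1;
}
```

Because the table never grows after initialization, any memory that was
not present at that time is rejected, which is the limitation the quoted
text says will be enhanced later.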

> 
> A better solution, suggested by Kirill, is similar to the per-node memory
> encryption flag in this series [4].  We can allow adding/onlining non-TDX
> memory to separate NUMA nodes so that both "TDX-capable" nodes and
> "non-TDX-capable" nodes can co-exist.  The new TDX flag can be exposed to
> userspace via sysfs so userspace can bind TDX guests to "TDX-capable"
> nodes via NUMA ABIs.
> 
> 5. Physical Memory Hotplug
> 
> Note TDX assumes convertible memory is always physically present during
> the machine's runtime.  A non-buggy BIOS should never support hot-removal
> of any convertible memory.  This implementation doesn't handle ACPI
> memory removal but depends on the BIOS to behave correctly.

-- 
Thanks,

David / dhildenb

