linux-kernel.vger.kernel.org archive mirror
From: Kai Huang <kai.huang@intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	KVM list <kvm@vger.kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Brown, Len" <len.brown@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	Rafael J Wysocki <rafael.j.wysocki@intel.com>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andi Kleen <ak@linux.intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Isaku Yamahata <isaku.yamahata@intel.com>
Subject: Re: [PATCH v3 00/21] TDX host kernel support
Date: Fri, 29 Apr 2022 17:35:30 +1200
Message-ID: <ffa956ade7c784af347da346a61bef22b85d9646.camel@intel.com>
In-Reply-To: <CAPcyv4gEwjnNE9cWb_KLZ6C7-UxKdUMZKFPF+LAJ4L1SjByisw@mail.gmail.com>

On Thu, 2022-04-28 at 20:04 -0700, Dan Williams wrote:
> On Thu, Apr 28, 2022 at 6:40 PM Kai Huang <kai.huang@intel.com> wrote:
> > 
> > On Thu, 2022-04-28 at 12:58 +1200, Kai Huang wrote:
> > > On Wed, 2022-04-27 at 17:50 -0700, Dave Hansen wrote:
> > > > On 4/27/22 17:37, Kai Huang wrote:
> > > > > On Wed, 2022-04-27 at 14:59 -0700, Dave Hansen wrote:
> > > > > > In 5 years, if someone takes this code and runs it on Intel hardware
> > > > > > with memory hotplug, CPU hotplug, NVDIMMs *AND* TDX support, what happens?
> > > > > 
> > > > > I thought we could state in the documentation that this code only
> > > > > works on TDX machines that don't have the above capabilities (SPR for now).  We
> > > > > can change the code and update the documentation when we add support for those
> > > > > features in the future.
> > > > > 
> > > > > If 5 years later someone takes this code, he/she should look at the
> > > > > documentation and figure out that he/she should choose a newer kernel if the
> > > > > machine supports those features.
> > > > > 
> > > > > I'll think about design solutions if the above doesn't look good to you.
> > > > 
> > > > No, it doesn't look good to me.
> > > > 
> > > > You can't just say:
> > > > 
> > > >     /*
> > > >      * This code will eat puppies if used on systems with hotplug.
> > > >      */
> > > > 
> > > > and merrily await the puppy bloodbath.
> > > > 
> > > > If it's not compatible, then you have to *MAKE* it not compatible in a
> > > > safe, controlled way.
> > > > 
> > > > > > You can't just ignore the problems because they're not present on one
> > > > > > version of the hardware.
> > > > 
> > > > Please, please read this again ^^
> > > 
> > > OK.  I'll think about solutions and come back later.
> > > > 
> > 
> > Hi Dave,
> > 
> > I think we have two approaches to handle memory hotplug interaction with the TDX
> > module initialization.
> > 
> > The first approach is simple.  We just block memory from being added as system
> > RAM managed by the page allocator when the platform supports TDX [1]. It seems we
> > can add an arch-specific check to __add_memory_resource() and reject the new
> > memory resource if the platform supports TDX.  __add_memory_resource() is called by
> > both __add_memory() and add_memory_driver_managed(), so this prevents both adding
> > NVDIMM as system RAM and normal ACPI memory hotplug [2].
> 
> What if the memory being added *is* TDX capable? What if someone
> wanted to manage a memory range as soft-reserved and move it back and
> forth from the core-mm to device access? That should be perfectly
> acceptable as long as the memory is TDX capable.

Please see below.

> 
> > The second approach is relatively more complicated.  Instead of directly
> > rejecting the new memory resource in __add_memory_resource(), we check whether
> > the memory resource can be added based on the CMRs and the TDX module initialization
> > status.  This is feasible because, with the latest public P-SEAMLDR spec, we can get
> > the CMRs from a P-SEAMLDR SEAMCALL [3].  So we can detect P-SEAMLDR and get CMR info
> > during kernel boot.  Then in __add_memory_resource() we do the check below:
> > 
> >         tdx_init_disable();     /* similar to cpu_hotplug_disable() */
> >         if (tdx_module_initialized())
> >                 // reject memory hotplug
> >         else if (new_memory_resource NOT in CMRs)
> >                 // reject memory hotplug
> >         else
> >                 // allow memory hotplug
> >         tdx_init_enable();      /* similar to cpu_hotplug_enable() */
> > 
> > tdx_init_disable() temporarily disables TDX module initialization by trying to
> > grab the mutex.  If TDX module initialization is already ongoing, it
> > waits until it completes.
> > 
> > This should work better for future platforms, but it requires non-trivially
> > more code, as we need to add VMXON/VMXOFF support to the core kernel to detect
> > CMRs using SEAMCALL.  A side advantage is that with VMXON in the core kernel we can
> > shut down the TDX module in kexec().
> > 
> > But for this series I think the second approach is overkill.  Can we choose
> > the first, simpler approach?
> 
> This still sounds like it is trying to solve symptoms and not the root
> problem. Why must the core-mm never have non-TDX memory when VMs are
> fine to operate with either core-mm pages or memory from other sources
> like hugetlbfs and device-dax?

Basically, we don't want to modify the page allocator API to distinguish TDX and
non-TDX allocations.  For instance, we don't want a new GFP_TDX.

There's another series by Chao, "KVM: mm: fd-based approach for supporting
KVM guest private memory", which essentially allows KVM to ask the guest memory
backend to allocate pages without having to mmap() them into userspace.

https://lore.kernel.org/kvm/20220310140911.50924-1-chao.p.peng@linux.intel.com/

More specifically, memfd will support a new MFD_INACCESSIBLE flag at creation
time, so all pages associated with such a memfd will be TDX-capable memory.  The
backend will need to implement the new memfile_pfn_ops to allow KVM to get
and put memory pages:

struct memfile_pfn_ops {
	long (*get_lock_pfn)(struct inode *inode, pgoff_t offset, int *order);
	void (*put_unlock_pfn)(unsigned long pfn);
};

With that, it is the backend's responsibility to implement the get_lock_pfn()
callback, in which the backend must ensure a TDX private page is allocated.

For TD guests, KVM should be restricted to using only those fd-based backends.  I am
not sure whether anonymous pages should be supported anymore.

Sean, please correct me if I am wrong.

Currently only shmem has been extended to support it.  By ensuring that all pages
in the page allocator are TDX memory, shmem can easily be extended to support TD guests.

If device-dax and hugetlbfs want to support TD guests, then they should
implement those callbacks and ensure only TDX memory is allocated.  For
instance, when future TDX supports NVDIMM (i.e. NVDIMM is included in the CMRs),
device-dax pages can be included as TDX memory when initializing the TDX
module, and device-dax can implement its own callbacks to support allocating
pages for TD guests.

But the TDX architecture may also change to support memory hotplug more gracefully
in the future.  For instance, it could support dynamically adding any
convertible memory as TDX memory *after* TDX module initialization.  But
this is just brainstorming on my part.

Anyway, for now, since only shmem (and possibly anonymous pages) can be used to create
TD guests, I think we can just reject any memory hot-add when the platform supports
TDX, as described in the first, simpler approach.  Eventually we may need something
like the second approach, but the TDX architecture can evolve too.


-- 
Thanks,
-Kai




Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ffa956ade7c784af347da346a61bef22b85d9646.camel@intel.com \
    --to=kai.huang@intel.com \
    --cc=ak@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=reinette.chatre@intel.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).