linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kai Huang <kai.huang@intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	KVM list <kvm@vger.kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Brown, Len" <len.brown@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	Rafael J Wysocki <rafael.j.wysocki@intel.com>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andi Kleen <ak@linux.intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Isaku Yamahata <isaku.yamahata@intel.com>,
	Mike Rapoport <rppt@kernel.org>
Subject: Re: [PATCH v3 00/21] TDX host kernel support
Date: Fri, 06 May 2022 13:46:55 +1200	[thread overview]
Message-ID: <b0d1ed15d8bf99efe1c49182f4a98f4a23f61d0d.camel@intel.com> (raw)
In-Reply-To: <CAPcyv4hdM+0zntuTez9n1-dJ_ODsF_TxAct=VpTs-tWJzBPJqQ@mail.gmail.com>

On Thu, 2022-05-05 at 18:15 -0700, Dan Williams wrote:
> On Thu, May 5, 2022 at 5:46 PM Kai Huang <kai.huang@intel.com> wrote:
> > 
> > On Thu, 2022-05-05 at 17:22 -0700, Dan Williams wrote:
> > > On Thu, May 5, 2022 at 3:14 PM Kai Huang <kai.huang@intel.com> wrote:
> > > > 
> > > > Thanks for feedback!
> > > > 
> > > > On Thu, 2022-05-05 at 06:51 -0700, Dan Williams wrote:
> > > > > [ add Mike ]
> > > > > 
> > > > > 
> > > > > On Thu, May 5, 2022 at 2:54 AM Kai Huang <kai.huang@intel.com> wrote:
> > > > > [..]
> > > > > > 
> > > > > > Hi Dave,
> > > > > > 
> > > > > > Sorry to ping (trying to close this).
> > > > > > 
> > > > > > Given we don't need to consider kmem-hot-add legacy PMEM after TDX module
> > > > > > initialization, I think for now it's totally fine to exclude legacy PMEMs from
> > > > > > TDMRs.  The worst case is when someone tries to use them as TD guest backend
> > > > > > directly, the TD will fail to create.  IMO it's acceptable, as it is supposedly
> > > > > > that no one should just use some random backend to run TD.
> > > > > 
> > > > > The platform will already do this, right?
> > > > > 
> > > > 
> > > > In the current v3 implementation, we don't have any code to handle memory
> > > > hotplug, therefore nothing prevents people from adding legacy PMEMs as system
> > > > RAM using kmem driver.  In order to guarantee all pages managed by page
> > > 
> > > That's the fundamental question I am asking why is "guarantee all
> > > pages managed by page allocator are TDX memory". That seems overkill
> > > compared to indicating the incompatibility after the fact.
> > 
> > As I explained, the reason is I don't want to modify page allocator to
> > distinguish TDX and non-TDX allocation, for instance, having to have a ZONE_TDX
> > and GFP_TDX.
> 
> Right, TDX details do not belong at that level, but it will work
> almost all the time if you do nothing to "guarantee" all TDX capable
> pages all the time.

"almost all the time" do you mean?

> 
> > KVM depends on host's page fault handler to allocate the page.  In fact KVM only
> > consumes PFN from host's page tables.  For now only RAM is TDX memory.  By
> > guaranteeing all pages in page allocator is TDX memory, we can easily use
> > anonymous pages as TD guest memory.
> 
> Again, TDX capable pages will be the overwhelming default, why are you
> worried about cluttering the memory hotplug path for nice corner
> cases.

Firstly perhaps I forgot to mention there are two concepts about TDX memory, so
let me clarify first:

1) Convertible Memory Regions (CMRs).  This is reported by BIOS (thus static) to
indicate which memory regions *can* be used as TDX memory.  This basically means
all RAM during boot for now.

2) TD Memory Regions (TDMRs).  Memory pages in CMRs are not automatically TDX
usable memory.  The TDX module needs to be configured which (convertible) memory
regions can be used as TDX memory.  Kernel is responsible for choosing the
ranges, and configure to the TDX module.  If a convertible memory page is not
included into TDMRs, the TDX module will report error when it is assigned to  a
TD.

> 
> Consider the fact that end users can break the kernel by specifying
> invalid memmap= command line options. The memory hotplug code does not
> take any steps to add safety in those cases because there are already
> too many ways it can go wrong. TDX is just one more corner case where
> the memmap= user needs to be careful. Otherwise, it is up to the
> platform firmware to make sure everything in the base memory map is
> TDX capable, and then all you need is documentation about the failure
> mode when extending "System RAM" beyond that baseline.

So the fact is, if we don't include legacy PMEMs into TDMRs, and don't do
anything in memory hotplug, then if user does kmem-hot-add legacy PMEMs as
system RAM, a live TD may eventually be killed.

If such case is a corner case that we don't need to guarantee, then even better.
And we have an additional reason that those legacy PMEMs don't need to be in
TDMRs.  As you suggested,  we can add some documentation to point out.

But the point we want to do some code check and prevent memory hotplug is, as
Dave said, we want this piece of code to work on *ANY* TDX capable machines,
including future machines which may, i.e. supports NVDIMM/CLX memory as TDX
memory.  If we don't do any code check in  memory hotplug in this series, then
when this code runs in future platforms, user can plug NVDIMM or CLX memory as
system RAM thus break the assumption "all pages in page allocator are TDX
memory", which eventually leads to live TDs being killed potentially.

Dave said we need to guarantee this code can work on *ANY* TDX machines.  Some
documentation saying it only works one some platforms and you shouldn't do
things on other platforms are not good enough:

https://lore.kernel.org/lkml/cover.1649219184.git.kai.huang@intel.com/T/#m6df45b6e1702bb03dcb027044a0dabf30a86e471

> 
> 
> > shmem to support a new fd-based backend which doesn't require having to mmap()
> > TD guest memory to host userspace:
> > 
> > https://lore.kernel.org/kvm/20220310140911.50924-1-chao.p.peng@linux.intel.com/
> > 
> > Also, besides TD guest memory, there are some per-TD control data structures
> > (which must be TDX memory too) need to be allocated for each TD.  Normal memory
> > allocation APIs can be used for such allocation if we guarantee all pages in
> > page allocator is TDX memory.
> 
> You don't need that guarantee, just check it after the fact and fail
> if that assertion fails. It should almost always be the case that it
> succeeds and if it doesn't then something special is happening with
> that system and the end user has effectively opt-ed out of TDX
> operation.

This doesn't guarantee consistent behaviour.  For instance, for one TD it can be
created, while the second may fail.  We should provide a consistent service.

The thing is anyway we need to configure some memory regions to the TDX module.
To me there's no reason we don't want to guarantee all pages in page allocator
are TDX memory. 

> 
> > > > allocator are all TDX memory, the v3 implementation needs to always include
> > > > legacy PMEMs as TDX memory so that even people truly add  legacy PMEMs as system
> > > > RAM, we can still guarantee all pages in page allocator are TDX memory.
> > > 
> > > Why?
> > 
> > If we don't include legacy PMEMs as TDX memory, then after they are hot-added as
> > system RAM using kmem driver, the assumption of "all pages in page allocator are
> > TDX memory" is broken.  A TD can be killed during runtime.
> 
> Yes, that is what the end user asked for. If they don't want that to
> happen then the policy decision about using kmem needs to be updated
> in userspace, not hard code that policy decision towards TDX inside
> the kernel.

This is also fine to me.  But please also see above Dave's comment.

Thanks for those valuable feedback!


-- 
Thanks,
-Kai



  reply	other threads:[~2022-05-06  1:47 UTC|newest]

Thread overview: 156+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-06  4:49 [PATCH v3 00/21] TDX host kernel support Kai Huang
2022-04-06  4:49 ` [PATCH v3 01/21] x86/virt/tdx: Detect SEAM Kai Huang
2022-04-18 22:29   ` Sathyanarayanan Kuppuswamy
2022-04-18 22:50     ` Sean Christopherson
2022-04-19  3:38     ` Kai Huang
2022-04-26 20:21   ` Dave Hansen
2022-04-26 23:12     ` Kai Huang
2022-04-26 23:28       ` Dave Hansen
2022-04-26 23:49         ` Kai Huang
2022-04-27  0:22           ` Sean Christopherson
2022-04-27  0:44             ` Kai Huang
2022-04-27 14:22           ` Dave Hansen
2022-04-27 22:39             ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 02/21] x86/virt/tdx: Detect TDX private KeyIDs Kai Huang
2022-04-19  5:39   ` Sathyanarayanan Kuppuswamy
2022-04-19  9:41     ` Kai Huang
2022-04-19  5:42   ` Sathyanarayanan Kuppuswamy
2022-04-19 10:07     ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 03/21] x86/virt/tdx: Implement the SEAMCALL base function Kai Huang
2022-04-19 14:07   ` Sathyanarayanan Kuppuswamy
2022-04-20  4:16     ` Kai Huang
2022-04-20  7:29       ` Sathyanarayanan Kuppuswamy
2022-04-20 10:39         ` Kai Huang
2022-04-26 20:37   ` Dave Hansen
2022-04-26 23:29     ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 04/21] x86/virt/tdx: Add skeleton for detecting and initializing TDX on demand Kai Huang
2022-04-19 14:53   ` Sathyanarayanan Kuppuswamy
2022-04-20  4:37     ` Kai Huang
2022-04-20  5:21       ` Dave Hansen
2022-04-20 14:30       ` Sathyanarayanan Kuppuswamy
2022-04-20 22:35         ` Kai Huang
2022-04-26 20:53   ` Dave Hansen
2022-04-27  0:43     ` Kai Huang
2022-04-27 14:49       ` Dave Hansen
2022-04-28  0:00         ` Kai Huang
2022-04-28 14:27           ` Dave Hansen
2022-04-28 23:44             ` Kai Huang
2022-04-28 23:53               ` Dave Hansen
2022-04-29  0:11                 ` Kai Huang
2022-04-29  0:26                   ` Dave Hansen
2022-04-29  0:59                     ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 05/21] x86/virt/tdx: Detect P-SEAMLDR and TDX module Kai Huang
2022-04-26 20:56   ` Dave Hansen
2022-04-27  0:01     ` Kai Huang
2022-04-27 14:24       ` Dave Hansen
2022-04-27 21:30         ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 06/21] x86/virt/tdx: Shut down TDX module in case of error Kai Huang
2022-04-23 15:39   ` Sathyanarayanan Kuppuswamy
2022-04-25 23:41     ` Kai Huang
2022-04-26  1:48       ` Sathyanarayanan Kuppuswamy
2022-04-26  2:12         ` Kai Huang
2022-04-26 20:59   ` Dave Hansen
2022-04-27  0:06     ` Kai Huang
2022-05-18 16:19       ` Sagi Shahar
2022-05-18 23:51         ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 07/21] x86/virt/tdx: Do TDX module global initialization Kai Huang
2022-04-20 22:27   ` Sathyanarayanan Kuppuswamy
2022-04-20 22:37     ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 08/21] x86/virt/tdx: Do logical-cpu scope TDX module initialization Kai Huang
2022-04-24  1:27   ` Sathyanarayanan Kuppuswamy
2022-04-25 23:55     ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 09/21] x86/virt/tdx: Get information about TDX module and convertible memory Kai Huang
2022-04-25  2:58   ` Sathyanarayanan Kuppuswamy
2022-04-26  0:05     ` Kai Huang
2022-04-27 22:15   ` Dave Hansen
2022-04-28  0:15     ` Kai Huang
2022-04-28 14:06       ` Dave Hansen
2022-04-28 23:14         ` Kai Huang
2022-04-29 17:47           ` Dave Hansen
2022-05-02  5:04             ` Kai Huang
2022-05-25  4:47             ` Kai Huang
2022-05-25  4:57               ` Kai Huang
2022-05-25 16:00                 ` Kai Huang
2022-05-18 22:30       ` Sagi Shahar
2022-05-18 23:56         ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 10/21] x86/virt/tdx: Add placeholder to coveret all system RAM as TDX memory Kai Huang
2022-04-20 20:48   ` Isaku Yamahata
2022-04-20 22:38     ` Kai Huang
2022-04-27 22:24   ` Dave Hansen
2022-04-28  0:53     ` Kai Huang
2022-04-28  1:07       ` Dave Hansen
2022-04-28  1:35         ` Kai Huang
2022-04-28  3:40           ` Dave Hansen
2022-04-28  3:55             ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 11/21] x86/virt/tdx: Choose to use " Kai Huang
2022-04-20 20:55   ` Isaku Yamahata
2022-04-20 22:39     ` Kai Huang
2022-04-28 15:54   ` Dave Hansen
2022-04-29  7:32     ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 12/21] x86/virt/tdx: Create TDMRs to cover all system RAM Kai Huang
2022-04-28 16:22   ` Dave Hansen
2022-04-29  7:24     ` Kai Huang
2022-04-29 13:52       ` Dave Hansen
2022-04-06  4:49 ` [PATCH v3 13/21] x86/virt/tdx: Allocate and set up PAMTs for TDMRs Kai Huang
2022-04-28 17:12   ` Dave Hansen
2022-04-29  7:46     ` Kai Huang
2022-04-29 14:20       ` Dave Hansen
2022-04-29 14:30         ` Sean Christopherson
2022-04-29 17:46           ` Dave Hansen
2022-04-29 18:19             ` Sean Christopherson
2022-04-29 18:32               ` Dave Hansen
2022-05-02  5:59         ` Kai Huang
2022-05-02 14:17           ` Dave Hansen
2022-05-02 21:55             ` Kai Huang
2022-04-06  4:49 ` [PATCH v3 14/21] x86/virt/tdx: Set up reserved areas for all TDMRs Kai Huang
2022-04-06  4:49 ` [PATCH v3 15/21] x86/virt/tdx: Reserve TDX module global KeyID Kai Huang
2022-04-06  4:49 ` [PATCH v3 16/21] x86/virt/tdx: Configure TDX module with TDMRs and " Kai Huang
2022-04-06  4:49 ` [PATCH v3 17/21] x86/virt/tdx: Configure global KeyID on all packages Kai Huang
2022-04-06  4:49 ` [PATCH v3 18/21] x86/virt/tdx: Initialize all TDMRs Kai Huang
2022-04-06  4:49 ` [PATCH v3 19/21] x86: Flush cache of TDX private memory during kexec() Kai Huang
2022-04-06  4:49 ` [PATCH v3 20/21] x86/virt/tdx: Add kernel command line to opt-in TDX host support Kai Huang
2022-04-28 17:25   ` Dave Hansen
2022-04-06  4:49 ` [PATCH v3 21/21] Documentation/x86: Add documentation for " Kai Huang
2022-04-14 10:19 ` [PATCH v3 00/21] TDX host kernel support Kai Huang
2022-04-26 20:13 ` Dave Hansen
2022-04-27  1:15   ` Kai Huang
2022-04-27 21:59     ` Dave Hansen
2022-04-28  0:37       ` Kai Huang
2022-04-28  0:50         ` Dave Hansen
2022-04-28  0:58           ` Kai Huang
2022-04-29  1:40             ` Kai Huang
2022-04-29  3:04               ` Dan Williams
2022-04-29  5:35                 ` Kai Huang
2022-05-03 23:59               ` Kai Huang
2022-05-04  0:25                 ` Dave Hansen
2022-05-04  1:15                   ` Kai Huang
2022-05-05  9:54                     ` Kai Huang
2022-05-05 13:51                       ` Dan Williams
2022-05-05 22:14                         ` Kai Huang
2022-05-06  0:22                           ` Dan Williams
2022-05-06  0:45                             ` Kai Huang
2022-05-06  1:15                               ` Dan Williams
2022-05-06  1:46                                 ` Kai Huang [this message]
2022-05-06 15:57                                   ` Dan Williams
2022-05-09  2:46                                     ` Kai Huang
2022-05-10 10:25                                       ` Kai Huang
2022-05-07  0:09                         ` Mike Rapoport
2022-05-08 10:00                           ` Kai Huang
2022-05-09 10:33                             ` Mike Rapoport
2022-05-09 23:27                               ` Kai Huang
2022-05-04 14:31                 ` Dan Williams
2022-05-04 22:50                   ` Kai Huang
2022-04-28  1:01   ` Dan Williams
2022-04-28  1:21     ` Kai Huang
2022-04-29  2:58       ` Dan Williams
2022-04-29  5:43         ` Kai Huang
2022-04-29 14:39         ` Dave Hansen
2022-04-29 15:18           ` Dan Williams
2022-04-29 17:18             ` Dave Hansen
2022-04-29 17:48               ` Dan Williams
2022-04-29 18:34                 ` Dave Hansen
2022-04-29 18:47                   ` Dan Williams
2022-04-29 19:20                     ` Dave Hansen
2022-04-29 21:20                       ` Dan Williams
2022-04-29 21:27                         ` Dave Hansen
2022-05-02 10:18                   ` Kai Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b0d1ed15d8bf99efe1c49182f4a98f4a23f61d0d.camel@intel.com \
    --to=kai.huang@intel.com \
    --cc=ak@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=reinette.chatre@intel.com \
    --cc=rppt@kernel.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).