All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Kai Huang <kai.huang@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"rafael@kernel.org" <rafael@kernel.org>,
	Chao Gao <chao.gao@intel.com>, Tony Luck <tony.luck@intel.com>,
	"david@redhat.com" <david@redhat.com>,
	"bagasdotme@gmail.com" <bagasdotme@gmail.com>,
	"ak@linux.intel.com" <ak@linux.intel.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	Isaku Yamahata <isaku.yamahata@intel.com>,
	"nik.borisov@suse.com" <nik.borisov@suse.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"sagis@google.com" <sagis@google.com>,
	"imammedo@redhat.com" <imammedo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"bp@alien8.de" <bp@alien8.de>, Len Brown <len.brown@intel.com>,
	"sathyanarayanan.kuppuswamy@linux.intel.com" 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Ying Huang <ying.huang@intel.com>,
	Dan J Williams <dan.j.williams@intel.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v15 22/23] x86/mce: Improve error log of kernel space TDX #MC due to erratum
Date: Mon, 4 Dec 2023 18:04:52 -0800	[thread overview]
Message-ID: <ZW6FRBnOwYV-UCkY@google.com> (raw)
In-Reply-To: <1a5b18b2-3072-46d9-9d44-38589cb54e40@intel.com>

On Mon, Dec 04, 2023, Dave Hansen wrote:
> On 12/4/23 15:24, Huang, Kai wrote:
> > On Mon, 2023-12-04 at 14:04 -0800, Hansen, Dave wrote:
> ...
> > In ancient time KVM used to immediately enable VMX when it is loaded, but later
> > it was changed to only enable VMX when there's active VM because of the above
> > reason.
> > 
> > See commit 10474ae8945ce ("KVM: Activate Virtualization On Demand").

Huh, I always just assumed it was some backwards thinking about enabling VMX/SVM
being "dangerous" or something.

> Fine.  This doesn't need to change ... until you load TDX.  Once you
> initialize the TDX module, no more out-of-tree VMMs for you.

It's not just out-of-tree hypervisors, which IMO should be little more than an
afterthought.  The other more important issue is that being post-VMXON blocks INIT,
i.e. VMX needs to be disabled before reboot, suspend, etc.  Forcing kvm_usage_count
would work, but it would essentially turn "graceful" reboots, i.e. reboots where
the host isn't running VMs and thus VMX is already disabled.  Having VMX be enabled
so long as KVM is loaded would turn all reboots into the "oh crap, the system is
rebooting, quick do VMXOFF!" variety.

> That doesn't seem too insane.  This is yet *ANOTHER* reason that doing
> dynamic TDX module initialization is a good idea.
> 
> >> It's not wrong to say that TDX is a KVM user.  If KVm wants
> >> 'kvm_usage_count' to go back to 0, it can shut down the TDX module.  Then
> >> there's no PAMT to worry about.
> >>
> >> The shutdown would be something like:
> >>
> >>       1. TDX module shutdown
> >>       2. Deallocate/Convert PAMT
> >>       3. vmxoff
> >>
> >> Then, no SEAMCALL failure because of vmxoff can cause a PAMT-induced #MC
> >> to be missed.
> > 
> > The limitation is once the TDX module is shutdown, it cannot be initialized
> > again unless it is runtimely updated.
> > 
> > Long-termly, if we go this design then there might be other problems when other
> > kernel components are using TDX.  For example, the VT-d driver will need to be
> > changed to support TDX-IO, and it will need to enable TDX module much earlier
> > than KVM to do some initialization.  It might need to some TDX work (e.g.,
> > cleanup) while KVM is unloaded.  I am not super familiar with TDX-IO but looks
> > we might have some problem here if we go with such design.
> 
> The burden for who does vmxon will simply need to change from KVM itself
> to some common code that KVM depends on.  Probably not dissimilar to
> those nutty (sorry folks, just calling it as I see 'em) multi-KVM module

You misspelled "amazing" ;-)

> patches that are floating around.

Joking aside, why shove TDX module ownership into KVM?  It honestly sounds like
a terrible fit, even without the whole TDX-IO mess.  KVM state is largely ephemeral,
in the sense that loading and unloading kvm.ko doesn't allocate/free much memory
or do all that much initialization or teardown.

TDX on the other hand is quite different.  IIRC the PAMT is hundreds of MiB, maybe
over a GiB in most expected use cases?  And also IIRC, TDH.SYS.INIT is rather
long running operation, blocks IRQs, NMIs, (SMIs?), etc.

So rather than shove TDX ownership into KVM and force KVM to figure out how to
manage the TDX module, why not do what us nutty people are suggesting and move
hardware enabling and TDX-module management into a dedicated base module (bonus
points if you call it vac.ko ;-) ).

Alternatively, we could have a dedicated kernel module for TDX, e.g. tdx.ko, and
then have tdx.ko and kvm.ko depend on vac.ko.  But I think that ends up being
quite gross and unnecessary, e.g. in such a setup, kvm-intel.ko ideally wouldn't
take a hard dependency on tdx.ko, as auto-loading tdx.ko would defeat some of the
purpose of the split, and KVM shouldn't fail to load just because TDX isn't supported.
But that'd mean conditionally doing request_module("tdx") or whatever and would
create other conundrums.

(Oof, typing that out made me realize that KVM depends on the PSP driver if
CONFIG_KVM_AMD_SEV=y, even if if the platform owner has no intention of ever using
SEV/SEV-ES.  IIUC, it works because sp_mod_init() just registers a driver, i.e.
doesn't fail out of there's no PSP.  That's kinda gross).

Anyways, vac.ko provides an API to grab a reference to the TDX module, e.g. the
"create a VM" API gets extended to say "create a VM of the TDX variety", and then
vac.ko manages its refcounts to VMX and TDX accordingly.  And KVM obviously keeps
its existing behavior of getting and putting references for each VM.

That way userspace gets to decide when to (un)load tdx.ko without needing to add
a KVM module param or whatever to allow forcefully unloading tdx.ko (which would
be bizarre and probably quite difficult to implement correctly), and unloading
kvm-intel.ko wouldn't require unloading the TDX module.

The end behavior might not be all that different in the short term, but it would
give us more options, e.g. for this erratum, it would be quite easy for vac.ko to
let usersepace choose between keeping VMX "on" (while the TDX module is loaded)
and potentially having imperfect #MC messages.

And out-of-tree hypervisors could even use vac.ko's exported APIs to manage hardware
enabling if they so choose.

  parent reply	other threads:[~2023-12-05  2:05 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-09 11:55 [PATCH v15 00/23] TDX host kernel support Kai Huang
2023-11-09 11:55 ` [PATCH v15 01/23] x86/virt/tdx: Detect TDX during kernel boot Kai Huang
2023-11-09 11:55 ` [PATCH v15 02/23] x86/tdx: Define TDX supported page sizes as macros Kai Huang
2023-11-09 11:55 ` [PATCH v15 03/23] x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC Kai Huang
2023-11-09 11:55 ` [PATCH v15 04/23] x86/cpu: Detect TDX partial write machine check erratum Kai Huang
2023-11-09 11:55 ` [PATCH v15 05/23] x86/virt/tdx: Handle SEAMCALL no entropy error in common code Kai Huang
2023-11-09 16:38   ` Dave Hansen
2023-11-14 19:24   ` Isaku Yamahata
2023-11-15 10:41     ` Huang, Kai
2023-11-15 19:26       ` Isaku Yamahata
2023-11-09 11:55 ` [PATCH v15 06/23] x86/virt/tdx: Add SEAMCALL error printing for module initialization Kai Huang
2023-11-09 11:55 ` [PATCH v15 07/23] x86/virt/tdx: Add skeleton to enable TDX on demand Kai Huang
2023-11-09 11:55 ` [PATCH v15 08/23] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory Kai Huang
2023-11-09 11:55 ` [PATCH v15 09/23] x86/virt/tdx: Get module global metadata for module initialization Kai Huang
2023-11-09 23:29   ` Dave Hansen
2023-11-10  2:23     ` Huang, Kai
2023-11-15 19:35   ` Isaku Yamahata
2023-11-16  3:19     ` Huang, Kai
2023-11-09 11:55 ` [PATCH v15 10/23] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions Kai Huang
2023-11-09 11:55 ` [PATCH v15 11/23] x86/virt/tdx: Fill out " Kai Huang
2023-11-09 11:55 ` [PATCH v15 12/23] x86/virt/tdx: Allocate and set up PAMTs for TDMRs Kai Huang
2023-11-09 11:55 ` [PATCH v15 13/23] x86/virt/tdx: Designate reserved areas for all TDMRs Kai Huang
2023-11-09 11:55 ` [PATCH v15 14/23] x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID Kai Huang
2023-11-09 11:55 ` [PATCH v15 15/23] x86/virt/tdx: Configure global KeyID on all packages Kai Huang
2023-11-09 11:55 ` [PATCH v15 16/23] x86/virt/tdx: Initialize all TDMRs Kai Huang
2023-11-09 11:55 ` [PATCH v15 17/23] x86/kexec: Flush cache of TDX private memory Kai Huang
2023-11-27 18:13   ` Dave Hansen
2023-11-27 19:33     ` Huang, Kai
2023-11-27 20:02       ` Huang, Kai
2023-11-27 20:05       ` Dave Hansen
2023-11-27 20:52         ` Huang, Kai
2023-11-27 21:06           ` Dave Hansen
2023-11-27 22:09             ` Huang, Kai
2023-11-09 11:55 ` [PATCH v15 18/23] x86/virt/tdx: Keep TDMRs when module initialization is successful Kai Huang
2023-11-09 11:55 ` [PATCH v15 19/23] x86/virt/tdx: Improve readability of module initialization error handling Kai Huang
2023-11-09 11:55 ` [PATCH v15 20/23] x86/kexec(): Reset TDX private memory on platforms with TDX erratum Kai Huang
2023-11-09 11:55 ` [PATCH v15 21/23] x86/virt/tdx: Handle TDX interaction with ACPI S3 and deeper states Kai Huang
2023-11-30 17:20   ` Dave Hansen
2023-11-09 11:55 ` [PATCH v15 22/23] x86/mce: Improve error log of kernel space TDX #MC due to erratum Kai Huang
2023-11-30 18:01   ` Tony Luck
2023-12-01 20:35   ` Dave Hansen
2023-12-03 11:44     ` Huang, Kai
2023-12-04 17:07       ` Dave Hansen
2023-12-04 21:00         ` Huang, Kai
2023-12-04 22:04           ` Dave Hansen
2023-12-04 23:24             ` Huang, Kai
2023-12-04 23:39               ` Dave Hansen
2023-12-04 23:56                 ` Huang, Kai
2023-12-05  2:04                 ` Sean Christopherson [this message]
2023-12-05 16:36                   ` Dave Hansen
2023-12-05 16:53                     ` Sean Christopherson
2023-12-05 16:36                   ` Luck, Tony
2023-12-05 16:57                     ` Sean Christopherson
2023-12-04 23:41               ` Huang, Kai
2023-12-05 14:25   ` Borislav Petkov
2023-12-05 19:41     ` Huang, Kai
2023-12-05 19:56       ` Borislav Petkov
2023-12-05 20:08         ` Huang, Kai
2023-12-05 20:29           ` Borislav Petkov
2023-12-05 20:33             ` Huang, Kai
2023-12-05 20:41               ` Borislav Petkov
2023-12-05 20:49                 ` Dave Hansen
2023-12-05 20:58                 ` Huang, Kai
2023-11-09 11:56 ` [PATCH v15 23/23] Documentation/x86: Add documentation for TDX host support Kai Huang
2023-11-13  8:40 ` [PATCH v15 00/23] TDX host kernel support Nikolay Borisov
2023-11-13  9:11   ` Huang, Kai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZW6FRBnOwYV-UCkY@google.com \
    --to=seanjc@google.com \
    --cc=ak@linux.intel.com \
    --cc=bagasdotme@gmail.com \
    --cc=bp@alien8.de \
    --cc=chao.gao@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=imammedo@redhat.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nik.borisov@suse.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=sagis@google.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.