linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@intel.com>
To: Kai Huang <kai.huang@intel.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, peterz@infradead.org, tglx@linutronix.de,
	seanjc@google.com, pbonzini@redhat.com, dan.j.williams@intel.com,
	rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
	ying.huang@intel.com, reinette.chatre@intel.com,
	len.brown@intel.com, tony.luck@intel.com, ak@linux.intel.com,
	isaku.yamahata@intel.com, chao.gao@intel.com,
	sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
	sagis@google.com, imammedo@redhat.com
Subject: Re: [PATCH v8 15/16] x86/virt/tdx: Flush cache in kexec() when TDX is enabled
Date: Fri, 6 Jan 2023 16:35:49 -0800	[thread overview]
Message-ID: <ba0fdee9-148b-b0b9-ecde-2610eff02ba1@intel.com> (raw)
In-Reply-To: <ee5185e1727c3cd8bd51dbf9fcec95d432100d12.1670566861.git.kai.huang@intel.com>

On 12/8/22 22:52, Kai Huang wrote:
> There are two problems in terms of using kexec() to boot to a new kernel
> when the old kernel has enabled TDX: 1) Part of the memory pages are
> still TDX private pages (i.e. metadata used by the TDX module, and any
> TDX guest memory if kexec() happens when there's any TDX guest alive).
> 2) There might be dirty cachelines associated with TDX private pages.
> 
> Because the hardware doesn't guarantee cache coherency among different
> KeyIDs, the old kernel needs to flush cache (of those TDX private pages)
> before booting to the new kernel.  Also, reading TDX private page using
> any shared non-TDX KeyID with integrity-check enabled can trigger #MC.
> Therefore ideally, the kernel should convert all TDX private pages back
> to normal before booting to the new kernel.

This is just talking about way too many things that just don't apply.

Let's focus on the *ACTUAL* problem that's being addressed instead of
the 15 problems that aren't actual practical problems.

> However, this implementation doesn't convert TDX private pages back to
> normal in kexec() because of below considerations:
> 
> 1) Neither the kernel nor the TDX module has existing infrastructure to
>    track which pages are TDX private pages.
> 2) The number of TDX private pages can be large, and converting all of
>    them (cache flush + using MOVDIR64B to clear the page) in kexec() can
>    be time consuming.
> 3) The new kernel will almost only use KeyID 0 to access memory.  KeyID
>    0 doesn't support integrity-check, so it's OK.
> 4) The kernel doesn't (and may never) support MKTME.  If any 3rd party
>    kernel ever supports MKTME, it can/should do MOVDIR64B to clear the
>    page with the new MKTME KeyID (just like TDX does) before using it.

Yeah, why are we getting all worked up about MKTME when there is not
support?

The only thing that matters here is dirty cacheline writeback.  There
are two things the kernel needs to do to mitigate that:

 1. Stop accessing TDX private memory mappings
  1a. Stop making TDX module calls (uses global private KeyID)
  1b. Stop TDX guests from running (uses per-guest KeyID)
 2. Flush any cachelines from previous private KeyID writes

There are a couple of ways we can do #2.  We do *NOT* need to convert
*ANYTHING* back to KeyID 0.  Page conversion doesn't even come into play
in any way as far as I can tell.

I think you're also saying that since all CPUs go through this path and
there is no TDX activity between the WBINVD and the native_halt() that
1a and 1b basically happen for "free" without needing to do theme
explicitly.

> Therefore, this implementation just flushes cache to make sure there are
> no stale dirty cachelines associated with any TDX private KeyIDs before
> booting to the new kernel, otherwise they may silently corrupt the new
> kernel.

That's fine.  So, this patch kinda happens to land in the right spot
even after thrashing about about a while.

> Following SME support, use wbinvd() to flush cache in stop_this_cpu().
> Theoretically, cache flush is only needed when the TDX module has been
> initialized.  However initializing the TDX module is done on demand at
> runtime, and it takes a mutex to read the module status.  Just check
> whether TDX is enabled by BIOS instead to flush cache.

Yeah, close enough.

> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index c21b7347a26d..0cc84977dc62 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -765,8 +765,14 @@ void __noreturn stop_this_cpu(void *dummy)
>  	 *
>  	 * Test the CPUID bit directly because the machine might've cleared
>  	 * X86_FEATURE_SME due to cmdline options.
> +	 *
> +	 * Similar to SME, if the TDX module is ever initialized, the
> +	 * cachelines associated with any TDX private KeyID must be flushed
> +	 * before transiting to the new kernel.  The TDX module is initialized
> +	 * on demand, and it takes the mutex to read its status.  Just check
> +	 * whether TDX is enabled by BIOS instead to flush cache.
>  	 */

There's too much detail here.  Let's up-level it a bit.  We don't need
to be talking about TDX locking here.

	/*
	 * The TDX module or guests might have left dirty cachelines
	 * behind.  Flush them to avoid corruption from later writeback.
	 * Note that this flushes on all systems where TDX is possible,
	 * but does not actually check that TDX was in use.
	 */

> -	if (cpuid_eax(0x8000001f) & BIT(0))
> +	if (cpuid_eax(0x8000001f) & BIT(0) || platform_tdx_enabled())
>  		native_wbinvd();
>  	for (;;) {
>  		/*


  reply	other threads:[~2023-01-07  0:37 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-09  6:52 [PATCH v8 00/16] TDX host kernel support Kai Huang
2022-12-09  6:52 ` [PATCH v8 01/16] x86/tdx: Define TDX supported page sizes as macros Kai Huang
2023-01-06 19:04   ` Dave Hansen
2022-12-09  6:52 ` [PATCH v8 02/16] x86/virt/tdx: Detect TDX during kernel boot Kai Huang
2023-01-06 17:09   ` Dave Hansen
2023-01-08 22:25     ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 03/16] x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC Kai Huang
2023-01-06 19:04   ` Dave Hansen
2022-12-09  6:52 ` [PATCH v8 04/16] x86/virt/tdx: Add skeleton to initialize TDX on demand Kai Huang
2023-01-06 17:14   ` Dave Hansen
2023-01-08 22:26     ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 05/16] x86/virt/tdx: Implement functions to make SEAMCALL Kai Huang
2023-01-06 17:29   ` Dave Hansen
2023-01-09 10:30     ` Huang, Kai
2023-01-09 19:54       ` Dave Hansen
2023-01-09 22:10         ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 06/16] x86/virt/tdx: Get information about TDX module and TDX-capable memory Kai Huang
2023-01-06 17:46   ` Dave Hansen
2023-01-09 10:25     ` Huang, Kai
2023-01-09 19:52       ` Dave Hansen
2023-01-09 22:07         ` Huang, Kai
2023-01-09 22:11           ` Dave Hansen
2022-12-09  6:52 ` [PATCH v8 07/16] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory Kai Huang
2023-01-06 18:18   ` Dave Hansen
2023-01-09 11:48     ` Huang, Kai
2023-01-09 16:51       ` Dave Hansen
2023-01-10 12:09         ` Huang, Kai
2023-01-10 16:18           ` Dave Hansen
2023-01-11 10:00             ` Huang, Kai
2023-01-12  0:56               ` Huang, Ying
2023-01-12  1:18                 ` Huang, Kai
2023-01-12  1:59                   ` Huang, Ying
2023-01-12  2:22                     ` Huang, Kai
2023-01-12 11:33         ` Huang, Kai
2023-01-18 11:08   ` Huang, Kai
2023-01-18 13:57     ` David Hildenbrand
2023-01-18 19:38       ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 08/16] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions Kai Huang
2023-01-06 19:24   ` Dave Hansen
2023-01-10  0:40     ` Huang, Kai
2023-01-10  0:47       ` Dave Hansen
2023-01-10  2:23         ` Huang, Kai
2023-01-10 19:12           ` Dave Hansen
2023-01-11  9:23             ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 09/16] x86/virt/tdx: Fill out " Kai Huang
2023-01-06 19:36   ` Dave Hansen
2023-01-10  0:45     ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 10/16] x86/virt/tdx: Allocate and set up PAMTs for TDMRs Kai Huang
2023-01-06 21:53   ` Dave Hansen
2023-01-10  0:49     ` Huang, Kai
2023-01-07  0:47   ` Dave Hansen
2023-01-10  0:47     ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 11/16] x86/virt/tdx: Designate reserved areas for all TDMRs Kai Huang
2023-01-06 22:07   ` Dave Hansen
2023-01-10  1:19     ` Huang, Kai
2023-01-10  1:22       ` Dave Hansen
2023-01-10 11:01         ` Huang, Kai
2023-01-10 15:19           ` Dave Hansen
2023-01-11 10:57             ` Huang, Kai
2023-01-11 16:16               ` Dave Hansen
2023-01-11 22:10                 ` Huang, Kai
2023-01-10 11:01       ` Huang, Kai
2023-01-10 15:17         ` Dave Hansen
2022-12-09  6:52 ` [PATCH v8 12/16] x86/virt/tdx: Designate the global KeyID and configure the TDX module Kai Huang
2023-01-06 22:21   ` Dave Hansen
2023-01-10 10:48     ` Huang, Kai
2023-01-10 16:25       ` Dave Hansen
2023-01-10 23:33         ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 13/16] x86/virt/tdx: Configure global KeyID on all packages Kai Huang
2023-01-06 22:49   ` Dave Hansen
2023-01-10 10:15     ` Huang, Kai
2023-01-10 16:53       ` Dave Hansen
2023-01-11  0:06         ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 14/16] x86/virt/tdx: Initialize all TDMRs Kai Huang
2023-01-07  0:17   ` Dave Hansen
2023-01-10 10:23     ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 15/16] x86/virt/tdx: Flush cache in kexec() when TDX is enabled Kai Huang
2023-01-07  0:35   ` Dave Hansen [this message]
2023-01-10 11:29     ` Huang, Kai
2023-01-10 15:27       ` Dave Hansen
2023-01-11  0:13         ` Huang, Kai
2023-01-11  0:30           ` Dave Hansen
2023-01-11  1:58             ` Huang, Kai
2022-12-09  6:52 ` [PATCH v8 16/16] Documentation/x86: Add documentation for TDX host support Kai Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ba0fdee9-148b-b0b9-ecde-2610eff02ba1@intel.com \
    --to=dave.hansen@intel.com \
    --cc=ak@linux.intel.com \
    --cc=bagasdotme@gmail.com \
    --cc=chao.gao@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=imammedo@redhat.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=reinette.chatre@intel.com \
    --cc=sagis@google.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).