All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@intel.com>
To: Kai Huang <kai.huang@intel.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
	dan.j.williams@intel.com, rafael.j.wysocki@intel.com,
	kirill.shutemov@linux.intel.com, ying.huang@intel.com,
	reinette.chatre@intel.com, len.brown@intel.com,
	tony.luck@intel.com, peterz@infradead.org, ak@linux.intel.com,
	isaku.yamahata@intel.com, chao.gao@intel.com,
	sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
	sagis@google.com, imammedo@redhat.com
Subject: Re: [PATCH v7 13/20] x86/virt/tdx: Allocate and set up PAMTs for TDMRs
Date: Wed, 23 Nov 2022 14:57:28 -0800	[thread overview]
Message-ID: <74723e2b-3094-d04b-aed7-2789268b00ab@intel.com> (raw)
In-Reply-To: <ef6cdab2c371b9f068f2b4bf493b1dd0c9bb3c99.1668988357.git.kai.huang@intel.com>

On 11/20/22 16:26, Kai Huang wrote:
> The TDX module uses additional metadata to record things like which
> guest "owns" a given page of memory.  This metadata, referred as
> Physical Address Metadata Table (PAMT), essentially serves as the
> 'struct page' for the TDX module.  PAMTs are not reserved by hardware
> up front.  They must be allocated by the kernel and then given to the
> TDX module.

... during module initialization.

> TDX supports 3 page sizes: 4K, 2M, and 1G.  Each "TD Memory Region"
> (TDMR) has 3 PAMTs to track the 3 supported page sizes.  Each PAMT must
> be a physically contiguous area from a Convertible Memory Region (CMR).
> However, the PAMTs which track pages in one TDMR do not need to reside
> within that TDMR but can be anywhere in CMRs.  If one PAMT overlaps with
> any TDMR, the overlapping part must be reported as a reserved area in
> that particular TDMR.
> 
> Use alloc_contig_pages() since PAMT must be a physically contiguous area
> and it may be potentially large (~1/256th of the size of the given TDMR).
> The downside is alloc_contig_pages() may fail at runtime.  One (bad)
> mitigation is to launch a TD guest early during system boot to get those
> PAMTs allocated at early time, but the only way to fix is to add a boot
> option to allocate or reserve PAMTs during kernel boot.

FWIW, we all agree that this is a bad permanent way to leave things.
You can call me out here as proposing that this wart be left in place
while this series is merged and is a detail we can work on afterword
with new module params, boot options, Kconfig or whatever.

> TDX only supports a limited number of reserved areas per TDMR to cover
> both PAMTs and memory holes within the given TDMR.  If many PAMTs are
> allocated within a single TDMR, the reserved areas may not be sufficient
> to cover all of them.
> 
> Adopt the following policies when allocating PAMTs for a given TDMR:
> 
>   - Allocate three PAMTs of the TDMR in one contiguous chunk to minimize
>     the total number of reserved areas consumed for PAMTs.
>   - Try to first allocate PAMT from the local node of the TDMR for better
>     NUMA locality.
> 
> Also dump out how many pages are allocated for PAMTs when the TDX module
> is initialized successfully.

... this helps answer the eternal "where did all my memory go?" questions.

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b36129183035..b86a333b860f 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1960,6 +1960,7 @@ config INTEL_TDX_HOST
>  	depends on KVM_INTEL
>  	depends on X86_X2APIC
>  	select ARCH_KEEP_MEMBLOCK
> +	depends on CONTIG_ALLOC
>  	help
>  	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
>  	  host and certain physical attacks.  This option enables necessary TDX
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 57b448de59a0..9d76e70de46e 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -586,6 +586,187 @@ static int create_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
>  	return 0;
>  }
>  
> +/*
> + * Calculate PAMT size given a TDMR and a page size.  The returned
> + * PAMT size is always aligned up to 4K page boundary.
> + */
> +static unsigned long tdmr_get_pamt_sz(struct tdmr_info *tdmr, int pgsz)
> +{
> +	unsigned long pamt_sz, nr_pamt_entries;
> +
> +	switch (pgsz) {
> +	case TDX_PS_4K:
> +		nr_pamt_entries = tdmr->size >> PAGE_SHIFT;
> +		break;
> +	case TDX_PS_2M:
> +		nr_pamt_entries = tdmr->size >> PMD_SHIFT;
> +		break;
> +	case TDX_PS_1G:
> +		nr_pamt_entries = tdmr->size >> PUD_SHIFT;
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return 0;
> +	}
> +
> +	pamt_sz = nr_pamt_entries * tdx_sysinfo.pamt_entry_size;
> +	/* TDX requires PAMT size must be 4K aligned */
> +	pamt_sz = ALIGN(pamt_sz, PAGE_SIZE);
> +
> +	return pamt_sz;
> +}
> +
> +/*
> + * Pick a NUMA node on which to allocate this TDMR's metadata.
> + *
> + * This is imprecise since TDMRs are 1G aligned and NUMA nodes might
> + * not be.  If the TDMR covers more than one node, just use the _first_
> + * one.  This can lead to small areas of off-node metadata for some
> + * memory.
> + */
> +static int tdmr_get_nid(struct tdmr_info *tdmr)
> +{
> +	struct tdx_memblock *tmb;
> +
> +	/* Find the first memory region covered by the TDMR */
> +	list_for_each_entry(tmb, &tdx_memlist, list) {
> +		if (tmb->end_pfn > (tdmr_start(tdmr) >> PAGE_SHIFT))
> +			return tmb->nid;
> +	}

Aha, the first use of tmb->nid!  I wondered why that was there.

> +
> +	/*
> +	 * Fall back to allocating the TDMR's metadata from node 0 when
> +	 * no TDX memory block can be found.  This should never happen
> +	 * since TDMRs originate from TDX memory blocks.
> +	 */
> +	WARN_ON_ONCE(1);

That's probably better a pr_warn() or something.  A backtrace and all
that jazz seems a bit overly dramatic for this.

> +	return 0;
> +}
The rest of this actually looks fine.  It's nearing ack'able state.

  reply	other threads:[~2022-11-23 22:57 UTC|newest]

Thread overview: 163+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-21  0:26 [PATCH v7 00/20] TDX host kernel support Kai Huang
2022-11-21  0:26 ` [PATCH v7 01/20] x86/tdx: Define TDX supported page sizes as macros Kai Huang
2022-11-21  2:52   ` Sathyanarayanan Kuppuswamy
2022-11-21  9:15     ` Huang, Kai
2022-11-21 17:23       ` Sathyanarayanan Kuppuswamy
2022-11-21 18:12     ` Dave Hansen
2022-11-21 23:48   ` Dave Hansen
2022-11-22  0:01     ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 02/20] x86/virt/tdx: Detect TDX during kernel boot Kai Huang
2022-11-21  3:07   ` Sathyanarayanan Kuppuswamy
2022-11-21  9:37     ` Huang, Kai
2022-11-21 23:57       ` Sathyanarayanan Kuppuswamy
2022-11-22  0:10   ` Dave Hansen
2022-11-22 11:28     ` Huang, Kai
2022-11-22 16:50       ` Dave Hansen
2022-11-22 23:21         ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 03/20] x86/virt/tdx: Disable TDX if X2APIC is not enabled Kai Huang
2022-11-21  3:51   ` Sathyanarayanan Kuppuswamy
2022-11-21  9:44     ` Huang, Kai
2022-11-21 22:00       ` Sathyanarayanan Kuppuswamy
2022-11-21 23:40         ` Huang, Kai
2022-11-21 23:46   ` Dave Hansen
2022-11-22  0:30     ` Huang, Kai
2022-11-22  0:44       ` Dave Hansen
2022-11-22  0:58         ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 04/20] x86/virt/tdx: Add skeleton to initialize TDX on demand Kai Huang
2022-11-22  9:02   ` Peter Zijlstra
2022-11-22 10:31     ` Thomas Gleixner
2022-11-22 15:35       ` Dave Hansen
2022-11-22 20:03         ` Thomas Gleixner
2022-11-22 20:11           ` Sean Christopherson
2022-11-23  0:30           ` Huang, Kai
2022-11-23  1:12             ` Huang, Kai
2022-11-23 11:05             ` Thomas Gleixner
2022-11-23 12:22               ` Huang, Kai
2022-11-22 18:05   ` Dave Hansen
2022-11-23 10:18     ` Huang, Kai
2022-11-23 16:58       ` Dave Hansen
2022-11-23 21:58         ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 05/20] x86/virt/tdx: Implement functions to make SEAMCALL Kai Huang
2022-11-22  9:06   ` Peter Zijlstra
2022-11-23  8:53     ` Huang, Kai
2022-11-22 18:20   ` Dave Hansen
2022-11-23 10:43     ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 06/20] x86/virt/tdx: Shut down TDX module in case of error Kai Huang
2022-11-22  9:10   ` Peter Zijlstra
2022-11-22  9:13   ` Peter Zijlstra
2022-11-22 15:14     ` Dave Hansen
2022-11-22 19:13       ` Peter Zijlstra
2022-11-22 19:24         ` Dave Hansen
2022-11-22 19:33           ` Peter Zijlstra
2022-11-23  1:14             ` Huang, Kai
2022-11-29 21:40             ` Dave Hansen
2022-11-30 11:09               ` Thomas Gleixner
2022-11-23  0:58           ` Huang, Kai
2022-11-23  1:04             ` Dave Hansen
2022-11-23  1:22               ` Huang, Kai
2022-11-23 16:20                 ` Sean Christopherson
2022-11-23 16:41                   ` Dave Hansen
2022-11-23 17:37                     ` Sean Christopherson
2022-11-23 18:18                       ` Dave Hansen
2022-11-23 19:03                         ` Sean Christopherson
2022-11-22  9:20   ` Peter Zijlstra
2022-11-22 15:06     ` Thomas Gleixner
2022-11-22 19:06       ` Peter Zijlstra
2022-11-22 19:31         ` Sean Christopherson
2022-11-23  9:39           ` Huang, Kai
2022-11-22 15:20     ` Dave Hansen
2022-11-22 16:52       ` Thomas Gleixner
2022-11-22 18:57   ` Dave Hansen
2022-11-22 19:14     ` Peter Zijlstra
2022-11-23  1:24       ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 07/20] x86/virt/tdx: Do TDX module global initialization Kai Huang
2022-11-22 19:14   ` Dave Hansen
2022-11-23 11:45     ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 08/20] x86/virt/tdx: Do logical-cpu scope TDX module initialization Kai Huang
2022-11-21  0:26 ` [PATCH v7 09/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory Kai Huang
2022-11-22 23:39   ` Dave Hansen
2022-11-23 11:40     ` Huang, Kai
2022-11-23 16:44       ` Dave Hansen
2022-11-23 22:53         ` Huang, Kai
2022-12-02 11:19           ` Huang, Kai
2022-12-02 17:25             ` Dave Hansen
2022-12-02 21:57               ` Huang, Kai
2022-12-02 11:11     ` Huang, Kai
2022-12-02 17:06       ` Dave Hansen
2022-12-02 21:56         ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 10/20] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory Kai Huang
2022-11-21  5:37   ` Huang, Ying
2022-11-21  9:09     ` Huang, Kai
2022-11-22  1:54       ` Huang, Ying
2022-11-22  9:16         ` Huang, Kai
2022-11-24  0:47           ` Huang, Ying
2022-11-22 10:10   ` Peter Zijlstra
2022-11-22 11:40     ` Huang, Kai
2022-11-23  0:21   ` Dave Hansen
2022-11-23  9:29     ` Peter Zijlstra
2022-11-24  1:04     ` Huang, Kai
2022-11-24  1:22       ` Dave Hansen
2022-11-24  2:27         ` Huang, Kai
2022-11-24  1:50   ` Dan Williams
2022-11-24  9:06     ` Huang, Kai
2022-11-25  9:28       ` David Hildenbrand
2022-11-28  8:38         ` Huang, Kai
2022-11-28  8:43           ` David Hildenbrand
2022-11-28  9:21             ` Huang, Kai
2022-11-28  9:26               ` David Hildenbrand
2022-11-28  9:50                 ` Huang, Kai
2022-11-24  9:26     ` Peter Zijlstra
2022-11-24 10:02       ` Huang, Kai
2022-11-30 22:26         ` Dave Hansen
2022-11-21  0:26 ` [PATCH v7 11/20] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions Kai Huang
2022-11-23 22:17   ` Dave Hansen
2022-11-24  9:51     ` Huang, Kai
2022-11-24 12:02     ` Huang, Kai
2022-11-28 15:59       ` Dave Hansen
2022-11-28 22:13         ` Huang, Kai
2022-11-28 22:19           ` Dave Hansen
2022-11-28 22:50             ` Huang, Kai
2022-12-07 11:47               ` Huang, Kai
2022-12-08 12:56                 ` Huang, Kai
2022-12-08 14:58                   ` Dave Hansen
2022-12-08 23:29                     ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 12/20] x86/virt/tdx: Create " Kai Huang
2022-11-23 22:41   ` Dave Hansen
2022-11-24 11:29     ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 13/20] x86/virt/tdx: Allocate and set up PAMTs for TDMRs Kai Huang
2022-11-23 22:57   ` Dave Hansen [this message]
2022-11-24 11:46     ` Huang, Kai
2022-11-28 16:39       ` Dave Hansen
2022-11-28 22:48         ` Huang, Kai
2022-11-28 22:56           ` Dave Hansen
2022-11-28 23:14             ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 14/20] x86/virt/tdx: Set up reserved areas for all TDMRs Kai Huang
2022-11-23 23:39   ` Dave Hansen
2022-11-28  9:14     ` Huang, Kai
2022-11-28 13:18       ` Dave Hansen
2022-11-28 22:24         ` Huang, Kai
2022-11-28 22:58           ` Dave Hansen
2022-11-28 23:10             ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 15/20] x86/virt/tdx: Reserve TDX module global KeyID Kai Huang
2022-11-23 23:40   ` Dave Hansen
2022-11-24 22:39     ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 16/20] x86/virt/tdx: Configure TDX module with TDMRs and " Kai Huang
2022-11-23 23:56   ` Dave Hansen
2022-11-25  0:59     ` Huang, Kai
2022-11-25  1:18       ` Dave Hansen
2022-11-25  1:44         ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 17/20] x86/virt/tdx: Configure global KeyID on all packages Kai Huang
2022-11-24  0:28   ` Dave Hansen
2022-11-24 22:28     ` Huang, Kai
2022-11-25  0:08       ` Huang, Kai
2022-11-30  3:35   ` Binbin Wu
2022-11-30  8:34     ` Huang, Kai
2022-11-30 14:04       ` kirill.shutemov
2022-11-30 15:13       ` Dave Hansen
2022-11-30 20:17         ` Huang, Kai
2022-11-30 17:37   ` Dave Hansen
2022-11-21  0:26 ` [PATCH v7 18/20] x86/virt/tdx: Initialize all TDMRs Kai Huang
2022-11-24  0:42   ` Dave Hansen
2022-11-25  2:27     ` Huang, Kai
2022-11-21  0:26 ` [PATCH v7 19/20] x86/virt/tdx: Flush cache in kexec() when TDX is enabled Kai Huang
2022-11-21  0:26 ` [PATCH v7 20/20] Documentation/x86: Add documentation for TDX host support Kai Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=74723e2b-3094-d04b-aed7-2789268b00ab@intel.com \
    --to=dave.hansen@intel.com \
    --cc=ak@linux.intel.com \
    --cc=bagasdotme@gmail.com \
    --cc=chao.gao@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=imammedo@redhat.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=reinette.chatre@intel.com \
    --cc=sagis@google.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=tony.luck@intel.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.