linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@intel.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	luto@kernel.org, peterz@infradead.org
Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com,
	ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com,
	hpa@zytor.com, jgross@suse.com, jmattson@google.com,
	joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org,
	pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com,
	tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com,
	thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCHv5 26/30] x86/mm/cpa: Add support for TDX shared memory
Date: Wed, 9 Mar 2022 11:44:08 -0800	[thread overview]
Message-ID: <4e5aaf9f-10b2-e659-5173-a710f4423cf6@intel.com> (raw)
In-Reply-To: <20220302142806.51844-27-kirill.shutemov@linux.intel.com>

> +static bool tdx_tlb_flush_required(bool enc)
> +{
> +	/*
> +	 * TDX guest is responsible for flushing caches on private->shared

Caches?  In a "tlb" flushing function?  Did you mean paging structure
caches or CPU caches?

> +	 * transition. VMM is responsible for flushing on shared->private.
> +	 */
> +	return !enc;
> +}

It's also pretty nasty to have that argument called 'enc' when there's
no encryption in the comment.  That at least needs to be mentioned.

I'd also appreciate a mention somewhere of what the security/stability
model is here.  What if a malicious VMM doesn't flush on a
shared->private transition?  What is the fallout?  Who gets hurt?

> +static bool tdx_cache_flush_required(void)
> +{
> +	return true;
> +}

This leaves me totally in the dark.  I frankly don't know what
enc_tlb_flush_required does without looking, but I also don't know your
intent.  A one-liner about intent would be nice to ensure it matches
what enc_tlb_flush_required does.

> +static bool accept_page(phys_addr_t gpa, enum pg_level pg_level)
> +{
> +	/*
> +	 * Pass the page physical address to the TDX module to accept the
> +	 * pending, private page.
> +	 *
> +	 * Bits 2:0 of GPA encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
> +	 */
> +	switch (pg_level) {
> +	case PG_LEVEL_4K:
> +		break;
> +	case PG_LEVEL_2M:
> +		gpa |= 1;
> +		break;
> +	case PG_LEVEL_1G:
> +		gpa |= 2;
> +		break;
> +	default:
> +		return false;
> +	}

Just a style thing.  I'd much rather this be something like:

	u8 page_size;
	u64 tdcall_rcx;

	switch (pg_level) {
	case PG_LEVEL_4K:
		page_size = 0;
		break;
	case PG_LEVEL_2M:
		page_size = 1;
		break;
	case PG_LEVEL_1G:
		page_size = 2;
		break;
	default:
		return false;
	}

	tdcall_rcx = gpa | page_size;

BTW, the spec from August 2021 says these bits are "either 0 (4kb) or 1
(2MB)" on the spec.  No mention of 1G.


> +	return !__tdx_module_call(TDX_ACCEPT_PAGE, gpa, 0, 0, 0, NULL);
> +}

The TDX rcx register in the TDX_ACCEPT_PAGE is *NOT* the gpa.  It
contains the gpa in a few of its bits.  Doing it this way ^ makes it
painfully clear that argument is not solely a gpa.

> +/*
> + * Inform the VMM of the guest's intent for this physical page: shared with
> + * the VMM or private to the guest.  The VMM is expected to change its mapping
> + * of the page in response.
> + */
> +static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> +{
> +	phys_addr_t start = __pa(vaddr);
> +	phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
> +
> +	if (!enc) {
> +		start |= cc_mkdec(0);
> +		end |= cc_mkdec(0);
> +	}
> +
> +	/*
> +	 * Notify the VMM about page mapping conversion. More info about ABI
> +	 * can be found in TDX Guest-Host-Communication Interface (GHCI),
> +	 * section "TDG.VP.VMCALL<MapGPA>"
> +	 */
> +	if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
> +		return false;

This is really confusing.  "start" and "end" are physical addresses and
you're doing physical address math on them.  But they also have some
other bits encoded in them.

I *guess* that works.  If you've set the same bits in both, then you
subtract them, the bits cancel out.  But, it's horribly confusing.

Look how much more sane this is to read if we do a few things:

	phys_addr_t start = __pa(vaddr);
	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
			^ add vertical alignment
	phys_addr_t len_bytes = end - start;
	bool private = enc;

	if (!private) {
		/* Set the shared (decrypted) bits: */
		    ^ Note that we're helping the reader impedance-match
	 		between 'enc' and shared/private
		start |= cc_mkdec(0);
		end   |= cc_mkdec(0);
		    ^ more vertical alginment
	}

	if (__tdx_hypercall(TDVMCALL_MAP_GPA, start, len_bytes, 0, 0))
		return false;


> +	/* private->shared conversion  requires only MapGPA call */
> +	if (!enc)
> +		return true;
> +
> +	/*
> +	 * For shared->private conversion, accept the page using
> +	 * TDX_ACCEPT_PAGE TDX module call.
> +	 */
> +	while (start < end) {
> +		/* Try if 1G page accept is possible */
> +		if (!(start & ~PUD_MASK) && end - start >= PUD_SIZE &&
> +		    accept_page(start, PG_LEVEL_1G)) {
> +			start += PUD_SIZE;
> +			continue;
> +		}

This is rather ugly.  Why not just do a helper:

static int try_accept_one(phys_addr_t *start, unsigned long len,
unsigned long accept_size)
{
	int ret;

	if (!IS_ALIGNED(*start, accept_size));
		return -ESOMETHING;
	if (len < accept_size)
		return -ESOMETHING;

	ret = accept_page(start, size_to_level(accept_size));

	if (!ret)
		*start += accept_size;

	return ret;
}

Then the loop becomes actually readable:

	while (start < end) {
		len = end - start;

		/* Try larger accepts first because... */	

		ret = try_accept_one(&start, len, PUD_SIZE);
		if (ret)
			continue;

		ret = try_accept_one(&start, len, PMD_SIZE);
		if (ret)
			continue;

		ret = try_accept_one(&start, len, PTE_SIZE);
		if (ret)
			return false;
	}



> +
> +		/* Try if 2M page accept is possible */
> +		if (!(start & ~PMD_MASK) && end - start >= PMD_SIZE &&
> +		    accept_page(start, PG_LEVEL_2M)) {
> +			start += PMD_SIZE;
> +			continue;
> +		}
> +
> +		if (!accept_page(start, PG_LEVEL_4K))
> +			return false;
> +		start += PAGE_SIZE;
> +	}
> +
> +	return true;
> +}
> +
>  void __init tdx_early_init(void)
>  {
>  	unsigned int gpa_width;
> @@ -526,5 +623,9 @@ void __init tdx_early_init(void)
>  	 */
>  	cc_set_mask(BIT_ULL(gpa_width - 1));
>  
> +	x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
> +	x86_platform.guest.enc_tlb_flush_required = tdx_tlb_flush_required;
> +	x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;

Could you double-check for vertical alignment opportunities in this
patch?  This is a place where two spaces can at least tell you quickly
that this is all TDX-specific:

	x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
	x86_platform.guest.enc_tlb_flush_required   = tdx_tlb_flush_required;
	x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;

>  	pr_info("Guest detected\n");
>  }
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 1c3cb952fa2a..080f21171b27 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1308,7 +1308,7 @@ static void ve_raise_fault(struct pt_regs *regs, long error_code)
>   *
>   * In the settings that Linux will run in, virtualization exceptions are
>   * never generated on accesses to normal, TD-private memory that has been
> - * accepted.
> + * accepted (by BIOS or with tdx_enc_status_changed()).
>   *
>   * Syscall entry code has a critical window where the kernel stack is not
>   * yet set up. Any exception in this window leads to hard to debug issues


  reply	other threads:[~2022-03-09 19:44 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-02 14:27 [PATCHv5 00/30] TDX Guest: TDX core support Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 01/30] x86/tdx: Detect running as a TDX guest in early boot Kirill A. Shutemov
2022-03-04 15:43   ` Borislav Petkov
2022-03-04 15:47     ` Dave Hansen
2022-03-04 16:02       ` Borislav Petkov
2022-03-07 22:24         ` [PATCHv5.1 " Kirill A. Shutemov
2022-03-09 18:22           ` Borislav Petkov
2022-03-02 14:27 ` [PATCHv5 02/30] x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers Kirill A. Shutemov
2022-03-08 19:56   ` Dave Hansen
2022-03-10 12:32   ` Borislav Petkov
2022-03-10 14:44     ` Kirill A. Shutemov
2022-03-10 14:51       ` Borislav Petkov
2022-03-02 14:27 ` [PATCHv5 03/30] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kirill A. Shutemov
2022-03-08 20:03   ` Dave Hansen
2022-03-10 15:30   ` Borislav Petkov
2022-03-10 21:20     ` Kirill A. Shutemov
2022-03-10 21:48       ` Kirill A. Shutemov
2022-03-15 15:56         ` Borislav Petkov
2022-03-12 10:41       ` Borislav Petkov
2022-03-02 14:27 ` [PATCHv5 04/30] x86/tdx: Extend the confidential computing API to support TDX guests Kirill A. Shutemov
2022-03-08 20:17   ` Dave Hansen
2022-03-09 16:01     ` [PATCHv5.1 " Kirill A. Shutemov
2022-03-09 18:36       ` Dave Hansen
2022-03-09 23:51         ` [PATCHv5.2 " Kirill A. Shutemov
2022-03-10  0:07           ` Dave Hansen
2022-03-15 19:41           ` Borislav Petkov
2022-03-02 14:27 ` [PATCHv5 05/30] x86/tdx: Exclude shared bit from __PHYSICAL_MASK Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 06/30] x86/traps: Refactor exc_general_protection() Kirill A. Shutemov
2022-03-08 20:18   ` Dave Hansen
2022-03-02 14:27 ` [PATCHv5 07/30] x86/traps: Add #VE support for TDX guest Kirill A. Shutemov
2022-03-08 20:29   ` Dave Hansen
2022-03-02 14:27 ` [PATCHv5 08/30] x86/tdx: Add HLT support for TDX guests Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 09/30] x86/tdx: Add MSR " Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 10/30] x86/tdx: Handle CPUID via #VE Kirill A. Shutemov
2022-03-08 20:33   ` Dave Hansen
2022-03-09 16:15     ` [PATCH] " Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 11/30] x86/tdx: Handle in-kernel MMIO Kirill A. Shutemov
2022-03-08 21:26   ` Dave Hansen
2022-03-10  0:51     ` Kirill A. Shutemov
2022-03-10  1:06       ` Dave Hansen
2022-03-10 16:48         ` Kirill A. Shutemov
2022-03-10 17:53           ` Dave Hansen
2022-03-11 17:18             ` Kirill A. Shutemov
2022-03-11 17:22               ` Dave Hansen
2022-03-11 18:01               ` Dave Hansen
2022-03-02 14:27 ` [PATCHv5 12/30] x86/tdx: Detect TDX at early kernel decompression time Kirill A. Shutemov
2022-03-07 22:27   ` [PATCHv5.1 " Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 13/30] x86: Adjust types used in port I/O helpers Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 14/30] x86: Consolidate " Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 15/30] x86/boot: Port I/O: allow to hook up alternative helpers Kirill A. Shutemov
2022-03-02 17:42   ` Josh Poimboeuf
2022-03-02 19:41     ` Dave Hansen
2022-03-02 20:02       ` Josh Poimboeuf
2022-03-02 14:27 ` [PATCHv5 16/30] x86/boot: Port I/O: add decompression-time support for TDX Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 17/30] x86/tdx: Port I/O: add runtime hypercalls Kirill A. Shutemov
2022-03-08 21:30   ` Dave Hansen
2022-03-02 14:27 ` [PATCHv5 18/30] x86/tdx: Port I/O: add early boot support Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 19/30] x86/tdx: Wire up KVM hypercalls Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 20/30] x86/boot: Add a trampoline for booting APs via firmware handoff Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 21/30] x86/acpi, x86/boot: Add multiprocessor wake-up support Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 22/30] x86/boot: Set CR0.NE early and keep it set during the boot Kirill A. Shutemov
2022-03-08 21:37   ` Dave Hansen
2022-03-02 14:27 ` [PATCHv5 23/30] x86/boot: Avoid #VE during boot for TDX platforms Kirill A. Shutemov
2022-03-07  9:29   ` Xiaoyao Li
2022-03-07 22:33     ` Kirill A. Shutemov
2022-03-08  1:19       ` Xiaoyao Li
2022-03-08 16:41         ` Kirill A. Shutemov
2022-03-07 22:36     ` [PATCHv5.1 " Kirill A. Shutemov
2022-03-02 14:28 ` [PATCHv5 24/30] x86/topology: Disable CPU online/offline control for TDX guests Kirill A. Shutemov
2022-03-02 14:28 ` [PATCHv5 25/30] x86/tdx: Make pages shared in ioremap() Kirill A. Shutemov
2022-03-08 22:02   ` Dave Hansen
2022-03-02 14:28 ` [PATCHv5 26/30] x86/mm/cpa: Add support for TDX shared memory Kirill A. Shutemov
2022-03-09 19:44   ` Dave Hansen [this message]
2022-03-02 14:28 ` [PATCHv5 27/30] x86/kvm: Use bounce buffers for TD guest Kirill A. Shutemov
2022-03-09 20:07   ` Dave Hansen
2022-03-10 14:29     ` Tom Lendacky
2022-03-10 14:51       ` Christoph Hellwig
2022-03-02 14:28 ` [PATCHv5 28/30] x86/tdx: ioapic: Add shared bit for IOAPIC base address Kirill A. Shutemov
2022-03-09 20:39   ` Dave Hansen
2022-03-02 14:28 ` [PATCHv5 29/30] ACPICA: Avoid cache flush inside virtual machines Kirill A. Shutemov
2022-03-02 16:13   ` Dan Williams
2022-03-09 20:56   ` Dave Hansen
2022-03-02 14:28 ` [PATCHv5 30/30] Documentation/x86: Document TDX kernel architecture Kirill A. Shutemov
2022-03-09 21:49   ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4e5aaf9f-10b2-e659-5173-a710f4423cf6@intel.com \
    --to=dave.hansen@intel.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=brijesh.singh@amd.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jpoimboe@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=knsathya@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=sdeep@vmware.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=tony.luck@intel.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).