From: Dave Hansen <dave.hansen@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Martin Fernandez <martin.fernandez@eclypsium.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-efi <linux-efi@vger.kernel.org>,
	platform-driver-x86@vger.kernel.org,
	Linux MM <linux-mm@kvack.org>, "H. Peter Anvin" <hpa@zytor.com>,
	daniel.gutson@eclypsium.com, Darren Hart <dvhart@infradead.org>,
	Andy Shevchenko <andy@infradead.org>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ard Biesheuvel <ardb@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>, X86 ML <x86@kernel.org>,
	"Schofield, Alison" <alison.schofield@intel.com>,
	hughsient@gmail.com, alex.bazhaniuk@eclypsium.com,
	Greg KH <gregkh@linuxfoundation.org>,
	Mike Rapoport <rppt@kernel.org>,
	Ben Widawsky <ben.widawsky@intel.com>,
	"Huang, Kai" <kai.huang@intel.com>,
	Sean Christopherson <seanjc@google.com>,
	"Shutemov, Kirill" <kirill.shutemov@intel.com>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Michael Roth <michael.roth@amd.com>
Subject: Re: [PATCH v8 0/8] x86: Show in sysfs if a memory node is able to do encryption
Date: Mon, 9 May 2022 15:56:17 -0700
Message-ID: <71c0e2b4-1a58-62ad-b8af-9e00fdd1222d@intel.com>
In-Reply-To: <YnmTFB1iXy7Qo403@zn.tnic>

On 5/9/22 15:17, Borislav Petkov wrote:
> 
>> This new ABI provides a way to avoid that situation in the first place.
>>  Userspace can look at sysfs to figure out which NUMA nodes support
>> "encryption" (aka. TDX) and can use the existing NUMA policy ABI to
>> avoid TDH.MEM.PAGE.ADD failures.
>>
>> So, here's the question for the TDX folks: are these mixed-capability
>> systems a problem for you?  Does this ABI help you fix the problem?
> What I'm also not really sure about is: is per-node granularity ok? I
> guess it is but let me ask it anyway...

I think nodes are the only sane granularity.

tl;dr: Zones might work in theory but have no existing useful ABI around
them and too many practical problems.  Nodes are the only other real
option without inventing something new and fancy.

--

What about zones (or any sub-node granularity really)?

Folks have, for instance, discussed adding new memory zones for this
purpose: have ZONE_NORMAL, and then ZONE_UNENCRYPTABLE (or something
similar).  Zones are great because they have their own memory allocation
pools and can be targeted directly from within the kernel using things
like GFP_DMA.  If you run out of ZONE_FOO, you can theoretically just
reclaim ZONE_FOO.

But, even a single new zone isn't necessarily good enough.  What if we
have some ZONE_NORMAL that's encryption-capable and some that's not?
The same goes for ZONE_MOVABLE.  We'd probably need at least:

	ZONE_NORMAL
	ZONE_NORMAL_UNENCRYPTABLE
	ZONE_MOVABLE
	ZONE_MOVABLE_UNENCRYPTABLE

Also, zones are (mostly) not exposed to userspace.  If we want userspace
to be able to specify encryption capabilities, we're talking about new
ABI for enumeration and policy specification.

Why node granularity?

First, for the majority of cases, nodes "just work".  ACPI systems with
an "HMAT" table already separate out different performance classes of
memory into different Proximity Domains (PXMs) which the kernel maps
into NUMA nodes.

This means that NVDIMMs and virtually any CXL memory region (one or more
CXL devices glued together) we can think of already get their own NUMA
node.  Those nodes have their own zones (implicitly) and can lean on the
existing NUMA ABI for enumeration and policy creation.

Basically, the firmware creates the NUMA nodes for the kernel.  All the
kernel has to do is record which of them can do encryption and which
cannot.
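
To make the userspace side concrete, here is a rough sketch of the
enumeration step.  It assumes each node directory under
/sys/devices/system/node carries a one-line attribute that reads "1"
when the node can do encryption.  The attribute name "crypto_capable"
and both helper names are assumptions made for illustration; nothing in
this message fixes them.

```python
from pathlib import Path
import re


def crypto_capable_nodes(sysfs_root="/sys/devices/system/node",
                         attr="crypto_capable"):
    """Return sorted NUMA node IDs whose per-node attribute reads '1'.

    `attr` is an assumed attribute name, not a confirmed ABI.
    """
    capable = []
    for node_dir in Path(sysfs_root).glob("node[0-9]*"):
        match = re.fullmatch(r"node(\d+)", node_dir.name)
        if match is None:
            continue
        try:
            value = (node_dir / attr).read_text().strip()
        except OSError:
            # Attribute missing or unreadable: treat the node as not capable.
            continue
        if value == "1":
            capable.append(int(match.group(1)))
    return sorted(capable)


def membind_args(nodes):
    """Build a numactl argument list restricting allocations to `nodes`."""
    return ["numactl", "--membind=" + ",".join(str(n) for n in nodes)]
```

The resulting node list can then be handed to the existing NUMA policy
interfaces, for example by prefixing a command with the output of
membind_args(), so that its memory only comes from capable nodes.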

The one place where nodes fall down is if a memory hot-add occurs within
an existing node and the newly hot-added memory does not match the
encryption capabilities of the existing memory.  The kernel basically
has two options in that case:
 * Throw away the memory until the next reboot where the system might be
   reconfigured in a way to support more uniform capabilities (this is
   actually *likely* for a reboot of a TDX system)
 * Create a synthetic NUMA node to hold it

Neither one of those is a horrible option.  Throwing the memory away is
the most likely way TDX will handle this situation if it pops up.  For
now, the folks building TDX-capable BIOSes claim emphatically that such
a system won't be built.
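
The two hot-add options above boil down to a small decision.  The
following is a toy model only, not kernel code: the Node type, the
HotAddAction names and the allow_synthetic_node switch are all
hypothetical and exist purely to illustrate the choice.

```python
from dataclasses import dataclass
from enum import Enum


class HotAddAction(Enum):
    ADD_TO_NODE = "add to the existing node"
    DISCARD = "leave unused until the next reboot"
    NEW_SYNTHETIC_NODE = "place in a synthetic NUMA node"


@dataclass
class Node:
    node_id: int
    crypto_capable: bool


def decide_hot_add(node, new_memory_capable, allow_synthetic_node=False):
    """Pick what to do with memory hot-added into `node`."""
    # Matching capabilities: the node stays uniform, nothing special to do.
    if node.crypto_capable == new_memory_capable:
        return HotAddAction.ADD_TO_NODE
    # Mismatch: either isolate the memory in its own node or throw it away.
    if allow_synthetic_node:
        return HotAddAction.NEW_SYNTHETIC_NODE
    return HotAddAction.DISCARD
```

With the default allow_synthetic_node=False, a mismatch falls through to
DISCARD, which mirrors the "throw the memory away" handling described as
most likely for TDX.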
