From: Dave Hansen <dave.hansen@intel.com>
To: Erdem Aktas <erdemaktas@google.com>
Cc: "Nakajima, Jun" <jun.nakajima@intel.com>,
	Guorui Yu <GuoRui.Yu@linux.alibaba.com>,
	kirill.shutemov@linux.intel.com, ak@linux.intel.com,
	bp@alien8.de, dan.j.williams@intel.com, david@redhat.com,
	elena.reshetova@intel.com, hpa@zytor.com,
	linux-kernel@vger.kernel.org, luto@kernel.org, mingo@redhat.com,
	peterz@infradead.org, sathyanarayanan.kuppuswamy@linux.intel.com,
	seanjc@google.com, tglx@linutronix.de, thomas.lendacky@amd.com,
	x86@kernel.org
Subject: Re: [PATCH 2/2] x86/tdx: Do not allow #VE due to EPT violation on the private memory
Date: Mon, 7 Nov 2022 15:30:36 -0800
Message-ID: <77b79116-951a-7ff9-c19b-73af2af98ce9@intel.com>
In-Reply-To: <CAAYXXYw1YpZx1AaOu0TgR9yR9Foi6_jh8XkbGU4ZM2TFTM=nSA@mail.gmail.com>

On 11/7/22 14:53, Erdem Aktas wrote:
> On Fri, Nov 4, 2022 at 3:50 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> Could you please elaborate a bit on what you think the distinction is
>> between:
>>
>>         * Accept on first use
>> and
>>         * Accept on allocation
>>
>> Surely, for the vast majority of memory, it's allocated and then used
>> pretty quickly.  As in, most allocations are __GFP_ZERO so they're
>> allocated and "used" before they even leave the allocator.  So, in
>> practice, they're *VERY* close to equivalent.
>>
>> Where do you see them diverging?  Why does it matter?
> 
> For a VM with a very large memory size, let's say close to 800G of
> memory, it might take a really long time to finish the initialization.
> If all allocations are __GFP_ZERO, then I agree it would not matter
> but -- I need to run some benchmarks to validate --  what I remember
> was, that was not what we were observing. Let me run a few tests to
> provide more input on this but meanwhile if you have already run some
> benchmarks, that would be great.
> 
> What I see in the code is that the "accept_page" function will zero
> all the unaccepted pages even if the __GFP_ZERO flag is not set and if
> __GFP_ZERO is set, we will again zero all those pages. I see a lot of
> concerning comments like "Page acceptance can be very slow.".

I'm not following you at all here.  Yeah, page acceptance is very slow.
But, the slowest part is probably the cache coherency dance that the TDX
module has to do flushing and zeroing all the memory to initialize the
new integrity metadata.  Second to that is the cost of the TDCALL.
Third is the cost of the #VE.

Here's what Kirill is proposing, in some pseudocode:

	alloc_page(order=0, __GFP_ZERO) {
		TD.accept(size=4M) {
			// TDX Module clflushes/zeroes 4M of memory
		}
		memset(4k);
		// leave 1023 accepted 4k pages in the allocator
	}

To accept 4M of memory, you do one TDCALL.  You do zero #VE's.  Using
the #VE handler, you do:

	alloc_page(order=0, __GFP_ZERO) {
		memset(4k) {
			-> #VE handler
			TD.accept(size=4k); // flush/zero 4k
		}
		// only 4k was accepted
	}
	... Take 1023 more #VE's later on for each 4k page

You do 1024 #VE's and 1024 TDCALLs.  So, let's summarize.  To do 4M
worth of 4k pages, here's how the two approaches break down if
__GFP_ZERO is in play:

	             #VE          Accept-in-allocator
	#VE's:       1024         0
	TDCALLs:     1024         1
	clflushes:   4k x 1024    4k x 1024
	memset()s:   4k x 1024    4k x 1024

The *ONLY* downside of accept-at-allocate as implemented is that it does
4M at a time, so the TDCALL is long compared to a 4k one.  But, this is
a classic bandwidth versus latency compromise.  In this case, we choose
bandwidth.

*Both* cases need to memset() the same amount of memory.  Both cases
only memset() 4k at a time.

The *ONLY* way the #VE approach is better is if you allocate 4k and then
never touch the rest of the 4M page.  That might happen, maybe *ONE*
time per zone.  But the rest of the time, the amortization of the TDCALL
cost is going to win.

I'll be shocked if any benchmarking turns up another result.


Thread overview: 23+ messages
2022-10-28 14:12 [PATCH 0/2] x86/tdx: Enforce no #VE on private memory accesses Kirill A. Shutemov
2022-10-28 14:12 ` [PATCH 1/2] x86/tdx: Extract GET_INFO call from get_cc_mask() Kirill A. Shutemov
2022-10-28 15:43   ` Dave Hansen
2022-10-28 23:27   ` Dave Hansen
2022-10-28 23:59     ` Kirill A. Shutemov
2022-10-31  4:12       ` Kirill A. Shutemov
2022-10-31 16:42         ` Dave Hansen
2022-10-31 19:19           ` Kirill A. Shutemov
2022-10-31 19:27         ` Andi Kleen
2022-10-31 19:44           ` Dave Hansen
2022-10-31 22:10             ` Kirill A. Shutemov
2022-10-28 14:12 ` [PATCH 2/2] x86/tdx: Do not allow #VE due to EPT violation on the private memory Kirill A. Shutemov
2022-10-28 15:41   ` Dave Hansen
2022-10-31  4:07   ` Guorui Yu
2022-10-31  4:33     ` Kirill A. Shutemov
2022-10-31 14:22     ` Dave Hansen
2022-11-04 22:36       ` Erdem Aktas
2022-11-04 22:50         ` Dave Hansen
2022-11-07 22:53           ` Erdem Aktas
2022-11-07 23:30             ` Dave Hansen [this message]
2022-11-07  5:10       ` Guorui Yu
2022-11-07 13:31         ` Dave Hansen
2022-11-07 13:43           ` Guorui Yu
