linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Roth <michael.roth@amd.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>,
	Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
	<kvm@vger.kernel.org>, <linux-coco@lists.linux.dev>,
	<linux-mm@kvack.org>, <linux-crypto@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <tglx@linutronix.de>,
	<mingo@redhat.com>, <jroedel@suse.de>, <hpa@zytor.com>,
	<ardb@kernel.org>, <pbonzini@redhat.com>, <seanjc@google.com>,
	<vkuznets@redhat.com>, <jmattson@google.com>, <luto@kernel.org>,
	<dave.hansen@linux.intel.com>, <slp@redhat.com>,
	<pgonda@google.com>, <peterz@infradead.org>,
	<srinivas.pandruvada@linux.intel.com>, <rientjes@google.com>,
	<tobin@ibm.com>, <vbabka@suse.cz>, <kirill@shutemov.name>,
	<ak@linux.intel.com>, <tony.luck@intel.com>,
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	<alpergun@google.com>, <jarkko@kernel.org>,
	<ashish.kalra@amd.com>, <nikunj.dadhania@amd.com>,
	<pankaj.gupta@amd.com>, <liam.merwick@oracle.com>,
	<zhi.a.wang@intel.com>, Brijesh Singh <brijesh.singh@amd.com>,
	<rppt@kernel.org>
Subject: Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
Date: Tue, 16 Jan 2024 10:19:09 -0600	[thread overview]
Message-ID: <20240116161909.msbdwiyux7wsxw2i@amd.com> (raw)
In-Reply-To: <336b55f9-c7e6-4ec9-806b-cb3659dbfdc3@intel.com>

On Fri, Jan 12, 2024 at 12:37:31PM -0800, Dave Hansen wrote:
> On 1/12/24 12:28, Tom Lendacky wrote:
> > I thought there was also a desire to remove the direct map for any pages
> > assigned to a guest as private, not just the case that the comment says.
> > So updating the comment would probably the best action.
> 
> I'm not sure who desires that.
> 
> It's sloooooooow to remove things from the direct map.  There's almost
> certainly a frequency cutoff where running the whole direct mapping as
> 4k is better than the cost of mapping/unmapping.

One area we'd been looking to optimize[1] is lots of vCPUs/guests
faulting in private memory on-demand via kernels with lazy acceptance
support. The lazy acceptance / #NPF path for each 4K/2M guest page
involves updating the corresponding RMP table entry to set it to
Guest-owned state, and as part of that we remove the PFNs from the
directmap. There is indeed potential for scalability issues due to how
directmap updates are currently handled, since they involve holding a
global cpa_lock for every update.

At the time, there were investigations on how to remove the cpa_lock,
and there's an RFC patch[2] that implements this change, but I don't
know where this stands today.

There was also some work[3] on being able to restore 2MB entries in the
directmap (currently entries are only re-added as 4K) the Mike Rapoport
pointed out to me a while back. With that, since the bulk of private
guest pages 2MB, we'd be able to avoid splitting the directmap to 4K over
time.

There was also some indication[4] that UPM/guest_memfd would eventually
manage directmap invalidations internally, so it made sense to have
SNP continue to handle this up until the point that mangement was
moved to gmem.

Those 3 things, paired with a platform-independent way of catching
unexpected kernel accesses to private guest memory, are what I think
nudged us all toward the current implementation.

But AFAIK none of these 3 things are being actively upstreamed today,
so it makes sense to re-consider how we handle the directmap in the
interim.

I did some performance tests which do seem to indicate that
pre-splitting the directmap to 4K can be substantially improve certain
SNP guest workloads. This test involves running a single 1TB SNP guest
with 128 vCPUs running "stress --vm 128 --vm-bytes 5G --vm-keep" to
rapidly fault in all of its memory via lazy acceptance, and then
measuring the rate that gmem pages are being allocated on the host by
monitoring "FileHugePages" from /proc/meminfo to get some rough gauge
of how quickly a guest can fault in it's initial working set prior to
reaching steady state. The data is a bit noisy but seems to indicate
significant improvement by taking the directmap updates out of the
lazy acceptance path, and I would only expect that to become more
significant as you scale up the number of guests / vCPUs.

  # Average fault-in rate across 3 runs, measured in GB/s
                    unpinned | pinned to NUMA node 0
  DirectMap4K           12.9 | 12.1
             stddev      2.2 |  1.3
  DirectMap2M+split      8.0 |  8.9
             stddev      1.3 |  0.8

The downside of course is potential impact for non-SNP workloads
resulting from splitting the directmap. Mike Rapoport's numbers make
me feel a little better about it, but I don't think they apply directly
to the notion of splitting the entire directmap. It's Even he LWN article
summarizes:

  "The conclusion from all of this, Rapoport continued, was that
  direct-map fragmentation just does not matter — for data access, at
  least. Using huge-page mappings does still appear to make a difference
  for memory containing the kernel code, so allocator changes should
  focus on code allocations — improving the layout of allocations for
  loadable modules, for example, or allowing vmalloc() to allocate huge
  pages for code. But, for kernel-data allocations, direct-map
  fragmentation simply appears to not be worth worrying about."

So at the very least, if we went down this path, we would be worth
investigating the following areas in addition to general perf testing:

  1) Only splitting directmap regions corresponding to kernel-allocatable
     *data* (hopefully that's even feasible...)
  2) Potentially deferring the split until an SNP guest is actually
     run, so there isn't any impact just from having SNP enabled (though
     you still take a hit from RMP checks in that case so maybe it's not
     worthwhile, but that itself has been noted as a concern for users
     so it would be nice to not make things even worse).

[1] https://lore.kernel.org/linux-mm/20231103000105.m3z4eijcxlxciyzd@amd.com/
[2] https://lore.kernel.org/lkml/Y7f9ZuPcIMk37KnN@gmail.com/T/#m15b74841f5319c0d1177f118470e9714d4ea96c8
[3] https://lore.kernel.org/linux-kernel/20200416213229.19174-1-kirill.shutemov@linux.intel.com/
[4] https://lore.kernel.org/all/YyGLXXkFCmxBfu5U@google.com/

> 
> Actually, where _is_ the TLB flushing here?

Boris pointed that out in v6, and we implemented it in v7, but it
completely cratered performance:

   https://lore.kernel.org/linux-mm/20221219150026.bltiyk72pmdc2ic3@amd.com/

After further discussion I think we'd concluded it wasn't necessary. Maybe
that's worth revisiting though. If it is necessary, then that would be
another reason to just pre-split the directmap because the above-mentioned
lazy acceptance workload/bottleneck would likely get substantially worse.

-Mike

  parent reply	other threads:[~2024-01-16 16:20 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
2023-12-30 16:19 ` [PATCH v1 01/26] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
2023-12-31 11:50   ` Borislav Petkov
2023-12-31 16:44     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 02/26] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
2023-12-30 16:19 ` [PATCH v1 03/26] iommu/amd: Don't rely on external callers to enable IOMMU SNP support Michael Roth
2024-01-04 10:30   ` Borislav Petkov
2024-01-04 10:58   ` Joerg Roedel
2023-12-30 16:19 ` [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support Michael Roth
2024-01-04 11:05   ` Jeremi Piotrowski
2024-01-05 16:09     ` Borislav Petkov
2024-01-05 16:21       ` Borislav Petkov
2024-01-08 16:49         ` Jeremi Piotrowski
2024-01-08 17:04           ` Borislav Petkov
2024-01-09 11:56             ` Jeremi Piotrowski
2024-01-09 12:29               ` Borislav Petkov
2024-01-09 12:44                 ` Borislav Petkov
2024-02-14 16:56                   ` Jeremi Piotrowski
2024-01-04 11:16   ` Borislav Petkov
2024-01-04 14:42   ` Borislav Petkov
2024-01-05 19:19   ` Borislav Petkov
2024-01-05 21:27   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 05/26] x86/mtrr: Don't print errors if MtrrFixDramModEn is set when SNP enabled Michael Roth
2023-12-30 16:19 ` [PATCH v1 06/26] x86/sev: Add RMP entry lookup helpers Michael Roth
2023-12-30 16:19 ` [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries Michael Roth
2024-01-10  9:59   ` Borislav Petkov
2024-01-10 20:18     ` Jarkko Sakkinen
2024-01-10 22:14       ` Borislav Petkov
2024-01-10 11:13   ` Borislav Petkov
2024-01-10 15:20     ` Tom Lendacky
2024-01-10 15:27       ` Borislav Petkov
2024-01-10 15:51         ` Tom Lendacky
2024-01-10 15:55           ` Borislav Petkov
2024-01-10 15:10   ` Tom Lendacky
2023-12-30 16:19 ` [PATCH v1 08/26] x86/traps: Define RMP violation #PF error code Michael Roth
2023-12-30 16:19 ` [PATCH v1 09/26] x86/fault: Dump RMP table information when RMP page faults occur Michael Roth
2023-12-30 16:19 ` [PATCH v1 10/26] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
2024-01-12 14:49   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
2024-01-12 19:48   ` Borislav Petkov
2024-01-12 20:00   ` Dave Hansen
2024-01-12 20:07     ` Borislav Petkov
2024-01-12 20:27       ` Vlastimil Babka
2024-01-15  9:06         ` Borislav Petkov
2024-01-15  9:14           ` Vlastimil Babka
2024-01-15  9:16           ` Mike Rapoport
2024-01-15  9:20             ` Borislav Petkov
2024-01-12 20:28       ` Tom Lendacky
2024-01-12 20:37         ` Dave Hansen
2024-01-15  9:23           ` Vlastimil Babka
2024-01-16 16:19           ` Michael Roth [this message]
2024-01-16 16:50             ` Michael Roth
2024-01-16 20:12               ` Mike Rapoport
2024-01-26  1:49                 ` Michael Roth
2024-01-16 18:22             ` Borislav Petkov
2024-01-16 20:22             ` Dave Hansen
2024-01-26  1:35               ` Michael Roth
2024-01-15  9:09     ` Borislav Petkov
2024-01-16 16:21       ` Dave Hansen
2024-01-17  9:34         ` Borislav Petkov
2024-01-15  9:01   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 12/26] crypto: ccp: Define the SEV-SNP commands Michael Roth
2024-01-15  9:41   ` Borislav Petkov
2024-01-26  1:56     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
2024-01-15 11:19   ` Borislav Petkov
2024-01-15 19:53   ` Borislav Petkov
2024-01-26  2:48     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 14/26] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
2024-01-17  9:48   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list Michael Roth
2024-01-08 10:45   ` Vlastimil Babka
2024-01-09 22:19     ` Kalra, Ashish
2024-01-10  8:59       ` Vlastimil Babka
2023-12-30 16:19 ` [PATCH v1 16/26] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
2023-12-30 16:19 ` [PATCH v1 17/26] crypto: ccp: Handle non-volatile INIT_EX data " Michael Roth
2024-01-18 14:03   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 18/26] crypto: ccp: Handle legacy SEV commands " Michael Roth
2024-01-19 17:18   ` Borislav Petkov
2024-01-19 17:36     ` Tom Lendacky
2024-01-19 17:48       ` Borislav Petkov
2024-01-26 13:29     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 19/26] iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown Michael Roth
2023-12-30 16:19 ` [PATCH v1 20/26] crypto: ccp: Add debug support for decrypting pages Michael Roth
2024-01-10 14:59   ` Sean Christopherson
2024-01-11  0:50     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 21/26] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump Michael Roth
2024-01-21 11:49   ` Borislav Petkov
2024-01-26  3:03     ` Kalra, Ashish
2024-01-26 13:38     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 22/26] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
2024-01-21 11:51   ` Borislav Petkov
2024-01-26  3:44     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 23/26] x86/cpufeatures: Enable/unmask SEV-SNP CPU feature Michael Roth
2023-12-30 16:19 ` [PATCH v1 24/26] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
2024-01-21 12:29   ` Borislav Petkov
2024-01-26  3:32     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 25/26] crypto: ccp: Add the SNP_COMMIT command Michael Roth
2024-01-21 12:35   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 26/26] crypto: ccp: Add the SNP_SET_CONFIG command Michael Roth
2024-01-21 12:41   ` Borislav Petkov
2024-01-26 13:30     ` Michael Roth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240116161909.msbdwiyux7wsxw2i@amd.com \
    --to=michael.roth@amd.com \
    --cc=ak@linux.intel.com \
    --cc=alpergun@google.com \
    --cc=ardb@kernel.org \
    --cc=ashish.kalra@amd.com \
    --cc=bp@alien8.de \
    --cc=brijesh.singh@amd.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jarkko@kernel.org \
    --cc=jmattson@google.com \
    --cc=jroedel@suse.de \
    --cc=kirill@shutemov.name \
    --cc=kvm@vger.kernel.org \
    --cc=liam.merwick@oracle.com \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=nikunj.dadhania@amd.com \
    --cc=pankaj.gupta@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pgonda@google.com \
    --cc=rientjes@google.com \
    --cc=rppt@kernel.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=slp@redhat.com \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=tobin@ibm.com \
    --cc=tony.luck@intel.com \
    --cc=vbabka@suse.cz \
    --cc=vkuznets@redhat.com \
    --cc=x86@kernel.org \
    --cc=zhi.a.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).