linux-kernel.vger.kernel.org archive mirror
From: Brijesh Singh <brijesh.singh@amd.com>
To: Vlastimil Babka <vbabka@suse.cz>,
	Dave Hansen <dave.hansen@intel.com>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	kvm@vger.kernel.org, linux-crypto@vger.kernel.org
Cc: brijesh.singh@amd.com, ak@linux.intel.com,
	herbert@gondor.apana.org.au, Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Joerg Roedel <jroedel@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Tony Luck <tony.luck@intel.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	David Rientjes <rientjes@google.com>,
	Sean Christopherson <seanjc@google.com>
Subject: Re: [RFC Part2 PATCH 07/30] mm: add support to split the large THP based on RMP violation
Date: Wed, 21 Apr 2021 08:43:32 -0500
Message-ID: <0a7a2ea4-6f07-61f8-fc02-c084734788ec@amd.com>
In-Reply-To: <55445efd-dc29-3693-a189-710c8a61dec2@suse.cz>


On 4/21/21 7:59 AM, Vlastimil Babka wrote:
> On 3/25/21 4:59 PM, Dave Hansen wrote:
>> On 3/25/21 8:24 AM, Brijesh Singh wrote:
>>> On 3/25/21 9:48 AM, Dave Hansen wrote:
>>>> On 3/24/21 10:04 AM, Brijesh Singh wrote:
>>>>> When SEV-SNP is enabled globally in the system, a write from the hypervisor
>>>>> can raise an RMP violation. We can resolve the RMP violation by splitting
>>>>> the virtual address to a lower page level.
>>>>>
>>>>> e.g.:
>>>>> - guest made a page shared in the RMP entry so that the hypervisor
>>>>>   can write to it.
>>>>> - the hypervisor has mapped the pfn as a large page. A write access
>>>>>   will cause an RMP violation if one of the pages within the 2MB region
>>>>>   is a guest private page.
>>>>>
>>>>> The above RMP violation can be resolved by simply splitting the large
>>>>> page.
>>>> What if the large page is provided by hugetlbfs?
>>> I was not able to find a method to split large pages in hugetlbfs.
>>> Unfortunately, at this time a VMM cannot use backing memory from the
>>> hugetlbfs pool. An SEV-SNP aware VMM can use either transparent
>>> hugepages or small pages.
>> That's really, really nasty.  Especially since it might not be evident
>> until long after boot and the guest is killed.
> I'd assume a SNP-aware QEMU would be needed in the first place and thus this
> QEMU would know not to use hugetlbfs?


Yes, that is correct. The QEMU patches will not launch an SEV-SNP guest
when hugetlbfs is used. I can also look into adding a check in the
kernel to ensure that the backing pages do not come from hugetlbfs, so
that a non-QEMU VMM will also fail to create the SNP guest.
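A minimal sketch of such a kernel-side check, assuming a simplified
model of the guest memory regions. struct backing_region and
SKETCH_VM_HUGETLB are hypothetical names used only for illustration; in
the kernel the per-VMA test would presumably be is_vm_hugetlb_page().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM_HUGETLB vma flag; in the kernel the
 * equivalent test would be is_vm_hugetlb_page(vma). */
#define SKETCH_VM_HUGETLB 0x1u

/* Hypothetical description of one region of guest backing memory. */
struct backing_region {
	unsigned long start;   /* host virtual address of the region */
	unsigned long size;    /* length in bytes */
	unsigned int flags;    /* SKETCH_VM_HUGETLB if backed by hugetlbfs */
};

/*
 * Refuse to create an SNP guest if any backing region comes from
 * hugetlbfs, because those pages cannot be split when an RMP violation
 * occurs later. Returns 0 on success, -EINVAL on the first bad region.
 */
int snp_check_backing(const struct backing_region *regions, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (regions[i].flags & SKETCH_VM_HUGETLB)
			return -EINVAL;
	}
	return 0;
}
```

Failing at creation time means the VMM sees the error when it starts the
guest rather than when the guest later shares a page.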

>
>> It's even nastier because hugetlbfs is actually a great fit for SEV-SNP
>> memory.  It's physically contiguous, so it would keep you from having to
> Maybe this could be solvable by remapping the hugetlbfs page with pte's when
> needed (a guest wants to share 4k out of 2MB with the host temporarily). But
> certainly never as flexibly as pte-mapped THP's as the complexity of that
> (refcounting tail pages etc) is significant.
>
>> fracture the direct map all the way down to 4k, it also can't be
>> reclaimed (just like all SEV memory).
> About that... the whitepaper I've seen [1] mentions support for swapping guest
> pages. I'd expect the same mechanism could be used for their migration -
> scattering 4kB unmovable SEV pages around would be terrible for fragmentation. I
> assume neither swap nor migration support is part of the patchset(s) yet?


Yes, the patches do not support swapping guest pages yet. We want to
add that support incrementally; swap and migration can be implemented
once the base support is enabled in the kernel. Both the SEV and SNP
firmware provide PSP commands that can be used to swap guest pages. I
believe the KVM MMU notifier can use them during a page move.

>
>> I think the minimal thing you can do here is to fail to add memory to
>> the RMP in the first place if you can't split it.  That way, users will
>> at least fail to _start_ their VM versus dying randomly for no good reason.
>>
>> Even better would be to come up with a stronger contract between host
>> and guest.  I really don't think the host should be exposed to random
>> RMP faults on the direct map.  If the guest wants to share memory, then
>> it needs to tell the host and give the host an opportunity to move the
>> guest physical memory.  It might, for instance, sequester all the shared
>> pages in a single spot to minimize direct map fragmentation.
> Agreed, and the contract should be elaborated before going to implementation
> details (patches). Could a malicious guest violate such contract unilaterally? I
> guess not, because psmash is a hypervisor instruction? And if yes, the
> RMP-specific page fault handlers would be used just to kill such guest, not to
> fix things up during page fault.

Version 2 of the GHCB specification defines a contract between the guest
and the host. When the guest is ready to share a page with the host, it
issues a page state change request to the hypervisor. The hypervisor is
responsible for updating the page's RMP entry using the RMPUPDATE
instruction. The page state change request includes an operation field.
The operation can be one of the following:

1. Add page to the RMP table (make the guest page private)

2. Remove page from the RMP table (make the guest page shared)

3. PSMASH - split a large RMP entry into small entries

4. UNSMASH - merge small RMP entries into a large entry. The UNSMASH
operation requires PSP assistance.
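A rough sketch of how the hypervisor side could classify these
operations. The enum values simply follow the order of the list above
and are an assumption to be checked against the GHCB specification;
psc_dispatch() and the psc_action values are hypothetical names, not
existing KVM code.

```c
/* Page state change operations, numbered in the order listed above
 * (assumed values; verify against the GHCB specification). */
enum psc_op {
	PSC_OP_PRIVATE = 1,  /* add page to the RMP table (private) */
	PSC_OP_SHARED  = 2,  /* remove page from the RMP table (shared) */
	PSC_OP_PSMASH  = 3,  /* split a 2MB RMP entry into 4K entries */
	PSC_OP_UNSMASH = 4,  /* merge 4K RMP entries back into 2MB */
};

/* Hypothetical summary of what the hypervisor must do for a request. */
enum psc_action {
	PSC_ACT_RMPUPDATE,   /* hypervisor executes RMPUPDATE itself */
	PSC_ACT_PSMASH,      /* hypervisor executes PSMASH itself */
	PSC_ACT_PSP_ASSIST,  /* needs a firmware (PSP) command */
	PSC_ACT_INVALID,     /* unknown operation: fail the request */
};

/* Map a guest-supplied operation code to the required host action. */
enum psc_action psc_dispatch(unsigned int op)
{
	switch (op) {
	case PSC_OP_PRIVATE:
	case PSC_OP_SHARED:
		return PSC_ACT_RMPUPDATE;
	case PSC_OP_PSMASH:
		return PSC_ACT_PSMASH;
	case PSC_OP_UNSMASH:
		return PSC_ACT_PSP_ASSIST;
	default:
		return PSC_ACT_INVALID;
	}
}
```

Because the operation code comes from the guest, the default case
rejects anything outside the defined set instead of trusting it.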

The current RMP-specific fault handler checks whether the host is
attempting to write to a guest private page. If so, it kills the guest.
I believe this covers the case where a malicious guest violates the
page state change contract.
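To illustrate the policy, here is a simplified model of the decision
made on an RMP violation: kill the guest if the target 4K page is
private, otherwise split the host's 2MB mapping if any 4K page within
the aligned 2MB region is private. The is_private array,
rmp_fault_policy() and the action names are hypothetical; the real
handler reads the RMP entry and operates on the host page tables.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_2M 512UL   /* 2MB / 4KB */

enum rmp_fault_action {
	RMP_FAULT_KILL_GUEST,  /* host wrote to a guest-private 4K page */
	RMP_FAULT_SPLIT,       /* 2MB host mapping covers a private 4K page */
	RMP_FAULT_UNEXPECTED,  /* no private page involved */
};

/*
 * is_private[pfn] is true if the RMP marks that 4K page guest-private.
 * mapped_2m says whether the host maps pfn with a 2MB page.
 */
enum rmp_fault_action rmp_fault_policy(const bool *is_private, size_t npages,
				       unsigned long pfn, bool mapped_2m)
{
	if (pfn >= npages)
		return RMP_FAULT_UNEXPECTED;

	/* Writing to the private page itself is a contract violation. */
	if (is_private[pfn])
		return RMP_FAULT_KILL_GUEST;

	/* A shared page inside a 2MB mapping that also covers a private
	 * page: splitting the mapping resolves the violation. */
	if (mapped_2m) {
		unsigned long base = pfn & ~(PAGES_PER_2M - 1);

		for (unsigned long i = base;
		     i < base + PAGES_PER_2M && i < npages; i++) {
			if (is_private[i])
				return RMP_FAULT_SPLIT;
		}
	}
	return RMP_FAULT_UNEXPECTED;
}
```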


>> I'll let the other x86 folks chime in on this, but I really think this
>> needs a different approach than what's being proposed.
> Not an x86 folk, but agreed :)
>
> [1]
> https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
