From: Dave Hansen <dave.hansen@intel.com>
To: Brijesh Singh <brijesh.singh@amd.com>,
linux-kernel@vger.kernel.org, x86@kernel.org,
kvm@vger.kernel.org, linux-crypto@vger.kernel.org
Cc: ak@linux.intel.com, herbert@gondor.apana.org.au,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Joerg Roedel <jroedel@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
Tony Luck <tony.luck@intel.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Tom Lendacky <thomas.lendacky@amd.com>,
David Rientjes <rientjes@google.com>,
Sean Christopherson <seanjc@google.com>
Subject: Re: [RFC Part2 PATCH 07/30] mm: add support to split the large THP based on RMP violation
Date: Thu, 25 Mar 2021 07:48:55 -0700
Message-ID: <0edd1350-4865-dd71-5c14-3d57c784d62d@intel.com>
In-Reply-To: <20210324170436.31843-8-brijesh.singh@amd.com>

On 3/24/21 10:04 AM, Brijesh Singh wrote:
> When SEV-SNP is enabled globally in the system, a write from the hypervisor
> can raise an RMP violation. We can resolve the RMP violation by splitting
> the virtual address to a lower page level.
>
> e.g
> - guest made a page shared in the RMP entry so that the hypervisor
> can write to it.
> - the hypervisor has mapped the pfn as a large page. A write access
> will cause an RMP violation if one of the pages within the 2MB region
> is a guest private page.
>
> The above RMP violation can be resolved by simply splitting the large
> page.

What if the large page is provided by hugetlbfs?

What if the kernel uses the direct map to access the page instead of the
userspace mapping?

> The architecture specific code will read the RMP entry to determine
> if the fault can be resolved by splitting and propagating the request
> to split the page by setting newly introduced fault flag
> (FAULT_FLAG_PAGE_SPLIT). If the fault cannot be resolved by splitting,
> then a SIGBUS signal is sent to terminate the process.

Are users just supposed to know what memory types are compatible with
SEV-SNP? Basically, don't use anything that might map a guest using
non-4k entries, except THP?

This does seem like a rather nasty aspect of the hardware. For
everything else, if the virtualization page tables and the x86 tables
disagree, the TLB just sees the smallest page size.

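
To make the contrast concrete, here is a toy model in C, not kernel code.
The enum and both function names are made up for illustration. The RMP
rule is reduced to the `level > rmp_level` comparison used in the patch
and ignores whether the page is assigned to a guest.

```c
#include <stdbool.h>

/* Page levels, numbered like the x86 PG_LEVEL_* values (illustrative). */
enum pg_level { LEVEL_4K = 1, LEVEL_2M = 2, LEVEL_1G = 3 };

/*
 * Ordinary nested paging: when the two sets of page tables disagree,
 * the translation is simply cached at the smaller of the two sizes.
 */
static enum pg_level tlb_effective_level(enum pg_level host,
					 enum pg_level nested)
{
	return host < nested ? host : nested;
}

/*
 * Simplified RMP rule: a host write through a mapping that is larger
 * than the RMP entry's page size is a violation rather than being
 * quietly narrowed.
 */
static bool rmp_size_mismatch(enum pg_level host, enum pg_level rmp)
{
	return host > rmp;
}
```

In this model a 2M host mapping over a 4K entry is harmless in the first
function and a fault in the second, which is why the patch needs a split.
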
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 7605e06a6dd9..f6571563f433 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1305,6 +1305,70 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
> }
> NOKPROBE_SYMBOL(do_kern_addr_fault);
>
> +#define RMP_FAULT_RETRY 0
> +#define RMP_FAULT_KILL 1
> +#define RMP_FAULT_PAGE_SPLIT 2
> +
> +static inline size_t pages_per_hpage(int level)
> +{
> + return page_level_size(level) / PAGE_SIZE;
> +}
> +
> +/*
> + * The RMP fault can happen when a hypervisor attempts to write to:
> + * 1. a guest owned page or
> + * 2. any pages in the large page is a guest owned page.
> + *
> + * #1 will happen only when a process or VMM is attempting to modify the guest page
> + * without the guests cooperation. If a guest wants a VMM to be able to write to its memory
> + * then it should make the page shared. If we detect #1, kill the process because we can not
> + * resolve the fault.
> + *
> + * #2 can happen when the page level does not match between the RMP entry and x86
> + * page table walk, e.g the page is mapped as a large page in the x86 page table but its
> + * added as a 4K shared page in the RMP entry. This can be resolved by splitting the address
> + * into a smaller page level.
> + */
These comments need to get wrapped a bit sooner. Could you try to match
some of the others in the file?
> +static int handle_rmp_page_fault(unsigned long hw_error_code, unsigned long address)
> +{
> + unsigned long pfn, mask;
> + int rmp_level, level;
> + rmpentry_t *e;
> + pte_t *pte;
> +
> + /* Get the native page level */
> + pte = lookup_address_in_mm(current->mm, address, &level);
> + if (unlikely(!pte))
> + return RMP_FAULT_KILL;
> +
> + pfn = pte_pfn(*pte);
> + if (level > PG_LEVEL_4K) {
> + mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
> + pfn |= (address >> PAGE_SHIFT) & mask;
> + }
What is this trying to do, exactly?
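
For reference, the arithmetic can be reproduced in userspace. The sketch
below assumes 4K base pages and 9 index bits per level (x86-64), and it
re-implements page_level_size() locally rather than using the kernel
helper. subpage_pfn() is a hypothetical name wrapping the quoted lines.

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Same numbering as the kernel's PG_LEVEL_* enum. */
enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/* Local stand-in: 4K, 2M, 1G for levels 1, 2, 3. */
static unsigned long page_level_size(int level)
{
	return 1UL << (PAGE_SHIFT + 9 * (level - 1));
}

static unsigned long pages_per_hpage(int level)
{
	return page_level_size(level) / PAGE_SIZE;
}

/* Mirrors the quoted hunk: OR part of the address into the base pfn. */
static unsigned long subpage_pfn(unsigned long base_pfn,
				 unsigned long address, int level)
{
	unsigned long pfn = base_pfn;

	if (level > PG_LEVEL_4K) {
		unsigned long mask = pages_per_hpage(level) -
				     pages_per_hpage(level - 1);

		pfn |= (address >> PAGE_SHIFT) & mask;
	}
	return pfn;
}
```

For a 2M mapping the mask is 512 - 1 = 0x1ff, so the result is the pfn of
the 4K page that faulted. For a 1G mapping the mask is 262144 - 512 =
0x3fe00, which keeps only the bits selecting the 2M-aligned chunk and
drops the low nine bits, so the result is not the faulting 4K page.
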
> + /* Get the page level from the RMP entry. */
> + e = lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
> + if (!e) {
> + pr_alert("SEV-SNP: failed to lookup RMP entry for address 0x%lx pfn 0x%lx\n",
> + address, pfn);
> + return RMP_FAULT_KILL;
> + }
> +
> + /* Its a guest owned page */
> + if (rmpentry_assigned(e))
> + return RMP_FAULT_KILL;
> +
> + /*
> + * Its a shared page but the page level does not match between the native walk
> + * and RMP entry.
> + */
For these two-line comments, please try to split the text fairly evenly
between the lines.
> + if (level > rmp_level)
> + return RMP_FAULT_PAGE_SPLIT;
> +
> + return RMP_FAULT_RETRY;
> +}
> +
> /* Handle faults in the user portion of the address space */
> static inline
> void do_user_addr_fault(struct pt_regs *regs,
> @@ -1315,6 +1379,7 @@ void do_user_addr_fault(struct pt_regs *regs,
> struct task_struct *tsk;
> struct mm_struct *mm;
> vm_fault_t fault;
> + int ret;
> unsigned int flags = FAULT_FLAG_DEFAULT;
>
> tsk = current;
> @@ -1377,6 +1442,22 @@ void do_user_addr_fault(struct pt_regs *regs,
> if (hw_error_code & X86_PF_INSTR)
> flags |= FAULT_FLAG_INSTRUCTION;
>
> + /*
> + * If its an RMP violation, see if we can resolve it.
> + */
> + if ((hw_error_code & X86_PF_RMP)) {
> + ret = handle_rmp_page_fault(hw_error_code, address);
> + if (ret == RMP_FAULT_PAGE_SPLIT) {
> + flags |= FAULT_FLAG_PAGE_SPLIT;
> + } else if (ret == RMP_FAULT_KILL) {
> + fault |= VM_FAULT_SIGBUS;
> + mm_fault_error(regs, hw_error_code, address, fault);
> + return;
> + } else {
> + return;
> + }
> + }
> +
> #ifdef CONFIG_X86_64
> /*
> * Faults in the vsyscall page might need emulation. The
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ecdf8a8cd6ae..1be3218f3738 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -434,6 +434,8 @@ extern pgprot_t protection_map[16];
> * @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
> * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
> * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
> + * @FAULT_FLAG_PAGE_SPLIT: The fault was due page size mismatch, split the region to smaller
> + * page size and retry.
> *
> * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
> * whether we would allow page faults to retry by specifying these two
> @@ -464,6 +466,7 @@ extern pgprot_t protection_map[16];
> #define FAULT_FLAG_REMOTE 0x80
> #define FAULT_FLAG_INSTRUCTION 0x100
> #define FAULT_FLAG_INTERRUPTIBLE 0x200
> +#define FAULT_FLAG_PAGE_SPLIT 0x400
>
> /*
> * The default fault flags that should be used by most of the
> @@ -501,7 +504,8 @@ static inline bool fault_flag_allow_retry_first(unsigned int flags)
> { FAULT_FLAG_USER, "USER" }, \
> { FAULT_FLAG_REMOTE, "REMOTE" }, \
> { FAULT_FLAG_INSTRUCTION, "INSTRUCTION" }, \
> - { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }
> + { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }, \
> + { FAULT_FLAG_PAGE_SPLIT, "PAGESPLIT" }
>
> /*
> * vm_fault is filled by the pagefault handler and passed to the vma's
> diff --git a/mm/memory.c b/mm/memory.c
> index feff48e1465a..c9dcf9b30719 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4427,6 +4427,12 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> return 0;
> }
>
> +static int handle_split_page_fault(struct vm_fault *vmf)
> +{
> + __split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> + return 0;
> +}
Wait a sec, I thought this could fail. Where's the "failed to split"
path? Why does this even return an error code if it's always 0?
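
One possible shape for a "failed to split" path, as a self-contained mock
rather than kernel code: every type and function below is a hypothetical
stand-in, and the VM_FAULT_SIGBUS value is defined locally for the sketch.
The idea is to re-check the mapping after the split attempt and report a
failure instead of unconditionally returning 0.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, defined here only for the sketch. */
typedef unsigned int vm_fault_t;
#define VM_FAULT_SIGBUS 0x0002u

struct mock_pmd {
	bool huge;       /* currently mapped as a large page */
	bool splittable; /* false models a mapping that cannot be split */
};

/* Mock of a void split helper: it may leave the mapping huge. */
static void mock_split_huge_pmd(struct mock_pmd *pmd)
{
	if (pmd->splittable)
		pmd->huge = false;
}

/* Report failure when the mapping is still huge after the attempt. */
static vm_fault_t mock_handle_split_page_fault(struct mock_pmd *pmd)
{
	mock_split_huge_pmd(pmd);

	if (pmd->huge)
		return VM_FAULT_SIGBUS;

	return 0;
}
```

With a check like this the return value carries information, and a
mapping that cannot be split ends in a signal rather than in repeated
faults on the same address.
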
> /*
> * By the time we get here, we already hold the mm semaphore
> *
> @@ -4448,6 +4454,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> pgd_t *pgd;
> p4d_t *p4d;
> vm_fault_t ret;
> + int split_page = flags & FAULT_FLAG_PAGE_SPLIT;
>
> pgd = pgd_offset(mm, address);
> p4d = p4d_alloc(mm, pgd, address);
> @@ -4504,6 +4511,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> pmd_migration_entry_wait(mm, vmf.pmd);
> return 0;
> }
> +
> + if (split_page)
> + return handle_split_page_fault(&vmf);
> +
> if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
> if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
> return do_huge_pmd_numa_page(&vmf, orig_pmd);
Is there a reason for the 'split_page' variable? It seems like a waste
of space.