All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brijesh Singh <brijesh.singh@amd.com>
To: linux-kernel@vger.kernel.org, x86@kernel.org,
	kvm@vger.kernel.org, linux-crypto@vger.kernel.org
Cc: ak@linux.intel.com, herbert@gondor.apana.org.au,
	Brijesh Singh <brijesh.singh@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Joerg Roedel <jroedel@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Tony Luck <tony.luck@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	David Rientjes <rientjes@google.com>,
	Sean Christopherson <seanjc@google.com>
Subject: [RFC Part2 PATCH 07/30] mm: add support to split the large THP based on RMP violation
Date: Wed, 24 Mar 2021 12:04:13 -0500	[thread overview]
Message-ID: <20210324170436.31843-8-brijesh.singh@amd.com> (raw)
In-Reply-To: <20210324170436.31843-1-brijesh.singh@amd.com>

When SEV-SNP is enabled globally in the system, a write from the hypervisor
can raise an RMP violation. We can resolve the RMP violation by splitting
the virtual address to a lower page level.

e.g
- guest made a page shared in the RMP entry so that the hypervisor
  can write to it.
- the hypervisor has mapped the pfn as a large page. A write access
  will cause an RMP violation if one of the pages within the 2MB region
  is a guest private page.

The above RMP violation can be resolved by simply splitting the large
page.

The architecture specific code will read the RMP entry to determine
if the fault can be resolved by splitting and propagating the request
to split the page by setting newly introduced fault flag
(FAULT_FLAG_PAGE_SPLIT). If the fault cannot be resolved by splitting,
then a SIGBUS signal is sent to terminate the process.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/mm/fault.c | 81 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h  |  6 +++-
 mm/memory.c         | 11 ++++++
 3 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7605e06a6dd9..f6571563f433 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1305,6 +1305,70 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
+#define RMP_FAULT_RETRY		0
+#define RMP_FAULT_KILL		1
+#define RMP_FAULT_PAGE_SPLIT	2
+
+static inline size_t pages_per_hpage(int level)
+{
+	return page_level_size(level) / PAGE_SIZE;
+}
+
+/*
+ * The RMP fault can happen when a hypervisor attempts to write to:
+ * 1. a guest owned page or
+ * 2. any pages in the large page is a guest owned page.
+ *
+ * #1 will happen only when a process or VMM is attempting to modify the guest page
+ * without the guests cooperation. If a guest wants a VMM to be able to write to its memory
+ * then it should make the page shared. If we detect #1, kill the process because we can not
+ * resolve the fault.
+ *
+ * #2 can happen when the page level does not match between the RMP entry and x86
+ * page table walk, e.g the page is mapped as a large page in the x86 page table but its
+ * added as a 4K shared page in the RMP entry. This can be resolved by splitting the address
+ * into a smaller page level.
+ */
+static int handle_rmp_page_fault(unsigned long hw_error_code, unsigned long address)
+{
+	unsigned long pfn, mask;
+	int rmp_level, level;
+	rmpentry_t *e;
+	pte_t *pte;
+
+	/* Get the native page level */
+	pte = lookup_address_in_mm(current->mm, address, &level);
+	if (unlikely(!pte))
+		return RMP_FAULT_KILL;
+
+	pfn = pte_pfn(*pte);
+	if (level > PG_LEVEL_4K) {
+		mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
+		pfn |= (address >> PAGE_SHIFT) & mask;
+	}
+
+	/* Get the page level from the RMP entry. */
+	e = lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
+	if (!e) {
+		pr_alert("SEV-SNP: failed to lookup RMP entry for address 0x%lx pfn 0x%lx\n",
+			 address, pfn);
+		return RMP_FAULT_KILL;
+	}
+
+	/* Its a guest owned page */
+	if (rmpentry_assigned(e))
+		return RMP_FAULT_KILL;
+
+	/*
+	 * Its a shared page but the page level does not match between the native walk
+	 * and RMP entry.
+	 */
+	if (level > rmp_level)
+		return RMP_FAULT_PAGE_SPLIT;
+
+	return RMP_FAULT_RETRY;
+}
+
 /* Handle faults in the user portion of the address space */
 static inline
 void do_user_addr_fault(struct pt_regs *regs,
@@ -1315,6 +1379,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	vm_fault_t fault;
+	int ret;
 	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	tsk = current;
@@ -1377,6 +1442,22 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (hw_error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+	/*
+	 * If its an RMP violation, see if we can resolve it.
+	 */
+	if ((hw_error_code & X86_PF_RMP)) {
+		ret = handle_rmp_page_fault(hw_error_code, address);
+		if (ret == RMP_FAULT_PAGE_SPLIT) {
+			flags |= FAULT_FLAG_PAGE_SPLIT;
+		} else if (ret == RMP_FAULT_KILL) {
+			fault |= VM_FAULT_SIGBUS;
+			mm_fault_error(regs, hw_error_code, address, fault);
+			return;
+		} else {
+			return;
+		}
+	}
+
 #ifdef CONFIG_X86_64
 	/*
 	 * Faults in the vsyscall page might need emulation.  The
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecdf8a8cd6ae..1be3218f3738 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -434,6 +434,8 @@ extern pgprot_t protection_map[16];
  * @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
  * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
  * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
+ * @FAULT_FLAG_PAGE_SPLIT: The fault was due page size mismatch, split the region to smaller
+ *   page size and retry.
  *
  * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
  * whether we would allow page faults to retry by specifying these two
@@ -464,6 +466,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_REMOTE			0x80
 #define FAULT_FLAG_INSTRUCTION  		0x100
 #define FAULT_FLAG_INTERRUPTIBLE		0x200
+#define FAULT_FLAG_PAGE_SPLIT			0x400
 
 /*
  * The default fault flags that should be used by most of the
@@ -501,7 +504,8 @@ static inline bool fault_flag_allow_retry_first(unsigned int flags)
 	{ FAULT_FLAG_USER,		"USER" }, \
 	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
 	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }, \
-	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }
+	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }, \
+	{ FAULT_FLAG_PAGE_SPLIT,	"PAGESPLIT" }
 
 /*
  * vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/mm/memory.c b/mm/memory.c
index feff48e1465a..c9dcf9b30719 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4427,6 +4427,12 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	return 0;
 }
 
+static int handle_split_page_fault(struct vm_fault *vmf)
+{
+	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
+	return 0;
+}
+
 /*
  * By the time we get here, we already hold the mm semaphore
  *
@@ -4448,6 +4454,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	pgd_t *pgd;
 	p4d_t *p4d;
 	vm_fault_t ret;
+	int split_page = flags & FAULT_FLAG_PAGE_SPLIT;
 
 	pgd = pgd_offset(mm, address);
 	p4d = p4d_alloc(mm, pgd, address);
@@ -4504,6 +4511,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 				pmd_migration_entry_wait(mm, vmf.pmd);
 			return 0;
 		}
+
+		if (split_page)
+			return handle_split_page_fault(&vmf);
+
 		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
 			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&vmf, orig_pmd);
-- 
2.17.1


  parent reply	other threads:[~2021-03-24 17:05 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-24 17:04 [RFC Part2 PATCH 00/30] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 01/30] x86: Add the host SEV-SNP initialization support Brijesh Singh
2021-03-25 14:58   ` Dave Hansen
2021-03-25 15:31     ` Brijesh Singh
2021-03-25 15:51       ` Dave Hansen
2021-03-25 17:41         ` Brijesh Singh
2021-04-14  7:27   ` Borislav Petkov
2021-04-14 22:48     ` Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 02/30] x86/sev-snp: add RMP entry lookup helpers Brijesh Singh
2021-04-15 16:57   ` Borislav Petkov
2021-04-15 18:08     ` Brijesh Singh
2021-04-15 19:50       ` Borislav Petkov
2021-04-15 22:18         ` Brijesh Singh
2021-04-15 17:03   ` Borislav Petkov
2021-04-15 18:09     ` Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 03/30] x86: add helper functions for RMPUPDATE and PSMASH instruction Brijesh Singh
2021-04-15 18:00   ` Borislav Petkov
2021-04-15 18:15     ` Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 04/30] x86/mm: split the physmap when adding the page in RMP table Brijesh Singh
2021-03-25 15:17   ` Dave Hansen
2021-04-19 12:32   ` Borislav Petkov
2021-04-19 15:25     ` Brijesh Singh
2021-04-19 16:52       ` Borislav Petkov
     [not found]         ` <30bff969-e8cf-a991-7660-054ea136855a@amd.com>
2021-04-19 17:58           ` Dave Hansen
2021-04-19 18:10             ` Andy Lutomirski
2021-04-19 18:33               ` Dave Hansen
2021-04-19 18:37                 ` Andy Lutomirski
2021-04-20  9:51                 ` Borislav Petkov
2021-04-19 21:25               ` Brijesh Singh
2021-04-20  9:47           ` Borislav Petkov
2021-03-24 17:04 ` [RFC Part2 PATCH 05/30] x86: define RMP violation #PF error code Brijesh Singh
2021-03-24 18:03   ` Dave Hansen
2021-03-25 14:32     ` Brijesh Singh
2021-03-25 14:34       ` Dave Hansen
2021-04-20 10:32   ` Borislav Petkov
2021-04-20 21:37     ` Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 06/30] x86/fault: dump the RMP entry on #PF Brijesh Singh
2021-03-24 17:47   ` Andy Lutomirski
2021-03-24 20:35     ` Brijesh Singh
2021-03-24 17:04 ` Brijesh Singh [this message]
2021-03-25 14:30   ` [RFC Part2 PATCH 07/30] mm: add support to split the large THP based on RMP violation Dave Hansen
2021-03-25 14:48   ` Dave Hansen
2021-03-25 15:24     ` Brijesh Singh
2021-03-25 15:59       ` Dave Hansen
2021-04-21 12:59         ` Vlastimil Babka
2021-04-21 13:43           ` Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 08/30] crypto:ccp: define the SEV-SNP commands Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 09/30] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 10/30] crypto: ccp: shutdown SNP firmware on kexec Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 11/30] crypto:ccp: provide APIs to issue SEV-SNP commands Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 12/30] crypto ccp: handle the legacy SEV command when SNP is enabled Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 13/30] KVM: SVM: add initial SEV-SNP support Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 14/30] KVM: SVM: make AVIC backing, VMSA and VMCB memory allocation SNP safe Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 15/30] KVM: SVM: define new SEV_FEATURES field in the VMCB Save State Area Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 16/30] KVM: SVM: add KVM_SNP_INIT command Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 17/30] KVM: SVM: add KVM_SEV_SNP_LAUNCH_START command Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 18/30] KVM: SVM: add KVM_SEV_SNP_LAUNCH_UPDATE command Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 19/30] KVM: SVM: Reclaim the guest pages when SEV-SNP VM terminates Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 20/30] KVM: SVM: add KVM_SEV_SNP_LAUNCH_FINISH command Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 21/30] KVM: X86: Add kvm_x86_ops to get the max page level for the TDP Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 22/30] x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by SEV Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 23/30] KVM: X86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 24/30] KVM: X86: define new RMP check related #NPF error bits Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 25/30] KVM: X86: update page-fault trace to log the 64-bit error code Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 26/30] KVM: SVM: add support to handle GHCB GPA register VMGEXIT Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 27/30] KVM: SVM: add support to handle MSR based Page State Change VMGEXIT Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 28/30] KVM: SVM: add support to handle " Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 29/30] KVM: X86: export the kvm_zap_gfn_range() for the SNP use Brijesh Singh
2021-03-24 17:04 ` [RFC Part2 PATCH 30/30] KVM: X86: Add support to handle the RMP nested page fault Brijesh Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210324170436.31843-8-brijesh.singh@amd.com \
    --to=brijesh.singh@amd.com \
    --cc=ak@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=hpa@zytor.com \
    --cc=jroedel@suse.de \
    --cc=kvm@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.