From: Junaid Shahid <junaids@google.com>
To: linux-kernel@vger.kernel.org
Cc: Ofir Weisse <oweisse@google.com>,
kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com,
pjt@google.com, alexandre.chartre@oracle.com,
rppt@linux.ibm.com, dave.hansen@linux.intel.com,
peterz@infradead.org, tglx@linutronix.de, luto@kernel.org,
linux-mm@kvack.org
Subject: [RFC PATCH 35/47] mm: asi: asi_exit() on PF, skip handling if address is accessible
Date: Tue, 22 Feb 2022 21:22:11 -0800 [thread overview]
Message-ID: <20220223052223.1202152-36-junaids@google.com> (raw)
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
From: Ofir Weisse <oweisse@google.com>
On a page-fault - do asi_exit(). Then check if now after the exit the
address is accessible. We do this by refactoring spurious_kernel_fault()
into two parts:
1. Verify that the error code value is something that could arise from a
lazy TLB update.
2. Walk the page table and verify permissions, which is now called
is_address_accessible_now(). We also define PTE_PRESENT() and
PMD_PRESENT() which are suitable for checking userspace pages. For the
sake of spurious faualts, pte_present() and pmd_present() are only
good for kernelspace pages. This is because these macros might return
true even if the present bit is 0 (only relevant for userspace).
Signed-off-by: Ofir Weisse <oweisse@google.com>
---
arch/x86/mm/fault.c | 60 ++++++++++++++++++++++++++++++++++------
include/linux/mm_types.h | 3 ++
2 files changed, 55 insertions(+), 8 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8692eb50f4a5..d08021ba380b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -982,6 +982,8 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
return 1;
}
+static int is_address_accessible_now(unsigned long error_code, unsigned long address,
+ pgd_t *pgd);
/*
* Handle a spurious fault caused by a stale TLB entry.
*
@@ -1003,15 +1005,13 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
* See Intel Developer's Manual Vol 3 Section 4.10.4.3, bullet 3
* (Optional Invalidation).
*/
+/* A spurious fault is also possible when Address Space Isolation (ASI) is in
+ * use. Specifically, code running withing an ASI domain touched memory outside
+ * the domain. This access causes a page-fault --> asi_exit() */
static noinline int
spurious_kernel_fault(unsigned long error_code, unsigned long address)
{
pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
- int ret;
/*
* Only writes to RO or instruction fetches from NX may cause
@@ -1027,6 +1027,37 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
return 0;
pgd = init_mm.pgd + pgd_index(address);
+ return is_address_accessible_now(error_code, address, pgd);
+}
+NOKPROBE_SYMBOL(spurious_kernel_fault);
+
+
+/* Check if an address (kernel or userspace) would cause a page fault if
+ * accessed now.
+ *
+ * For kernel addresses, pte_present and pmd_present are sufficioent. For
+ * userspace, we must use PTE_PRESENT and PMD_PRESENT, which will only check the
+ * present bits.
+ * The existing pmd_present() in arch/x86/include/asm/pgtable.h is misleading.
+ * The PMD page might be in the middle of split_huge_page with present bit
+ * clear, but pmd_present will still return true. We are inteerested in knowing
+ * if the page is accessible to hardware - that is - the present bit is 1. */
+#define PMD_PRESENT(pmd) (pmd_flags(pmd) & _PAGE_PRESENT)
+
+/* pte_present will return true is _PAGE_PROTNONE is 1. We care if the hardware
+ * can actually access the page right now. */
+#define PTE_PRESENT(pte) (pte_flags(pte) & _PAGE_PRESENT)
+
+static noinline int
+is_address_accessible_now(unsigned long error_code, unsigned long address,
+ pgd_t *pgd)
+{
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+ int ret;
+
if (!pgd_present(*pgd))
return 0;
@@ -1045,14 +1076,14 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
return spurious_kernel_fault_check(error_code, (pte_t *) pud);
pmd = pmd_offset(pud, address);
- if (!pmd_present(*pmd))
+ if (!PMD_PRESENT(*pmd))
return 0;
if (pmd_large(*pmd))
return spurious_kernel_fault_check(error_code, (pte_t *) pmd);
pte = pte_offset_kernel(pmd, address);
- if (!pte_present(*pte))
+ if (!PTE_PRESENT(*pte))
return 0;
ret = spurious_kernel_fault_check(error_code, pte);
@@ -1068,7 +1099,6 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
return ret;
}
-NOKPROBE_SYMBOL(spurious_kernel_fault);
int show_unhandled_signals = 1;
@@ -1504,6 +1534,20 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
* the fixup on the next page fault.
*/
struct asi *asi = asi_get_current();
+ if (asi)
+ asi_exit();
+
+ /* handle_page_fault() might call BUG() if we run it for a kernel
+ * address. This might be the case if we got here due to an ASI fault.
+ * We avoid this case by checking whether the address is now, after a
+ * potential asi_exit(), accessible by hardware. If it is - there's
+ * nothing to do.
+ */
+ if (current && mm_asi_enabled(current->mm)) {
+ pgd_t *pgd = (pgd_t*)__va(read_cr3_pa()) + pgd_index(address);
+ if (is_address_accessible_now(error_code, address, pgd))
+ return;
+ }
prefetchw(¤t->mm->mmap_lock);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c3f209720a84..560909e80841 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -707,6 +707,9 @@ extern struct mm_struct init_mm;
#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
static inline bool mm_asi_enabled(struct mm_struct *mm)
{
+ if (!mm)
+ return false;
+
return mm->asi_enabled;
}
#else
--
2.35.1.473.g83b2b277ed-goog
next prev parent reply other threads:[~2022-02-23 5:25 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-23 5:21 [RFC PATCH 00/47] Address Space Isolation for KVM Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 01/47] mm: asi: Introduce ASI core API Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 02/47] mm: asi: Add command-line parameter to enable/disable ASI Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 03/47] mm: asi: Switch to unrestricted address space when entering scheduler Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 04/47] mm: asi: ASI support in interrupts/exceptions Junaid Shahid
2022-03-14 15:50 ` Thomas Gleixner
2022-03-15 2:01 ` Junaid Shahid
2022-03-15 12:55 ` Thomas Gleixner
2022-03-15 22:41 ` Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 05/47] mm: asi: Make __get_current_cr3_fast() ASI-aware Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 06/47] mm: asi: ASI page table allocation and free functions Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 07/47] mm: asi: Functions to map/unmap a memory range into ASI page tables Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 08/47] mm: asi: Add basic infrastructure for global non-sensitive mappings Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 09/47] mm: Add __PAGEFLAG_FALSE Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 10/47] mm: asi: Support for global non-sensitive direct map allocations Junaid Shahid
2022-03-23 21:06 ` Matthew Wilcox
2022-03-23 23:48 ` Junaid Shahid
2022-03-24 1:54 ` Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 11/47] mm: asi: Global non-sensitive vmalloc/vmap support Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 12/47] mm: asi: Support for global non-sensitive slab caches Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 13/47] asi: Added ASI memory cgroup flag Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 14/47] mm: asi: Disable ASI API when ASI is not enabled for a process Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 15/47] kvm: asi: Restricted address space for VM execution Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 16/47] mm: asi: Support for mapping non-sensitive pcpu chunks Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 17/47] mm: asi: Aliased direct map for local non-sensitive allocations Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 18/47] mm: asi: Support for pre-ASI-init " Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 19/47] mm: asi: Support for locally nonsensitive page allocations Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 20/47] mm: asi: Support for locally non-sensitive vmalloc allocations Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 21/47] mm: asi: Add support for locally non-sensitive VM_USERMAP pages Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 22/47] mm: asi: Added refcounting when initilizing an asi Junaid Shahid
2022-02-23 5:21 ` [RFC PATCH 23/47] mm: asi: Add support for mapping all userspace memory into ASI Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 24/47] mm: asi: Support for local non-sensitive slab caches Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 25/47] mm: asi: Avoid warning from NMI userspace accesses in ASI context Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 26/47] mm: asi: Use separate PCIDs for restricted address spaces Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 27/47] mm: asi: Avoid TLB flushes during ASI CR3 switches when possible Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 28/47] mm: asi: Avoid TLB flush IPIs to CPUs not in ASI context Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 29/47] mm: asi: Reduce TLB flushes when freeing pages asynchronously Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 30/47] mm: asi: Add API for mapping userspace address ranges Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 31/47] mm: asi: Support for non-sensitive SLUB caches Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 32/47] x86: asi: Allocate FPU state separately when ASI is enabled Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 33/47] kvm: asi: Map guest memory into restricted ASI address space Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 34/47] kvm: asi: Unmap guest memory from ASI address space when using nested virt Junaid Shahid
2022-02-23 5:22 ` Junaid Shahid [this message]
2022-02-23 5:22 ` [RFC PATCH 36/47] mm: asi: Adding support for dynamic percpu ASI allocations Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 37/47] mm: asi: ASI annotation support for static variables Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 38/47] mm: asi: ASI annotation support for dynamic modules Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 39/47] mm: asi: Skip conventional L1TF/MDS mitigations Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 40/47] mm: asi: support for static percpu DEFINE_PER_CPU*_ASI Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 41/47] mm: asi: Annotation of static variables to be nonsensitive Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 42/47] mm: asi: Annotation of PERCPU " Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 43/47] mm: asi: Annotation of dynamic " Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 44/47] kvm: asi: Splitting kvm_vcpu_arch into non/sensitive parts Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 45/47] mm: asi: Mapping global nonsensitive areas in asi_global_init Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 46/47] kvm: asi: Do asi_exit() in vcpu_run loop before returning to userspace Junaid Shahid
2022-02-23 5:22 ` [RFC PATCH 47/47] mm: asi: Properly un/mapping task stack from ASI + tlb flush Junaid Shahid
2022-03-05 3:39 ` [RFC PATCH 00/47] Address Space Isolation for KVM Hyeonggon Yoo
2022-03-16 21:34 ` Alexandre Chartre
2022-03-17 23:25 ` Junaid Shahid
2022-03-22 9:46 ` Alexandre Chartre
2022-03-23 19:35 ` Junaid Shahid
2022-04-08 8:52 ` Alexandre Chartre
2022-04-11 3:26 ` junaid_shahid
2022-03-16 22:49 ` Thomas Gleixner
2022-03-17 21:24 ` Junaid Shahid
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220223052223.1202152-36-junaids@google.com \
--to=junaids@google.com \
--cc=alexandre.chartre@oracle.com \
--cc=dave.hansen@linux.intel.com \
--cc=jmattson@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=oweisse@google.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=rppt@linux.ibm.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).