linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] [v2] x86/mm: page fault handling cleanups
@ 2018-09-28 16:02 Dave Hansen
  2018-09-28 16:02 ` [PATCH 1/8] x86/mm: clarify hardware vs. software "error_code" Dave Hansen
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Dave Hansen @ 2018-09-28 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, sean.j.christopherson, peterz, tglx, luto, jannh, x86

Changes from v1:
 * Take "space_" out of do_kern/user_addr_fault()
 * Make "bad fault" comment in do_kern_addr_fault() less scary
 * Add clarifying comment for the conditions under which 
   do_kern_addr_fault() is called.
 * Remove mention of hw_error_code in search_exception_tables()
   comment.
 * Clarify that the exception tables spell out individual
   instructions, not larger sections of code.
 * Use PAGE_MASK in is_vsyscall_vaddr()
 * Add some additional reasoning behind the code move when
   moving the vsyscall handling to user address space handler
 * Remove hard-coded ~0xfff and replace with PAGE_MASK

---

I went trying to clean up the spurious protection key checks.
But, I got distracted by some other warts in the code.  I hope this
makes things more comprehendable with some improved structure,
commenting and naming.

We come out the other side of it with a new warning in for pkey
faults in the kernel portion of the address space, and removal
of some dead pkeys code, but no other functional changes.

There is a potential impact from moving the vsyscall emulation.
But, this does pass the x86 selftests without any fuss.

Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: x86@kernel.org

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/8] x86/mm: clarify hardware vs. software "error_code"
  2018-09-28 16:02 [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Dave Hansen
@ 2018-09-28 16:02 ` Dave Hansen
  2018-10-09 15:02   ` [tip:x86/mm] x86/mm: Clarify " tip-bot for Dave Hansen
  2018-09-28 16:02 ` [PATCH 2/8] x86/mm: break out kernel address space handling Dave Hansen
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2018-09-28 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, sean.j.christopherson, peterz, tglx, x86, luto, jannh


From: Dave Hansen <dave.hansen@linux.intel.com>

We pass around a variable called "error_code" all around the page
fault code.  Sounds simple enough, especially since "error_code" looks
like it exactly matches the values that the hardware gives us on the
stack to report the page fault error code (PFEC in SDM parlance).

But, that's not how it works.

For part of the page fault handler, "error_code" does exactly match
PFEC.  But, during later parts, it diverges and starts to mean
something a bit different.

Give it two names for its two jobs.

The place it diverges is also really screwy.  It's only in a spot
where the hardware tells us we have kernel-mode access that occurred
while we were in usermode accessing user-controlled address space.
Add a warning in there.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
---

 b/arch/x86/mm/fault.c |   77 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 52 insertions(+), 25 deletions(-)

diff -puN arch/x86/mm/fault.c~pkeys-fault-warnings-0 arch/x86/mm/fault.c
--- a/arch/x86/mm/fault.c~pkeys-fault-warnings-0	2018-09-27 10:17:21.481343572 -0700
+++ b/arch/x86/mm/fault.c	2018-09-27 10:17:21.485343572 -0700
@@ -1210,9 +1210,10 @@ static inline bool smap_violation(int er
  * routines.
  */
 static noinline void
-__do_page_fault(struct pt_regs *regs, unsigned long error_code,
+__do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
 		unsigned long address)
 {
+	unsigned long sw_error_code;
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
 	struct mm_struct *mm;
@@ -1238,17 +1239,17 @@ __do_page_fault(struct pt_regs *regs, un
 	 * nothing more.
 	 *
 	 * This verifies that the fault happens in kernel space
-	 * (error_code & 4) == 0, and that the fault was not a
-	 * protection error (error_code & 9) == 0.
+	 * (hw_error_code & 4) == 0, and that the fault was not a
+	 * protection error (hw_error_code & 9) == 0.
 	 */
 	if (unlikely(fault_in_kernel_space(address))) {
-		if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
+		if (!(hw_error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
 			if (vmalloc_fault(address) >= 0)
 				return;
 		}
 
 		/* Can handle a stale RO->RW TLB: */
-		if (spurious_fault(error_code, address))
+		if (spurious_fault(hw_error_code, address))
 			return;
 
 		/* kprobes don't want to hook the spurious faults: */
@@ -1258,7 +1259,7 @@ __do_page_fault(struct pt_regs *regs, un
 		 * Don't take the mm semaphore here. If we fixup a prefetch
 		 * fault we could otherwise deadlock:
 		 */
-		bad_area_nosemaphore(regs, error_code, address, NULL);
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 
 		return;
 	}
@@ -1267,11 +1268,11 @@ __do_page_fault(struct pt_regs *regs, un
 	if (unlikely(kprobes_fault(regs)))
 		return;
 
-	if (unlikely(error_code & X86_PF_RSVD))
-		pgtable_bad(regs, error_code, address);
+	if (unlikely(hw_error_code & X86_PF_RSVD))
+		pgtable_bad(regs, hw_error_code, address);
 
-	if (unlikely(smap_violation(error_code, regs))) {
-		bad_area_nosemaphore(regs, error_code, address, NULL);
+	if (unlikely(smap_violation(hw_error_code, regs))) {
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;
 	}
 
@@ -1280,11 +1281,18 @@ __do_page_fault(struct pt_regs *regs, un
 	 * in a region with pagefaults disabled then we must not take the fault
 	 */
 	if (unlikely(faulthandler_disabled() || !mm)) {
-		bad_area_nosemaphore(regs, error_code, address, NULL);
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;
 	}
 
 	/*
+	 * hw_error_code is literally the "page fault error code" passed to
+	 * the kernel directly from the hardware.  But, we will shortly be
+	 * modifying it in software, so give it a new name.
+	 */
+	sw_error_code = hw_error_code;
+
+	/*
 	 * It's safe to allow irq's after cr2 has been saved and the
 	 * vmalloc fault has been handled.
 	 *
@@ -1293,7 +1301,26 @@ __do_page_fault(struct pt_regs *regs, un
 	 */
 	if (user_mode(regs)) {
 		local_irq_enable();
-		error_code |= X86_PF_USER;
+		/*
+		 * Up to this point, X86_PF_USER set in hw_error_code
+		 * indicated a user-mode access.  But, after this,
+		 * X86_PF_USER in sw_error_code will indicate either
+		 * that, *or* an implicit kernel(supervisor)-mode access
+		 * which originated from user mode.
+		 */
+		if (!(hw_error_code & X86_PF_USER)) {
+			/*
+			 * The CPU was in user mode, but the CPU says
+			 * the fault was not a user-mode access.
+			 * Must be an implicit kernel-mode access,
+			 * which we do not expect to happen in the
+			 * user address space.
+			 */
+			pr_warn_once("kernel-mode error from user-mode: %lx\n",
+					hw_error_code);
+
+			sw_error_code |= X86_PF_USER;
+		}
 		flags |= FAULT_FLAG_USER;
 	} else {
 		if (regs->flags & X86_EFLAGS_IF)
@@ -1302,9 +1329,9 @@ __do_page_fault(struct pt_regs *regs, un
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
-	if (error_code & X86_PF_WRITE)
+	if (sw_error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
-	if (error_code & X86_PF_INSTR)
+	if (sw_error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
 	/*
@@ -1324,9 +1351,9 @@ __do_page_fault(struct pt_regs *regs, un
 	 * space check, thus avoiding the deadlock:
 	 */
 	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
-		if (!(error_code & X86_PF_USER) &&
+		if (!(sw_error_code & X86_PF_USER) &&
 		    !search_exception_tables(regs->ip)) {
-			bad_area_nosemaphore(regs, error_code, address, NULL);
+			bad_area_nosemaphore(regs, sw_error_code, address, NULL);
 			return;
 		}
 retry:
@@ -1342,16 +1369,16 @@ retry:
 
 	vma = find_vma(mm, address);
 	if (unlikely(!vma)) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, sw_error_code, address);
 		return;
 	}
 	if (likely(vma->vm_start <= address))
 		goto good_area;
 	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, sw_error_code, address);
 		return;
 	}
-	if (error_code & X86_PF_USER) {
+	if (sw_error_code & X86_PF_USER) {
 		/*
 		 * Accessing the stack below %sp is always a bug.
 		 * The large cushion allows instructions like enter
@@ -1359,12 +1386,12 @@ retry:
 		 * 32 pointers and then decrements %sp by 65535.)
 		 */
 		if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
-			bad_area(regs, error_code, address);
+			bad_area(regs, sw_error_code, address);
 			return;
 		}
 	}
 	if (unlikely(expand_stack(vma, address))) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, sw_error_code, address);
 		return;
 	}
 
@@ -1373,8 +1400,8 @@ retry:
 	 * we can handle it..
 	 */
 good_area:
-	if (unlikely(access_error(error_code, vma))) {
-		bad_area_access_error(regs, error_code, address, vma);
+	if (unlikely(access_error(sw_error_code, vma))) {
+		bad_area_access_error(regs, sw_error_code, address, vma);
 		return;
 	}
 
@@ -1416,13 +1443,13 @@ good_area:
 			return;
 
 		/* Not returning to user mode? Handle exceptions or die: */
-		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+		no_context(regs, sw_error_code, address, SIGBUS, BUS_ADRERR);
 		return;
 	}
 
 	up_read(&mm->mmap_sem);
 	if (unlikely(fault & VM_FAULT_ERROR)) {
-		mm_fault_error(regs, error_code, address, &pkey, fault);
+		mm_fault_error(regs, sw_error_code, address, &pkey, fault);
 		return;
 	}
 
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/8] x86/mm: break out kernel address space handling
  2018-09-28 16:02 [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Dave Hansen
  2018-09-28 16:02 ` [PATCH 1/8] x86/mm: clarify hardware vs. software "error_code" Dave Hansen
@ 2018-09-28 16:02 ` Dave Hansen
  2018-10-09 15:02   ` [tip:x86/mm] x86/mm: Break " tip-bot for Dave Hansen
  2018-09-28 16:02 ` [PATCH 3/8] x86/mm: break out user " Dave Hansen
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2018-09-28 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, sean.j.christopherson, peterz, tglx, x86, luto, jannh


From: Dave Hansen <dave.hansen@linux.intel.com>

The page fault handler (__do_page_fault())  basically has two sections:
one for handling faults in the kernel portion of the address space
and another for faults in the user portion of the address space.

But, these two parts don't stick out that well.  Let's make that more
clear from code separation and naming.  Pull kernel fault
handling into its own helper, and reflect that naming by renaming
spurious_fault() -> spurious_kernel_fault().

Also, rewrite the vmalloc() handling comment a bit.  It was a bit
stale and also glossed over the reserved bit handling.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
---

 b/arch/x86/mm/fault.c |  101 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 62 insertions(+), 39 deletions(-)

diff -puN arch/x86/mm/fault.c~pkeys-fault-warnings-00 arch/x86/mm/fault.c
--- a/arch/x86/mm/fault.c~pkeys-fault-warnings-00	2018-09-27 10:17:21.985343570 -0700
+++ b/arch/x86/mm/fault.c	2018-09-27 10:17:21.988343570 -0700
@@ -1034,7 +1034,7 @@ mm_fault_error(struct pt_regs *regs, uns
 	}
 }
 
-static int spurious_fault_check(unsigned long error_code, pte_t *pte)
+static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
 {
 	if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
 		return 0;
@@ -1073,7 +1073,7 @@ static int spurious_fault_check(unsigned
  * (Optional Invalidation).
  */
 static noinline int
-spurious_fault(unsigned long error_code, unsigned long address)
+spurious_kernel_fault(unsigned long error_code, unsigned long address)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
@@ -1104,27 +1104,27 @@ spurious_fault(unsigned long error_code,
 		return 0;
 
 	if (p4d_large(*p4d))
-		return spurious_fault_check(error_code, (pte_t *) p4d);
+		return spurious_kernel_fault_check(error_code, (pte_t *) p4d);
 
 	pud = pud_offset(p4d, address);
 	if (!pud_present(*pud))
 		return 0;
 
 	if (pud_large(*pud))
-		return spurious_fault_check(error_code, (pte_t *) pud);
+		return spurious_kernel_fault_check(error_code, (pte_t *) pud);
 
 	pmd = pmd_offset(pud, address);
 	if (!pmd_present(*pmd))
 		return 0;
 
 	if (pmd_large(*pmd))
-		return spurious_fault_check(error_code, (pte_t *) pmd);
+		return spurious_kernel_fault_check(error_code, (pte_t *) pmd);
 
 	pte = pte_offset_kernel(pmd, address);
 	if (!pte_present(*pte))
 		return 0;
 
-	ret = spurious_fault_check(error_code, pte);
+	ret = spurious_kernel_fault_check(error_code, pte);
 	if (!ret)
 		return 0;
 
@@ -1132,12 +1132,12 @@ spurious_fault(unsigned long error_code,
 	 * Make sure we have permissions in PMD.
 	 * If not, then there's a bug in the page tables:
 	 */
-	ret = spurious_fault_check(error_code, (pte_t *) pmd);
+	ret = spurious_kernel_fault_check(error_code, (pte_t *) pmd);
 	WARN_ONCE(!ret, "PMD has incorrect permission bits\n");
 
 	return ret;
 }
-NOKPROBE_SYMBOL(spurious_fault);
+NOKPROBE_SYMBOL(spurious_kernel_fault);
 
 int show_unhandled_signals = 1;
 
@@ -1205,6 +1205,58 @@ static inline bool smap_violation(int er
 }
 
 /*
+ * Called for all faults where 'address' is part of the kernel address
+ * space.  Might get called for faults that originate from *code* that
+ * ran in userspace or the kernel.
+ */
+static void
+do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
+		   unsigned long address)
+{
+	/*
+	 * We can fault-in kernel-space virtual memory on-demand. The
+	 * 'reference' page table is init_mm.pgd.
+	 *
+	 * NOTE! We MUST NOT take any locks for this case. We may
+	 * be in an interrupt or a critical region, and should
+	 * only copy the information from the master page table,
+	 * nothing more.
+	 *
+	 * Before doing this on-demand faulting, ensure that the
+	 * fault is not any of the following:
+	 * 1. A fault on a PTE with a reserved bit set.
+	 * 2. A fault caused by a user-mode access.  (Do not demand-
+	 *    fault kernel memory due to user-mode accesses).
+	 * 3. A fault caused by a page-level protection violation.
+	 *    (A demand fault would be on a non-present page which
+	 *     would have X86_PF_PROT==0).
+	 */
+	if (!(hw_error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
+		if (vmalloc_fault(address) >= 0)
+			return;
+	}
+
+	/* Was the fault spurious, caused by lazy TLB invalidation? */
+	if (spurious_kernel_fault(hw_error_code, address))
+		return;
+
+	/* kprobes don't want to hook the spurious faults: */
+	if (kprobes_fault(regs))
+		return;
+
+	/*
+	 * Note, despite being a "bad area", there are quite a few
+	 * acceptable reasons to get here, such as erratum fixups
+	 * and handling kernel code that can fault, like get_user().
+	 *
+	 * Don't take the mm semaphore here. If we fixup a prefetch
+	 * fault we could otherwise deadlock:
+	 */
+	bad_area_nosemaphore(regs, hw_error_code, address, NULL);
+}
+NOKPROBE_SYMBOL(do_kern_addr_fault);
+
+/*
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
  * routines.
@@ -1229,38 +1281,9 @@ __do_page_fault(struct pt_regs *regs, un
 	if (unlikely(kmmio_fault(regs, address)))
 		return;
 
-	/*
-	 * We fault-in kernel-space virtual memory on-demand. The
-	 * 'reference' page table is init_mm.pgd.
-	 *
-	 * NOTE! We MUST NOT take any locks for this case. We may
-	 * be in an interrupt or a critical region, and should
-	 * only copy the information from the master page table,
-	 * nothing more.
-	 *
-	 * This verifies that the fault happens in kernel space
-	 * (hw_error_code & 4) == 0, and that the fault was not a
-	 * protection error (hw_error_code & 9) == 0.
-	 */
+	/* Was the fault on kernel-controlled part of the address space? */
 	if (unlikely(fault_in_kernel_space(address))) {
-		if (!(hw_error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
-			if (vmalloc_fault(address) >= 0)
-				return;
-		}
-
-		/* Can handle a stale RO->RW TLB: */
-		if (spurious_fault(hw_error_code, address))
-			return;
-
-		/* kprobes don't want to hook the spurious faults: */
-		if (kprobes_fault(regs))
-			return;
-		/*
-		 * Don't take the mm semaphore here. If we fixup a prefetch
-		 * fault we could otherwise deadlock:
-		 */
-		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
-
+		do_kern_addr_fault(regs, hw_error_code, address);
 		return;
 	}
 
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/8] x86/mm: break out user address space handling
  2018-09-28 16:02 [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Dave Hansen
  2018-09-28 16:02 ` [PATCH 1/8] x86/mm: clarify hardware vs. software "error_code" Dave Hansen
  2018-09-28 16:02 ` [PATCH 2/8] x86/mm: break out kernel address space handling Dave Hansen
@ 2018-09-28 16:02 ` Dave Hansen
  2018-10-09 15:03   ` [tip:x86/mm] x86/mm: Break " tip-bot for Dave Hansen
  2018-09-28 16:02 ` [PATCH 4/8] x86/mm: add clarifying comments for user addr space Dave Hansen
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2018-09-28 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, sean.j.christopherson, peterz, tglx, x86, luto, jannh


From: Dave Hansen <dave.hansen@linux.intel.com>

The last patch broke out kernel address space handing into its own
helper.  Now, do the same for user address space handling.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
---

 b/arch/x86/mm/fault.c |   47 ++++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff -puN arch/x86/mm/fault.c~pkeys-fault-warnings-01 arch/x86/mm/fault.c
--- a/arch/x86/mm/fault.c~pkeys-fault-warnings-01	2018-09-27 10:17:22.485343569 -0700
+++ b/arch/x86/mm/fault.c	2018-09-27 10:17:22.489343569 -0700
@@ -968,6 +968,7 @@ bad_area_access_error(struct pt_regs *re
 		__bad_area(regs, error_code, address, vma, SEGV_ACCERR);
 }
 
+/* Handle faults in the kernel portion of the address space */
 static void
 do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 	  u32 *pkey, unsigned int fault)
@@ -1256,14 +1257,11 @@ do_kern_addr_fault(struct pt_regs *regs,
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
-/*
- * This routine handles page faults.  It determines the address,
- * and the problem, and then passes it off to one of the appropriate
- * routines.
- */
-static noinline void
-__do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
-		unsigned long address)
+/* Handle faults in the user portion of the address space */
+static inline
+void do_user_addr_fault(struct pt_regs *regs,
+			unsigned long hw_error_code,
+			unsigned long address)
 {
 	unsigned long sw_error_code;
 	struct vm_area_struct *vma;
@@ -1276,17 +1274,6 @@ __do_page_fault(struct pt_regs *regs, un
 	tsk = current;
 	mm = tsk->mm;
 
-	prefetchw(&mm->mmap_sem);
-
-	if (unlikely(kmmio_fault(regs, address)))
-		return;
-
-	/* Was the fault on kernel-controlled part of the address space? */
-	if (unlikely(fault_in_kernel_space(address))) {
-		do_kern_addr_fault(regs, hw_error_code, address);
-		return;
-	}
-
 	/* kprobes don't want to hook the spurious faults: */
 	if (unlikely(kprobes_fault(regs)))
 		return;
@@ -1490,6 +1477,28 @@ good_area:
 
 	check_v8086_mode(regs, address, tsk);
 }
+NOKPROBE_SYMBOL(do_user_addr_fault);
+
+/*
+ * This routine handles page faults.  It determines the address,
+ * and the problem, and then passes it off to one of the appropriate
+ * routines.
+ */
+static noinline void
+__do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
+		unsigned long address)
+{
+	prefetchw(&current->mm->mmap_sem);
+
+	if (unlikely(kmmio_fault(regs, address)))
+		return;
+
+	/* Was the fault on kernel-controlled part of the address space? */
+	if (unlikely(fault_in_kernel_space(address)))
+		do_kern_addr_fault(regs, hw_error_code, address);
+	else
+		do_user_addr_fault(regs, hw_error_code, address);
+}
 NOKPROBE_SYMBOL(__do_page_fault);
 
 static nokprobe_inline void
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/8] x86/mm: add clarifying comments for user addr space
  2018-09-28 16:02 [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Dave Hansen
                   ` (2 preceding siblings ...)
  2018-09-28 16:02 ` [PATCH 3/8] x86/mm: break out user " Dave Hansen
@ 2018-09-28 16:02 ` Dave Hansen
  2018-10-09 15:03   ` [tip:x86/mm] x86/mm: Add " tip-bot for Dave Hansen
  2018-09-28 16:02 ` [PATCH 5/8] x86/mm: fix exception table comments Dave Hansen
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2018-09-28 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, sean.j.christopherson, peterz, tglx, x86, luto, jannh


From: Dave Hansen <dave.hansen@linux.intel.com>

The SMAP and Reserved checking do not have nice comments.  Add
some to clarify and make it match everything else.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
---

 b/arch/x86/mm/fault.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff -puN arch/x86/mm/fault.c~pkeys-fault-warnings-02 arch/x86/mm/fault.c
--- a/arch/x86/mm/fault.c~pkeys-fault-warnings-02	2018-09-27 10:17:22.992343568 -0700
+++ b/arch/x86/mm/fault.c	2018-09-27 10:17:22.996343568 -0700
@@ -1278,9 +1278,17 @@ void do_user_addr_fault(struct pt_regs *
 	if (unlikely(kprobes_fault(regs)))
 		return;
 
+	/*
+	 * Reserved bits are never expected to be set on
+	 * entries in the user portion of the page tables.
+	 */
 	if (unlikely(hw_error_code & X86_PF_RSVD))
 		pgtable_bad(regs, hw_error_code, address);
 
+	/*
+	 * Check for invalid kernel (supervisor) access to user
+	 * pages in the user address space.
+	 */
 	if (unlikely(smap_violation(hw_error_code, regs))) {
 		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 5/8] x86/mm: fix exception table comments
  2018-09-28 16:02 [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Dave Hansen
                   ` (3 preceding siblings ...)
  2018-09-28 16:02 ` [PATCH 4/8] x86/mm: add clarifying comments for user addr space Dave Hansen
@ 2018-09-28 16:02 ` Dave Hansen
  2018-10-09 15:04   ` [tip:x86/mm] x86/mm: Fix " tip-bot for Dave Hansen
  2018-09-28 16:02 ` [PATCH 6/8] x86/mm: add vsyscall address helper Dave Hansen
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2018-09-28 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, sean.j.christopherson, peterz, tglx, x86, luto, jannh


From: Dave Hansen <dave.hansen@linux.intel.com>

The comments here are wrong.  They are too absolute about where
faults can occur when running in the kernel.  The comments are
also a bit hard to match up with the code.

Trim down the comments, and make them more precise.

Also add a comment explaining why we are doing the
bad_area_nosemaphore() path here.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
---

 b/arch/x86/mm/fault.c |   28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff -puN arch/x86/mm/fault.c~pkeys-fault-warnings-03 arch/x86/mm/fault.c
--- a/arch/x86/mm/fault.c~pkeys-fault-warnings-03	2018-09-27 10:17:23.489343567 -0700
+++ b/arch/x86/mm/fault.c	2018-09-27 10:17:23.493343567 -0700
@@ -1353,24 +1353,26 @@ void do_user_addr_fault(struct pt_regs *
 		flags |= FAULT_FLAG_INSTRUCTION;
 
 	/*
-	 * When running in the kernel we expect faults to occur only to
-	 * addresses in user space.  All other faults represent errors in
-	 * the kernel and should generate an OOPS.  Unfortunately, in the
-	 * case of an erroneous fault occurring in a code path which already
-	 * holds mmap_sem we will deadlock attempting to validate the fault
-	 * against the address space.  Luckily the kernel only validly
-	 * references user space from well defined areas of code, which are
-	 * listed in the exceptions table.
+	 * Kernel-mode access to the user address space should only occur
+	 * on well-defined single instructions listed in the exception
+	 * tables.  But, an erroneous kernel fault occurring outside one of
+	 * those areas which also holds mmap_sem might deadlock attempting
+	 * to validate the fault against the address space.
 	 *
-	 * As the vast majority of faults will be valid we will only perform
-	 * the source reference check when there is a possibility of a
-	 * deadlock. Attempt to lock the address space, if we cannot we then
-	 * validate the source. If this is invalid we can skip the address
-	 * space check, thus avoiding the deadlock:
+	 * Only do the expensive exception table search when we might be at
+	 * risk of a deadlock.  This happens if we
+	 * 1. Failed to acquire mmap_sem, and
+	 * 2. The access did not originate in userspace.  Note: either the
+	 *    hardware or earlier page fault code may set X86_PF_USER
+	 *    in sw_error_code.
 	 */
 	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
 		if (!(sw_error_code & X86_PF_USER) &&
 		    !search_exception_tables(regs->ip)) {
+			/*
+			 * Fault from code in kernel from
+			 * which we do not expect faults.
+			 */
 			bad_area_nosemaphore(regs, sw_error_code, address, NULL);
 			return;
 		}
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 6/8] x86/mm: add vsyscall address helper
  2018-09-28 16:02 [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Dave Hansen
                   ` (4 preceding siblings ...)
  2018-09-28 16:02 ` [PATCH 5/8] x86/mm: fix exception table comments Dave Hansen
@ 2018-09-28 16:02 ` Dave Hansen
  2018-10-09 15:04   ` [tip:x86/mm] x86/mm: Add " tip-bot for Dave Hansen
  2018-09-28 16:02 ` [PATCH 7/8] x86/mm/vsyscall: consider vsyscall page part of user address space Dave Hansen
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2018-09-28 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, sean.j.christopherson, peterz, tglx, x86, luto, jannh


From: Dave Hansen <dave.hansen@linux.intel.com>

We will shortly be using this check in two locations.  Put it in
a helper before we do so.

Let's also insert PAGE_MASK instead of the open-coded ~0xfff.
It is easier to read and also more obviously correct considering
the implicit type conversion that has to happen when it is not
an implicit 'unsigned long'.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
---

 b/arch/x86/mm/fault.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff -puN arch/x86/mm/fault.c~is-vsyscall-addr arch/x86/mm/fault.c
--- a/arch/x86/mm/fault.c~is-vsyscall-addr	2018-09-27 10:17:23.988343565 -0700
+++ b/arch/x86/mm/fault.c	2018-09-27 10:17:23.992343565 -0700
@@ -842,6 +842,15 @@ show_signal_msg(struct pt_regs *regs, un
 	show_opcodes(regs, loglvl);
 }
 
+/*
+ * The (legacy) vsyscall page is the long page in the kernel portion
+ * of the address space that has user-accessible permissions.
+ */
+static bool is_vsyscall_vaddr(unsigned long vaddr)
+{
+	return (vaddr & PAGE_MASK) == VSYSCALL_ADDR;
+}
+
 static void
 __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		       unsigned long address, u32 *pkey, int si_code)
@@ -871,7 +880,7 @@ __bad_area_nosemaphore(struct pt_regs *r
 		 * emulation.
 		 */
 		if (unlikely((error_code & X86_PF_INSTR) &&
-			     ((address & ~0xfff) == VSYSCALL_ADDR))) {
+			     is_vsyscall_vaddr(address))) {
 			if (emulate_vsyscall(regs, address))
 				return;
 		}
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 7/8] x86/mm/vsyscall: consider vsyscall page part of user address space
  2018-09-28 16:02 [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Dave Hansen
                   ` (5 preceding siblings ...)
  2018-09-28 16:02 ` [PATCH 6/8] x86/mm: add vsyscall address helper Dave Hansen
@ 2018-09-28 16:02 ` Dave Hansen
  2018-10-09 15:05   ` [tip:x86/mm] x86/mm/vsyscall: Consider " tip-bot for Dave Hansen
  2018-09-28 16:02 ` [PATCH 8/8] x86/mm: remove spurious fault pkey check Dave Hansen
  2018-10-02  9:54 ` [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Peter Zijlstra
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2018-09-28 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, sean.j.christopherson, peterz, tglx, x86, luto, jannh


From: Dave Hansen <dave.hansen@linux.intel.com>

The vsyscall page is weird.  It is in what is traditionally part of
the kernel address space.  But, it has user permissions and we handle
faults on it like we would on a user page: interrupts on.

Right now, we handle vsyscall emulation in the "bad_area" code, which
is used for both user-address-space and kernel-address-space faults.
Move the handling to the user-address-space code *only* and ensure we
get there by "excluding" the vsyscall page from the kernel address
space via a check in fault_in_kernel_space().

Since the fault_in_kernel_space() check is used on 32-bit, also add a
64-bit check to make it clear we only use this path on 64-bit.  Also
move the unlikely() to be in is_vsyscall_vaddr() itself.

This helps clean up the kernel fault handling path by removing a case
that can happen in normal[1] operation.  (Yeah, yeah, we can argue
about the vsyscall page being "normal" or not.)  This also makes
sanity checks easier, like the "we never take pkey faults in the
kernel address space" check in the next patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
---

 b/arch/x86/mm/fault.c |   38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff -puN arch/x86/mm/fault.c~vsyscall-is-user-address-space arch/x86/mm/fault.c
--- a/arch/x86/mm/fault.c~vsyscall-is-user-address-space	2018-09-27 10:17:24.487343564 -0700
+++ b/arch/x86/mm/fault.c	2018-09-27 10:17:24.490343564 -0700
@@ -848,7 +848,7 @@ show_signal_msg(struct pt_regs *regs, un
  */
 static bool is_vsyscall_vaddr(unsigned long vaddr)
 {
-	return (vaddr & PAGE_MASK) == VSYSCALL_ADDR;
+	return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
 }
 
 static void
@@ -874,18 +874,6 @@ __bad_area_nosemaphore(struct pt_regs *r
 		if (is_errata100(regs, address))
 			return;
 
-#ifdef CONFIG_X86_64
-		/*
-		 * Instruction fetch faults in the vsyscall page might need
-		 * emulation.
-		 */
-		if (unlikely((error_code & X86_PF_INSTR) &&
-			     is_vsyscall_vaddr(address))) {
-			if (emulate_vsyscall(regs, address))
-				return;
-		}
-#endif
-
 		/*
 		 * To avoid leaking information about the kernel page table
 		 * layout, pretend that user-mode accesses to kernel addresses
@@ -1194,6 +1182,14 @@ access_error(unsigned long error_code, s
 
 static int fault_in_kernel_space(unsigned long address)
 {
+	/*
+	 * On 64-bit systems, the vsyscall page is at an address above
+	 * TASK_SIZE_MAX, but is not considered part of the kernel
+	 * address space.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64) && is_vsyscall_vaddr(address))
+		return false;
+
 	return address >= TASK_SIZE_MAX;
 }
 
@@ -1361,6 +1357,22 @@ void do_user_addr_fault(struct pt_regs *
 	if (sw_error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+#ifdef CONFIG_X86_64
+	/*
+	 * Instruction fetch faults in the vsyscall page might need
+	 * emulation.  The vsyscall page is at a high address
+	 * (>PAGE_OFFSET), but is considered to be part of the user
+	 * address space.
+	 *
+	 * The vsyscall page does not have a "real" VMA, so do this
+	 * emulation before we go searching for VMAs.
+	 */
+	if ((sw_error_code & X86_PF_INSTR) && is_vsyscall_vaddr(address)) {
+		if (emulate_vsyscall(regs, address))
+			return;
+	}
+#endif
+
 	/*
 	 * Kernel-mode access to the user address space should only occur
 	 * on well-defined single instructions listed in the exception
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 8/8] x86/mm: remove spurious fault pkey check
  2018-09-28 16:02 [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Dave Hansen
                   ` (6 preceding siblings ...)
  2018-09-28 16:02 ` [PATCH 7/8] x86/mm/vsyscall: consider vsyscall page part of user address space Dave Hansen
@ 2018-09-28 16:02 ` Dave Hansen
  2018-10-09 15:05   ` [tip:x86/mm] x86/mm: Remove " tip-bot for Dave Hansen
  2018-10-02  9:54 ` [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Peter Zijlstra
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2018-09-28 16:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, sean.j.christopherson, peterz, tglx, x86, luto, jannh


From: Dave Hansen <dave.hansen@linux.intel.com>

Spurious faults only ever occur in the kernel's address space.  They
are also constrained specifically to faults with one of these error codes:

	X86_PF_WRITE | X86_PF_PROT
	X86_PF_INSTR | X86_PF_PROT

So, it's never even possible to reach spurious_kernel_fault_check() with
X86_PF_PK set.

In addition, the kernel's address space never has pages with user-mode
protections.  Protection Keys are only enforced on pages with user-mode
protection.

This gives us lots of reasons to not check for protection keys in our
sprurious kernel fault handling.

But, let's also add some warnings to ensure that these assumptions about
protection keys hold true.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
---

 b/arch/x86/mm/fault.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff -puN arch/x86/mm/fault.c~pkeys-fault-warnings arch/x86/mm/fault.c
--- a/arch/x86/mm/fault.c~pkeys-fault-warnings	2018-09-27 10:17:24.992343563 -0700
+++ b/arch/x86/mm/fault.c	2018-09-27 10:17:24.995343563 -0700
@@ -1039,12 +1039,6 @@ static int spurious_kernel_fault_check(u
 
 	if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
 		return 0;
-	/*
-	 * Note: We do not do lazy flushing on protection key
-	 * changes, so no spurious fault will ever set X86_PF_PK.
-	 */
-	if ((error_code & X86_PF_PK))
-		return 1;
 
 	return 1;
 }
@@ -1220,6 +1214,13 @@ do_kern_addr_fault(struct pt_regs *regs,
 		   unsigned long address)
 {
 	/*
+	 * Protection keys exceptions only happen on user pages.  We
+	 * have no user pages in the kernel portion of the address
+	 * space, so do not expect them here.
+	 */
+	WARN_ON_ONCE(hw_error_code & X86_PF_PK);
+
+	/*
 	 * We can fault-in kernel-space virtual memory on-demand. The
 	 * 'reference' page table is init_mm.pgd.
 	 *
_

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/8] [v2] x86/mm: page fault handling cleanups
  2018-09-28 16:02 [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Dave Hansen
                   ` (7 preceding siblings ...)
  2018-09-28 16:02 ` [PATCH 8/8] x86/mm: remove spurious fault pkey check Dave Hansen
@ 2018-10-02  9:54 ` Peter Zijlstra
  8 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2018-10-02  9:54 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, sean.j.christopherson, tglx, luto, jannh, x86

On Fri, Sep 28, 2018 at 09:02:19AM -0700, Dave Hansen wrote:
> Changes from v1:
>  * Take "space_" out of do_kern/user_addr_fault()
>  * Make "bad fault" comment in do_kern_addr_fault() less scary
>  * Add clarifying comment for the conditions under which 
>    do_kern_addr_fault() is called.
>  * Remove mention of hw_error_code in search_exception_tables()
>    comment.
>  * Clarify that the exception tables spell out individual
>    instructions, not larger sections of code.
>  * Use PAGE_MASK in is_vsyscall_vaddr()
>  * Add some additional reasoning behind the code move when
>    moving the vsyscall handling to user address space handler
>  * Remove hard-coded ~0xfff and replace with PAGE_MASK
> 

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/mm] x86/mm: Clarify hardware vs. software "error_code"
  2018-09-28 16:02 ` [PATCH 1/8] x86/mm: clarify hardware vs. software "error_code" Dave Hansen
@ 2018-10-09 15:02   ` tip-bot for Dave Hansen
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Dave Hansen @ 2018-10-09 15:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jannh, sean.j.christopherson, luto, peterz, hpa, tglx,
	dave.hansen, mingo, linux-kernel

Commit-ID:  164477c2331be75d9bd57fb76704e676b2bcd1cd
Gitweb:     https://git.kernel.org/tip/164477c2331be75d9bd57fb76704e676b2bcd1cd
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Fri, 28 Sep 2018 09:02:20 -0700
Committer:  Peter Zijlstra <peterz@infradead.org>
CommitDate: Tue, 9 Oct 2018 16:51:15 +0200

x86/mm: Clarify hardware vs. software "error_code"

We pass around a variable called "error_code" all around the page
fault code.  Sounds simple enough, especially since "error_code" looks
like it exactly matches the values that the hardware gives us on the
stack to report the page fault error code (PFEC in SDM parlance).

But, that's not how it works.

For part of the page fault handler, "error_code" does exactly match
PFEC.  But, during later parts, it diverges and starts to mean
something a bit different.

Give it two names for its two jobs.

The place it diverges is also really screwy.  It's only in a spot
where the hardware tells us we have kernel-mode access that occurred
while we were in usermode accessing user-controlled address space.
Add a warning in there.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160220.4A2272C9@viggo.jf.intel.com
---
 arch/x86/mm/fault.c | 77 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 52 insertions(+), 25 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 47bebfe6efa7..cd08f4fef836 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1208,9 +1208,10 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
  * routines.
  */
 static noinline void
-__do_page_fault(struct pt_regs *regs, unsigned long error_code,
+__do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
 		unsigned long address)
 {
+	unsigned long sw_error_code;
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
 	struct mm_struct *mm;
@@ -1236,17 +1237,17 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * nothing more.
 	 *
 	 * This verifies that the fault happens in kernel space
-	 * (error_code & 4) == 0, and that the fault was not a
-	 * protection error (error_code & 9) == 0.
+	 * (hw_error_code & 4) == 0, and that the fault was not a
+	 * protection error (hw_error_code & 9) == 0.
 	 */
 	if (unlikely(fault_in_kernel_space(address))) {
-		if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
+		if (!(hw_error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
 			if (vmalloc_fault(address) >= 0)
 				return;
 		}
 
 		/* Can handle a stale RO->RW TLB: */
-		if (spurious_fault(error_code, address))
+		if (spurious_fault(hw_error_code, address))
 			return;
 
 		/* kprobes don't want to hook the spurious faults: */
@@ -1256,7 +1257,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 		 * Don't take the mm semaphore here. If we fixup a prefetch
 		 * fault we could otherwise deadlock:
 		 */
-		bad_area_nosemaphore(regs, error_code, address, NULL);
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 
 		return;
 	}
@@ -1265,11 +1266,11 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	if (unlikely(kprobes_fault(regs)))
 		return;
 
-	if (unlikely(error_code & X86_PF_RSVD))
-		pgtable_bad(regs, error_code, address);
+	if (unlikely(hw_error_code & X86_PF_RSVD))
+		pgtable_bad(regs, hw_error_code, address);
 
-	if (unlikely(smap_violation(error_code, regs))) {
-		bad_area_nosemaphore(regs, error_code, address, NULL);
+	if (unlikely(smap_violation(hw_error_code, regs))) {
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;
 	}
 
@@ -1278,10 +1279,17 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * in a region with pagefaults disabled then we must not take the fault
 	 */
 	if (unlikely(faulthandler_disabled() || !mm)) {
-		bad_area_nosemaphore(regs, error_code, address, NULL);
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;
 	}
 
+	/*
+	 * hw_error_code is literally the "page fault error code" passed to
+	 * the kernel directly from the hardware.  But, we will shortly be
+	 * modifying it in software, so give it a new name.
+	 */
+	sw_error_code = hw_error_code;
+
 	/*
 	 * It's safe to allow irq's after cr2 has been saved and the
 	 * vmalloc fault has been handled.
@@ -1291,7 +1299,26 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 */
 	if (user_mode(regs)) {
 		local_irq_enable();
-		error_code |= X86_PF_USER;
+		/*
+		 * Up to this point, X86_PF_USER set in hw_error_code
+		 * indicated a user-mode access.  But, after this,
+		 * X86_PF_USER in sw_error_code will indicate either
+		 * that, *or* an implicit kernel(supervisor)-mode access
+		 * which originated from user mode.
+		 */
+		if (!(hw_error_code & X86_PF_USER)) {
+			/*
+			 * The CPU was in user mode, but the CPU says
+			 * the fault was not a user-mode access.
+			 * Must be an implicit kernel-mode access,
+			 * which we do not expect to happen in the
+			 * user address space.
+			 */
+			pr_warn_once("kernel-mode error from user-mode: %lx\n",
+					hw_error_code);
+
+			sw_error_code |= X86_PF_USER;
+		}
 		flags |= FAULT_FLAG_USER;
 	} else {
 		if (regs->flags & X86_EFLAGS_IF)
@@ -1300,9 +1327,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
-	if (error_code & X86_PF_WRITE)
+	if (sw_error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
-	if (error_code & X86_PF_INSTR)
+	if (sw_error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
 	/*
@@ -1322,9 +1349,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * space check, thus avoiding the deadlock:
 	 */
 	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
-		if (!(error_code & X86_PF_USER) &&
+		if (!(sw_error_code & X86_PF_USER) &&
 		    !search_exception_tables(regs->ip)) {
-			bad_area_nosemaphore(regs, error_code, address, NULL);
+			bad_area_nosemaphore(regs, sw_error_code, address, NULL);
 			return;
 		}
 retry:
@@ -1340,16 +1367,16 @@ retry:
 
 	vma = find_vma(mm, address);
 	if (unlikely(!vma)) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, sw_error_code, address);
 		return;
 	}
 	if (likely(vma->vm_start <= address))
 		goto good_area;
 	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, sw_error_code, address);
 		return;
 	}
-	if (error_code & X86_PF_USER) {
+	if (sw_error_code & X86_PF_USER) {
 		/*
 		 * Accessing the stack below %sp is always a bug.
 		 * The large cushion allows instructions like enter
@@ -1357,12 +1384,12 @@ retry:
 		 * 32 pointers and then decrements %sp by 65535.)
 		 */
 		if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
-			bad_area(regs, error_code, address);
+			bad_area(regs, sw_error_code, address);
 			return;
 		}
 	}
 	if (unlikely(expand_stack(vma, address))) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, sw_error_code, address);
 		return;
 	}
 
@@ -1371,8 +1398,8 @@ retry:
 	 * we can handle it..
 	 */
 good_area:
-	if (unlikely(access_error(error_code, vma))) {
-		bad_area_access_error(regs, error_code, address, vma);
+	if (unlikely(access_error(sw_error_code, vma))) {
+		bad_area_access_error(regs, sw_error_code, address, vma);
 		return;
 	}
 
@@ -1414,13 +1441,13 @@ good_area:
 			return;
 
 		/* Not returning to user mode? Handle exceptions or die: */
-		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+		no_context(regs, sw_error_code, address, SIGBUS, BUS_ADRERR);
 		return;
 	}
 
 	up_read(&mm->mmap_sem);
 	if (unlikely(fault & VM_FAULT_ERROR)) {
-		mm_fault_error(regs, error_code, address, &pkey, fault);
+		mm_fault_error(regs, sw_error_code, address, &pkey, fault);
 		return;
 	}
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/mm] x86/mm: Break out kernel address space handling
  2018-09-28 16:02 ` [PATCH 2/8] x86/mm: break out kernel address space handling Dave Hansen
@ 2018-10-09 15:02   ` tip-bot for Dave Hansen
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Dave Hansen @ 2018-10-09 15:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jannh, hpa, sean.j.christopherson, tglx, peterz, dave.hansen,
	linux-kernel, luto, mingo

Commit-ID:  8fed62000039058adfd8b663344e2f448aed1e7a
Gitweb:     https://git.kernel.org/tip/8fed62000039058adfd8b663344e2f448aed1e7a
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Fri, 28 Sep 2018 09:02:22 -0700
Committer:  Peter Zijlstra <peterz@infradead.org>
CommitDate: Tue, 9 Oct 2018 16:51:15 +0200

x86/mm: Break out kernel address space handling

The page fault handler (__do_page_fault())  basically has two sections:
one for handling faults in the kernel portion of the address space
and another for faults in the user portion of the address space.

But, these two parts don't stick out that well.  Let's make that more
clear from code separation and naming.  Pull kernel fault
handling into its own helper, and reflect that naming by renaming
spurious_fault() -> spurious_kernel_fault().

Also, rewrite the vmalloc() handling comment a bit.  It was a bit
stale and also glossed over the reserved bit handling.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160222.401F4E10@viggo.jf.intel.com
---
 arch/x86/mm/fault.c | 101 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 62 insertions(+), 39 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index cd08f4fef836..c7e32f453852 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1032,7 +1032,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	}
 }
 
-static int spurious_fault_check(unsigned long error_code, pte_t *pte)
+static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
 {
 	if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
 		return 0;
@@ -1071,7 +1071,7 @@ static int spurious_fault_check(unsigned long error_code, pte_t *pte)
  * (Optional Invalidation).
  */
 static noinline int
-spurious_fault(unsigned long error_code, unsigned long address)
+spurious_kernel_fault(unsigned long error_code, unsigned long address)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
@@ -1102,27 +1102,27 @@ spurious_fault(unsigned long error_code, unsigned long address)
 		return 0;
 
 	if (p4d_large(*p4d))
-		return spurious_fault_check(error_code, (pte_t *) p4d);
+		return spurious_kernel_fault_check(error_code, (pte_t *) p4d);
 
 	pud = pud_offset(p4d, address);
 	if (!pud_present(*pud))
 		return 0;
 
 	if (pud_large(*pud))
-		return spurious_fault_check(error_code, (pte_t *) pud);
+		return spurious_kernel_fault_check(error_code, (pte_t *) pud);
 
 	pmd = pmd_offset(pud, address);
 	if (!pmd_present(*pmd))
 		return 0;
 
 	if (pmd_large(*pmd))
-		return spurious_fault_check(error_code, (pte_t *) pmd);
+		return spurious_kernel_fault_check(error_code, (pte_t *) pmd);
 
 	pte = pte_offset_kernel(pmd, address);
 	if (!pte_present(*pte))
 		return 0;
 
-	ret = spurious_fault_check(error_code, pte);
+	ret = spurious_kernel_fault_check(error_code, pte);
 	if (!ret)
 		return 0;
 
@@ -1130,12 +1130,12 @@ spurious_fault(unsigned long error_code, unsigned long address)
 	 * Make sure we have permissions in PMD.
 	 * If not, then there's a bug in the page tables:
 	 */
-	ret = spurious_fault_check(error_code, (pte_t *) pmd);
+	ret = spurious_kernel_fault_check(error_code, (pte_t *) pmd);
 	WARN_ONCE(!ret, "PMD has incorrect permission bits\n");
 
 	return ret;
 }
-NOKPROBE_SYMBOL(spurious_fault);
+NOKPROBE_SYMBOL(spurious_kernel_fault);
 
 int show_unhandled_signals = 1;
 
@@ -1202,6 +1202,58 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
 	return true;
 }
 
+/*
+ * Called for all faults where 'address' is part of the kernel address
+ * space.  Might get called for faults that originate from *code* that
+ * ran in userspace or the kernel.
+ */
+static void
+do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
+		   unsigned long address)
+{
+	/*
+	 * We can fault-in kernel-space virtual memory on-demand. The
+	 * 'reference' page table is init_mm.pgd.
+	 *
+	 * NOTE! We MUST NOT take any locks for this case. We may
+	 * be in an interrupt or a critical region, and should
+	 * only copy the information from the master page table,
+	 * nothing more.
+	 *
+	 * Before doing this on-demand faulting, ensure that the
+	 * fault is not any of the following:
+	 * 1. A fault on a PTE with a reserved bit set.
+	 * 2. A fault caused by a user-mode access.  (Do not demand-
+	 *    fault kernel memory due to user-mode accesses).
+	 * 3. A fault caused by a page-level protection violation.
+	 *    (A demand fault would be on a non-present page which
+	 *     would have X86_PF_PROT==0).
+	 */
+	if (!(hw_error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
+		if (vmalloc_fault(address) >= 0)
+			return;
+	}
+
+	/* Was the fault spurious, caused by lazy TLB invalidation? */
+	if (spurious_kernel_fault(hw_error_code, address))
+		return;
+
+	/* kprobes don't want to hook the spurious faults: */
+	if (kprobes_fault(regs))
+		return;
+
+	/*
+	 * Note, despite being a "bad area", there are quite a few
+	 * acceptable reasons to get here, such as erratum fixups
+	 * and handling kernel code that can fault, like get_user().
+	 *
+	 * Don't take the mm semaphore here. If we fixup a prefetch
+	 * fault we could otherwise deadlock:
+	 */
+	bad_area_nosemaphore(regs, hw_error_code, address, NULL);
+}
+NOKPROBE_SYMBOL(do_kern_addr_fault);
+
 /*
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
@@ -1227,38 +1279,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
 	if (unlikely(kmmio_fault(regs, address)))
 		return;
 
-	/*
-	 * We fault-in kernel-space virtual memory on-demand. The
-	 * 'reference' page table is init_mm.pgd.
-	 *
-	 * NOTE! We MUST NOT take any locks for this case. We may
-	 * be in an interrupt or a critical region, and should
-	 * only copy the information from the master page table,
-	 * nothing more.
-	 *
-	 * This verifies that the fault happens in kernel space
-	 * (hw_error_code & 4) == 0, and that the fault was not a
-	 * protection error (hw_error_code & 9) == 0.
-	 */
+	/* Was the fault on kernel-controlled part of the address space? */
 	if (unlikely(fault_in_kernel_space(address))) {
-		if (!(hw_error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
-			if (vmalloc_fault(address) >= 0)
-				return;
-		}
-
-		/* Can handle a stale RO->RW TLB: */
-		if (spurious_fault(hw_error_code, address))
-			return;
-
-		/* kprobes don't want to hook the spurious faults: */
-		if (kprobes_fault(regs))
-			return;
-		/*
-		 * Don't take the mm semaphore here. If we fixup a prefetch
-		 * fault we could otherwise deadlock:
-		 */
-		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
-
+		do_kern_addr_fault(regs, hw_error_code, address);
 		return;
 	}
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/mm] x86/mm: Break out user address space handling
  2018-09-28 16:02 ` [PATCH 3/8] x86/mm: break out user " Dave Hansen
@ 2018-10-09 15:03   ` tip-bot for Dave Hansen
  2018-10-15  5:43     ` Eric W. Biederman
  0 siblings, 1 reply; 20+ messages in thread
From: tip-bot for Dave Hansen @ 2018-10-09 15:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, peterz, sean.j.christopherson, jannh, mingo, hpa,
	linux-kernel, tglx, dave.hansen

Commit-ID:  aa37c51b9421d66f7931c5fdcb9ce80c450974be
Gitweb:     https://git.kernel.org/tip/aa37c51b9421d66f7931c5fdcb9ce80c450974be
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Fri, 28 Sep 2018 09:02:23 -0700
Committer:  Peter Zijlstra <peterz@infradead.org>
CommitDate: Tue, 9 Oct 2018 16:51:15 +0200

x86/mm: Break out user address space handling

The last patch broke out kernel address space handing into its own
helper.  Now, do the same for user address space handling.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160223.9C4F6440@viggo.jf.intel.com
---
 arch/x86/mm/fault.c | 47 ++++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index c7e32f453852..0d1f5d39fc63 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -966,6 +966,7 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
 		__bad_area(regs, error_code, address, vma, SEGV_ACCERR);
 }
 
+/* Handle faults in the kernel portion of the address space */
 static void
 do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 	  u32 *pkey, unsigned int fault)
@@ -1254,14 +1255,11 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
-/*
- * This routine handles page faults.  It determines the address,
- * and the problem, and then passes it off to one of the appropriate
- * routines.
- */
-static noinline void
-__do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
-		unsigned long address)
+/* Handle faults in the user portion of the address space */
+static inline
+void do_user_addr_fault(struct pt_regs *regs,
+			unsigned long hw_error_code,
+			unsigned long address)
 {
 	unsigned long sw_error_code;
 	struct vm_area_struct *vma;
@@ -1274,17 +1272,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
 	tsk = current;
 	mm = tsk->mm;
 
-	prefetchw(&mm->mmap_sem);
-
-	if (unlikely(kmmio_fault(regs, address)))
-		return;
-
-	/* Was the fault on kernel-controlled part of the address space? */
-	if (unlikely(fault_in_kernel_space(address))) {
-		do_kern_addr_fault(regs, hw_error_code, address);
-		return;
-	}
-
 	/* kprobes don't want to hook the spurious faults: */
 	if (unlikely(kprobes_fault(regs)))
 		return;
@@ -1488,6 +1475,28 @@ good_area:
 
 	check_v8086_mode(regs, address, tsk);
 }
+NOKPROBE_SYMBOL(do_user_addr_fault);
+
+/*
+ * This routine handles page faults.  It determines the address,
+ * and the problem, and then passes it off to one of the appropriate
+ * routines.
+ */
+static noinline void
+__do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
+		unsigned long address)
+{
+	prefetchw(&current->mm->mmap_sem);
+
+	if (unlikely(kmmio_fault(regs, address)))
+		return;
+
+	/* Was the fault on kernel-controlled part of the address space? */
+	if (unlikely(fault_in_kernel_space(address)))
+		do_kern_addr_fault(regs, hw_error_code, address);
+	else
+		do_user_addr_fault(regs, hw_error_code, address);
+}
 NOKPROBE_SYMBOL(__do_page_fault);
 
 static nokprobe_inline void

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/mm] x86/mm: Add clarifying comments for user addr space
  2018-09-28 16:02 ` [PATCH 4/8] x86/mm: add clarifying comments for user addr space Dave Hansen
@ 2018-10-09 15:03   ` tip-bot for Dave Hansen
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Dave Hansen @ 2018-10-09 15:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jannh, sean.j.christopherson, hpa, luto, tglx, dave.hansen,
	linux-kernel, mingo, peterz

Commit-ID:  5b0c2cac54d44ea40d5c5aec7cbf14353b2a903e
Gitweb:     https://git.kernel.org/tip/5b0c2cac54d44ea40d5c5aec7cbf14353b2a903e
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Fri, 28 Sep 2018 09:02:25 -0700
Committer:  Peter Zijlstra <peterz@infradead.org>
CommitDate: Tue, 9 Oct 2018 16:51:16 +0200

x86/mm: Add clarifying comments for user addr space

The SMAP and Reserved checking do not have nice comments.  Add
some to clarify and make it match everything else.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160225.FFD44B8D@viggo.jf.intel.com
---
 arch/x86/mm/fault.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 0d1f5d39fc63..1d838701a5f7 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1276,9 +1276,17 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (unlikely(kprobes_fault(regs)))
 		return;
 
+	/*
+	 * Reserved bits are never expected to be set on
+	 * entries in the user portion of the page tables.
+	 */
 	if (unlikely(hw_error_code & X86_PF_RSVD))
 		pgtable_bad(regs, hw_error_code, address);
 
+	/*
+	 * Check for invalid kernel (supervisor) access to user
+	 * pages in the user address space.
+	 */
 	if (unlikely(smap_violation(hw_error_code, regs))) {
 		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/mm] x86/mm: Fix exception table comments
  2018-09-28 16:02 ` [PATCH 5/8] x86/mm: fix exception table comments Dave Hansen
@ 2018-10-09 15:04   ` tip-bot for Dave Hansen
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Dave Hansen @ 2018-10-09 15:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, sean.j.christopherson, dave.hansen, mingo, hpa,
	linux-kernel, jannh, peterz, tglx

Commit-ID:  88259744e253777e898c186f08670c86dd8199bf
Gitweb:     https://git.kernel.org/tip/88259744e253777e898c186f08670c86dd8199bf
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Fri, 28 Sep 2018 09:02:27 -0700
Committer:  Peter Zijlstra <peterz@infradead.org>
CommitDate: Tue, 9 Oct 2018 16:51:16 +0200

x86/mm: Fix exception table comments

The comments here are wrong.  They are too absolute about where
faults can occur when running in the kernel.  The comments are
also a bit hard to match up with the code.

Trim down the comments, and make them more precise.

Also add a comment explaining why we are doing the
bad_area_nosemaphore() path here.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160227.077DDD7A@viggo.jf.intel.com
---
 arch/x86/mm/fault.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 1d838701a5f7..57b074b02ebb 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1351,24 +1351,26 @@ void do_user_addr_fault(struct pt_regs *regs,
 		flags |= FAULT_FLAG_INSTRUCTION;
 
 	/*
-	 * When running in the kernel we expect faults to occur only to
-	 * addresses in user space.  All other faults represent errors in
-	 * the kernel and should generate an OOPS.  Unfortunately, in the
-	 * case of an erroneous fault occurring in a code path which already
-	 * holds mmap_sem we will deadlock attempting to validate the fault
-	 * against the address space.  Luckily the kernel only validly
-	 * references user space from well defined areas of code, which are
-	 * listed in the exceptions table.
+	 * Kernel-mode access to the user address space should only occur
+	 * on well-defined single instructions listed in the exception
+	 * tables.  But, an erroneous kernel fault occurring outside one of
+	 * those areas which also holds mmap_sem might deadlock attempting
+	 * to validate the fault against the address space.
 	 *
-	 * As the vast majority of faults will be valid we will only perform
-	 * the source reference check when there is a possibility of a
-	 * deadlock. Attempt to lock the address space, if we cannot we then
-	 * validate the source. If this is invalid we can skip the address
-	 * space check, thus avoiding the deadlock:
+	 * Only do the expensive exception table search when we might be at
+	 * risk of a deadlock.  This happens if we
+	 * 1. Failed to acquire mmap_sem, and
+	 * 2. The access did not originate in userspace.  Note: either the
+	 *    hardware or earlier page fault code may set X86_PF_USER
+	 *    in sw_error_code.
 	 */
 	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
 		if (!(sw_error_code & X86_PF_USER) &&
 		    !search_exception_tables(regs->ip)) {
+			/*
+			 * Fault from code in kernel from
+			 * which we do not expect faults.
+			 */
 			bad_area_nosemaphore(regs, sw_error_code, address, NULL);
 			return;
 		}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/mm] x86/mm: Add vsyscall address helper
  2018-09-28 16:02 ` [PATCH 6/8] x86/mm: add vsyscall address helper Dave Hansen
@ 2018-10-09 15:04   ` tip-bot for Dave Hansen
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Dave Hansen @ 2018-10-09 15:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, dave.hansen, linux-kernel, luto, mingo, tglx, hpa,
	sean.j.christopherson, jannh

Commit-ID:  02e983b760c0d4183c28d625a3c99016e2cd8a7f
Gitweb:     https://git.kernel.org/tip/02e983b760c0d4183c28d625a3c99016e2cd8a7f
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Fri, 28 Sep 2018 09:02:28 -0700
Committer:  Peter Zijlstra <peterz@infradead.org>
CommitDate: Tue, 9 Oct 2018 16:51:16 +0200

x86/mm: Add vsyscall address helper

We will shortly be using this check in two locations.  Put it in
a helper before we do so.

Let's also insert PAGE_MASK instead of the open-coded ~0xfff.
It is easier to read and also more obviously correct considering
the implicit type conversion that has to happen when it is not
an implicit 'unsigned long'.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160228.C593509B@viggo.jf.intel.com
---
 arch/x86/mm/fault.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 57b074b02ebb..7a627ac3a0d2 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -840,6 +840,15 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 	show_opcodes(regs, loglvl);
 }
 
+/*
+ * The (legacy) vsyscall page is the long page in the kernel portion
+ * of the address space that has user-accessible permissions.
+ */
+static bool is_vsyscall_vaddr(unsigned long vaddr)
+{
+	return (vaddr & PAGE_MASK) == VSYSCALL_ADDR;
+}
+
 static void
 __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		       unsigned long address, u32 *pkey, int si_code)
@@ -869,7 +878,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		 * emulation.
 		 */
 		if (unlikely((error_code & X86_PF_INSTR) &&
-			     ((address & ~0xfff) == VSYSCALL_ADDR))) {
+			     is_vsyscall_vaddr(address))) {
 			if (emulate_vsyscall(regs, address))
 				return;
 		}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/mm] x86/mm/vsyscall: Consider vsyscall page part of user address space
  2018-09-28 16:02 ` [PATCH 7/8] x86/mm/vsyscall: consider vsyscall page part of user address space Dave Hansen
@ 2018-10-09 15:05   ` tip-bot for Dave Hansen
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Dave Hansen @ 2018-10-09 15:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, luto, jannh, sean.j.christopherson, tglx, peterz,
	dave.hansen, hpa, linux-kernel

Commit-ID:  3ae0ad92f53e0f05cf6ab781230b7902b88f73cd
Gitweb:     https://git.kernel.org/tip/3ae0ad92f53e0f05cf6ab781230b7902b88f73cd
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Fri, 28 Sep 2018 09:02:30 -0700
Committer:  Peter Zijlstra <peterz@infradead.org>
CommitDate: Tue, 9 Oct 2018 16:51:16 +0200

x86/mm/vsyscall: Consider vsyscall page part of user address space

The vsyscall page is weird.  It is in what is traditionally part of
the kernel address space.  But, it has user permissions and we handle
faults on it like we would on a user page: interrupts on.

Right now, we handle vsyscall emulation in the "bad_area" code, which
is used for both user-address-space and kernel-address-space faults.
Move the handling to the user-address-space code *only* and ensure we
get there by "excluding" the vsyscall page from the kernel address
space via a check in fault_in_kernel_space().

Since the fault_in_kernel_space() check is used on 32-bit, also add a
64-bit check to make it clear we only use this path on 64-bit.  Also
move the unlikely() to be in is_vsyscall_vaddr() itself.

This helps clean up the kernel fault handling path by removing a case
that can happen in normal[1] operation.  (Yeah, yeah, we can argue
about the vsyscall page being "normal" or not.)  This also makes
sanity checks easier, like the "we never take pkey faults in the
kernel address space" check in the next patch.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160230.6E9336EE@viggo.jf.intel.com
---
 arch/x86/mm/fault.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7a627ac3a0d2..7e0fa7e24168 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -846,7 +846,7 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
  */
 static bool is_vsyscall_vaddr(unsigned long vaddr)
 {
-	return (vaddr & PAGE_MASK) == VSYSCALL_ADDR;
+	return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
 }
 
 static void
@@ -872,18 +872,6 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		if (is_errata100(regs, address))
 			return;
 
-#ifdef CONFIG_X86_64
-		/*
-		 * Instruction fetch faults in the vsyscall page might need
-		 * emulation.
-		 */
-		if (unlikely((error_code & X86_PF_INSTR) &&
-			     is_vsyscall_vaddr(address))) {
-			if (emulate_vsyscall(regs, address))
-				return;
-		}
-#endif
-
 		/*
 		 * To avoid leaking information about the kernel page table
 		 * layout, pretend that user-mode accesses to kernel addresses
@@ -1192,6 +1180,14 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 
 static int fault_in_kernel_space(unsigned long address)
 {
+	/*
+	 * On 64-bit systems, the vsyscall page is at an address above
+	 * TASK_SIZE_MAX, but is not considered part of the kernel
+	 * address space.
+	 */
+	if (IS_ENABLED(CONFIG_X86_64) && is_vsyscall_vaddr(address))
+		return false;
+
 	return address >= TASK_SIZE_MAX;
 }
 
@@ -1359,6 +1355,22 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (sw_error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+#ifdef CONFIG_X86_64
+	/*
+	 * Instruction fetch faults in the vsyscall page might need
+	 * emulation.  The vsyscall page is at a high address
+	 * (>PAGE_OFFSET), but is considered to be part of the user
+	 * address space.
+	 *
+	 * The vsyscall page does not have a "real" VMA, so do this
+	 * emulation before we go searching for VMAs.
+	 */
+	if ((sw_error_code & X86_PF_INSTR) && is_vsyscall_vaddr(address)) {
+		if (emulate_vsyscall(regs, address))
+			return;
+	}
+#endif
+
 	/*
 	 * Kernel-mode access to the user address space should only occur
 	 * on well-defined single instructions listed in the exception

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/mm] x86/mm: Remove spurious fault pkey check
  2018-09-28 16:02 ` [PATCH 8/8] x86/mm: remove spurious fault pkey check Dave Hansen
@ 2018-10-09 15:05   ` tip-bot for Dave Hansen
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Dave Hansen @ 2018-10-09 15:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, mingo, tglx, dave.hansen, jannh, linux-kernel, hpa, luto,
	sean.j.christopherson

Commit-ID:  367e3f1d3fc9bbf1e626da7aea527f40babf8079
Gitweb:     https://git.kernel.org/tip/367e3f1d3fc9bbf1e626da7aea527f40babf8079
Author:     Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Fri, 28 Sep 2018 09:02:31 -0700
Committer:  Peter Zijlstra <peterz@infradead.org>
CommitDate: Tue, 9 Oct 2018 16:51:16 +0200

x86/mm: Remove spurious fault pkey check

Spurious faults only ever occur in the kernel's address space.  They
are also constrained specifically to faults with one of these error codes:

	X86_PF_WRITE | X86_PF_PROT
	X86_PF_INSTR | X86_PF_PROT

So, it's never even possible to reach spurious_kernel_fault_check() with
X86_PF_PK set.

In addition, the kernel's address space never has pages with user-mode
protections.  Protection Keys are only enforced on pages with user-mode
protection.

This gives us lots of reasons to not check for protection keys in our
sprurious kernel fault handling.

But, let's also add some warnings to ensure that these assumptions about
protection keys hold true.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160231.243A0D6A@viggo.jf.intel.com
---
 arch/x86/mm/fault.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7e0fa7e24168..a16652982f98 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1037,12 +1037,6 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
 
 	if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
 		return 0;
-	/*
-	 * Note: We do not do lazy flushing on protection key
-	 * changes, so no spurious fault will ever set X86_PF_PK.
-	 */
-	if ((error_code & X86_PF_PK))
-		return 1;
 
 	return 1;
 }
@@ -1217,6 +1211,13 @@ static void
 do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 		   unsigned long address)
 {
+	/*
+	 * Protection keys exceptions only happen on user pages.  We
+	 * have no user pages in the kernel portion of the address
+	 * space, so do not expect them here.
+	 */
+	WARN_ON_ONCE(hw_error_code & X86_PF_PK);
+
 	/*
 	 * We can fault-in kernel-space virtual memory on-demand. The
 	 * 'reference' page table is init_mm.pgd.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tip:x86/mm] x86/mm: Break out user address space handling
  2018-10-09 15:03   ` [tip:x86/mm] x86/mm: Break " tip-bot for Dave Hansen
@ 2018-10-15  5:43     ` Eric W. Biederman
  2018-10-19  5:58       ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2018-10-15  5:43 UTC (permalink / raw)
  To: tip-bot for Dave Hansen
  Cc: linux-tip-commits, dave.hansen, tglx, linux-kernel, hpa, mingo,
	jannh, sean.j.christopherson, peterz, luto

tip-bot for Dave Hansen <tipbot@zytor.com> writes:

> Commit-ID:  aa37c51b9421d66f7931c5fdcb9ce80c450974be
> Gitweb:     https://git.kernel.org/tip/aa37c51b9421d66f7931c5fdcb9ce80c450974be
> Author:     Dave Hansen <dave.hansen@linux.intel.com>
> AuthorDate: Fri, 28 Sep 2018 09:02:23 -0700
> Committer:  Peter Zijlstra <peterz@infradead.org>
> CommitDate: Tue, 9 Oct 2018 16:51:15 +0200
>
> x86/mm: Break out user address space handling
>
> The last patch broke out kernel address space handing into its own
> helper.  Now, do the same for user address space handling.
>
> Cc: x86@kernel.org
> Cc: Jann Horn <jannh@google.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/20180928160223.9C4F6440@viggo.jf.intel.com
> ---
>  arch/x86/mm/fault.c | 47 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index c7e32f453852..0d1f5d39fc63 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -966,6 +966,7 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
>  		__bad_area(regs, error_code, address, vma, SEGV_ACCERR);
>  }
>  
> +/* Handle faults in the kernel portion of the address space */
                           ^^^^^^
I believe you mean the __user__ portion of the address space.
Given that the call chain is:

do_user_addr_fault
   handle_mm_fault
      do_sigbus      

>  static void
>  do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
>  	  u32 *pkey, unsigned int fault)
> @@ -1254,14 +1255,11 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
>  }
>  NOKPROBE_SYMBOL(do_kern_addr_fault);
>  
> -/*
> - * This routine handles page faults.  It determines the address,
> - * and the problem, and then passes it off to one of the appropriate
> - * routines.
> - */
> -static noinline void
> -__do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
> -		unsigned long address)
> +/* Handle faults in the user portion of the address space */
> +static inline
> +void do_user_addr_fault(struct pt_regs *regs,
> +			unsigned long hw_error_code,
> +			unsigned long address)
>  {
>  	unsigned long sw_error_code;
>  	struct vm_area_struct *vma;
> @@ -1274,17 +1272,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
>  	tsk = current;
>  	mm = tsk->mm;
>  
> -	prefetchw(&mm->mmap_sem);
> -
> -	if (unlikely(kmmio_fault(regs, address)))
> -		return;
> -
> -	/* Was the fault on kernel-controlled part of the address space? */
> -	if (unlikely(fault_in_kernel_space(address))) {
> -		do_kern_addr_fault(regs, hw_error_code, address);
> -		return;
> -	}
> -
>  	/* kprobes don't want to hook the spurious faults: */
>  	if (unlikely(kprobes_fault(regs)))
>  		return;
> @@ -1488,6 +1475,28 @@ good_area:
>  
>  	check_v8086_mode(regs, address, tsk);
>  }
> +NOKPROBE_SYMBOL(do_user_addr_fault);
> +
> +/*
> + * This routine handles page faults.  It determines the address,
> + * and the problem, and then passes it off to one of the appropriate
> + * routines.
> + */
> +static noinline void
> +__do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
> +		unsigned long address)
> +{
> +	prefetchw(&current->mm->mmap_sem);
> +
> +	if (unlikely(kmmio_fault(regs, address)))
> +		return;
> +
> +	/* Was the fault on kernel-controlled part of the address space? */
> +	if (unlikely(fault_in_kernel_space(address)))
> +		do_kern_addr_fault(regs, hw_error_code, address);
> +	else
> +		do_user_addr_fault(regs, hw_error_code, address);
> +}
>  NOKPROBE_SYMBOL(__do_page_fault);
>  
>  static nokprobe_inline void

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tip:x86/mm] x86/mm: Break out user address space handling
  2018-10-15  5:43     ` Eric W. Biederman
@ 2018-10-19  5:58       ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2018-10-19  5:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: tip-bot for Dave Hansen, linux-tip-commits, dave.hansen, tglx,
	linux-kernel, hpa, jannh, sean.j.christopherson, peterz, luto


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> tip-bot for Dave Hansen <tipbot@zytor.com> writes:
> 
> > Commit-ID:  aa37c51b9421d66f7931c5fdcb9ce80c450974be
> > Gitweb:     https://git.kernel.org/tip/aa37c51b9421d66f7931c5fdcb9ce80c450974be
> > Author:     Dave Hansen <dave.hansen@linux.intel.com>
> > AuthorDate: Fri, 28 Sep 2018 09:02:23 -0700
> > Committer:  Peter Zijlstra <peterz@infradead.org>
> > CommitDate: Tue, 9 Oct 2018 16:51:15 +0200
> >
> > x86/mm: Break out user address space handling
> >
> > The last patch broke out kernel address space handing into its own
> > helper.  Now, do the same for user address space handling.
> >
> > Cc: x86@kernel.org
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Link: http://lkml.kernel.org/r/20180928160223.9C4F6440@viggo.jf.intel.com
> > ---
> >  arch/x86/mm/fault.c | 47 ++++++++++++++++++++++++++++-------------------
> >  1 file changed, 28 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index c7e32f453852..0d1f5d39fc63 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -966,6 +966,7 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
> >  		__bad_area(regs, error_code, address, vma, SEGV_ACCERR);
> >  }
> >  
> > +/* Handle faults in the kernel portion of the address space */
>                            ^^^^^^
> I believe you mean the __user__ portion of the address space.
> Given that the call chain is:
> 
> do_user_addr_fault
>    handle_mm_fault
>       do_sigbus      

It's both:

  /* Handle faults in the kernel portion of the address space */
  static void
  do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
            u32 *pkey, unsigned int fault)
  {
        struct task_struct *tsk = current;
        int code = BUS_ADRERR;

        /* Kernel mode? Handle exceptions or die: */
        if (!(error_code & X86_PF_USER)) {
                no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
                return;
        }

        /* User-space => ok to do another page fault: */
        if (is_prefetch(regs, error_code, address))
                return;

        tsk->thread.cr2         = address;
        tsk->thread.error_code  = error_code;
        tsk->thread.trap_nr     = X86_TRAP_PF;


Note the X86_PF_USER check: that's what determines whether the fault was 
for user or system mappings.

I agree that the comment is misleading and should be clarified.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-10-19  5:58 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-28 16:02 [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Dave Hansen
2018-09-28 16:02 ` [PATCH 1/8] x86/mm: clarify hardware vs. software "error_code" Dave Hansen
2018-10-09 15:02   ` [tip:x86/mm] x86/mm: Clarify " tip-bot for Dave Hansen
2018-09-28 16:02 ` [PATCH 2/8] x86/mm: break out kernel address space handling Dave Hansen
2018-10-09 15:02   ` [tip:x86/mm] x86/mm: Break " tip-bot for Dave Hansen
2018-09-28 16:02 ` [PATCH 3/8] x86/mm: break out user " Dave Hansen
2018-10-09 15:03   ` [tip:x86/mm] x86/mm: Break " tip-bot for Dave Hansen
2018-10-15  5:43     ` Eric W. Biederman
2018-10-19  5:58       ` Ingo Molnar
2018-09-28 16:02 ` [PATCH 4/8] x86/mm: add clarifying comments for user addr space Dave Hansen
2018-10-09 15:03   ` [tip:x86/mm] x86/mm: Add " tip-bot for Dave Hansen
2018-09-28 16:02 ` [PATCH 5/8] x86/mm: fix exception table comments Dave Hansen
2018-10-09 15:04   ` [tip:x86/mm] x86/mm: Fix " tip-bot for Dave Hansen
2018-09-28 16:02 ` [PATCH 6/8] x86/mm: add vsyscall address helper Dave Hansen
2018-10-09 15:04   ` [tip:x86/mm] x86/mm: Add " tip-bot for Dave Hansen
2018-09-28 16:02 ` [PATCH 7/8] x86/mm/vsyscall: consider vsyscall page part of user address space Dave Hansen
2018-10-09 15:05   ` [tip:x86/mm] x86/mm/vsyscall: Consider " tip-bot for Dave Hansen
2018-09-28 16:02 ` [PATCH 8/8] x86/mm: remove spurious fault pkey check Dave Hansen
2018-10-09 15:05   ` [tip:x86/mm] x86/mm: Remove " tip-bot for Dave Hansen
2018-10-02  9:54 ` [PATCH 0/8] [v2] x86/mm: page fault handling cleanups Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).