All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-09-11 22:54 Lorenzo Stoakes
  2016-09-11 22:59   ` Lorenzo Stoakes
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-09-11 22:54 UTC (permalink / raw)
  To: linux-mm; +Cc: mgorman, torvalds, riel, tbsaunde, robert, Lorenzo Stoakes

The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined
by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively
as requiring balancing upon a subsequent page fault. User-defined PROT_NONE
memory regions which also have this flag set will not normally invoke the NUMA
balancing code as do_page_fault() will send a segfault to the process before
handle_mm_fault() is even called.

However if access_remote_vm() is invoked to access a PROT_NONE region of memory,
handle_mm_fault() is called via faultin_page() and __get_user_pages() without
any access checks being performed, meaning the NUMA balancing logic is
incorrectly invoked on a non-NUMA memory region.

A simple means of triggering this problem is to access PROT_NONE mmap'd memory
using /proc/self/mem which reliably results in the NUMA handling functions being
invoked when CONFIG_NUMA_BALANCING is set.

This issue was reported in bugzilla (issue 99101) which includes some simple
repro code.

There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page() added at
commit c0e7cad to avoid accidentally provoking strange behaviour by attempting
to apply NUMA balancing to pages that are in fact PROT_NONE. The BUG_ON()'s are
consistently triggered by the repro.

This patch moves the PROT_NONE check into mm/memory.c rather than invoking
BUG_ON() as faulting in these pages via faultin_page() is a valid reason for
reaching the NUMA check with the PROT_NONE page table flag set and is therefore
not always a bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
 mm/huge_memory.c |  3 ---
 mm/memory.c      | 12 +++++++-----
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d76700d..954be55 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1198,9 +1198,6 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
 	bool was_writable;
 	int flags = 0;

-	/* A PROT_NONE fault should not end up here */
-	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
-
 	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
 	if (unlikely(!pmd_same(pmd, *fe->pmd)))
 		goto out_unlock;
diff --git a/mm/memory.c b/mm/memory.c
index 020226b..aebc04f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3351,9 +3351,6 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
 	bool was_writable = pte_write(pte);
 	int flags = 0;

-	/* A PROT_NONE fault should not end up here */
-	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
-
 	/*
 	* The "pte" at this point cannot be used safely without
 	* validation through pte_unmap_same(). It's of NUMA type but
@@ -3458,6 +3455,11 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
 	return VM_FAULT_FALLBACK;
 }

+static inline bool vma_is_accessible(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE);
+}
+
 /*
  * These routines also need to handle stuff like marking pages dirty
  * and/or accessed for architectures that don't do it in hardware (most
@@ -3524,7 +3526,7 @@ static int handle_pte_fault(struct fault_env *fe)
 	if (!pte_present(entry))
 		return do_swap_page(fe, entry);

-	if (pte_protnone(entry))
+	if (pte_protnone(entry) && vma_is_accessible(fe->vma))
 		return do_numa_page(fe, entry);

 	fe->ptl = pte_lockptr(fe->vma->vm_mm, fe->pmd);
@@ -3590,7 +3592,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,

 		barrier();
 		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
-			if (pmd_protnone(orig_pmd))
+			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&fe, orig_pmd);

 			if ((fe.flags & FAULT_FLAG_WRITE) &&
--
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-10-10 16:38 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-11 22:54 [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing Lorenzo Stoakes
2016-09-11 22:59 ` Lorenzo Stoakes
2016-09-11 22:59   ` Lorenzo Stoakes
2016-09-25 18:47 ` Lorenzo Stoakes
2016-09-25 18:47   ` Lorenzo Stoakes
2016-09-25 20:52   ` Linus Torvalds
2016-09-25 20:52     ` Linus Torvalds
2016-09-25 22:24     ` Linus Torvalds
2016-09-25 22:24       ` Linus Torvalds
2016-09-25 22:34     ` Rik van Riel
2016-09-25 22:50       ` Linus Torvalds
2016-09-25 22:50         ` Linus Torvalds
2016-09-25 23:28         ` Hugh Dickins
2016-09-25 23:28           ` Hugh Dickins
2016-09-26  0:49         ` Rik van Riel
2016-09-26  1:05           ` Linus Torvalds
2016-09-26  1:05             ` Linus Torvalds
2016-10-07 10:07         ` Lorenzo Stoakes
2016-10-07 10:07           ` Lorenzo Stoakes
2016-10-07 15:34           ` Linus Torvalds
2016-10-07 15:34             ` Linus Torvalds
2016-10-07 16:22             ` Lorenzo Stoakes
2016-10-07 16:22               ` Lorenzo Stoakes
2016-10-07 18:16               ` Hugh Dickins
2016-10-07 18:16                 ` Hugh Dickins
2016-10-07 18:26                 ` Lorenzo Stoakes
2016-10-07 18:26                   ` Lorenzo Stoakes
2016-10-10  7:47                 ` Jan Kara
2016-10-10  7:47                   ` Jan Kara
2016-10-10  8:28                   ` Lorenzo Stoakes
2016-10-10  8:28                     ` Lorenzo Stoakes
2016-10-10 16:37                     ` Jan Kara
2016-10-10 16:37                       ` Jan Kara
2016-09-26  8:19 ` Mel Gorman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.