All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-09-11 22:54 Lorenzo Stoakes
  2016-09-11 22:59   ` Lorenzo Stoakes
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-09-11 22:54 UTC (permalink / raw)
  To: linux-mm; +Cc: mgorman, torvalds, riel, tbsaunde, robert, Lorenzo Stoakes

The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined
by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively
as requiring balancing upon a subsequent page fault. User-defined PROT_NONE
memory regions which also have this flag set will not normally invoke the NUMA
balancing code as do_page_fault() will send a segfault to the process before
handle_mm_fault() is even called.

However if access_remote_vm() is invoked to access a PROT_NONE region of memory,
handle_mm_fault() is called via faultin_page() and __get_user_pages() without
any access checks being performed, meaning the NUMA balancing logic is
incorrectly invoked on a non-NUMA memory region.

A simple means of triggering this problem is to access PROT_NONE mmap'd memory
using /proc/self/mem which reliably results in the NUMA handling functions being
invoked when CONFIG_NUMA_BALANCING is set.

This issue was reported in bugzilla (issue 99101) which includes some simple
repro code.

There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page() added at
commit c0e7cad to avoid accidentally provoking strange behaviour by attempting
to apply NUMA balancing to pages that are in fact PROT_NONE. The BUG_ON()'s are
consistently triggered by the repro.

This patch moves the PROT_NONE check into mm/memory.c rather than invoking
BUG_ON() as faulting in these pages via faultin_page() is a valid reason for
reaching the NUMA check with the PROT_NONE page table flag set and is therefore
not always a bug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
 mm/huge_memory.c |  3 ---
 mm/memory.c      | 12 +++++++-----
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d76700d..954be55 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1198,9 +1198,6 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
 	bool was_writable;
 	int flags = 0;

-	/* A PROT_NONE fault should not end up here */
-	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
-
 	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
 	if (unlikely(!pmd_same(pmd, *fe->pmd)))
 		goto out_unlock;
diff --git a/mm/memory.c b/mm/memory.c
index 020226b..aebc04f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3351,9 +3351,6 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
 	bool was_writable = pte_write(pte);
 	int flags = 0;

-	/* A PROT_NONE fault should not end up here */
-	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
-
 	/*
 	* The "pte" at this point cannot be used safely without
 	* validation through pte_unmap_same(). It's of NUMA type but
@@ -3458,6 +3455,11 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
 	return VM_FAULT_FALLBACK;
 }

+static inline bool vma_is_accessible(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE);
+}
+
 /*
  * These routines also need to handle stuff like marking pages dirty
  * and/or accessed for architectures that don't do it in hardware (most
@@ -3524,7 +3526,7 @@ static int handle_pte_fault(struct fault_env *fe)
 	if (!pte_present(entry))
 		return do_swap_page(fe, entry);

-	if (pte_protnone(entry))
+	if (pte_protnone(entry) && vma_is_accessible(fe->vma))
 		return do_numa_page(fe, entry);

 	fe->ptl = pte_lockptr(fe->vma->vm_mm, fe->pmd);
@@ -3590,7 +3592,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,

 		barrier();
 		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
-			if (pmd_protnone(orig_pmd))
+			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&fe, orig_pmd);

 			if ((fe.flags & FAULT_FLAG_WRITE) &&
--
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-11 22:54 [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing Lorenzo Stoakes
@ 2016-09-11 22:59   ` Lorenzo Stoakes
  2016-09-25 18:47   ` Lorenzo Stoakes
  2016-09-26  8:19 ` Mel Gorman
  2 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-09-11 22:59 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Mel Gorman, torvalds, riel, tbsaunde, robert, Lorenzo Stoakes

[adding lkml, accidentally excluded!]

On 11 September 2016 at 23:54, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
> The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined
> by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively
> as requiring balancing upon a subsequent page fault. User-defined PROT_NONE
> memory regions which also have this flag set will not normally invoke the NUMA
> balancing code as do_page_fault() will send a segfault to the process before
> handle_mm_fault() is even called.
>
> However if access_remote_vm() is invoked to access a PROT_NONE region of memory,
> handle_mm_fault() is called via faultin_page() and __get_user_pages() without
> any access checks being performed, meaning the NUMA balancing logic is
> incorrectly invoked on a non-NUMA memory region.
>
> A simple means of triggering this problem is to access PROT_NONE mmap'd memory
> using /proc/self/mem which reliably results in the NUMA handling functions being
> invoked when CONFIG_NUMA_BALANCING is set.
>
> This issue was reported in bugzilla (issue 99101) which includes some simple
> repro code.
>
> There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page() added at
> commit c0e7cad to avoid accidentally provoking strange behaviour by attempting
> to apply NUMA balancing to pages that are in fact PROT_NONE. The BUG_ON()'s are
> consistently triggered by the repro.
>
> This patch moves the PROT_NONE check into mm/memory.c rather than invoking
> BUG_ON() as faulting in these pages via faultin_page() is a valid reason for
> reaching the NUMA check with the PROT_NONE page table flag set and is therefore
> not always a bug.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
> Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> ---
>  mm/huge_memory.c |  3 ---
>  mm/memory.c      | 12 +++++++-----
>  2 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d76700d..954be55 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1198,9 +1198,6 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
>         bool was_writable;
>         int flags = 0;
>
> -       /* A PROT_NONE fault should not end up here */
> -       BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>         fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
>         if (unlikely(!pmd_same(pmd, *fe->pmd)))
>                 goto out_unlock;
> diff --git a/mm/memory.c b/mm/memory.c
> index 020226b..aebc04f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3351,9 +3351,6 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
>         bool was_writable = pte_write(pte);
>         int flags = 0;
>
> -       /* A PROT_NONE fault should not end up here */
> -       BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>         /*
>         * The "pte" at this point cannot be used safely without
>         * validation through pte_unmap_same(). It's of NUMA type but
> @@ -3458,6 +3455,11 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
>         return VM_FAULT_FALLBACK;
>  }
>
> +static inline bool vma_is_accessible(struct vm_area_struct *vma)
> +{
> +       return vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE);
> +}
> +
>  /*
>   * These routines also need to handle stuff like marking pages dirty
>   * and/or accessed for architectures that don't do it in hardware (most
> @@ -3524,7 +3526,7 @@ static int handle_pte_fault(struct fault_env *fe)
>         if (!pte_present(entry))
>                 return do_swap_page(fe, entry);
>
> -       if (pte_protnone(entry))
> +       if (pte_protnone(entry) && vma_is_accessible(fe->vma))
>                 return do_numa_page(fe, entry);
>
>         fe->ptl = pte_lockptr(fe->vma->vm_mm, fe->pmd);
> @@ -3590,7 +3592,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>
>                 barrier();
>                 if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
> -                       if (pmd_protnone(orig_pmd))
> +                       if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
>                                 return do_huge_pmd_numa_page(&fe, orig_pmd);
>
>                         if ((fe.flags & FAULT_FLAG_WRITE) &&
> --
> 2.9.3



-- 
Lorenzo Stoakes
https://ljs.io

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-09-11 22:59   ` Lorenzo Stoakes
  0 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-09-11 22:59 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Mel Gorman, torvalds, riel, tbsaunde, robert, Lorenzo Stoakes

[adding lkml, accidentally excluded!]

On 11 September 2016 at 23:54, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
> The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined
> by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively
> as requiring balancing upon a subsequent page fault. User-defined PROT_NONE
> memory regions which also have this flag set will not normally invoke the NUMA
> balancing code as do_page_fault() will send a segfault to the process before
> handle_mm_fault() is even called.
>
> However if access_remote_vm() is invoked to access a PROT_NONE region of memory,
> handle_mm_fault() is called via faultin_page() and __get_user_pages() without
> any access checks being performed, meaning the NUMA balancing logic is
> incorrectly invoked on a non-NUMA memory region.
>
> A simple means of triggering this problem is to access PROT_NONE mmap'd memory
> using /proc/self/mem which reliably results in the NUMA handling functions being
> invoked when CONFIG_NUMA_BALANCING is set.
>
> This issue was reported in bugzilla (issue 99101) which includes some simple
> repro code.
>
> There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page() added at
> commit c0e7cad to avoid accidentally provoking strange behaviour by attempting
> to apply NUMA balancing to pages that are in fact PROT_NONE. The BUG_ON()'s are
> consistently triggered by the repro.
>
> This patch moves the PROT_NONE check into mm/memory.c rather than invoking
> BUG_ON() as faulting in these pages via faultin_page() is a valid reason for
> reaching the NUMA check with the PROT_NONE page table flag set and is therefore
> not always a bug.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
> Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> ---
>  mm/huge_memory.c |  3 ---
>  mm/memory.c      | 12 +++++++-----
>  2 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d76700d..954be55 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1198,9 +1198,6 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
>         bool was_writable;
>         int flags = 0;
>
> -       /* A PROT_NONE fault should not end up here */
> -       BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>         fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
>         if (unlikely(!pmd_same(pmd, *fe->pmd)))
>                 goto out_unlock;
> diff --git a/mm/memory.c b/mm/memory.c
> index 020226b..aebc04f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3351,9 +3351,6 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
>         bool was_writable = pte_write(pte);
>         int flags = 0;
>
> -       /* A PROT_NONE fault should not end up here */
> -       BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>         /*
>         * The "pte" at this point cannot be used safely without
>         * validation through pte_unmap_same(). It's of NUMA type but
> @@ -3458,6 +3455,11 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
>         return VM_FAULT_FALLBACK;
>  }
>
> +static inline bool vma_is_accessible(struct vm_area_struct *vma)
> +{
> +       return vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE);
> +}
> +
>  /*
>   * These routines also need to handle stuff like marking pages dirty
>   * and/or accessed for architectures that don't do it in hardware (most
> @@ -3524,7 +3526,7 @@ static int handle_pte_fault(struct fault_env *fe)
>         if (!pte_present(entry))
>                 return do_swap_page(fe, entry);
>
> -       if (pte_protnone(entry))
> +       if (pte_protnone(entry) && vma_is_accessible(fe->vma))
>                 return do_numa_page(fe, entry);
>
>         fe->ptl = pte_lockptr(fe->vma->vm_mm, fe->pmd);
> @@ -3590,7 +3592,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>
>                 barrier();
>                 if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
> -                       if (pmd_protnone(orig_pmd))
> +                       if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
>                                 return do_huge_pmd_numa_page(&fe, orig_pmd);
>
>                         if ((fe.flags & FAULT_FLAG_WRITE) &&
> --
> 2.9.3



-- 
Lorenzo Stoakes
https://ljs.io

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-11 22:54 [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing Lorenzo Stoakes
@ 2016-09-25 18:47   ` Lorenzo Stoakes
  2016-09-25 18:47   ` Lorenzo Stoakes
  2016-09-26  8:19 ` Mel Gorman
  2 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-09-25 18:47 UTC (permalink / raw)
  To: linux-mm; +Cc: mgorman, torvalds, riel, tbsaunde, robert, linux-kernel, akpm

Just a quick ping on this, let me know if you need anything more from me!

Thanks, Lorenzo

On Sun, Sep 11, 2016 at 11:54:25PM +0100, Lorenzo Stoakes wrote:
> The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined
> by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively
> as requiring balancing upon a subsequent page fault. User-defined PROT_NONE
> memory regions which also have this flag set will not normally invoke the NUMA
> balancing code as do_page_fault() will send a segfault to the process before
> handle_mm_fault() is even called.
>
> However if access_remote_vm() is invoked to access a PROT_NONE region of memory,
> handle_mm_fault() is called via faultin_page() and __get_user_pages() without
> any access checks being performed, meaning the NUMA balancing logic is
> incorrectly invoked on a non-NUMA memory region.
>
> A simple means of triggering this problem is to access PROT_NONE mmap'd memory
> using /proc/self/mem which reliably results in the NUMA handling functions being
> invoked when CONFIG_NUMA_BALANCING is set.
>
> This issue was reported in bugzilla (issue 99101) which includes some simple
> repro code.
>
> There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page() added at
> commit c0e7cad to avoid accidentally provoking strange behaviour by attempting
> to apply NUMA balancing to pages that are in fact PROT_NONE. The BUG_ON()'s are
> consistently triggered by the repro.
>
> This patch moves the PROT_NONE check into mm/memory.c rather than invoking
> BUG_ON() as faulting in these pages via faultin_page() is a valid reason for
> reaching the NUMA check with the PROT_NONE page table flag set and is therefore
> not always a bug.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
> Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> ---
>  mm/huge_memory.c |  3 ---
>  mm/memory.c      | 12 +++++++-----
>  2 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d76700d..954be55 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1198,9 +1198,6 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
>  	bool was_writable;
>  	int flags = 0;
>
> -	/* A PROT_NONE fault should not end up here */
> -	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>  	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
>  	if (unlikely(!pmd_same(pmd, *fe->pmd)))
>  		goto out_unlock;
> diff --git a/mm/memory.c b/mm/memory.c
> index 020226b..aebc04f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3351,9 +3351,6 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
>  	bool was_writable = pte_write(pte);
>  	int flags = 0;
>
> -	/* A PROT_NONE fault should not end up here */
> -	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>  	/*
>  	* The "pte" at this point cannot be used safely without
>  	* validation through pte_unmap_same(). It's of NUMA type but
> @@ -3458,6 +3455,11 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
>  	return VM_FAULT_FALLBACK;
>  }
>
> +static inline bool vma_is_accessible(struct vm_area_struct *vma)
> +{
> +	return vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE);
> +}
> +
>  /*
>   * These routines also need to handle stuff like marking pages dirty
>   * and/or accessed for architectures that don't do it in hardware (most
> @@ -3524,7 +3526,7 @@ static int handle_pte_fault(struct fault_env *fe)
>  	if (!pte_present(entry))
>  		return do_swap_page(fe, entry);
>
> -	if (pte_protnone(entry))
> +	if (pte_protnone(entry) && vma_is_accessible(fe->vma))
>  		return do_numa_page(fe, entry);
>
>  	fe->ptl = pte_lockptr(fe->vma->vm_mm, fe->pmd);
> @@ -3590,7 +3592,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>
>  		barrier();
>  		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
> -			if (pmd_protnone(orig_pmd))
> +			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
>  				return do_huge_pmd_numa_page(&fe, orig_pmd);
>
>  			if ((fe.flags & FAULT_FLAG_WRITE) &&
> --
> 2.9.3

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-09-25 18:47   ` Lorenzo Stoakes
  0 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-09-25 18:47 UTC (permalink / raw)
  To: linux-mm; +Cc: mgorman, torvalds, riel, tbsaunde, robert, linux-kernel, akpm

Just a quick ping on this, let me know if you need anything more from me!

Thanks, Lorenzo

On Sun, Sep 11, 2016 at 11:54:25PM +0100, Lorenzo Stoakes wrote:
> The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined
> by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively
> as requiring balancing upon a subsequent page fault. User-defined PROT_NONE
> memory regions which also have this flag set will not normally invoke the NUMA
> balancing code as do_page_fault() will send a segfault to the process before
> handle_mm_fault() is even called.
>
> However if access_remote_vm() is invoked to access a PROT_NONE region of memory,
> handle_mm_fault() is called via faultin_page() and __get_user_pages() without
> any access checks being performed, meaning the NUMA balancing logic is
> incorrectly invoked on a non-NUMA memory region.
>
> A simple means of triggering this problem is to access PROT_NONE mmap'd memory
> using /proc/self/mem which reliably results in the NUMA handling functions being
> invoked when CONFIG_NUMA_BALANCING is set.
>
> This issue was reported in bugzilla (issue 99101) which includes some simple
> repro code.
>
> There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page() added at
> commit c0e7cad to avoid accidentally provoking strange behaviour by attempting
> to apply NUMA balancing to pages that are in fact PROT_NONE. The BUG_ON()'s are
> consistently triggered by the repro.
>
> This patch moves the PROT_NONE check into mm/memory.c rather than invoking
> BUG_ON() as faulting in these pages via faultin_page() is a valid reason for
> reaching the NUMA check with the PROT_NONE page table flag set and is therefore
> not always a bug.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
> Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> ---
>  mm/huge_memory.c |  3 ---
>  mm/memory.c      | 12 +++++++-----
>  2 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d76700d..954be55 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1198,9 +1198,6 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
>  	bool was_writable;
>  	int flags = 0;
>
> -	/* A PROT_NONE fault should not end up here */
> -	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>  	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
>  	if (unlikely(!pmd_same(pmd, *fe->pmd)))
>  		goto out_unlock;
> diff --git a/mm/memory.c b/mm/memory.c
> index 020226b..aebc04f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3351,9 +3351,6 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
>  	bool was_writable = pte_write(pte);
>  	int flags = 0;
>
> -	/* A PROT_NONE fault should not end up here */
> -	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>  	/*
>  	* The "pte" at this point cannot be used safely without
>  	* validation through pte_unmap_same(). It's of NUMA type but
> @@ -3458,6 +3455,11 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
>  	return VM_FAULT_FALLBACK;
>  }
>
> +static inline bool vma_is_accessible(struct vm_area_struct *vma)
> +{
> +	return vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE);
> +}
> +
>  /*
>   * These routines also need to handle stuff like marking pages dirty
>   * and/or accessed for architectures that don't do it in hardware (most
> @@ -3524,7 +3526,7 @@ static int handle_pte_fault(struct fault_env *fe)
>  	if (!pte_present(entry))
>  		return do_swap_page(fe, entry);
>
> -	if (pte_protnone(entry))
> +	if (pte_protnone(entry) && vma_is_accessible(fe->vma))
>  		return do_numa_page(fe, entry);
>
>  	fe->ptl = pte_lockptr(fe->vma->vm_mm, fe->pmd);
> @@ -3590,7 +3592,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>
>  		barrier();
>  		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
> -			if (pmd_protnone(orig_pmd))
> +			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
>  				return do_huge_pmd_numa_page(&fe, orig_pmd);
>
>  			if ((fe.flags & FAULT_FLAG_WRITE) &&
> --
> 2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-25 18:47   ` Lorenzo Stoakes
@ 2016-09-25 20:52     ` Linus Torvalds
  -1 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2016-09-25 20:52 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, Mel Gorman, Rik van Riel, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

I was kind of assuming this would go through the normal channels for
THP patches, but it's been two weeks...

Can I have an ACK from the involved people, and I'll apply it
directly.. Mel? Rik?

                   Linus

On Sun, Sep 25, 2016 at 11:47 AM, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
> Just a quick ping on this, let me know if you need anything more from me!
>
> Thanks, Lorenzo
>
> On Sun, Sep 11, 2016 at 11:54:25PM +0100, Lorenzo Stoakes wrote:
>> The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined
>> by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively
>> as requiring balancing upon a subsequent page fault. User-defined PROT_NONE
>> memory regions which also have this flag set will not normally invoke the NUMA
>> balancing code as do_page_fault() will send a segfault to the process before
>> handle_mm_fault() is even called.
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-09-25 20:52     ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2016-09-25 20:52 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, Mel Gorman, Rik van Riel, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

I was kind of assuming this would go through the normal channels for
THP patches, but it's been two weeks...

Can I have an ACK from the involved people, and I'll apply it
directly.. Mel? Rik?

                   Linus

On Sun, Sep 25, 2016 at 11:47 AM, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
> Just a quick ping on this, let me know if you need anything more from me!
>
> Thanks, Lorenzo
>
> On Sun, Sep 11, 2016 at 11:54:25PM +0100, Lorenzo Stoakes wrote:
>> The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined
>> by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively
>> as requiring balancing upon a subsequent page fault. User-defined PROT_NONE
>> memory regions which also have this flag set will not normally invoke the NUMA
>> balancing code as do_page_fault() will send a segfault to the process before
>> handle_mm_fault() is even called.
>>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-25 20:52     ` Linus Torvalds
@ 2016-09-25 22:24       ` Linus Torvalds
  -1 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2016-09-25 22:24 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, Mel Gorman, Rik van Riel, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Sun, Sep 25, 2016 at 1:52 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Can I have an ACK from the involved people, and I'll apply it
> directly.. Mel? Rik?

Oh well. The patch looks fine to me and I want to include it in the
rc8 release, so I'll apply it. Worst comes to worst we can revert, but
I can confirm that it's trivial to trigger the BUG_ON() without the
patch, so..

                Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-09-25 22:24       ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2016-09-25 22:24 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, Mel Gorman, Rik van Riel, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Sun, Sep 25, 2016 at 1:52 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Can I have an ACK from the involved people, and I'll apply it
> directly.. Mel? Rik?

Oh well. The patch looks fine to me and I want to include it in the
rc8 release, so I'll apply it. Worst comes to worst we can revert, but
I can confirm that it's trivial to trigger the BUG_ON() without the
patch, so..

                Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-25 20:52     ` Linus Torvalds
  (?)
  (?)
@ 2016-09-25 22:34     ` Rik van Riel
  2016-09-25 22:50         ` Linus Torvalds
  -1 siblings, 1 reply; 34+ messages in thread
From: Rik van Riel @ 2016-09-25 22:34 UTC (permalink / raw)
  To: Linus Torvalds, Lorenzo Stoakes
  Cc: linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 496 bytes --]

On Sun, 2016-09-25 at 13:52 -0700, Linus Torvalds wrote:
> I was kind of assuming this would go through the normal channels for
> THP patches, but it's been two weeks...
> 
> Can I have an ACK from the involved people, and I'll apply it
> directly.. Mel? Rik?

Sorry about that, I was a little distracted with the
NUMA hinting vs mprotect bug that caused programs to
segfault.

The patch looks good to me, too.

Acked-by: Rik van Riel <riel@redhat.com>

--- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-25 22:34     ` Rik van Riel
@ 2016-09-25 22:50         ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2016-09-25 22:50 UTC (permalink / raw)
  To: Rik van Riel, Hugh Dickins
  Cc: Lorenzo Stoakes, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Sun, Sep 25, 2016 at 3:34 PM, Rik van Riel <riel@redhat.com> wrote:
>
> The patch looks good to me, too.
>
> Acked-by: Rik van Riel <riel@redhat.com>

Thanks, amended the commit since I hadn't pushed out yet.

Btw, the only reason this bug could happen is that we do that
"force=1" for remote vm accesses, which turns into FOLL_FORCE, which
in turn will turn into us allowing an access even when we technically
shouldn't.

I'd really like to re-open the "drop FOLL_FORCE entirely" discussion,
because the thing really is disgusting.

I realize that debuggers etc sometimes would want to punch through
PROT_NONE protections, and I also realize that right now we only have
a read/write flag, and we have that whole issue with "what if it's
executable but not readable", which currently FOLL_FORCE makes a
non-issue.

But at the same time, FOLL_FORCE really is a major nasty thing. It
shouldn't be a security issue (we still do check VM_MAY_READ/WRITE etc
to verify that even if something isn't readable or writable we *could*
have had permissions to do this), but this bug is a prime example of
how it violates our deeply held beliefs of how VM permissions *should*
work, and it screwed up the numa case as a result.

So how about we consider getting rid of FOLL_FORCE? Addign Hugh
Dickins to the cc, because I think he argued for that many moons ago..

                  Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-09-25 22:50         ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2016-09-25 22:50 UTC (permalink / raw)
  To: Rik van Riel, Hugh Dickins
  Cc: Lorenzo Stoakes, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Sun, Sep 25, 2016 at 3:34 PM, Rik van Riel <riel@redhat.com> wrote:
>
> The patch looks good to me, too.
>
> Acked-by: Rik van Riel <riel@redhat.com>

Thanks, amended the commit since I hadn't pushed out yet.

Btw, the only reason this bug could happen is that we do that
"force=1" for remote vm accesses, which turns into FOLL_FORCE, which
in turn will turn into us allowing an access even when we technically
shouldn't.

I'd really like to re-open the "drop FOLL_FORCE entirely" discussion,
because the thing really is disgusting.

I realize that debuggers etc sometimes would want to punch through
PROT_NONE protections, and I also realize that right now we only have
a read/write flag, and we have that whole issue with "what if it's
executable but not readable", which currently FOLL_FORCE makes a
non-issue.

But at the same time, FOLL_FORCE really is a major nasty thing. It
shouldn't be a security issue (we still do check VM_MAY_READ/WRITE etc
to verify that even if something isn't readable or writable we *could*
have had permissions to do this), but this bug is a prime example of
how it violates our deeply held beliefs of how VM permissions *should*
work, and it screwed up the numa case as a result.

So how about we consider getting rid of FOLL_FORCE? Addign Hugh
Dickins to the cc, because I think he argued for that many moons ago..

                  Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-25 22:50         ` Linus Torvalds
@ 2016-09-25 23:28           ` Hugh Dickins
  -1 siblings, 0 replies; 34+ messages in thread
From: Hugh Dickins @ 2016-09-25 23:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Konstantin Khlebnikov, Rik van Riel, Hugh Dickins,
	Lorenzo Stoakes, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Sun, 25 Sep 2016, Linus Torvalds wrote:
> On Sun, Sep 25, 2016 at 3:34 PM, Rik van Riel <riel@redhat.com> wrote:
> >
> > The patch looks good to me, too.
> >
> > Acked-by: Rik van Riel <riel@redhat.com>
> 
> Thanks, amended the commit since I hadn't pushed out yet.
> 
> Btw, the only reason this bug could happen is that we do that
> "force=1" for remote vm accesses, which turns into FOLL_FORCE, which
> in turn will turn into us allowing an access even when we technically
> shouldn't.
> 
> I'd really like to re-open the "drop FOLL_FORCE entirely" discussion,
> because the thing really is disgusting.
> 
> I realize that debuggers etc sometimes would want to punch through
> PROT_NONE protections, and I also realize that right now we only have
> a read/write flag, and we have that whole issue with "what if it's
> executable but not readable", which currently FOLL_FORCE makes a
> non-issue.
> 
> But at the same time, FOLL_FORCE really is a major nasty thing. It
> shouldn't be a security issue (we still do check VM_MAY_READ/WRITE etc
> to verify that even if something isn't readable or writable we *could*
> have had permissions to do this), but this bug is a prime example of
> how it violates our deeply held beliefs of how VM permissions *should*
> work, and it screwed up the numa case as a result.
> 
> So how about we consider getting rid of FOLL_FORCE? Addign Hugh
> Dickins to the cc, because I think he argued for that many moons ago..

No.  You do remember half-right, because there was a bizarre aspect
of write,force that Nick and I campaigned to remove, which in the end
cda540ace6a1 ("mm: get_user_pages(write,force) refuse to COW in shared areas")
got rid of - see that commit for details.

I don't have any objections to force now, though I haven't been reading
this thread to see if it would change my mind (and now I must dash out).
But someone else who had concerns about it, I forget whether resolved
or not by cda5, was Konstantin - baton passed.

Hugh

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-09-25 23:28           ` Hugh Dickins
  0 siblings, 0 replies; 34+ messages in thread
From: Hugh Dickins @ 2016-09-25 23:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Konstantin Khlebnikov, Rik van Riel, Hugh Dickins,
	Lorenzo Stoakes, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Sun, 25 Sep 2016, Linus Torvalds wrote:
> On Sun, Sep 25, 2016 at 3:34 PM, Rik van Riel <riel@redhat.com> wrote:
> >
> > The patch looks good to me, too.
> >
> > Acked-by: Rik van Riel <riel@redhat.com>
> 
> Thanks, amended the commit since I hadn't pushed out yet.
> 
> Btw, the only reason this bug could happen is that we do that
> "force=1" for remote vm accesses, which turns into FOLL_FORCE, which
> in turn will turn into us allowing an access even when we technically
> shouldn't.
> 
> I'd really like to re-open the "drop FOLL_FORCE entirely" discussion,
> because the thing really is disgusting.
> 
> I realize that debuggers etc sometimes would want to punch through
> PROT_NONE protections, and I also realize that right now we only have
> a read/write flag, and we have that whole issue with "what if it's
> executable but not readable", which currently FOLL_FORCE makes a
> non-issue.
> 
> But at the same time, FOLL_FORCE really is a major nasty thing. It
> shouldn't be a security issue (we still do check VM_MAY_READ/WRITE etc
> to verify that even if something isn't readable or writable we *could*
> have had permissions to do this), but this bug is a prime example of
> how it violates our deeply held beliefs of how VM permissions *should*
> work, and it screwed up the numa case as a result.
> 
> So how about we consider getting rid of FOLL_FORCE? Addign Hugh
> Dickins to the cc, because I think he argued for that many moons ago..

No.  You do remember half-right, because there was a bizarre aspect
of write,force that Nick and I campaigned to remove, which in the end
cda540ace6a1 ("mm: get_user_pages(write,force) refuse to COW in shared areas")
got rid of - see that commit for details.

I don't have any objections to force now, though I haven't been reading
this thread to see if it would change my mind (and now I must dash out).
But someone else who had concerns about it, I forget whether resolved
or not by cda5, was Konstantin - baton passed.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-25 22:50         ` Linus Torvalds
  (?)
  (?)
@ 2016-09-26  0:49         ` Rik van Riel
  2016-09-26  1:05             ` Linus Torvalds
  -1 siblings, 1 reply; 34+ messages in thread
From: Rik van Riel @ 2016-09-26  0:49 UTC (permalink / raw)
  To: Linus Torvalds, Hugh Dickins
  Cc: Lorenzo Stoakes, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]

On Sun, 2016-09-25 at 15:50 -0700, Linus Torvalds wrote:
> On Sun, Sep 25, 2016 at 3:34 PM, Rik van Riel <riel@redhat.com>
> wrote:
> > 
> > 
> > The patch looks good to me, too.
> > 
> > Acked-by: Rik van Riel <riel@redhat.com>
> 
> Thanks, amended the commit since I hadn't pushed out yet.
> 
> Btw, the only reason this bug could happen is that we do that
> "force=1" for remote vm accesses, which turns into FOLL_FORCE, which
> in turn will turn into us allowing an access even when we technically
> shouldn't.
> 
> I'd really like to re-open the "drop FOLL_FORCE entirely" discussion,
> because the thing really is disgusting.
> 
> I realize that debuggers etc sometimes would want to punch through
> PROT_NONE protections,

Reading the code for a little bit, it looks like get_user_pages
interprets both PROT_NONE and PAGE_NUMA ptes as present, and will
simply return the page to the caller.

Furthermore, if a page in a PROT_NONE VMA is actually not present,
it should be faulted in with PROT_NONE permissions, after which
the page is passed to the debugger.

That is, punching through PROT_NONE permissions should only happen
from outside of the process. Inside the process, PROT_NONE should
be preserved regardless of FOLL_FORCE.

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-26  0:49         ` Rik van Riel
@ 2016-09-26  1:05             ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2016-09-26  1:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Hugh Dickins, Lorenzo Stoakes, linux-mm, Mel Gorman, tbsaunde,
	robert, Linux Kernel Mailing List, Andrew Morton

On Sun, Sep 25, 2016 at 5:49 PM, Rik van Riel <riel@redhat.com> wrote:
>
> Reading the code for a little bit, it looks like get_user_pages
> interprets both PROT_NONE and PAGE_NUMA ptes as present, and will
> simply return the page to the caller.

So the thing is, I don't think the code should even get that far.

It should just fail in check_vma_flags() (possibly after doing the
fast-lookup of the page tables, but that would fail with PROT_NONE).

But thanks to FOLL_FORCE, it doesn't. So things that actually use the
page array and prot_none can get access to the underlying data.

                 Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-09-26  1:05             ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2016-09-26  1:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Hugh Dickins, Lorenzo Stoakes, linux-mm, Mel Gorman, tbsaunde,
	robert, Linux Kernel Mailing List, Andrew Morton

On Sun, Sep 25, 2016 at 5:49 PM, Rik van Riel <riel@redhat.com> wrote:
>
> Reading the code for a little bit, it looks like get_user_pages
> interprets both PROT_NONE and PAGE_NUMA ptes as present, and will
> simply return the page to the caller.

So the thing is, I don't think the code should even get that far.

It should just fail in check_vma_flags() (possibly after doing the
fast-lookup of the page tables, but that would fail with PROT_NONE).

But thanks to FOLL_FORCE, it doesn't. So things that actually use the
page array and prot_none can get access to the underlying data.

                 Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-11 22:54 [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing Lorenzo Stoakes
  2016-09-11 22:59   ` Lorenzo Stoakes
  2016-09-25 18:47   ` Lorenzo Stoakes
@ 2016-09-26  8:19 ` Mel Gorman
  2 siblings, 0 replies; 34+ messages in thread
From: Mel Gorman @ 2016-09-26  8:19 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: linux-mm, torvalds, riel, tbsaunde, robert

On Sun, Sep 11, 2016 at 11:54:25PM +0100, Lorenzo Stoakes wrote:
> The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined
> by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively
> as requiring balancing upon a subsequent page fault. User-defined PROT_NONE
> memory regions which also have this flag set will not normally invoke the NUMA
> balancing code as do_page_fault() will send a segfault to the process before
> handle_mm_fault() is even called.
> 
> However if access_remote_vm() is invoked to access a PROT_NONE region of memory,
> handle_mm_fault() is called via faultin_page() and __get_user_pages() without
> any access checks being performed, meaning the NUMA balancing logic is
> incorrectly invoked on a non-NUMA memory region.
> 
> A simple means of triggering this problem is to access PROT_NONE mmap'd memory
> using /proc/self/mem which reliably results in the NUMA handling functions being
> invoked when CONFIG_NUMA_BALANCING is set.
> 
> This issue was reported in bugzilla (issue 99101) which includes some simple
> repro code.
> 
> There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page() added at
> commit c0e7cad to avoid accidentally provoking strange behaviour by attempting
> to apply NUMA balancing to pages that are in fact PROT_NONE. The BUG_ON()'s are
> consistently triggered by the repro.
> 
> This patch moves the PROT_NONE check into mm/memory.c rather than invoking
> BUG_ON() as faulting in these pages via faultin_page() is a valid reason for
> reaching the NUMA check with the PROT_NONE page table flag set and is therefore
> not always a bug.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
> Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-09-25 22:50         ` Linus Torvalds
@ 2016-10-07 10:07           ` Lorenzo Stoakes
  -1 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-10-07 10:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rik van Riel, Hugh Dickins, linux-mm, Mel Gorman, tbsaunde,
	robert, Linux Kernel Mailing List, Andrew Morton

On Sun, Sep 25, 2016 at 03:50:21PM -0700, Linus Torvalds wrote:
> I'd really like to re-open the "drop FOLL_FORCE entirely" discussion,
> because the thing really is disgusting.
>
> I realize that debuggers etc sometimes would want to punch through
> PROT_NONE protections, and I also realize that right now we only have
> a read/write flag, and we have that whole issue with "what if it's
> executable but not readable", which currently FOLL_FORCE makes a
> non-issue.

So I've experimented with this a little locally, removing FOLL_FORCE altogether
and tracking places where it is used (it seems to be a fair few places
actually.)

I've rather naively replaced the FOLL_FORCE check in check_vma_flags() with a
check against 'tsk && tsk->ptrace && tsk->parent == current', I'm not sure how
valid or sane this is, however, but a quick check against gdb proves that it is
able to do its thing in this configuration. Is this a viable path, or is this
way off the mark here?

The places I've found that have invoked gup functions which eventually result in
FOLL_FORCE being set are:

Calls __get_user_pages():
	mm/gup.c: populate_vma_page_range()
	mm/gup.c: get_dump_page()

calls get_user_pages_unlocked():
	drivers/media/pci/ivtv/ivtv-yuv.c: ivtv_yuv_prep_user_dma()
	drivers/media/pci/ivtv/ivtv-udma.c: ivtv_udma_setup()

calls get_user_pages_remote():
	mm/memory.c: __access_remote_vm() [ see below for callers ]
	fs/exec.c: get_arg_page()
	kernel/events/uprobes.c: uprobe_write_opcode()
	kernel/events/uprobes.c: is_trap_at_addr()
	security/tomoyo/domain.c: tomoyo_dump_page()

calls __access_remote_vm():
	mm/memory.c: access_remote_vm() [ see below for callers ]
	mm/memory.c: access_process_vm()

access_process_vm() is exclusively used for ptrace, omitting its callers here.

calls access_remote_vm():
	fs/proc/base.c: proc_pid_cmdline_read()
	fs/proc/base.c: memrw()
	fs/proc/base.c: environ_read()

calls get_user_pages():
	drivers/infiniband/core/umem.c: ib_umem_get()
	drivers/infiniband/hw/qib/qib_user_pages.c: __qib_get_user_pages()
	drivers/infiniband/hw/usnic/usnic_uiom.c: usnic_uiom_get_pages()
	drivers/media/v4l2-core/videobuf-dma-sg.c: videobuf_dma_init_user_locked()

calls get_vaddr_frames():
	drivers/media/v4l2-core/videobuf2-memops.c: vb2_create_framevec()
	drivers/gpu/drm/exynos/exynos_drm_g2d.c: g2d_userptr_get_dma_addr()

So it seems the general areas where it is used are tracing, uprobes and DMA
initialisation what what I can tell. I'm thinking some extra provision/careful
checking will be needed in each of these cases to see if an alternative is
possible.

I'm happy to explore this some more if that is useful in any way, though of
course I defer to your expertise as to how a world without FOLL_FORCE might
look!

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-10-07 10:07           ` Lorenzo Stoakes
  0 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-10-07 10:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rik van Riel, Hugh Dickins, linux-mm, Mel Gorman, tbsaunde,
	robert, Linux Kernel Mailing List, Andrew Morton

On Sun, Sep 25, 2016 at 03:50:21PM -0700, Linus Torvalds wrote:
> I'd really like to re-open the "drop FOLL_FORCE entirely" discussion,
> because the thing really is disgusting.
>
> I realize that debuggers etc sometimes would want to punch through
> PROT_NONE protections, and I also realize that right now we only have
> a read/write flag, and we have that whole issue with "what if it's
> executable but not readable", which currently FOLL_FORCE makes a
> non-issue.

So I've experimented with this a little locally, removing FOLL_FORCE altogether
and tracking places where it is used (it seems to be a fair few places
actually.)

I've rather naively replaced the FOLL_FORCE check in check_vma_flags() with a
check against 'tsk && tsk->ptrace && tsk->parent == current', I'm not sure how
valid or sane this is, however, but a quick check against gdb proves that it is
able to do its thing in this configuration. Is this a viable path, or is this
way off the mark here?

The places I've found that have invoked gup functions which eventually result in
FOLL_FORCE being set are:

Calls __get_user_pages():
	mm/gup.c: populate_vma_page_range()
	mm/gup.c: get_dump_page()

calls get_user_pages_unlocked():
	drivers/media/pci/ivtv/ivtv-yuv.c: ivtv_yuv_prep_user_dma()
	drivers/media/pci/ivtv/ivtv-udma.c: ivtv_udma_setup()

calls get_user_pages_remote():
	mm/memory.c: __access_remote_vm() [ see below for callers ]
	fs/exec.c: get_arg_page()
	kernel/events/uprobes.c: uprobe_write_opcode()
	kernel/events/uprobes.c: is_trap_at_addr()
	security/tomoyo/domain.c: tomoyo_dump_page()

calls __access_remote_vm():
	mm/memory.c: access_remote_vm() [ see below for callers ]
	mm/memory.c: access_process_vm()

access_process_vm() is exclusively used for ptrace, omitting its callers here.

calls access_remote_vm():
	fs/proc/base.c: proc_pid_cmdline_read()
	fs/proc/base.c: memrw()
	fs/proc/base.c: environ_read()

calls get_user_pages():
	drivers/infiniband/core/umem.c: ib_umem_get()
	drivers/infiniband/hw/qib/qib_user_pages.c: __qib_get_user_pages()
	drivers/infiniband/hw/usnic/usnic_uiom.c: usnic_uiom_get_pages()
	drivers/media/v4l2-core/videobuf-dma-sg.c: videobuf_dma_init_user_locked()

calls get_vaddr_frames():
	drivers/media/v4l2-core/videobuf2-memops.c: vb2_create_framevec()
	drivers/gpu/drm/exynos/exynos_drm_g2d.c: g2d_userptr_get_dma_addr()

So it seems the general areas where it is used are tracing, uprobes and DMA
initialisation what what I can tell. I'm thinking some extra provision/careful
checking will be needed in each of these cases to see if an alternative is
possible.

I'm happy to explore this some more if that is useful in any way, though of
course I defer to your expertise as to how a world without FOLL_FORCE might
look!

Cheers, Lorenzo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-10-07 10:07           ` Lorenzo Stoakes
@ 2016-10-07 15:34             ` Linus Torvalds
  -1 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2016-10-07 15:34 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Rik van Riel, Hugh Dickins, linux-mm, Mel Gorman, tbsaunde,
	robert, Linux Kernel Mailing List, Andrew Morton

On Fri, Oct 7, 2016 at 3:07 AM, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
>
> So I've experimented with this a little locally, removing FOLL_FORCE altogether
> and tracking places where it is used (it seems to be a fair few places
> actually.)

I'm actually a bit worried that it is used too much simply because
it's so easy to use.

In paticular, there's a class of functions that (for legacy reasons)
get the "int force" and "int write" arguments, and they both tend to
just use a very non-descript 0/1 in the callers. Then at some point
you have code like

    if (write)
        flags |= FOLL_WRITE;
    if (force)
        flags |= FOLL_FORCE;

(I'm paraphrasing from memory), and then subsequent calls use that FOLL_FORCE.

And the problem with that it's that it's *really* hard to see where
people actually use FOLL_FORCE, and it's also really easy to use it
without thinking (because it doesn't say "FOLL_FORCE", it just
randomly says "1" in some random argument list).

So the *first* step I'd do is to actually get rid of all the "write"
and "force" arguments, and just pass in "flags" instead, and use
FOLL_FORCE and FOLL_WRITE explicitly in the caller. That way it
suddenly becomes *much* easier to grep for FOLL_FORCE and see who it
is that actually really wants it.

(In fact, I think some functions get *both* "flags" and the separate
write/force argument).

Would you be willing to look at doing that kind of purely syntactic,
non-semantic cleanup first?

> I've rather naively replaced the FOLL_FORCE check in check_vma_flags() with a
> check against 'tsk && tsk->ptrace && tsk->parent == current', I'm not sure how
> valid or sane this is, however, but a quick check against gdb proves that it is
> able to do its thing in this configuration. Is this a viable path, or is this
> way off the mark here?

I think that if we end up having the FOLL_FORCE semantics, we're
actually better off having an explicit FOLL_FORCE flag, and *not* do
some kind of implicit "under these magical circumstances we'll force
it anyway". The implicit thing is what we used to do long long ago, we
definitely don't want to.

So I'd rather see all the callers that actually use FOLL_FORCE, and
then we can start looking at whether it's really a requirement. Some
of them may not strictly *need* it in actual use, they just have it
because of historical reasons (ie exactly those implicit users that
just got mindlessly converted to maintain same semantics).

                     Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-10-07 15:34             ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2016-10-07 15:34 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Rik van Riel, Hugh Dickins, linux-mm, Mel Gorman, tbsaunde,
	robert, Linux Kernel Mailing List, Andrew Morton

On Fri, Oct 7, 2016 at 3:07 AM, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
>
> So I've experimented with this a little locally, removing FOLL_FORCE altogether
> and tracking places where it is used (it seems to be a fair few places
> actually.)

I'm actually a bit worried that it is used too much simply because
it's so easy to use.

In paticular, there's a class of functions that (for legacy reasons)
get the "int force" and "int write" arguments, and they both tend to
just use a very non-descript 0/1 in the callers. Then at some point
you have code like

    if (write)
        flags |= FOLL_WRITE;
    if (force)
        flags |= FOLL_FORCE;

(I'm paraphrasing from memory), and then subsequent calls use that FOLL_FORCE.

And the problem with that it's that it's *really* hard to see where
people actually use FOLL_FORCE, and it's also really easy to use it
without thinking (because it doesn't say "FOLL_FORCE", it just
randomly says "1" in some random argument list).

So the *first* step I'd do is to actually get rid of all the "write"
and "force" arguments, and just pass in "flags" instead, and use
FOLL_FORCE and FOLL_WRITE explicitly in the caller. That way it
suddenly becomes *much* easier to grep for FOLL_FORCE and see who it
is that actually really wants it.

(In fact, I think some functions get *both* "flags" and the separate
write/force argument).

Would you be willing to look at doing that kind of purely syntactic,
non-semantic cleanup first?

> I've rather naively replaced the FOLL_FORCE check in check_vma_flags() with a
> check against 'tsk && tsk->ptrace && tsk->parent == current', I'm not sure how
> valid or sane this is, however, but a quick check against gdb proves that it is
> able to do its thing in this configuration. Is this a viable path, or is this
> way off the mark here?

I think that if we end up having the FOLL_FORCE semantics, we're
actually better off having an explicit FOLL_FORCE flag, and *not* do
some kind of implicit "under these magical circumstances we'll force
it anyway". The implicit thing is what we used to do long long ago, we
definitely don't want to.

So I'd rather see all the callers that actually use FOLL_FORCE, and
then we can start looking at whether it's really a requirement. Some
of them may not strictly *need* it in actual use, they just have it
because of historical reasons (ie exactly those implicit users that
just got mindlessly converted to maintain same semantics).

                     Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-10-07 15:34             ` Linus Torvalds
@ 2016-10-07 16:22               ` Lorenzo Stoakes
  -1 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-10-07 16:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rik van Riel, Hugh Dickins, linux-mm, Mel Gorman, tbsaunde,
	robert, Linux Kernel Mailing List, Andrew Morton

On Fri, Oct 07, 2016 at 08:34:15AM -0700, Linus Torvalds wrote:
> Would you be willing to look at doing that kind of purely syntactic,
> non-semantic cleanup first?

Sure, more than happy to do that! I'll work on a patch for this.

> I think that if we end up having the FOLL_FORCE semantics, we're
> actually better off having an explicit FOLL_FORCE flag, and *not* do
> some kind of implicit "under these magical circumstances we'll force
> it anyway". The implicit thing is what we used to do long long ago, we
> definitely don't want to.

That's a good point, it would definitely be considerably more 'magical', and
expanding the conditions to include uprobes etc. would only add to that.

I wondered about an alternative parameter/flag but it felt like it was
more-or-less FOLL_FORCE in a different form, at which point it may as well
remain FOLL_FORCE :)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-10-07 16:22               ` Lorenzo Stoakes
  0 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-10-07 16:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rik van Riel, Hugh Dickins, linux-mm, Mel Gorman, tbsaunde,
	robert, Linux Kernel Mailing List, Andrew Morton

On Fri, Oct 07, 2016 at 08:34:15AM -0700, Linus Torvalds wrote:
> Would you be willing to look at doing that kind of purely syntactic,
> non-semantic cleanup first?

Sure, more than happy to do that! I'll work on a patch for this.

> I think that if we end up having the FOLL_FORCE semantics, we're
> actually better off having an explicit FOLL_FORCE flag, and *not* do
> some kind of implicit "under these magical circumstances we'll force
> it anyway". The implicit thing is what we used to do long long ago, we
> definitely don't want to.

That's a good point, it would definitely be considerably more 'magical', and
expanding the conditions to include uprobes etc. would only add to that.

I wondered about an alternative parameter/flag but it felt like it was
more-or-less FOLL_FORCE in a different form, at which point it may as well
remain FOLL_FORCE :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-10-07 16:22               ` Lorenzo Stoakes
@ 2016-10-07 18:16                 ` Hugh Dickins
  -1 siblings, 0 replies; 34+ messages in thread
From: Hugh Dickins @ 2016-10-07 18:16 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Linus Torvalds, Jan Kara, Dave Hansen, Rik van Riel,
	Hugh Dickins, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Fri, 7 Oct 2016, Lorenzo Stoakes wrote:
> On Fri, Oct 07, 2016 at 08:34:15AM -0700, Linus Torvalds wrote:
> > Would you be willing to look at doing that kind of purely syntactic,
> > non-semantic cleanup first?
> 
> Sure, more than happy to do that! I'll work on a patch for this.
> 
> > I think that if we end up having the FOLL_FORCE semantics, we're
> > actually better off having an explicit FOLL_FORCE flag, and *not* do
> > some kind of implicit "under these magical circumstances we'll force
> > it anyway". The implicit thing is what we used to do long long ago, we
> > definitely don't want to.
> 
> That's a good point, it would definitely be considerably more 'magical', and
> expanding the conditions to include uprobes etc. would only add to that.
> 
> I wondered about an alternative parameter/flag but it felt like it was
> more-or-less FOLL_FORCE in a different form, at which point it may as well
> remain FOLL_FORCE :)

Adding Jan Kara (and Dave Hansen) to the Cc list: I think they were
pursuing get_user_pages() cleanups last year (which would remove the
force option from most callers anyway), and I've lost track of where
that all got to.  Lorenzo, please don't expend a lot of effort before
checking with Jan.

Hugh

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-10-07 18:16                 ` Hugh Dickins
  0 siblings, 0 replies; 34+ messages in thread
From: Hugh Dickins @ 2016-10-07 18:16 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Linus Torvalds, Jan Kara, Dave Hansen, Rik van Riel,
	Hugh Dickins, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Fri, 7 Oct 2016, Lorenzo Stoakes wrote:
> On Fri, Oct 07, 2016 at 08:34:15AM -0700, Linus Torvalds wrote:
> > Would you be willing to look at doing that kind of purely syntactic,
> > non-semantic cleanup first?
> 
> Sure, more than happy to do that! I'll work on a patch for this.
> 
> > I think that if we end up having the FOLL_FORCE semantics, we're
> > actually better off having an explicit FOLL_FORCE flag, and *not* do
> > some kind of implicit "under these magical circumstances we'll force
> > it anyway". The implicit thing is what we used to do long long ago, we
> > definitely don't want to.
> 
> That's a good point, it would definitely be considerably more 'magical', and
> expanding the conditions to include uprobes etc. would only add to that.
> 
> I wondered about an alternative parameter/flag but it felt like it was
> more-or-less FOLL_FORCE in a different form, at which point it may as well
> remain FOLL_FORCE :)

Adding Jan Kara (and Dave Hansen) to the Cc list: I think they were
pursuing get_user_pages() cleanups last year (which would remove the
force option from most callers anyway), and I've lost track of where
that all got to.  Lorenzo, please don't expend a lot of effort before
checking with Jan.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-10-07 18:16                 ` Hugh Dickins
@ 2016-10-07 18:26                   ` Lorenzo Stoakes
  -1 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-10-07 18:26 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Jan Kara, Dave Hansen, Rik van Riel, linux-mm,
	Mel Gorman, tbsaunde, robert, Linux Kernel Mailing List,
	Andrew Morton

On Fri, Oct 07, 2016 at 11:16:26AM -0700, Hugh Dickins wrote:
>
> Adding Jan Kara (and Dave Hansen) to the Cc list: I think they were
> pursuing get_user_pages() cleanups last year (which would remove the
> force option from most callers anyway), and I've lost track of where
> that all got to.  Lorenzo, please don't expend a lot of effort before
> checking with Jan.

Sure, no problem. I have the callers noted down + the surrounding code in my
wetware 'L3 cache' so it's not a huge effort to get a draft patch written, but
I'll hold off on sending anything until I hear from Jan.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-10-07 18:26                   ` Lorenzo Stoakes
  0 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-10-07 18:26 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Jan Kara, Dave Hansen, Rik van Riel, linux-mm,
	Mel Gorman, tbsaunde, robert, Linux Kernel Mailing List,
	Andrew Morton

On Fri, Oct 07, 2016 at 11:16:26AM -0700, Hugh Dickins wrote:
>
> Adding Jan Kara (and Dave Hansen) to the Cc list: I think they were
> pursuing get_user_pages() cleanups last year (which would remove the
> force option from most callers anyway), and I've lost track of where
> that all got to.  Lorenzo, please don't expend a lot of effort before
> checking with Jan.

Sure, no problem. I have the callers noted down + the surrounding code in my
wetware 'L3 cache' so it's not a huge effort to get a draft patch written, but
I'll hold off on sending anything until I hear from Jan.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-10-07 18:16                 ` Hugh Dickins
@ 2016-10-10  7:47                   ` Jan Kara
  -1 siblings, 0 replies; 34+ messages in thread
From: Jan Kara @ 2016-10-10  7:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Lorenzo Stoakes, Linus Torvalds, Jan Kara, Dave Hansen,
	Rik van Riel, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Fri 07-10-16 11:16:26, Hugh Dickins wrote:
> On Fri, 7 Oct 2016, Lorenzo Stoakes wrote:
> > On Fri, Oct 07, 2016 at 08:34:15AM -0700, Linus Torvalds wrote:
> > > Would you be willing to look at doing that kind of purely syntactic,
> > > non-semantic cleanup first?
> > 
> > Sure, more than happy to do that! I'll work on a patch for this.
> > 
> > > I think that if we end up having the FOLL_FORCE semantics, we're
> > > actually better off having an explicit FOLL_FORCE flag, and *not* do
> > > some kind of implicit "under these magical circumstances we'll force
> > > it anyway". The implicit thing is what we used to do long long ago, we
> > > definitely don't want to.
> > 
> > That's a good point, it would definitely be considerably more 'magical', and
> > expanding the conditions to include uprobes etc. would only add to that.
> > 
> > I wondered about an alternative parameter/flag but it felt like it was
> > more-or-less FOLL_FORCE in a different form, at which point it may as well
> > remain FOLL_FORCE :)
> 
> Adding Jan Kara (and Dave Hansen) to the Cc list: I think they were
> pursuing get_user_pages() cleanups last year (which would remove the
> force option from most callers anyway), and I've lost track of where
> that all got to.  Lorenzo, please don't expend a lot of effort before
> checking with Jan.

Yeah, so my cleanups where mostly concerned about mmap_sem locking and
reducing number of places which cared about those. Regarding flags for
get_user_pages() / get_vaddr_frames(), I agree that using flags argument
as Linus suggests will make it easier to see what the callers actually
want. So I'm for that.

Regarding the FOLL_FORCE I've had a discussion about its use in Infiniband
drivers in 2013 with Roland Dreier. He referenced me to discussion
https://lkml.org/lkml/2012/1/26/7 and summarized that they use FOLL_FORCE
to trigger possible COW early. That avoids the situation when the process
triggers COW of the pages later after DMA buffers are set up which results
in the DMA result to not be visible where the process expects it... I'll
defer to others whether that is a sane or intended use of FOLL_FORCE flag
but I suppose that is the reason why most drivers use it when setting up
DMA buffers in memory passed from userspace.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-10-10  7:47                   ` Jan Kara
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Kara @ 2016-10-10  7:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Lorenzo Stoakes, Linus Torvalds, Jan Kara, Dave Hansen,
	Rik van Riel, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Fri 07-10-16 11:16:26, Hugh Dickins wrote:
> On Fri, 7 Oct 2016, Lorenzo Stoakes wrote:
> > On Fri, Oct 07, 2016 at 08:34:15AM -0700, Linus Torvalds wrote:
> > > Would you be willing to look at doing that kind of purely syntactic,
> > > non-semantic cleanup first?
> > 
> > Sure, more than happy to do that! I'll work on a patch for this.
> > 
> > > I think that if we end up having the FOLL_FORCE semantics, we're
> > > actually better off having an explicit FOLL_FORCE flag, and *not* do
> > > some kind of implicit "under these magical circumstances we'll force
> > > it anyway". The implicit thing is what we used to do long long ago, we
> > > definitely don't want to.
> > 
> > That's a good point, it would definitely be considerably more 'magical', and
> > expanding the conditions to include uprobes etc. would only add to that.
> > 
> > I wondered about an alternative parameter/flag but it felt like it was
> > more-or-less FOLL_FORCE in a different form, at which point it may as well
> > remain FOLL_FORCE :)
> 
> Adding Jan Kara (and Dave Hansen) to the Cc list: I think they were
> pursuing get_user_pages() cleanups last year (which would remove the
> force option from most callers anyway), and I've lost track of where
> that all got to.  Lorenzo, please don't expend a lot of effort before
> checking with Jan.

Yeah, so my cleanups where mostly concerned about mmap_sem locking and
reducing number of places which cared about those. Regarding flags for
get_user_pages() / get_vaddr_frames(), I agree that using flags argument
as Linus suggests will make it easier to see what the callers actually
want. So I'm for that.

Regarding the FOLL_FORCE I've had a discussion about its use in Infiniband
drivers in 2013 with Roland Dreier. He referenced me to discussion
https://lkml.org/lkml/2012/1/26/7 and summarized that they use FOLL_FORCE
to trigger possible COW early. That avoids the situation when the process
triggers COW of the pages later after DMA buffers are set up which results
in the DMA result to not be visible where the process expects it... I'll
defer to others whether that is a sane or intended use of FOLL_FORCE flag
but I suppose that is the reason why most drivers use it when setting up
DMA buffers in memory passed from userspace.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-10-10  7:47                   ` Jan Kara
@ 2016-10-10  8:28                     ` Lorenzo Stoakes
  -1 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-10-10  8:28 UTC (permalink / raw)
  To: Jan Kara
  Cc: Hugh Dickins, Linus Torvalds, Dave Hansen, Rik van Riel,
	linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Mon, Oct 10, 2016 at 09:47:12AM +0200, Jan Kara wrote:
> Yeah, so my cleanups where mostly concerned about mmap_sem locking and
> reducing number of places which cared about those. Regarding flags for
> get_user_pages() / get_vaddr_frames(), I agree that using flags argument
> as Linus suggests will make it easier to see what the callers actually
> want. So I'm for that.

Great, thanks Jan! I have a draft patch that needs a little tweaking/further
testing but isn't too far off.

One thing I am wondering about is whether functions that have write/force
parameters replaced with gup_flags should mask against (FOLL_WRITE | FOLL_FORCE)
to prevent callers from doing unexpected things with other FOLL_* flags?

I'm inclined _not_ to because it adds a rather non-obvious restriction on this
parameter, reduces clarity about which flags are actually being used (which is
the point of the patch in the first place), and the caller ought to know what
they are doing.

I'd be curious to hear people's thoughts on this.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-10-10  8:28                     ` Lorenzo Stoakes
  0 siblings, 0 replies; 34+ messages in thread
From: Lorenzo Stoakes @ 2016-10-10  8:28 UTC (permalink / raw)
  To: Jan Kara
  Cc: Hugh Dickins, Linus Torvalds, Dave Hansen, Rik van Riel,
	linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Mon, Oct 10, 2016 at 09:47:12AM +0200, Jan Kara wrote:
> Yeah, so my cleanups where mostly concerned about mmap_sem locking and
> reducing number of places which cared about those. Regarding flags for
> get_user_pages() / get_vaddr_frames(), I agree that using flags argument
> as Linus suggests will make it easier to see what the callers actually
> want. So I'm for that.

Great, thanks Jan! I have a draft patch that needs a little tweaking/further
testing but isn't too far off.

One thing I am wondering about is whether functions that have write/force
parameters replaced with gup_flags should mask against (FOLL_WRITE | FOLL_FORCE)
to prevent callers from doing unexpected things with other FOLL_* flags?

I'm inclined _not_ to because it adds a rather non-obvious restriction on this
parameter, reduces clarity about which flags are actually being used (which is
the point of the patch in the first place), and the caller ought to know what
they are doing.

I'd be curious to hear people's thoughts on this.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  2016-10-10  8:28                     ` Lorenzo Stoakes
@ 2016-10-10 16:37                       ` Jan Kara
  -1 siblings, 0 replies; 34+ messages in thread
From: Jan Kara @ 2016-10-10 16:37 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Jan Kara, Hugh Dickins, Linus Torvalds, Dave Hansen,
	Rik van Riel, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Mon 10-10-16 09:28:28, Lorenzo Stoakes wrote:
> On Mon, Oct 10, 2016 at 09:47:12AM +0200, Jan Kara wrote:
> > Yeah, so my cleanups where mostly concerned about mmap_sem locking and
> > reducing number of places which cared about those. Regarding flags for
> > get_user_pages() / get_vaddr_frames(), I agree that using flags argument
> > as Linus suggests will make it easier to see what the callers actually
> > want. So I'm for that.
> 
> Great, thanks Jan! I have a draft patch that needs a little tweaking/further
> testing but isn't too far off.
> 
> One thing I am wondering about is whether functions that have write/force
> parameters replaced with gup_flags should mask against (FOLL_WRITE | FOLL_FORCE)
> to prevent callers from doing unexpected things with other FOLL_* flags?
> 
> I'm inclined _not_ to because it adds a rather non-obvious restriction on this
> parameter, reduces clarity about which flags are actually being used (which is
> the point of the patch in the first place), and the caller ought to know what
> they are doing.

Yeah, just leave flags as is. There is no strong reason to restrict them.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
@ 2016-10-10 16:37                       ` Jan Kara
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Kara @ 2016-10-10 16:37 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Jan Kara, Hugh Dickins, Linus Torvalds, Dave Hansen,
	Rik van Riel, linux-mm, Mel Gorman, tbsaunde, robert,
	Linux Kernel Mailing List, Andrew Morton

On Mon 10-10-16 09:28:28, Lorenzo Stoakes wrote:
> On Mon, Oct 10, 2016 at 09:47:12AM +0200, Jan Kara wrote:
> > Yeah, so my cleanups where mostly concerned about mmap_sem locking and
> > reducing number of places which cared about those. Regarding flags for
> > get_user_pages() / get_vaddr_frames(), I agree that using flags argument
> > as Linus suggests will make it easier to see what the callers actually
> > want. So I'm for that.
> 
> Great, thanks Jan! I have a draft patch that needs a little tweaking/further
> testing but isn't too far off.
> 
> One thing I am wondering about is whether functions that have write/force
> parameters replaced with gup_flags should mask against (FOLL_WRITE | FOLL_FORCE)
> to prevent callers from doing unexpected things with other FOLL_* flags?
> 
> I'm inclined _not_ to because it adds a rather non-obvious restriction on this
> parameter, reduces clarity about which flags are actually being used (which is
> the point of the patch in the first place), and the caller ought to know what
> they are doing.

Yeah, just leave flags as is. There is no strong reason to restrict them.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-10-10 16:38 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-11 22:54 [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing Lorenzo Stoakes
2016-09-11 22:59 ` Lorenzo Stoakes
2016-09-11 22:59   ` Lorenzo Stoakes
2016-09-25 18:47 ` Lorenzo Stoakes
2016-09-25 18:47   ` Lorenzo Stoakes
2016-09-25 20:52   ` Linus Torvalds
2016-09-25 20:52     ` Linus Torvalds
2016-09-25 22:24     ` Linus Torvalds
2016-09-25 22:24       ` Linus Torvalds
2016-09-25 22:34     ` Rik van Riel
2016-09-25 22:50       ` Linus Torvalds
2016-09-25 22:50         ` Linus Torvalds
2016-09-25 23:28         ` Hugh Dickins
2016-09-25 23:28           ` Hugh Dickins
2016-09-26  0:49         ` Rik van Riel
2016-09-26  1:05           ` Linus Torvalds
2016-09-26  1:05             ` Linus Torvalds
2016-10-07 10:07         ` Lorenzo Stoakes
2016-10-07 10:07           ` Lorenzo Stoakes
2016-10-07 15:34           ` Linus Torvalds
2016-10-07 15:34             ` Linus Torvalds
2016-10-07 16:22             ` Lorenzo Stoakes
2016-10-07 16:22               ` Lorenzo Stoakes
2016-10-07 18:16               ` Hugh Dickins
2016-10-07 18:16                 ` Hugh Dickins
2016-10-07 18:26                 ` Lorenzo Stoakes
2016-10-07 18:26                   ` Lorenzo Stoakes
2016-10-10  7:47                 ` Jan Kara
2016-10-10  7:47                   ` Jan Kara
2016-10-10  8:28                   ` Lorenzo Stoakes
2016-10-10  8:28                     ` Lorenzo Stoakes
2016-10-10 16:37                     ` Jan Kara
2016-10-10 16:37                       ` Jan Kara
2016-09-26  8:19 ` Mel Gorman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.