From: Lorenzo Stoakes <lstoakes@gmail.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Mel Gorman <mgorman@techsingularity.net>,
	torvalds@linux-foundation.org, riel@redhat.com,
	tbsaunde@tbsaunde.org, robert@ocallahan.org,
	Lorenzo Stoakes <lstoakes@gmail.com>
Subject: Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
Date: Sun, 11 Sep 2016 23:59:37 +0100	[thread overview]
Message-ID: <CAA5enKaOyCvSAWfw+KH=hcKT88nSF0nTQSb0RcviUfcCH+tCnw@mail.gmail.com> (raw)
In-Reply-To: <20160911225425.10388-1-lstoakes@gmail.com>

[adding lkml, accidentally excluded!]

On 11 September 2016 at 23:54, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
> The NUMA balancing logic uses an arch-specific PROT_NONE page table flag defined
> by pte_protnone() or pmd_protnone() to mark PTEs or huge page PMDs respectively
> as requiring balancing upon a subsequent page fault. User-defined PROT_NONE
> memory regions which also have this flag set will not normally invoke the NUMA
> balancing code as do_page_fault() will send a segfault to the process before
> handle_mm_fault() is even called.
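
[ For illustration, not part of the quoted patch: in the normal case described
  above, a direct access to a PROT_NONE mapping is refused by do_page_fault()
  with SIGSEGV, so handle_mm_fault() and the NUMA balancing path are never
  reached. A minimal userspace sketch of that behaviour: ]

#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static void handler(int sig)
{
	/* The fault was rejected in do_page_fault(); the NUMA balancing
	 * code never ran. */
	const char msg[] = "SIGSEGV on direct PROT_NONE access, as expected\n";

	(void)sig;
	write(STDOUT_FILENO, msg, sizeof(msg) - 1);
	_exit(0);
}

int main(void)
{
	char *p = mmap(NULL, 4096, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	signal(SIGSEGV, handler);
	*p = 1;	/* direct write to PROT_NONE memory -> segfault */
	return 0;
}
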
>
> However, if access_remote_vm() is invoked to access a PROT_NONE region of memory,
> handle_mm_fault() is called via faultin_page() and __get_user_pages() without
> any access checks being performed, meaning the NUMA balancing logic is
> incorrectly invoked on a non-NUMA memory region.
>
> A simple means of triggering this problem is to access PROT_NONE mmap'd memory
> using /proc/self/mem, which reliably results in the NUMA handling functions being
> invoked when CONFIG_NUMA_BALANCING is set.
>
> This issue was reported in bugzilla (issue 99101), which includes some simple
> repro code.
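
[ For illustration, not the exact bugzilla code: a sketch of a repro along
  these lines, assuming that a write to the PROT_NONE region through
  /proc/self/mem is enough to reach the fault path. On a kernel with
  CONFIG_NUMA_BALANCING this access goes through access_remote_vm() and hits
  the NUMA handling code (and, before this patch, the BUG_ON()s mentioned
  below). ]

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	char byte = 1;
	ssize_t n;
	int fd;
	char *p = mmap(NULL, 4096, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	fd = open("/proc/self/mem", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Remote-style access to our own PROT_NONE mapping: the fault is
	 * taken via handle_mm_fault() with no VMA access check first. */
	n = pwrite(fd, &byte, 1, (off_t)(uintptr_t)p);
	printf("pwrite returned %zd\n", n);

	close(fd);
	return 0;
}
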
>
> There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page(), added in
> commit c0e7cad, to avoid accidentally provoking strange behaviour by attempting to
> apply NUMA balancing to pages that are in fact PROT_NONE. The BUG_ON()s are
> consistently triggered by the repro.
>
> This patch moves the PROT_NONE check into mm/memory.c rather than invoking
> BUG_ON(), as faulting in these pages via faultin_page() is a valid reason for
> reaching the NUMA check with the PROT_NONE page table flag set, and is
> therefore not always a bug.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
> Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> ---
>  mm/huge_memory.c |  3 ---
>  mm/memory.c      | 12 +++++++-----
>  2 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d76700d..954be55 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1198,9 +1198,6 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
>         bool was_writable;
>         int flags = 0;
>
> -       /* A PROT_NONE fault should not end up here */
> -       BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>         fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
>         if (unlikely(!pmd_same(pmd, *fe->pmd)))
>                 goto out_unlock;
> diff --git a/mm/memory.c b/mm/memory.c
> index 020226b..aebc04f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3351,9 +3351,6 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
>         bool was_writable = pte_write(pte);
>         int flags = 0;
>
> -       /* A PROT_NONE fault should not end up here */
> -       BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>         /*
>         * The "pte" at this point cannot be used safely without
>         * validation through pte_unmap_same(). It's of NUMA type but
> @@ -3458,6 +3455,11 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
>         return VM_FAULT_FALLBACK;
>  }
>
> +static inline bool vma_is_accessible(struct vm_area_struct *vma)
> +{
> +       return vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE);
> +}
> +
>  /*
>   * These routines also need to handle stuff like marking pages dirty
>   * and/or accessed for architectures that don't do it in hardware (most
> @@ -3524,7 +3526,7 @@ static int handle_pte_fault(struct fault_env *fe)
>         if (!pte_present(entry))
>                 return do_swap_page(fe, entry);
>
> -       if (pte_protnone(entry))
> +       if (pte_protnone(entry) && vma_is_accessible(fe->vma))
>                 return do_numa_page(fe, entry);
>
>         fe->ptl = pte_lockptr(fe->vma->vm_mm, fe->pmd);
> @@ -3590,7 +3592,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>
>                 barrier();
>                 if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
> -                       if (pmd_protnone(orig_pmd))
> +                       if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
>                                 return do_huge_pmd_numa_page(&fe, orig_pmd);
>
>                         if ((fe.flags & FAULT_FLAG_WRITE) &&
> --
> 2.9.3



-- 
Lorenzo Stoakes
https://ljs.io

