From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758253AbcIYSrj (ORCPT );
	Sun, 25 Sep 2016 14:47:39 -0400
Received: from mail-wm0-f66.google.com ([74.125.82.66]:36586 "EHLO
	mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756416AbcIYSrf (ORCPT );
	Sun, 25 Sep 2016 14:47:35 -0400
Date: Sun, 25 Sep 2016 19:47:31 +0100
From: Lorenzo Stoakes <lstoakes@gmail.com>
To: linux-mm@kvack.org
Cc: mgorman@techsingularity.net, torvalds@linux-foundation.org,
	riel@redhat.com, tbsaunde@tbsaunde.org, robert@ocallahan.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org
Subject: Re: [PATCH] mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
Message-ID: <20160925184731.GA20480@lucifer>
References: <20160911225425.10388-1-lstoakes@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160911225425.10388-1-lstoakes@gmail.com>
User-Agent: Mutt/1.7.0 (2016-08-17)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Just a quick ping on this, let me know if you need anything more from me!

Thanks, Lorenzo

On Sun, Sep 11, 2016 at 11:54:25PM +0100, Lorenzo Stoakes wrote:
> The NUMA balancing logic uses an arch-specific PROT_NONE page table flag,
> defined by pte_protnone() or pmd_protnone(), to mark PTEs or huge page PMDs
> respectively as requiring balancing upon a subsequent page fault.
> User-defined PROT_NONE memory regions which also have this flag set will not
> normally invoke the NUMA balancing code, as do_page_fault() will send a
> segfault to the process before handle_mm_fault() is even called.
>
> However, if access_remote_vm() is invoked to access a PROT_NONE region of
> memory, handle_mm_fault() is called via faultin_page() and
> __get_user_pages() without any access checks being performed, meaning the
> NUMA balancing logic is incorrectly invoked on a non-NUMA memory region.
>
> A simple means of triggering this problem is to access PROT_NONE mmap'd
> memory via /proc/self/mem, which reliably results in the NUMA handling
> functions being invoked when CONFIG_NUMA_BALANCING is set.
>
> This issue was reported in bugzilla (issue 99101), which includes some
> simple repro code.
>
> There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page(),
> added at commit c0e7cad, to avoid accidentally provoking strange behaviour
> by attempting to apply NUMA balancing to pages that are in fact PROT_NONE.
> The BUG_ON()s are consistently triggered by the repro.
>
> This patch moves the PROT_NONE check into mm/memory.c rather than invoking
> BUG_ON(), as faulting in these pages via faultin_page() is a valid reason
> for reaching the NUMA check with the PROT_NONE page table flag set and is
> therefore not always a bug.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
> Reported-by: Trevor Saunders <tbsaunde@tbsaunde.org>
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> ---
>  mm/huge_memory.c |  3 ---
>  mm/memory.c      | 12 +++++++-----
>  2 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d76700d..954be55 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1198,9 +1198,6 @@ int do_huge_pmd_numa_page(struct fault_env *fe, pmd_t pmd)
>  	bool was_writable;
>  	int flags = 0;
>  
> -	/* A PROT_NONE fault should not end up here */
> -	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>  	fe->ptl = pmd_lock(vma->vm_mm, fe->pmd);
>  	if (unlikely(!pmd_same(pmd, *fe->pmd)))
>  		goto out_unlock;
> diff --git a/mm/memory.c b/mm/memory.c
> index 020226b..aebc04f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3351,9 +3351,6 @@ static int do_numa_page(struct fault_env *fe, pte_t pte)
>  	bool was_writable = pte_write(pte);
>  	int flags = 0;
>  
> -	/* A PROT_NONE fault should not end up here */
> -	BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
> -
>  	/*
>  	 * The "pte" at this point cannot be used safely without
>  	 * validation through pte_unmap_same(). It's of NUMA type but
> @@ -3458,6 +3455,11 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t orig_pmd)
>  	return VM_FAULT_FALLBACK;
>  }
>  
> +static inline bool vma_is_accessible(struct vm_area_struct *vma)
> +{
> +	return vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE);
> +}
> +
>  /*
>   * These routines also need to handle stuff like marking pages dirty
>   * and/or accessed for architectures that don't do it in hardware (most
> @@ -3524,7 +3526,7 @@ static int handle_pte_fault(struct fault_env *fe)
>  	if (!pte_present(entry))
>  		return do_swap_page(fe, entry);
>  
> -	if (pte_protnone(entry))
> +	if (pte_protnone(entry) && vma_is_accessible(fe->vma))
>  		return do_numa_page(fe, entry);
>  
>  	fe->ptl = pte_lockptr(fe->vma->vm_mm, fe->pmd);
> @@ -3590,7 +3592,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  
>  	barrier();
>  	if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
> -		if (pmd_protnone(orig_pmd))
> +		if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
>  			return do_huge_pmd_numa_page(&fe, orig_pmd);
>  
>  		if ((fe.flags & FAULT_FLAG_WRITE) &&
> --
> 2.9.3
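
For illustration, here is a minimal userspace sketch of the trigger described
in the commit message above. It is a hypothetical reconstruction of the
mechanism, not the reproducer attached to bugzilla issue 99101; build it with
gcc and run it on a kernel built with CONFIG_NUMA_BALANCING=y.

/*
 * Hypothetical repro sketch: read a PROT_NONE mapping via /proc/self/mem.
 * Build: gcc -o repro repro.c
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];

	/* An anonymous mapping with no access permissions at all. */
	char *p = mmap(NULL, sizeof(buf), PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	/*
	 * A direct load from p would SIGSEGV in do_page_fault() before
	 * handle_mm_fault() runs. Reading through /proc/self/mem instead
	 * takes the access_remote_vm() -> __get_user_pages() ->
	 * faultin_page() path, which performs no access check.
	 */
	int fd = open("/proc/self/mem", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return EXIT_FAILURE;
	}

	/*
	 * Read twice: the first access faults the pages in with the VMA's
	 * PROT_NONE page table protections; a subsequent access then sees
	 * pte_protnone() set and enters the NUMA balancing path.
	 */
	for (int i = 0; i < 2; i++) {
		ssize_t n = pread(fd, buf, sizeof(buf), (off_t)(uintptr_t)p);
		printf("pread %d returned %zd\n", i, n);
	}

	close(fd);
	return EXIT_SUCCESS;
}

On an unpatched kernel this reaches do_numa_page() and trips the BUG_ON();
with the patch applied, vma_is_accessible() fails for the PROT_NONE VMA, the
NUMA path is skipped, and the program completes without incident.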