linux-mm.kvack.org archive mirror
* [RFC PATCH] mm: Fix a huge pud insertion race during faulting
@ 2019-10-08  9:37 Thomas Hellström (VMware)
  2019-10-15 10:06 ` Kirill A. Shutemov
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Hellström (VMware) @ 2019-10-08  9:37 UTC
  To: linux-mm, linux-kernel; +Cc: kirill, Thomas Hellstrom, Matthew Wilcox

From: Thomas Hellstrom <thellstrom@vmware.com>

A huge pud page can theoretically be faulted in racing with pmd_alloc()
in __handle_mm_fault(). That will lead to pmd_alloc() returning an
invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
similar to pmd_trans_unstable() and check whether the pud is really stable
before using the pmd pointer.

Race:
Thread 1:             Thread 2:                 Comment
create_huge_pud()                               Fallback - not taken.
		      create_huge_pud()         Taken.
pmd_alloc()                                     Returns an invalid pointer.

Cc: Matthew Wilcox <willy@infradead.org>
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
RFC: We include pud_devmap() as an unstable PUD flag. Is this correct?
     Do the same for pmds?
---
 include/asm-generic/pgtable.h | 25 +++++++++++++++++++++++++
 mm/memory.c                   |  6 ++++++
 2 files changed, 31 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 818691846c90..70c2058230ba 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -912,6 +912,31 @@ static inline int pud_trans_huge(pud_t pud)
 }
 #endif
 
+/* See pmd_none_or_trans_huge_or_clear_bad for discussion. */
+static inline int pud_none_or_trans_huge_or_dev_or_clear_bad(pud_t *pud)
+{
+	pud_t pudval = READ_ONCE(*pud);
+
+	if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval))
+		return 1;
+	if (unlikely(pud_bad(pudval))) {
+		pud_clear_bad(pud);
+		return 1;
+	}
+	return 0;
+}
+
+/* See pmd_trans_unstable for discussion. */
+static inline int pud_trans_unstable(pud_t *pud)
+{
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) &&			\
+	defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+	return pud_none_or_trans_huge_or_dev_or_clear_bad(pud);
+#else
+	return 0;
+#endif
+}
+
 #ifndef pmd_read_atomic
 static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 {
diff --git a/mm/memory.c b/mm/memory.c
index b1ca51a079f2..43ff372f4f07 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3914,6 +3914,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	vmf.pud = pud_alloc(mm, p4d, address);
 	if (!vmf.pud)
 		return VM_FAULT_OOM;
+retry_pud:
 	if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
 		ret = create_huge_pud(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
@@ -3940,6 +3941,11 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	vmf.pmd = pmd_alloc(mm, vmf.pud, address);
 	if (!vmf.pmd)
 		return VM_FAULT_OOM;
+
+	/* Huge pud page fault raced with pmd_alloc? */
+	if (pud_trans_unstable(vmf.pud))
+		goto retry_pud;
+
 	if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
 		ret = create_huge_pmd(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
-- 
2.20.1
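
For readers following the race table in the commit message, here is a
minimal userspace sketch of the retry logic. It is not kernel code. All
names (model_pud, model_fault, racer_install_huge, the ENT_* kinds) are
hypothetical and exist only for this illustration. The racing thread is
simulated by a callback invoked at the point where thread 2 would win,
and the sketch assumes that a pud found to be huge after the retry is
handled at the pud level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry kinds for this model; not a real page-table layout. */
enum ent_kind { ENT_NONE, ENT_TABLE, ENT_HUGE, ENT_DEVMAP };

struct model_pud { enum ent_kind kind; };

enum fault_result { FAULT_AT_PUD, FAULT_WALK_PMD };

/* Stand-in for "thread 2": runs where the other thread would win the race. */
typedef void (*racer_fn)(struct model_pud *pud);

static void racer_install_huge(struct model_pud *pud)
{
	pud->kind = ENT_HUGE;
}

/* Models pud_trans_unstable(): take one snapshot, then classify it. */
static bool model_pud_unstable(const struct model_pud *pud)
{
	enum ent_kind kind = pud->kind;

	return kind == ENT_NONE || kind == ENT_HUGE || kind == ENT_DEVMAP;
}

/* Models pmd_alloc(): populate an empty pud, leave anything else alone. */
static void model_pmd_alloc(struct model_pud *pud)
{
	if (pud->kind == ENT_NONE)
		pud->kind = ENT_TABLE;
}

static enum fault_result model_fault(struct model_pud *pud, bool can_map_huge,
				     racer_fn racer, int *retries)
{
	assert(pud != NULL && retries != NULL);

retry_pud:
	if (pud->kind == ENT_NONE) {
		if (can_map_huge) {		/* create_huge_pud() taken */
			pud->kind = ENT_HUGE;
			return FAULT_AT_PUD;
		}
		/* Fallback: continue towards the pmd level. */
	} else if (pud->kind == ENT_HUGE || pud->kind == ENT_DEVMAP) {
		return FAULT_AT_PUD;		/* already mapped huge */
	}

	if (racer) {				/* thread 2 runs here, once */
		racer(pud);
		racer = NULL;
	}

	model_pmd_alloc(pud);

	/* The recheck the patch adds: is the pud really a table now? */
	if (model_pud_unstable(pud)) {
		(*retries)++;
		goto retry_pud;
	}
	return FAULT_WALK_PMD;
}
```

Calling model_fault() with can_map_huge set to false and
racer_install_huge as the racer takes one trip through retry_pud and
finishes at the pud level. With a NULL racer it proceeds to the pmd
level without retrying.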




* Re: [RFC PATCH] mm: Fix a huge pud insertion race during faulting
  2019-10-08  9:37 [RFC PATCH] mm: Fix a huge pud insertion race during faulting Thomas Hellström (VMware)
@ 2019-10-15 10:06 ` Kirill A. Shutemov
  2019-10-16  1:44   ` Dan Williams
  0 siblings, 1 reply; 5+ messages in thread
From: Kirill A. Shutemov @ 2019-10-15 10:06 UTC
  To: Thomas Hellström (VMware), Dan Williams, Matthew Wilcox
  Cc: linux-mm, linux-kernel, Thomas Hellstrom

On Tue, Oct 08, 2019 at 11:37:11AM +0200, Thomas Hellström (VMware) wrote:
> From: Thomas Hellstrom <thellstrom@vmware.com>
> 
> A huge pud page can theoretically be faulted in racing with pmd_alloc()
> in __handle_mm_fault(). That will lead to pmd_alloc() returning an
> invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
> similar to pmd_trans_unstable() and check whether the pud is really stable
> before using the pmd pointer.
> 
> Race:
> Thread 1:             Thread 2:                 Comment
> create_huge_pud()                               Fallback - not taken.
> 		      create_huge_pud()         Taken.
> pmd_alloc()                                     Returns an invalid pointer.
> 
> Cc: Matthew Wilcox <willy@infradead.org>
> Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
> ---
> RFC: We include pud_devmap() as an unstable PUD flag. Is this correct?
>      Do the same for pmds?

I *think* it is correct and we should do the same for PMD, but I may be
wrong.

Dan, Matthew, could you comment on this?


-- 
 Kirill A. Shutemov



* Re: [RFC PATCH] mm: Fix a huge pud insertion race during faulting
  2019-10-15 10:06 ` Kirill A. Shutemov
@ 2019-10-16  1:44   ` Dan Williams
  2019-10-16  5:59     ` Thomas Hellström (VMware)
  0 siblings, 1 reply; 5+ messages in thread
From: Dan Williams @ 2019-10-16  1:44 UTC
  To: Kirill A. Shutemov
  Cc: Thomas Hellström (VMware),
	Matthew Wilcox, linux-mm, Linux Kernel Mailing List,
	Thomas Hellstrom

On Tue, Oct 15, 2019 at 3:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Tue, Oct 08, 2019 at 11:37:11AM +0200, Thomas Hellström (VMware) wrote:
> > From: Thomas Hellstrom <thellstrom@vmware.com>
> >
> > A huge pud page can theoretically be faulted in racing with pmd_alloc()
> > in __handle_mm_fault(). That will lead to pmd_alloc() returning an
> > invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
> > similar to pmd_trans_unstable() and check whether the pud is really stable
> > before using the pmd pointer.
> >
> > Race:
> > Thread 1:             Thread 2:                 Comment
> > create_huge_pud()                               Fallback - not taken.
> >                     create_huge_pud()         Taken.
> > pmd_alloc()                                     Returns an invalid pointer.
> >
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
> > Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
> > ---
> > RFC: We include pud_devmap() as an unstable PUD flag. Is this correct?
> >      Do the same for pmds?
>
> I *think* it is correct and we should do the same for PMD, but I may be
> wrong.
>
> Dan, Matthew, could you comment on this?

The _devmap() check in these paths near _trans_unstable() has always
been about avoiding the assumption that the corresponding page is
page cache or anonymous. For dax it is neither, and it does not behave
like a typical page.



* Re: [RFC PATCH] mm: Fix a huge pud insertion race during faulting
  2019-10-16  1:44   ` Dan Williams
@ 2019-10-16  5:59     ` Thomas Hellström (VMware)
  2019-10-16 20:02       ` Dan Williams
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Hellström (VMware) @ 2019-10-16  5:59 UTC
  To: Dan Williams, Kirill A. Shutemov
  Cc: Matthew Wilcox, linux-mm, Linux Kernel Mailing List, Thomas Hellstrom

Hi, Dan,

On 10/16/19 3:44 AM, Dan Williams wrote:
> On Tue, Oct 15, 2019 at 3:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>> On Tue, Oct 08, 2019 at 11:37:11AM +0200, Thomas Hellström (VMware) wrote:
>>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>>
>>> A huge pud page can theoretically be faulted in racing with pmd_alloc()
>>> in __handle_mm_fault(). That will lead to pmd_alloc() returning an
>>> invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
>>> similar to pmd_trans_unstable() and check whether the pud is really stable
>>> before using the pmd pointer.
>>>
>>> Race:
>>> Thread 1:             Thread 2:                 Comment
>>> create_huge_pud()                               Fallback - not taken.
>>>                      create_huge_pud()         Taken.
>>> pmd_alloc()                                     Returns an invalid pointer.
>>>
>>> Cc: Matthew Wilcox <willy@infradead.org>
>>> Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
>>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>>> ---
>>> RFC: We include pud_devmap() as an unstable PUD flag. Is this correct?
>>>       Do the same for pmds?
>> I *think* it is correct and we should do the same for PMD, but I may be
>> wrong.
>>
>> Dan, Matthew, could you comment on this?
> The _devmap() check in these paths near _trans_unstable() has always
> been about avoiding the assumption that the corresponding page is
> page cache or anonymous. For dax it is neither, and it does not behave
> like a typical page.

The concern here is that _trans_huge() returns false for _devmap()
pages, which means that _trans_unstable() also returns false.

Still, I figure someone could zap the entry at any time using madvise(), 
so AFAICT the entry is indeed unstable, and it's a bug not to include 
_devmap() in the _trans_unstable() functions?

Thanks,

Thomas
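
The concern in this message can be illustrated with a small userspace
model. It is only a sketch: the M_* flag bits and the m_* helpers are
hypothetical names, and real entry layouts are architecture specific.
It assumes, as described above, a trans-huge predicate that excludes
device-mapped entries, and compares an "unstable" check built on that
predicate alone with one that also tests the devmap flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this model only. */
#define M_PRESENT (UINT64_C(1) << 0)
#define M_HUGE    (UINT64_C(1) << 1)
#define M_DEVMAP  (UINT64_C(1) << 2)

typedef uint64_t model_ent;

static bool m_none(model_ent e)
{
	return e == 0;
}

/* Huge and not device-mapped, so a devmap entry is NOT "trans huge". */
static bool m_trans_huge(model_ent e)
{
	return (e & (M_HUGE | M_DEVMAP)) == M_HUGE;
}

static bool m_devmap(model_ent e)
{
	return (e & M_DEVMAP) != 0;
}

/* Check that relies on the trans-huge predicate alone. */
static bool m_unstable_without_devmap(model_ent e)
{
	return m_none(e) || m_trans_huge(e);
}

/* Check that additionally treats devmap entries as unstable. */
static bool m_unstable_with_devmap(model_ent e)
{
	return m_none(e) || m_trans_huge(e) || m_devmap(e);
}
```

For an entry with M_PRESENT | M_HUGE | M_DEVMAP set, the first check
reports the entry as stable while the second reports it as unstable.
Plain table entries and ordinary huge entries are classified the same
way by both.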





* Re: [RFC PATCH] mm: Fix a huge pud insertion race during faulting
  2019-10-16  5:59     ` Thomas Hellström (VMware)
@ 2019-10-16 20:02       ` Dan Williams
  0 siblings, 0 replies; 5+ messages in thread
From: Dan Williams @ 2019-10-16 20:02 UTC
  To: Thomas Hellström (VMware)
  Cc: Kirill A. Shutemov, Matthew Wilcox, linux-mm,
	Linux Kernel Mailing List, Thomas Hellstrom

On Tue, Oct 15, 2019 at 10:59 PM Thomas Hellström (VMware)
<thomas_os@shipmail.org> wrote:
>
> Hi, Dan,
>
> On 10/16/19 3:44 AM, Dan Williams wrote:
> > On Tue, Oct 15, 2019 at 3:06 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >> On Tue, Oct 08, 2019 at 11:37:11AM +0200, Thomas Hellström (VMware) wrote:
> >>> From: Thomas Hellstrom <thellstrom@vmware.com>
> >>>
> >>> A huge pud page can theoretically be faulted in racing with pmd_alloc()
> >>> in __handle_mm_fault(). That will lead to pmd_alloc() returning an
> >>> invalid pmd pointer. Fix this by adding a pud_trans_unstable() function
> >>> similar to pmd_trans_unstable() and check whether the pud is really stable
> >>> before using the pmd pointer.
> >>>
> >>> Race:
> >>> Thread 1:             Thread 2:                 Comment
> >>> create_huge_pud()                               Fallback - not taken.
> >>>                      create_huge_pud()         Taken.
> >>> pmd_alloc()                                     Returns an invalid pointer.
> >>>
> >>> Cc: Matthew Wilcox <willy@infradead.org>
> >>> Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
> >>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
> >>> ---
> >>> RFC: We include pud_devmap() as an unstable PUD flag. Is this correct?
> >>>       Do the same for pmds?
> >> I *think* it is correct and we should do the same for PMD, but I may be
> >> wrong.
> >>
> >> Dan, Matthew, could you comment on this?
> > The _devmap() check in these paths near _trans_unstable() has always
> > been about avoiding the assumption that the corresponding page is
> > page cache or anonymous. For dax it is neither, and it does not behave
> > like a typical page.
>
> The concern here is that _trans_huge() returns false for _devmap()
> pages, which means that _trans_unstable() also returns false.
>
> Still, I figure someone could zap the entry at any time using madvise(),
> so AFAICT the entry is indeed unstable, and it's a bug not to include
> _devmap() in the _trans_unstable() functions?

Yes, I can't think of a case where it is wrong to include _devmap() in
_trans_unstable(). It may be unnecessary if the given path can't
reasonably ever encounter a file-backed dax mapping, but it's
otherwise ok to always consider _devmap().


