linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access
@ 2018-06-26 13:24 Janosch Frank
  2018-06-26 13:39 ` David Hildenbrand
  2018-06-26 17:00 ` Mike Kravetz
  0 siblings, 2 replies; 5+ messages in thread
From: Janosch Frank @ 2018-06-26 13:24 UTC (permalink / raw)
  To: mike.kravetz, aarcange; +Cc: linux-kernel, viro

Use huge_ptep_get to translate huge ptes to normal ptes so we can
check them with the huge_pte_* functions. Otherwise some architectures
will check the wrong values and will not wait for userspace to bring
in the memory.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
---
 fs/userfaultfd.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 123bf7d516fc..594d192b2331 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -222,24 +222,26 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 					 unsigned long reason)
 {
 	struct mm_struct *mm = ctx->mm;
-	pte_t *pte;
+	pte_t *ptep, pte;
 	bool ret = true;
 
 	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
-	pte = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
-	if (!pte)
+	ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
+
+	if (!ptep)
 		goto out;
 
 	ret = false;
+	pte = huge_ptep_get(ptep);
 
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
 	 * changes under us.
 	 */
-	if (huge_pte_none(*pte))
+	if (huge_pte_none(pte))
 		ret = true;
-	if (!huge_pte_write(*pte) && (reason & VM_UFFD_WP))
+	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
 		ret = true;
 out:
 	return ret;
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access
  2018-06-26 13:24 [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access Janosch Frank
@ 2018-06-26 13:39 ` David Hildenbrand
  2018-06-26 17:00 ` Mike Kravetz
  1 sibling, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2018-06-26 13:39 UTC (permalink / raw)
  To: Janosch Frank, mike.kravetz, aarcange; +Cc: linux-kernel, viro

On 26.06.2018 15:24, Janosch Frank wrote:
> Use huge_ptep_get to translate huge ptes to normal ptes so we can
> check them with the huge_pte_* functions. Otherwise some architectures
> will check the wrong values and will not wait for userspace to bring
> in the memory.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
> ---
>  fs/userfaultfd.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 123bf7d516fc..594d192b2331 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -222,24 +222,26 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
>  					 unsigned long reason)
>  {
>  	struct mm_struct *mm = ctx->mm;
> -	pte_t *pte;
> +	pte_t *ptep, pte;
>  	bool ret = true;
>  
>  	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
>  
> -	pte = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
> -	if (!pte)
> +	ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
> +
> +	if (!ptep)
>  		goto out;
>  
>  	ret = false;
> +	pte = huge_ptep_get(ptep);
>  
>  	/*
>  	 * Lockless access: we're in a wait_event so it's ok if it
>  	 * changes under us.
>  	 */
> -	if (huge_pte_none(*pte))
> +	if (huge_pte_none(pte))
>  		ret = true;
> -	if (!huge_pte_write(*pte) && (reason & VM_UFFD_WP))
> +	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
>  		ret = true;
>  out:
>  	return ret;
> 

Indeed, this only works by luck on x86 and ppc, but not on s390x.

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access
  2018-06-26 13:24 [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access Janosch Frank
  2018-06-26 13:39 ` David Hildenbrand
@ 2018-06-26 17:00 ` Mike Kravetz
  2018-06-27  8:47   ` Janosch Frank
  1 sibling, 1 reply; 5+ messages in thread
From: Mike Kravetz @ 2018-06-26 17:00 UTC (permalink / raw)
  To: Janosch Frank, aarcange; +Cc: linux-kernel, viro, linux-mm, Andrew Morton

On 06/26/2018 06:24 AM, Janosch Frank wrote:
> Use huge_ptep_get to translate huge ptes to normal ptes so we can
> check them with the huge_pte_* functions. Otherwise some architectures
> will check the wrong values and will not wait for userspace to bring
> in the memory.
> 
> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")

Adding linux-mm and Andrew on Cc:

Thanks for catching and fixing this.
I think this needs to be fixed in stable as well.  Correct?  Assuming
userfaultfd is/can be enabled for impacted architectures.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
-- 
Mike Kravetz

> ---
>  fs/userfaultfd.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 123bf7d516fc..594d192b2331 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -222,24 +222,26 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
>  					 unsigned long reason)
>  {
>  	struct mm_struct *mm = ctx->mm;
> -	pte_t *pte;
> +	pte_t *ptep, pte;
>  	bool ret = true;
>  
>  	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
>  
> -	pte = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
> -	if (!pte)
> +	ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
> +
> +	if (!ptep)
>  		goto out;
>  
>  	ret = false;
> +	pte = huge_ptep_get(ptep);
>  
>  	/*
>  	 * Lockless access: we're in a wait_event so it's ok if it
>  	 * changes under us.
>  	 */
> -	if (huge_pte_none(*pte))
> +	if (huge_pte_none(pte))
>  		ret = true;
> -	if (!huge_pte_write(*pte) && (reason & VM_UFFD_WP))
> +	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
>  		ret = true;
>  out:
>  	return ret;
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access
  2018-06-26 17:00 ` Mike Kravetz
@ 2018-06-27  8:47   ` Janosch Frank
  2018-07-04  3:58     ` Andrea Arcangeli
  0 siblings, 1 reply; 5+ messages in thread
From: Janosch Frank @ 2018-06-27  8:47 UTC (permalink / raw)
  To: Mike Kravetz, aarcange; +Cc: linux-kernel, viro, linux-mm, Andrew Morton


[-- Attachment #1.1: Type: text/plain, Size: 2124 bytes --]

On 26.06.2018 19:00, Mike Kravetz wrote:
> On 06/26/2018 06:24 AM, Janosch Frank wrote:
>> Use huge_ptep_get to translate huge ptes to normal ptes so we can
>> check them with the huge_pte_* functions. Otherwise some architectures
>> will check the wrong values and will not wait for userspace to bring
>> in the memory.
>>
>> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
>> Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
> Adding linux-mm and Andrew on Cc:
> 
> Thanks for catching and fixing this.

Sure
I'd be happy if we get less of these problems with time, this one was
rather painful to debug. :)

> I think this needs to be fixed in stable as well.  Correct?  Assuming
> userfaultfd is/can be enabled for impacted architectures.

Correct, it seems I forgot the CC stable...

> 
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Thanks

> -- Mike Kravetz
>> ---
>>  fs/userfaultfd.c | 12 +++++++-----
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
>> index 123bf7d516fc..594d192b2331 100644
>> --- a/fs/userfaultfd.c
>> +++ b/fs/userfaultfd.c
>> @@ -222,24 +222,26 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
>>  					 unsigned long reason)
>>  {
>>  	struct mm_struct *mm = ctx->mm;
>> -	pte_t *pte;
>> +	pte_t *ptep, pte;
>>  	bool ret = true;
>>  
>>  	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
>>  
>> -	pte = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
>> -	if (!pte)
>> +	ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
>> +
>> +	if (!ptep)
>>  		goto out;
>>  
>>  	ret = false;
>> +	pte = huge_ptep_get(ptep);
>>  
>>  	/*
>>  	 * Lockless access: we're in a wait_event so it's ok if it
>>  	 * changes under us.
>>  	 */
>> -	if (huge_pte_none(*pte))
>> +	if (huge_pte_none(pte))
>>  		ret = true;
>> -	if (!huge_pte_write(*pte) && (reason & VM_UFFD_WP))
>> +	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
>>  		ret = true;
>>  out:
>>  	return ret;
>>



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access
  2018-06-27  8:47   ` Janosch Frank
@ 2018-07-04  3:58     ` Andrea Arcangeli
  0 siblings, 0 replies; 5+ messages in thread
From: Andrea Arcangeli @ 2018-07-04  3:58 UTC (permalink / raw)
  To: Janosch Frank; +Cc: Mike Kravetz, linux-kernel, viro, linux-mm, Andrew Morton

Hello,

On Wed, Jun 27, 2018 at 10:47:44AM +0200, Janosch Frank wrote:
> On 26.06.2018 19:00, Mike Kravetz wrote:
> > On 06/26/2018 06:24 AM, Janosch Frank wrote:
> >> Use huge_ptep_get to translate huge ptes to normal ptes so we can
> >> check them with the huge_pte_* functions. Otherwise some architectures
> >> will check the wrong values and will not wait for userspace to bring
> >> in the memory.
> >>
> >> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
> >> Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
> > Adding linux-mm and Andrew on Cc:
> > 
> > Thanks for catching and fixing this.
> 
> Sure
> I'd be happy if we get less of these problems with time, this one was
> rather painful to debug. :)

What I thought when I read the fix is it would be more robust and we
could catch any further error like this at build time by having
huge_pte_offset return a new type "hugepte_t *" instead of the current
"pte_t *". Of course then huge_ptep_get() would take a "hugepte_t *" as
parameter. The x86 implementation would then become:

static inline pte_t huge_ptep_get(hugepte_t *ptep)
{
	return *(pte_t *)ptep;
}

I haven't tried it, perhaps it's not feasible for other reasons
because there's a significant fallout from such a change (i.e. a lot
of hugetlbfs methods needs to change input type), but you said you're
actively looking to get less of these problems this could be a way if
it can be done, so I should mention it.

The need of huge_ptep_get() of course is very apparent when reading the
fix, but it was all but apparent when reading the previous code and the
previous code was correct for x86 because of course huge_ptep_get is
implemented as *ptep on x86.

For now the current fix is certainly good, any robustness cleanup is
cleaner if done orthogonal anyway.

Thanks!
Andrea

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-07-04  3:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-26 13:24 [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access Janosch Frank
2018-06-26 13:39 ` David Hildenbrand
2018-06-26 17:00 ` Mike Kravetz
2018-06-27  8:47   ` Janosch Frank
2018-07-04  3:58     ` Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).