linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] userfaultfd v4.11 updates
@ 2017-03-02 17:37 Andrea Arcangeli
  2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2017-03-02 17:37 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Mike Rapoport, Dr. David Alan Gilbert, Mike Kravetz,
	Pavel Emelyanov, Hillf Danton

Hello,

this is incremental against current -mm.

The earlier patchset fixed the fctx->orig memleak but as I thought
during review, there was a further leak in fctx->new and Mike promptly
fixed it too. I've been running a reproducer and there's no further
memleak when the uffd is closed by the parent while fork() blocks
waiting for the event to be received.

In addition I found a potential stale pointer in MADV_DONTNEED if
userfaultfd_remove has to block and releases the mmap_sem in the
process. This patch revalidates the vma and fixes the
race. Unfortunately calling userfaultfd_remove last is not ok as
explained in the commit.

A "make" run from the directory selftests/vm/ independently started to
fail in v4.11 trying to write executables to the root directory.
That's not very friendly because it worked before, but it's easy to
fix and the last patch corrects this behavior in the vm/Makefile.

Andrea Arcangeli (2):
  userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in
    MADV_DONTNEED
  userfaultfd: selftest: vm: allow to build in vm/ directory

Mike Rapoport (1):
  userfaultfd: non-cooperative: fix fork fctx->new memleak

 fs/userfaultfd.c                    | 18 ++++++++++-----
 include/linux/userfaultfd_k.h       |  7 +++---
 mm/madvise.c                        | 44 ++++++++++++++++++++++++++++++++++---
 tools/testing/selftests/vm/Makefile |  4 ++++
 4 files changed, 60 insertions(+), 13 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak
  2017-03-02 17:37 [PATCH 0/3] userfaultfd v4.11 updates Andrea Arcangeli
@ 2017-03-02 17:37 ` Andrea Arcangeli
  2017-03-03 11:09   ` Mike Rapoport
  2017-09-28  9:26   ` Sasha Levin
  2017-03-02 17:37 ` [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED Andrea Arcangeli
  2017-03-02 17:37 ` [PATCH 3/3] userfaultfd: selftest: vm: allow to build in vm/ directory Andrea Arcangeli
  2 siblings, 2 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2017-03-02 17:37 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Mike Rapoport, Dr. David Alan Gilbert, Mike Kravetz,
	Pavel Emelyanov, Hillf Danton

From: Mike Rapoport <rppt@linux.vnet.ibm.com>

We have a memleak in the ->new ctx if the uffd of the parent is closed
before the fork event is read, nothing frees the new context.

Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 fs/userfaultfd.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index d2f15a6..5087a69 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -548,6 +548,15 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
 		if (ACCESS_ONCE(ctx->released) ||
 		    fatal_signal_pending(current)) {
 			__remove_wait_queue(&ctx->event_wqh, &ewq->wq);
+			if (ewq->msg.event == UFFD_EVENT_FORK) {
+				struct userfaultfd_ctx *new;
+
+				new = (struct userfaultfd_ctx *)
+					(unsigned long)
+					ewq->msg.arg.reserved.reserved1;
+
+				userfaultfd_ctx_put(new);
+			}
 			break;
 		}
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED
  2017-03-02 17:37 [PATCH 0/3] userfaultfd v4.11 updates Andrea Arcangeli
  2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli
@ 2017-03-02 17:37 ` Andrea Arcangeli
  2017-03-03 19:48   ` Mike Rapoport
  2017-03-02 17:37 ` [PATCH 3/3] userfaultfd: selftest: vm: allow to build in vm/ directory Andrea Arcangeli
  2 siblings, 1 reply; 8+ messages in thread
From: Andrea Arcangeli @ 2017-03-02 17:37 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Mike Rapoport, Dr. David Alan Gilbert, Mike Kravetz,
	Pavel Emelyanov, Hillf Danton

userfaultfd_remove() has to be execute before zapping the pagetables or
UFFDIO_COPY could keep filling pages after zap_page_range returned,
which would result in non zero data after a MADV_DONTNEED.

However userfaultfd_remove() may have to release the mmap_sem. This
was handled correctly in MADV_REMOVE, but MADV_DONTNEED accessed a
potentially stale vma (the very vma passed to zap_page_range(vma, ...)).

The fix consists in revalidating the vma in case userfaultfd_remove()
had to release the mmap_sem.

This also optimizes away an unnecessary down_read/up_read in the
MADV_REMOVE case if UFFD_EVENT_FORK had to be delivered.

It all remains zero runtime cost in case CONFIG_USERFAULTFD=n as
userfaultfd_remove() will be defined as "true" at build time.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 fs/userfaultfd.c              |  9 +++------
 include/linux/userfaultfd_k.h |  7 +++----
 mm/madvise.c                  | 44 ++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 5087a69..2104811 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -694,8 +694,7 @@ void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *vm_ctx,
 	userfaultfd_event_wait_completion(ctx, &ewq);
 }
 
-void userfaultfd_remove(struct vm_area_struct *vma,
-			struct vm_area_struct **prev,
+bool userfaultfd_remove(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end)
 {
 	struct mm_struct *mm = vma->vm_mm;
@@ -704,13 +703,11 @@ void userfaultfd_remove(struct vm_area_struct *vma,
 
 	ctx = vma->vm_userfaultfd_ctx.ctx;
 	if (!ctx || !(ctx->features & UFFD_FEATURE_EVENT_REMOVE))
-		return;
+		return true;
 
 	userfaultfd_ctx_get(ctx);
 	up_read(&mm->mmap_sem);
 
-	*prev = NULL; /* We wait for ACK w/o the mmap semaphore */
-
 	msg_init(&ewq.msg);
 
 	ewq.msg.event = UFFD_EVENT_REMOVE;
@@ -719,7 +716,7 @@ void userfaultfd_remove(struct vm_area_struct *vma,
 
 	userfaultfd_event_wait_completion(ctx, &ewq);
 
-	down_read(&mm->mmap_sem);
+	return false;
 }
 
 static bool has_unmap_ctx(struct userfaultfd_ctx *ctx, struct list_head *unmaps,
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index f2b79bf..48a3483 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -61,8 +61,7 @@ extern void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *,
 					unsigned long from, unsigned long to,
 					unsigned long len);
 
-extern void userfaultfd_remove(struct vm_area_struct *vma,
-			       struct vm_area_struct **prev,
+extern bool userfaultfd_remove(struct vm_area_struct *vma,
 			       unsigned long start,
 			       unsigned long end);
 
@@ -118,11 +117,11 @@ static inline void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *ctx,
 {
 }
 
-static inline void userfaultfd_remove(struct vm_area_struct *vma,
-				      struct vm_area_struct **prev,
+static inline bool userfaultfd_remove(struct vm_area_struct *vma,
 				      unsigned long start,
 				      unsigned long end)
 {
+	return true;
 }
 
 static inline int userfaultfd_unmap_prep(struct vm_area_struct *vma,
diff --git a/mm/madvise.c b/mm/madvise.c
index dc5927c..7a2abf0 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -513,7 +513,43 @@ static long madvise_dontneed(struct vm_area_struct *vma,
 	if (!can_madv_dontneed_vma(vma))
 		return -EINVAL;
 
-	userfaultfd_remove(vma, prev, start, end);
+	if (!userfaultfd_remove(vma, start, end)) {
+		*prev = NULL; /* mmap_sem has been dropped, prev is stale */
+
+		down_read(&current->mm->mmap_sem);
+		vma = find_vma(current->mm, start);
+		if (!vma)
+			return -ENOMEM;
+		if (start < vma->vm_start) {
+			/*
+			 * This "vma" under revalidation is the one
+			 * with the lowest vma->vm_start where start
+			 * is also < vma->vm_end. If start <
+			 * vma->vm_start it means an hole materialized
+			 * in the user address space within the
+			 * virtual range passed to MADV_DONTNEED.
+			 */
+			return -ENOMEM;
+		}
+		if (!can_madv_dontneed_vma(vma))
+			return -EINVAL;
+		if (end > vma->vm_end) {
+			/*
+			 * Don't fail if end > vma->vm_end. If the old
+			 * vma was splitted while the mmap_sem was
+			 * released the effect of the concurrent
+			 * operation may not cause MADV_DONTNEED to
+			 * have an undefined result. There may be an
+			 * adjacent next vma that we'll walk
+			 * next. userfaultfd_remove() will generate an
+			 * UFFD_EVENT_REMOVE repetition on the
+			 * end-vma->vm_end range, but the manager can
+			 * handle a repetition fine.
+			 */
+			end = vma->vm_end;
+		}
+		VM_WARN_ON(start >= end);
+	}
 	zap_page_range(vma, start, end - start);
 	return 0;
 }
@@ -554,8 +590,10 @@ static long madvise_remove(struct vm_area_struct *vma,
 	 * mmap_sem.
 	 */
 	get_file(f);
-	userfaultfd_remove(vma, prev, start, end);
-	up_read(&current->mm->mmap_sem);
+	if (userfaultfd_remove(vma, start, end)) {
+		/* mmap_sem was not released by userfaultfd_remove() */
+		up_read(&current->mm->mmap_sem);
+	}
 	error = vfs_fallocate(f,
 				FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
 				offset, end - start);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] userfaultfd: selftest: vm: allow to build in vm/ directory
  2017-03-02 17:37 [PATCH 0/3] userfaultfd v4.11 updates Andrea Arcangeli
  2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli
  2017-03-02 17:37 ` [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED Andrea Arcangeli
@ 2017-03-02 17:37 ` Andrea Arcangeli
  2 siblings, 0 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2017-03-02 17:37 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Mike Rapoport, Dr. David Alan Gilbert, Mike Kravetz,
	Pavel Emelyanov, Hillf Danton

linux/tools/testing/selftests/vm $ make
gcc -Wall -I ../../../../usr/include     compaction_test.c -lrt -o /compaction_test
/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.4/../../../../x86_64-pc-linux-gnu/bin/ld:
cannot open output file /compaction_test: Permission denied
collect2: error: ld returned 1 exit status
make: *** [../lib.mk:54: /compaction_test] Error 1

Since commit a8ba798bc8ec663cf02e80b0dd770324de9bafd9 selftests/vm
build fails if run from the "selftests/vm" directory, but it works in
the selftests/ directory. It's quicker to be able to do a local
vm-only build after a tree wipe and this patch allows for it again.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 4cff7e7..41642ba 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -1,5 +1,9 @@
 # Makefile for vm selftests
 
+ifndef OUTPUT
+  OUTPUT := $(shell pwd)
+endif
+
 CFLAGS = -Wall -I ../../../../usr/include $(EXTRA_CFLAGS)
 LDLIBS = -lrt
 TEST_GEN_FILES = compaction_test

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak
  2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli
@ 2017-03-03 11:09   ` Mike Rapoport
  2017-09-28  9:26   ` Sasha Levin
  1 sibling, 0 replies; 8+ messages in thread
From: Mike Rapoport @ 2017-03-03 11:09 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andrew Morton, Dr. David Alan Gilbert, Mike Kravetz,
	Pavel Emelyanov, Hillf Danton

> From: Mike Rapoport <rppt@linux.vnet.ibm.com>
> 
> We have a memleak in the ->new ctx if the uffd of the parent is closed
> before the fork event is read, nothing frees the new context.
> 
> Reported-by: Andrea Arcangeli <aarcange@redhat.com>

I think
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
would be appropriate here.

> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  fs/userfaultfd.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index d2f15a6..5087a69 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -548,6 +548,15 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
>  		if (ACCESS_ONCE(ctx->released) ||
>  		    fatal_signal_pending(current)) {
>  			__remove_wait_queue(&ctx->event_wqh, &ewq->wq);
> +			if (ewq->msg.event == UFFD_EVENT_FORK) {
> +				struct userfaultfd_ctx *new;
> +
> +				new = (struct userfaultfd_ctx *)
> +					(unsigned long)
> +					ewq->msg.arg.reserved.reserved1;
> +
> +				userfaultfd_ctx_put(new);
> +			}
>  			break;
>  		}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED
  2017-03-02 17:37 ` [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED Andrea Arcangeli
@ 2017-03-03 19:48   ` Mike Rapoport
  0 siblings, 0 replies; 8+ messages in thread
From: Mike Rapoport @ 2017-03-03 19:48 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andrew Morton, Dr. David Alan Gilbert, Mike Kravetz,
	Pavel Emelyanov, Hillf Danton

> userfaultfd_remove() has to be execute before zapping the pagetables or
> UFFDIO_COPY could keep filling pages after zap_page_range returned,
> which would result in non zero data after a MADV_DONTNEED.
> 
> However userfaultfd_remove() may have to release the mmap_sem. This
> was handled correctly in MADV_REMOVE, but MADV_DONTNEED accessed a
> potentially stale vma (the very vma passed to zap_page_range(vma, ...)).
> 
> The fix consists in revalidating the vma in case userfaultfd_remove()
> had to release the mmap_sem.
> 
> This also optimizes away an unnecessary down_read/up_read in the
> MADV_REMOVE case if UFFD_EVENT_FORK had to be delivered.
> 
> It all remains zero runtime cost in case CONFIG_USERFAULTFD=n as
> userfaultfd_remove() will be defined as "true" at build time.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>

> ---
>  fs/userfaultfd.c              |  9 +++------
>  include/linux/userfaultfd_k.h |  7 +++----
>  mm/madvise.c                  | 44 ++++++++++++++++++++++++++++++++++++++++---
>  3 files changed, 47 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 5087a69..2104811 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -694,8 +694,7 @@ void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *vm_ctx,
>  	userfaultfd_event_wait_completion(ctx, &ewq);
>  }
>  
> -void userfaultfd_remove(struct vm_area_struct *vma,
> -			struct vm_area_struct **prev,
> +bool userfaultfd_remove(struct vm_area_struct *vma,
>  			unsigned long start, unsigned long end)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
> @@ -704,13 +703,11 @@ void userfaultfd_remove(struct vm_area_struct *vma,
>  
>  	ctx = vma->vm_userfaultfd_ctx.ctx;
>  	if (!ctx || !(ctx->features & UFFD_FEATURE_EVENT_REMOVE))
> -		return;
> +		return true;
>  
>  	userfaultfd_ctx_get(ctx);
>  	up_read(&mm->mmap_sem);
>  
> -	*prev = NULL; /* We wait for ACK w/o the mmap semaphore */
> -
>  	msg_init(&ewq.msg);
>  
>  	ewq.msg.event = UFFD_EVENT_REMOVE;
> @@ -719,7 +716,7 @@ void userfaultfd_remove(struct vm_area_struct *vma,
>  
>  	userfaultfd_event_wait_completion(ctx, &ewq);
>  
> -	down_read(&mm->mmap_sem);
> +	return false;
>  }
>  
>  static bool has_unmap_ctx(struct userfaultfd_ctx *ctx, struct list_head *unmaps,
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index f2b79bf..48a3483 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -61,8 +61,7 @@ extern void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *,
>  					unsigned long from, unsigned long to,
>  					unsigned long len);
>  
> -extern void userfaultfd_remove(struct vm_area_struct *vma,
> -			       struct vm_area_struct **prev,
> +extern bool userfaultfd_remove(struct vm_area_struct *vma,
>  			       unsigned long start,
>  			       unsigned long end);
>  
> @@ -118,11 +117,11 @@ static inline void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *ctx,
>  {
>  }
>  
> -static inline void userfaultfd_remove(struct vm_area_struct *vma,
> -				      struct vm_area_struct **prev,
> +static inline bool userfaultfd_remove(struct vm_area_struct *vma,
>  				      unsigned long start,
>  				      unsigned long end)
>  {
> +	return true;
>  }
>  
>  static inline int userfaultfd_unmap_prep(struct vm_area_struct *vma,
> diff --git a/mm/madvise.c b/mm/madvise.c
> index dc5927c..7a2abf0 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -513,7 +513,43 @@ static long madvise_dontneed(struct vm_area_struct *vma,
>  	if (!can_madv_dontneed_vma(vma))
>  		return -EINVAL;
>  
> -	userfaultfd_remove(vma, prev, start, end);
> +	if (!userfaultfd_remove(vma, start, end)) {
> +		*prev = NULL; /* mmap_sem has been dropped, prev is stale */
> +
> +		down_read(&current->mm->mmap_sem);
> +		vma = find_vma(current->mm, start);
> +		if (!vma)
> +			return -ENOMEM;
> +		if (start < vma->vm_start) {
> +			/*
> +			 * This "vma" under revalidation is the one
> +			 * with the lowest vma->vm_start where start
> +			 * is also < vma->vm_end. If start <
> +			 * vma->vm_start it means an hole materialized
> +			 * in the user address space within the
> +			 * virtual range passed to MADV_DONTNEED.
> +			 */
> +			return -ENOMEM;
> +		}
> +		if (!can_madv_dontneed_vma(vma))
> +			return -EINVAL;
> +		if (end > vma->vm_end) {
> +			/*
> +			 * Don't fail if end > vma->vm_end. If the old
> +			 * vma was splitted while the mmap_sem was
> +			 * released the effect of the concurrent
> +			 * operation may not cause MADV_DONTNEED to
> +			 * have an undefined result. There may be an
> +			 * adjacent next vma that we'll walk
> +			 * next. userfaultfd_remove() will generate an
> +			 * UFFD_EVENT_REMOVE repetition on the
> +			 * end-vma->vm_end range, but the manager can
> +			 * handle a repetition fine.
> +			 */
> +			end = vma->vm_end;
> +		}
> +		VM_WARN_ON(start >= end);
> +	}
>  	zap_page_range(vma, start, end - start);
>  	return 0;
>  }
> @@ -554,8 +590,10 @@ static long madvise_remove(struct vm_area_struct *vma,
>  	 * mmap_sem.
>  	 */
>  	get_file(f);
> -	userfaultfd_remove(vma, prev, start, end);
> -	up_read(&current->mm->mmap_sem);
> +	if (userfaultfd_remove(vma, start, end)) {
> +		/* mmap_sem was not released by userfaultfd_remove() */
> +		up_read(&current->mm->mmap_sem);
> +	}
>  	error = vfs_fallocate(f,
>  				FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>  				offset, end - start);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak
  2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli
  2017-03-03 11:09   ` Mike Rapoport
@ 2017-09-28  9:26   ` Sasha Levin
  2017-09-28 10:16     ` Andrea Arcangeli
  1 sibling, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2017-09-28  9:26 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Andrew Morton, Mike Rapoport, Dr. David Alan Gilbert,
	Mike Kravetz, Pavel Emelyanov, Hillf Danton, alexander.levin

On Thu, Mar 2, 2017 at 9:37 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> From: Mike Rapoport <rppt@linux.vnet.ibm.com>
>
> We have a memleak in the ->new ctx if the uffd of the parent is closed
> before the fork event is read, nothing frees the new context.

Hey Mike,

This seems to result in the following:

==================================================================
BUG: KASAN: use-after-free in resolve_userfault_fork
fs/userfaultfd.c:967 [inline]
BUG: KASAN: use-after-free in userfaultfd_ctx_read+0xa2e/0x2110
fs/userfaultfd.c:1093
Read of size 4 at addr ffff880064944204 by task syz-executor5/6524

CPU: 1 PID: 6524 Comm: syz-executor5 Not tainted 4.13.0-next-20170908+ #246
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.10.1-1ubuntu1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x11d/0x1e5 lib/dump_stack.c:52
 print_address_description+0xcb/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x275/0x360 mm/kasan/report.c:409
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
 resolve_userfault_fork fs/userfaultfd.c:967 [inline]
 userfaultfd_ctx_read+0xa2e/0x2110 fs/userfaultfd.c:1093
 userfaultfd_read+0x1a3/0x260 fs/userfaultfd.c:1126
 do_loop_readv_writev fs/read_write.c:693 [inline]
 do_iter_read+0x3db/0x5b0 fs/read_write.c:917
 vfs_readv+0x130/0x1d0 fs/read_write.c:979
 do_readv+0x108/0x2d0 fs/read_write.c:1012
 SYSC_readv fs/read_write.c:1099 [inline]
 SyS_readv+0x27/0x30 fs/read_write.c:1096
 do_syscall_64+0x26a/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x452ea9
RSP: 002b:00007fa93a8bac08 EFLAGS: 00000216 ORIG_RAX: 0000000000000013
RAX: ffffffffffffffda RBX: 0000000000718210 RCX: 0000000000452ea9
RDX: 0000000000000001 RSI: 000000002000f000 RDI: 0000000000000015
RBP: 0000000000004000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bc64f
R13: 00000000ffffffff R14: 0000000000000015 R15: 000000002000f000

Allocated by task 6495:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
 slab_post_alloc_hook mm/slab.h:444 [inline]
 slab_alloc_node mm/slub.c:2745 [inline]
 slab_alloc mm/slub.c:2753 [inline]
 kmem_cache_alloc+0x121/0x390 mm/slub.c:2758
 dup_userfaultfd+0x21e/0x890 fs/userfaultfd.c:653
 dup_mmap kernel/fork.c:658 [inline]
 dup_mm kernel/fork.c:1179 [inline]
 copy_mm+0xa49/0x130d kernel/fork.c:1233
 copy_process.part.30+0x21d5/0x4f00 kernel/fork.c:1735
 copy_process kernel/fork.c:1548 [inline]
 _do_fork+0x1f7/0x1050 kernel/fork.c:2027
 SYSC_clone kernel/fork.c:2137 [inline]
 SyS_clone+0x37/0x50 kernel/fork.c:2131
 do_syscall_64+0x26a/0x800 arch/x86/entry/common.c:287
 return_from_SYSCALL_64+0x0/0x7a

Freed by task 6495:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524
 slab_free_hook mm/slub.c:1390 [inline]
 slab_free_freelist_hook mm/slub.c:1412 [inline]
 slab_free mm/slub.c:2988 [inline]
 kmem_cache_free+0xb6/0x320 mm/slub.c:3010
 userfaultfd_ctx_put+0x50c/0x740 fs/userfaultfd.c:165
 userfaultfd_event_wait_completion+0x763/0x930 fs/userfaultfd.c:599
 dup_fctx fs/userfaultfd.c:687 [inline]
 dup_userfaultfd_complete+0x2e0/0x480 fs/userfaultfd.c:695
 dup_mmap kernel/fork.c:726 [inline]
 dup_mm kernel/fork.c:1179 [inline]
 copy_mm+0xe7c/0x130d kernel/fork.c:1233
 copy_process.part.30+0x21d5/0x4f00 kernel/fork.c:1735
 copy_process kernel/fork.c:1548 [inline]
 _do_fork+0x1f7/0x1050 kernel/fork.c:2027
 SYSC_clone kernel/fork.c:2137 [inline]
 SyS_clone+0x37/0x50 kernel/fork.c:2131
 do_syscall_64+0x26a/0x800 arch/x86/entry/common.c:287
 return_from_SYSCALL_64+0x0/0x7a

The buggy address belongs to the object at ffff880064944040
 which belongs to the cache userfaultfd_ctx_cache of size 480
The buggy address is located 452 bytes inside of
 480-byte region [ffff880064944040, ffff880064944220)
The buggy address belongs to the page:
page:ffffea0001925100 count:1 mapcount:0 mapping:          (null)
index:0xffff880064947bc0 compound_mapcount: 0
flags: 0x4fffe0000008100(slab|head)
raw: 04fffe0000008100 0000000000000000 ffff880064947bc0 0000000100120001
raw: ffff88006a556d98 ffff88006a556d98 ffff88003e7c31c0 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880064944100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff880064944180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff880064944200: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff880064944280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880064944300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak
  2017-09-28  9:26   ` Sasha Levin
@ 2017-09-28 10:16     ` Andrea Arcangeli
  0 siblings, 0 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2017-09-28 10:16 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-mm, Andrew Morton, Mike Rapoport, Dr. David Alan Gilbert,
	Mike Kravetz, Pavel Emelyanov, Hillf Danton, alexander.levin

Hello Sasha,

On Thu, Sep 28, 2017 at 02:26:42AM -0700, Sasha Levin wrote:
> On Thu, Mar 2, 2017 at 9:37 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > From: Mike Rapoport <rppt@linux.vnet.ibm.com>
> >
> > We have a memleak in the ->new ctx if the uffd of the parent is closed
> > before the fork event is read, nothing frees the new context.
> 
> Hey Mike,
> 
> This seems to result in the following:

Andrew just included this fix in -mm:

https://lkml.org/lkml/2017/9/20/571

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-09-28 10:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-02 17:37 [PATCH 0/3] userfaultfd v4.11 updates Andrea Arcangeli
2017-03-02 17:37 ` [PATCH 1/3] userfaultfd: non-cooperative: fix fork fctx->new memleak Andrea Arcangeli
2017-03-03 11:09   ` Mike Rapoport
2017-09-28  9:26   ` Sasha Levin
2017-09-28 10:16     ` Andrea Arcangeli
2017-03-02 17:37 ` [PATCH 2/3] userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED Andrea Arcangeli
2017-03-03 19:48   ` Mike Rapoport
2017-03-02 17:37 ` [PATCH 3/3] userfaultfd: selftest: vm: allow to build in vm/ directory Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).