linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/1] userfaultfd: don't pin the user memory in userfaultfd_file_create()
@ 2016-05-16 15:25 Oleg Nesterov
  2016-05-16 15:25 ` [PATCH 1/1] " Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2016-05-16 15:25 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli; +Cc: Linus Torvalds, linux-kernel, linux-mm

Hello,

Sorry for the delay. This is the same patch; I just added helpers to get/put
mm->mm_users. I wouldn't mind changing userfaultfd_get_mm() to return
mm_struct-or-NULL, or perhaps we should instead simply add a trivial helper that
does atomic_inc_not_zero(&mm->mm_users) to sched.h; it could have more callers
(fs/proc, uprobes).

Testing: I found selftests/vm/userfaultfd.c, and it seems to work.

Oleg.
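
For illustration, the mm_struct-or-NULL variant mentioned above could look
roughly like the following userspace sketch. It is not kernel code: the
mm_model and ctx_model structures, inc_not_zero() and ctx_get_mm() are
hypothetical stand-ins, and C11 atomics play the role of the kernel's
atomic_t and atomic_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct mm_model {
	atomic_int mm_users;	/* users of the address space */
};

struct ctx_model {
	struct mm_model *mm;
};

/* Userspace equivalent of atomic_inc_not_zero(): increment unless the value is 0. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns the mm with an extra user reference, or NULL if the users are gone. */
static struct mm_model *ctx_get_mm(struct ctx_model *ctx)
{
	assert(ctx->mm != NULL);
	return inc_not_zero(&ctx->mm->mm_users) ? ctx->mm : NULL;
}
```

A caller would test the returned pointer instead of a bool, which lets it keep
the mm in a local variable for the matching put.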


* [PATCH 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create()
  2016-05-16 15:25 [PATCH 0/1] userfaultfd: don't pin the user memory in userfaultfd_file_create() Oleg Nesterov
@ 2016-05-16 15:25 ` Oleg Nesterov
  2016-05-16 15:57   ` Andrea Arcangeli
  2016-05-16 17:22   ` [PATCH v2 " Oleg Nesterov
  0 siblings, 2 replies; 8+ messages in thread
From: Oleg Nesterov @ 2016-05-16 15:25 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli; +Cc: Linus Torvalds, linux-kernel, linux-mm

userfaultfd_file_create() increments mm->mm_users; this means that the memory
won't be unmapped/freed if the mm owner exits/execs, and a UFFDIO_COPY after that
can populate the orphaned mm even further.

Change userfaultfd_file_create() and userfaultfd_ctx_put() to use mm->mm_count
to pin the mm_struct. This means that atomic_inc_not_zero(&mm->mm_users) is needed
whenever we are going to actually touch this memory. The handle_userfault() path
is the exception: it doesn't need this because the caller must already hold a
reference.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/userfaultfd.c | 55 ++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 42 insertions(+), 13 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 66cdb44..1a2f38a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -70,6 +70,20 @@ struct userfaultfd_wake_range {
 	unsigned long len;
 };
 
+/*
+ * mm_struct can't go away, but we need to verify that this memory is still
+ * alive and avoid the race with exit_mmap().
+ */
+static inline bool userfaultfd_get_mm(struct userfaultfd_ctx *ctx)
+{
+	return atomic_inc_not_zero(&ctx->mm->mm_users);
+}
+
+static inline void userfaultfd_put_mm(struct userfaultfd_ctx *ctx)
+{
+	mmput(ctx->mm);
+}
+
 static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode,
 				     int wake_flags, void *key)
 {
@@ -137,7 +151,7 @@ static void userfaultfd_ctx_put(struct userfaultfd_ctx *ctx)
 		VM_BUG_ON(waitqueue_active(&ctx->fault_wqh));
 		VM_BUG_ON(spin_is_locked(&ctx->fd_wqh.lock));
 		VM_BUG_ON(waitqueue_active(&ctx->fd_wqh));
-		mmput(ctx->mm);
+		mmdrop(ctx->mm);
 		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
 	}
 }
@@ -434,6 +448,9 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 
 	ACCESS_ONCE(ctx->released) = true;
 
+	if (!userfaultfd_get_mm(ctx))
+		goto wakeup;
+
 	/*
 	 * Flush page faults out of all CPUs. NOTE: all page faults
 	 * must be retried without returning VM_FAULT_SIGBUS if
@@ -466,7 +483,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
 	}
 	up_write(&mm->mmap_sem);
-
+	userfaultfd_put_mm(ctx);
+wakeup:
 	/*
 	 * After no new page faults can wait on this fault_*wqh, flush
 	 * the last page faults that may have been already waiting on
@@ -760,10 +778,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	start = uffdio_register.range.start;
 	end = start + uffdio_register.range.len;
 
+	ret = -ENOMEM;
+	if (!userfaultfd_get_mm(ctx))
+		goto out;
+
 	down_write(&mm->mmap_sem);
 	vma = find_vma_prev(mm, start, &prev);
-
-	ret = -ENOMEM;
 	if (!vma)
 		goto out_unlock;
 
@@ -864,6 +884,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	} while (vma && vma->vm_start < end);
 out_unlock:
 	up_write(&mm->mmap_sem);
+	userfaultfd_put_mm(ctx);
 	if (!ret) {
 		/*
 		 * Now that we scanned all vmas we can already tell
@@ -902,10 +923,12 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	start = uffdio_unregister.start;
 	end = start + uffdio_unregister.len;
 
+	ret = -ENOMEM;
+	if (!userfaultfd_get_mm(ctx))
+		goto out;
+
 	down_write(&mm->mmap_sem);
 	vma = find_vma_prev(mm, start, &prev);
-
-	ret = -ENOMEM;
 	if (!vma)
 		goto out_unlock;
 
@@ -998,6 +1021,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	} while (vma && vma->vm_start < end);
 out_unlock:
 	up_write(&mm->mmap_sem);
+	userfaultfd_put_mm(ctx);
 out:
 	return ret;
 }
@@ -1067,9 +1091,11 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 		goto out;
 	if (uffdio_copy.mode & ~UFFDIO_COPY_MODE_DONTWAKE)
 		goto out;
-
-	ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
-			   uffdio_copy.len);
+	if (userfaultfd_get_mm(ctx)) {
+		ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
+				   uffdio_copy.len);
+		userfaultfd_put_mm(ctx);
+	}
 	if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
 		return -EFAULT;
 	if (ret < 0)
@@ -1110,8 +1136,11 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
 		goto out;
 
-	ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
-			     uffdio_zeropage.range.len);
+	if (userfaultfd_get_mm(ctx)) {
+		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
+				     uffdio_zeropage.range.len);
+		userfaultfd_put_mm(ctx);
+	}
 	if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage)))
 		return -EFAULT;
 	if (ret < 0)
@@ -1289,12 +1318,12 @@ static struct file *userfaultfd_file_create(int flags)
 	ctx->released = false;
 	ctx->mm = current->mm;
 	/* prevent the mm struct to be freed */
-	atomic_inc(&ctx->mm->mm_users);
+	atomic_inc(&ctx->mm->mm_count);
 
 	file = anon_inode_getfile("[userfaultfd]", &userfaultfd_fops, ctx,
 				  O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
 	if (IS_ERR(file)) {
-		mmput(ctx->mm);
+		mmdrop(ctx->mm);
 		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
 	}
 out:
-- 
2.5.0
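
The lifetime rules the patch relies on can be modelled in userspace. The sketch
below is a toy model under stated assumptions, not the kernel implementation:
mm_model and the model_*() functions are hypothetical names, two booleans stand
in for the mappings and for the mm_struct allocation, and C11 atomics replace
atomic_t. It assumes, as in the kernel, that all mm_users together hold one
mm_count reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of an mm_struct: names mirror the kernel fields, the rest is hypothetical. */
struct mm_model {
	atomic_int mm_users;	/* references that keep the user memory mapped */
	atomic_int mm_count;	/* references that keep the structure itself allocated */
	bool mappings_alive;	/* stands in for the VMAs and page tables */
	bool struct_alive;	/* stands in for the mm_struct allocation */
};

static void mm_model_init(struct mm_model *mm)
{
	atomic_init(&mm->mm_users, 1);	/* the owning task */
	atomic_init(&mm->mm_count, 1);	/* held on behalf of all mm_users together */
	mm->mappings_alive = true;
	mm->struct_alive = true;
}

/* Like mmdrop(): drop a structure reference, "free" the struct on the last one. */
static void model_mmdrop(struct mm_model *mm)
{
	if (atomic_fetch_sub(&mm->mm_count, 1) == 1)
		mm->struct_alive = false;
}

/* Like mmput(): drop a user reference, tear down the mappings on the last one. */
static void model_mmput(struct mm_model *mm)
{
	if (atomic_fetch_sub(&mm->mm_users, 1) == 1) {
		mm->mappings_alive = false;	/* exit_mmap() in the kernel */
		model_mmdrop(mm);		/* drop the reference held for mm_users */
	}
}

/* Take a user reference only if the mappings still exist. */
static bool model_mmget_not_zero(struct mm_model *mm)
{
	int old = atomic_load(&mm->mm_users);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&mm->mm_users, &old, old + 1))
			return true;
	}
	return false;
}

/* What the patched userfaultfd_file_create() does: pin only the structure. */
static void model_pin_struct(struct mm_model *mm)
{
	assert(mm->struct_alive);
	atomic_fetch_add(&mm->mm_count, 1);
}
```

In this model, once the owner's last model_mmput() runs, the mappings are gone
and model_mmget_not_zero() fails, while the structure stays valid until the
pinning side calls model_mmdrop().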


* Re: [PATCH 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create()
  2016-05-16 15:25 ` [PATCH 1/1] " Oleg Nesterov
@ 2016-05-16 15:57   ` Andrea Arcangeli
  2016-05-16 16:20     ` Oleg Nesterov
  2016-05-16 17:22   ` [PATCH v2 " Oleg Nesterov
  1 sibling, 1 reply; 8+ messages in thread
From: Andrea Arcangeli @ 2016-05-16 15:57 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrew Morton, Linus Torvalds, linux-kernel, linux-mm

On Mon, May 16, 2016 at 05:25:46PM +0200, Oleg Nesterov wrote:
> userfaultfd_file_create() increments mm->mm_users; this means that the memory
> won't be unmapped/freed if the mm owner exits/execs, and a UFFDIO_COPY after that
> can populate the orphaned mm even further.
> 
> Change userfaultfd_file_create() and userfaultfd_ctx_put() to use mm->mm_count
> to pin the mm_struct. This means that atomic_inc_not_zero(&mm->mm_users) is needed
> whenever we are going to actually touch this memory. The handle_userfault() path
> is the exception: it doesn't need this because the caller must already hold a
> reference.

This is a nice and desired improvement that reduces the pinning from the
"mm" as a whole to just the "mm struct". The code used mm_users for
simplicity, but using mm_count was definitely wanted, to keep the memory
footprint as low as possible (especially to avoid some latency in the
footprint reduction in the future non-cooperative usage).

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

> +static inline bool userfaultfd_get_mm(struct userfaultfd_ctx *ctx)
> +{
> +	return atomic_inc_not_zero(&ctx->mm->mm_users);
> +}

Nice cleanup, but wouldn't it be more generic to implement this as
mmget(ctx->mm) (or maybe mmget_not_zero) in include/linux/mm.h
instead of userfaultfd.c, so that others can use it too? See:

drivers/gpu/drm/i915/i915_gem_userptr.c:                if (atomic_inc_not_zero(&mm->mm_users)) {
drivers/iommu/intel-svm.c:              if (!atomic_inc_not_zero(&svm->mm->mm_users))
fs/proc/base.c: if (!atomic_inc_not_zero(&mm->mm_users))
fs/proc/base.c: if (!atomic_inc_not_zero(&mm->mm_users))
fs/proc/task_mmu.c:     if (!mm || !atomic_inc_not_zero(&mm->mm_users))
fs/proc/task_mmu.c:     if (!mm || !atomic_inc_not_zero(&mm->mm_users))
fs/proc/task_nommu.c:   if (!mm || !atomic_inc_not_zero(&mm->mm_users))
kernel/events/uprobes.c:                if (!atomic_inc_not_zero(&vma->vm_mm->mm_users))
mm/oom_kill.c:  if (!atomic_inc_not_zero(&mm->mm_users)) {
mm/swapfile.c:                          if (!atomic_inc_not_zero(&mm->mm_users))

Anyway, this is just an idea; userfaultfd_get_mm is certainly fine with me.

Thanks,
Andrea


* Re: [PATCH 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create()
  2016-05-16 15:57   ` Andrea Arcangeli
@ 2016-05-16 16:20     ` Oleg Nesterov
  0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2016-05-16 16:20 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, Linus Torvalds, linux-kernel, linux-mm

On 05/16, Andrea Arcangeli wrote:
>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Thanks,

> > +static inline bool userfaultfd_get_mm(struct userfaultfd_ctx *ctx)
> > +{
> > +	return atomic_inc_not_zero(&ctx->mm->mm_users);
> > +}
>
> Nice cleanup, but wouldn't it be more generic to implement this as
> mmget(ctx->mm) (or maybe mmget_not_zero) in include/linux/mm.h
> instead of userfaultfd.c, so that others can use it too? See:

Yes, agreed. userfaultfd_get_mm() doesn't look as good as I initially thought.

So I guess it would be better to make V2 right now, to avoid another change in
userfaultfd.c which changes the same code.

Except I think mmget_not_zero() should go to linux/sched.h, until we move
mmdrop/mmput/etc to linux/mm.h.

I'll send V2 soon...

Oleg.


* [PATCH v2 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create()
  2016-05-16 15:25 ` [PATCH 1/1] " Oleg Nesterov
  2016-05-16 15:57   ` Andrea Arcangeli
@ 2016-05-16 17:22   ` Oleg Nesterov
  2016-05-17 15:33     ` Michal Hocko
  1 sibling, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2016-05-16 17:22 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli; +Cc: Linus Torvalds, linux-kernel, linux-mm

userfaultfd_file_create() increments mm->mm_users; this means that the memory
won't be unmapped/freed if the mm owner exits/execs, and a UFFDIO_COPY after that
can populate the orphaned mm even further.

Change userfaultfd_file_create() and userfaultfd_ctx_put() to use mm->mm_count
to pin the mm_struct. This means that atomic_inc_not_zero(&mm->mm_users) is needed
whenever we are going to actually touch this memory. The handle_userfault() path
is the exception: it doesn't need this because the caller must already hold a
reference.

The patch adds a new trivial helper, mmget_not_zero(); it can have more users.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/userfaultfd.c      | 41 ++++++++++++++++++++++++++++-------------
 include/linux/sched.h |  7 ++++++-
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 66cdb44..2d97952 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -137,7 +137,7 @@ static void userfaultfd_ctx_put(struct userfaultfd_ctx *ctx)
 		VM_BUG_ON(waitqueue_active(&ctx->fault_wqh));
 		VM_BUG_ON(spin_is_locked(&ctx->fd_wqh.lock));
 		VM_BUG_ON(waitqueue_active(&ctx->fd_wqh));
-		mmput(ctx->mm);
+		mmdrop(ctx->mm);
 		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
 	}
 }
@@ -434,6 +434,9 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 
 	ACCESS_ONCE(ctx->released) = true;
 
+	if (!mmget_not_zero(mm))
+		goto wakeup;
+
 	/*
 	 * Flush page faults out of all CPUs. NOTE: all page faults
 	 * must be retried without returning VM_FAULT_SIGBUS if
@@ -466,7 +469,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
 	}
 	up_write(&mm->mmap_sem);
-
+	mmput(mm);
+wakeup:
 	/*
 	 * After no new page faults can wait on this fault_*wqh, flush
 	 * the last page faults that may have been already waiting on
@@ -760,10 +764,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	start = uffdio_register.range.start;
 	end = start + uffdio_register.range.len;
 
+	ret = -ENOMEM;
+	if (!mmget_not_zero(mm))
+		goto out;
+
 	down_write(&mm->mmap_sem);
 	vma = find_vma_prev(mm, start, &prev);
-
-	ret = -ENOMEM;
 	if (!vma)
 		goto out_unlock;
 
@@ -864,6 +870,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	} while (vma && vma->vm_start < end);
 out_unlock:
 	up_write(&mm->mmap_sem);
+	mmput(mm);
 	if (!ret) {
 		/*
 		 * Now that we scanned all vmas we can already tell
@@ -902,10 +909,12 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	start = uffdio_unregister.start;
 	end = start + uffdio_unregister.len;
 
+	ret = -ENOMEM;
+	if (!mmget_not_zero(mm))
+		goto out;
+
 	down_write(&mm->mmap_sem);
 	vma = find_vma_prev(mm, start, &prev);
-
-	ret = -ENOMEM;
 	if (!vma)
 		goto out_unlock;
 
@@ -998,6 +1007,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	} while (vma && vma->vm_start < end);
 out_unlock:
 	up_write(&mm->mmap_sem);
+	mmput(mm);
 out:
 	return ret;
 }
@@ -1067,9 +1077,11 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 		goto out;
 	if (uffdio_copy.mode & ~UFFDIO_COPY_MODE_DONTWAKE)
 		goto out;
-
-	ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
-			   uffdio_copy.len);
+	if (mmget_not_zero(ctx->mm)) {
+		ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
+				   uffdio_copy.len);
+		mmput(ctx->mm);
+	}
 	if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
 		return -EFAULT;
 	if (ret < 0)
@@ -1110,8 +1122,11 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
 		goto out;
 
-	ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
-			     uffdio_zeropage.range.len);
+	if (mmget_not_zero(ctx->mm)) {
+		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
+				     uffdio_zeropage.range.len);
+		mmput(ctx->mm);
+	}
 	if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage)))
 		return -EFAULT;
 	if (ret < 0)
@@ -1289,12 +1304,12 @@ static struct file *userfaultfd_file_create(int flags)
 	ctx->released = false;
 	ctx->mm = current->mm;
 	/* prevent the mm struct to be freed */
-	atomic_inc(&ctx->mm->mm_users);
+	atomic_inc(&ctx->mm->mm_count);
 
 	file = anon_inode_getfile("[userfaultfd]", &userfaultfd_fops, ctx,
 				  O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
 	if (IS_ERR(file)) {
-		mmput(ctx->mm);
+		mmdrop(ctx->mm);
 		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
 	}
 out:
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 52c4847..49997bf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2610,12 +2610,17 @@ extern struct mm_struct * mm_alloc(void);
 
 /* mmdrop drops the mm and the page tables */
 extern void __mmdrop(struct mm_struct *);
-static inline void mmdrop(struct mm_struct * mm)
+static inline void mmdrop(struct mm_struct *mm)
 {
 	if (unlikely(atomic_dec_and_test(&mm->mm_count)))
 		__mmdrop(mm);
 }
 
+static inline bool mmget_not_zero(struct mm_struct *mm)
+{
+	return atomic_inc_not_zero(&mm->mm_users);
+}
+
 /* mmput gets rid of the mappings and all user-space */
 extern void mmput(struct mm_struct *);
 /* Grab a reference to a task's mm, if it is not already going away */
-- 
2.5.0
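
The get/work/put shape that the register and unregister paths take after this
patch can be sketched in userspace as follows. This is a simplified model under
stated assumptions: mm_model, model_mmget_not_zero(), model_mmput() and
model_register() are hypothetical names, the put does not model tearing down
the mappings, and the locked section is reduced to a counter update.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mm_struct; only the user count is modelled. */
struct mm_model {
	atomic_int mm_users;
	int registered_ranges;	/* stands in for VMA state changed under mmap_sem */
};

/* Userspace equivalent of mmget_not_zero(): increment mm_users unless it is 0. */
static bool model_mmget_not_zero(struct mm_model *mm)
{
	int old = atomic_load(&mm->mm_users);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&mm->mm_users, &old, old + 1))
			return true;
	}
	return false;
}

/* Simplified mmput(): the real one also tears down the mappings on the last put. */
static void model_mmput(struct mm_model *mm)
{
	assert(atomic_load(&mm->mm_users) > 0);
	atomic_fetch_sub(&mm->mm_users, 1);
}

/* Shape of the patched register path: fail early, otherwise get/work/put. */
static int model_register(struct mm_model *mm)
{
	int ret = -ENOMEM;

	if (!model_mmget_not_zero(mm))
		goto out;

	/* down_write(&mm->mmap_sem) ... up_write(&mm->mmap_sem) would go here. */
	mm->registered_ranges++;
	ret = 0;

	model_mmput(mm);
out:
	return ret;
}
```

With a live mm the call succeeds and leaves mm_users unchanged afterwards; with
mm_users already at zero it returns -ENOMEM without touching any state.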


* Re: [PATCH v2 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create()
  2016-05-16 17:22   ` [PATCH v2 " Oleg Nesterov
@ 2016-05-17 15:33     ` Michal Hocko
  2016-05-17 16:30       ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2016-05-17 15:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Andrea Arcangeli, Linus Torvalds, linux-kernel, linux-mm

On Mon 16-05-16 19:22:54, Oleg Nesterov wrote:
> userfaultfd_file_create() increments mm->mm_users; this means that the memory
> won't be unmapped/freed if the mm owner exits/execs, and a UFFDIO_COPY after that
> can populate the orphaned mm even further.
> 
> Change userfaultfd_file_create() and userfaultfd_ctx_put() to use mm->mm_count
> to pin the mm_struct. This means that atomic_inc_not_zero(&mm->mm_users) is needed
> whenever we are going to actually touch this memory. The handle_userfault() path
> is the exception: it doesn't need this because the caller must already hold a
> reference.

We should definitely get rid of all unbound pinning via mm_users.
 
> The patch adds a new trivial helper, mmget_not_zero(); it can have more users.

Is this really helpful?

> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

The patch seems good to me, but I am not familiar enough with the
userfaultfd internals to give you a Reviewed-by or an Acked-by. I welcome
the change anyway.

> ---
>  fs/userfaultfd.c      | 41 ++++++++++++++++++++++++++++-------------
>  include/linux/sched.h |  7 ++++++-
>  2 files changed, 34 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 66cdb44..2d97952 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -137,7 +137,7 @@ static void userfaultfd_ctx_put(struct userfaultfd_ctx *ctx)
>  		VM_BUG_ON(waitqueue_active(&ctx->fault_wqh));
>  		VM_BUG_ON(spin_is_locked(&ctx->fd_wqh.lock));
>  		VM_BUG_ON(waitqueue_active(&ctx->fd_wqh));
> -		mmput(ctx->mm);
> +		mmdrop(ctx->mm);
>  		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
>  	}
>  }
> @@ -434,6 +434,9 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
>  
>  	ACCESS_ONCE(ctx->released) = true;
>  
> +	if (!mmget_not_zero(mm))
> +		goto wakeup;
> +
>  	/*
>  	 * Flush page faults out of all CPUs. NOTE: all page faults
>  	 * must be retried without returning VM_FAULT_SIGBUS if
> @@ -466,7 +469,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
>  		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>  	}
>  	up_write(&mm->mmap_sem);
> -
> +	mmput(mm);
> +wakeup:
>  	/*
>  	 * After no new page faults can wait on this fault_*wqh, flush
>  	 * the last page faults that may have been already waiting on
> @@ -760,10 +764,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  	start = uffdio_register.range.start;
>  	end = start + uffdio_register.range.len;
>  
> +	ret = -ENOMEM;
> +	if (!mmget_not_zero(mm))
> +		goto out;
> +
>  	down_write(&mm->mmap_sem);
>  	vma = find_vma_prev(mm, start, &prev);
> -
> -	ret = -ENOMEM;
>  	if (!vma)
>  		goto out_unlock;
>  
> @@ -864,6 +870,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  	} while (vma && vma->vm_start < end);
>  out_unlock:
>  	up_write(&mm->mmap_sem);
> +	mmput(mm);
>  	if (!ret) {
>  		/*
>  		 * Now that we scanned all vmas we can already tell
> @@ -902,10 +909,12 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  	start = uffdio_unregister.start;
>  	end = start + uffdio_unregister.len;
>  
> +	ret = -ENOMEM;
> +	if (!mmget_not_zero(mm))
> +		goto out;
> +
>  	down_write(&mm->mmap_sem);
>  	vma = find_vma_prev(mm, start, &prev);
> -
> -	ret = -ENOMEM;
>  	if (!vma)
>  		goto out_unlock;
>  
> @@ -998,6 +1007,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  	} while (vma && vma->vm_start < end);
>  out_unlock:
>  	up_write(&mm->mmap_sem);
> +	mmput(mm);
>  out:
>  	return ret;
>  }
> @@ -1067,9 +1077,11 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
>  		goto out;
>  	if (uffdio_copy.mode & ~UFFDIO_COPY_MODE_DONTWAKE)
>  		goto out;
> -
> -	ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
> -			   uffdio_copy.len);
> +	if (mmget_not_zero(ctx->mm)) {
> +		ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
> +				   uffdio_copy.len);
> +		mmput(ctx->mm);
> +	}
>  	if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
>  		return -EFAULT;
>  	if (ret < 0)
> @@ -1110,8 +1122,11 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
>  	if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
>  		goto out;
>  
> -	ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
> -			     uffdio_zeropage.range.len);
> +	if (mmget_not_zero(ctx->mm)) {
> +		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
> +				     uffdio_zeropage.range.len);
> +		mmput(ctx->mm);
> +	}
>  	if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage)))
>  		return -EFAULT;
>  	if (ret < 0)
> @@ -1289,12 +1304,12 @@ static struct file *userfaultfd_file_create(int flags)
>  	ctx->released = false;
>  	ctx->mm = current->mm;
>  	/* prevent the mm struct to be freed */
> -	atomic_inc(&ctx->mm->mm_users);
> +	atomic_inc(&ctx->mm->mm_count);
>  
>  	file = anon_inode_getfile("[userfaultfd]", &userfaultfd_fops, ctx,
>  				  O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
>  	if (IS_ERR(file)) {
> -		mmput(ctx->mm);
> +		mmdrop(ctx->mm);
>  		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
>  	}
>  out:
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 52c4847..49997bf 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2610,12 +2610,17 @@ extern struct mm_struct * mm_alloc(void);
>  
>  /* mmdrop drops the mm and the page tables */
>  extern void __mmdrop(struct mm_struct *);
> -static inline void mmdrop(struct mm_struct * mm)
> +static inline void mmdrop(struct mm_struct *mm)
>  {
>  	if (unlikely(atomic_dec_and_test(&mm->mm_count)))
>  		__mmdrop(mm);
>  }
>  
> +static inline bool mmget_not_zero(struct mm_struct *mm)
> +{
> +	return atomic_inc_not_zero(&mm->mm_users);
> +}
> +
>  /* mmput gets rid of the mappings and all user-space */
>  extern void mmput(struct mm_struct *);
>  /* Grab a reference to a task's mm, if it is not already going away */
> -- 
> 2.5.0
> 
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create()
  2016-05-17 15:33     ` Michal Hocko
@ 2016-05-17 16:30       ` Oleg Nesterov
  2016-05-17 20:34         ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2016-05-17 16:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Andrea Arcangeli, Linus Torvalds, linux-kernel, linux-mm

On 05/17, Michal Hocko wrote:
>
> On Mon 16-05-16 19:22:54, Oleg Nesterov wrote:
>
> > The patch adds a new trivial helper, mmget_not_zero(); it can have more users.
>
> Is this really helpful?

Well, this is subjective of course, but I think the code looks a bit better this
way. uprobes, fs/proc and more can use this helper too.

And in fact the initial version of this patch did
atomic_inc_not_zero(&mm->mm_users) by hand; then it was suggested to add a helper.

> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> The patch seems good to me, but I am not familiar enough with the
> userfaultfd internals to give you a Reviewed-by or an Acked-by. I welcome
> the change anyway.

Thanks ;)

Oleg.


* Re: [PATCH v2 1/1] userfaultfd: don't pin the user memory in userfaultfd_file_create()
  2016-05-17 16:30       ` Oleg Nesterov
@ 2016-05-17 20:34         ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2016-05-17 20:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Andrea Arcangeli, Linus Torvalds, linux-kernel, linux-mm

On Tue 17-05-16 18:30:44, Oleg Nesterov wrote:
> On 05/17, Michal Hocko wrote:
> >
> > On Mon 16-05-16 19:22:54, Oleg Nesterov wrote:
> >
> > > The patch adds a new trivial helper, mmget_not_zero(); it can have more users.
> >
> > Is this really helpful?
> 
> Well, this is subjective of course, but I think the code looks a bit better this
> way. uprobes, fs/proc and more can use this helper too.
> 
> And in fact the initial version of this patch did
> atomic_inc_not_zero(&mm->mm_users) by hand; then it was suggested to add a helper.

I would prefer a more descriptive name (something like mmget_alive) but
as you say this is highly subjective and nothing that should delay this
fix.

-- 
Michal Hocko
SUSE Labs

