All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: replace mmap_sem for mm->exe_file serialization
@ 2015-02-26 19:36 ` Davidlohr Bueso
  0 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2015-02-26 19:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, linux-mm, linux-kernel, dave

We currently use the mmap_sem to serialize the mm exe_file.
This is atrocious and a clear example of the misuses this
lock has all over the place, making any significant changes
to the address space locking that much more complex and tedious.
This also has to do of how we used to check for the vma's vm_file
being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).

This patch, therefore, removes the mmap_sem dependency and
introduces a specific lock for the exe_file (rwlock_t, as it is
read mostly and protects a trivial critical region). As mentioned,
the motivation is to cleanup mmap_sem (as opposed to exe_file
performance). A nice side effect of this is that we avoid taking
the mmap_sem (shared) in fork paths for the exe_file handling
(note that readers block when the rwsem is taken exclusively by
another thread).

Now that callers have been updated and standardized[1, 2] around
the get_mm_set_exe_file() interface, changing the locking scheme
is quite straightforward. The exception being the prctl calls
(ie PR_SET_MM_EXE_FILE). Because this caller actually _updates_
the mm->exe_file, we need to handle it in the same patch that changes
the locking rules. For this we need to reorganize prctl_set_mm_exe_file,
such that:

o mmap_sem is taken when actually needed.

o a new set_mm_exe_file_locked() function is introduced to be used by
  prctl. We now need to explicitly acquire the exe_file_lock as before
  it was implicit in holding the mmap_sem for write.

o a new __prctl_set_mm_exe_file() helper is created, which actually
  does the exe_file handling for the mm side -- needing the write
  lock for updating the mm->flags (*sigh*). In the future we could
  have a unique mm::exe_file_struct and keep track of MMF_EXE_FILE_CHANGED
  on our own.

mm: improve handling of mm->exe_file
[1] Part 1: https://lkml.org/lkml/2015/2/18/721
[2] Part 2: https://lkml.org/lkml/2015/2/25/679

Applies on top of linux-next (20150225).

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 fs/exec.c                |  6 ++++
 include/linux/mm.h       |  3 ++
 include/linux/mm_types.h |  1 +
 kernel/fork.c            | 25 ++++++++++---
 kernel/sys.c             | 92 ++++++++++++++++++++++++++++--------------------
 5 files changed, 84 insertions(+), 43 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index c7f9b73..a5fef83 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1078,7 +1078,13 @@ int flush_old_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
+	/*
+	 * Must be called _before_ exec_mmap() as bprm->mm is
+	 * not visibile until then. This also enables the update
+	 * to be lockless.
+	 */
 	set_mm_exe_file(bprm->mm, bprm->file);
+
 	/*
 	 * Release all of the old mmap stuff
 	 */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47a9392..2a210b2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1907,7 +1907,10 @@ static inline int check_data_rlimit(unsigned long rlim,
 extern int mm_take_all_locks(struct mm_struct *mm);
 extern void mm_drop_all_locks(struct mm_struct *mm);
 
+/* mm->exe_file handling */
 extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
+extern void set_mm_exe_file_locked(struct mm_struct *mm,
+				   struct file *new_exe_file);
 extern struct file *get_mm_exe_file(struct mm_struct *mm);
 
 extern int may_expand_vm(struct mm_struct *mm, unsigned long npages);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 199a03a..74e0b99 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -427,6 +427,7 @@ struct mm_struct {
 #endif
 
 	/* store ref to file /proc/<pid>/exe symlink points to */
+	rwlock_t exe_file_lock;
 	struct file *exe_file;
 #ifdef CONFIG_MMU_NOTIFIER
 	struct mmu_notifier_mm *mmu_notifier_mm;
diff --git a/kernel/fork.c b/kernel/fork.c
index cf65139..ca4499a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -565,6 +565,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 	mmu_notifier_mm_init(mm);
+	rwlock_init(&mm->exe_file_lock);
 	clear_tlb_flush_pending(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
@@ -674,12 +675,25 @@ void mmput(struct mm_struct *mm)
 }
 EXPORT_SYMBOL_GPL(mmput);
 
+void set_mm_exe_file_locked(struct mm_struct *mm, struct file *new_exe_file)
+{
+	if (new_exe_file)
+		get_file(new_exe_file);
+	if (mm->exe_file)
+		fput(mm->exe_file);
+
+	write_lock(&mm->exe_file_lock);
+	mm->exe_file = new_exe_file;
+	write_unlock(&mm->exe_file_lock);
+}
+
 void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 {
 	if (new_exe_file)
 		get_file(new_exe_file);
 	if (mm->exe_file)
 		fput(mm->exe_file);
+
 	mm->exe_file = new_exe_file;
 }
 
@@ -687,19 +701,20 @@ struct file *get_mm_exe_file(struct mm_struct *mm)
 {
 	struct file *exe_file;
 
-	/* We need mmap_sem to protect against races with removal of exe_file */
-	down_read(&mm->mmap_sem);
+	read_lock(&mm->exe_file_lock);
 	exe_file = mm->exe_file;
 	if (exe_file)
 		get_file(exe_file);
-	up_read(&mm->mmap_sem);
+	read_unlock(&mm->exe_file_lock);
 	return exe_file;
 }
 
 static void dup_mm_exe_file(struct mm_struct *oldmm, struct mm_struct *newmm)
 {
-	/* It's safe to write the exe_file pointer without exe_file_lock because
-	 * this is called during fork when the task is not yet in /proc */
+	/*
+	 * It's safe to write the exe_file without the
+	 * exe_file_lock as we are just setting up the new task.
+	 */
 	newmm->exe_file = get_mm_exe_file(oldmm);
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 667b2e6..2ab2d1c 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1646,14 +1646,59 @@ SYSCALL_DEFINE1(umask, int, mask)
 	return mask;
 }
 
-static int prctl_set_mm_exe_file_locked(struct mm_struct *mm, unsigned int fd)
+static int __prctl_set_mm_exe_file(struct mm_struct *mm, struct fd exe)
+{
+	int err;
+	struct file *exe_file;
+
+	/*
+	 * Forbid mm->exe_file change if old file still mapped.
+	 */
+	exe_file = get_mm_exe_file(mm);
+	err = -EBUSY;
+	down_write(&mm->mmap_sem);
+	if (exe_file) {
+		struct vm_area_struct *vma;
+
+		for (vma = mm->mmap; vma; vma = vma->vm_next) {
+			if (!vma->vm_file)
+				continue;
+			if (path_equal(&vma->vm_file->f_path,
+				       &exe_file->f_path)) {
+				fput(exe_file);
+				goto exit_err;
+			}
+		}
+
+		fput(exe_file);
+	}
+
+	/*
+	 * The symlink can be changed only once, just to disallow arbitrary
+	 * transitions malicious software might bring in. This means one
+	 * could make a snapshot over all processes running and monitor
+	 * /proc/pid/exe changes to notice unusual activity if needed.
+	 */
+	err = -EPERM;
+	if (test_and_set_bit(MMF_EXE_FILE_CHANGED, &mm->flags))
+		goto exit_err;
+
+	up_write(&mm->mmap_sem);
+
+	/* this grabs a reference to exe.file */
+	set_mm_exe_file_locked(mm, exe.file);
+	return 0;
+exit_err:
+	up_write(&mm->mmap_sem);
+	return err;
+}
+
+static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
 	struct inode *inode;
 	int err;
 
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
-
 	exe = fdget(fd);
 	if (!exe.file)
 		return -EBADF;
@@ -1674,32 +1719,7 @@ static int prctl_set_mm_exe_file_locked(struct mm_struct *mm, unsigned int fd)
 	if (err)
 		goto exit;
 
-	/*
-	 * Forbid mm->exe_file change if old file still mapped.
-	 */
-	err = -EBUSY;
-	if (mm->exe_file) {
-		struct vm_area_struct *vma;
-
-		for (vma = mm->mmap; vma; vma = vma->vm_next)
-			if (vma->vm_file &&
-			    path_equal(&vma->vm_file->f_path,
-				       &mm->exe_file->f_path))
-				goto exit;
-	}
-
-	/*
-	 * The symlink can be changed only once, just to disallow arbitrary
-	 * transitions malicious software might bring in. This means one
-	 * could make a snapshot over all processes running and monitor
-	 * /proc/pid/exe changes to notice unusual activity if needed.
-	 */
-	err = -EPERM;
-	if (test_and_set_bit(MMF_EXE_FILE_CHANGED, &mm->flags))
-		goto exit;
-
-	err = 0;
-	set_mm_exe_file(mm, exe.file);	/* this grabs a reference to exe.file */
+	err = __prctl_set_mm_exe_file(mm, exe);
 exit:
 	fdput(exe);
 	return err;
@@ -1837,10 +1857,10 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 		user_auxv[AT_VECTOR_SIZE - 1] = AT_NULL;
 	}
 
-	down_write(&mm->mmap_sem);
 	if (prctl_map.exe_fd != (u32)-1)
-		error = prctl_set_mm_exe_file_locked(mm, prctl_map.exe_fd);
-	downgrade_write(&mm->mmap_sem);
+		error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd);
+
+	down_read(&mm->mmap_sem);
 	if (error)
 		goto out;
 
@@ -1906,12 +1926,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
 	if (!capable(CAP_SYS_RESOURCE))
 		return -EPERM;
 
-	if (opt == PR_SET_MM_EXE_FILE) {
-		down_write(&mm->mmap_sem);
-		error = prctl_set_mm_exe_file_locked(mm, (unsigned int)addr);
-		up_write(&mm->mmap_sem);
-		return error;
-	}
+	if (opt == PR_SET_MM_EXE_FILE)
+		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
 
 	if (addr >= TASK_SIZE || addr < mmap_min_addr)
 		return -EINVAL;
-- 
2.1.4




^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] mm: replace mmap_sem for mm->exe_file serialization
@ 2015-02-26 19:36 ` Davidlohr Bueso
  0 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2015-02-26 19:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, linux-mm, linux-kernel, dave

We currently use the mmap_sem to serialize the mm exe_file.
This is atrocious and a clear example of the misuses this
lock has all over the place, making any significant changes
to the address space locking that much more complex and tedious.
This also has to do of how we used to check for the vma's vm_file
being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).

This patch, therefore, removes the mmap_sem dependency and
introduces a specific lock for the exe_file (rwlock_t, as it is
read mostly and protects a trivial critical region). As mentioned,
the motivation is to cleanup mmap_sem (as opposed to exe_file
performance). A nice side effect of this is that we avoid taking
the mmap_sem (shared) in fork paths for the exe_file handling
(note that readers block when the rwsem is taken exclusively by
another thread).

Now that callers have been updated and standardized[1, 2] around
the get_mm_set_exe_file() interface, changing the locking scheme
is quite straightforward. The exception being the prctl calls
(ie PR_SET_MM_EXE_FILE). Because this caller actually _updates_
the mm->exe_file, we need to handle it in the same patch that changes
the locking rules. For this we need to reorganize prctl_set_mm_exe_file,
such that:

o mmap_sem is taken when actually needed.

o a new set_mm_exe_file_locked() function is introduced to be used by
  prctl. We now need to explicitly acquire the exe_file_lock as before
  it was implicit in holding the mmap_sem for write.

o a new __prctl_set_mm_exe_file() helper is created, which actually
  does the exe_file handling for the mm side -- needing the write
  lock for updating the mm->flags (*sigh*). In the future we could
  have a unique mm::exe_file_struct and keep track of MMF_EXE_FILE_CHANGED
  on our own.

mm: improve handling of mm->exe_file
[1] Part 1: https://lkml.org/lkml/2015/2/18/721
[2] Part 2: https://lkml.org/lkml/2015/2/25/679

Applies on top of linux-next (20150225).

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 fs/exec.c                |  6 ++++
 include/linux/mm.h       |  3 ++
 include/linux/mm_types.h |  1 +
 kernel/fork.c            | 25 ++++++++++---
 kernel/sys.c             | 92 ++++++++++++++++++++++++++++--------------------
 5 files changed, 84 insertions(+), 43 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index c7f9b73..a5fef83 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1078,7 +1078,13 @@ int flush_old_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
+	/*
+	 * Must be called _before_ exec_mmap() as bprm->mm is
+	 * not visibile until then. This also enables the update
+	 * to be lockless.
+	 */
 	set_mm_exe_file(bprm->mm, bprm->file);
+
 	/*
 	 * Release all of the old mmap stuff
 	 */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47a9392..2a210b2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1907,7 +1907,10 @@ static inline int check_data_rlimit(unsigned long rlim,
 extern int mm_take_all_locks(struct mm_struct *mm);
 extern void mm_drop_all_locks(struct mm_struct *mm);
 
+/* mm->exe_file handling */
 extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
+extern void set_mm_exe_file_locked(struct mm_struct *mm,
+				   struct file *new_exe_file);
 extern struct file *get_mm_exe_file(struct mm_struct *mm);
 
 extern int may_expand_vm(struct mm_struct *mm, unsigned long npages);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 199a03a..74e0b99 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -427,6 +427,7 @@ struct mm_struct {
 #endif
 
 	/* store ref to file /proc/<pid>/exe symlink points to */
+	rwlock_t exe_file_lock;
 	struct file *exe_file;
 #ifdef CONFIG_MMU_NOTIFIER
 	struct mmu_notifier_mm *mmu_notifier_mm;
diff --git a/kernel/fork.c b/kernel/fork.c
index cf65139..ca4499a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -565,6 +565,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 	mmu_notifier_mm_init(mm);
+	rwlock_init(&mm->exe_file_lock);
 	clear_tlb_flush_pending(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
@@ -674,12 +675,25 @@ void mmput(struct mm_struct *mm)
 }
 EXPORT_SYMBOL_GPL(mmput);
 
+void set_mm_exe_file_locked(struct mm_struct *mm, struct file *new_exe_file)
+{
+	if (new_exe_file)
+		get_file(new_exe_file);
+	if (mm->exe_file)
+		fput(mm->exe_file);
+
+	write_lock(&mm->exe_file_lock);
+	mm->exe_file = new_exe_file;
+	write_unlock(&mm->exe_file_lock);
+}
+
 void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 {
 	if (new_exe_file)
 		get_file(new_exe_file);
 	if (mm->exe_file)
 		fput(mm->exe_file);
+
 	mm->exe_file = new_exe_file;
 }
 
@@ -687,19 +701,20 @@ struct file *get_mm_exe_file(struct mm_struct *mm)
 {
 	struct file *exe_file;
 
-	/* We need mmap_sem to protect against races with removal of exe_file */
-	down_read(&mm->mmap_sem);
+	read_lock(&mm->exe_file_lock);
 	exe_file = mm->exe_file;
 	if (exe_file)
 		get_file(exe_file);
-	up_read(&mm->mmap_sem);
+	read_unlock(&mm->exe_file_lock);
 	return exe_file;
 }
 
 static void dup_mm_exe_file(struct mm_struct *oldmm, struct mm_struct *newmm)
 {
-	/* It's safe to write the exe_file pointer without exe_file_lock because
-	 * this is called during fork when the task is not yet in /proc */
+	/*
+	 * It's safe to write the exe_file without the
+	 * exe_file_lock as we are just setting up the new task.
+	 */
 	newmm->exe_file = get_mm_exe_file(oldmm);
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 667b2e6..2ab2d1c 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1646,14 +1646,59 @@ SYSCALL_DEFINE1(umask, int, mask)
 	return mask;
 }
 
-static int prctl_set_mm_exe_file_locked(struct mm_struct *mm, unsigned int fd)
+static int __prctl_set_mm_exe_file(struct mm_struct *mm, struct fd exe)
+{
+	int err;
+	struct file *exe_file;
+
+	/*
+	 * Forbid mm->exe_file change if old file still mapped.
+	 */
+	exe_file = get_mm_exe_file(mm);
+	err = -EBUSY;
+	down_write(&mm->mmap_sem);
+	if (exe_file) {
+		struct vm_area_struct *vma;
+
+		for (vma = mm->mmap; vma; vma = vma->vm_next) {
+			if (!vma->vm_file)
+				continue;
+			if (path_equal(&vma->vm_file->f_path,
+				       &exe_file->f_path)) {
+				fput(exe_file);
+				goto exit_err;
+			}
+		}
+
+		fput(exe_file);
+	}
+
+	/*
+	 * The symlink can be changed only once, just to disallow arbitrary
+	 * transitions malicious software might bring in. This means one
+	 * could make a snapshot over all processes running and monitor
+	 * /proc/pid/exe changes to notice unusual activity if needed.
+	 */
+	err = -EPERM;
+	if (test_and_set_bit(MMF_EXE_FILE_CHANGED, &mm->flags))
+		goto exit_err;
+
+	up_write(&mm->mmap_sem);
+
+	/* this grabs a reference to exe.file */
+	set_mm_exe_file_locked(mm, exe.file);
+	return 0;
+exit_err:
+	up_write(&mm->mmap_sem);
+	return err;
+}
+
+static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
 	struct inode *inode;
 	int err;
 
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
-
 	exe = fdget(fd);
 	if (!exe.file)
 		return -EBADF;
@@ -1674,32 +1719,7 @@ static int prctl_set_mm_exe_file_locked(struct mm_struct *mm, unsigned int fd)
 	if (err)
 		goto exit;
 
-	/*
-	 * Forbid mm->exe_file change if old file still mapped.
-	 */
-	err = -EBUSY;
-	if (mm->exe_file) {
-		struct vm_area_struct *vma;
-
-		for (vma = mm->mmap; vma; vma = vma->vm_next)
-			if (vma->vm_file &&
-			    path_equal(&vma->vm_file->f_path,
-				       &mm->exe_file->f_path))
-				goto exit;
-	}
-
-	/*
-	 * The symlink can be changed only once, just to disallow arbitrary
-	 * transitions malicious software might bring in. This means one
-	 * could make a snapshot over all processes running and monitor
-	 * /proc/pid/exe changes to notice unusual activity if needed.
-	 */
-	err = -EPERM;
-	if (test_and_set_bit(MMF_EXE_FILE_CHANGED, &mm->flags))
-		goto exit;
-
-	err = 0;
-	set_mm_exe_file(mm, exe.file);	/* this grabs a reference to exe.file */
+	err = __prctl_set_mm_exe_file(mm, exe);
 exit:
 	fdput(exe);
 	return err;
@@ -1837,10 +1857,10 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 		user_auxv[AT_VECTOR_SIZE - 1] = AT_NULL;
 	}
 
-	down_write(&mm->mmap_sem);
 	if (prctl_map.exe_fd != (u32)-1)
-		error = prctl_set_mm_exe_file_locked(mm, prctl_map.exe_fd);
-	downgrade_write(&mm->mmap_sem);
+		error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd);
+
+	down_read(&mm->mmap_sem);
 	if (error)
 		goto out;
 
@@ -1906,12 +1926,8 @@ static int prctl_set_mm(int opt, unsigned long addr,
 	if (!capable(CAP_SYS_RESOURCE))
 		return -EPERM;
 
-	if (opt == PR_SET_MM_EXE_FILE) {
-		down_write(&mm->mmap_sem);
-		error = prctl_set_mm_exe_file_locked(mm, (unsigned int)addr);
-		up_write(&mm->mmap_sem);
-		return error;
-	}
+	if (opt == PR_SET_MM_EXE_FILE)
+		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
 
 	if (addr >= TASK_SIZE || addr < mmap_min_addr)
 		return -EINVAL;
-- 
2.1.4



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
  2015-02-26 19:36 ` Davidlohr Bueso
@ 2015-02-26 20:51   ` Cyrill Gorcunov
  -1 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2015-02-26 20:51 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Andrew Morton, Alexander Viro, linux-mm, linux-kernel, dave,
	Oleg Nesterov

On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
> We currently use the mmap_sem to serialize the mm exe_file.
> This is atrocious and a clear example of the misuses this
> lock has all over the place, making any significant changes
> to the address space locking that much more complex and tedious.
> This also has to do of how we used to check for the vma's vm_file
> being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
> 
> This patch, therefore, removes the mmap_sem dependency and
> introduces a specific lock for the exe_file (rwlock_t, as it is
> read mostly and protects a trivial critical region). As mentioned,
> the motivation is to cleanup mmap_sem (as opposed to exe_file
> performance). A nice side effect of this is that we avoid taking
> the mmap_sem (shared) in fork paths for the exe_file handling
> (note that readers block when the rwsem is taken exclusively by
> another thread).
> 
> Now that callers have been updated and standardized[1, 2] around
> the get_mm_set_exe_file() interface, changing the locking scheme
> is quite straightforward. The exception being the prctl calls
> (ie PR_SET_MM_EXE_FILE). Because this caller actually _updates_
> the mm->exe_file, we need to handle it in the same patch that changes
> the locking rules. For this we need to reorganize prctl_set_mm_exe_file,
> such that:
> 
> o mmap_sem is taken when actually needed.
> 
> o a new set_mm_exe_file_locked() function is introduced to be used by
>   prctl. We now need to explicitly acquire the exe_file_lock as before
>   it was implicit in holding the mmap_sem for write.
> 
> o a new __prctl_set_mm_exe_file() helper is created, which actually
>   does the exe_file handling for the mm side -- needing the write
>   lock for updating the mm->flags (*sigh*). In the future we could
>   have a unique mm::exe_file_struct and keep track of MMF_EXE_FILE_CHANGED
>   on our own.
> 
> mm: improve handling of mm->exe_file
> [1] Part 1: https://lkml.org/lkml/2015/2/18/721
> [2] Part 2: https://lkml.org/lkml/2015/2/25/679
> 
> Applies on top of linux-next (20150225).
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

Hi Davidlohr, it would be interesting to know if the cleanup
bring some performance benefit?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
@ 2015-02-26 20:51   ` Cyrill Gorcunov
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov @ 2015-02-26 20:51 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Andrew Morton, Alexander Viro, linux-mm, linux-kernel, dave,
	Oleg Nesterov

On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
> We currently use the mmap_sem to serialize the mm exe_file.
> This is atrocious and a clear example of the misuses this
> lock has all over the place, making any significant changes
> to the address space locking that much more complex and tedious.
> This also has to do of how we used to check for the vma's vm_file
> being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
> 
> This patch, therefore, removes the mmap_sem dependency and
> introduces a specific lock for the exe_file (rwlock_t, as it is
> read mostly and protects a trivial critical region). As mentioned,
> the motivation is to cleanup mmap_sem (as opposed to exe_file
> performance). A nice side effect of this is that we avoid taking
> the mmap_sem (shared) in fork paths for the exe_file handling
> (note that readers block when the rwsem is taken exclusively by
> another thread).
> 
> Now that callers have been updated and standardized[1, 2] around
> the get_mm_set_exe_file() interface, changing the locking scheme
> is quite straightforward. The exception being the prctl calls
> (ie PR_SET_MM_EXE_FILE). Because this caller actually _updates_
> the mm->exe_file, we need to handle it in the same patch that changes
> the locking rules. For this we need to reorganize prctl_set_mm_exe_file,
> such that:
> 
> o mmap_sem is taken when actually needed.
> 
> o a new set_mm_exe_file_locked() function is introduced to be used by
>   prctl. We now need to explicitly acquire the exe_file_lock as before
>   it was implicit in holding the mmap_sem for write.
> 
> o a new __prctl_set_mm_exe_file() helper is created, which actually
>   does the exe_file handling for the mm side -- needing the write
>   lock for updating the mm->flags (*sigh*). In the future we could
>   have a unique mm::exe_file_struct and keep track of MMF_EXE_FILE_CHANGED
>   on our own.
> 
> mm: improve handling of mm->exe_file
> [1] Part 1: https://lkml.org/lkml/2015/2/18/721
> [2] Part 2: https://lkml.org/lkml/2015/2/25/679
> 
> Applies on top of linux-next (20150225).
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

Hi Davidlohr, it would be interesting to know if the cleanup
bring some performance benefit?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
  2015-02-26 20:51   ` Cyrill Gorcunov
@ 2015-02-27 17:36     ` Oleg Nesterov
  -1 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2015-02-27 17:36 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Davidlohr Bueso, Andrew Morton, Alexander Viro, linux-mm,
	linux-kernel, dave

On 02/26, Cyrill Gorcunov wrote:
>
> On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
> > We currently use the mmap_sem to serialize the mm exe_file.
> > This is atrocious and a clear example of the misuses this
> > lock has all over the place, making any significant changes
> > to the address space locking that much more complex and tedious.
> > This also has to do of how we used to check for the vma's vm_file
> > being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
> >
> > This patch, therefore, removes the mmap_sem dependency and
> > introduces a specific lock for the exe_file (rwlock_t, as it is
> > read mostly and protects a trivial critical region). As mentioned,
> > the motivation is to cleanup mmap_sem (as opposed to exe_file
> > performance).

Well, I didn't see the patch, can't really comment.

But I have to admit that this looks as atrocious and a clear example of
"lets add yet another random lock which we will regret about later" ;)

rwlock_t in mm_struct just to serialize access to exe_file?

> A nice side effect of this is that we avoid taking
> > the mmap_sem (shared) in fork paths for the exe_file handling
> > (note that readers block when the rwsem is taken exclusively by
> > another thread).

Yes, this is ugly. Can't we kill this dup_mm_exe_file() and copy change
dup_mmap() to also dup ->exe_file ?

> Hi Davidlohr, it would be interesting to know if the cleanup
> bring some performance benefit?

To me the main question is whether the patch makes this code simpler
or uglier ;)

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
@ 2015-02-27 17:36     ` Oleg Nesterov
  0 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2015-02-27 17:36 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Davidlohr Bueso, Andrew Morton, Alexander Viro, linux-mm,
	linux-kernel, dave

On 02/26, Cyrill Gorcunov wrote:
>
> On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
> > We currently use the mmap_sem to serialize the mm exe_file.
> > This is atrocious and a clear example of the misuses this
> > lock has all over the place, making any significant changes
> > to the address space locking that much more complex and tedious.
> > This also has to do of how we used to check for the vma's vm_file
> > being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
> >
> > This patch, therefore, removes the mmap_sem dependency and
> > introduces a specific lock for the exe_file (rwlock_t, as it is
> > read mostly and protects a trivial critical region). As mentioned,
> > the motivation is to cleanup mmap_sem (as opposed to exe_file
> > performance).

Well, I didn't see the patch, can't really comment.

But I have to admit that this looks as atrocious and a clear example of
"lets add yet another random lock which we will regret about later" ;)

rwlock_t in mm_struct just to serialize access to exe_file?

> A nice side effect of this is that we avoid taking
> > the mmap_sem (shared) in fork paths for the exe_file handling
> > (note that readers block when the rwsem is taken exclusively by
> > another thread).

Yes, this is ugly. Can't we kill this dup_mm_exe_file() and copy change
dup_mmap() to also dup ->exe_file ?

> Hi Davidlohr, it would be interesting to know if the cleanup
> bring some performance benefit?

To me the main question is whether the patch makes this code simpler
or uglier ;)

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
  2015-02-27 17:36     ` Oleg Nesterov
@ 2015-02-27 18:34       ` Davidlohr Bueso
  -1 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2015-02-27 18:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Cyrill Gorcunov, Davidlohr Bueso, Andrew Morton, Alexander Viro,
	linux-mm, linux-kernel

On Fri, 2015-02-27 at 18:36 +0100, Oleg Nesterov wrote:
> On 02/26, Cyrill Gorcunov wrote:
> >
> > On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
> > > We currently use the mmap_sem to serialize the mm exe_file.
> > > This is atrocious and a clear example of the misuses this
> > > lock has all over the place, making any significant changes
> > > to the address space locking that much more complex and tedious.
> > > This also has to do of how we used to check for the vma's vm_file
> > > being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
> > >
> > > This patch, therefore, removes the mmap_sem dependency and
> > > introduces a specific lock for the exe_file (rwlock_t, as it is
> > > read mostly and protects a trivial critical region). As mentioned,
> > > the motivation is to cleanup mmap_sem (as opposed to exe_file
> > > performance).
> 
> Well, I didn't see the patch, can't really comment.
> 
> But I have to admit that this looks as atrocious and a clear example of
> "lets add yet another random lock which we will regret about later" ;)
> 
> rwlock_t in mm_struct just to serialize access to exe_file?

I don't see why this is a random lock nor how would we regret this
later. I regret having to do these kind of patches because people were
lazy and just relied on mmap_sem without thinking beyond their use case.
As mentioned I'm also planning on creating an own sort of
exe_file_struct, which would be an isolated entity (still in the mm
though), with its own locking and prctl bits, that would tidy mm_struct
a bit. RCU was something else I considered, but it doesn't suite well in
all paths and we would still need a spinlock when updating the file
anyway.

If you have a better suggestion please do tell.

> 
> > A nice side effect of this is that we avoid taking
> > > the mmap_sem (shared) in fork paths for the exe_file handling
> > > (note that readers block when the rwsem is taken exclusively by
> > > another thread).
> 
> Yes, this is ugly. Can't we kill this dup_mm_exe_file() and copy change
> dup_mmap() to also dup ->exe_file ?
> 
> > Hi Davidlohr, it would be interesting to know if the cleanup
> > bring some performance benefit?
> 
> To me the main question is whether the patch makes this code simpler
> or uglier ;)

Its much beyond that. As mentioned, for any significant changes to the
mmap_sem locking scheme, this sort of thing needs to be addressed first.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
@ 2015-02-27 18:34       ` Davidlohr Bueso
  0 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2015-02-27 18:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Cyrill Gorcunov, Davidlohr Bueso, Andrew Morton, Alexander Viro,
	linux-mm, linux-kernel

On Fri, 2015-02-27 at 18:36 +0100, Oleg Nesterov wrote:
> On 02/26, Cyrill Gorcunov wrote:
> >
> > On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
> > > We currently use the mmap_sem to serialize the mm exe_file.
> > > This is atrocious and a clear example of the misuses this
> > > lock has all over the place, making any significant changes
> > > to the address space locking that much more complex and tedious.
> > > This also has to do of how we used to check for the vma's vm_file
> > > being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
> > >
> > > This patch, therefore, removes the mmap_sem dependency and
> > > introduces a specific lock for the exe_file (rwlock_t, as it is
> > > read mostly and protects a trivial critical region). As mentioned,
> > > the motivation is to cleanup mmap_sem (as opposed to exe_file
> > > performance).
> 
> Well, I didn't see the patch, can't really comment.
> 
> But I have to admit that this looks as atrocious and a clear example of
> "lets add yet another random lock which we will regret about later" ;)
> 
> rwlock_t in mm_struct just to serialize access to exe_file?

I don't see why this is a random lock nor how would we regret this
later. I regret having to do these kind of patches because people were
lazy and just relied on mmap_sem without thinking beyond their use case.
As mentioned I'm also planning on creating an own sort of
exe_file_struct, which would be an isolated entity (still in the mm
though), with its own locking and prctl bits, that would tidy mm_struct
a bit. RCU was something else I considered, but it doesn't suite well in
all paths and we would still need a spinlock when updating the file
anyway.

If you have a better suggestion please do tell.

> 
> > A nice side effect of this is that we avoid taking
> > > the mmap_sem (shared) in fork paths for the exe_file handling
> > > (note that readers block when the rwsem is taken exclusively by
> > > another thread).
> 
> Yes, this is ugly. Can't we kill this dup_mm_exe_file() and copy change
> dup_mmap() to also dup ->exe_file ?
> 
> > Hi Davidlohr, it would be interesting to know if the cleanup
> > bring some performance benefit?
> 
> To me the main question is whether the patch makes this code simpler
> or uglier ;)

Its much beyond that. As mentioned, for any significant changes to the
mmap_sem locking scheme, this sort of thing needs to be addressed first.

Thanks,
Davidlohr

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
  2015-02-27 18:34       ` Davidlohr Bueso
@ 2015-03-11 12:21         ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 14+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-11 12:21 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Oleg Nesterov, Cyrill Gorcunov, Davidlohr Bueso, Andrew Morton,
	Alexander Viro, linux-mm, Linux Kernel Mailing List

On Fri, Feb 27, 2015 at 9:34 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
> On Fri, 2015-02-27 at 18:36 +0100, Oleg Nesterov wrote:
>> On 02/26, Cyrill Gorcunov wrote:
>> >
>> > On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
>> > > We currently use the mmap_sem to serialize the mm exe_file.
>> > > This is atrocious and a clear example of the misuses this
>> > > lock has all over the place, making any significant changes
>> > > to the address space locking that much more complex and tedious.
>> > > This also has to do of how we used to check for the vma's vm_file
>> > > being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
>> > >
>> > > This patch, therefore, removes the mmap_sem dependency and
>> > > introduces a specific lock for the exe_file (rwlock_t, as it is
>> > > read mostly and protects a trivial critical region). As mentioned,
>> > > the motivation is to cleanup mmap_sem (as opposed to exe_file
>> > > performance).
>>
>> Well, I didn't see the patch, can't really comment.
>>
>> But I have to admit that this looks as atrocious and a clear example of
>> "lets add yet another random lock which we will regret about later" ;)
>>
>> rwlock_t in mm_struct just to serialize access to exe_file?
>
> I don't see why this is a random lock nor how would we regret this
> later. I regret having to do these kind of patches because people were
> lazy and just relied on mmap_sem without thinking beyond their use case.

That's history: exe_file had direct relation to mm->mmap_sem,
that was file from first executable vma. After my patch it's less
related to vmas.

> As mentioned I'm also planning on creating an own sort of
> exe_file_struct, which would be an isolated entity (still in the mm
> though), with its own locking and prctl bits, that would tidy mm_struct
> a bit. RCU was something else I considered, but it doesn't suite well in
> all paths and we would still need a spinlock when updating the file
> anyway.

Please don't. What's wrong with mmap_sem?

Do you want optimize reading mm->exe_file?
Then you should use rcu for that: struct file is rcu-protected thing.
See fget(), you could do something like that.

>
> If you have a better suggestion please do tell.
>
>>
>> > A nice side effect of this is that we avoid taking
>> > > the mmap_sem (shared) in fork paths for the exe_file handling
>> > > (note that readers block when the rwsem is taken exclusively by
>> > > another thread).
>>
>> Yes, this is ugly. Can't we kill this dup_mm_exe_file() and copy change
>> dup_mmap() to also dup ->exe_file ?
>>
>> > Hi Davidlohr, it would be interesting to know if the cleanup
>> > bring some performance benefit?
>>
>> To me the main question is whether the patch makes this code simpler
>> or uglier ;)
>
> Its much beyond that. As mentioned, for any significant changes to the
> mmap_sem locking scheme, this sort of thing needs to be addressed first.
>
> Thanks,
> Davidlohr
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
@ 2015-03-11 12:21         ` Konstantin Khlebnikov
  0 siblings, 0 replies; 14+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-11 12:21 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Oleg Nesterov, Cyrill Gorcunov, Davidlohr Bueso, Andrew Morton,
	Alexander Viro, linux-mm, Linux Kernel Mailing List

On Fri, Feb 27, 2015 at 9:34 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
> On Fri, 2015-02-27 at 18:36 +0100, Oleg Nesterov wrote:
>> On 02/26, Cyrill Gorcunov wrote:
>> >
>> > On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
>> > > We currently use the mmap_sem to serialize the mm exe_file.
>> > > This is atrocious and a clear example of the misuses this
>> > > lock has all over the place, making any significant changes
>> > > to the address space locking that much more complex and tedious.
>> > > This also has to do of how we used to check for the vma's vm_file
>> > > being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
>> > >
>> > > This patch, therefore, removes the mmap_sem dependency and
>> > > introduces a specific lock for the exe_file (rwlock_t, as it is
>> > > read mostly and protects a trivial critical region). As mentioned,
>> > > the motivation is to cleanup mmap_sem (as opposed to exe_file
>> > > performance).
>>
>> Well, I didn't see the patch, can't really comment.
>>
>> But I have to admit that this looks as atrocious and a clear example of
>> "lets add yet another random lock which we will regret about later" ;)
>>
>> rwlock_t in mm_struct just to serialize access to exe_file?
>
> I don't see why this is a random lock nor how would we regret this
> later. I regret having to do these kind of patches because people were
> lazy and just relied on mmap_sem without thinking beyond their use case.

That's history: exe_file had direct relation to mm->mmap_sem,
that was file from first executable vma. After my patch it's less
related to vmas.

> As mentioned I'm also planning on creating an own sort of
> exe_file_struct, which would be an isolated entity (still in the mm
> though), with its own locking and prctl bits, that would tidy mm_struct
> a bit. RCU was something else I considered, but it doesn't suite well in
> all paths and we would still need a spinlock when updating the file
> anyway.

Please don't. What's wrong with mmap_sem?

Do you want optimize reading mm->exe_file?
Then you should use rcu for that: struct file is rcu-protected thing.
See fget(), you could do something like that.

>
> If you have a better suggestion please do tell.
>
>>
>> > A nice side effect of this is that we avoid taking
>> > > the mmap_sem (shared) in fork paths for the exe_file handling
>> > > (note that readers block when the rwsem is taken exclusively by
>> > > another thread).
>>
>> Yes, this is ugly. Can't we kill this dup_mm_exe_file() and copy change
>> dup_mmap() to also dup ->exe_file ?
>>
>> > Hi Davidlohr, it would be interesting to know if the cleanup
>> > bring some performance benefit?
>>
>> To me the main question is whether the patch makes this code simpler
>> or uglier ;)
>
> Its much beyond that. As mentioned, for any significant changes to the
> mmap_sem locking scheme, this sort of thing needs to be addressed first.
>
> Thanks,
> Davidlohr
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
  2015-03-11 12:21         ` Konstantin Khlebnikov
@ 2015-03-11 12:40           ` Davidlohr Bueso
  -1 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2015-03-11 12:40 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Oleg Nesterov, Cyrill Gorcunov, Davidlohr Bueso, Andrew Morton,
	Alexander Viro, linux-mm, Linux Kernel Mailing List

On Wed, 2015-03-11 at 15:21 +0300, Konstantin Khlebnikov wrote:
> On Fri, Feb 27, 2015 at 9:34 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
> > On Fri, 2015-02-27 at 18:36 +0100, Oleg Nesterov wrote:
> >> On 02/26, Cyrill Gorcunov wrote:
> >> >
> >> > On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
> >> > > We currently use the mmap_sem to serialize the mm exe_file.
> >> > > This is atrocious and a clear example of the misuses this
> >> > > lock has all over the place, making any significant changes
> >> > > to the address space locking that much more complex and tedious.
> >> > > This also has to do of how we used to check for the vma's vm_file
> >> > > being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
> >> > >
> >> > > This patch, therefore, removes the mmap_sem dependency and
> >> > > introduces a specific lock for the exe_file (rwlock_t, as it is
> >> > > read mostly and protects a trivial critical region). As mentioned,
> >> > > the motivation is to cleanup mmap_sem (as opposed to exe_file
> >> > > performance).
> >>
> >> Well, I didn't see the patch, can't really comment.
> >>
> >> But I have to admit that this looks as atrocious and a clear example of
> >> "lets add yet another random lock which we will regret about later" ;)
> >>
> >> rwlock_t in mm_struct just to serialize access to exe_file?
> >
> > I don't see why this is a random lock nor how would we regret this
> > later. I regret having to do these kind of patches because people were
> > lazy and just relied on mmap_sem without thinking beyond their use case.
> 
> That's history: exe_file had direct relation to mm->mmap_sem,
> that was file from first executable vma. After my patch it's less
> related to vmas.

Indeed. Yet I'm not changing the exe_file address space semantics at
all.

> 
> > As mentioned I'm also planning on creating an own sort of
> > exe_file_struct, which would be an isolated entity (still in the mm
> > though), with its own locking and prctl bits, that would tidy mm_struct
> > a bit. RCU was something else I considered, but it doesn't suite well in
> > all paths and we would still need a spinlock when updating the file
> > anyway.
> 
> Please don't. What's wrong with mmap_sem?
> 
> Do you want optimize reading mm->exe_file?

No, I want to get rid of certain things being done under mmap_sem,
that's all. This is not performance motivated, it's to allow future work
on lock breaking. I've just yesterday explained this at lsfmm (and not
only related to exe_file). In any case I've clean up this patch and
added more on top to create a friendlier interface, I'll send that out a
bit later.

> Then you should use rcu for that: struct file is rcu-protected thing.
> See fget(), you could do something like that.

As mentioned, not all exe paths are RCU friendly ;) We'd at least need
srcu, but that's neither here nor there. A rwlock is suficient to get
the job done and we really need not care much about optimizing this
particular file further.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
@ 2015-03-11 12:40           ` Davidlohr Bueso
  0 siblings, 0 replies; 14+ messages in thread
From: Davidlohr Bueso @ 2015-03-11 12:40 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Oleg Nesterov, Cyrill Gorcunov, Davidlohr Bueso, Andrew Morton,
	Alexander Viro, linux-mm, Linux Kernel Mailing List

On Wed, 2015-03-11 at 15:21 +0300, Konstantin Khlebnikov wrote:
> On Fri, Feb 27, 2015 at 9:34 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
> > On Fri, 2015-02-27 at 18:36 +0100, Oleg Nesterov wrote:
> >> On 02/26, Cyrill Gorcunov wrote:
> >> >
> >> > On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
> >> > > We currently use the mmap_sem to serialize the mm exe_file.
> >> > > This is atrocious and a clear example of the misuses this
> >> > > lock has all over the place, making any significant changes
> >> > > to the address space locking that much more complex and tedious.
> >> > > This also has to do of how we used to check for the vma's vm_file
> >> > > being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
> >> > >
> >> > > This patch, therefore, removes the mmap_sem dependency and
> >> > > introduces a specific lock for the exe_file (rwlock_t, as it is
> >> > > read mostly and protects a trivial critical region). As mentioned,
> >> > > the motivation is to cleanup mmap_sem (as opposed to exe_file
> >> > > performance).
> >>
> >> Well, I didn't see the patch, can't really comment.
> >>
> >> But I have to admit that this looks as atrocious and a clear example of
> >> "lets add yet another random lock which we will regret about later" ;)
> >>
> >> rwlock_t in mm_struct just to serialize access to exe_file?
> >
> > I don't see why this is a random lock nor how would we regret this
> > later. I regret having to do these kind of patches because people were
> > lazy and just relied on mmap_sem without thinking beyond their use case.
> 
> That's history: exe_file had direct relation to mm->mmap_sem,
> that was file from first executable vma. After my patch it's less
> related to vmas.

Indeed. Yet I'm not changing the exe_file address space semantics at
all.

> 
> > As mentioned I'm also planning on creating an own sort of
> > exe_file_struct, which would be an isolated entity (still in the mm
> > though), with its own locking and prctl bits, that would tidy mm_struct
> > a bit. RCU was something else I considered, but it doesn't suite well in
> > all paths and we would still need a spinlock when updating the file
> > anyway.
> 
> Please don't. What's wrong with mmap_sem?
> 
> Do you want optimize reading mm->exe_file?

No, I want to get rid of certain things being done under mmap_sem,
that's all. This is not performance motivated, it's to allow future work
on lock breaking. I've just yesterday explained this at lsfmm (and not
only related to exe_file). In any case I've clean up this patch and
added more on top to create a friendlier interface, I'll send that out a
bit later.

> Then you should use rcu for that: struct file is rcu-protected thing.
> See fget(), you could do something like that.

As mentioned, not all exe paths are RCU friendly ;) We'd at least need
srcu, but that's neither here nor there. A rwlock is suficient to get
the job done and we really need not care much about optimizing this
particular file further.

Thanks,
Davidlohr

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
  2015-03-11 12:40           ` Davidlohr Bueso
@ 2015-03-11 13:26             ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 14+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-11 13:26 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Oleg Nesterov, Cyrill Gorcunov, Davidlohr Bueso, Andrew Morton,
	Alexander Viro, linux-mm, Linux Kernel Mailing List

On Wed, Mar 11, 2015 at 3:40 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
> On Wed, 2015-03-11 at 15:21 +0300, Konstantin Khlebnikov wrote:
>> On Fri, Feb 27, 2015 at 9:34 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
>> > On Fri, 2015-02-27 at 18:36 +0100, Oleg Nesterov wrote:
>> >> On 02/26, Cyrill Gorcunov wrote:
>> >> >
>> >> > On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
>> >> > > We currently use the mmap_sem to serialize the mm exe_file.
>> >> > > This is atrocious and a clear example of the misuses this
>> >> > > lock has all over the place, making any significant changes
>> >> > > to the address space locking that much more complex and tedious.
>> >> > > This also has to do of how we used to check for the vma's vm_file
>> >> > > being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
>> >> > >
>> >> > > This patch, therefore, removes the mmap_sem dependency and
>> >> > > introduces a specific lock for the exe_file (rwlock_t, as it is
>> >> > > read mostly and protects a trivial critical region). As mentioned,
>> >> > > the motivation is to cleanup mmap_sem (as opposed to exe_file
>> >> > > performance).
>> >>
>> >> Well, I didn't see the patch, can't really comment.
>> >>
>> >> But I have to admit that this looks as atrocious and a clear example of
>> >> "lets add yet another random lock which we will regret about later" ;)
>> >>
>> >> rwlock_t in mm_struct just to serialize access to exe_file?
>> >
>> > I don't see why this is a random lock nor how would we regret this
>> > later. I regret having to do these kind of patches because people were
>> > lazy and just relied on mmap_sem without thinking beyond their use case.
>>
>> That's history: exe_file had direct relation to mm->mmap_sem,
>> that was file from first executable vma. After my patch it's less
>> related to vmas.
>
> Indeed. Yet I'm not changing the exe_file address space semantics at
> all.
>
>>
>> > As mentioned I'm also planning on creating an own sort of
>> > exe_file_struct, which would be an isolated entity (still in the mm
>> > though), with its own locking and prctl bits, that would tidy mm_struct
>> > a bit. RCU was something else I considered, but it doesn't suite well in
>> > all paths and we would still need a spinlock when updating the file
>> > anyway.
>>
>> Please don't. What's wrong with mmap_sem?
>>
>> Do you want optimize reading mm->exe_file?
>
> No, I want to get rid of certain things being done under mmap_sem,
> that's all. This is not performance motivated, it's to allow future work
> on lock breaking. I've just yesterday explained this at lsfmm (and not
> only related to exe_file). In any case I've clean up this patch and
> added more on top to create a friendlier interface, I'll send that out a
> bit later.
>
>> Then you should use rcu for that: struct file is rcu-protected thing.
>> See fget(), you could do something like that.
>
> As mentioned, not all exe paths are RCU friendly ;) We'd at least need
> srcu, but that's neither here nor there. A rwlock is suficient to get
> the job done and we really need not care much about optimizing this
> particular file further.

I mean you could make mm->exe_file rcu protected pointer and use
everywhere get_mm_exe_file() which grabs file refcount under rcu and
returns pointer.

>
> Thanks,
> Davidlohr
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm: replace mmap_sem for mm->exe_file serialization
@ 2015-03-11 13:26             ` Konstantin Khlebnikov
  0 siblings, 0 replies; 14+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-11 13:26 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Oleg Nesterov, Cyrill Gorcunov, Davidlohr Bueso, Andrew Morton,
	Alexander Viro, linux-mm, Linux Kernel Mailing List

On Wed, Mar 11, 2015 at 3:40 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
> On Wed, 2015-03-11 at 15:21 +0300, Konstantin Khlebnikov wrote:
>> On Fri, Feb 27, 2015 at 9:34 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
>> > On Fri, 2015-02-27 at 18:36 +0100, Oleg Nesterov wrote:
>> >> On 02/26, Cyrill Gorcunov wrote:
>> >> >
>> >> > On Thu, Feb 26, 2015 at 11:36:57AM -0800, Davidlohr Bueso wrote:
>> >> > > We currently use the mmap_sem to serialize the mm exe_file.
>> >> > > This is atrocious and a clear example of the misuses this
>> >> > > lock has all over the place, making any significant changes
>> >> > > to the address space locking that much more complex and tedious.
>> >> > > This also has to do of how we used to check for the vma's vm_file
>> >> > > being VM_EXECUTABLE (much of which was replaced by 2dd8ad81e31).
>> >> > >
>> >> > > This patch, therefore, removes the mmap_sem dependency and
>> >> > > introduces a specific lock for the exe_file (rwlock_t, as it is
>> >> > > read mostly and protects a trivial critical region). As mentioned,
>> >> > > the motivation is to cleanup mmap_sem (as opposed to exe_file
>> >> > > performance).
>> >>
>> >> Well, I didn't see the patch, can't really comment.
>> >>
>> >> But I have to admit that this looks as atrocious and a clear example of
>> >> "lets add yet another random lock which we will regret about later" ;)
>> >>
>> >> rwlock_t in mm_struct just to serialize access to exe_file?
>> >
>> > I don't see why this is a random lock nor how would we regret this
>> > later. I regret having to do these kind of patches because people were
>> > lazy and just relied on mmap_sem without thinking beyond their use case.
>>
>> That's history: exe_file had direct relation to mm->mmap_sem,
>> that was file from first executable vma. After my patch it's less
>> related to vmas.
>
> Indeed. Yet I'm not changing the exe_file address space semantics at
> all.
>
>>
>> > As mentioned I'm also planning on creating an own sort of
>> > exe_file_struct, which would be an isolated entity (still in the mm
>> > though), with its own locking and prctl bits, that would tidy mm_struct
>> > a bit. RCU was something else I considered, but it doesn't suite well in
>> > all paths and we would still need a spinlock when updating the file
>> > anyway.
>>
>> Please don't. What's wrong with mmap_sem?
>>
>> Do you want optimize reading mm->exe_file?
>
> No, I want to get rid of certain things being done under mmap_sem,
> that's all. This is not performance motivated, it's to allow future work
> on lock breaking. I've just yesterday explained this at lsfmm (and not
> only related to exe_file). In any case I've clean up this patch and
> added more on top to create a friendlier interface, I'll send that out a
> bit later.
>
>> Then you should use rcu for that: struct file is rcu-protected thing.
>> See fget(), you could do something like that.
>
> As mentioned, not all exe paths are RCU friendly ;) We'd at least need
> srcu, but that's neither here nor there. A rwlock is suficient to get
> the job done and we really need not care much about optimizing this
> particular file further.

I mean you could make mm->exe_file rcu protected pointer and use
everywhere get_mm_exe_file() which grabs file refcount under rcu and
returns pointer.

>
> Thanks,
> Davidlohr
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-03-11 13:26 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-26 19:36 [PATCH] mm: replace mmap_sem for mm->exe_file serialization Davidlohr Bueso
2015-02-26 19:36 ` Davidlohr Bueso
2015-02-26 20:51 ` Cyrill Gorcunov
2015-02-26 20:51   ` Cyrill Gorcunov
2015-02-27 17:36   ` Oleg Nesterov
2015-02-27 17:36     ` Oleg Nesterov
2015-02-27 18:34     ` Davidlohr Bueso
2015-02-27 18:34       ` Davidlohr Bueso
2015-03-11 12:21       ` Konstantin Khlebnikov
2015-03-11 12:21         ` Konstantin Khlebnikov
2015-03-11 12:40         ` Davidlohr Bueso
2015-03-11 12:40           ` Davidlohr Bueso
2015-03-11 13:26           ` Konstantin Khlebnikov
2015-03-11 13:26             ` Konstantin Khlebnikov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.