All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/7] Remove in-tree usage of MAP_DENYWRITE
@ 2021-04-23 13:16 David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 1/7] binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib() David Hildenbrand
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-04-23 13:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Alexander Viro, Alexey Dobriyan,
	Steven Rostedt, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Sergey Senozhatsky, Andy Shevchenko,
	Rasmus Villemoes, Kees Cook, Eric W. Biederman, Greg Ungerer,
	Geert Uytterhoeven, Mike Rapoport, Vlastimil Babka,
	Vincenzo Frascino, Chinwen Chang, Michel Lespinasse,
	Catalin Marinas, Matthew Wilcox (Oracle),
	Huang Ying, Jann Horn, Feng Tang, Kevin Brodsky,
	Michael Ellerman, Shawn Anastasio, Steven Price, Nicholas Piggin,
	Christian Brauner, Jens Axboe, Gabriel Krisman Bertazi, Peter Xu,
	Suren Baghdasaryan, Shakeel Butt, Marco Elver, Daniel Jordan,
	Nicolas Viennot, Thomas Cedeno, Collin Fijalkovich, Michal Hocko,
	linux-api, x86, linux-fsdevel, linux-mm

This series is based on [1]
	[PATCH v1 0/3] perf/binfmt/mm: remove in-tree usage of
	MAP_EXECUTABLE
and [2]
	[PATCH v2] mm, thp: Relax the VM_DENYWRITE constraint on
	file-backed THPs

This series removes all in-tree usage of MAP_DENYWRITE from the kernel
and removes VM_DENYWRITE. We stopped supporting MAP_DENYWRITE for
user space applications a while ago because of the chance for DoS.
The last renaming user is binfmt binary loading during exec and
legacy library loading via uselib(). 

With this change, MAP_DENYWRITE is effectively ignored throughout the
kernel. Although the net change is small, I think the cleanup in mmap()
is quite nice.

There are some (minor) user-visible changes with this series, that's why
I am flagging this as RFC and cc-ing linux-api:
1. We no longer deny write access to shared libaries loaded via legacy
   uselib(); this behavior matches modern user space e.g., via dlopen().
2. We no longer deny write access to the elf interpreter after exec
   completed, treating it just like shared libraries (which it often is).
3. We always deny write access to the file linked via /proc/pid/exe:
   sys_prctl(PR_SET_MM_EXE_FILE) will fail if write access to the file
   cannot be denied, and write access to the file will remain denied
   until the link is effectivel gone (exec, termination,
   PR_SET_MM_EXE_FILE) -- just as if exec'ing the file.

I was wondering if we really care about permanently disabling write access
to the executable, or if it would be good enough to just disable write
access while loading the new executable during exec; but I don't know
the history of that -- and it somewhat makes sense to deny write access
at least to the main executable. With modern user space -- dlopen() -- we
can effectively modify the content of shared libraries while being used.

I'm not 100% sure if the race documented in patch #3 applies (forking
while another thread is doing a PR_SET_MM_EXE_FILE), but I
assume this is possible.

[1] https://lkml.kernel.org/r/20210421093453.6904-1-david@redhat.com
[2] https://lkml.kernel.org/r/20210406000930.3455850-1-cfijalkovich@google.com

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Kevin Brodsky <Kevin.Brodsky@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Nicolas Viennot <Nicolas.Viennot@twosigma.com>
Cc: Thomas Cedeno <thomascedeno@google.com>
Cc: Collin Fijalkovich <cfijalkovich@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: linux-api@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org

David Hildenbrand (7):
  binfmt: don't use MAP_DENYWRITE when loading shared libraries via
    uselib()
  kernel/fork: factor out atomcially replacing the current MM exe_file
  kernel/fork: always deny write access to current MM exe_file
  binfmt: remove in-tree usage of MAP_DENYWRITE
  mm: remove VM_DENYWRITE
  mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
  fs: update documentation of get_write_access() and friends

 arch/x86/ia32/ia32_aout.c      |  8 ++--
 fs/binfmt_aout.c               |  7 ++--
 fs/binfmt_elf.c                |  6 +--
 fs/binfmt_elf_fdpic.c          |  2 +-
 fs/proc/task_mmu.c             |  1 -
 include/linux/fs.h             | 19 +++++----
 include/linux/mm.h             |  3 +-
 include/linux/mman.h           |  4 +-
 include/trace/events/mmflags.h |  1 -
 kernel/events/core.c           |  2 -
 kernel/fork.c                  | 75 ++++++++++++++++++++++++++++++----
 kernel/sys.c                   | 33 +--------------
 lib/test_printf.c              |  5 +--
 mm/mmap.c                      | 29 ++-----------
 mm/nommu.c                     |  2 -
 15 files changed, 98 insertions(+), 99 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH RFC 1/7] binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
  2021-04-23 13:16 [PATCH RFC 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
@ 2021-04-23 13:16 ` David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 2/7] kernel/fork: factor out atomcially replacing the current MM exe_file David Hildenbrand
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-04-23 13:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Alexander Viro, Alexey Dobriyan,
	Steven Rostedt, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Sergey Senozhatsky, Andy Shevchenko,
	Rasmus Villemoes, Kees Cook, Eric W. Biederman, Greg Ungerer,
	Geert Uytterhoeven, Mike Rapoport, Vlastimil Babka,
	Vincenzo Frascino, Chinwen Chang, Michel Lespinasse,
	Catalin Marinas, Matthew Wilcox (Oracle),
	Huang Ying, Jann Horn, Feng Tang, Kevin Brodsky,
	Michael Ellerman, Shawn Anastasio, Steven Price, Nicholas Piggin,
	Christian Brauner, Jens Axboe, Gabriel Krisman Bertazi, Peter Xu,
	Suren Baghdasaryan, Shakeel Butt, Marco Elver, Daniel Jordan,
	Nicolas Viennot, Thomas Cedeno, Collin Fijalkovich, Michal Hocko,
	linux-api, x86, linux-fsdevel, linux-mm

uselib() is the legacy systemcall for loading shared libraries.
Nowadays, applications use dlopen() to load shared libraries, completely
implemented in user space via mmap().

For example, glibc uses MAP_COPY to mmap shared libraries. While this
maps to MAP_PRIVATE | MAP_DENYWRITE on Linux, Linux ignores any
MAP_DENYWRITE specification from user space in mmap.

With this change, all remaining in-tree users of MAP_DENYWRITE use it
to map an executable. We will be able to open shared libraries loaded
via uselib() writable, just as we already can via dlopen() from user
space.

This is one step into the direction of removing MAP_DENYWRITE from the
kernel. This can be considered a minor user space visible change.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/x86/ia32/ia32_aout.c | 2 +-
 fs/binfmt_aout.c          | 2 +-
 fs/binfmt_elf.c           | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 5e5b9fc2747f..321d7b22ad2d 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -293,7 +293,7 @@ static int load_aout_library(struct file *file)
 	/* Now use mmap to map the library into memory. */
 	error = vm_mmap(file, start_addr, ex.a_text + ex.a_data,
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_32BIT,
+			MAP_FIXED | MAP_PRIVATE | MAP_32BIT,
 			N_TXTOFF(ex));
 	retval = error;
 	if (error != start_addr)
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index 12461f3ed04f..37df8fee63d7 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -309,7 +309,7 @@ static int load_aout_library(struct file *file)
 	/* Now use mmap to map the library into memory. */
 	error = vm_mmap(file, start_addr, ex.a_text + ex.a_data,
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED | MAP_PRIVATE;
 			N_TXTOFF(ex));
 	retval = error;
 	if (error != start_addr)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index e0427b817425..763188ac398e 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1384,7 +1384,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED_NOREPLACE | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_NOREPLACE | MAP_PRIVATE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 2/7] kernel/fork: factor out atomcially replacing the current MM exe_file
  2021-04-23 13:16 [PATCH RFC 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 1/7] binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib() David Hildenbrand
@ 2021-04-23 13:16 ` David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 3/7] kernel/fork: always deny write access to " David Hildenbrand
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-04-23 13:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Alexander Viro, Alexey Dobriyan,
	Steven Rostedt, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Sergey Senozhatsky, Andy Shevchenko,
	Rasmus Villemoes, Kees Cook, Eric W. Biederman, Greg Ungerer,
	Geert Uytterhoeven, Mike Rapoport, Vlastimil Babka,
	Vincenzo Frascino, Chinwen Chang, Michel Lespinasse,
	Catalin Marinas, Matthew Wilcox (Oracle),
	Huang Ying, Jann Horn, Feng Tang, Kevin Brodsky,
	Michael Ellerman, Shawn Anastasio, Steven Price, Nicholas Piggin,
	Christian Brauner, Jens Axboe, Gabriel Krisman Bertazi, Peter Xu,
	Suren Baghdasaryan, Shakeel Butt, Marco Elver, Daniel Jordan,
	Nicolas Viennot, Thomas Cedeno, Collin Fijalkovich, Michal Hocko,
	linux-api, x86, linux-fsdevel, linux-mm

Let's factor the main logic out into atomic_set_mm_exe_file(), such that
all mm->exe_file logic is contained in kernel/fork.c.

While at it, perform some simple cleanups that are possible now that
we're simplifying the individual functions.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h |  2 ++
 kernel/fork.c      | 35 +++++++++++++++++++++++++++++++++--
 kernel/sys.c       | 33 +--------------------------------
 3 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ba434287387..043702972e5f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2540,6 +2540,8 @@ extern int mm_take_all_locks(struct mm_struct *mm);
 extern void mm_drop_all_locks(struct mm_struct *mm);
 
 extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
+extern int atomic_set_mm_exe_file(struct mm_struct *mm,
+				  struct file *new_exe_file);
 extern struct file *get_mm_exe_file(struct mm_struct *mm);
 extern struct file *get_task_exe_file(struct task_struct *task);
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 426cd0c51f9e..199463625adc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1138,8 +1138,8 @@ void mmput_async(struct mm_struct *mm)
  * Main users are mmput() and sys_execve(). Callers prevent concurrent
  * invocations: in mmput() nobody alive left, in execve task is single
  * threaded. sys_prctl(PR_SET_MM_MAP/EXE_FILE) also needs to set the
- * mm->exe_file, but does so without using set_mm_exe_file() in order
- * to do avoid the need for any locks.
+ * mm->exe_file, but uses atomic_set_mm_exe_file(), avoiding the need
+ * for any locks.
  */
 void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 {
@@ -1159,6 +1159,37 @@ void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 		fput(old_exe_file);
 }
 
+int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
+{
+	struct vm_area_struct *vma;
+	struct file *old_exe_file;
+	int ret = 0;
+
+	/* Forbid mm->exe_file change if old file still mapped. */
+	old_exe_file = get_mm_exe_file(mm);
+	if (old_exe_file) {
+		mmap_read_lock(mm);
+		for (vma = mm->mmap; vma && !ret; vma = vma->vm_next) {
+			if (!vma->vm_file)
+				continue;
+			if (path_equal(&vma->vm_file->f_path,
+				       &old_exe_file->f_path))
+				ret = -EBUSY;
+		}
+		mmap_read_unlock(mm);
+		fput(old_exe_file);
+		if (ret)
+			return ret;
+	}
+
+	/* set the new file, lockless */
+	get_file(new_exe_file);
+	old_exe_file = xchg(&mm->exe_file, new_exe_file);
+	if (old_exe_file)
+		fput(old_exe_file);
+	return 0;
+}
+
 /**
  * get_mm_exe_file - acquire a reference to the mm's executable file
  *
diff --git a/kernel/sys.c b/kernel/sys.c
index 2e2e3f378d97..7dcd9fb3153c 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1828,7 +1828,6 @@ SYSCALL_DEFINE1(umask, int, mask)
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
-	struct file *old_exe, *exe_file;
 	struct inode *inode;
 	int err;
 
@@ -1851,40 +1850,10 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	if (err)
 		goto exit;
 
-	/*
-	 * Forbid mm->exe_file change if old file still mapped.
-	 */
-	exe_file = get_mm_exe_file(mm);
-	err = -EBUSY;
-	if (exe_file) {
-		struct vm_area_struct *vma;
-
-		mmap_read_lock(mm);
-		for (vma = mm->mmap; vma; vma = vma->vm_next) {
-			if (!vma->vm_file)
-				continue;
-			if (path_equal(&vma->vm_file->f_path,
-				       &exe_file->f_path))
-				goto exit_err;
-		}
-
-		mmap_read_unlock(mm);
-		fput(exe_file);
-	}
-
-	err = 0;
-	/* set the new file, lockless */
-	get_file(exe.file);
-	old_exe = xchg(&mm->exe_file, exe.file);
-	if (old_exe)
-		fput(old_exe);
+	err = atomic_set_mm_exe_file(mm, exe.file);
 exit:
 	fdput(exe);
 	return err;
-exit_err:
-	mmap_read_unlock(mm);
-	fput(exe_file);
-	goto exit;
 }
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 3/7] kernel/fork: always deny write access to current MM exe_file
  2021-04-23 13:16 [PATCH RFC 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 1/7] binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib() David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 2/7] kernel/fork: factor out atomcially replacing the current MM exe_file David Hildenbrand
@ 2021-04-23 13:16 ` David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 4/7] binfmt: remove in-tree usage of MAP_DENYWRITE David Hildenbrand
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-04-23 13:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Alexander Viro, Alexey Dobriyan,
	Steven Rostedt, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Sergey Senozhatsky, Andy Shevchenko,
	Rasmus Villemoes, Kees Cook, Eric W. Biederman, Greg Ungerer,
	Geert Uytterhoeven, Mike Rapoport, Vlastimil Babka,
	Vincenzo Frascino, Chinwen Chang, Michel Lespinasse,
	Catalin Marinas, Matthew Wilcox (Oracle),
	Huang Ying, Jann Horn, Feng Tang, Kevin Brodsky,
	Michael Ellerman, Shawn Anastasio, Steven Price, Nicholas Piggin,
	Christian Brauner, Jens Axboe, Gabriel Krisman Bertazi, Peter Xu,
	Suren Baghdasaryan, Shakeel Butt, Marco Elver, Daniel Jordan,
	Nicolas Viennot, Thomas Cedeno, Collin Fijalkovich, Michal Hocko,
	linux-api, x86, linux-fsdevel, linux-mm

We want to remove VM_DENYWRITE only currently only used when mapping the
executable during exec. During exec, we already deny_write_access() the
executable, however, after exec completes the VMAs mapped
with VM_DENYWRITE effectively keeps write access denied via
deny_write_access().

Let's deny write access when setting the MM exe_file. With this change, we
can remove VM_DENYWRITE for mapping executables.

This represents a minor user space visible change:
sys_prctl(PR_SET_MM_EXE_FILE) can now fail if the file is already
opened writable. Also, after sys_prctl(PR_SET_MM_EXE_FILE), the file
cannot be opened writable. Note that we can already fail with -EACCES if
the file doesn't have execute permissions.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 kernel/fork.c | 39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 199463625adc..0681f2973667 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -472,6 +472,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 {
 	struct vm_area_struct *mpnt, *tmp, *prev, **pprev;
 	struct rb_node **rb_link, *rb_parent;
+	struct file *exe_file;
 	int retval;
 	unsigned long charge;
 	LIST_HEAD(uf);
@@ -489,7 +490,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING);
 
 	/* No ordering required: file already has been exposed. */
-	RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
+	exe_file = get_mm_exe_file(oldmm);
+	RCU_INIT_POINTER(mm->exe_file, exe_file);
+	if (exe_file)
+		deny_write_access(exe_file);
 
 	mm->total_vm = oldmm->total_vm;
 	mm->data_vm = oldmm->data_vm;
@@ -634,8 +638,13 @@ static inline void mm_free_pgd(struct mm_struct *mm)
 #else
 static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 {
+	struct file *exe_file;
+
 	mmap_write_lock(oldmm);
-	RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
+	exe_file = get_mm_exe_file(oldmm);
+	RCU_INIT_POINTER(mm->exe_file, exe_file);
+	if (exe_file)
+		deny_write_access(exe_file);
 	mmap_write_unlock(oldmm);
 	return 0;
 }
@@ -1152,11 +1161,19 @@ void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 	 */
 	old_exe_file = rcu_dereference_raw(mm->exe_file);
 
-	if (new_exe_file)
+	if (new_exe_file) {
 		get_file(new_exe_file);
+		/*
+		 * exec code is required to deny_write_access() successfully,
+		 * so this cannot fail
+		 */
+		deny_write_access(new_exe_file);
+	}
 	rcu_assign_pointer(mm->exe_file, new_exe_file);
-	if (old_exe_file)
+	if (old_exe_file) {
+		allow_write_access(old_exe_file);
 		fput(old_exe_file);
+	}
 }
 
 int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
@@ -1183,10 +1200,22 @@ int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 	}
 
 	/* set the new file, lockless */
+	ret = deny_write_access(new_exe_file);
+	if (ret)
+		return -EACCES;
 	get_file(new_exe_file);
+
 	old_exe_file = xchg(&mm->exe_file, new_exe_file);
-	if (old_exe_file)
+	if (old_exe_file) {
+		/*
+		 * Don't race with dup_mmap() getting the file and disallowing
+		 * write access while someone might open the file writable.
+		 */
+		mmap_read_lock(mm);
+		allow_write_access(old_exe_file);
 		fput(old_exe_file);
+		mmap_read_unlock(mm);
+	}
 	return 0;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 4/7] binfmt: remove in-tree usage of MAP_DENYWRITE
  2021-04-23 13:16 [PATCH RFC 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
                   ` (2 preceding siblings ...)
  2021-04-23 13:16 ` [PATCH RFC 3/7] kernel/fork: always deny write access to " David Hildenbrand
@ 2021-04-23 13:16 ` David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 5/7] mm: remove VM_DENYWRITE David Hildenbrand
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-04-23 13:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Alexander Viro, Alexey Dobriyan,
	Steven Rostedt, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Sergey Senozhatsky, Andy Shevchenko,
	Rasmus Villemoes, Kees Cook, Eric W. Biederman, Greg Ungerer,
	Geert Uytterhoeven, Mike Rapoport, Vlastimil Babka,
	Vincenzo Frascino, Chinwen Chang, Michel Lespinasse,
	Catalin Marinas, Matthew Wilcox (Oracle),
	Huang Ying, Jann Horn, Feng Tang, Kevin Brodsky,
	Michael Ellerman, Shawn Anastasio, Steven Price, Nicholas Piggin,
	Christian Brauner, Jens Axboe, Gabriel Krisman Bertazi, Peter Xu,
	Suren Baghdasaryan, Shakeel Butt, Marco Elver, Daniel Jordan,
	Nicolas Viennot, Thomas Cedeno, Collin Fijalkovich, Michal Hocko,
	linux-api, x86, linux-fsdevel, linux-mm

At exec time when we mmap the new executable via MAP_DENYWRITE we have it
opened via do_open_execat() and already deny_write_access()'ed the file
successfully. Once exec completes, we allow_write_acces(); however,
we set mm->exe_file in begin_new_exec() via set_mm_exe_file() and
also deny_write_access() as long as mm->exe_file remains set. We'll
effectively deny write access to our executable via mm->exe_file
until mm->exe_file is changed -- when the process is removed, on new
exec, or via sys_prctl(PR_SET_MM_EXE_FILE).

Let's remove all usage of MAP_DENYWRITE, it's no longer necessary for
mm->exe_file.

In case of an elf interpreter, we'll now only deny write access to the file
during exec. This is somewhat okay, because the interpreter behaves
(and sometime is) a shared library; all shared libraries, especially the
ones loaded directly in user space like via dlopen() won't ever be mapped
via MAP_DENYWRITE, because we ignore that from user space completely;
these shared libraries can always be modified while mapped and executed.
Let's only special-case the main executable, denying write access while
being executed by a process. This can be considered a minor user space
visible change.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/x86/ia32/ia32_aout.c | 6 ++----
 fs/binfmt_aout.c          | 5 ++---
 fs/binfmt_elf.c           | 4 ++--
 fs/binfmt_elf_fdpic.c     | 2 +-
 4 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 321d7b22ad2d..9bd15241fadb 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -202,8 +202,7 @@ static int load_aout_binary(struct linux_binprm *bprm)
 
 		error = vm_mmap(bprm->file, N_TXTADDR(ex), ex.a_text,
 				PROT_READ | PROT_EXEC,
-				MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE |
-				MAP_32BIT,
+				MAP_FIXED | MAP_PRIVATE | MAP_32BIT,
 				fd_offset);
 
 		if (error != N_TXTADDR(ex))
@@ -211,8 +210,7 @@ static int load_aout_binary(struct linux_binprm *bprm)
 
 		error = vm_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
 				PROT_READ | PROT_WRITE | PROT_EXEC,
-				MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE |
-				MAP_32BIT,
+				MAP_FIXED | MAP_PRIVATE | MAP_32BIT,
 				fd_offset + ex.a_text);
 		if (error != N_DATADDR(ex))
 			return error;
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index 37df8fee63d7..9c44892d6469 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -221,8 +221,7 @@ static int load_aout_binary(struct linux_binprm * bprm)
 		}
 
 		error = vm_mmap(bprm->file, N_TXTADDR(ex), ex.a_text,
-			PROT_READ | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE;
+			PROT_READ | PROT_EXEC, MAP_FIXED | MAP_PRIVATE;
 			fd_offset);
 
 		if (error != N_TXTADDR(ex))
@@ -230,7 +229,7 @@ static int load_aout_binary(struct linux_binprm * bprm)
 
 		error = vm_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
 				PROT_READ | PROT_WRITE | PROT_EXEC,
-				MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE;
+				MAP_FIXED | MAP_PRIVATE;
 				fd_offset + ex.a_text);
 		if (error != N_DATADDR(ex))
 			return error;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 763188ac398e..76bb342e9c9b 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -622,7 +622,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 	eppnt = interp_elf_phdata;
 	for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++) {
 		if (eppnt->p_type == PT_LOAD) {
-			int elf_type = MAP_PRIVATE | MAP_DENYWRITE;
+			int elf_type = MAP_PRIVATE;
 			int elf_prot = make_prot(eppnt->p_flags, arch_state,
 						 true, true);
 			unsigned long vaddr = 0;
@@ -1070,7 +1070,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		elf_prot = make_prot(elf_ppnt->p_flags, &arch_state,
 				     !!interpreter, false);
 
-		elf_flags = MAP_PRIVATE | MAP_DENYWRITE;
+		elf_flags = MAP_PRIVATE;
 
 		vaddr = elf_ppnt->p_vaddr;
 		/*
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 8723b6686b66..18a9e42e41d1 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1041,7 +1041,7 @@ static int elf_fdpic_map_file_by_direct_mmap(struct elf_fdpic_params *params,
 		if (phdr->p_flags & PF_W) prot |= PROT_WRITE;
 		if (phdr->p_flags & PF_X) prot |= PROT_EXEC;
 
-		flags = MAP_PRIVATE | MAP_DENYWRITE;
+		flags = MAP_PRIVATE;
 		maddr = 0;
 
 		switch (params->flags & ELF_FDPIC_FLAG_ARRANGEMENT) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 5/7] mm: remove VM_DENYWRITE
  2021-04-23 13:16 [PATCH RFC 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
                   ` (3 preceding siblings ...)
  2021-04-23 13:16 ` [PATCH RFC 4/7] binfmt: remove in-tree usage of MAP_DENYWRITE David Hildenbrand
@ 2021-04-23 13:16 ` David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 6/7] mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff() David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 7/7] fs: update documentation of get_write_access() and friends David Hildenbrand
  6 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-04-23 13:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Alexander Viro, Alexey Dobriyan,
	Steven Rostedt, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Sergey Senozhatsky, Andy Shevchenko,
	Rasmus Villemoes, Kees Cook, Eric W. Biederman, Greg Ungerer,
	Geert Uytterhoeven, Mike Rapoport, Vlastimil Babka,
	Vincenzo Frascino, Chinwen Chang, Michel Lespinasse,
	Catalin Marinas, Matthew Wilcox (Oracle),
	Huang Ying, Jann Horn, Feng Tang, Kevin Brodsky,
	Michael Ellerman, Shawn Anastasio, Steven Price, Nicholas Piggin,
	Christian Brauner, Jens Axboe, Gabriel Krisman Bertazi, Peter Xu,
	Suren Baghdasaryan, Shakeel Butt, Marco Elver, Daniel Jordan,
	Nicolas Viennot, Thomas Cedeno, Collin Fijalkovich, Michal Hocko,
	linux-api, x86, linux-fsdevel, linux-mm

All in-tree users of MAP_DENYWRITE are gone. MAP_DENYWRITE cannot be
set from user space, so all users are gone; let's remove it.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 fs/proc/task_mmu.c             |  1 -
 include/linux/mm.h             |  1 -
 include/linux/mman.h           |  1 -
 include/trace/events/mmflags.h |  1 -
 kernel/events/core.c           |  2 --
 kernel/fork.c                  |  3 ---
 lib/test_printf.c              |  5 ++---
 mm/mmap.c                      | 27 +++------------------------
 8 files changed, 5 insertions(+), 36 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e862cab69583..2710703c39b0 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -621,7 +621,6 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_MAYSHARE)]	= "ms",
 		[ilog2(VM_GROWSDOWN)]	= "gd",
 		[ilog2(VM_PFNMAP)]	= "pf",
-		[ilog2(VM_DENYWRITE)]	= "dw",
 		[ilog2(VM_LOCKED)]	= "lo",
 		[ilog2(VM_IO)]		= "io",
 		[ilog2(VM_SEQ_READ)]	= "sr",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 043702972e5f..cdad125af9fa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -274,7 +274,6 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_GROWSDOWN	0x00000100	/* general info on the segment */
 #define VM_UFFD_MISSING	0x00000200	/* missing pages tracking */
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
-#define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
 #define VM_UFFD_WP	0x00001000	/* wrprotect pages tracking */
 
 #define VM_LOCKED	0x00002000
diff --git a/include/linux/mman.h b/include/linux/mman.h
index ebb09a964272..bd9aadda047b 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -153,7 +153,6 @@ static inline unsigned long
 calc_vm_flag_bits(unsigned long flags)
 {
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
-	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
 	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
 	       arch_calc_vm_flag_bits(flags);
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 67018d367b9f..e1721ce6381d 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -149,7 +149,6 @@ IF_HAVE_PG_ARCH_2(PG_arch_2,		"arch_2"	)
 	{VM_GROWSDOWN,			"growsdown"	},		\
 	{VM_UFFD_MISSING,		"uffd_missing"	},		\
 	{VM_PFNMAP,			"pfnmap"	},		\
-	{VM_DENYWRITE,			"denywrite"	},		\
 	{VM_UFFD_WP,			"uffd_wp"	},		\
 	{VM_LOCKED,			"locked"	},		\
 	{VM_IO,				"io"		},		\
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3dfd463f1831..1ed3eb7aa4c9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8184,8 +8184,6 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
 	else
 		flags = MAP_PRIVATE;
 
-	if (vma->vm_flags & VM_DENYWRITE)
-		flags |= MAP_DENYWRITE;
 	if (vma->vm_flags & VM_LOCKED)
 		flags |= MAP_LOCKED;
 	if (is_vm_hugetlb_page(vma))
diff --git a/kernel/fork.c b/kernel/fork.c
index 0681f2973667..fda43ba05d91 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -556,12 +556,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
 		file = tmp->vm_file;
 		if (file) {
-			struct inode *inode = file_inode(file);
 			struct address_space *mapping = file->f_mapping;
 
 			get_file(file);
-			if (tmp->vm_flags & VM_DENYWRITE)
-				put_write_access(inode);
 			i_mmap_lock_write(mapping);
 			if (tmp->vm_flags & VM_SHARED)
 				mapping_allow_writable(mapping);
diff --git a/lib/test_printf.c b/lib/test_printf.c
index 95a2f82427c7..54f0583dc45d 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -596,9 +596,8 @@ flags(void)
 	test("uptodate|dirty|lru|active|swapbacked", "%pGp", &flags);
 
 
-	flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC
-			| VM_DENYWRITE;
-	test("read|exec|mayread|maywrite|mayexec|denywrite", "%pGv", &flags);
+	flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+	test("read|exec|mayread|maywrite|mayexec", "%pGv", &flags);
 
 	gfp = GFP_TRANSHUGE;
 	test("GFP_TRANSHUGE", "%pGg", &gfp);
diff --git a/mm/mmap.c b/mm/mmap.c
index 882f8ee4af1f..b335f8907568 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -142,8 +142,6 @@ void vma_set_page_prot(struct vm_area_struct *vma)
 static void __remove_shared_vm_struct(struct vm_area_struct *vma,
 		struct file *file, struct address_space *mapping)
 {
-	if (vma->vm_flags & VM_DENYWRITE)
-		allow_write_access(file);
 	if (vma->vm_flags & VM_SHARED)
 		mapping_unmap_writable(mapping);
 
@@ -660,8 +658,6 @@ static void __vma_link_file(struct vm_area_struct *vma)
 	if (file) {
 		struct address_space *mapping = file->f_mapping;
 
-		if (vma->vm_flags & VM_DENYWRITE)
-			put_write_access(file_inode(file));
 		if (vma->vm_flags & VM_SHARED)
 			mapping_allow_writable(mapping);
 
@@ -1785,22 +1781,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	vma->vm_pgoff = pgoff;
 
 	if (file) {
-		if (vm_flags & VM_DENYWRITE) {
-			error = deny_write_access(file);
-			if (error)
-				goto free_vma;
-		}
 		if (vm_flags & VM_SHARED) {
 			error = mapping_map_writable(file->f_mapping);
 			if (error)
-				goto allow_write_and_free_vma;
+				goto free_vma;
 		}
 
-		/* ->mmap() can change vma->vm_file, but must guarantee that
-		 * vma_link() below can deny write-access if VM_DENYWRITE is set
-		 * and map writably if VM_SHARED is set. This usually means the
-		 * new file must not have been exposed to user-space, yet.
-		 */
 		vma->vm_file = get_file(file);
 		error = call_mmap(file, vma);
 		if (error)
@@ -1857,13 +1843,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 
 	vma_link(mm, vma, prev, rb_link, rb_parent);
 	/* Once vma denies write, undo our temporary denial count */
-	if (file) {
 unmap_writable:
-		if (vm_flags & VM_SHARED)
-			mapping_unmap_writable(file->f_mapping);
-		if (vm_flags & VM_DENYWRITE)
-			allow_write_access(file);
-	}
+	if (file && vm_flags & VM_SHARED)
+		mapping_unmap_writable(file->f_mapping);
 	file = vma->vm_file;
 out:
 	perf_event_mmap(vma);
@@ -1903,9 +1885,6 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	charged = 0;
 	if (vm_flags & VM_SHARED)
 		mapping_unmap_writable(file->f_mapping);
-allow_write_and_free_vma:
-	if (vm_flags & VM_DENYWRITE)
-		allow_write_access(file);
 free_vma:
 	vm_area_free(vma);
 unacct_error:
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 6/7] mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
  2021-04-23 13:16 [PATCH RFC 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
                   ` (4 preceding siblings ...)
  2021-04-23 13:16 ` [PATCH RFC 5/7] mm: remove VM_DENYWRITE David Hildenbrand
@ 2021-04-23 13:16 ` David Hildenbrand
  2021-04-23 13:16 ` [PATCH RFC 7/7] fs: update documentation of get_write_access() and friends David Hildenbrand
  6 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-04-23 13:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Alexander Viro, Alexey Dobriyan,
	Steven Rostedt, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Sergey Senozhatsky, Andy Shevchenko,
	Rasmus Villemoes, Kees Cook, Eric W. Biederman, Greg Ungerer,
	Geert Uytterhoeven, Mike Rapoport, Vlastimil Babka,
	Vincenzo Frascino, Chinwen Chang, Michel Lespinasse,
	Catalin Marinas, Matthew Wilcox (Oracle),
	Huang Ying, Jann Horn, Feng Tang, Kevin Brodsky,
	Michael Ellerman, Shawn Anastasio, Steven Price, Nicholas Piggin,
	Christian Brauner, Jens Axboe, Gabriel Krisman Bertazi, Peter Xu,
	Suren Baghdasaryan, Shakeel Butt, Marco Elver, Daniel Jordan,
	Nicolas Viennot, Thomas Cedeno, Collin Fijalkovich, Michal Hocko,
	linux-api, x86, linux-fsdevel, linux-mm

Let's also remove masking off MAP_DENYWROTE from ksys_mmap_pgoff():
the last in-tree occurrence of MAP_DENYWRITE is now in LEGACY_MAP_MASK,
which accepts the flag e.g., for MAP_SHARED_VALIDATE; however, the flag
is ignored throughout the kernel now.

Add a comment to LEGACY_MAP_MASK stating that MAP_DENYWRITE is ignored.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mman.h | 3 ++-
 mm/mmap.c            | 2 --
 mm/nommu.c           | 2 --
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index bd9aadda047b..b66e91b8176c 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -32,7 +32,8 @@
  * The historical set of flags that all mmap implementations implicitly
  * support when a ->mmap_validate() op is not provided in file_operations.
  *
- * MAP_EXECUTABLE is completely ignored throughout the kernel.
+ * MAP_EXECUTABLE and MAP_DENYWRITE are completely ignored throughout the
+ * kernel.
  */
 #define LEGACY_MAP_MASK (MAP_SHARED \
 		| MAP_PRIVATE \
diff --git a/mm/mmap.c b/mm/mmap.c
index b335f8907568..33479ce05653 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1623,8 +1623,6 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 			return PTR_ERR(file);
 	}
 
-	flags &= ~MAP_DENYWRITE;
-
 	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
 out_fput:
 	if (file)
diff --git a/mm/nommu.c b/mm/nommu.c
index 1fafe3e9d3df..7d85c8a46582 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1306,8 +1306,6 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 			goto out;
 	}
 
-	flags &= ~MAP_DENYWRITE;
-
 	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
 
 	if (file)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 7/7] fs: update documentation of get_write_access() and friends
  2021-04-23 13:16 [PATCH RFC 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
                   ` (5 preceding siblings ...)
  2021-04-23 13:16 ` [PATCH RFC 6/7] mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff() David Hildenbrand
@ 2021-04-23 13:16 ` David Hildenbrand
  6 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-04-23 13:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Hildenbrand, Andrew Morton, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Alexander Viro, Alexey Dobriyan,
	Steven Rostedt, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Sergey Senozhatsky, Andy Shevchenko,
	Rasmus Villemoes, Kees Cook, Eric W. Biederman, Greg Ungerer,
	Geert Uytterhoeven, Mike Rapoport, Vlastimil Babka,
	Vincenzo Frascino, Chinwen Chang, Michel Lespinasse,
	Catalin Marinas, Matthew Wilcox (Oracle),
	Huang Ying, Jann Horn, Feng Tang, Kevin Brodsky,
	Michael Ellerman, Shawn Anastasio, Steven Price, Nicholas Piggin,
	Christian Brauner, Jens Axboe, Gabriel Krisman Bertazi, Peter Xu,
	Suren Baghdasaryan, Shakeel Butt, Marco Elver, Daniel Jordan,
	Nicolas Viennot, Thomas Cedeno, Collin Fijalkovich, Michal Hocko,
	linux-api, x86, linux-fsdevel, linux-mm

As VM_DENYWRITE does no longer exists, let's spring-clean the
documentation of get_write_access() and friends.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/fs.h | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index ec8f3ddf4a6a..b0a410a14170 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2906,15 +2906,20 @@ static inline void file_end_write(struct file *file)
 }
 
 /*
+ * This is used for regular files where some users -- especially the
+ * currently executed binary in a process, previously handled via
+ * VM_DENYWRITE -- cannot handle concurrent write (and maybe mmap
+ * read-write shared) accesses.
+ *
  * get_write_access() gets write permission for a file.
  * put_write_access() releases this write permission.
- * This is used for regular files.
- * We cannot support write (and maybe mmap read-write shared) accesses and
- * MAP_DENYWRITE mmappings simultaneously. The i_writecount field of an inode
- * can have the following values:
- * 0: no writers, no VM_DENYWRITE mappings
- * < 0: (-i_writecount) vm_area_structs with VM_DENYWRITE set exist
- * > 0: (i_writecount) users are writing to the file.
+ * deny_write_access() denies write access to a file.
+ * allow_write_access() re-enables write access to a file.
+ *
+ * The i_writecount field of an inode can have the following values:
+ * 0: no write access, no denied write access
+ * < 0: (-i_writecount) users that denied write access to the file.
+ * > 0: (i_writecount) users that have write access to the file.
  *
  * Normally we operate on that counter with atomic_{inc,dec} and it's safe
  * except for the cases where we don't hold i_writecount yet. Then we need to
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-04-23 13:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-23 13:16 [PATCH RFC 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
2021-04-23 13:16 ` [PATCH RFC 1/7] binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib() David Hildenbrand
2021-04-23 13:16 ` [PATCH RFC 2/7] kernel/fork: factor out atomcially replacing the current MM exe_file David Hildenbrand
2021-04-23 13:16 ` [PATCH RFC 3/7] kernel/fork: always deny write access to " David Hildenbrand
2021-04-23 13:16 ` [PATCH RFC 4/7] binfmt: remove in-tree usage of MAP_DENYWRITE David Hildenbrand
2021-04-23 13:16 ` [PATCH RFC 5/7] mm: remove VM_DENYWRITE David Hildenbrand
2021-04-23 13:16 ` [PATCH RFC 6/7] mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff() David Hildenbrand
2021-04-23 13:16 ` [PATCH RFC 7/7] fs: update documentation of get_write_access() and friends David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.