linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Brauner <christian.brauner@ubuntu.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Alexey Dobriyan" <adobriyan@gmail.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Jiri Olsa" <jolsa@redhat.com>,
	"Namhyung Kim" <namhyung@kernel.org>,
	"Petr Mladek" <pmladek@suse.com>,
	"Sergey Senozhatsky" <sergey.senozhatsky@gmail.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Rasmus Villemoes" <linux@rasmusvillemoes.dk>,
	"Kees Cook" <keescook@chromium.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Greg Ungerer" <gerg@linux-m68k.org>,
	"Geert Uytterhoeven" <geert@linux-m68k.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Vincenzo Frascino" <vincenzo.frascino@arm.com>,
	"Chinwen Chang" <chinwen.chang@mediatek.com>,
	"Michel Lespinasse" <walken@google.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Huang Ying" <ying.huang@intel.com>,
	"Jann Horn" <jannh@google.com>, "Feng Tang" <feng.tang@intel.com>,
	"Kevin Brodsky" <Kevin.Brodsky@arm.com>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Shawn Anastasio" <shawn@anastas.io>,
	"Steven Price" <steven.price@arm.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Gabriel Krisman Bertazi" <krisman@collabora.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Marco Elver" <elver@google.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Nicolas Viennot" <Nicolas.Viennot@twosigma.com>,
	"Thomas Cedeno" <thomascedeno@google.com>,
	"Collin Fijalkovich" <cfijalkovich@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Miklos Szeredi" <miklos@szeredi.hu>,
	"Chengguang Xu" <cgxu519@mykernel.net>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	linux-unionfs@vger.kernel.org, linux-api@vger.kernel.org,
	x86@kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, "Andrei Vagin" <avagin@gmail.com>
Subject: Re: [PATCH v1 3/7] kernel/fork: always deny write access to current MM exe_file
Date: Thu, 12 Aug 2021 14:32:39 +0200	[thread overview]
Message-ID: <20210812123239.trksnm57owzwzokj@wittgenstein> (raw)
In-Reply-To: <1b6d27cf-2238-0c1c-c563-b38728fbabc2@redhat.com>

On Thu, Aug 12, 2021 at 12:13:44PM +0200, David Hildenbrand wrote:
> On 12.08.21 12:05, Christian Brauner wrote:
> > [+Cc Andrei]
> > 
> > On Thu, Aug 12, 2021 at 10:43:44AM +0200, David Hildenbrand wrote:
> > > We want to remove VM_DENYWRITE only currently only used when mapping the
> > > executable during exec. During exec, we already deny_write_access() the
> > > executable, however, after exec completes the VMAs mapped
> > > with VM_DENYWRITE effectively keeps write access denied via
> > > deny_write_access().
> > > 
> > > Let's deny write access when setting the MM exe_file. With this change, we
> > > can remove VM_DENYWRITE for mapping executables.
> > > 
> > > This represents a minor user space visible change:
> > > sys_prctl(PR_SET_MM_EXE_FILE) can now fail if the file is already
> > > opened writable. Also, after sys_prctl(PR_SET_MM_EXE_FILE), the file
> > 
> > Just for completeness, this also affects PR_SET_MM_MAP when exe_fd is
> > set.
> 
> Correct.
> 
> > 
> > > cannot be opened writable. Note that we can already fail with -EACCES if
> > > the file doesn't have execute permissions.
> > > 
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > 
> > The biggest user I know and that I'm involved in is CRIU which heavily
> > uses PR_SET_MM_MAP (with a fallback to PR_SET_MM_EXE_FILE on older
> > kernels) during restore. Afair, criu opens the exe fd as an O_PATH
> > during dump and thus will use the same flag during restore when
> > opening it. So that should be fine.
> 
> Yes.
> 
> > 
> > However, if I understand the consequences of this change correctly, a
> > problem could be restoring workloads that hold a writable fd open to
> > their exe file at dump time which would mean that during restore that fd
> > would be reopened writable causing CRIU to fail when setting the exe
> > file for the task to be restored.
> 
> If it's their exe file, then the existing VM_DENYWRITE handling would have
> forbidden these workloads to open the fd of their exe file writable, right?

Yes.

> At least before doing any PR_SET_MM_MAP/PR_SET_MM_EXE_FILE. But that should
> rule out quite a lot of cases we might be worried about, right?

Yes, it rules out the most obvious cases. The problem is really just
that we don't know how common weirder cases are. But that doesn't mean
we shouldn't try and risk it. This is a nice cleanup and playing
/proc/self/exe games isn't super common.

> 
> > 
> > Which honestly, no idea how many such workloads exist. (I know at least
> > of runC and LXC need to sometimes reopen to rexec themselves (weird bug
> > to protect against attacking the exe file) and thus re-open
> > /proc/self/exe but read-only.)
> > 
> > >   kernel/fork.c | 39 ++++++++++++++++++++++++++++++++++-----
> > >   1 file changed, 34 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > index 6bd2e52bcdfb..5d904878f19b 100644
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -476,6 +476,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> > >   {
> > >   	struct vm_area_struct *mpnt, *tmp, *prev, **pprev;
> > >   	struct rb_node **rb_link, *rb_parent;
> > > +	struct file *exe_file;
> > >   	int retval;
> > >   	unsigned long charge;
> > >   	LIST_HEAD(uf);
> > > @@ -493,7 +494,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> > >   	mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING);
> > >   	/* No ordering required: file already has been exposed. */
> > > -	RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
> > > +	exe_file = get_mm_exe_file(oldmm);
> > > +	RCU_INIT_POINTER(mm->exe_file, exe_file);
> > > +	if (exe_file)
> > > +		deny_write_access(exe_file);
> > >   	mm->total_vm = oldmm->total_vm;
> > >   	mm->data_vm = oldmm->data_vm;
> > > @@ -638,8 +642,13 @@ static inline void mm_free_pgd(struct mm_struct *mm)
> > >   #else
> > >   static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
> > >   {
> > > +	struct file *exe_file;
> > > +
> > >   	mmap_write_lock(oldmm);
> > > -	RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
> > > +	exe_file = get_mm_exe_file(oldmm);
> > > +	RCU_INIT_POINTER(mm->exe_file, exe_file);
> > > +	if (exe_file)
> > > +		deny_write_access(exe_file);
> > >   	mmap_write_unlock(oldmm);
> > >   	return 0;
> > >   }
> > > @@ -1163,11 +1172,19 @@ void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
> > >   	 */
> > >   	old_exe_file = rcu_dereference_raw(mm->exe_file);
> > > -	if (new_exe_file)
> > > +	if (new_exe_file) {
> > >   		get_file(new_exe_file);
> > > +		/*
> > > +		 * exec code is required to deny_write_access() successfully,
> > > +		 * so this cannot fail
> > > +		 */
> > > +		deny_write_access(new_exe_file);
> > > +	}
> > >   	rcu_assign_pointer(mm->exe_file, new_exe_file);
> > > -	if (old_exe_file)
> > > +	if (old_exe_file) {
> > > +		allow_write_access(old_exe_file);
> > >   		fput(old_exe_file);
> > > +	}
> > >   }
> > >   int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
> > > @@ -1194,10 +1211,22 @@ int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
> > >   	}
> > >   	/* set the new file, lockless */
> > > +	ret = deny_write_access(new_exe_file);
> > > +	if (ret)
> > > +		return -EACCES;
> > >   	get_file(new_exe_file);
> > > +
> > >   	old_exe_file = xchg(&mm->exe_file, new_exe_file);
> > > -	if (old_exe_file)
> > > +	if (old_exe_file) {
> > > +		/*
> > > +		 * Don't race with dup_mmap() getting the file and disallowing
> > > +		 * write access while someone might open the file writable.
> > > +		 */
> > > +		mmap_read_lock(mm);
> > > +		allow_write_access(old_exe_file);
> > >   		fput(old_exe_file);
> > > +		mmap_read_unlock(mm);
> > > +	}
> > >   	return 0;
> > >   }
> > > -- 
> > > 2.31.1
> > > 
> > 
> 
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 

  reply	other threads:[~2021-08-12 12:32 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-12  8:43 [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 1/7] binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib() David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 2/7] kernel/fork: factor out atomcially replacing the current MM exe_file David Hildenbrand
2021-08-12  9:17   ` Christian Brauner
2021-08-12  8:43 ` [PATCH v1 3/7] kernel/fork: always deny write access to " David Hildenbrand
2021-08-12 10:05   ` Christian Brauner
2021-08-12 10:13     ` David Hildenbrand
2021-08-12 12:32       ` Christian Brauner [this message]
2021-08-12 12:38         ` David Hildenbrand
2021-08-12 16:51   ` Linus Torvalds
2021-08-12 19:38     ` David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 4/7] binfmt: remove in-tree usage of MAP_DENYWRITE David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 5/7] mm: remove VM_DENYWRITE David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 6/7] mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff() David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 7/7] fs: update documentation of get_write_access() and friends David Hildenbrand
2021-08-12 12:20 ` [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE Florian Weimer
2021-08-12 12:47   ` David Hildenbrand
2021-08-12 16:17   ` Eric W. Biederman
2021-08-12 17:32 ` Eric W. Biederman
2021-08-12 17:35   ` Andy Lutomirski
2021-08-12 17:48     ` Eric W. Biederman
2021-08-12 18:01       ` Andy Lutomirski
2021-08-12 18:10       ` Linus Torvalds
2021-08-12 18:47         ` Eric W. Biederman
2021-08-13  9:05           ` David Laight
     [not found]             ` <87h7ft2j68.fsf@disp2133>
2021-08-13 20:51               ` Florian Weimer
2021-08-14  0:31               ` Linus Torvalds
2021-08-14  0:49                 ` Andy Lutomirski
2021-08-14  0:54                   ` Linus Torvalds
2021-08-14  0:58                     ` Linus Torvalds
2021-08-14  1:57                       ` Al Viro
2021-08-14  2:02                         ` Al Viro
2021-08-14  9:06                           ` David Hildenbrand
2021-08-14  7:53                         ` Christian Brauner
2021-08-14 19:52                     ` David Laight
2021-08-26 17:48                     ` Andy Lutomirski
2021-08-26 21:47                       ` David Hildenbrand
2021-08-26 22:13                         ` Eric W. Biederman
2021-08-27  8:22                           ` David Laight
2021-08-27 15:58                             ` Eric W. Biederman
2021-09-01  8:28                           ` David Hildenbrand
2021-08-27 10:18                         ` Christian Brauner
2021-08-14  3:04                   ` Matthew Wilcox
2021-08-17 16:48                     ` Removing Mandatory Locks Eric W. Biederman
2021-08-17 16:50                       ` David Hildenbrand
2021-08-18  9:34                       ` Rodrigo Campos
2021-08-19 19:18                         ` Jeff Layton
2021-08-19 20:03                           ` Willy Tarreau
2021-08-19 18:39                       ` Jeff Layton
2021-08-19 19:15                         ` Linus Torvalds
2021-08-19 19:55                           ` Eric Biggers
2021-08-19 20:18                           ` Jeff Layton
2021-08-19 20:31                             ` Linus Torvalds
2021-08-19 21:43                               ` Jeff Layton
2021-08-19 22:32                                 ` Linus Torvalds
2021-08-20  8:30                                   ` David Laight
2021-08-23  7:55                                     ` Geert Uytterhoeven
2021-08-23  8:14                                       ` David Laight
2021-08-20 13:43                                   ` Steven Rostedt
2021-08-20 16:06                                     ` Linus Torvalds
2021-08-20  2:10                               ` Matthew Wilcox
2021-08-20  6:36                               ` Amir Goldstein
2021-08-20  7:14                                 ` Amir Goldstein
2021-08-20 12:27                                   ` Jeff Layton
2021-08-20 12:38                                     ` Willy Tarreau
2021-08-20 13:03                                       ` Jeff Layton
2021-08-20 13:11                                         ` Willy Tarreau
2021-08-20 16:30                           ` Kees Cook
2021-08-20 19:17                             ` H. Peter Anvin
2021-08-20 21:29                               ` Jeff Layton
2021-08-21 12:45                                 ` Jeff Layton
2021-08-23 22:15                                   ` J. Bruce Fields
2021-08-20 22:31                               ` Matthew Wilcox
2021-08-18  7:51                     ` [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE Christian Brauner
2021-08-18 15:42                   ` J. Bruce Fields
2021-08-19 13:56                     ` Eric W. Biederman
2021-08-19 14:33                       ` J. Bruce Fields
2021-08-20 12:54                         ` Jeff Layton
     [not found]                     ` <162943109106.9892.7426782042253067338@noble.neil.brown.name>
2021-08-20  8:25                       ` David Laight
2021-08-12 19:24         ` David Hildenbrand
2021-08-12 18:15       ` Florian Weimer
2021-08-12 18:21         ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210812123239.trksnm57owzwzokj@wittgenstein \
    --to=christian.brauner@ubuntu.com \
    --cc=Kevin.Brodsky@arm.com \
    --cc=Nicolas.Viennot@twosigma.com \
    --cc=acme@kernel.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=avagin@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=cfijalkovich@google.com \
    --cc=cgxu519@mykernel.net \
    --cc=chinwen.chang@mediatek.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=david@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=elver@google.com \
    --cc=feng.tang@intel.com \
    --cc=geert@linux-m68k.org \
    --cc=gerg@linux-m68k.org \
    --cc=hpa@zytor.com \
    --cc=jannh@google.com \
    --cc=jolsa@redhat.com \
    --cc=keescook@chromium.org \
    --cc=krisman@collabora.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=mark.rutland@arm.com \
    --cc=mhocko@suse.com \
    --cc=miklos@szeredi.hu \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=namhyung@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=shakeelb@google.com \
    --cc=shawn@anastas.io \
    --cc=steven.price@arm.com \
    --cc=surenb@google.com \
    --cc=tglx@linutronix.de \
    --cc=thomascedeno@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=vincenzo.frascino@arm.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=walken@google.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).