From: David Hildenbrand <david@redhat.com>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: linux-kernel@vger.kernel.org,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Borislav Petkov" <bp@alien8.de>,
"H. Peter Anvin" <hpa@zytor.com>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Alexey Dobriyan" <adobriyan@gmail.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Peter Zijlstra" <peterz@infradead.org>,
"Arnaldo Carvalho de Melo" <acme@kernel.org>,
"Mark Rutland" <mark.rutland@arm.com>,
"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
"Jiri Olsa" <jolsa@redhat.com>,
"Namhyung Kim" <namhyung@kernel.org>,
"Petr Mladek" <pmladek@suse.com>,
"Sergey Senozhatsky" <sergey.senozhatsky@gmail.com>,
"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
"Rasmus Villemoes" <linux@rasmusvillemoes.dk>,
"Kees Cook" <keescook@chromium.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"Greg Ungerer" <gerg@linux-m68k.org>,
"Geert Uytterhoeven" <geert@linux-m68k.org>,
"Mike Rapoport" <rppt@kernel.org>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Vincenzo Frascino" <vincenzo.frascino@arm.com>,
"Chinwen Chang" <chinwen.chang@mediatek.com>,
"Michel Lespinasse" <walken@google.com>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
"Huang Ying" <ying.huang@intel.com>,
"Jann Horn" <jannh@google.com>, "Feng Tang" <feng.tang@intel.com>,
"Kevin Brodsky" <Kevin.Brodsky@arm.com>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Shawn Anastasio" <shawn@anastas.io>,
"Steven Price" <steven.price@arm.com>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Jens Axboe" <axboe@kernel.dk>,
"Gabriel Krisman Bertazi" <krisman@collabora.com>,
"Peter Xu" <peterx@redhat.com>,
"Suren Baghdasaryan" <surenb@google.com>,
"Shakeel Butt" <shakeelb@google.com>,
"Marco Elver" <elver@google.com>,
"Daniel Jordan" <daniel.m.jordan@oracle.com>,
"Nicolas Viennot" <Nicolas.Viennot@twosigma.com>,
"Thomas Cedeno" <thomascedeno@google.com>,
"Collin Fijalkovich" <cfijalkovich@google.com>,
"Michal Hocko" <mhocko@suse.com>,
"Miklos Szeredi" <miklos@szeredi.hu>,
"Chengguang Xu" <cgxu519@mykernel.net>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
linux-unionfs@vger.kernel.org, linux-api@vger.kernel.org,
x86@kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, "Andrei Vagin" <avagin@gmail.com>
Subject: Re: [PATCH v1 3/7] kernel/fork: always deny write access to current MM exe_file
Date: Thu, 12 Aug 2021 12:13:44 +0200 [thread overview]
Message-ID: <1b6d27cf-2238-0c1c-c563-b38728fbabc2@redhat.com> (raw)
In-Reply-To: <20210812100544.uhsfp75b4jcrv3qx@wittgenstein>
On 12.08.21 12:05, Christian Brauner wrote:
> [+Cc Andrei]
>
> On Thu, Aug 12, 2021 at 10:43:44AM +0200, David Hildenbrand wrote:
>> We want to remove VM_DENYWRITE only currently only used when mapping the
>> executable during exec. During exec, we already deny_write_access() the
>> executable, however, after exec completes the VMAs mapped
>> with VM_DENYWRITE effectively keeps write access denied via
>> deny_write_access().
>>
>> Let's deny write access when setting the MM exe_file. With this change, we
>> can remove VM_DENYWRITE for mapping executables.
>>
>> This represents a minor user space visible change:
>> sys_prctl(PR_SET_MM_EXE_FILE) can now fail if the file is already
>> opened writable. Also, after sys_prctl(PR_SET_MM_EXE_FILE), the file
>
> Just for completeness, this also affects PR_SET_MM_MAP when exe_fd is
> set.
Correct.
>
>> cannot be opened writable. Note that we can already fail with -EACCES if
>> the file doesn't have execute permissions.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>
> The biggest user I know and that I'm involved in is CRIU which heavily
> uses PR_SET_MM_MAP (with a fallback to PR_SET_MM_EXE_FILE on older
> kernels) during restore. Afair, criu opens the exe fd as an O_PATH
> during dump and thus will use the same flag during restore when
> opening it. So that should be fine.
Yes.
>
> However, if I understand the consequences of this change correctly, a
> problem could be restoring workloads that hold a writable fd open to
> their exe file at dump time which would mean that during restore that fd
> would be reopened writable causing CRIU to fail when setting the exe
> file for the task to be restored.
If it's their exe file, then the existing VM_DENYWRITE handling would
have forbidden these workloads to open the fd of their exe file
writable, right? At least before doing any
PR_SET_MM_MAP/PR_SET_MM_EXE_FILE. But that should rule out quite a lot
of cases we might be worried about, right?
>
> Which honestly, no idea how many such workloads exist. (I know at least
> of runC and LXC need to sometimes reopen to rexec themselves (weird bug
> to protect against attacking the exe file) and thus re-open
> /proc/self/exe but read-only.)
>
>> kernel/fork.c | 39 ++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 34 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 6bd2e52bcdfb..5d904878f19b 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -476,6 +476,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>> {
>> struct vm_area_struct *mpnt, *tmp, *prev, **pprev;
>> struct rb_node **rb_link, *rb_parent;
>> + struct file *exe_file;
>> int retval;
>> unsigned long charge;
>> LIST_HEAD(uf);
>> @@ -493,7 +494,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>> mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING);
>>
>> /* No ordering required: file already has been exposed. */
>> - RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
>> + exe_file = get_mm_exe_file(oldmm);
>> + RCU_INIT_POINTER(mm->exe_file, exe_file);
>> + if (exe_file)
>> + deny_write_access(exe_file);
>>
>> mm->total_vm = oldmm->total_vm;
>> mm->data_vm = oldmm->data_vm;
>> @@ -638,8 +642,13 @@ static inline void mm_free_pgd(struct mm_struct *mm)
>> #else
>> static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
>> {
>> + struct file *exe_file;
>> +
>> mmap_write_lock(oldmm);
>> - RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
>> + exe_file = get_mm_exe_file(oldmm);
>> + RCU_INIT_POINTER(mm->exe_file, exe_file);
>> + if (exe_file)
>> + deny_write_access(exe_file);
>> mmap_write_unlock(oldmm);
>> return 0;
>> }
>> @@ -1163,11 +1172,19 @@ void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
>> */
>> old_exe_file = rcu_dereference_raw(mm->exe_file);
>>
>> - if (new_exe_file)
>> + if (new_exe_file) {
>> get_file(new_exe_file);
>> + /*
>> + * exec code is required to deny_write_access() successfully,
>> + * so this cannot fail
>> + */
>> + deny_write_access(new_exe_file);
>> + }
>> rcu_assign_pointer(mm->exe_file, new_exe_file);
>> - if (old_exe_file)
>> + if (old_exe_file) {
>> + allow_write_access(old_exe_file);
>> fput(old_exe_file);
>> + }
>> }
>>
>> int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
>> @@ -1194,10 +1211,22 @@ int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
>> }
>>
>> /* set the new file, lockless */
>> + ret = deny_write_access(new_exe_file);
>> + if (ret)
>> + return -EACCES;
>> get_file(new_exe_file);
>> +
>> old_exe_file = xchg(&mm->exe_file, new_exe_file);
>> - if (old_exe_file)
>> + if (old_exe_file) {
>> + /*
>> + * Don't race with dup_mmap() getting the file and disallowing
>> + * write access while someone might open the file writable.
>> + */
>> + mmap_read_lock(mm);
>> + allow_write_access(old_exe_file);
>> fput(old_exe_file);
>> + mmap_read_unlock(mm);
>> + }
>> return 0;
>> }
>>
>> --
>> 2.31.1
>>
>
--
Thanks,
David / dhildenb
next prev parent reply other threads:[~2021-08-12 10:13 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-12 8:43 [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
2021-08-12 8:43 ` [PATCH v1 1/7] binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib() David Hildenbrand
2021-08-12 8:43 ` [PATCH v1 2/7] kernel/fork: factor out atomcially replacing the current MM exe_file David Hildenbrand
2021-08-12 9:17 ` Christian Brauner
2021-08-12 8:43 ` [PATCH v1 3/7] kernel/fork: always deny write access to " David Hildenbrand
2021-08-12 10:05 ` Christian Brauner
2021-08-12 10:13 ` David Hildenbrand [this message]
2021-08-12 12:32 ` Christian Brauner
2021-08-12 12:38 ` David Hildenbrand
2021-08-12 16:51 ` Linus Torvalds
2021-08-12 19:38 ` David Hildenbrand
2021-08-12 8:43 ` [PATCH v1 4/7] binfmt: remove in-tree usage of MAP_DENYWRITE David Hildenbrand
2021-08-12 8:43 ` [PATCH v1 5/7] mm: remove VM_DENYWRITE David Hildenbrand
2021-08-12 8:43 ` [PATCH v1 6/7] mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff() David Hildenbrand
2021-08-12 8:43 ` [PATCH v1 7/7] fs: update documentation of get_write_access() and friends David Hildenbrand
2021-08-12 12:20 ` [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE Florian Weimer
2021-08-12 12:47 ` David Hildenbrand
2021-08-12 16:17 ` Eric W. Biederman
2021-08-12 17:32 ` Eric W. Biederman
2021-08-12 17:35 ` Andy Lutomirski
2021-08-12 17:48 ` Eric W. Biederman
2021-08-12 18:01 ` Andy Lutomirski
2021-08-12 18:10 ` Linus Torvalds
2021-08-12 18:47 ` Eric W. Biederman
2021-08-13 9:05 ` David Laight
[not found] ` <87h7ft2j68.fsf@disp2133>
2021-08-13 20:51 ` Florian Weimer
2021-08-14 0:31 ` Linus Torvalds
2021-08-14 0:49 ` Andy Lutomirski
2021-08-14 0:54 ` Linus Torvalds
2021-08-14 0:58 ` Linus Torvalds
2021-08-14 1:57 ` Al Viro
2021-08-14 2:02 ` Al Viro
2021-08-14 9:06 ` David Hildenbrand
2021-08-14 7:53 ` Christian Brauner
2021-08-14 19:52 ` David Laight
2021-08-26 17:48 ` Andy Lutomirski
2021-08-26 21:47 ` David Hildenbrand
2021-08-26 22:13 ` Eric W. Biederman
2021-08-27 8:22 ` David Laight
2021-08-27 15:58 ` Eric W. Biederman
2021-09-01 8:28 ` David Hildenbrand
2021-08-27 10:18 ` Christian Brauner
2021-08-14 3:04 ` Matthew Wilcox
2021-08-17 16:48 ` Removing Mandatory Locks Eric W. Biederman
2021-08-17 16:50 ` David Hildenbrand
2021-08-18 9:34 ` Rodrigo Campos
2021-08-19 19:18 ` Jeff Layton
2021-08-19 20:03 ` Willy Tarreau
2021-08-19 18:39 ` Jeff Layton
2021-08-19 19:15 ` Linus Torvalds
2021-08-19 19:55 ` Eric Biggers
2021-08-19 20:18 ` Jeff Layton
2021-08-19 20:31 ` Linus Torvalds
2021-08-19 21:43 ` Jeff Layton
2021-08-19 22:32 ` Linus Torvalds
2021-08-20 8:30 ` David Laight
2021-08-23 7:55 ` Geert Uytterhoeven
2021-08-23 8:14 ` David Laight
2021-08-20 13:43 ` Steven Rostedt
2021-08-20 16:06 ` Linus Torvalds
2021-08-20 2:10 ` Matthew Wilcox
2021-08-20 6:36 ` Amir Goldstein
2021-08-20 7:14 ` Amir Goldstein
2021-08-20 12:27 ` Jeff Layton
2021-08-20 12:38 ` Willy Tarreau
2021-08-20 13:03 ` Jeff Layton
2021-08-20 13:11 ` Willy Tarreau
2021-08-20 16:30 ` Kees Cook
2021-08-20 19:17 ` H. Peter Anvin
2021-08-20 21:29 ` Jeff Layton
2021-08-21 12:45 ` Jeff Layton
2021-08-23 22:15 ` J. Bruce Fields
2021-08-20 22:31 ` Matthew Wilcox
2021-08-18 7:51 ` [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE Christian Brauner
2021-08-18 15:42 ` J. Bruce Fields
2021-08-19 13:56 ` Eric W. Biederman
2021-08-19 14:33 ` J. Bruce Fields
2021-08-20 12:54 ` Jeff Layton
[not found] ` <162943109106.9892.7426782042253067338@noble.neil.brown.name>
2021-08-20 8:25 ` David Laight
2021-08-12 19:24 ` David Hildenbrand
2021-08-12 18:15 ` Florian Weimer
2021-08-12 18:21 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1b6d27cf-2238-0c1c-c563-b38728fbabc2@redhat.com \
--to=david@redhat.com \
--cc=Kevin.Brodsky@arm.com \
--cc=Nicolas.Viennot@twosigma.com \
--cc=acme@kernel.org \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=andriy.shevchenko@linux.intel.com \
--cc=avagin@gmail.com \
--cc=axboe@kernel.dk \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=cfijalkovich@google.com \
--cc=cgxu519@mykernel.net \
--cc=chinwen.chang@mediatek.com \
--cc=christian.brauner@ubuntu.com \
--cc=ckoenig.leichtzumerken@gmail.com \
--cc=daniel.m.jordan@oracle.com \
--cc=ebiederm@xmission.com \
--cc=elver@google.com \
--cc=feng.tang@intel.com \
--cc=geert@linux-m68k.org \
--cc=gerg@linux-m68k.org \
--cc=hpa@zytor.com \
--cc=jannh@google.com \
--cc=jolsa@redhat.com \
--cc=keescook@chromium.org \
--cc=krisman@collabora.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-unionfs@vger.kernel.org \
--cc=linux@rasmusvillemoes.dk \
--cc=mark.rutland@arm.com \
--cc=mhocko@suse.com \
--cc=miklos@szeredi.hu \
--cc=mingo@redhat.com \
--cc=mpe@ellerman.id.au \
--cc=namhyung@kernel.org \
--cc=npiggin@gmail.com \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=rostedt@goodmis.org \
--cc=rppt@kernel.org \
--cc=sergey.senozhatsky@gmail.com \
--cc=shakeelb@google.com \
--cc=shawn@anastas.io \
--cc=steven.price@arm.com \
--cc=surenb@google.com \
--cc=tglx@linutronix.de \
--cc=thomascedeno@google.com \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
--cc=vincenzo.frascino@arm.com \
--cc=viro@zeniv.linux.org.uk \
--cc=walken@google.com \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).