linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Brauner <christian.brauner@ubuntu.com>
To: Al Viro <viro@zeniv.linux.org.uk>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	"David Laight" <David.Laight@aculab.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Alexey Dobriyan" <adobriyan@gmail.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Jiri Olsa" <jolsa@redhat.com>,
	"Namhyung Kim" <namhyung@kernel.org>,
	"Petr Mladek" <pmladek@suse.com>,
	"Sergey Senozhatsky" <sergey.senozhatsky@gmail.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Rasmus Villemoes" <linux@rasmusvillemoes.dk>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Ungerer" <gerg@linux-m68k.org>,
	"Geert Uytterhoeven" <geert@linux-m68k.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Vincenzo Frascino" <vincenzo.frascino@arm.com>,
	"Chinwen Chang" <chinwen.chang@mediatek.com>,
	"Michel Lespinasse" <walken@google.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Huang Ying" <ying.huang@intel.com>,
	"Jann Horn" <jannh@google.com>, "Feng Tang" <feng.tang@intel.com>,
	"Kevin Brodsky" <Kevin.Brodsky@arm.com>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Shawn Anastasio" <shawn@anastas.io>,
	"Steven Price" <steven.price@arm.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Gabriel Krisman Bertazi" <krisman@collabora.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Marco Elver" <elver@google.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Nicolas Viennot" <Nicolas.Viennot@twosigma.com>,
	"Thomas Cedeno" <thomascedeno@google.com>,
	"Collin Fijalkovich" <cfijalkovich@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Miklos Szeredi" <miklos@szeredi.hu>,
	"Chengguang Xu" <cgxu519@mykernel.net>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"linux-unionfs@vger.kernel.org" <linux-unionfs@vger.kernel.org>,
	"Linux API" <linux-api@vger.kernel.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	"<linux-fsdevel@vger.kernel.org>" <linux-fsdevel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	"Florian Weimer" <fweimer@redhat.com>,
	"Michael Kerrisk" <mtk.manpages@gmail.com>,
	"Christoph Hellwig" <hch@lst.de>
Subject: Re: [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE
Date: Sat, 14 Aug 2021 09:53:33 +0200	[thread overview]
Message-ID: <20210814075333.7333bxduk4tei57i@wittgenstein> (raw)
In-Reply-To: <YRcjCwfHvUZhcKf3@zeniv-ca.linux.org.uk>

On Sat, Aug 14, 2021 at 01:57:31AM +0000, Al Viro wrote:
> On Fri, Aug 13, 2021 at 02:58:57PM -1000, Linus Torvalds wrote:
> > On Fri, Aug 13, 2021 at 2:54 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > And nobody really complained when we weakened it, so maybe removing it
> > > entirely might be acceptable.
> > 
> > I guess we could just try it and see... Worst comes to worst, we'll
> > have to put it back, but at least we'd know what crazy thing still
> > wants it..
> 
> Umm...  I'll need to go back and look through the thread, but I'm
> fairly sure that there used to be suckers that did replacement of
> binary that way (try to write, count on exclusion with execve while
> it's being written to) instead of using rename.  Install scripts
> of weird crap and stuff like that...

I'm not agains trying to remove it, but I think Al has a point.

Removing the write protection will also most certainly make certain
classes of attacks _easier_. For example, the runC container breakout
from last year using privileged containers issued CVE-2019-5736 would be
easier. I'm quoting from the commit I fixed this with:

    The attack can be made when attaching to a running container or when starting a
    container running a specially crafted image.  For example, when runC attaches
    to a container the attacker can trick it into executing itself. This could be
    done by replacing the target binary inside the container with a custom binary
    pointing back at the runC binary itself. As an example, if the target binary
    was /bin/bash, this could be replaced with an executable script specifying the
    interpreter path #!/proc/self/exe (/proc/self/exec is a symbolic link created
    by the kernel for every process which points to the binary that was executed
    for that process). As such when /bin/bash is executed inside the container,
    instead the target of /proc/self/exe will be executed - which will point to the
    runc binary on the host. The attacker can then proceed to write to the target
    of /proc/self/exe to try and overwrite the runC binary on the host.

and then the write protection kicks in of course:

    However in general, this will not succeed as the kernel will not
    permit it to be overwritten whilst runC is executing.

which the attack can of course already overcome nowadays with minimal
smarts:

    To overcome this, the attacker can instead open a file descriptor to
    /proc/self/exe using the O_PATH flag and then proceed to reopen the
    binary as O_WRONLY through /proc/self/fd/<nr> and try to write to it
    in a busy loop from a separate process. Ultimately it will succeed
    when the runC binary exits. After this the runC binary is
    compromised and can be used to attack other containers or the host
    itself.

But with write protection removed you'd allow such attacks to succeed
right away. It's not a huge deal to remove it since we need to have
other protection mechanisms in place already:

    To prevent this attack, LXC has been patched to create a temporary copy of the
    calling binary itself when it starts or attaches to containers. To do this LXC
    creates an anonymous, in-memory file using the memfd_create() system call and
    copies itself into the temporary in-memory file, which is then sealed to
    prevent further modifications. LXC then executes this sealed, in-memory file
    instead of the original on-disk binary. Any compromising write operations from
    a privileged container to the host LXC binary will then write to the temporary
    in-memory binary and not to the host binary on-disk, preserving the integrity
    of the host LXC binary. Also as the temporary, in-memory LXC binary is sealed,
    writes to this will also fail.

    Note: memfd_create() was added to the Linux kernel in the 3.17 release.

However, I still like to pich the upgrade mask idea Aleksa and we tried
to implement when we did openat2(). If we leave write-protection in
preventing /proc/self/exe from being written to:

we can take some time and upstream the upgrade mask patchset which was
part of the initial openat2() patchset but was dropped back then (and I
had Linus remove the last remants of the idea in [1]).

The idea was to add a new field to struct open_how "upgrade_mask" that
would allow a caller to specify with what permissions an fd could be
reopened with. I still like this idea a great deal and it would be a
very welcome addition to system management programs. The upgrade mask is
of course optional, i.e. the caller would have to specify the upgrade
mask at open time to restrict reopening (lest we regress the whole
world).

But, we could make it so that an O_PATH fd gotten from opening
/proc/<pid>/exe always gets a restricted upgrade mask set and so it
can't be upgraded to a O_WRONLY fd afterwards. For this to be
meaningful, write protection for /proc/self/exe would need to be kept.

[1]: commit 5c350aa11b441b32baf3bfe4018168cb8d10cef7
     Author: Christian Brauner <christian.brauner@ubuntu.com>
     Date:   Fri May 28 11:24:15 2021 +0200
     
         fcntl: remove unused VALID_UPGRADE_FLAGS
     
         We currently do not maky use of this feature and should we implement
         something like this in the future it's trivial to add it back.
     
         Link: https://lore.kernel.org/r/20210528092417.3942079-2-brauner@kernel.org
         Cc: Christoph Hellwig <hch@lst.de>
         Cc: Aleksa Sarai <cyphar@cyphar.com>
         Cc: Al Viro <viro@zeniv.linux.org.uk>
         Cc: linux-fsdevel@vger.kernel.org
         Suggested-by: Richard Guy Briggs <rgb@redhat.com>
         Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
         Reviewed-by: Christoph Hellwig <hch@lst.de>
         Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

  parent reply	other threads:[~2021-08-14  7:53 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-12  8:43 [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 1/7] binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib() David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 2/7] kernel/fork: factor out atomcially replacing the current MM exe_file David Hildenbrand
2021-08-12  9:17   ` Christian Brauner
2021-08-12  8:43 ` [PATCH v1 3/7] kernel/fork: always deny write access to " David Hildenbrand
2021-08-12 10:05   ` Christian Brauner
2021-08-12 10:13     ` David Hildenbrand
2021-08-12 12:32       ` Christian Brauner
2021-08-12 12:38         ` David Hildenbrand
2021-08-12 16:51   ` Linus Torvalds
2021-08-12 19:38     ` David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 4/7] binfmt: remove in-tree usage of MAP_DENYWRITE David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 5/7] mm: remove VM_DENYWRITE David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 6/7] mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff() David Hildenbrand
2021-08-12  8:43 ` [PATCH v1 7/7] fs: update documentation of get_write_access() and friends David Hildenbrand
2021-08-12 12:20 ` [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE Florian Weimer
2021-08-12 12:47   ` David Hildenbrand
2021-08-12 16:17   ` Eric W. Biederman
2021-08-12 17:32 ` Eric W. Biederman
2021-08-12 17:35   ` Andy Lutomirski
2021-08-12 17:48     ` Eric W. Biederman
2021-08-12 18:01       ` Andy Lutomirski
2021-08-12 18:10       ` Linus Torvalds
2021-08-12 18:47         ` Eric W. Biederman
2021-08-13  9:05           ` David Laight
     [not found]             ` <87h7ft2j68.fsf@disp2133>
2021-08-13 20:51               ` Florian Weimer
2021-08-14  0:31               ` Linus Torvalds
2021-08-14  0:49                 ` Andy Lutomirski
2021-08-14  0:54                   ` Linus Torvalds
2021-08-14  0:58                     ` Linus Torvalds
2021-08-14  1:57                       ` Al Viro
2021-08-14  2:02                         ` Al Viro
2021-08-14  9:06                           ` David Hildenbrand
2021-08-14  7:53                         ` Christian Brauner [this message]
2021-08-14 19:52                     ` David Laight
2021-08-26 17:48                     ` Andy Lutomirski
2021-08-26 21:47                       ` David Hildenbrand
2021-08-26 22:13                         ` Eric W. Biederman
2021-08-27  8:22                           ` David Laight
2021-08-27 15:58                             ` Eric W. Biederman
2021-09-01  8:28                           ` David Hildenbrand
2021-08-27 10:18                         ` Christian Brauner
2021-08-14  3:04                   ` Matthew Wilcox
2021-08-17 16:48                     ` Removing Mandatory Locks Eric W. Biederman
2021-08-17 16:50                       ` David Hildenbrand
2021-08-18  9:34                       ` Rodrigo Campos
2021-08-19 19:18                         ` Jeff Layton
2021-08-19 20:03                           ` Willy Tarreau
2021-08-19 18:39                       ` Jeff Layton
2021-08-19 19:15                         ` Linus Torvalds
2021-08-19 19:55                           ` Eric Biggers
2021-08-19 20:18                           ` Jeff Layton
2021-08-19 20:31                             ` Linus Torvalds
2021-08-19 21:43                               ` Jeff Layton
2021-08-19 22:32                                 ` Linus Torvalds
2021-08-20  8:30                                   ` David Laight
2021-08-23  7:55                                     ` Geert Uytterhoeven
2021-08-23  8:14                                       ` David Laight
2021-08-20 13:43                                   ` Steven Rostedt
2021-08-20 16:06                                     ` Linus Torvalds
2021-08-20  2:10                               ` Matthew Wilcox
2021-08-20  6:36                               ` Amir Goldstein
2021-08-20  7:14                                 ` Amir Goldstein
2021-08-20 12:27                                   ` Jeff Layton
2021-08-20 12:38                                     ` Willy Tarreau
2021-08-20 13:03                                       ` Jeff Layton
2021-08-20 13:11                                         ` Willy Tarreau
2021-08-20 16:30                           ` Kees Cook
2021-08-20 19:17                             ` H. Peter Anvin
2021-08-20 21:29                               ` Jeff Layton
2021-08-21 12:45                                 ` Jeff Layton
2021-08-23 22:15                                   ` J. Bruce Fields
2021-08-20 22:31                               ` Matthew Wilcox
2021-08-18  7:51                     ` [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE Christian Brauner
2021-08-18 15:42                   ` J. Bruce Fields
2021-08-19 13:56                     ` Eric W. Biederman
2021-08-19 14:33                       ` J. Bruce Fields
2021-08-20 12:54                         ` Jeff Layton
     [not found]                     ` <162943109106.9892.7426782042253067338@noble.neil.brown.name>
2021-08-20  8:25                       ` David Laight
2021-08-12 19:24         ` David Hildenbrand
2021-08-12 18:15       ` Florian Weimer
2021-08-12 18:21         ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210814075333.7333bxduk4tei57i@wittgenstein \
    --to=christian.brauner@ubuntu.com \
    --cc=David.Laight@aculab.com \
    --cc=Kevin.Brodsky@arm.com \
    --cc=Nicolas.Viennot@twosigma.com \
    --cc=acme@kernel.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=axboe@kernel.dk \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=cfijalkovich@google.com \
    --cc=cgxu519@mykernel.net \
    --cc=chinwen.chang@mediatek.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=david@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=elver@google.com \
    --cc=feng.tang@intel.com \
    --cc=fweimer@redhat.com \
    --cc=geert@linux-m68k.org \
    --cc=gerg@linux-m68k.org \
    --cc=hch@lst.de \
    --cc=hpa@zytor.com \
    --cc=jannh@google.com \
    --cc=jolsa@redhat.com \
    --cc=keescook@chromium.org \
    --cc=krisman@collabora.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mhocko@suse.com \
    --cc=miklos@szeredi.hu \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=mtk.manpages@gmail.com \
    --cc=namhyung@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=shakeelb@google.com \
    --cc=shawn@anastas.io \
    --cc=steven.price@arm.com \
    --cc=surenb@google.com \
    --cc=tglx@linutronix.de \
    --cc=thomascedeno@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=vincenzo.frascino@arm.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=walken@google.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).