From: ebiederm@xmission.com (Eric W. Biederman)
To: David Laight <David.Laight@ACULAB.COM>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>,
	"Andy Lutomirski" <luto@kernel.org>,
	"David Hildenbrand" <david@redhat.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Alexey Dobriyan" <adobriyan@gmail.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Jiri Olsa" <jolsa@redhat.com>,
	"Namhyung Kim" <namhyung@kernel.org>,
	"Petr Mladek" <pmladek@suse.com>,
	"Sergey Senozhatsky" <sergey.senozhatsky@gmail.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Rasmus Villemoes" <linux@rasmusvillemoes.dk>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Ungerer" <gerg@linux-m68k.org>,
	"Geert Uytterhoeven" <geert@linux-m68k.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Vincenzo Frascino" <vincenzo.frascino@arm.com>,
	"Chinwen Chang" <chinwen.chang@mediatek.com>,
	"Michel Lespinasse" <walken@google.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Huang Ying" <ying.huang@intel.com>,
	"Jann Horn" <jannh@google.com>, "Feng Tang" <feng.tang@intel.com>,
	"Kevin Brodsky" <Kevin.Brodsky@arm.com>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Shawn Anastasio" <shawn@anastas.io>,
	"Steven Price" <steven.price@arm.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Gabriel Krisman Bertazi" <krisman@collabora.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Marco Elver" <elver@google.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Nicolas Viennot" <Nicolas.Viennot@twosigma.com>,
	"Thomas Cedeno" <thomascedeno@google.com>,
	"Collin Fijalkovich" <cfijalkovich@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	"Miklos Szeredi" <miklos@szeredi.hu>,
	"Chengguang Xu" <cgxu519@mykernel.net>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"linux-unionfs@vger.kernel.org" <linux-unionfs@vger.kernel.org>,
	"Linux API" <linux-api@vger.kernel.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE
Date: Fri, 13 Aug 2021 15:17:51 -0500
Message-ID: <87h7ft2j68.fsf@disp2133>
In-Reply-To: <5b0d7c1e73ca43ef9ce6665fec6c4d7e@AcuMS.aculab.com> (David Laight's message of "Fri, 13 Aug 2021 09:05:43 +0000")

David Laight <David.Laight@ACULAB.COM> writes:

> From: Eric W. Biederman
>> Sent: 12 August 2021 19:47
> ...
>> So today the best advice I can give to userspace is to mark their
>> executables and shared libraries as read-only and immutable.  Otherwise
>> a change to the executable file can change what is mapped into memory.
>> MAP_PRIVATE does not help.
>
> While 'immutable' might be ok for files installed by distributions
> it would be a PITA in development.

For development simply making the files read-only should be sufficient.

What I think should happen is that when a new binary is installed, it is
always written to a new file which is then renamed to the old name,
rather than copied over the old file.  That can be done in a makefile by
writing to a temporary file name and then using "mv" to rename it to the
final name.
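A minimal C sketch of that write-then-rename pattern, assuming a POSIX
system.  install_by_rename() is a hypothetical helper, not an existing
API.  The new contents go into a mkstemp(3) file in the same directory
(rename(2) cannot cross filesystems) and are then renamed over the old
name, so anything that already has the old file open or mapped keeps the
old inode:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: install `len` bytes of `data` at `dest` by writing
 * a temporary file next to `dest` and rename(2)-ing it over `dest`.
 * Existing opens/mappings of the old `dest` keep the old inode; new opens
 * see the new file.  Returns 0 on success, -1 with errno set on error.
 */
int install_by_rename(const char *dest, const void *data, size_t len,
                      mode_t mode)
{
    char tmp[PATH_MAX];

    /* Same directory as dest, so rename(2) never fails with EXDEV. */
    if (snprintf(tmp, sizeof(tmp), "%s.tmpXXXXXX", dest) >= (int)sizeof(tmp)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        left -= (size_t)n;
    }
    /* mkstemp creates the file 0600; apply the final permissions. */
    if (fchmod(fd, mode) < 0 || fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;
    /* Atomically switch the name to the new inode. */
    if (rename(tmp, dest) < 0)
        goto fail;
    return 0;

fail:
    {
        int saved = errno;
        if (fd >= 0)
            close(fd);
        unlink(tmp);
        errno = saved;
    }
    return -1;
}
```

Deriving the temporary name from the destination is the design choice
that matters here: it keeps both names on the same filesystem, which is
also why the makefile variant should write the temporary next to the
final name before the "mv".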

I looked at which options I would need to pass to install to implement
that pattern, but I don't see one.  Perhaps -b for backup.

I thought I could get around CAP_DAC_OVERRIDE, which allows read-only
files to be overwritten, by adding the immutable attribute.  Having just
tested it and reread the code, I see that using immutable has a couple
of problems.  Only root is allowed to set immutable, and not even root
is allowed to rename or delete a file while the immutable attribute is
set.
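For reference, the immutable attribute is what `chattr +i` sets.  A
sketch of doing the same from C through the FS_IOC_GETFLAGS and
FS_IOC_SETFLAGS ioctls from <linux/fs.h> follows; set_immutable() is a
hypothetical helper name.  Setting the flag needs CAP_LINUX_IMMUTABLE
(normally only root has it) and a filesystem that supports these ioctls,
and while the flag is set unlink(2) and rename(2) on the file fail with
EPERM even for root:

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_IMMUTABLE_FL */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: set (on != 0) or clear the immutable attribute of
 * `path`, like `chattr +i` / `chattr -i`.  Returns 0, or -1 with errno set
 * (EPERM without CAP_LINUX_IMMUTABLE, ENOTTY/EOPNOTSUPP if unsupported).
 */
int set_immutable(const char *path, int on)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int flags;  /* the kernel reads and writes an int for these ioctls */
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
        goto fail;
    if (on)
        flags |= FS_IMMUTABLE_FL;
    else
        flags &= ~FS_IMMUTABLE_FL;
    if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
        goto fail;
    close(fd);
    return 0;

fail:
    {
        int saved = errno;
        close(fd);
        errno = saved;
    }
    return -1;
}
```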

> ETXTBUSY is a useful reminder that the file you are copying from
> machine A to machine B (etc) is still running and probably ought
> to be killed/stopped before you get confused.

That is true.

> I've never really understood why it doesn't stop shared libraries
> being overwritten - but they do tend to be updated less often.

The problem is that MAP_DENYWRITE can be applied to any file, which
makes it another kind of mandatory file lock: it allows anyone to block
all writes to any file they can read and mmap.  That creates all kinds
of denial-of-service opportunities.

A nasty example would be using mmap with MAP_DENYWRITE on
/var/log/messages, which a hostile actor could use to hide traces of
their presence on a machine.

So far no one has pointed out a way to abuse denying writes to
/proc/self/exe, so we can keep the denywrite behavior there for now.
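A small C sketch of what the remaining deny-write on the executable looks
like from userspace, assuming Linux with /proc mounted;
open_self_exe_for_write() is a hypothetical name.  While the kernel
denies writes to the running executable, opening /proc/self/exe for
writing fails with ETXTBSY.  EACCES (no write permission) or EROFS are
also possible, and if the open does succeed the descriptor is closed
without writing anything:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open the running executable for writing.
 * Returns 0 if open(2) succeeded (the descriptor is closed immediately and
 * nothing is written), otherwise the errno value from open(2).
 */
int open_self_exe_for_write(void)
{
    int fd = open("/proc/self/exe", O_WRONLY);
    if (fd < 0)
        return errno;   /* ETXTBSY while writes to the executable are denied */
    close(fd);          /* never write: that would modify our own binary */
    return 0;
}
```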

> Overwriting an in-use shared library could be really confusing.
> It is likely that all the code is actually in memory.
> So everything carries on running as normal.
> Until the kernel gets under memory pressure and discards a page.
> Then a page from the new version is faulted in and random
> programs start getting SEGVs.
> This could be days after the borked update.

This should actually happen quite quickly after a borked update.  The
pages in the page cache are cache coherent, so as soon as the CPU caches
are flushed, the new contents of the overwritten page will be visible.



That gets to half of the confusion here.  Long ago and far away, rtld
in glibc was written with the assumption that something called MAP_COPY
existed (I think it exists on Hurd?).  On Linux it was at one point
emulated with MAP_PRIVATE | MAP_DENYWRITE.  Then we made MAP_DENYWRITE
a no-op.  This was probably 20 years ago now.

The glibc rtld implementation is still written using MAP_COPY, with
MAP_COPY defined as MAP_PRIVATE | MAP_DENYWRITE on Linux.
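A C sketch showing why that emulation protects nothing, assuming Linux
and glibc (which defines MAP_DENYWRITE when _GNU_SOURCE is set).  The
file is mapped with MAP_PRIVATE | MAP_DENYWRITE, the combination
described above, and a write(2) to the file still succeeds and is
visible through the mapping.  EMULATED_MAP_COPY and
write_under_emulated_map_copy() are hypothetical names, chosen to avoid
clashing with any system definition; the file must be at least one byte
long:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* The MAP_COPY emulation described in the text (hypothetical macro name). */
#define EMULATED_MAP_COPY (MAP_PRIVATE | MAP_DENYWRITE)

/*
 * Map the first page of `path` read-only with EMULATED_MAP_COPY, then
 * overwrite byte 0 of the file with write(2).  Returns the byte seen
 * through the mapping afterwards, or -1 on error.  MAP_DENYWRITE from
 * userspace is ignored, so the write is not blocked.
 */
int write_under_emulated_map_copy(const char *path, char newbyte)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *map = mmap(NULL, len, PROT_READ, EMULATED_MAP_COPY, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    int result = -1;
    if (pwrite(fd, &newbyte, 1, 0) == 1)
        result = (unsigned char)map[0];
    munmap(map, len);
    close(fd);
    return result;
}
```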

Even with all of the improvements to the Linux mm subsystem since the
year 2000, the Linux mm code does not have the infrastructure to
support MAP_COPY.  Semantically MAP_COPY sounds nice.  Unfortunately an
implementation would require that anyone who performs a write(2) to a
file mapped MAP_COPY walk the file mapping data structures and create
one copy of the page for each place that page is mapped MAP_COPY.  In
the case of someone overwriting rtld, that would require one copy of
ld.so in memory for each program running on the system: a 457 * 162KiB
= 74,034KiB (about 72MiB) increase on my little system that has been up
for a while.  For libc it would be something like 457 * 1.8MiB =
822.6MiB.  The performance of creating those copies would likely also
be atrocious.
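To make the cost concrete, here is a toy user-space model in C of the
bookkeeping described above.  It is not kernel code, and struct
toy_mapping and toy_write_page() are hypothetical.  Before a write(2)
changes a shared page, every MAP_COPY mapping still pointing at it must
receive its own snapshot of the old contents, so the number of copies
grows with the number of MAP_COPY mappers:

```c
#include <stdlib.h>
#include <string.h>

/* Toy model: one page-cache page shared by several mappings of a file. */
struct toy_mapping {
    int map_copy;        /* nonzero if this mapping has MAP_COPY semantics */
    char *private_copy;  /* snapshot of the page taken before a write, or NULL */
};

/*
 * Model write(2) of `len` bytes at `off` into `page`.  Every MAP_COPY
 * mapping without a snapshot first gets a private copy of the old page.
 * The caller ensures off + len <= page_size.  Returns the number of page
 * copies made, or -1 if an allocation fails.
 */
int toy_write_page(char *page, size_t page_size,
                   struct toy_mapping *maps, size_t nmaps,
                   const char *data, size_t len, size_t off)
{
    int copies = 0;
    for (size_t i = 0; i < nmaps; i++) {
        if (!maps[i].map_copy || maps[i].private_copy)
            continue;
        maps[i].private_copy = malloc(page_size);
        if (!maps[i].private_copy)
            return -1;
        memcpy(maps[i].private_copy, page, page_size);
        copies++;
    }
    memcpy(page + off, data, len);  /* now the shared page can change */
    return copies;
}
```

With 457 processes each mapping ld.so this way, overwriting the file
would snapshot every page 457 times, which is where the 457 * 162KiB
figure comes from.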

Anything short of performing copy-on-write when a file is written using
write(2) would not provide that protection to programs.



Florian Weimer, would it be possible to change glibc's ld.so
implementation to use MAP_SHARED, just so people reading the code know
what to expect of the kernel?  As far as I can tell there is no
practical difference between a read-only MAP_PRIVATE mapping and a
read-only MAP_SHARED mapping.
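A C sketch of that observation, assuming Linux; compare_private_shared()
is a hypothetical helper.  The same file is mapped read-only both
MAP_PRIVATE and MAP_SHARED, both pages are faulted in, and the file is
then changed with write(2).  Because neither mapping has been written
through, both see the new byte:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first page of `path` read-only twice
 * (MAP_PRIVATE and MAP_SHARED), fault both in, overwrite byte 0 with
 * write(2), and report what each mapping sees.  Returns 0 or -1.
 */
int compare_private_shared(const char *path, char newbyte,
                           char *seen_private, char *seen_shared)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *priv = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    char *shrd = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    int ret = -1;
    if (priv != MAP_FAILED && shrd != MAP_FAILED) {
        /* Fault both pages in first, as a running program would have. */
        (void)*(volatile char *)priv;
        (void)*(volatile char *)shrd;
        if (pwrite(fd, &newbyte, 1, 0) == 1) {
            *seen_private = priv[0];
            *seen_shared = shrd[0];
            ret = 0;
        }
    }
    if (priv != MAP_FAILED)
        munmap(priv, len);
    if (shrd != MAP_FAILED)
        munmap(shrd, len);
    close(fd);
    return ret;
}
```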


Michael Kerrisk, is there any chance we can document that on Linux, for
MAP_PRIVATE mappings, the mapped pages will match the underlying file
unless a value is written to a page through the mapping?
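The behaviour being asked about can be sketched in C as well, assuming
Linux and a file at least two pages long; private_divergence() is a
hypothetical helper.  With a writable MAP_PRIVATE mapping, the page
written through the mapping becomes a private copy and stops following
the file, while the page that was only read keeps reflecting later
write(2) calls:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write page 0 through a MAP_PRIVATE mapping (so it
 * becomes a private copy), only read page 1, then overwrite byte 0 of
 * both pages in the file with write(2).  *page0 and *page1 receive what
 * the mapping sees afterwards.  Returns 0 or -1.
 */
int private_divergence(const char *path, char *page0, char *page1)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char *map = mmap(NULL, 2 * pg, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    int ret = -1;
    map[0] = 'P';                         /* copy-on-write: page 0 now private */
    (void)*(volatile char *)(map + pg);   /* read fault only: page 1 not copied */
    if (pwrite(fd, "F", 1, 0) == 1 && pwrite(fd, "F", 1, (off_t)pg) == 1) {
        *page0 = map[0];                  /* keeps the private 'P' */
        *page1 = map[pg];                 /* follows the file: 'F' */
        ret = 0;
    }
    munmap(map, 2 * pg);
    close(fd);
    return ret;
}
```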

Eric

