From: Rich Felker <dalias@aerifal.cx>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Drysdale <drysdale@google.com>,
	"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Meredydd Luff <meredydd@senatehouse.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Miller <davem@davemloft.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Kees Cook <keescook@chromium.org>, Arnd Bergmann <arnd@arndb.de>,
	Christoph Hellwig <hch@infradead.org>, X86 ML <x86@kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	sparclinux@vger.kernel.org
Subject: Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)
Date: Fri, 9 Jan 2015 16:28:52 -0500
Message-ID: <20150109212852.GU4574@brightrain.aerifal.cx>
In-Reply-To: <20150109210941.GL22149@ZenIV.linux.org.uk>

On Fri, Jan 09, 2015 at 09:09:41PM +0000, Al Viro wrote:
> On Fri, Jan 09, 2015 at 03:59:26PM -0500, Rich Felker wrote:
> 
> > > For fsck sake, folks, if you have bloody /proc, you don't need that shite
> > > at all!  Just do execve on /proc/self/fd/n, and be done with that.
> > > 
> > > The sole excuse for merging that thing in the first place had been
> > > "would anybody think of children^Wsclerotic^Whardened environments
> > > where they have no /proc at all".
> > 
> > That doesn't work. With O_CLOEXEC, /proc/self/fd/n is already gone at
> > the time the interpreter runs, whether you're using fexecveat or
> > execve with "/proc/self/fd/n" to implement POSIX fexecve(). That's the
> > problem. This breaks the intended idiom for fexecve.
> 
> Just what will your magical symlink do in case when the file is opened,
> unlinked and marked O_CLOEXEC?  When should actual freeing of disk blocks,
> etc. happen?  And no, you can't assume that interpreter will open the
> damn thing even once - there's nothing to oblige it to do so.

Unlinking is not relevant. Magical symlinks refer to open file
descriptions (either real ones or O_PATH inode-reference-only ones),
not files. There is no new complexity proposed for freeing disk blocks
here. Semantics are identical to existing O_PATH inode references.
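
The point that unlinking does not affect an open file description can be checked with ordinary descriptors. The following is a minimal sketch, not taken from the thread: the helper name read_after_unlink() and the /tmp location are assumptions made for illustration. It uses a regular read/write descriptor rather than an O_PATH one, but the lifetime rule it shows is the same: the inode stays alive until the last reference is closed.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() and pread() */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file containing `msg`, unlink it
 * while keeping the descriptor open, then read the contents back through
 * the descriptor. Returns the number of bytes read, or -1 on error. */
ssize_t read_after_unlink(const char *msg, char *buf, size_t buflen)
{
    char path[] = "/tmp/unlink-demo-XXXXXX";  /* assumed writable location */
    size_t len = strlen(msg);
    ssize_t n;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;

    if (write(fd, msg, len) != (ssize_t)len || unlink(path) != 0) {
        close(fd);
        unlink(path);  /* best-effort cleanup if the first unlink never ran */
        return -1;
    }

    /* The name is gone, but the open file description still pins the inode. */
    n = pread(fd, buf, buflen, 0);

    close(fd);  /* last reference dropped; only now can the blocks be freed */
    return n;
}
```

Calling read_after_unlink("hello", buf, sizeof buf) should return the five bytes that were written, even though no pathname refers to the file at the time of the read.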

> Al, more and more tempted to ask reverting the whole thing - this hardcoded
> /dev/fd/... (in fs/exec.c, no less) is disgraceful enough, but threats of
> even more revolting kludges in the name of "intended idiom for fexecve"...

If you have a multithreaded process that's executing an external
program via fexecve, then unless it has specialized knowledge about
what other parts of the program/libraries are doing, it needs to be
using O_CLOEXEC for the file descriptor. Otherwise, the file
descriptor could be leaked to child processes started by other
threads. This is what I mean by the "intended idiom". Note that it's
easier to use pathnames instead of fexecve, but doing so may not be an
option if the program needs to verify the file before exec'ing it.
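
As a rough illustration of the idiom described above, here is a minimal sketch that opens the file with O_CLOEXEC, checks it, and only then calls fexecve(). The function names and the verification policy (regular file with at least one execute bit) are assumptions chosen for the example; a real program would apply its own checks, such as a signature or hash comparison.

```c
#define _POSIX_C_SOURCE 200809L  /* for O_CLOEXEC and fexecve() */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical verification step: accept only regular files that have at
 * least one execute permission bit set. */
int looks_executable(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return 0;
    return S_ISREG(st.st_mode) &&
           (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;
}

/* Open `path` with close-on-exec set atomically, so a concurrent
 * fork+exec in another thread cannot inherit the descriptor.
 * Returns the descriptor, or -1 if opening or verification fails. */
int open_verified(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);

    if (fd < 0)
        return -1;
    if (!looks_executable(fd)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Replace the calling process with the verified file. Returns -1 on
 * failure and does not return on success. Because the descriptor is
 * close-on-exec, this is the case the thread says fails for
 * interpreter scripts. */
int exec_verified(const char *path, char *const argv[], char *const envp[])
{
    int fd = open_verified(path);

    if (fd < 0)
        return -1;
    fexecve(fd, argv, envp);
    close(fd);  /* reached only if fexecve() failed */
    return -1;
}
```

The file that was verified is the same open file that gets executed, which a pathname-based execve() cannot guarantee.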

This issue can be avoided if you're going to fork-and-fexecve rather
than replacing the calling process, since after forking it's safe to
remove the close-on-exec flag. But then you still have the issue that
the child process, after exec, keeps a spurious file descriptor to its
own process image (executable file) open which it can never close
(because it doesn't know the number). This could eventually lead to fd
exhaustion after many generations.
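
The fork-and-fexecve variant can be sketched as follows. This is an illustration under assumptions, not code from the thread: clear_cloexec() and run_via_fexecve() are hypothetical names, and error handling (for example retrying waitpid() on EINTR) is kept to a minimum.

```c
#define _POSIX_C_SOURCE 200809L  /* for fexecve() */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Remove the close-on-exec flag from `fd`. Returns 0 on success, -1 on error. */
int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Fork, then execute the file behind `fd` in the child. The parent keeps
 * FD_CLOEXEC set on its own copy of the descriptor. Returns the child's
 * exit status, or -1 on error or abnormal termination. */
int run_via_fexecve(int fd, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* The child has a single thread, so no other thread can leak the
         * descriptor through a concurrent fork+exec. */
        if (clear_cloexec(fd) == 0)
            fexecve(fd, argv, envp);
        _exit(127);  /* reached only if something above failed */
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Note that the executed program starts with `fd` still open and has no way to learn its number, which is the spurious descriptor described above.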

The "open-once magic symlink" approach is really the cleanest
solution I can find. In the case where the interpreter does not open
the script, nothing terribly bad happens; the magic symlink just
sticks around until _exit or exec. In the case where the interpreter
opens it more than once, you get a failure, but as far as I know
existing interpreters don't do this, and it's arguably bad design. In
any case it's a caught error.

Rich
