linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rob Landley <rob@landley.net>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
	Kees Cook <keescook@chromium.org>,
	Greg Ungerer <gerg@linux-m68k.org>,
	Bernd Edlinger <bernd.edlinger@hotmail.de>,
	linux-fsdevel@vger.kernel.org, Al Viro <viro@ZenIV.linux.org.uk>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Casey Schaufler <casey@schaufler-ca.com>,
	linux-security-module@vger.kernel.org,
	James Morris <jmorris@namei.org>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Andy Lutomirski <luto@amacapital.net>
Subject: Re: [PATCH v2 7/8] exec: Generic execfd support
Date: Thu, 21 May 2020 23:51:20 -0500	[thread overview]
Message-ID: <6ce125fd-4fb1-8c39-a9a9-098391f2016a@landley.net> (raw)
In-Reply-To: <87r1vcd4wo.fsf@x220.int.ebiederm.org>

On 5/21/20 10:28 PM, Eric W. Biederman wrote:
> 
> Rob Landley <rob@landley.net> writes:
> 
>> On 5/20/20 11:05 AM, Eric W. Biederman wrote:
> 
>> Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
>> bash-compatible shell with nommu support, which means in order to do subshell
>> and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
>> have the child exec itself to unblock the parent, and then read the context data
>> that just got discarded through the pipe from the parent. ("Wheee." And you can
>> quote me on that.)
> 
> Do you have clone(CLONE_VM) ?  If my quick skim of the kernel sources is
> correct that should be the same as vfork except without causing the
> parent to wait for you.  Which I think would remove the need to reexec
> yourself.

As with perpetual motion, that only seems like it would work if you don't
understand what's going on.

A nommu system uses physical addresses, not virtual ones, so every process sees
the same addresses. So if I allocate a new block of memory and memcpy the
contents of the old one into the new one, any pointers in the copy point back
into the ORIGINAL block of memory. Trying to adjust the pointers in the copy is
the exact same problem as trying to do garbage collection in C: it's an AI
complete problem.

Any attempt to "implement a full fork" on nommu hits this problem: copying an
existing mapping to a new address range means any address values in the new
mapping point into the OLD mapping. Things like fdpic fix this up at exec time
(traversing elf tables and relocating), but not at runtime. If you can solve the
"relocate at runtime all addresses within an existing mapping, and all other
mappings that might point to this mapping, including local variables on the
stack that point to a structure member or halfway into a string rather than the
start of an allocation, without adjusting unrelated values coincidentally within
RANGE of a mapping" problem, THEN you can fork on a nommu system.

What vfork() does is pause the parent and have the child continue AS the parent
for a bit (with the system call returning 0). The child starts with all the same
memory mappings the parent has (usually not even a new stack). The child has a
new PID and new resources like its own file descriptor table so close() and
open() don't affect the parent, but if you change a global that's visible to the
parent when it resumes (ant often local variables too: don't return from the
function that called vfork() because if you DON'T have a new stack it'll stomp
the return address the parent needs when IT does it). If the child calls
malloc() the parent needs to free it because it's same heap (because same
mapping of the same physical memory).

Then when the child is ready to discard all those mappings (due to calling
either execve() or _exit(), those are the only two options), the parent resumes
from where it left off with the PID of the child as the system call return value.

The reason the child pauses the parent is so only one process is ever using
those mappings at a given time. Otherwise they're acting like threads without
locking, and usually both are sharing a stack.

P.S. You can use threads _instead_ of fork for some stuff on nommu, but that's
its own can of worms. You still need to vfork() when you do create a child
process you're going to exec, so it doesn't go away, you're just requiring
multiple techniques simultaneously to handle a special case.

P.P.S. vfork() is useful on mmu systems to solve the "don't fork from a thread"
problem. You can vfork() from a thread cheaply and reliably and it only pauses
the one thread you forked from, not every thread in the whole process. If you
fork() from a heavily threadded process you can cause a multi-milisecond latency
spike because even with an mmu the copy on write "keep track of what's shared by
what" generally can't handle the "threads AND processes sharing mappings" case,
so it just gives up and copies it all at fork time, in one go, holding a big
lock while doing so. This causes a large latency spike which vfork() avoids.
(And can cause a large wasteful allocation and memory dirtying which is
immediately freed.)

>>> The file descriptor is stored in mm->exe_file.
>>> Probably the most straight forward implementation is to allow
>>> execveat(AT_EXE_FILE, ...).
>>
>> Cool, that works.
>>
>>> You can look at binfmt_misc for how to reopen an open file descriptor.
>>
>> Added to the todo heap.
> 
> Yes I don't think it would be a lot of code.
> 
> I think you might be better served with clone(CLONE_VM) as it doesn't
> block so you don't need to feed yourself your context over a pipe.

Except that doesn't fix it.

Yes I could use threads instead, but the cure is worse than the disease and the
result is your shell background processes are threads rather than independent
processes (is $$ reporting PID or TID, I really don't want to go there).

> Eric

Rob

  reply	other threads:[~2020-05-22  4:51 UTC|newest]

Thread overview: 149+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
2020-05-05 19:41 ` [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf Eric W. Biederman
2020-05-05 20:45   ` Kees Cook
2020-05-06 12:42   ` Greg Ungerer
2020-05-06 12:56     ` Eric W. Biederman
2020-05-05 19:41 ` [PATCH 2/7] exec: Make unlocking exec_update_mutex explict Eric W. Biederman
2020-05-05 20:46   ` Kees Cook
2020-05-05 19:42 ` [PATCH 3/7] exec: Rename the flag called_exec_mmap point_of_no_return Eric W. Biederman
2020-05-05 20:49   ` Kees Cook
2020-05-05 19:43 ` [PATCH 4/7] exec: Merge install_exec_creds into setup_new_exec Eric W. Biederman
2020-05-05 20:50   ` Kees Cook
2020-05-05 19:44 ` [PATCH 5/7] exec: In setup_new_exec cache current in the local variable me Eric W. Biederman
2020-05-05 20:51   ` Kees Cook
2020-05-05 19:45 ` [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec Eric W. Biederman
2020-05-05 21:29   ` Kees Cook
2020-05-06 14:57     ` Eric W. Biederman
2020-05-06 15:30       ` Kees Cook
2020-05-07 19:51         ` Eric W. Biederman
2020-05-07 21:51     ` Eric W. Biederman
2020-05-08  5:50       ` Kees Cook
2020-05-05 19:46 ` [PATCH 7/7] exec: Rename flush_old_exec begin_new_exec Eric W. Biederman
2020-05-05 21:30   ` Kees Cook
2020-05-06 12:41 ` exec: Promised cleanups after introducing exec_update_mutex Greg Ungerer
2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
2020-05-08 18:44   ` [PATCH 1/6] exec: Move the comment from above de_thread to above unshare_sighand Eric W. Biederman
2020-05-09  5:02     ` Kees Cook
2020-05-08 18:44   ` [PATCH 2/6] exec: Fix spelling of search_binary_handler in a comment Eric W. Biederman
2020-05-09  5:03     ` Kees Cook
2020-05-08 18:45   ` [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex Eric W. Biederman
2020-05-09  5:08     ` Kees Cook
2020-05-09 19:18     ` Linus Torvalds
2020-05-09 19:57       ` Eric W. Biederman
2020-05-10 20:33       ` Kees Cook
2020-05-08 18:45   ` [PATCH 4/6] exec: Run sync_mm_rss before taking exec_update_mutex Eric W. Biederman
2020-05-09  5:15     ` Kees Cook
2020-05-09 14:17       ` Eric W. Biederman
2020-05-08 18:47   ` [PATCH 5/6] exec: Move handling of the point of no return to the top level Eric W. Biederman
2020-05-09  5:31     ` Kees Cook
2020-05-09 13:39       ` Eric W. Biederman
2020-05-08 18:48   ` [PATCH 6/6] exec: Set the point of no return sooner Eric W. Biederman
2020-05-09  5:33     ` Kees Cook
2020-05-09 19:40   ` [PATCH 0/5] exec: Control flow simplifications Eric W. Biederman
2020-05-09 19:40     ` [PATCH 1/5] exec: Call cap_bprm_set_creds directly from prepare_binprm Eric W. Biederman
2020-05-09 20:04       ` Linus Torvalds
2020-05-09 19:41     ` [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file Eric W. Biederman
2020-05-09 20:07       ` Linus Torvalds
2020-05-09 20:12         ` Eric W. Biederman
2020-05-09 20:19           ` Linus Torvalds
2020-05-11  3:15       ` Kees Cook
2020-05-11 16:52         ` Eric W. Biederman
2020-05-11 21:18           ` Kees Cook
2020-05-09 19:41     ` [PATCH 3/5] exec: Remove recursion from search_binary_handler Eric W. Biederman
2020-05-09 20:16       ` Linus Torvalds
2020-05-10  4:22       ` Tetsuo Handa
2020-05-10 19:38         ` Linus Torvalds
2020-05-11 14:33           ` Eric W. Biederman
2020-05-11 19:10             ` Rob Landley
2020-05-13 21:59               ` Eric W. Biederman
2020-05-14 18:46                 ` Rob Landley
2020-05-11 21:55             ` Kees Cook
2020-05-12 18:42               ` Eric W. Biederman
2020-05-12 19:25                 ` Kees Cook
2020-05-12 20:31                   ` Eric W. Biederman
2020-05-12 23:08                     ` Kees Cook
2020-05-12 23:47                       ` Kees Cook
2020-05-12 23:51                         ` Kees Cook
2020-05-14 14:56                           ` Eric W. Biederman
2020-05-14 16:56                             ` Casey Schaufler
2020-05-14 17:02                               ` Eric W. Biederman
2020-05-13  0:20                 ` Linus Torvalds
2020-05-13  2:39                   ` Rob Landley
2020-05-13 19:51                     ` Linus Torvalds
2020-05-14 16:49                   ` Eric W. Biederman
2020-05-09 19:42     ` [PATCH 4/5] exec: Allow load_misc_binary to call prepare_binfmt unconditionally Eric W. Biederman
2020-05-11 22:09       ` Kees Cook
2020-05-09 19:42     ` [PATCH 5/5] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
2020-05-11 22:24       ` Kees Cook
2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
2020-05-19  0:29       ` [PATCH v2 1/8] exec: Teach prepare_exec_creds how exec treats uids & gids Eric W. Biederman
2020-05-19 18:03         ` Kees Cook
2020-05-19 18:28           ` Linus Torvalds
2020-05-19 18:57             ` Eric W. Biederman
2020-05-19  0:30       ` [PATCH v2 2/8] exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds Eric W. Biederman
2020-05-19 15:34         ` Casey Schaufler
2020-05-19 18:10         ` Kees Cook
2020-05-19 21:28           ` James Morris
2020-05-19  0:31       ` [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds Eric W. Biederman
2020-05-19 18:21         ` Kees Cook
2020-05-19 19:03           ` Eric W. Biederman
2020-05-19 19:14             ` Kees Cook
2020-05-20 20:22               ` Eric W. Biederman
2020-05-20 20:53                 ` Kees Cook
2020-05-19 21:52         ` James Morris
2020-05-20 12:40           ` Eric W. Biederman
2020-05-19  0:31       ` [PATCH v2 4/8] exec: Allow load_misc_binary to call prepare_binfmt unconditionally Eric W. Biederman
2020-05-19 18:27         ` Kees Cook
2020-05-19 19:08           ` Eric W. Biederman
2020-05-19 19:17             ` Kees Cook
2020-05-19  0:32       ` [PATCH v2 5/8] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
2020-05-19 18:27         ` Kees Cook
2020-05-19 21:30         ` James Morris
2020-05-19  0:33       ` [PATCH v2 6/8] exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC Eric W. Biederman
2020-05-19 19:08         ` Kees Cook
2020-05-19 19:19           ` Eric W. Biederman
2020-05-19  0:33       ` [PATCH v2 7/8] exec: Generic execfd support Eric W. Biederman
2020-05-19 19:46         ` Kees Cook
2020-05-19 19:54           ` Linus Torvalds
2020-05-19 20:20             ` Eric W. Biederman
2020-05-19 21:59         ` Rob Landley
2020-05-20 16:05           ` Eric W. Biederman
2020-05-21 22:50             ` Rob Landley
2020-05-22  3:28               ` Eric W. Biederman
2020-05-22  4:51                 ` Rob Landley [this message]
2020-05-22 13:35                   ` Eric W. Biederman
2020-05-19  0:34       ` [PATCH v2 8/8] exec: Remove recursion from search_binary_handler Eric W. Biederman
2020-05-19 20:37         ` Kees Cook
2020-05-19  1:25       ` [PATCH v2 0/8] exec: Control flow simplifications Linus Torvalds
2020-05-19 21:55       ` Kees Cook
2020-05-20 13:02         ` Eric W. Biederman
2020-05-20 22:12       ` Eric W. Biederman
2020-05-20 23:43         ` Kees Cook
2020-05-21 11:53           ` Eric W. Biederman
2020-05-28 15:38       ` [PATCH 0/11] exec: cred calculation simplifications Eric W. Biederman
2020-05-28 15:41         ` [PATCH 01/11] exec: Reduce bprm->per_clear to a single bit Eric W. Biederman
2020-05-28 19:04           ` Linus Torvalds
2020-05-28 19:17             ` Eric W. Biederman
2020-05-28 15:42         ` [PATCH 02/11] exec: Introduce active_per_clear the per file version of per_clear Eric W. Biederman
2020-05-28 19:05           ` Linus Torvalds
2020-05-28 15:42         ` [PATCH 03/11] exec: Compute file based creds only once Eric W. Biederman
2020-05-28 15:43         ` [PATCH 04/11] exec: Move uid/gid handling from creds_from_file into bprm_fill_uid Eric W. Biederman
2020-05-28 15:44         ` Eric W. Biederman
2020-05-28 15:44         ` [PATCH 05/11] exec: In bprm_fill_uid use CAP_SETGID to see if a gid change is safe Eric W. Biederman
2020-05-28 15:48         ` [PATCH 06/11] exec: Don't set secureexec when the uid or gid changes are abandoned Eric W. Biederman
2020-05-28 15:48         ` [PATCH 07/11] exec: Set saved, fs, and effective ids together in bprm_fill_uid Eric W. Biederman
2020-05-28 15:49         ` [PATCH 08/11] exec: In bprm_fill_uid remove unnecessary no new privs check Eric W. Biederman
2020-05-28 15:49         ` [PATCH 09/11] exec: In bprm_fill_uid only set per_clear when honoring suid or sgid Eric W. Biederman
2020-05-28 19:08           ` Linus Torvalds
2020-05-28 19:21             ` Eric W. Biederman
2020-05-28 15:50         ` [PATCH 10/11] exec: In bprm_fill_uid set secureexec at same time as per_clear Eric W. Biederman
2020-05-28 15:50         ` [PATCH 11/11] exec: Remove the label after_setid from bprm_fill_uid Eric W. Biederman
2020-05-29 16:45         ` [PATCH 0/2] exec: Remove the computation of bprm->cred Eric W. Biederman
2020-05-29 16:46           ` [PATCH 1/2] exec: Add a per bprm->file version of per_clear Eric W. Biederman
2020-05-29 21:06             ` Kees Cook
2020-05-30  3:23               ` Eric W. Biederman
2020-05-30  5:14                 ` Kees Cook
2020-05-29 16:47           ` [PATCH 2/2] exec: Compute file based creds only once Eric W. Biederman
2020-05-29 21:24             ` Kees Cook
2020-05-30  3:28               ` Eric W. Biederman
2020-05-30  5:18                 ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6ce125fd-4fb1-8c39-a9a9-098391f2016a@landley.net \
    --to=rob@landley.net \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bernd.edlinger@hotmail.de \
    --cc=casey@schaufler-ca.com \
    --cc=ebiederm@xmission.com \
    --cc=gerg@linux-m68k.org \
    --cc=jannh@google.com \
    --cc=jmorris@namei.org \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=oleg@redhat.com \
    --cc=serge@hallyn.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).