* chroot(2) and bind mounts as non-root
@ 2011-12-07 17:54 Colin Walters
  2011-12-07 19:36 ` John Stoffel
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Colin Walters @ 2011-12-07 17:54 UTC
  To: LKML; +Cc: morgan, serue, dhowells, kzak

[-- Attachment #1: Type: text/plain, Size: 1878 bytes --]

Hi,

(TL;DR version: Please audit the attached setuid program)

I've recently been doing some work in software compilation, and it'd be
really handy if I could call chroot(2) as a non-root user.  The reason
to chroot is to help avoid "host contamination" - I can set up a build
root and then chroot in.  The reason to do it as non-root is, well,
requiring root to build software sucks for multiple obvious reasons.

(Now you can do LD_PRELOAD hacks to talk to a daemon like
https://github.com/wrpseudo/pseudo does, but really - too gross and too
slow).

The historical reason one can't call chroot(2) as non-root is because of
setuid binaries (hard link a setuid binary into chroot of your choice
with trojaned libc.so).  But it turns out a while back this commit:

commit 3898b1b4ebff8dcfbcf1807e0661585e06c9a91c
Author: Andrew G. Morgan <morgan@kernel.org>
Date:   Mon Apr 28 02:13:40 2008 -0700

    capabilities: implement per-process securebits

Added *exactly* what we need.  We just call:

prctl (PR_SET_SECUREBITS, SECBIT_NOROOT | SECBIT_NOROOT_LOCKED);

A setuid program to call both this and chroot(2) is *almost* good enough
for my use case - but it's a little hard to run most build software
without say /dev/null, /dev/urandom and /proc.
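
A minimal sketch of the securebits step on its own, assuming Linux headers that provide <sys/prctl.h> and <linux/securebits.h> (the attached program assumes the same). The helper names get_securebits, noroot_is_locked and lock_noroot are hypothetical. Per capabilities(7), with SECBIT_NOROOT set the kernel does not grant capabilities when a set-user-ID-root program is executed, and SECBIT_NOROOT_LOCKED prevents the bit from being changed again. Reading the bits needs no privilege; changing them needs CAP_SETPCAP, so an unprivileged caller should expect EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <linux/securebits.h>

/* Hypothetical helper: nonzero if BITS has both SECBIT_NOROOT and its lock. */
static int
noroot_is_locked (int bits)
{
  int want = SECBIT_NOROOT | SECBIT_NOROOT_LOCKED;

  return (bits & want) == want;
}

/* Read the calling thread's securebits (unprivileged).
 * Returns -1 with errno set on failure. */
static int
get_securebits (void)
{
  return prctl (PR_GET_SECUREBITS, 0UL, 0UL, 0UL, 0UL);
}

/* Set and lock SECBIT_NOROOT, then read the bits back to confirm.
 * Requires CAP_SETPCAP; returns 0 on success, -1 otherwise. */
static int
lock_noroot (void)
{
  int bits = get_securebits ();

  if (bits < 0)
    return -1;
  /* Keep bits that are already set; see the note below. */
  bits |= SECBIT_NOROOT | SECBIT_NOROOT_LOCKED;
  if (prctl (PR_SET_SECUREBITS, (unsigned long) bits, 0UL, 0UL, 0UL) < 0)
    return -1;
  bits = get_securebits ();
  return (bits >= 0 && noroot_is_locked (bits)) ? 0 : -1;
}
```

Existing bits are OR-ed in rather than replaced because the kernel rejects (EPERM) any request that would clear a lock bit or change a bit that is already locked.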

The other key thing Linux recently gained is CLONE_NEWNS - with this
(and also SECBIT_NOROOT), we can allow users to make bind mounts to
their heart's content, which frankly is just cool.  Bind mounts are a
really neat VFS feature.
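
A rough sketch of the namespace-plus-bind-mount sequence described above, assuming root (CAP_SYS_ADMIN) as in the setting of this thread; the helper names are hypothetical. It mirrors what the attached program does, and makes one point explicit: a bind mount cannot be made read-only in the same mount(2) call, because flags other than MS_REC are ignored when MS_BIND creates the mount, so a second MS_REMOUNT call is needed. path_under_root only joins strings, like the asprintf call in the attachment; it does not resolve or check symlinks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Join ROOT and DEST into BUF, like the attachment's asprintf ("%s%s").
 * Pure string work: symlinks are neither resolved nor checked.
 * Returns 0 on success, -1 if the result would not fit. */
static int
path_under_root (char *buf, size_t len, const char *root, const char *dest)
{
  int n;

  assert (buf != NULL && root != NULL && dest != NULL);
  n = snprintf (buf, len, "%s%s", root, dest);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Enter a new mount namespace and bind SRC onto DEST read-only.
 * Needs CAP_SYS_ADMIN and changes the CALLER's mount namespace, so run
 * it in a throwaway child.  Returns 0 on success, -1 with errno set. */
static int
private_readonly_bind (const char *src, const char *dest)
{
  if (unshare (CLONE_NEWNS) < 0)
    return -1;
  /* Make every mount in the new namespace private, so later mounts do
   * not propagate back to the host if / was a shared mount. */
  if (mount ("none", "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
    return -1;
  /* Flags other than MS_REC are ignored when MS_BIND creates the mount... */
  if (mount (src, dest, NULL, MS_BIND, NULL) < 0)
    return -1;
  /* ...so read-only needs a separate remount of the new bind mount. */
  if (mount (NULL, dest, NULL, MS_BIND | MS_REMOUNT | MS_RDONLY, NULL) < 0)
    return -1;
  return 0;
}
```

Only path_under_root is safe to try casually; private_readonly_bind alters the caller's mount namespace and belongs in a disposable child process.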

While I was making a setuid program, I also exposed the other useful
clone flags like CLONE_NEWNET and CLONE_NEWIPC.
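
One way to check that a child really got its own namespace is to compare /proc/self/ns/<kind> links, which read as strings of the form net:[INODE]; different strings mean different namespaces. This is a sketch assuming a kernel that exposes /proc/self/ns for the namespace kind in question (not every kernel of this era does for every kind); where it does not, the helper simply returns -1. ns_identity and ns_differs_from are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the /proc/self/ns/KIND link into BUF as a NUL-terminated string.
 * Returns 0 on success, -1 on error or possible truncation. */
static int
ns_identity (const char *kind, char *buf, size_t len)
{
  char path[64];
  ssize_t n;
  int w;

  assert (kind != NULL && buf != NULL);
  w = snprintf (path, sizeof path, "/proc/self/ns/%s", kind);
  if (len < 2 || w < 0 || (size_t) w >= sizeof path)
    return -1;
  n = readlink (path, buf, len - 1);    /* readlink does not NUL-terminate */
  if (n < 0 || (size_t) n >= len - 1)   /* error, or possibly truncated */
    return -1;
  buf[n] = '\0';
  return 0;
}

/* Compare this process's KIND namespace with a previously recorded identity.
 * Returns 1 if different, 0 if the same, -1 on error. */
static int
ns_differs_from (const char *kind, const char *recorded)
{
  char now[64];

  if (ns_identity (kind, now, sizeof now) < 0)
    return -1;
  return strcmp (now, recorded) != 0;
}
```

For example, the parent could record ns_identity ("net", ...) before the clone call; a child created with CLONE_NEWNET inherits that copy of memory and should then see ns_differs_from ("net", recorded) return 1.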

So...I'd like to eventually get something like this into
util-linux...possibly as a hardlink to mount (which is already setuid).

Auditing appreciated!  Am I missing anything?  Has anyone else noticed
the nice pairing of SECBIT_NOROOT and chroot(2) and used it for a tool?



[-- Attachment #2: ostbuild-user-chroot.c --]
[-- Type: text/x-csrc, Size: 10061 bytes --]

/* -*- mode: c; tab-width: 2; indent-tabs-mode: nil -*-
 *
 * user-chroot: A setuid program that allows non-root users to safely chroot(2)
 *
 * "safely": I believe that this program, when deployed as setuid on a
 * typical "distribution" such as RHEL or Debian, does not, even when
 * used in combination with typical software installed on that
 * distribution, allow privilege escalation.
 *
 * Copyright 2011 Colin Walters <walters@verbum.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it would be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software Foundation,
 * Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 *
 *
 */

#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/prctl.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <linux/securebits.h>
#include <sched.h>

static void fatal (const char *message, ...) __attribute__ ((noreturn)) __attribute__ ((format (printf, 1, 2)));
static void fatal_errno (const char *message) __attribute__ ((noreturn));

static void
fatal (const char *fmt,
       ...)
{
  va_list args;
  
  va_start (args, fmt);

  vfprintf (stderr, fmt, args);
  putc ('\n', stderr);
  
  va_end (args);
  exit (1);
}

static void
fatal_errno (const char *message)
{
  perror (message);
  exit (1);
}

typedef enum {
  MOUNT_SPEC_BIND,
  MOUNT_SPEC_READONLY,
  MOUNT_SPEC_PROCFS
} MountSpecType;

typedef struct _MountSpec MountSpec;
struct _MountSpec {
  MountSpecType type;

  const char *source;
  const char *dest;
  
  MountSpec *next;
};

static MountSpec *
reverse_mount_list (MountSpec *mount)
{
  MountSpec *prev = NULL;

  while (mount)
    {
      MountSpec *next = mount->next;
      mount->next = prev;
      prev = mount;
      mount = next;
    }

  return prev;
}

int
main (int      argc,
      char   **argv)
{
  const char *argv0;
  const char *chroot_dir;
  const char *program;
  uid_t ruid, euid, suid;
  gid_t rgid, egid, sgid;
  int after_mount_arg_index;
  unsigned int n_mounts = 0;
  const unsigned int max_mounts = 50; /* Totally arbitrary... */
  char **program_argv;
  MountSpec *bind_mounts = NULL;
  MountSpec *bind_mount_iter;
  int unshare_ipc = 0;
  int unshare_net = 0;
  int unshare_pid = 0;
  int clone_flags = 0;
  int child_status = 0;
  pid_t child;

  if (argc <= 0)
    return 1;

  argv0 = argv[0];
  argc--;
  argv++;

  if (argc < 1)
    fatal ("ROOTDIR argument must be specified");

  after_mount_arg_index = 0;
  while (after_mount_arg_index < argc)
    {
      const char *arg = argv[after_mount_arg_index];
      MountSpec *mount = NULL;

      if (n_mounts >= max_mounts)
        fatal ("Too many mounts (maximum of %u)", max_mounts);
      n_mounts++;

      if (strcmp (arg, "--mount-bind") == 0)
        {
          if ((argc - after_mount_arg_index) < 3)
            fatal ("--mount-bind takes two arguments");

          mount = malloc (sizeof (MountSpec));
          if (!mount)
            fatal ("out of memory");
          mount->type = MOUNT_SPEC_BIND;
          mount->source = argv[after_mount_arg_index+1];
          mount->dest = argv[after_mount_arg_index+2];
          mount->next = bind_mounts;
          
          bind_mounts = mount;
          after_mount_arg_index += 3;
        }
      else if (strcmp (arg, "--mount-readonly") == 0)
        {
          if ((argc - after_mount_arg_index) < 2)
            fatal ("--mount-readonly takes one argument");

          mount = malloc (sizeof (MountSpec));
          if (!mount)
            fatal ("out of memory");
          mount->type = MOUNT_SPEC_READONLY;
          mount->source = NULL;
          mount->dest = argv[after_mount_arg_index+1];
          mount->next = bind_mounts;
          
          bind_mounts = mount;
          after_mount_arg_index += 2;
        }
      else if (strcmp (arg, "--mount-proc") == 0)
        {
          if ((argc - after_mount_arg_index) < 2)
            fatal ("--mount-proc takes one argument");

          mount = malloc (sizeof (MountSpec));
          if (!mount)
            fatal ("out of memory");
          mount->type = MOUNT_SPEC_PROCFS;
          mount->source = NULL;
          mount->dest = argv[after_mount_arg_index+1];
          mount->next = bind_mounts;
          
          bind_mounts = mount;
          after_mount_arg_index += 2;
        }
      else if (strcmp (arg, "--unshare-ipc") == 0)
        {
          unshare_ipc = 1;
          after_mount_arg_index += 1;
        }
      else if (strcmp (arg, "--unshare-pid") == 0)
        {
          unshare_pid = 1;
          after_mount_arg_index += 1;
        }
      else if (strcmp (arg, "--unshare-net") == 0)
        {
          unshare_net = 1;
          after_mount_arg_index += 1;
        }
      else
        break;
    }
        
  bind_mounts = reverse_mount_list (bind_mounts);

  if ((argc - after_mount_arg_index) < 2)
    fatal ("usage: %s [--unshare-ipc] [--unshare-pid] [--unshare-net] [--mount-proc DIR] [--mount-readonly DIR] [--mount-bind SOURCE DEST] ROOTDIR PROGRAM ARGS...", argv0);
  chroot_dir = argv[after_mount_arg_index];
  program = argv[after_mount_arg_index+1];
  program_argv = argv + after_mount_arg_index + 1;

  if (getresgid (&rgid, &egid, &sgid) < 0)
    fatal_errno ("getresgid");
  if (getresuid (&ruid, &euid, &suid) < 0)
    fatal_errno ("getresuid");

  if (ruid == 0)
    fatal ("error: ruid is 0");
  if (rgid == 0)
    rgid = ruid;

  /* CLONE_NEWNS makes it so that when we create bind mounts below,
   * we're only affecting our children, not the entire system.  This
   * way it's harmless to bind mount e.g. /proc over an arbitrary
   * directory.
   */
  clone_flags = SIGCHLD | CLONE_NEWNS;
  /* CLONE_NEWIPC and CLONE_NEWUTS are avenues of communication that
   * might leak outside the container; any IPC can be done by setting
   * up a bind mount and using files or sockets there, if desired.
   */
  if (unshare_ipc)
    clone_flags |= (CLONE_NEWIPC | CLONE_NEWUTS);
  /* CLONE_NEWPID helps ensure random build or test scripts don't kill
   * processes outside of the container.
   */
  if (unshare_pid)
    clone_flags |= CLONE_NEWPID;

  /* Isolated networking */
  if (unshare_net)
    clone_flags |= CLONE_NEWNET;

  if ((child = syscall (__NR_clone, clone_flags, NULL)) < 0)
    fatal_errno ("clone");

  if (child == 0)
    {
      /* Ensure we can't execute setuid programs.  See prctl(2) and
       * capabilities(7).
       *
       * This closes the main historical reason why only uid 0 can
       * chroot(2) - because unprivileged users can create hard links to
       * setuid binaries, and possibly confuse them into looking at data
       * (or loading libraries) that they don't expect, and thus elevating
       * privileges.
       */
      if (prctl (PR_SET_SECUREBITS,
                 SECBIT_NOROOT | SECBIT_NOROOT_LOCKED) < 0)
        fatal_errno ("prctl (SECBIT_NOROOT)");

      /* This is necessary to undo the damage "sandbox" creates on Fedora
       * by making / a shared mount instead of private.  This isn't
       * totally correct because the targets for our bind mounts may still
       * be shared, but really, Fedora's sandbox is broken.
       */
      if (mount ("/", "/", "none", MS_PRIVATE | MS_REC, NULL) < 0)
        fatal_errno ("mount(/, MS_PRIVATE | MS_REC)");

      /* Now let's set up our bind mounts */
      for (bind_mount_iter = bind_mounts; bind_mount_iter; bind_mount_iter = bind_mount_iter->next)
        {
          char *dest;
          
          if (asprintf (&dest, "%s%s", chroot_dir, bind_mount_iter->dest) < 0)
            fatal ("out of memory");
          
          if (bind_mount_iter->type == MOUNT_SPEC_READONLY)
            {
              if (mount (dest, dest,
                         NULL, MS_BIND | MS_PRIVATE, NULL) < 0)
                fatal_errno ("mount (MS_BIND)");
              if (mount (dest, dest,
                         NULL, MS_BIND | MS_PRIVATE | MS_REMOUNT | MS_RDONLY, NULL) < 0)
                fatal_errno ("mount (MS_BIND | MS_RDONLY)");
            }
          else if (bind_mount_iter->type == MOUNT_SPEC_BIND)
            {
              if (mount (bind_mount_iter->source, dest,
                         NULL, MS_BIND | MS_PRIVATE, NULL) < 0)
                fatal_errno ("mount (MS_BIND)");
            }
          else if (bind_mount_iter->type == MOUNT_SPEC_PROCFS)
            {
              if (mount ("proc", dest,
                         "proc", MS_MGC_VAL | MS_PRIVATE, NULL) < 0)
                fatal_errno ("mount (\"proc\")");
            }
          else
            assert (0);
          free (dest);
        }
      
      /* Actually perform the chroot. */
      if (chroot (chroot_dir) < 0)
        fatal_errno ("chroot");
      if (chdir ("/") < 0)
        fatal_errno ("chdir");

      /* Switch back to the uid of our invoking process.  These calls are
       * irrevocable - see setuid(2) */
      if (setgid (rgid) < 0)
        fatal_errno ("setgid");
      if (setuid (ruid) < 0)
        fatal_errno ("setuid");

      if (execv (program, program_argv) < 0)
        fatal_errno ("execv");
    }

  /* Let's also setuid back in the parent - there's no reason to stay uid 0, and
   * it's just better to drop privileges. */
  if (setgid (rgid) < 0)
    fatal_errno ("setgid");
  if (setuid (ruid) < 0)
    fatal_errno ("setuid");

  /* Kind of lame to sit around blocked in waitpid, but oh well. */
  if (waitpid (child, &child_status, 0) < 0)
    fatal_errno ("waitpid");
  
  if (WIFEXITED (child_status))
    return WEXITSTATUS (child_status);
  else
    return 1;
}
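
The parent above collapses every abnormal child exit into status 1. One possible alternative, assuming shell-like callers, is the common 128 + signal-number convention; this is a swapped-in convention, not what the attached program does, and the helper names are hypothetical. The sketch also retries waitpid on EINTR, which only matters if the parent ever installs signal handlers.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a wait status to an exit code: normal exits pass through,
 * signal deaths become 128 + signal number, anything else is 1. */
static int
exit_code_for (int status)
{
  if (WIFEXITED (status))
    return WEXITSTATUS (status);
  if (WIFSIGNALED (status))
    return 128 + WTERMSIG (status);
  return 1;
}

/* Wait for CHILD, retrying if interrupted by a signal.
 * Returns the mapped exit code, or -1 if waitpid fails. */
static int
wait_for_child (pid_t child)
{
  int status;

  while (waitpid (child, &status, 0) < 0)
    if (errno != EINTR)
      return -1;
  return exit_code_for (status);
}
```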

* Re: chroot(2) and bind mounts as non-root
  2011-12-07 17:54 chroot(2) and bind mounts as non-root Colin Walters
@ 2011-12-07 19:36 ` John Stoffel
  2011-12-08 16:10   ` Colin Walters
  2011-12-08 17:04   ` Arnd Bergmann
  2011-12-07 19:40 ` Andy Lutomirski
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 31+ messages in thread
From: John Stoffel @ 2011-12-07 19:36 UTC
  To: Colin Walters; +Cc: LKML, morgan, serue, dhowells, kzak

>>>>> "Colin" == Colin Walters <walters@verbum.org> writes:

Colin> I've recently been doing some work in software compilation, and it'd be
Colin> really handy if I could call chroot(2) as a non-root user.  The reason
Colin> to chroot is to help avoid "host contamination" - I can set up a build
Colin> root and then chroot in.  The reason to do it as non-root is, well,
Colin> requiring root to build software sucks for multiple obvious reasons.

What's wrong with using 'fakeroot' or tools like that instead?  Why
does the Kernel need to be involved like this?  I'm not against your
proposal so much, as trying to understand how compiling a bunch of
source requires this change.

John

* Re: chroot(2) and bind mounts as non-root
  2011-12-07 17:54 chroot(2) and bind mounts as non-root Colin Walters
  2011-12-07 19:36 ` John Stoffel
@ 2011-12-07 19:40 ` Andy Lutomirski
  2011-12-08 16:58   ` Colin Walters
  2011-12-07 20:34 ` H. Peter Anvin
  2011-12-10  5:29 ` Serge E. Hallyn
  3 siblings, 1 reply; 31+ messages in thread
From: Andy Lutomirski @ 2011-12-07 19:40 UTC
  To: Colin Walters; +Cc: LKML, morgan, serue, dhowells, kzak

On 12/07/2011 09:54 AM, Colin Walters wrote:
> Hi,
> 
> (TL;DR version: Please audit the attached setuid program)
> 
> I've recently been doing some work in software compilation, and it'd be
> really handy if I could call chroot(2) as a non-root user.  The reason
> to chroot is to help avoid "host contamination" - I can set up a build
> root and then chroot in.  The reason to do it as non-root is, well,
> requiring root to build software sucks for multiple obvious reasons.
> 
> (Now you can do LD_PRELOAD hacks to talk to a daemon like
> https://github.com/wrpseudo/pseudo does, but really - too gross and too
> slow).
> 
> The historical reason one can't call chroot(2) as non-root is because of
> setuid binaries (hard link a setuid binary into chroot of your choice
> with trojaned libc.so).  But it turns out a while back this commit:
> 
> commit 3898b1b4ebff8dcfbcf1807e0661585e06c9a91c
> Author: Andrew G. Morgan <morgan@kernel.org>
> Date:   Mon Apr 28 02:13:40 2008 -0700
> 
>     capabilities: implement per-process securebits
> 
> Added *exactly* what we need.  We just call:
> 
> prctl (PR_SET_SECUREBITS, SECBIT_NOROOT | SECBIT_NOROOT_LOCKED);
> 
> A setuid program to call both this and chroot(2) is *almost* good enough
> for my use case - but it's a little hard to run most build software
> without say /dev/null, /dev/urandom and /proc.
> 
> The other key thing Linux recently gained is CLONE_NEWNS - with this
> (and also SECBIT_NOROOT), we can allow users to make bind mounts to
> their heart's content, which frankly is just cool.  Bind mounts are a
> really neat VFS feature.

I will personally always be nervous until something like this happens:

http://thread.gmane.org/gmane.linux.kernel.lsm/10659

execve() is IMO scary.

--Andy

* Re: chroot(2) and bind mounts as non-root
  2011-12-07 17:54 chroot(2) and bind mounts as non-root Colin Walters
  2011-12-07 19:36 ` John Stoffel
  2011-12-07 19:40 ` Andy Lutomirski
@ 2011-12-07 20:34 ` H. Peter Anvin
  2011-12-07 20:54   ` Alan Cox
  2011-12-10  5:29 ` Serge E. Hallyn
  3 siblings, 1 reply; 31+ messages in thread
From: H. Peter Anvin @ 2011-12-07 20:34 UTC
  To: Colin Walters; +Cc: LKML, morgan, serue, dhowells, kzak

On 12/07/2011 09:54 AM, Colin Walters wrote:
> 
> The historical reason one can't call chroot(2) as non-root is because of
> setuid binaries (hard link a setuid binary into chroot of your choice
> with trojaned libc.so).

No.  The historical reason is that it lets anyone escape a chroot jail:

	mkdir("jailbreak", 0666);
	chroot("jailbreak");

	/* Now the cwd is outside the root, and therefore not bound by
           it, walk the chain of .. directories until they don't change
           anymore */

	chroot(".");	/* Change the root to the system root */

Oops.

	-hpa

* Re: chroot(2) and bind mounts as non-root
  2011-12-07 20:34 ` H. Peter Anvin
@ 2011-12-07 20:54   ` Alan Cox
  2011-12-15 18:55     ` Andrew G. Morgan
  0 siblings, 1 reply; 31+ messages in thread
From: Alan Cox @ 2011-12-07 20:54 UTC
  To: H. Peter Anvin; +Cc: Colin Walters, LKML, morgan, serue, dhowells, kzak

On Wed, 07 Dec 2011 12:34:28 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 12/07/2011 09:54 AM, Colin Walters wrote:
> > 
> > The historical reason one can't call chroot(2) as non-root is because of
> > setuid binaries (hard link a setuid binary into chroot of your choice
> > with trojaned libc.so).
> 
> No.  The historical reason is that it lets anyone escape a chroot jail:

Beg to differ

Nobody ever considered chroot a jail except a certain brand of
urban-legend-programming people. Indeed chroot has never been a jail
except in the 'open prison' security sense of it.

The big problem with chroot was abusing setuid binaries - particularly
things like uucp and /bin/mail.

* Re: chroot(2) and bind mounts as non-root
  2011-12-07 19:36 ` John Stoffel
@ 2011-12-08 16:10   ` Colin Walters
  2011-12-08 18:14     ` John Stoffel
  2011-12-08 17:04   ` Arnd Bergmann
  1 sibling, 1 reply; 31+ messages in thread
From: Colin Walters @ 2011-12-08 16:10 UTC
  To: John Stoffel; +Cc: LKML

On Wed, 2011-12-07 at 14:36 -0500, John Stoffel wrote:
> >>>>> "Colin" == Colin Walters <walters@verbum.org> writes:
> 
> Colin> I've recently been doing some work in software compilation, and it'd be
> Colin> really handy if I could call chroot(2) as a non-root user.  The reason
> Colin> to chroot is to help avoid "host contamination" - I can set up a build
> Colin> root and then chroot in.  The reason to do it as non-root is, well,
> Colin> requiring root to build software sucks for multiple obvious reasons.
> 
> What's wrong with using 'fakeroot' or tools like that instead? 

I assume you mean
"fakechroot" ( https://github.com/fakechroot/fakechroot/wiki )

The answer is twofold:

1) It's a pile of gross hacks that can easily be buggy, and will be
permanently trying to keep up with newer system calls.
2) It's slower.  My edit-compile-debug cycle REALLY matters to me.  If
you're a developer, it should matter to you - it directly impacts your
productivity.

How much slower?  Okay, well I tried "fakechroot" from Fedora 15.  It
appears to break parallel make.  Which obviously already disqualifies it
from being a core part of my edit-compile-debug cycle.

But here's an example of a small autotools (~6000 significant lines of
C) project, of which running configure is by far the slowest part.  Note
'metabuild' is a trivial script which wraps the
'autogen.sh;configure;make' dance:

$ metabuild   # to prime the caches
...
$ git clean -dfx
...
$ time ostbuild-user-chroot --unshare-ipc --unshare-pid --unshare-net \
    --mount-bind /src /src --mount-proc /proc \
    --mount-bind /dev /dev / /bin/sh -c 'cd /src/test-project; metabuild'
...
real	0m17.627s
user	0m9.397s
sys	0m5.074s
$ git clean -dfx
...
$ time fakeroot fakechroot chroot / /bin/sh -c \
    'cd /src/test-project; metabuild -j 1'
real	0m35.327s
user	0m13.118s
sys	0m10.639s

So it almost exactly doubles...Oh, crap, I just remembered I have
ccache, so we're really only timing configure runs here.  Anyways, you
get the point.  Doubling my compile time is bad.  And this is a
relatively small project.

One of the best parts of Linux is the filesystem and VFS - it's really
amazingly fast compared to other OSes, especially if you know how to use
it.  Adding in layers of emulation and crap in between the program and
the filesystem takes that away.

* Re: chroot(2) and bind mounts as non-root
  2011-12-07 19:40 ` Andy Lutomirski
@ 2011-12-08 16:58   ` Colin Walters
  0 siblings, 0 replies; 31+ messages in thread
From: Colin Walters @ 2011-12-08 16:58 UTC
  To: Andy Lutomirski; +Cc: LKML, Daniel Walsh, Stephen Smalley, kzak

On Wed, 2011-12-07 at 11:40 -0800, Andy Lutomirski wrote:

> I will personally always be nervous until something like this happens:
> 
> http://thread.gmane.org/gmane.linux.kernel.lsm/10659
> 
> execve() is IMO scary.

Yeah, this came up in the context of the seccomp filter stuff too.  

So...it's worth noting that as far as SELinux goes, the beautiful thing
is that setuid only changes uid - it doesn't (by default) change the
domain.  So in this case for example, if the calling domain doesn't
have:

  self:capability { setpcap sys_chroot }

The binary is just going to error out.  Very few domains in the
reference policy have either of those, and only one non-unconfined
domain has both - seunshare.  (At least via grep, I didn't try an
analysis tool).

Oh crap, speaking of seunshare:
https://bugzilla.redhat.com/show_bug.cgi?id=633544

They should clearly be using SECBIT_NOROOT too - that would have avoided
the original hole.

In fact, I now notice my program is a safer generalization of seunshare.
I've added dwalsh to CC.

As far as other LSMs go - honestly I'm in the camp that they're stupid,
and SELinux is the only sane one.  That and opt-in application controls
such as seccomp are where we should be headed.

So instead of your execve_nosecurity, why don't we make SECBIT_NOROOT
disable domain transitions too?  (And encourage the other LSMs to treat
it similarly).

Here's some further food for thought - what if we made certain system
calls that were previously privileged suddenly start working if the
process is SECBIT_NOROOT | SECBIT_NOROOT_LOCKED?  I'm specifically
thinking of unshare(2) and chroot(2).  The trickier part is allowing only
a subset of mount system calls (I need to be able to mount procfs, but
clearly we don't want the user to be able to mount real devices).

* Re: chroot(2) and bind mounts as non-root
  2011-12-07 19:36 ` John Stoffel
  2011-12-08 16:10   ` Colin Walters
@ 2011-12-08 17:04   ` Arnd Bergmann
  2011-12-08 17:15     ` Colin Walters
  1 sibling, 1 reply; 31+ messages in thread
From: Arnd Bergmann @ 2011-12-08 17:04 UTC
  To: John Stoffel; +Cc: Colin Walters, LKML, morgan, serue, dhowells, kzak

On Wednesday 07 December 2011, John Stoffel wrote:
> >>>>> "Colin" == Colin Walters <walters@verbum.org> writes:
> 
> Colin> I've recently been doing some work in software compilation, and it'd be
> Colin> really handy if I could call chroot(2) as a non-root user.  The reason
> Colin> to chroot is to help avoid "host contamination" - I can set up a build
> Colin> root and then chroot in.  The reason to do it as non-root is, well,
> Colin> requiring root to build software sucks for multiple obvious reasons.
> 
> What's wrong with using 'fakeroot' or tools like that instead?  Why
> does the Kernel need to be involved like this?  I'm not against your
> proposal so much, as trying to understand how compiling a bunch of
> source requires this change.

I think the better question to ask is what is missing from 'schroot', which
is commonly used for exactly this purpose. Is it just about avoiding the
suid bit for /usr/bin/schroot or something else?

	Arnd

* Re: chroot(2) and bind mounts as non-root
  2011-12-08 17:04   ` Arnd Bergmann
@ 2011-12-08 17:15     ` Colin Walters
  0 siblings, 0 replies; 31+ messages in thread
From: Colin Walters @ 2011-12-08 17:15 UTC
  To: Arnd Bergmann; +Cc: John Stoffel, LKML

On Thu, 2011-12-08 at 17:04 +0000, Arnd Bergmann wrote:

> I think the better question to ask is what is missing from 'schroot', which
> is commonly used for exactly this purpose. Is it just about avoiding the
> suid bit for /usr/bin/schroot or something else?

The trust model for schroot is that it only allows executing code
downloaded from URLs in /etc/schroot/schroot.conf.  That means it
requires root for the initial setup to edit that config file, which
disqualifies it from my use case (no part of the compilation should
require root).

In my approach, you can keep the OS tree (/usr/include, /usr/bin/gcc)
owned by the user - there is absolutely no root-owned process running
under any indirect control of the user (except the tiny bit for my new
setuid binary).

Now ultimately to *run* your installed program locally (say you're
hacking on a system daemon like gdm), then clearly you need root.  But
nothing before that should.  And if you're not running it locally (e.g.
you're cross compiling, running a build server which distributes
binaries to other machines), then there is no root required.

* Re: chroot(2) and bind mounts as non-root
  2011-12-08 16:10   ` Colin Walters
@ 2011-12-08 18:14     ` John Stoffel
  2011-12-08 18:26       ` Colin Walters
  0 siblings, 1 reply; 31+ messages in thread
From: John Stoffel @ 2011-12-08 18:14 UTC
  To: Colin Walters; +Cc: John Stoffel, LKML

>>>>> "Colin" == Colin Walters <walters@verbum.org> writes:

Colin> On Wed, 2011-12-07 at 14:36 -0500, John Stoffel wrote:
>> >>>>> "Colin" == Colin Walters <walters@verbum.org> writes:
>> 
Colin> I've recently been doing some work in software compilation, and it'd be
Colin> really handy if I could call chroot(2) as a non-root user.  The reason
Colin> to chroot is to help avoid "host contamination" - I can set up a build
Colin> root and then chroot in.  The reason to do it as non-root is, well,
Colin> requiring root to build software sucks for multiple obvious reasons.
>> 
>> What's wrong with using 'fakeroot' or tools like that instead? 

Colin> I assume you mean
Colin> "fakechroot" ( https://github.com/fakechroot/fakechroot/wiki )

Nah, I'm a doofus and misremembered how fakeroot works; it just fakes
'root' access for installers and such.

Colin> The answer is twofold:

Colin> 1) It's a pile of gross hacks that can easily be buggy, and will be
Colin> permanently trying to keep up with newer system calls.
Colin> 2) It's slower.  My edit-compile-debug cycle REALLY matters to me.  If
Colin> you're a developer, it should matter to you - it directly impacts your
Colin> productivity.

Sure I can understand that, but why does your compiler need to be in a
chroot'd area?  If you're doing a cross compile, then just change the
tool chain.

Colin> How much slower?  Okay, well I tried "fakechroot" from Fedora 15.  It
Colin> appears to break parallel make.  Which obviously already disqualifies it
Colin> from being a core part of my edit-compile-debug cycle.

Fine.

Colin> But here's an example of a small autotools (~6000 significant lines of
Colin> C) project, of which running configure is by far the slowest part.  Note
Colin> 'metabuild' is a trivial script which wraps the
Colin> 'autogen.sh;configure;make' dance:

Colin> $ metabuild   # to prime the caches
Colin> ...
Colin> $ git clean -dfx
Colin> ...
Colin> $ time ostbuild-user-chroot --unshare-ipc --unshare-pid --unshare-net \
Colin>     --mount-bind /src /src --mount-proc /proc \
Colin>     --mount-bind /dev /dev / /bin/sh -c 'cd /src/test-project; metabuild'

So what does 'ostbuild-user-chroot' do for you that a simple makefile
building into a separate build area (with the source just where it is
now) doesn't?

Or is it because you're trying to edit on one OS, such as Fedora 14,
then build and debug inside a Debian 5.0 setup?  But without running
a completely separate system, just doing a chroot into a new
filesystem tree?

Colin> So it almost exactly doubles...Oh, crap, I just remembered I
Colin> have ccache, so we're really only timing configure runs here.
Colin> Anyways, you get the point.  Doubling my compile time is bad.
Colin> And this is a relatively small project.

I guess I still don't understand why your compile setup requires/wants
a chroot'd area.  Just set up your toolchain without all that hassle.

Colin> One of the best parts of Linux is the filesystem and VFS - it's
Colin> really amazingly fast compared to other OSes, especially if you
Colin> know how to use it.  Adding in layers of emulation and crap in
Colin> between the program and the filesystem takes that away.

I'm just pushing back because I think you're using a hammer to try and
drive staples or screws.  It sorta works but....

Feel free to ignore my objections, I'm not a core developer by any
means.  

John

* Re: chroot(2) and bind mounts as non-root
  2011-12-08 18:14     ` John Stoffel
@ 2011-12-08 18:26       ` Colin Walters
  2011-12-09  0:49         ` Sven-Haegar Koch
  2011-12-09 14:55         ` John Stoffel
  0 siblings, 2 replies; 31+ messages in thread
From: Colin Walters @ 2011-12-08 18:26 UTC
  To: John Stoffel; +Cc: LKML

On Thu, 2011-12-08 at 13:14 -0500, John Stoffel wrote:

> Or is it because you're trying to edit on one OS, such as Fedora 14,
> then build and debug inside a Debian 5.0 setup?  But without running
> a completely separate system, just doing a chroot into a new
> filesystem tree?  

Yes, something like that; basically it's about ensuring that the libfoo
we're building binaries against is /home/walters/build/libfoo.so and
not /usr/lib/libfoo.so.

I'm actually intending for the core build system of my OS to work in
*both* cross and native compilation.  That means it's important to keep
them as close as possible.

What you were talking about above (i.e. "just don't chroot") is what
http://buildroot.net does (and others, I also semi-maintain GNOME's
jhbuild).  It works if you're very careful in your build scripts, know
and carefully propagate the large set of magic environment variables,
etc., then yes, you can do it.

But chroot is just so nice a hammer for this nail.

* Re: chroot(2) and bind mounts as non-root
  2011-12-08 18:26       ` Colin Walters
@ 2011-12-09  0:49         ` Sven-Haegar Koch
  2011-12-09 14:55         ` John Stoffel
  1 sibling, 0 replies; 31+ messages in thread
From: Sven-Haegar Koch @ 2011-12-09  0:49 UTC
  To: Colin Walters; +Cc: John Stoffel, LKML

On Thu, 8 Dec 2011, Colin Walters wrote:

> On Thu, 2011-12-08 at 13:14 -0500, John Stoffel wrote:
> 
> > Or is it because you're trying to edit on one OS, such as Fedora 14,
> > then build and debug inside a Debian 5.0 setup?  But without running
> > a completely separate system, just doing a chroot into a new
> > filesystem tree?  
> 
> Yes, something like that; basically it's about ensuring that the libfoo
> we're building binaries against is /home/walters/build/libfoo.so and
> not /usr/lib/libfoo.so.
> 
> I'm actually intending for the core build system of my OS to work in
> *both* cross and native compilation.  That means it's important to keep
> them as close as possible.
> 
> > What you were talking about above (i.e. "just don't chroot") is what
> > http://buildroot.net does (as do others; I also semi-maintain GNOME's
> > jhbuild).  It works: if you're very careful in your build scripts, and
> > know and carefully propagate the large set of magic environment
> > variables, etc., then yes, you can do it.
> 
> But chroot is just so nice a hammer for this nail.

For Debian there is schroot ("securely enter a chroot environment"),
which nicely encapsulates entering pre-prepared chroots as a user (with
a suid root program), setting up bind mounts, etc., and in the end
landing inside the chroot as the calling user (if you have the
permissions).

I use this to have build environments for a couple of older Debian
releases, both 32- and 64-bit, available on a single 64-bit machine.

Source is available at git://git.debian.org/git/buildd-tools/schroot.git

I can't think of a reason it would only work on Debian, so maybe give it a try.

c'ya
sven-haegar

-- 
Three may keep a secret, if two of them are dead.
- Ben F.


* Re: chroot(2) and bind mounts as non-root
  2011-12-08 18:26       ` Colin Walters
  2011-12-09  0:49         ` Sven-Haegar Koch
@ 2011-12-09 14:55         ` John Stoffel
  2011-12-09 15:06           ` Colin Walters
  1 sibling, 1 reply; 31+ messages in thread
From: John Stoffel @ 2011-12-09 14:55 UTC
  To: Colin Walters; +Cc: John Stoffel, LKML

>>>>> "Colin" == Colin Walters <walters@verbum.org> writes:

Colin> On Thu, 2011-12-08 at 13:14 -0500, John Stoffel wrote:
>> Or is it because you're trying to edit on one OS, such as Fedora 14,
>> then build and debug inside a Debian 5.0 setup?  Without running
>> a completely separate system, just doing a chroot into a new
>> filesystem tree?

Colin> Yes, something like that; basically it's about ensuring that
Colin> the libfoo we're building binaries against is
Colin> /home/walters/build/libfoo.so and not /usr/lib/libfoo.so.

Colin> I'm actually intending for the core build system of my OS to
Colin> work in *both* cross and native compilation.  That means it's
Colin> important to keep them as close as possible.

Colin> What you were talking about above (i.e. "just don't chroot") is
Colin> what http://buildroot.net does (as do others; I also
Colin> semi-maintain GNOME's jhbuild).  It works: if you're very
Colin> careful in your build scripts, and know and carefully propagate
Colin> the large set of magic environment variables, etc., then yes,
Colin> you can do it.

Colin> But chroot is just so nice a hammer for this nail.

I can see that, but maybe you can still fix this in userspace using
the schroot tool others have mentioned.  

John


* Re: chroot(2) and bind mounts as non-root
  2011-12-09 14:55         ` John Stoffel
@ 2011-12-09 15:06           ` Colin Walters
  0 siblings, 0 replies; 31+ messages in thread
From: Colin Walters @ 2011-12-09 15:06 UTC
  To: John Stoffel; +Cc: LKML

On Fri, 2011-12-09 at 09:55 -0500, John Stoffel wrote:

> I can see that, but maybe you can still fix this in userspace using
> the schroot tool others have mentioned.  

No, because it requires root to edit /etc/schroot/schroot.conf.  I've
already said this.  What is not being understood?

Again, the design constraint I have is that you should be able to get a
plain regular Unix account on say some classical timesharing server (in
the cloud if you like, or your university's RHEL instance), and do the
build.

This is also advantageous even in the "building on a personal laptop"
case, in that there is *no* instance of a user process being in direct
or indirect control over processes started as root - so there is much
less chance of one of those random postinst scripts that run as root
failing to notice it's in a chroot and screwing up your system.

If you allow a user to upload .debs to the URL in schroot.conf, all
you have created in the end is a very elaborate chmod u+s /bin/sh for
them.

Does that make sense?  Stop telling me about schroot, I knew about it
even before I posted here, and I've already replied about it.





* Re: chroot(2) and bind mounts as non-root
  2011-12-07 17:54 chroot(2) and bind mounts as non-root Colin Walters
                   ` (2 preceding siblings ...)
  2011-12-07 20:34 ` H. Peter Anvin
@ 2011-12-10  5:29 ` Serge E. Hallyn
  2011-12-12 16:41   ` Colin Walters
  3 siblings, 1 reply; 31+ messages in thread
From: Serge E. Hallyn @ 2011-12-10  5:29 UTC
  To: Colin Walters; +Cc: LKML, morgan, Eric W. Biederman, dhowells, kzak

Quoting Colin Walters (walters@verbum.org):
>       if (prctl (PR_SET_SECUREBITS,
>                  SECBIT_NOROOT | SECBIT_NOROOT_LOCKED) < 0)
>         fatal_errno ("prctl (SECBIT_NOROOT)");

Sorry, only just saw this now.  Haven't taken too close a look, but a
comment and a few warnings.

First, what you are after is an explicit goal of user namespaces: to
be able to change the environment without risk of fooling privileged
setuid programs with that environment.  And, thereby, to allow unprivileged
users to clone namespaces and, in new namespaces, freely muck with the
resources they own or create.  However, they're not quite usable yet.

So regarding your use of securebits: You are preventing a setuid-root
program from automatically acquiring capabilities, which is a good
start.  However, a setuid-root program will still execute as root (or
a setuid-mysql program as setuid-mysql).  That means it will own
root (or mysql) files while it is running.  That could still be
plenty dangerous.  (With user namespaces, programs running as the
container root will not own files belonging to the host root as they
are different users.)  Second, programs with file capabilities - a
more fine-grained alternative to setuid-root - will still run with
privilege.  You could prevent that by not allowing xattrs, I suppose.
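To make concrete what the prctl() call above changes, here is a minimal sketch, assuming Linux with the exported <linux/securebits.h> header; both function names are hypothetical. Reading securebits needs no privilege, while setting them needs CAP_SETPCAP, which is why it has to happen in a privileged helper. As Serge notes, these bits only stop uid 0 from being granted capabilities; they do not stop a setuid binary from switching its uid.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <linux/securebits.h>

/* Print whether SECBIT_NOROOT and its lock bit are set for the calling
 * process. Reading securebits needs no privilege. Returns the raw bits,
 * or -1 on error. */
int report_noroot(void)
{
    int bits = prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);

    if (bits < 0) {
        perror("prctl(PR_GET_SECUREBITS)");
        return -1;
    }
    printf("SECBIT_NOROOT:        %s\n",
           (bits & SECBIT_NOROOT) ? "set" : "clear");
    printf("SECBIT_NOROOT_LOCKED: %s\n",
           (bits & SECBIT_NOROOT_LOCKED) ? "set" : "clear");
    return bits;
}

/* What a setuid helper would do: keep the existing bits, add NOROOT and
 * lock it so neither this process nor its descendants can clear it.
 * Needs CAP_SETPCAP; fails with EPERM otherwise. Returns 0 or -1. */
int lock_noroot(void)
{
    int bits = prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);

    if (bits < 0)
        return -1;
    return prctl(PR_SET_SECUREBITS,
                 (unsigned long)(bits | SECBIT_NOROOT | SECBIT_NOROOT_LOCKED),
                 0, 0, 0);
}
```

The existing bits are OR'ed in rather than overwritten because the kernel refuses to change a bit whose lock is already set.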

-serge


* Re: chroot(2) and bind mounts as non-root
  2011-12-10  5:29 ` Serge E. Hallyn
@ 2011-12-12 16:41   ` Colin Walters
  2011-12-12 23:11     ` Serge E. Hallyn
  0 siblings, 1 reply; 31+ messages in thread
From: Colin Walters @ 2011-12-12 16:41 UTC
  To: Serge E. Hallyn
  Cc: LKML, alan, morgan, Eric W. Biederman, luto, kzak, Steve Grubb

On Sat, 2011-12-10 at 05:29 +0000, Serge E. Hallyn wrote:

> First, what you are after is an explicit goal of user namespaces: to
> be able to change the environment without risk of fooling privileged
> setuid programs with that environment. 

Hmm...so I looked at the user namespace stuff
( https://wiki.ubuntu.com/UserNamespace ) and it kind of scares me in
terms of complexity.  I think I understand the intersection of cgroups,
capabilities, and SELinux as they are today; this would be a whole new
set of options.  But that's an aside.

>  And, thereby, to allow unprivileged
> users to clone namespaces and, in new namespaces, freely muck with the
> resources they own or create.  However, they're not quite usable yet.

So I'm assuming the actual high level goal of user namespaces is more
secure "containers" where you can run a mostly unmodified General
Purpose Linux system which includes setuid binaries, creating new users
etc., right?

If that's the case then my use case is much smaller - I don't need to be
able to run setuid binaries, or in fact change user ids at all.

A tool like this would make my life *so* much better that I'm trying
hard to use existing kernel features.

> So regarding your use of securebits: You are preventing a setuid-root
> program from automatically acquiring capabilities, which is a good
> start.  However, a setuid-root program will still execute as root (or
> a setuid-mysql program as setuid-mysql).  That means it will own
> root (or mysql) files while it is running. 

Oh, very good point.  I should have noticed that =/

But it was pretty trivial to modify my tool to make a MS_NOSUID bind
mount over /:

      mount (NULL, "/", "none", MS_PRIVATE | MS_REMOUNT | MS_NOSUID, NULL);

That's hopefully enough to plug that hole (right?), albeit not in a
beautiful way.  I would be happier with a prctl to turn off suid
binaries entirely.
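A minimal sketch of the same step, assuming a helper that still holds CAP_SYS_ADMIN; the function name is hypothetical. As far as I can tell from the kernel's mount(2) dispatch, MS_REMOUNT is handled before the propagation flags, so OR'ing MS_PRIVATE into a remount does not change propagation. The sketch therefore makes the mounts private in a separate call, then uses MS_BIND | MS_REMOUNT so that only the per-mount flags of this namespace's copy of / change. It affects / only, not separately mounted submounts such as /home.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/statvfs.h>

/* Hypothetical helper: give the calling process its own mount namespace in
 * which "/" is mounted nosuid. Returns 0 on success or -errno on failure. */
int make_root_nosuid(void)
{
    struct statvfs st;
    unsigned long flags = MS_BIND | MS_REMOUNT | MS_NOSUID;

    /* 1. Private copy of the mount table (needs CAP_SYS_ADMIN). */
    if (unshare(CLONE_NEWNS) < 0)
        return -errno;

    /* 2. Stop mount events propagating back to the parent namespace. This
     *    is its own call: combined with MS_REMOUNT it would not be applied. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
        return -errno;

    /* 3. A bind remount replaces the per-mount flags instead of adding to
     *    them, so carry over the common ones "/" already has. */
    if (statvfs("/", &st) < 0)
        return -errno;
    if (st.f_flag & ST_RDONLY)
        flags |= MS_RDONLY;
    if (st.f_flag & ST_NODEV)
        flags |= MS_NODEV;
    if (st.f_flag & ST_NOEXEC)
        flags |= MS_NOEXEC;

    /* 4. Mark this mount nosuid: setuid/setgid bits and file capabilities
     *    on files reached through it are ignored at exec time. */
    if (mount(NULL, "/", NULL, flags, NULL) < 0)
        return -errno;
    return 0;
}
```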

Oh... OK, digging farther back into the thread Andy started, I see
Eric proposed a patch for exactly this:

https://lkml.org/lkml/2009/12/30/265

Ok, I've now read most of the back threads for this - I should have
searched farther back for previous discussion, sorry.

>  Second, programs with file capabilities - a
> more fine-grained alternative to setuid-root - will still run with
> privilege.  You could prevent that by not allowing xattrs, I suppose.

Looks to me like the MS_NOSUID bind mount prevents acquisition of file
capabilities too.

I experimented with dropping all capabilities from the capability
bounding set, but the API seems a bit lame: CAP_LAST_CAP is compiled
in from the kernel's capability.h, so if an old binary is run on a new
kernel, I might silently fail to drop a newly added capability.  Right?
Steve Grubb's "libcap-ng" appears to not handle this scenario at all;
Steve, am I missing something?

Anyway, in the big picture here I think this tool is now pretty safe to
install suid root, since we already rely on MS_NOSUID to close off all
privilege escalation when a user plugs in a USB drive, which is
effectively "user controls arbitrary filesystem layout".
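A helper relying on this could check the assumption before exec'ing anything from the build root. A minimal sketch, assuming POSIX statvfs(); the function name is hypothetical. ST_NOSUID in f_flag reports whether the mount holding the path ignores setuid bits.

```c
#include <assert.h>
#include <errno.h>
#include <sys/statvfs.h>

/* Hypothetical check: returns 1 if setuid/setgid bits are ignored for files
 * under `path` (its mount is nosuid), 0 if they are honored, -errno on error. */
int path_is_nosuid(const char *path)
{
    struct statvfs st;

    if (statvfs(path, &st) < 0)
        return -errno;
    return (st.f_flag & ST_NOSUID) ? 1 : 0;
}
```

A helper would refuse to continue unless `path_is_nosuid(build_root) == 1`.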

But getting in Eric's patch for disabling suid binaries in a process
tree would be really nice.  Alan, do you still object?  Your main issue
seemed to be that it should be in an LSM, but the suid issue does span
existing LSMs.  And insofar as adding restrictions introduces new attack
vectors, pretty much all of those involve abusing suid binaries, which is
precisely what we want to axe entirely.




* Re: chroot(2) and bind mounts as non-root
  2011-12-12 16:41   ` Colin Walters
@ 2011-12-12 23:11     ` Serge E. Hallyn
  2011-12-15 20:56       ` Colin Walters
  0 siblings, 1 reply; 31+ messages in thread
From: Serge E. Hallyn @ 2011-12-12 23:11 UTC
  To: Colin Walters
  Cc: LKML, alan, morgan, Eric W. Biederman, luto, kzak, Steve Grubb

Quoting Colin Walters (walters@verbum.org):
> But it was pretty trivial to modify my tool to make a MS_NOSUID bind
> mount over /:
> 
>       mount (NULL, "/", "none", MS_PRIVATE | MS_REMOUNT | MS_NOSUID, NULL);
> 
> That's hopefully enough to plug that hole (right?), albeit not in a

Heh, yeah I think that suffices :)

...

> Looks to me like the MS_NOSUID bind mount prevents acquisition of file
> capabilities too.

Yup.

> I experimented with dropping all capabilities from the capability
> bounding set, but the API seems a bit lame: CAP_LAST_CAP is compiled
> in from the kernel's capability.h, so if an old binary is run on a new
> kernel, I might silently fail to drop a newly added capability.  Right?

Look at the cap_get_bound.3 manpage, and look for CAP_IS_SUPPORTED.
If you start at CAP_LAST_CAP and keep going up/down depending on whether
it was supported or not, it shouldn't take too long to find the last
valid value.  Not ideal, but should be reliable.
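Serge's suggestion can be sketched without libcap by calling prctl(PR_CAPBSET_READ) directly. As far as I know, this is what libcap's cap_get_bound() (and so CAP_IS_SUPPORTED) wraps: it fails with EINVAL for capability numbers the running kernel does not know. The sketch assumes Linux 2.6.25 or later, and the function names are hypothetical. Dropping entries needs CAP_SETPCAP.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <linux/capability.h>

/* Is capability `cap` known to the running kernel? PR_CAPBSET_READ returns
 * 0 or 1 for a valid capability and fails with EINVAL otherwise. */
static int cap_supported(int cap)
{
    return prctl(PR_CAPBSET_READ, cap, 0, 0, 0) >= 0;
}

/* Highest capability the running kernel supports, starting from the
 * compile-time CAP_LAST_CAP and walking up or down as needed. */
int probe_last_cap(void)
{
    int cap = CAP_LAST_CAP;

    if (cap_supported(cap)) {
        /* Old headers, new kernel: newer capabilities may exist. */
        while (cap_supported(cap + 1))
            cap++;
    } else {
        /* New headers, old kernel: step down to the last valid one. */
        while (cap > 0 && !cap_supported(cap))
            cap--;
    }
    return cap;
}

/* Drop every capability from the bounding set, including ones newer than
 * this binary's headers. This limits what later execs can gain; it does not
 * remove capabilities already held. Needs CAP_SETPCAP. Returns 0 or -errno. */
int drop_bounding_set(void)
{
    int last = probe_last_cap();

    for (int cap = 0; cap <= last; cap++) {
        if (prctl(PR_CAPBSET_DROP, cap, 0, 0, 0) < 0)
            return -errno;
    }
    return 0;
}
```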

> Steve Grubb's "libcap-ng" appears to not handle this scenario at all;
> Steve, am I missing something?
> 
> Anyways, in the big picture here I think this tool is now pretty safe to
> install suid root, since we rely on MS_NOSUID to close all privilege

I haven't taken a critical look at the mount code but other than that
it seems reasonable and useful to me!  Thanks.

> escalation mechanisms today from plugging in a USB drive, which is
> effectively "user controls arbitrary filesystem layout".
> 
> But getting in Eric's patch for disabling suid binaries in a process
> tree would be really nice.  Alan, do you still object?  Your main issue
> seemed to be that it should be in an LSM, but the suid issue does span
> existing LSMs.  And insofar as adding restrictions introduces new attack
> vectors, pretty much all of those involve abusing suid binaries, which is
> precisely what we want to axe entirely.

-serge


* Re: chroot(2) and bind mounts as non-root
  2011-12-07 20:54   ` Alan Cox
@ 2011-12-15 18:55     ` Andrew G. Morgan
  2011-12-16 15:44       ` Colin Walters
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew G. Morgan @ 2011-12-15 18:55 UTC
  To: Alan Cox; +Cc: H. Peter Anvin, Colin Walters, LKML, serue, dhowells, kzak

I'm genuinely confused whether all these concerns are valid with file
capabilities.

Consider (let's say luser is some user that I want to be active inside
the chroot, but I don't want to allow regular login to my system):

morgan> sudo su luser
luser> mkdir /tmp/chroot/
luser> chmod go-rx /tmp/chroot/
luser> exit

morgan> cat > launcher.c <<EOT
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv)
{
  int ret = chroot("/tmp/chroot");
  printf("chroot %s.\n", ret ? "failed" : "worked");
  ret = chdir("/");
  printf("chdir %s.\n", ret ? "failed" : "worked");
  // Insert exec code to invoke chroot'd shell or whatever.
  return ret;
}
EOT
morgan> make launcher
cc launcher.c -o launcher
morgan> mv launcher /tmp/
morgan> sudo -s
root> setcap cap_sys_chroot=ep /tmp/launcher
root> cp <files and directories needed in the chroot> /tmp/chroot/
root> su luser

luser> /tmp/launcher

The last line is something that involves luser only - i.e. it gives
no user privilege away to any child it might launch. luser is also the
only regular user able to chroot to this luser-owned chroot (because that
directory is exclusive to that user).
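For completeness, here is one way the elided exec step in the launcher might look. This is a hypothetical sketch: the chroot directory and program are parameters, /bin/sh would only be an example, and it assumes the binary carries cap_sys_chroot=ep as in the transcript. The exec'd program has no file capabilities and the launcher's inheritable set is empty, so cap_sys_chroot should not survive the exec.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: confine the process to `dir` and make it the cwd,
 * so relative paths cannot reach outside. Returns 0 or -errno. */
int enter_chroot(const char *dir)
{
    if (chroot(dir) < 0)
        return -errno;
    if (chdir("/") < 0)
        return -errno;
    return 0;
}

/* Hypothetical launcher body: chroot, then replace this process with
 * argv[0] (a path inside the new root). Only returns on failure. */
int launch_in_chroot(const char *dir, char *const argv[])
{
    int err = enter_chroot(dir);

    if (err < 0) {
        fprintf(stderr, "chroot %s: %s\n", dir, strerror(-err));
        return err;
    }
    /* No file capabilities on argv[0] and an empty inheritable set mean
     * cap_sys_chroot should not survive this exec. */
    execv(argv[0], argv);
    err = -errno;               /* save before fprintf can clobber errno */
    fprintf(stderr, "exec %s: %s\n", argv[0], strerror(-err));
    return err;
}
```

From main(), this would be called as `char *args[] = { "/bin/sh", NULL }; return launch_in_chroot("/tmp/chroot", args) < 0;`.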

Is there a need for privileged binaries within /tmp/chroot? If not,
how might they get there (without help from root, always presuming I
can prevent luser from logging in outside of this chroot'd
environment)?

Thanks

Andrew

On Wed, Dec 7, 2011 at 12:54 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> On Wed, 07 Dec 2011 12:34:28 -0800
> "H. Peter Anvin" <hpa@zytor.com> wrote:
>
>> On 12/07/2011 09:54 AM, Colin Walters wrote:
>> >
>> > The historical reason one can't call chroot(2) as non-root is because of
>> > setuid binaries (hard link a setuid binary into chroot of your choice
>> > with trojaned libc.so).
>>
>> No.  The historical reason is that it lets anyone escape a chroot jail:
>
> Beg to differ
>
> Nobody ever considered chroot a jail except a certain brand of
> urban-legend-programming people. Indeed chroot has never been a jail
> except in the 'open prison' security sense of it.
>
> The big problem with chroot was abusing setuid binaries - particularly
> things like uucp and /bin/mail.
>
>


* Re: chroot(2) and bind mounts as non-root
  2011-12-12 23:11     ` Serge E. Hallyn
@ 2011-12-15 20:56       ` Colin Walters
  2011-12-16  6:14         ` Eric W. Biederman
  0 siblings, 1 reply; 31+ messages in thread
From: Colin Walters @ 2011-12-15 20:56 UTC
  To: Serge E. Hallyn
  Cc: LKML, alan, morgan, Eric W. Biederman, luto, kzak, Steve Grubb

On Mon, 2011-12-12 at 23:11 +0000, Serge E. Hallyn wrote:

> Look at the cap_get_bound.3 manpage, and look for CAP_IS_SUPPORTED.
> If you start at CAP_LAST_CAP and keep going up/down depending on whether
> it was supported or not, it shouldn't take too long to find the last
> valid value.  Not ideal, but should be reliable.

Blah =/  I think I'll just rely on the MS_NOSUID bind mount for now.

> I haven't taken a critical look at the mount code but other than that
> it seems reasonable and useful to me!  Thanks.

Can you link me to any discussion of how the user namespace stuff you're
working on would enable any of this (chroot, bind mounts) to be
available to "unprivileged" users?  Is it that once a non-uid 0 process
enters a new namespace, when executing a setuid 0 binary from the
filesystem, because that binary is from a different user namespace, the
setuid bits don't apply?

What does it even mean for a file to be "owned" by a user namespace,
unless you're talking about patching e.g. ext4 to persist namespaces
somehow?

Where I'd ultimately like to get is having this utility in util-linux,
but before I propose that I'd like to have a good idea what the
possibilities are with user namespaces.

The more I think about this though, the more I am a big fan of what the
OpenWall people are doing - if it gets me chroot as a user, I am totally
on board with just removing all setuid binaries.  We're already fairly
far along on doing that in GNOME by using PolicyKit mechanisms anyways.




* Re: chroot(2) and bind mounts as non-root
  2011-12-15 20:56       ` Colin Walters
@ 2011-12-16  6:14         ` Eric W. Biederman
  2011-12-18 16:01           ` Colin Walters
  2011-12-21 18:15           ` Steve Grubb
  0 siblings, 2 replies; 31+ messages in thread
From: Eric W. Biederman @ 2011-12-16  6:14 UTC
  To: Colin Walters
  Cc: Serge E. Hallyn, LKML, alan, morgan, luto, kzak, Steve Grubb

Colin Walters <walters@verbum.org> writes:

> On Mon, 2011-12-12 at 23:11 +0000, Serge E. Hallyn wrote:
>
>> Look at the cap_get_bound.3 manpage, and look for CAP_IS_SUPPORTED.
>> If you start at CAP_LAST_CAP and keep going up/down depending on whether
>> it was supported or not, it shouldn't take too long to find the last
>> valid value.  Not ideal, but should be reliable.
>
> Blah =/  I think I'll just rely on the MS_NOSUID bind mount for now.
>
>> I haven't taken a critical look at the mount code but other than that
>> it seems reasonable and useful to me!  Thanks.
>
> Can you link me to any discussion of how the user namespace stuff you're
> working on would enable any of this (chroot, bind mounts) to be
> available to "unprivileged" users?  Is it that once a non-uid 0 process
> enters a new namespace, when executing a setuid 0 binary from the
> filesystem, because that binary is from a different user namespace, the
> setuid bits don't apply?
>
> What does it even mean for a file to be "owned" by a user namespace,
> unless you're talking about patching e.g. ext4 to persist namespaces
> somehow?
>
> Where I'd ultimately like to get is having this utility in util-linux,
> but before I propose that I'd like to have a good idea what the
> possibilities are with user namespaces.

The essential point is that all of the security credentials a process
sees (uids, gids, capabilities, keys) belong to the user namespace.  This
allows process migration while still being able to use the same global
identifiers you were using before.  At the same time this means that
once you enter a user namespace all of the capabilities you can acquire
are relative to that user namespace.

You can look at the details of ns_capable (merged) to see how those
capabilities will work.

It is envisioned that the other namespaces will start recording the user
namespace that created them so we can evaluate ns_capable relative to
the creator of those namespaces.  (It is trivial work; we are just
holding off so we don't introduce a security hole while we get the
other bits implemented.)

This means it is safe to enter a new user namespace without root
privileges: once you are in, if you execute a suid app, it will be suid
relative to your user namespace.  The careful changing of capable to
ns_capable will allow other namespaces, and other things that today are
root-only because of fears of mucking up the execution environment, to
be enabled.

What is slightly up in the air is how we map user namespaces to
filesystems.  The simplest solution looks to be to set up uid and gid
mappings from each child user namespace to the initial system user
namespace.  Then in a child user namespace setuid(2) will fail if
you attempt to use an id that does not have a mapping.

Similarly in fs/exec.c:prepare_binprm() at the point where we test
MNT_NOSUID we will add an additional test to see if the uid and gid
of the executable will map to the target user namespace.  If the ids
don't map we skip the suid step entirely.
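The two checks described above can be modeled in a few lines. This is a hypothetical userspace model, not kernel code, and every name in it is made up. Each extent maps a contiguous range of ids inside the child namespace onto ids in the initial namespace (later kernels expose a similar three-number "inside outside count" format in /proc/<pid>/uid_map). setuid(2) in the child needs an inside-to-outside mapping, and the exec-time check honors a setuid bit only if the mount is not nosuid and the file owner maps into the child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One mapping extent (model only): ids [inside, inside+count) in the child
 * namespace correspond to [outside, outside+count) in the initial one. */
struct id_extent {
    uint32_t inside;
    uint32_t outside;
    uint32_t count;
};

#define ID_UNMAPPED UINT32_MAX  /* sentinel for "no mapping" in this model */

/* Child-namespace id -> initial-namespace id, e.g. for setuid(2). */
uint32_t map_to_initial(const struct id_extent *map, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (id >= map[i].inside && id - map[i].inside < map[i].count)
            return map[i].outside + (id - map[i].inside);
    return ID_UNMAPPED;
}

/* Initial-namespace id (e.g. a file's owner) -> child-namespace id. */
uint32_t map_from_initial(const struct id_extent *map, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (id >= map[i].outside && id - map[i].outside < map[i].count)
            return map[i].inside + (id - map[i].outside);
    return ID_UNMAPPED;
}

/* Exec-time decision in the spirit of the proposed prepare_binprm() test:
 * honor a setuid bit only if the mount allows it and the owner maps. */
bool honor_setuid(const struct id_extent *map, size_t n,
                  bool mnt_nosuid, uint32_t file_owner)
{
    if (mnt_nosuid)
        return false;
    return map_from_initial(map, n, file_owner) != ID_UNMAPPED;
}
```

Take a single extent `{ 0, 500, 1 }`, where child uid 0 is host uid 500. A host-root-owned /bin/su has no mapping, so its setuid bit is ignored, and setuid(1) inside the child would fail.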

Since, except at the edges of userspace, we use uids and gids in the
initial user namespace, the potential for confusing other security
mechanisms is minimized.

The downside of requiring a mapping is that there is the tiniest bit of
user policy that will have to be added to the distributions to take full
advantage of the user namespace.  If you don't have that policy set up,
your real uid will not change, but you will appear to userspace as uid
0.  That should be sufficient to compile, chroot, mount, and do just
about everything else interesting without privileges.

> The more I think about this though, the more I am a big fan of what the
> OpenWall people are doing - if it gets me chroot as a user, I am totally
> on board with just removing all setuid binaries.  We're already fairly
> far along on doing that in GNOME by using PolicyKit mechanisms
> anyways.

I am a great fan of the idea of removing from user space applications
the ability to gain privileges during exec.  There are so many fewer
cases you have to audit for, and it requires less kernel code to support
overall.  Although I admit the direction you have suggested at the
beginning of this thread has its appeal.

Still I find in the kernel it generally is easier to solve the general
case.  It makes everyone happy and it removes the need to ask people to
rewrite all of their in house applications.

Eric


* Re: chroot(2) and bind mounts as non-root
  2011-12-15 18:55     ` Andrew G. Morgan
@ 2011-12-16 15:44       ` Colin Walters
  2011-12-18  1:22         ` Andrew G. Morgan
  0 siblings, 1 reply; 31+ messages in thread
From: Colin Walters @ 2011-12-16 15:44 UTC
  To: Andrew G. Morgan; +Cc: Alan Cox, H. Peter Anvin, LKML, serue, dhowells, kzak

[-- Attachment #1: Type: text/plain, Size: 2650 bytes --]

On Thu, 2011-12-15 at 10:55 -0800, Andrew G. Morgan wrote:
> I'm genuinely confused whether all these concerns are valid with file
> capabilities.
> 
> Consider (let's say luser is some user that I want to be active inside
> the chroot, but I don't want to allow regular login to my system):

Then we already have different deployment scenarios.  You seem to be
imagining a system where some user has an environment preconfigured by a
system administrator.  My constraint (read my previous posts) is that
the functionality must be available "out of the box" on a mainstream
"distro" such as RHEL or Debian to any uid.  I don't even want to
require addition to some magical group (that in reality is often a root
backdoor anyways).

> root> setcap cap_sys_chroot=ep /tmp/launcher
> Is there a need for privileged binaries within /tmp/chroot? If not,
> how might they get there (without help from root, always presuming I
> can prevent luser from logging in outside of this chroot'd
> environment)?

First of all, as I mentioned in my original mail (and is still in the
Subject line), chroot(2) *almost* gets me what I want - except I need
the ability to at least mount /proc, and being able to do bind mounts is
necessary to use /dev. 
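The two mounts in question could look roughly like this. It is a hypothetical sketch: the function name and build-root layout are made up, and it assumes the caller holds CAP_SYS_ADMIN and has already unshared a private mount namespace (as with the MS_NOSUID step described earlier in the thread). mount(2) follows symlinks in the target path, so a real setuid helper would also have to stop the user from swapping root/dev for a symlink. The sketch does not handle that.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical helper: inside `root`, bind-mount the host's /dev and mount a
 * fresh procfs, then chroot into it. Returns 0 or -errno. Assumes root/dev
 * and root/proc exist and that we are in a private mount namespace, so any
 * mounts left behind on failure vanish with the namespace. */
int setup_build_root(const char *root)
{
    char dev[4096], proc[4096];

    if (snprintf(dev, sizeof dev, "%s/dev", root) >= (int)sizeof dev ||
        snprintf(proc, sizeof proc, "%s/proc", root) >= (int)sizeof proc)
        return -ENAMETOOLONG;

    /* Recursive bind so submounts such as /dev/pts come along. */
    if (mount("/dev", dev, NULL, MS_BIND | MS_REC, NULL) < 0)
        return -errno;

    /* A new procfs instance, nosuid/nodev/noexec as /proc usually is. */
    if (mount("proc", proc, "proc", MS_NOSUID | MS_NODEV | MS_NOEXEC, NULL) < 0)
        return -errno;

    if (chroot(root) < 0 || chdir("/") < 0)
        return -errno;
    return 0;
}
```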

But let's just ignore the bind mounts for a second and pretend
cap_sys_chroot is enough.  Is your suggestion that a copy of
/usr/sbin/chroot that grants cap_sys_chroot via file caps would be a
secure thing to add to util-linux?  Or that we could just add it to
coreutils?

See the attached shell script for an attack that should work against
*any* setuid binary that uses glibc.  I wrote this without looking at
other exploits on the internet, just reading the glibc sources - mainly
for my own edification.

It turns out in this case glibc trusts the contents of /etc, and in
particular /etc/ld.so.preload.  So all I need to do is make a shared
library that just runs /bin/bash as a __attribute__ ((constructor)), and
when the glibc dynamic linker is loading /bin/su that I've hardlinked
into the chroot, game over:

$ cp /usr/sbin/chroot /usr/local/bin/fcaps-chroot
$ sudo setcap cap_sys_chroot=ep /usr/local/bin/fcaps-chroot
$ ./chroot-with-su.sh
$ fcaps-chroot mychroot
(now inside the chroot, but still uid=500)
$ echo /lib64/rootshell.so > /etc/ld.so.preload
$ su -
uid=500; euid=0; starting /bin/bash
# id        
uid=0 gid=500 groups=500

The glibc linker also doesn't check that e.g. /lib64/libc.so.6 is owned
by root - clearly I could just replace that with whatever I want.  But
this is less typing.  Note glibc isn't buggy here, it was designed in a
world where unprivileged users can't chroot.
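The attack relies on the fact that a function marked `__attribute__ ((constructor))` runs when its object is loaded, before the program's own main(). Below is a harmless, self-contained illustration (a GCC/Clang extension; the names are hypothetical). In a library listed in /etc/ld.so.preload, the same hook would run inside every dynamically linked program using that /etc, with that program's privileges, before any of its own code.

```c
#include <assert.h>
#include <stdio.h>

/* Set by the constructor; stays 0 if the hook did not run. */
int constructor_ran = 0;

/* Runs automatically when this object is loaded, before main(). */
static void __attribute__ ((constructor)) early_hook(void)
{
    constructor_ran = 1;
    fputs("constructor ran before main()\n", stderr);
}
```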


[-- Attachment #2: chroot-with-su.sh --]
[-- Type: application/x-shellscript, Size: 1120 bytes --]


* Re: chroot(2) and bind mounts as non-root
  2011-12-16 15:44       ` Colin Walters
@ 2011-12-18  1:22         ` Andrew G. Morgan
  2011-12-18 15:19           ` Colin Walters
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew G. Morgan @ 2011-12-18  1:22 UTC
  To: Colin Walters; +Cc: Alan Cox, H. Peter Anvin, LKML, serue, dhowells, kzak

On Fri, Dec 16, 2011 at 7:44 AM, Colin Walters <walters@verbum.org> wrote:
> On Thu, 2011-12-15 at 10:55 -0800, Andrew G. Morgan wrote:
>> I'm genuinely confused whether all these concerns are valid with file
>> capabilities.
>>
>> Consider (let's say luser is some user that I want to be active inside
>> the chroot, but I don't want to allow regular login to my system):
>
> Then we already have different deployment scenarios.  You seem to be
> imagining a system where some user has an environment preconfigured by a
> system administrator.  My constraint (read my previous posts) is that
> the functionality must be available "out of the box" on a mainstream
> "distro" such as RHEL or Debian to any uid.  I don't even want to
> require addition to some magical group (that in reality is often a root
> backdoor anyways).

I don't read any issues with this in your original post. What I read
there is that you want to run a build in a chroot environment. Are you
also implying that the user gets to build this chroot filesystem from
nothing - without any privileges - or are you assuming that the root
user provides some sort of template into which the user adds
build-relevant files?

If the former, then yes I think you are going to have a very hard
time. If the latter then I still don't see the core problem...

>
>> root> setcap cap_sys_chroot=ep /tmp/launcher
>> Is there a need for privileged binaries within /tmp/chroot? If not,
>> how might they get there (without help from root, always presuming I
>> can prevent luser from logging in outside of this chroot'd
>> environment)?
>
> First of all, as I mentioned in my original mail (and is still in the
> Subject line), chroot(2) *almost* gets me what I want - except I need
> the ability to at least mount /proc, and being able to do bind mounts is
> necessary to use /dev.
>
> But let's just ignore the bind mounts for a second and pretend
> cap_sys_chroot is enough.  Is your suggestion that a copy of
> /usr/sbin/chroot that grants cap_sys_chroot via file caps would be a
> secure thing to add to util-linux?  Or that we could just add it to
> coreutils?

Before reaching that finish line, my suggestions/questions are trying
to get to the bottom of why this is believed impossible.

>
> See the attached shell script for an attack that should work against
> *any* setuid binary that uses glibc.  I wrote this without looking at
> other exploits on the internet, just reading the glibc sources - mainly
> for my own edification.
>
> It turns out in this case glibc trusts the contents of /etc, and in
> particular /etc/ld.so.preload.  So all I need to do is make a shared
> library that just runs /bin/bash as a __attribute__ ((constructor)), and
> when the glibc dynamic linker is loading /bin/su that I've hardlinked
> into the chroot, game over:
>
> $ cp /usr/sbin/chroot /usr/local/bin/fcaps-chroot
> $ sudo setcap cap_sys_chroot=ep /usr/local/bin/fcaps-chroot
> $ ./chroot-with-su.sh
> $ fcaps-chroot mychroot
> (now inside the chroot, but still uid=500)

So, you are saying that if I can explain how to prevent this from working:

> $ echo /lib64/rootshell.so > /etc/ld.so.preload

And prevent this from being possible:

> $ su -
> uid=500; euid=0; starting /bin/bash

You'll have what you want?

Or are there some other constraints not mentioned?

Thanks

Andrew

> # id
> uid=0 gid=500 groups=500
>
> The glibc linker also doesn't check that e.g. /lib64/libc.so.6 is owned
> by root - clearly I could just replace that with whatever I want.  But
> this is less typing.  Note glibc isn't buggy here, it was designed in a
> world where unprivileged users can't chroot.
>


* Re: chroot(2) and bind mounts as non-root
  2011-12-18  1:22         ` Andrew G. Morgan
@ 2011-12-18 15:19           ` Colin Walters
  0 siblings, 0 replies; 31+ messages in thread
From: Colin Walters @ 2011-12-18 15:19 UTC
  To: Andrew G. Morgan; +Cc: Alan Cox, H. Peter Anvin, LKML, serue, dhowells, kzak

On Sat, 2011-12-17 at 17:22 -0800, Andrew G. Morgan wrote:

> I don't read any issues with this in your original post. What I read
> there is that you want to run a build in a chroot environment. Are you
> also implying that the user gets to build this chroot filesystem from
> nothing - without any privileges - 

Yes.  The filesystem is owned by the user.  

> If the former, then yes I think you are going to have a very hard
> time.

Well, it already works with the setuid program I attached earlier.
So...what are we trying to accomplish in this discussion? 

If you think there's a way to allow users to chroot *without* cutting
off setuid binaries, I am definitely interested in that.  However I'm
very, very skeptical.




* Re: chroot(2) and bind mounts as non-root
  2011-12-16  6:14         ` Eric W. Biederman
@ 2011-12-18 16:01           ` Colin Walters
  2011-12-19  0:55             ` Eric W. Biederman
  2011-12-21 18:15           ` Steve Grubb
  1 sibling, 1 reply; 31+ messages in thread
From: Colin Walters @ 2011-12-18 16:01 UTC
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, LKML, alan, morgan, luto, kzak, Steve Grubb

On Thu, 2011-12-15 at 22:14 -0800, Eric W. Biederman wrote:

> This means it is safe to enter a new user namespace without root
> privileges: once you are in, if you execute a suid app, it will be suid
> relative to your user namespace.  The careful changing of capable to
> ns_capable will allow other namespaces, and other things that today are
> root-only because of fears of mucking up the execution environment, to
> be enabled.
> 
> What is slightly up in the air is how we map user namespaces to
> filesystems.  The simplest solution looks to be to set up uid and gid
> mappings from each child user namespace to the initial system user
> namespace.  Then in a child user namespace setuid(2) will fail if
> you attempt to use an id that does not have a mapping.

But setting up a mapping is a privileged operation, right?  So then it
seems that practically speaking in an "out of the box" scenario on a
distro like RHEL or Debian, since there's no mapping configured, after a
process enters a new namespace it can't run setuid binaries?  

Also I don't see how user namespaces can replace "fakeroot" if this is
true.  The whole point of fakeroot is being able to do things like "make
install DESTDIR=/home/user/tmpdir && tar cz -C /home/user/tmpdir -f
foo.tar.gz ." to get a tarball with root-owned files, without actually
requiring the privileges to temporarily make real root owned files.  But
without a privileged mapping operation there's no way to map uid 0 in
the namespace to something else on the filesystem, right?

Basically it's not clear to me how you make user namespaces really
flexible without patching the filesystems to support persisting the
namespaces somehow.  Unix diehards will probably groan at this, but
honestly the Windows approach where "uids" (SIDs) are strings has its
appeal...that still requires patching filesystems (and in the end lots
of userspace) but it's much more flexible.

I can see how the user namespace work is useful for containers though.

> At the same time this means that once you enter a user namespace all
> of the capabilities you can acquire are relative to that user namespace.

So it seems like practically speaking if the goal is to be able to
securely run code that "feels like" uid 0 in a container (e.g. start
apache) you have to drop off most of the capabilities that let you take
over the "host".  There's a number of these in CAP_SYS_ADMIN.

> Still I find in the kernel it generally is easier to solve the general
> case.  It makes everyone happy and it removes the need to ask people to
> rewrite all of their in house applications.

Right, clearly we can't just drop support for setuid binaries from the
kernel, but we *do* have the source code to userspace...it's at least
worth thinking about what could be better if we can assume there aren't
setuid binaries.

I need to think more about the user namespace stuff - but I'm not
getting the impression so far it'll allow me to do what I want without
adding a new setuid binary (or a mount hardlink) to util-linux
basically.




* Re: chroot(2) and bind mounts as non-root
  2011-12-18 16:01           ` Colin Walters
@ 2011-12-19  0:55             ` Eric W. Biederman
  2011-12-19  4:06               ` Serge E. Hallyn
  2011-12-20 21:23               ` Colin Walters
  0 siblings, 2 replies; 31+ messages in thread
From: Eric W. Biederman @ 2011-12-19  0:55 UTC (permalink / raw)
  To: Colin Walters
  Cc: Serge E. Hallyn, LKML, alan, morgan, luto, kzak, Steve Grubb

Colin Walters <walters@verbum.org> writes:

> On Thu, 2011-12-15 at 22:14 -0800, Eric W. Biederman wrote:
>
>> Which means it is safe to enter a new user namespace without root
>> privileges as once you are in if you execute a suid app it will be suid
>> relative to your user namespace.  The careful changing of capable to
>> ns_capable will allow other namespaces and other things that today are
>> root only because of fears of mucking up the execution environment to be
>> enabled.
>> 
>> What is slightly up in the air is how do we map user namespaces to
>> filesystems.  The simplest solution looks to be to setup a uid and gid
>> mappings from each child user namespace to the initial system user
>> namespace.  Then in a child user namespace setuid(2) will fail if
>> you attempt to use an id that does not have a mapping.
>
> But setting up a mapping is a privileged operation, right?  So then it
> seems that practically speaking in an "out of the box" scenario on a
> distro like RHEL or Debian, since there's no mapping configured, after a
> process enters a new namespace it can't run setuid binaries?  

Sort of.  Allowing the use of more than your current uid in the mapping
is a privileged operation.  I have a prototype that does an upcall using
the request-key infrastructure for the validation.

I expect by the time this makes it to "out of the box" experiences on
enterprise distros, useradd and friends will be giving out 1000 or so uids
to new accounts.

> Also I don't see how user namespaces can replace "fakeroot" if this is
> true.  The whole point of fakeroot is being able to do things like "make
> install DESTDIR=/home/user/tmpdir && tar cz -C /home/user/tmpdir -f
> foo.tar.gz ." to get a tarball with root-owned files, without actually
> requiring the privileges to temporarily make real root owned files.  But
> without a privileged mapping operation there's no way to map uid 0 in
> the namespace to something else on the filesystem, right?

Inside the user namespace the creator's uid appears as uid 0.
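As a sketch of what "the creator's uid appears as uid 0" looks like from userspace, here is the interface that eventually landed for this (Linux 3.8 and later, so it did not exist when this thread was written): call unshare(CLONE_NEWUSER), then write a one-line map of the form `inside outside count`, e.g. `0 1000 1`, to /proc/self/uid_map and /proc/self/gid_map. The helper names are hypothetical, and the sketch assumes the running kernel permits unprivileged user namespaces.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Format one uid_map/gid_map line: "<id inside ns> <id outside ns> <count>\n".
 * Returns the length written, or -1 if the buffer is too small. */
static int format_id_map(char *buf, size_t len,
                         unsigned inside, unsigned outside, unsigned count)
{
    int n = snprintf(buf, len, "%u %u %u\n", inside, outside, count);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write a whole string to a /proc file in one write(); 0 on success, -1 on error. */
static int write_proc_file(const char *path, const char *s)
{
    size_t len = strlen(s);
    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ssize_t w = write(fd, s, len);
    int saved = errno;
    close(fd);
    errno = saved;
    return (w >= 0 && (size_t)w == len) ? 0 : -1;
}

/* Hypothetical helper: enter a new user namespace in which the caller's
 * uid and gid show up as 0. */
static int become_ns_root(void)
{
    uid_t uid = getuid();   /* read before unshare(); afterwards they show as the overflow id */
    gid_t gid = getgid();
    char line[64];

    if (unshare(CLONE_NEWUSER) < 0)
        return -1;
    /* Linux 3.19+ requires unprivileged writers to disable setgroups() before gid_map. */
    if (write_proc_file("/proc/self/setgroups", "deny") < 0 && errno != ENOENT)
        return -1;
    if (format_id_map(line, sizeof line, 0, (unsigned)uid, 1) < 0 ||
        write_proc_file("/proc/self/uid_map", line) < 0)
        return -1;
    if (format_id_map(line, sizeof line, 0, (unsigned)gid, 1) < 0 ||
        write_proc_file("/proc/self/gid_map", line) < 0)
        return -1;
    return 0;
}
```

After become_ns_root() succeeds, getuid() returns 0 inside the namespace, while files the process creates are still owned by the real uid on disk.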

> Basically it's not clear to me how you make user namespaces really
> flexible without patching the filesystems to support persisting the
> namespaces somehow.  Unix diehards will probably groan at this, but
> honestly the Windows approach where "uids" (SIDs) are strings has its
> appeal...that still requires patching filesystems (and in the end lots
> of userspace) but it's much more flexible.

The only thing that makes this better is a multi-part identifier stored
on disk where one part is a domain the identifier comes from.   That way
you can store overlapping identifiers and since your domains don't
conflict you are good.

At which point gaining access to a different persistent domain
identifier than your default one becomes a persistent identifier.

In practice I don't see any difference between that and gaining
access to a range of uids.  So I am going forward with a range of uids
as my default case, as that works with all Unix filesystems without
extra work.

I don't know how a Windows SID-based system deals with storing files
from another domain on the local filesystem.

Nothing prevents other filesystems from using algorithms other than
just storing the mapped uids for dealing with namespaces.  My goal
was to come up with a good default.
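Eric's argument that a (domain, id) pair buys nothing over a delegated uid range can be illustrated with a toy flattening: give each domain its own fixed block of host uids, and the pair converts to a single on-disk uid and back without loss. The block size, base uid and all names below are hypothetical, chosen only for the example.

```c
#include <stdint.h>

/* Hypothetical layout: every domain owns a fixed-size block of host uids. */
#define TOY_DOMAIN_SIZE 1000u    /* echoes the "1000 or so uids" per account guess */
#define TOY_BASE        100000u  /* first host uid handed out to domains */

struct toy_domain_id {
    uint32_t domain;  /* which identifier domain the id comes from */
    uint32_t local;   /* id within that domain */
};

/* Flatten (domain, local id) into one host uid; UINT32_MAX if it does not fit. */
static uint32_t toy_flatten(struct toy_domain_id id)
{
    if (id.local >= TOY_DOMAIN_SIZE)
        return UINT32_MAX;
    uint64_t uid = (uint64_t)TOY_BASE + (uint64_t)id.domain * TOY_DOMAIN_SIZE + id.local;
    return uid >= UINT32_MAX ? UINT32_MAX : (uint32_t)uid;
}

/* Recover the (domain, local id) pair from a host uid in the delegated area. */
static int toy_split(uint32_t uid, struct toy_domain_id *out)
{
    if (uid < TOY_BASE)
        return -1;
    out->domain = (uid - TOY_BASE) / TOY_DOMAIN_SIZE;
    out->local = (uid - TOY_BASE) % TOY_DOMAIN_SIZE;
    return 0;
}
```

Because the blocks never overlap, ordinary uid storage on any Unix filesystem carries the same information as the two-part identifier.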

> I can see how the user namespace work is useful for containers though.

Oh definitely there.

I actually was thinking of a similar distributed build and test
environment as one of my test cases when I validated my design
the last round.

>> At the same time this means that once you enter a user namespace all
>> of the capabilities you can acquire are relative to that user namespace.
>
> So it seems like practically speaking if the goal is to be able to
> securely run code that "feels like" uid 0 in a container (e.g. start
> apache) you have to drop off most of the capabilities that let you take
> over the "host".  There's a number of these in CAP_SYS_ADMIN.

You misunderstood.  And you can look at the code in the kernel right
now for how this is implemented.

CAP_SYS_ADMIN in a user namespace is not the global CAP_SYS_ADMIN.

So despite having the user namespace's idea of CAP_SYS_ADMIN you can't
do the nasty CAP_SYS_ADMIN things.

So at the sites where CAP_SYS_ADMIN is required but the operation is
actually safe for userspace once we remove the spoofing problem, you
will be allowed to use those calls.

>> Still I find in the kernel it generally is easier to solve the general
>> case.  It makes everyone happy and it removes the need to ask people to
>> rewrite all of their in house applications.
>
> Right, clearly we can't just drop support for setuid binaries from the
> kernel, but we *do* have the source code to userspace...it's at least
> worth thinking about what could be better if we can assume there aren't
> setuid binaries.

Having a case where you don't have to worry about suid is very
compelling, and if I were to design a new Unix-like OS suid would
not be implemented.  I think the Plan 9 guys got that right.

After going a couple of rounds in my head on how far we can go with
suid disabled, I have decided to go down the user namespace route,
especially since what is left is just cleaning up the code that is
in my tree and getting it merged.

> I need to think more about the user namespace stuff - but I'm not
> getting the impression so far it'll allow me to do what I want without
> adding a new setuid binary (or a mount hardlink) to util-linux
> basically.

I think the user namespace will do what you need. Certainly it appears
that everything in your example binary will be allowed by the time it is
done.  Still there is the old saying about a bird in the hand being
worth more than two birds in the bush.

Eric



* Re: chroot(2) and bind mounts as non-root
  2011-12-19  0:55             ` Eric W. Biederman
@ 2011-12-19  4:06               ` Serge E. Hallyn
  2011-12-19  9:22                 ` Eric W. Biederman
  2011-12-20 21:23               ` Colin Walters
  1 sibling, 1 reply; 31+ messages in thread
From: Serge E. Hallyn @ 2011-12-19  4:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Colin Walters, LKML, alan, morgan, luto, kzak, Steve Grubb

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Colin Walters <walters@verbum.org> writes:
> 
> > On Thu, 2011-12-15 at 22:14 -0800, Eric W. Biederman wrote:
> >
> >> Which means it is safe to enter a new user namespace without root
> >> privileges as once you are in if you execute a suid app it will be suid
> >> relative to your user namespace.  The careful changing of capable to
> >> ns_capable will allow other namespaces and other things that today are
> >> root only because of fears of mucking up the execution environment to be
> >> enabled.
> >> 
> >> What is slightly up in the air is how do we map user namespaces to
> >> filesystems.  The simplest solution looks to be to setup a uid and gid
> >> mappings from each child user namespace to the initial system user
> >> namespace.  Then in a child user namespace setuid(2) will fail if
> >> you attempt to use an id that does not have a mapping.
> >
> > But setting up a mapping is a privileged operation, right?  So then it
> > seems that practically speaking in an "out of the box" scenario on a
> > distro like RHEL or Debian, since there's no mapping configured, after a
> > process enters a new namespace it can't run setuid binaries?  
> 
> Sort of.  Allowing the use of more than your current uid in the mapping
> is a privileged operation.  I have a prototype that does an upcall using
> the request-key infrastructure for the validation.

If I understand you both right, I think what Eric said here is not relevant
to what Colin cares about.

Colin, for the case of "fakeroot debian/rules binary" or
build+create-tarball inside of a user namespace, all that will matter
to you is that yes, inside the user namespace which you created without
privilege you will be able to create files which are owned by (the user
namespace's) root, and so you'll be able to get the tarball or .deb with
root owned files.

The mapping Eric is talking about here is new even to me, but I think it
is an implementation detail referring to a proposal where each uid in the
container maps to a real uid on the host.  The only thing about that mapping
that matters is that none of the host uids conflict with existing host
uids (or uids mapped for other containers).  Now if you want to do cool
things like map uid 501 on the host to 1001 in the container as well as
502 on the host to 1010 in the container, that will be supported - and I
think that's what Eric is referring to.

But for the sake of fire-off-a-build, you can ignore that and use random
uids on the host side of the mapping.
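Serge's example (host uid 501 seen as 1001 in the container, host 502 as 1010) can be written in the text format that was later adopted for /proc/&lt;pid&gt;/uid_map, one `inside outside count` line per range: `1001 501 1` and `1010 502 1`. The parser and lookup below are a hypothetical userspace sketch, not kernel code.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_EXTENTS 5   /* assumption: a small fixed table, like the one Eric describes */

struct extent {
    uint32_t inside;    /* first id as seen inside the container */
    uint32_t outside;   /* first id on the host */
    uint32_t count;     /* number of consecutive ids */
};

/* Parse "inside outside count" lines; returns the number of extents, or -1 on bad input. */
static int parse_id_map(const char *text, struct extent *ext, int max)
{
    unsigned a, b, c;
    int used, n = 0;

    while (sscanf(text, " %u %u %u%n", &a, &b, &c, &used) == 3) {
        if (n == max || c == 0)
            return -1;
        ext[n].inside = a;
        ext[n].outside = b;
        ext[n].count = c;
        n++;
        text += used;
    }
    while (isspace((unsigned char)*text))
        text++;
    return *text == '\0' ? n : -1;   /* anything left over is malformed */
}

/* Translate a host uid to the uid seen inside the container, or -1 if unmapped. */
static int64_t map_host_to_ns(const struct extent *ext, int n, uint32_t host)
{
    for (int i = 0; i < n; i++)
        if (host >= ext[i].outside && host - ext[i].outside < ext[i].count)
            return (int64_t)ext[i].inside + (host - ext[i].outside);
    return -1;
}
```

The reverse lookup (container to host) is the same loop with the roles of the two columns swapped.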

> I expect by the time this makes it to "out of the box" experiences on
> enterprise distros, useradd and friends will be giving out 1000 or so uids
> to new accounts.
> 
> > Also I don't see how user namespaces can replace "fakeroot" if this is
> > true.  The whole point of fakeroot is being able to do things like "make
> > install DESTDIR=/home/user/tmpdir && tar cz -C /home/user/tmpdir -f
> > foo.tar.gz ." to get a tarball with root-owned files, without actually
> > requiring the privileges to temporarily make real root owned files.  But
> > without a privileged mapping operation there's no way to map uid 0 in
> > the namespace to something else on the filesystem, right?
> 
> Inside the user namespace the creator's uid appears as uid 0.

That's the most important thing for your (Colin's) use case, and it
should give you what you need.

...

> > So it seems like practically speaking if the goal is to be able to
> > securely run code that "feels like" uid 0 in a container (e.g. start
> > apache) you have to drop off most of the capabilities that let you take
> > over the "host".  There's a number of these in CAP_SYS_ADMIN.
> 
> You misunderstood.  And you can look at the code in the kernel right
> now for how this is implemented.
> 
> CAP_SYS_ADMIN in a user namespace is not the global CAP_SYS_ADMIN.

In particular, compare

capable(CAP_SYS_ADMIN)

to

ns_capable(ns, CAP_SYS_ADMIN).
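To make the distinction concrete, here is a toy model (not the kernel's code or data structures): ns_capable() honours a capability only when the target namespace is the caller's own user namespace or a descendant of it, and capable() is the same check against the initial namespace. The real check also treats the owner of a child namespace specially, which this simplified sketch omits; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace: only the parent link matters here. */
struct toy_user_ns {
    const struct toy_user_ns *parent;   /* NULL for the initial namespace */
};

static const struct toy_user_ns toy_init_user_ns = { NULL };

struct toy_cred {
    const struct toy_user_ns *user_ns;  /* namespace the credentials belong to */
    bool has_cap;                       /* capability present in the effective set */
};

/* ns_capable()-like: walk up from the target; the capability counts only if
 * we reach the credential's own namespace on the way to the root. */
static bool toy_ns_capable(const struct toy_cred *cred, const struct toy_user_ns *target)
{
    for (const struct toy_user_ns *ns = target; ns != NULL; ns = ns->parent)
        if (ns == cred->user_ns)
            return cred->has_cap;
    return false;
}

/* capable()-like: the same check against the initial namespace, i.e. global privilege. */
static bool toy_capable(const struct toy_cred *cred)
{
    return toy_ns_capable(cred, &toy_init_user_ns);
}
```

So "root" inside a child namespace passes toy_ns_capable() for objects belonging to that namespace but fails toy_capable(), while real root passes both.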

...

> I think the user namespace will do what you need. Certainly it appears

As do I.

> that everything in your example binary will be allowed by the time it is
> done.  Still there is the old saying about a bird in the hand being
> worth more than two birds in the bush.
> 
> Eric

Right, this isn't there yet, after all, and your (Colin's) program is :)

-serge


* Re: chroot(2) and bind mounts as non-root
  2011-12-19  4:06               ` Serge E. Hallyn
@ 2011-12-19  9:22                 ` Eric W. Biederman
  2011-12-20 16:49                   ` Colin Walters
  0 siblings, 1 reply; 31+ messages in thread
From: Eric W. Biederman @ 2011-12-19  9:22 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Colin Walters, LKML, alan, morgan, luto, kzak, Steve Grubb

"Serge E. Hallyn" <serge@hallyn.com> writes:

> If I understand you both right, I think what Eric said here is not relevant
> to what Colin cares about.

As long as Colin only cares about being able to be the root user I
agree.  If Colin needs several uids during his build that is trickier.
But it sounds like Colin just needs to have a chroot build environment and
for that a single user sounds good enough.

Being able to use the other namespaces to get a good isolation from the
host environment is also nice and especially the pid namespace can
guarantee that processes won't escape his build environment.

> The mapping Eric is talking about here is new even to me, but I think it
> is an implementation detail referring to a proposal where each uid in the
> container maps to a real uid on the host.  The only thing about that mapping
> that matters is that none of the host uids conflict with existing host
> uids (or uids mapped for other containers).  Now if you want to do cool
> things like map uid 501 on the host to 1001 in the container as well as
> 502 on the host to 1010 in the container, that will be supported - and I
> think that's what Eric is referring to.
>
> But for the sake of fire-off-a-build, you can ignore that and use random
> uids on the host side of the mapping.

It is one of those worse is better implementation details but we can
discuss that more when I start posting patches in January. 

I am not an immediate fan of writing random uids to disk.  Uids being
persistent can be interesting to deal with if those uids are ever
reused.

Right now my implementation supports just 5 non-overlapping uid mapping
ranges, which is enough to cover a lot of uids but still fits within one
cacheline.  And I think to keep stat reasonably fast I want it to fit in
a cacheline, at least for now.  Oy.  Hopefully it isn't too hard to find
some benchmarks to prove this out.  I expect the torture case is to
time ls -l in a huge directory with a lot of files owned by a lot of
different users.
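The cacheline arithmetic works out if each range is stored as three 32-bit fields plus one 32-bit count of ranges: 4 + 5 * 12 = 64 bytes. The sketch below assumes that layout (it resembles what was later merged, but the names here are illustrative) and shows the linear lookup stat would pay for; an id with no mapping translates to (uint32_t)-1, which is the case in which setuid(2) to that id would be refused.

```c
#include <assert.h>
#include <stdint.h>

#define ID_MAP_MAX_EXTENTS 5

struct id_extent {
    uint32_t first;        /* first id inside the namespace */
    uint32_t lower_first;  /* matching id in the parent (on-disk) namespace */
    uint32_t count;        /* number of ids in the range */
};

struct id_map {
    uint32_t nr_extents;
    struct id_extent extent[ID_MAP_MAX_EXTENTS];
};

/* 4 + 5 * 12 = 64 bytes: exactly one cacheline on CPUs with 64-byte lines. */
static_assert(sizeof(struct id_map) == 64, "id_map should fill one 64-byte cacheline");

/* Namespace id -> on-disk id by linear scan; (uint32_t)-1 means "no mapping". */
static uint32_t map_id_down(const struct id_map *map, uint32_t id)
{
    for (uint32_t i = 0; i < map->nr_extents && i < ID_MAP_MAX_EXTENTS; i++) {
        const struct id_extent *e = &map->extent[i];
        if (id >= e->first && id - e->first < e->count)
            return e->lower_first + (id - e->first);
    }
    return (uint32_t)-1;
}
```

With at most five ranges the scan touches a single cacheline, which is what keeps the "ls -l in a huge directory" case cheap.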

Eric


* Re: chroot(2) and bind mounts as non-root
  2011-12-19  9:22                 ` Eric W. Biederman
@ 2011-12-20 16:49                   ` Colin Walters
  0 siblings, 0 replies; 31+ messages in thread
From: Colin Walters @ 2011-12-20 16:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, LKML, alan, morgan, luto, kzak, Steve Grubb

On Mon, 2011-12-19 at 01:22 -0800, Eric W. Biederman wrote:
> As long as Colin only cares about being able to be the root user I
> agree.  

I don't actually need "pretend to be uid 0" functionality myself, but
the "fakeroot" case was cited on the user namespace page, and so I
wanted to understand how it works.

> If Colin needs several uids during his build that is trickier.
> But it sounds like Colin just needs to have a chroot build environment and
> for that a single user sounds good enough.

Right, just need chroot (and bind mounts).

> Being able to use the other namespaces to get a good isolation from the
> host environment is also nice and especially the pid namespace can
> guarantee that processes won't escape his build environment.

Yeah, CLONE_NEWPID is great.

> It is one of those worse is better implementation details but we can
> discuss that more when I start posting patches in January. 
> 
> I am not an immediate fan of writing random uids to disk.  Uids being
> persistent can be interesting to deal with if those uids are ever
> reused.

Right...

> Right now my implementation supports just 5 non-overlapping uid mapping
> ranges, which is enough to cover a lot of uids but still fits within one
> cacheline.  And I think to keep stat reasonably fast I want it to fit in
> a cacheline, at least for now.  Oy.  Hopefully it isn't too hard to find
> some benchmarks to prove this out.  I expect the torture case is to
> time ls -l in a huge directory with a lot of files owned by a lot of
> different users.

Where's the current user namespace tree?  The link on
https://wiki.ubuntu.com/UserNamespace is broken.

Is it:
http://kernel.ubuntu.com/git?p=serge/linux-2.6.git;a=summary

?



* Re: chroot(2) and bind mounts as non-root
  2011-12-19  0:55             ` Eric W. Biederman
  2011-12-19  4:06               ` Serge E. Hallyn
@ 2011-12-20 21:23               ` Colin Walters
  1 sibling, 0 replies; 31+ messages in thread
From: Colin Walters @ 2011-12-20 21:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, LKML, alan, morgan, luto, kzak, Steve Grubb

On Sun, 2011-12-18 at 16:55 -0800, Eric W. Biederman wrote:

> I expect by the time this makes it to "out of the box" experiences on
> enterprise distros, useradd and friends will be giving out 1000 or so uids
> to new accounts.

Hmm...how would that work?  Would it be something that would happen at
PAM time, like a module that looks up some file in /etc and says "OK
this uid gets this range" and uploads that to the kernel? 

This whole idea of a normal uid getting *other* slave uids is cool but
scary at the same time.  So much infrastructure in what I think of as
"General Purpose Linux"[1] is built up around a uid - resource
restrictions and authentication for example.

I guess as long as we're sure that in all cases where a "uid" crosses a
user namespace boundary (say, socket credentials) it appears as the
right thing, it may be secure.

> I think the user namespace will do what you need. Certainly it appears
> that everything in your example binary will be allowed by the time it is
> done.

That's cool, I will keep an eye on what you guys are doing.  Looks like
the containers list on linuxfoundation.org is the right one to follow?

[1] The code that's shared between RHEL and Debian roughly between the
kernel and GNOME, discarding the pointless "packaging" differences




* Re: chroot(2) and bind mounts as non-root
  2011-12-16  6:14         ` Eric W. Biederman
  2011-12-18 16:01           ` Colin Walters
@ 2011-12-21 18:15           ` Steve Grubb
  2012-01-03 23:13             ` Eric W. Biederman
  1 sibling, 1 reply; 31+ messages in thread
From: Steve Grubb @ 2011-12-21 18:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Colin Walters, Serge E. Hallyn, LKML, alan, morgan, luto, kzak

On Friday, December 16, 2011 01:14:36 AM Eric W. Biederman wrote:
> Colin Walters <walters@verbum.org> writes:
> > On Mon, 2011-12-12 at 23:11 +0000, Serge E. Hallyn wrote:
> >> Look at the cap_get_bound.3 manpage, and look for CAP_IS_SUPPORTED.
> >> If you start at CAP_LAST_CAP and keep going up/down depending on whether
> >> it was supported or not it shouldn't take too long to find the last
> >> valid value.  Not ideal, but should be reliable.
> > 
> > Blah =/  I think I'll just rely on the MS_NOSUID bind mount for now.
> > 
> >> I haven't taken a critical look at the mount code but other than that
> >> it seems reasonable and useful to me!  Thanks.
> > 
> > Can you link me to any discussion of how the user namespace stuff you're
> > working on would enable any of this (chroot, bind mounts) to be
> > available to "unprivileged" users?  Is it that once a non-uid 0 process
> > enters a new namespace, when executing a setuid 0 binary from the
> > filesystem, because that binary is from a different user namespace, the
> > setuid bits don't apply?
> > 
> > What does it even mean for a file to be "owned" by a user namespace -
> > unless you're talking about patching e.g. ext4 to persist namespaces
> > somehow.
> > 
> > Where I'd ultimately like to get is having this utility in util-linux,
> > but before I propose that I'd like to have a good idea what the
> > possibilities are with user namespaces.
> 
> The essentials is that all of the security credentials a process sees
> (uids, gids, capabilities, keys) all belong to the user namespace.  This
> allows process migration while still being able to use the same global
> identifiers you were using before.  At the same time this means that
> once you enter a user namespace all of the capabilities you can acquire
> are relative to that user namespace.
> 
> You can look at the details of ns_capable (merged) to see how those
> capabilities will work.
> 
> It is envisioned that the other namespaces will start recording the user
> namespace that created them so we can evaluate ns_capable relative to
> the creator of those namespaces.  (It is trivial work we are just
> holding off so we don't introduce a security hole while we get the
> other bits implemented).
> 
> Which means it is safe to enter a new user namespace without root
> privileges as once you are in if you execute a suid app it will be suid
> relative to your user namespace.  The careful changing of capable to
> ns_capable will allow other namespaces and other things that today are
> root only because of fears of mucking up the execution environment to be
> enabled.
> 
> What is slightly up in the air is how do we map user namespaces to
> filesystems.  The simplest solution looks to be to setup a uid and gid
> mappings from each child user namespace to the initial system user
> namespace.  Then in a child user namespace setuid(2) will fail if
> you attempt to use an id that does not have a mapping.
> 
> Similarly in fs/exec.c:prepare_binprm() at the point where we test
> MNT_NOSUID we will add an additional test to see if the uid and gid
> of the executable will map to the target user namespace.  If the ids
> don't map we skip the suid step entirely.
> 
> Since except at the edges of userspace we use uids and gids in the
> initial user namespace, the implications for confusing other security
> mechanisms is minimized.

Is anyone thinking about how this affects the audit system?

-Steve
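The fs/exec.c:prepare_binprm() rule quoted above (skip the set-id step when the mount is MNT_NOSUID, or when the executable's owner or group has no mapping into the exec'ing task's namespace) can be modelled in a few lines. This is a toy with a single-range map and hypothetical names, not the kernel implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_MAPPING ((uint32_t)-1)

/* Hypothetical single-range map: on-disk ids [lower_first, lower_first + count)
 * appear inside the namespace starting at first. */
struct toy_map {
    uint32_t first, lower_first, count;
};

/* On-disk id -> id inside the namespace, or NO_MAPPING. */
static uint32_t toy_map_up(const struct toy_map *m, uint32_t disk_id)
{
    if (disk_id >= m->lower_first && disk_id - m->lower_first < m->count)
        return m->first + (disk_id - m->lower_first);
    return NO_MAPPING;
}

/* Should exec honour the file's set-id bits?  Not on a nosuid mount, and not
 * when the owner or group cannot be expressed in the task's namespace. */
static bool toy_apply_setid(bool mnt_nosuid,
                            const struct toy_map *uid_map, const struct toy_map *gid_map,
                            uint32_t file_uid, uint32_t file_gid)
{
    if (mnt_nosuid)
        return false;
    return toy_map_up(uid_map, file_uid) != NO_MAPPING &&
           toy_map_up(gid_map, file_gid) != NO_MAPPING;
}
```

With a map covering only the builder's own uid, a root-owned setuid binary from the host (owner 0) has no mapping, so it runs without gaining anything.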

> The downside of requiring a mapping is that there is the tiniest bit of
> user policy that will have to be added to the distributions to take full
> advantage of the user namespace.  If you don't have that policy setup
> your real uid will not change but you will appear to userspace as uid
> 0, which should be sufficient to compile, chroot, mount and just about
> everything else interesting without privileges.
> 
> > The more I think about this though, the more I am a big fan of what the
> > OpenWall people are doing - if it gets me chroot as a user, I am totally
> > on board with just removing all setuid binaries.  We're already fairly
> > far along on doing that in GNOME by using PolicyKit mechanisms
> > anyways.
> 
> I am a great fan of the idea of removing from user space applications
> the ability to gain privileges during exec.  There are so many fewer
> cases you have to audit for, and it requires less kernel code to support
> overall.  Although I admit the direction you have suggested at the
> beginning of this thread has its appeal.
> 
> Still I find in the kernel it generally is easier to solve the general
> case.  It makes everyone happy and it removes the need to ask people to
> rewrite all of their in house applications.
> 
> Eric
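Serge's probe for the running kernel's last capability (quoted near the top of this message) can be done without libcap through prctl(PR_CAPBSET_READ), which fails with EINVAL for capability numbers the kernel does not know; libcap's CAP_IS_SUPPORTED is built on the same call. The sketch follows his "start at CAP_LAST_CAP and step up or down" recipe; the function names are hypothetical. Kernels from 3.2 onward also export /proc/sys/kernel/cap_last_cap, which gives the answer directly.

```c
#include <linux/capability.h>  /* CAP_LAST_CAP as known to the headers we build with */
#include <sys/prctl.h>

/* A capability number is valid on the running kernel iff PR_CAPBSET_READ
 * accepts it (it returns 0 or 1; unknown numbers fail with EINVAL). */
static int cap_supported(int cap)
{
    return cap >= 0 &&
           prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL) >= 0;
}

/* Start at the compile-time CAP_LAST_CAP and step up or down until we hit
 * the last capability the running kernel knows about. */
static int probe_last_cap(void)
{
    int cap = CAP_LAST_CAP;

    if (cap_supported(cap)) {
        while (cap_supported(cap + 1))          /* kernel newer than our headers */
            cap++;
    } else {
        while (cap > 0 && !cap_supported(cap))  /* kernel older than our headers */
            cap--;
    }
    return cap;
}
```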


* Re: chroot(2) and bind mounts as non-root
  2011-12-21 18:15           ` Steve Grubb
@ 2012-01-03 23:13             ` Eric W. Biederman
  0 siblings, 0 replies; 31+ messages in thread
From: Eric W. Biederman @ 2012-01-03 23:13 UTC (permalink / raw)
  To: Steve Grubb
  Cc: Colin Walters, Serge E. Hallyn, LKML, alan, morgan, luto, kzak

Steve Grubb <sgrubb@redhat.com> writes:

> On Friday, December 16, 2011 01:14:36 AM Eric W. Biederman wrote:
>> Since except at the edges of userspace we use uids and gids in the
>> initial user namespace, the implications for confusing other security
>> mechanisms is minimized.
>
> Is anyone thinking about how this affects the audit system?

A little.

Today the audit system can only be used from the initial namespaces and
the pids that we use are from the initial pid namespace.

It is my expectation that we can continue the same pattern for uids as
well.

Eric


end of thread, other threads:[~2012-01-03 23:11 UTC | newest]

Thread overview: 31+ messages
2011-12-07 17:54 chroot(2) and bind mounts as non-root Colin Walters
2011-12-07 19:36 ` John Stoffel
2011-12-08 16:10   ` Colin Walters
2011-12-08 18:14     ` John Stoffel
2011-12-08 18:26       ` Colin Walters
2011-12-09  0:49         ` Sven-Haegar Koch
2011-12-09 14:55         ` John Stoffel
2011-12-09 15:06           ` Colin Walters
2011-12-08 17:04   ` Arnd Bergmann
2011-12-08 17:15     ` Colin Walters
2011-12-07 19:40 ` Andy Lutomirski
2011-12-08 16:58   ` Colin Walters
2011-12-07 20:34 ` H. Peter Anvin
2011-12-07 20:54   ` Alan Cox
2011-12-15 18:55     ` Andrew G. Morgan
2011-12-16 15:44       ` Colin Walters
2011-12-18  1:22         ` Andrew G. Morgan
2011-12-18 15:19           ` Colin Walters
2011-12-10  5:29 ` Serge E. Hallyn
2011-12-12 16:41   ` Colin Walters
2011-12-12 23:11     ` Serge E. Hallyn
2011-12-15 20:56       ` Colin Walters
2011-12-16  6:14         ` Eric W. Biederman
2011-12-18 16:01           ` Colin Walters
2011-12-19  0:55             ` Eric W. Biederman
2011-12-19  4:06               ` Serge E. Hallyn
2011-12-19  9:22                 ` Eric W. Biederman
2011-12-20 16:49                   ` Colin Walters
2011-12-20 21:23               ` Colin Walters
2011-12-21 18:15           ` Steve Grubb
2012-01-03 23:13             ` Eric W. Biederman
