All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jann Horn <jann@thejh.net>
To: "Serge E. Hallyn" <serge@hallyn.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	"Serge E. Hallyn" <serge.hallyn@ubuntu.com>,
	lkml <linux-kernel@vger.kernel.org>,
	Andrew Morgan <morgan@kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	lxc-devel@lists.linuxcontainers.org,
	Richard Weinberger <richard@nod.at>,
	LSM <linux-security-module@vger.kernel.org>,
	linux-api@vger.kernel.org, keescook@chromium.org
Subject: Re: [PATCH RFC] Introduce new security.nscapability xattr
Date: Wed, 20 Jan 2016 13:48:16 +0100	[thread overview]
Message-ID: <20160120124816.GB32379@pc.thejh.net> (raw)
In-Reply-To: <20151204202116.GA4809@mail.hallyn.com>

[-- Attachment #1: Type: text/plain, Size: 7646 bytes --]

On Fri, Dec 04, 2015 at 02:21:16PM -0600, Serge E. Hallyn wrote:
> Quoting Eric W. Biederman (ebiederm@xmission.com):
> > "Serge E. Hallyn" <serge.hallyn@ubuntu.com> writes:
> > 
> > > A common way for daemons to run with minimal privilege is to start as root,
> > > perhaps setuid-root, choose a desired capability set, set PR_SET_KEEPCAPS,
> > > then change uid to non-root.  A simpler way to achieve this is to set file
> > > capabilities on a not-setuid-root binary.  However, when installing a package
> > > inside a (user-namespaced) container, packages cannot be installed with file
> > > capabilities.  For this reason, containers must install ping setuid-root.
> > 
> > Don't ping sockets avoid that specific problem?
> > 
> > I expect the general case still holds.
> > 
> > > To achieve this, we would need for containers to be able to request file
> > > capabilities be added to a file without causing these to be honored in the
> > > initial user namespace.
> > >
> > > To this end, the patch below introduces a new capability xattr format.  The
> > > main enhancement over the existing security.capability xattr is that we
> > > tag capability sets with a uid - the uid of the root user in the namespace
> > > where the capabilities are set.  The capabilities will be ignored in any
> > > other namespace.  The special case of uid == -1 (which must only ever be
> > > able to be set by kuid 0) means use the capabilities in all
> > > namespaces.
> 
> really since security.capability xattrs are currently honored in all
> namespaces this isn't really necessary.  Until and unless Seth's set
> changes that.
> 
> > 
> > A quick comment on this.
> > 
> > We currently allow capabilities that have been gained to be valid in all
> > descendent user namespaces.
> > 
> > Applying this principle to the on-disk capabilities would make it so
> > that uid 0 would mean capabilities in all namespaces.
> > 
> > It might be worth it to introduce a fixed sized array with a length
> > parameter of perhaps 32 entries which is a path of root uids as seen by
> > the initial user namespace.  That way the entire construction of the
> > user namespace could be verified.  AKA verify the current user namespace
> > and the parent and the parents parent.  Up to the user namespace the
> > current filesystem is mounted in. We would look at how much space
> > allows an xattr to be stored without causing filesystems a challenge
> > to properly size such an array.
> > 
> > Given that uids are fundamentally flat that might not be particularly
> > useful.  If we add an alternative way of identifying user namespaces
> > say a privileged operation that set a uuid, then the complete path would
> > be more interesting.
> > 
> > > An alternative format would use a pair of uids to indicate a range of rootids.
> > > This would allow root in a user namespace with uids 100000-165536 mapped to
> > > set the xattr once on a file, then launch nested containers wherein the file
> > > could be used with privilege.  That's not what this patch does, but would be
> > > a trivial change if people think it would be worthwhile.
> > >
> > > This patch does not actually address the real problem, which is setting the
> > > xattrs from inside containers.  For that, I think the best solution is to
> > > add a pair of new system calls, setfcap and getfcap. Userspace would for
> > > instance call fsetfcap(fd, cap_user_header_t, cap_user_data_t), to which
> > > the kernel would, if not in init_user_ns, react by writing an appropriate
> > > security.nscapability xattr.
> > 
> > That feels hard to maintain, but you may be correct that we have a small
> > enough userspace that it would not be a problem.
> > 
> > Eric
> > 
> > 
> > > The libcap2 library's cap_set_file/cap_get_file could be switched over
> > > transparently to use this to hide its use from all callers.
> > >
> > > Comments appreciated.
> > >
> > > Note - In this patch, file capabilities only work for containers which have
> > > a root uid defined.  We may want to allow -1 uids to work in all
> > > namespaces.  There certainly would be uses for this, but I'm a bit unsettled
> > > about the implications of allowing a program privilege in a container where
> > > there is no uid with privilege.  This needs more thought.
> 
> So for actually enabling (user-namespaced) containers to use these, there
> are a few possibilities that come to mine.
> 
> 1. A new setfcap (/getfcap) syscall.  Uses mapped uid 0 from
> current_user_ns() to write a value in the security.nscapability xattr.
> Userspace doesn't need to worry at all about namespace issues.

Your patch has an "ncaps" field and supports giving (possibly different)
privileges to different namespace root users.
I think that with a setfcap() syscall as you describe, it would be hard
to add more than one security.nscapability entry to a file without
poking holes into stuff:

If setfcap() requires inode write access, every namespace root for whom
a security.nscapability entry can be added can overwrite the file, so
namespace roots might be able to execute code as each other.

If setfcap() doesn't require inode write access, that isn't a big
security issue, but it allows unprivileged users to waste disk space
in a writable-mounted filesystem on which they have no write access
anywhere. I'm not sure whether that is a sufficiently big problem for
anyone to care.


Another issue with this would be that it makes restoring sub-namespace
capabilities from a backup somewhat ugly, at least if you're not in the
init ns: You'd have to parse the capabilities attribute, then, for every
attribute you want to restore whose rootid doesn't refer to your
namespace, clone(CLONE_NEWUSER), map the target uid to 0 from the parent
process, seteuid(0), do the setfcap() and exit.

It would require less synchronization with my ptrace hardening patch
that checks whether uids are mapped (if that lands, I don't think it has
yet?) because that would allow you to first setuid, then clone(), map
the uid to 0 inside the namespace and setfcap(), but it'd still require
special-case code in backup software.


> 2. Just expect userspace to write a xattr;  kernel checks that no values
> are changed for any other namespaces.  This could be a lot of parsing and
> verifying in the kernel.

This option would allow the first problem I described with option 1 (if
it is a problem?) to be worked around by letting a helper in the outer
namespace add capabilities for lower namespaces - but it wouldn't be
pretty.

I think it would also allow tar, which already supports xattrs, to
deal with namespace capabilities without needing any additional code.
I think that would be nice to have.


> 3. Switch the xattr scheme - instead of one security.nscapability xattr
> with multiple entries, use security.nscapability.$(rootid).  Now the
> kernel only needs to verify that the $rootid is valid for the writing
> task, and we don't need a new syscall.  OTOH userspace needs to know
> what it's doing.  Of course we can still hide that behind libcap2's helpers.

This doesn't sound so good - how would the namespace know its rootid?
If it is just one level below init_ns, it can grab it from
/proc/self/uid_map, but for deeper levels that won't work.


> Any opinions on which way seems best?  1 does seem cleanest (and supports
> use of seccomp if we want to forbit its use by some containers), but
> involves a new pair of syscalls.  2 seems to me to be right out, but
> others might disagree...

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  parent reply	other threads:[~2016-01-20 12:48 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-30 22:43 [PATCH RFC] Introduce new security.nscapability xattr Serge E. Hallyn
2015-11-30 23:08 ` Eric W. Biederman
2015-11-30 23:08   ` Eric W. Biederman
2015-12-01  3:51   ` Serge E. Hallyn
2015-12-01  3:51     ` Serge E. Hallyn
2015-12-04 20:21   ` Serge E. Hallyn
2016-01-19  7:09     ` Serge E. Hallyn
2016-01-20 12:48     ` Jann Horn [this message]
2016-01-27 16:08       ` Serge E. Hallyn
2016-01-27 17:22         ` Jann Horn
2016-01-28  0:36           ` Andy Lutomirski
2016-01-28  0:36             ` Andy Lutomirski
2016-01-29  7:31             ` Serge E. Hallyn
2016-01-29  7:31               ` Serge E. Hallyn
2016-02-29 21:38               ` Serge E. Hallyn
2016-02-29 21:38                 ` Serge E. Hallyn
2016-03-02  0:00                 ` Serge E. Hallyn
2016-03-02  0:00                   ` Serge E. Hallyn
2016-01-20 12:14 ` Jann Horn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160120124816.GB32379@pc.thejh.net \
    --to=jann@thejh.net \
    --cc=ebiederm@xmission.com \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=lxc-devel@lists.linuxcontainers.org \
    --cc=morgan@kernel.org \
    --cc=richard@nod.at \
    --cc=serge.hallyn@ubuntu.com \
    --cc=serge@hallyn.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.