From mboxrd@z Thu Jan  1 00:00:00 1970
From: Serge Hallyn <serge.hallyn@canonical.com>
Subject: Re: [REVIEW][PATCH 0/43] Completing the user namespace
Date: Tue, 10 Apr 2012 23:33:25 -0500
Message-ID: <20120411043325.GC7153@sergelap>
References: <m11unyn70b.fsf@fess.ebiederm.org>
 <4F84838B.8000408@mit.edu>
 <m14nsrxn6v.fsf@fess.ebiederm.org>
 <CAObL_7F2oHtOoDkvNM1io=dovKENNTxS4EDPkr4ns9AEdFqwaQ@mail.gmail.com>
 <m14nsrtady.fsf@fess.ebiederm.org>
 <CAObL_7GFkNfQggDNZ+MicdeTe7duJY7cJJELHcb2-vxHHJkS_g@mail.gmail.com>
 <m162d7kroj.fsf@fess.ebiederm.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Andrew Lutomirski <luto@mit.edu>,
	Markus Gutschke <markus@chromium.org>,
	Will Drewry <wad@chromium.org>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	linux-security-module@vger.kernel.org,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Return-path: <linux-security-module-owner@vger.kernel.org>
Content-Disposition: inline
In-Reply-To: <m162d7kroj.fsf@fess.ebiederm.org>
Sender: linux-security-module-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Andrew Lutomirski <luto@mit.edu> writes:
>
> > On Tue, Apr 10, 2012 at 4:50 PM, Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> >> Andrew Lutomirski <luto@mit.edu> writes:
> >>
> >>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
> >>> <ebiederm@xmission.com> wrote:
> >>>> Andy Lutomirski <luto@MIT.EDU> writes:
> >>>>
> >>>> My understanding of no_new_privs is that current_cred() including
> >>>> the user, the user namespace and the security label will never change,
> >>>> with the goal of making the security analysis simple.
> >>>
> >>> They can change but only if you already have the privilege to change
> >>> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
> >>> setuid, then drop caps is allowed and useful -- it's a race-free way
> >>> to make sure that a given uid never executes without no_new_privs set.
> >>> I've implemented this as a pam module.
> >>
> >> Careful.  There is the security_task_fix_setuid call that will raise
> >> your capabilities from cap->effective to cap->permitted if you call
> >> setuid(0).  Which in the general case means you can regain all of the
> >> root privileges if you only have CAP_SETUID.
> >>
> >
> > That's fine.  If you're running with CAP_SETUID and default
> > securebits, then you effectively have all capabilities already and
> > don't need to exploit setuid binaries to gain them.  no_new_privs
> > doesn't change that.  If you don't want to be able to gain all privs,
> > change securebits or drop CAP_SETUID.  seccomp reduces the kernel
> > attack surface; no_new_privs reduces the userspace attack surface.
> > But see below...
> >
> >
> >>
> >>>> I don't recall how seccomp filters are dealt with if you don't have
> >>>> no_new_privs enabled.  If seccomp filters installed by root
> >>>> are dropped when we change privilege levels it might be worth looking
> >>>> at how to keep a seccomp filter installed as long as you stay in
> >>>> a user namespace.
> >>>>
> >>>
> >>> They're not dropped.  I think in the current implementation they can't
> >>> be dropped at all.
> >>
> >> Which makes sense.  Is this why you need no_new_privs?  So you can't run
> >> seccomp on higher privileged executables and confusing them into keeping
> >> privileges when they should not?
> >
> > Exactly.  seccomp is flexible enough that it's probably possible to
> > confuse many setuid executables with it.
> >
> >>
> >>>> The emphasis is a bit different from no_new_privs as the user_namespace
> >>>> does not need to guarantee that the lsm will not change security labels,
> >>>> etc.
> >>>
> >>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
> >>> grants extra privileges that malfunctions when run inside a user
> >>> namespace, can that be used to break out of LSM restrictions?
> >>
> >> I can't see how it would not be safe.
> >>
> >> Except for the user namespace pointer the state the LSM and the rest of
> >> the kernel sees is the same state the kernel sees.  Aka userspace sees
> >> uid 0, the LSM does not.  So I don't know why a LSM would get confused.
> >>
> >> Beyond that it is a bug for an LSM to grant permissions beyond the
> >> core DAC model.  So the worst I can see is an LSM not grokking user
> >> namespaces and getting confused and not restricting a process as
> >> much as the designer of the LSM would like.
> >
> > Right.  Suppose you have some program that has extra restrictions
> > applied by an LSM.  It executes a helper (e.g. Apache's suidexec
> > thing, but I bet there are more examples) which is supposed to be very
> > careful not to leak privileges.  The LSM is set to restrict that
> > helper less than the parent process.  But that program was written
> > before user namespaces existed, and it has a bug (or missing feature)
> > that allows its parent to exploit it when run inside an unmapped user
> > namespace.  The parent can now escape from the LSM restrictions.
> >
> > no_new_privs is designed to prevent exactly this issue.
>
> Currently the suid exec will fail because the uid's don't map.
>
> I might switch that around to simply ignoring the change of uid
> on suid exec.  I have a patch in my devel tree that plays with
> that idea.  However as much as I hit that case once in testing
> (I think it was ping).  I don't think running suid executables
> is particularly interesting.
>
> Certainly the application program won't care or break, because we are
> still bounded by the usual DAC security.
>
> I wonder a little if the lsm might change labels on exec of a
> non suid binary.  That case is more interesting in the unmapped
> unprivileged user namespace.

They will (change labels on exec of a non-suid binary).  But.  First, any
well-behaved user of user namespaces will switch to a (selinux, smack,
apparmor, whatever) context which is aware it is namespaced, so that only
desired transitions happen.  So we're left with the concern that uid 1001
creates an unprivileged user namespace and runs a program (as uid 0)
which transitions him to uber_client_t.  Since, as Eric has pointed out,
the MAC can't override the DAC rules, it still won't be able to write to
files not owned by uid 1001 in the initial user namespace.  We might
worry about it connecting to the privileged server and using its
uber_client_t credentials to pass a request.  The server, being in the
initial user ns, will get uid 1001, not 0.
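
To make the uid translation above concrete, here is a small model in
Python.  It is only a sketch of the bookkeeping, not kernel code.  The
three-field extent layout follows the lines of /proc/<pid>/uid_map in
mainline kernels; the names to_parent and client_map are made up for
this illustration.

```python
from typing import List, Optional, Tuple

# One extent mirrors one line of /proc/<pid>/uid_map:
# (first uid inside the namespace, first uid in the parent, length).
Extent = Tuple[int, int, int]


def to_parent(uid_inside: int, uid_map: List[Extent]) -> Optional[int]:
    """Translate a uid as seen inside the namespace to the parent's view.

    Returns None when the uid falls outside every extent (unmapped).
    """
    for inside, outside, length in uid_map:
        if inside <= uid_inside < inside + length:
            return outside + (uid_inside - inside)
    return None


# Hypothetical setup: uid 1001 creates a user namespace and maps
# itself to uid 0 inside it.  No other uid is mapped.
client_map: List[Extent] = [(0, 1001, 1)]

if __name__ == "__main__":
    # "Root" inside the namespace is still uid 1001 to the initial ns.
    print(to_parent(0, client_map))
    # Any other uid inside the namespace has no mapping at all.
    print(to_parent(1, client_map))
```

The point of the model is that the uid 0 the client sees never leaves
its namespace; anything in the initial user ns only ever sees 1001.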

Perhaps the client checks its uid (0 in its user namespace) and passes
it to the server (as a simple message), which blindly accepts that.  In
that case the server could just as easily be exploited without user
namespaces.
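
As a sketch of the difference between the two cases, the Python below
contrasts a uid that the client merely claims in its message with the
uid the kernel records for the peer of an AF_UNIX socket.  It assumes
Linux and uses SO_PEERCRED as one example of kernel-supplied
credentials; the helper names peer_credentials and handle_request are
hypothetical, and a real server may obtain credentials differently.

```python
import os
import socket
import struct


def peer_credentials(sock: socket.socket):
    """Return (pid, uid, gid) of the peer as recorded by the kernel."""
    fmt = "iII"  # struct ucred: pid_t pid, uid_t uid, gid_t gid
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


def handle_request(server_sock: socket.socket):
    """Read the uid the client claims and pair it with the kernel's view."""
    claimed_uid = int(server_sock.recv(64).decode())
    _pid, kernel_uid, _gid = peer_credentials(server_sock)
    return claimed_uid, kernel_uid


if __name__ == "__main__":
    server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    client.sendall(b"0")  # the client asserts "I am uid 0"
    claimed, actual = handle_request(server)
    # A server that trusts `claimed` is exploitable with or without
    # user namespaces; `actual` comes from the kernel, not the client.
    print("claimed:", claimed, "kernel says:", actual)
    server.close()
    client.close()
```

A server that authorizes on the kernel-supplied value is not affected
by what the client believes its own uid to be.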

It's possible that there's another way this can be exploited, but I
haven't thought of it yet.

-serge