From mboxrd@z Thu Jan  1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
Subject: Re: [REVIEW][PATCH 0/43] Completing the user namespace
Date: Tue, 10 Apr 2012 16:50:01 -0700
Message-ID: <m14nsrtady.fsf@fess.ebiederm.org>
References: <m11unyn70b.fsf@fess.ebiederm.org> <4F84838B.8000408@mit.edu>
	<m14nsrxn6v.fsf@fess.ebiederm.org>
	<CAObL_7F2oHtOoDkvNM1io=dovKENNTxS4EDPkr4ns9AEdFqwaQ@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Markus Gutschke <markus@chromium.org>,
	Will Drewry <wad@chromium.org>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	linux-security-module@vger.kernel.org,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
To: Andrew Lutomirski <luto@mit.edu>
Return-path: <linux-security-module-owner@vger.kernel.org>
In-Reply-To: <CAObL_7F2oHtOoDkvNM1io=dovKENNTxS4EDPkr4ns9AEdFqwaQ@mail.gmail.com>
	(Andrew Lutomirski's message of "Tue, 10 Apr 2012 15:15:21 -0700")
Sender: linux-security-module-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Andrew Lutomirski <luto@mit.edu> writes:

> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Andy Lutomirski <luto@MIT.EDU> writes:
>>>
>>> [...]
>>>
>>> I haven't read enough of the details to figure out how the uid mapping
>>> works (do all the child namespace uids map to the same parent uid?), so
>>> I may be missing some details here.
>>
>> You seem to be missing a detail or two.
>>
>> What you want to look at are the functions make_kuid and from_kuid
>> in kernel/user_namespace.c.  You might look at the patches that talk
>> about uidgid.h and introducing a mapping layer.
>>
>> The implementation creates an incomplete but 1-1 mapping to the uids in
>> the initial user namespace.  Which means except for the change in
>> datatype (sigh) the existing permission checks don't need to be changed.
>
> I'll do my homework at the same time that I write up docs for
> no_new_privs (i.e. maybe today).
>
>>
>> My understanding of no_new_privs is that current_cred() including
>> the user, the user namespace and the security label will never change,
>> with the goal of making the security analysis simple.
>
> They can change but only if you already have the privilege to change
> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
> setuid, then drop caps is allowed and useful -- it's a race-free way
> to make sure that a given uid never executes without no_new_privs set.
> I've implemented this as a PAM module.

Careful.  There is the security_task_fix_setuid call, which will raise
your effective capabilities (cap->effective) to your permitted set
(cap->permitted) if you call setuid(0).  In the general case that means
you can regain all of root's privileges if you have only CAP_SETUID.
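
As a rough illustration of the first sentence above, here is a toy Python
model of the rule that capabilities(7) documents: when the effective uid
changes from nonzero to 0, the permitted set is copied to the effective
set.  The Cred class and the fix_setuid name are hypothetical, and only
the effective uid is tracked (real and saved uids and securebits are
ignored).  It is a sketch under those assumptions, not the kernel code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cred:
    """Hypothetical, heavily simplified stand-in for struct cred."""
    euid: int
    permitted: frozenset
    effective: frozenset


def fix_setuid(old: Cred, new_euid: int) -> Cred:
    """Model how the effective set is adjusted when euid changes.

    Assumption: only euid matters here; the real rule also looks at the
    real and saved uids and at securebits.
    """
    new = replace(old, euid=new_euid)
    if old.euid == 0 and new.euid != 0:
        # Leaving euid 0 clears the effective set.
        new = replace(new, effective=frozenset())
    if old.euid != 0 and new.euid == 0:
        # Becoming euid 0 raises the effective set to the permitted set.
        new = replace(new, effective=new.permitted)
    return new


# A task that kept a wide permitted set but trimmed its effective set
# down to CAP_SETUID alone.
before = Cred(
    euid=1000,
    permitted=frozenset({"CAP_SETUID", "CAP_SYS_ADMIN", "CAP_NET_ADMIN"}),
    effective=frozenset({"CAP_SETUID"}),
)
after = fix_setuid(before, 0)
print(sorted(after.effective))
```

In this model the trimmed effective set comes back in full as soon as the
task becomes euid 0, which is why holding CAP_SETUID alone is not much of
a restriction.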

> This still simplifies security analysis: the guarantee is that, if
> no_new_privs is set, then a task's children cannot do anything that
> the task could not do on its own.  Therefore it's safe for the task to
> manipulate its own environment in whatever strange ways it wants,
> because even if that gives it the ability to subvert its children,
> there is no privilege gained.

>> I don't recall how seccomp filters are dealt with if you don't have
>> no_new_privs enabled.  If seccomp filters installed by root
>> are dropped when we change privilege levels it might be worth looking
>> at how to keep a seccomp filter installed as long as you stay in
>> a user namespace.
>>
>
> They're not dropped.  I think in the current implementation they can't
> be dropped at all.

Which makes sense.  Is this why you need no_new_privs?  So that you can't
run seccomp on higher-privileged executables and confuse them into
keeping privileges when they should not?

>> There are essentially two modes you can use the user namespace in:
>> with mappings set up (a privileged operation) and with no mappings.
>
>>
>> With no mappings you cannot create a new user namespace or change
>> uids or gids, and suid exec fails (or possibly ignores the uid/gid
>> change, but I am starting with suid exec fails).  Making user
>> namespaces similar to no_new_privs.
>>
>> The emphasis is a bit different from no_new_privs as the user_namespace
>> does not need to guarantee that the lsm will not change security labels,
>> etc.
>
> Hmm.  Is this safe?  For example, if there's a program that LSM policy
> grants extra privileges that malfunctions when run inside a user
> namespace, can that be used to break out of LSM restrictions?

I can't see how it would not be safe.

Except for the user namespace pointer, the state the LSM sees is the same
state the rest of the kernel sees.  That is, userspace sees uid 0; the
LSM does not.  So I don't know why an LSM would get confused.
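
To make "userspace sees uid 0, the LSM does not" concrete, here is a toy
Python model of an incomplete but 1-1 uid mapping.  The function names
are borrowed from make_kuid and from_kuid as mentioned earlier in the
thread, but the Extent layout, the use of None for "no mapping", and the
numbers are assumptions chosen for illustration; this is not the kernel
implementation.

```python
from typing import NamedTuple, Optional, Sequence


class Extent(NamedTuple):
    """Hypothetical mapping extent: a contiguous range of uids."""
    first: int        # first uid as seen inside the user namespace
    lower_first: int  # uid it corresponds to in the initial namespace
    count: int        # number of uids in the range


def make_kuid(extents: Sequence[Extent], uid: int) -> Optional[int]:
    """Map a namespace-local uid to a kernel-wide uid (None if unmapped)."""
    for e in extents:
        if e.first <= uid < e.first + e.count:
            return e.lower_first + (uid - e.first)
    return None


def from_kuid(extents: Sequence[Extent], kuid: int) -> Optional[int]:
    """Map a kernel-wide uid back into the namespace (None if unmapped)."""
    for e in extents:
        if e.lower_first <= kuid < e.lower_first + e.count:
            return e.first + (kuid - e.lower_first)
    return None


# Assumed example: namespace uids 0..999 map to 100000..100999 outside.
uid_map = [Extent(first=0, lower_first=100000, count=1000)]

kuid = make_kuid(uid_map, 0)
print(kuid)                      # 100000: what permission checks compare
print(from_kuid(uid_map, kuid))  # 0: what userspace in the namespace sees
print(make_kuid(uid_map, 5000))  # None: the mapping is incomplete
```

The mapping is 1-1 where it is defined, so permission checks can keep
comparing the kernel-wide value and never see the namespace's uid 0.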

Beyond that it is a bug for an LSM to grant permissions beyond the
core DAC model.  So the worst I can see is an LSM not grokking user
namespaces and getting confused and not restricting a process as
much as the designer of the LSM would like.

>> At a basic level of interaction I expect no_new_privs will need to fail
>> any change of the user namespace.  As changing the user namespace
>> changes current_cred(), and fundamentally allows more things to happen.
>
> If a user namespace has no visible effect on processes that aren't
> descendants of whoever created it, then creating one in no_new_privs
> mode should be safe.  On the other hand, it could be somewhat useless.

Creating a user namespace will allow a process access to more kernel
facilities.  Aka you can (or at least will be able to) create network
namespaces and mount namespaces and the like.  That increases the
surface of the kernel an attacker can hit.

So in a perfect kernel there are no effects on others.  In a scenario
where you are limiting how much of the kernel a user can use I think
you would want that.

Still, given that you aren't enforcing the very restrictive rule that
current_cred() must not change, I don't know how much it matters, and a
BPF-based seccomp filter can pretty easily filter out new user namespace
creation.  Shrug.
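
The decision such a filter would make can be sketched in a few lines of
Python.  This is a model of the policy, not an actual BPF program:
filter_syscall and its return strings are hypothetical, and it assumes
the flags word is the first syscall argument for both clone and unshare,
which is not true of clone on every architecture.  CLONE_NEWUSER is the
flag value from <linux/sched.h>.

```python
from typing import Sequence

CLONE_NEWUSER = 0x10000000  # flag value from <linux/sched.h>


def filter_syscall(name: str, args: Sequence[int]) -> str:
    """Hypothetical policy: refuse any attempt to create a user namespace.

    Assumption: args[0] carries the flags for both clone and unshare.
    """
    if name in ("clone", "unshare") and args and args[0] & CLONE_NEWUSER:
        return "EPERM"
    return "ALLOW"


print(filter_syscall("unshare", [CLONE_NEWUSER]))       # EPERM
print(filter_syscall("clone", [0x11 | CLONE_NEWUSER]))  # EPERM
print(filter_syscall("clone", [0x11]))                  # ALLOW
```

Everything else passes through, so the filter narrows the kernel surface
without otherwise touching the task's credentials.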

Eric
