From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754805AbdFWRo7 (ORCPT <rfc822;w@1wt.eu>);
        Fri, 23 Jun 2017 13:44:59 -0400
Received: from out03.mta.xmission.com ([166.70.13.233]:49130 "EHLO
        out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1754209AbdFWRo5 (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 23 Jun 2017 13:44:57 -0400
From: ebiederm@xmission.com (Eric W. Biederman)
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>, zohar@linux.vnet.ibm.com,
        containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
        xiaolong.ye@intel.com, linux-security-module@vger.kernel.org,
        lkp@01.org
References: <1498157989-11814-1-git-send-email-stefanb@linux.vnet.ibm.com>
        <1498174161.7636.4.camel@HansenPartnership.com>
        <20170622233619.GC2894@mail.hallyn.com>
        <1498176787.7636.11.camel@HansenPartnership.com>
Date: Fri, 23 Jun 2017 12:37:43 -0500
In-Reply-To: <1498176787.7636.11.camel@HansenPartnership.com> (James
        Bottomley's message of "Thu, 22 Jun 2017 17:13:07 -0700")
Message-ID: <87efuaip08.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-XM-SPF: eid=1dOSdT-0005hO-AG;;;mid=<87efuaip08.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=67.3.213.87;;;frm=ebiederm@xmission.com;;;spf=neutral
X-XM-AID: U2FsdGVkX19y9UYwas/EsETZPiH0T2EaqZutRCavTn0=
X-SA-Exim-Connect-IP: 67.3.213.87
X-SA-Exim-Mail-From: ebiederm@xmission.com
X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP
        *  0.0 TVD_RCVD_IP Message was received from an IP address
        *  1.5 XMNoVowels Alpha-numberic number with no vowels
        *  0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available.
        *  0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
        *      [score: 0.4952]
        * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC
        *      [sa07 1397; Body=1 Fuz1=1 Fuz2=1]
X-Spam-DCC: XMission; sa07 1397; Body=1 Fuz1=1 Fuz2=1 
X-Spam-Combo: *;James Bottomley <James.Bottomley@HansenPartnership.com>
X-Spam-Relay-Country: 
X-Spam-Timing: total 5305 ms - load_scoreonly_sql: 0.04 (0.0%),
        signal_user_changed: 2.6 (0.0%), b_tie_ro: 1.75 (0.0%), parse: 1.13 (0.0%),
        extract_message_metadata: 14 (0.3%), get_uri_detail_list: 3.4 (0.1%),
        tests_pri_-1000: 4.1 (0.1%), tests_pri_-950: 1.14 (0.0%), tests_pri_-900:
        1.03 (0.0%), tests_pri_-400: 29 (0.5%), check_bayes: 28 (0.5%), b_tokenize:
        10 (0.2%), b_tok_get_all: 10 (0.2%), b_comp_prob: 3.2 (0.1%),
        b_tok_touch_all: 3.0 (0.1%), b_finish: 0.54 (0.0%), tests_pri_0: 301 (5.7%),
        check_dkim_signature: 0.49 (0.0%), check_dkim_adsp: 3.1 (0.1%),
        tests_pri_500: 4948 (93.3%), poll_dns_idle: 4941 (93.1%), rewrite_mail: 0.00
        (0.0%)
Subject: Re: [PATCH 0/3] Enable namespaced file capabilities
X-Spam-Flag: No
X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600)
X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Thu, 2017-06-22 at 18:36 -0500, Serge E. Hallyn wrote:
>> Quoting James Bottomley (James.Bottomley@HansenPartnership.com):
>> > On Thu, 2017-06-22 at 14:59 -0400, Stefan Berger wrote:
>> > > This series of patches primary goal is to enable file 
>> > > capabilities in user namespaces without affecting the file 
>> > > capabilities that are effective on the host. This is to prevent 
>> > > that any unprivileged user on the host maps his own uid to root 
>> > > in a private namespace, writes the xattr, and executes the file
>> > > with privilege on the host.
>> > > 
>> > > We achieve this goal by writing extended attributes with a 
>> > > different name when a user namespace is used. If for example the 
>> > > root user in a user namespace writes the security.capability 
>> > > xattr, the name of the xattr that is actually written is encoded 
>> > > as security.capability@uid=1000 for root mapped to uid 1000 on 
>> > > the host. When listing the xattrs on the host, the existing
>> > > security.capability as well as the security.capability@uid=1000 
>> > > will be shown. Inside the namespace only 'security.capability', 
>> > > with the value of security.capability@uid=1000, is visible.
>> > 
>> > I'm a bit bothered by the @uid=1000 suffix.  What if I want to use 
>> > this capability but am dynamically mapping the namespaces (i.e. I 
>> > know I want unprivileged root, but I'm going to dynamically select 
>> > the range to map based on what's currently available on the 
>> > orchestration system).  If we stick with the @uid=X suffix, then 
>> > dynamic mapping won't work because X is potentially different each 
>> > time and there'll be a name mismatch in my xattrs.  Why not just 
>> > make the suffix @uid, which means if root is mapped to any 
>> > unprivileged uid then we pick this up otherwise we go with the
>> > unsuffixed property?
>> > 
>> > As far as I can see there's no real advantage to discriminating 
>> > userns specific xattrs based on where root is mapped to, unless 
>> > there's a use case I'm missing?
>> 
>> Yes, the use case is: to allow root in the container to set the
>> privilege itself, without endangering any resources not owned by
>> that root.
>
> OK, so you envisage the same filesystem being mounted in different user
> namespaces and being able to see their own value for the xattr.  It
> still seems a bit weird that they'd be able to change file contents and
> have that seen by the other userns but not xattrs.

When you talk about dynamically selecting a range based on what is
currently available in an orchestration system, I don't know exactly
what you mean.  If it is something like what adduser does, assigning a
container a persistent association with uids and gids, that makes sense
to me.  If it is picking an association just for the lifetime of the
container processes, it makes me nervous.

Fundamentally storage is persistent and writing data into it is
persistent.

Which means that when dealing with storage we need to make things safe
by default and not depend upon an assumption that the container tools
carefully keep files separate from each other.

Based on previous conversations, I am happy with, and generally expect,
only one capability xattr per file.

Even with one xattr of any type there is something appealing about
putting the logic that limits that xattr to a namespace in the name:
it is trivially backwards compatible, and it does not require revving
the on-disk file format for the sake of containers.
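
To illustrate why a name-based scheme stays backwards compatible, here
is a minimal Python sketch of the mapping described in the cover letter.
It is not the code from the patches.  The helper names host_xattr_name
and visible_xattrs are hypothetical, and the sketch assumes that a
namespace sees only its own suffixed entry (presented under the plain
name) and never the host's unsuffixed one.

```python
BASE = "security.capability"


def host_xattr_name(root_host_uid):
    """Name stored on disk when the writer's root maps to root_host_uid.

    A root_host_uid of 0 stands for the initial user namespace (assumption).
    """
    if root_host_uid == 0:
        return BASE
    return f"{BASE}@uid={root_host_uid}"


def visible_xattrs(on_disk, root_host_uid):
    """Return the xattrs as seen by a namespace whose root is root_host_uid."""
    if root_host_uid == 0:
        # The host lists every name, suffixed or not.
        return dict(on_disk)
    own = host_xattr_name(root_host_uid)
    view = {}
    for name, value in on_disk.items():
        if name == own:
            # The namespace's own entry is presented under the plain name.
            view[BASE] = value
        elif name == BASE or name.startswith(BASE + "@"):
            # Host entry or another namespace's entry: hidden (assumption).
            continue
        else:
            view[name] = value
    return view
```

An old kernel that knows nothing about the suffix simply sees an xattr
with an unfamiliar name and keeps honouring the plain security.capability
entry, which is the backwards-compatibility property I mean.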

>> As you say a @uid to say "any unprivileged userns" might be useful.
>> The implication is that root on the host doesn't trust the image
>> enough to write a real global file capability, but trusts it enough
>> to 'endanger' all containers on the host.  If that's the case, I have
>> no objection to adding this as a feature.
>
> Yes, precisely.  The filesystem is certified as permitted to override
> the xattr whatever unprivileged mapping for root is in place.
>
> How would we effect the switch?  I suppose some global flag because I
> can't see we'd be mixing use cases in a physical system.

Mixing use cases in a filesystem almost always happens, at least if we
are talking about an ordinary multi-user system.  Multi-user systems are
rarer than they once were because machines are cheap, and security is
hard, but that should be what we are designing for.  Anything else is
just asking for trouble.

James, when you talk about a global flag and mixing use cases in a
physical system it sounds a lot like you are talking about a base
filesystem for shiftfs.

My gut feel is that if this gets down to something like the shiftfs use
case, I would assume everything is shifted, so that all uids are shifted
by, say, 100,000, including the uids in the names of the capability
xattrs.  So shiftfs or some part of the vfs would need to shift the
names of the xattrs as well.
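
A rough Python sketch of what shifting the names could look like, purely
as an illustration.  The function shift_xattr_name is hypothetical, and
it only rewrites names of the form security.capability@uid=N from the
cover letter.  How the unsuffixed name should be treated is left open
here, so the sketch passes it through unchanged.

```python
import re

# Matches only the suffixed form used in the cover letter.
_UID_SUFFIX = re.compile(r"^(security\.capability)@uid=(\d+)$")


def shift_xattr_name(name, offset):
    """Shift the uid embedded in a capability xattr name by offset.

    Names without a @uid=N suffix are returned unchanged (assumption).
    """
    match = _UID_SUFFIX.match(name)
    if match is None:
        return name
    return f"{match.group(1)}@uid={int(match.group(2)) + offset}"
```

With an offset of 100000, security.capability@uid=1000 becomes
security.capability@uid=101000, and applying the negative offset maps it
back.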

Certainly I expect filesystems that are mounted with s_user_ns !=
&init_user_ns to be shifting the names of the security xattrs when
queried from &init_user_ns if we go with this general design.

Eric