On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
> Stefan Berger <stefanb-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
>
>> On 07/13/2017 01:14 PM, Eric W. Biederman wrote:
>>> Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org> writes:
>>>
>>>> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
>>>>> The concise summary:
>>>>>
> >>>>> Today we have the xattr security.capability that holds a set of
>>>>> capabilities that an application gains when executed.  AKA setuid root exec
>>>>> without actually being setuid root.
>>>>>
>>>>> User namespaces have the concept of capabilities that are not global but
>>>>> are limited to their user namespace.  We do not currently have
>>>>> filesystem support for this concept.
>>>> So correct me if I am wrong; in general, there will only be one
>>>> variant of the form:
>>>>
>>>>      security.foo@uid=15000
>>>>
>>>> It's not like there will be:
>>>>
>>>>      security.foo@uid=1000
>>>>      security.foo@uid=2000
>>>>
> >>>> Except.... if you have a distribution root directory which is shared
>>>> by many containers, you would need to put the xattrs in the overlay
>>>> inodes.  Worse, each time you launch a new container, with a new
>>>> subuid allocation, you will have to iterate over all files with
> >>>> capabilities and do a copy-up operation on the xattrs in overlayfs.
>>>> So that's actually a bit of a disaster.
>>>>
>>>> So for distribution overlays, you will need to do things a different
>>>> way, which is to map the distro subdirectory so you know that the
>>>> capability with the global uid 0 should be used for the container
>>>> "root" uid, right?
>>>>
>>>> So this hack of using security.foo@uid=1000 is *only* useful when the
>>>> subcontainer root wants to create the privileged executable.  You
>>>> still have to do things the other way.
>>>>
>>>> So can we make perhaps the assertion that *either*:
>>>>
>>>>      security.foo
>>>>
>>>> exists, *or*
>>>>
>>>>      security.foo@uid=BAR
>>>>
> >>>> exists, but never both?  And there BAR is exclusive to only one
> >>>> instance?
>>>>
>>>> Otherwise, I suspect that the architecture is going to turn around and
>>>> bite us in the *ss eventually, because someone will want to do
>>>> something crazy and the solution will not be scalable.
>>> Yep.  That is what it looks like from here.
>>>
>>> Which is why I asked the question about scalability of the xattr
> >>> implementations.  It looks like trying to accommodate the general
>>> case just gets us in trouble, and sets unrealistic expectations.
>>>
>>> Which strongly suggests that Serge's previous version that
> >>> just revved the format of security.capability so that a uid field could
>>> be added is likely to be the better approach.
>>>
>>> I want to see what Serge and Stefan have to say but the case looks
>>> pretty clear cut at the moment.
> >> The approach of virtualizing the xattrs on the name side, which is
> >> what this patch does, is more general than virtualizing them on the
> >> value side, which is what Serge does in his other patch for
> >> security.capability alone. With value-side virtualization, the work
> >> has to be repeated for every xattr name that one wants to
> >> virtualize. With this patch you would just add another xattr name to a
> >> list, a one-line patch in the end. Xattrs with prefixes like trusted.*
> >> need a bit more work, but this can be woven in as well
>> (https://github.com/stefanberger/linux/commit/397b1a3b24045c67405fc83465e544fc865d402f).
> Reusable code has merit, as it reduces the maintenance burden.
>
> My big question right now is: can you implement Ted's suggested
> restriction?  Only one security.foo or security.foo@... attribute?

We need to raw-list the xattrs and do the check before writing them. I 
am fairly sure this can be done.

So now you want to allow security.foo *and one* security.foo@uid=<>, or
just a single security.foo(@[[:print:]]*)?
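
To make the two readings concrete, here is a rough user-space sketch of the check, for illustration only. It is not the kernel patch; the names split_xattr_name, may_write and allow_plain_plus_one are made up for this example, and it assumes the raw list of xattr names has already been obtained and that everything after the first '@' is the suffix.

```python
def split_xattr_name(name):
    """Split 'security.foo@uid=1000' into ('security.foo', 'uid=1000').

    A plain name such as 'security.foo' yields ('security.foo', None).
    Assumption: the first '@' separates the base name from the suffix.
    """
    base, sep, suffix = name.partition("@")
    return base, (suffix if sep else None)


def may_write(existing_names, new_name, allow_plain_plus_one=False):
    """Decide whether new_name may be written given the raw xattr list.

    Strict policy (default): at most one of security.foo or
    security.foo@... may exist on the file.
    Relaxed policy: plain security.foo may coexist with exactly one
    security.foo@... variant.
    """
    new_base, new_suffix = split_xattr_name(new_name)

    # Collect the suffixes of all *other* xattrs sharing the base name.
    # Overwriting an xattr with the identical name is always fine.
    other_suffixes = []
    for name in existing_names:
        if name == new_name:
            continue
        base, suffix = split_xattr_name(name)
        if base == new_base:
            other_suffixes.append(suffix)

    if not allow_plain_plus_one:
        # Strict: no other variant of this base name may exist.
        return not other_suffixes

    if new_suffix is None:
        # Relaxed: writing the plain name never adds a second @-variant.
        return True
    # Relaxed: writing an @-variant is fine only if no other @-variant exists.
    return all(suffix is None for suffix in other_suffixes)
```

With existing = ["security.foo"], writing security.foo@uid=1000 is refused under the first reading and allowed under the second; a second security.foo@uid=... is refused under both.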

    Stefan