On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
> Stefan Berger writes:
>
>> On 07/13/2017 01:14 PM, Eric W. Biederman wrote:
>>> Theodore Ts'o writes:
>>>
>>>> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
>>>>> The concise summary:
>>>>>
>>>>> Today we have the xattr security.capability that holds a set of
>>>>> capabilities that an application gains when executed. AKA setuid
>>>>> root exec without actually being setuid root.
>>>>>
>>>>> User namespaces have the concept of capabilities that are not
>>>>> global but are limited to their user namespace. We do not
>>>>> currently have filesystem support for this concept.
>>>> So correct me if I am wrong; in general, there will only be one
>>>> variant of the form:
>>>>
>>>>     security.foo@uid=15000
>>>>
>>>> It's not like there will be:
>>>>
>>>>     security.foo@uid=1000
>>>>     security.foo@uid=2000
>>>>
>>>> Except.... if you have a distribution root directory which is
>>>> shared by many containers, you would need to put the xattrs in the
>>>> overlay inodes. Worse, each time you launch a new container, with a
>>>> new subuid allocation, you will have to iterate over all files with
>>>> capabilities and do a copy-up operation on the xattrs in overlayfs.
>>>> So that's actually a bit of a disaster.
>>>>
>>>> So for distribution overlays, you will need to do things a different
>>>> way, which is to map the distro subdirectory so you know that the
>>>> capability with the global uid 0 should be used for the container
>>>> "root" uid, right?
>>>>
>>>> So this hack of using security.foo@uid=1000 is *only* useful when
>>>> the subcontainer root wants to create the privileged executable.
>>>> You still have to do things the other way.
>>>>
>>>> So can we make perhaps the assertion that *either*:
>>>>
>>>>     security.foo
>>>>
>>>> exists, *or*
>>>>
>>>>     security.foo@uid=BAR
>>>>
>>>> exists, but never both? And there BAR is exclusive to only one
>>>> instance?
>>>>
>>>> Otherwise, I suspect that the architecture is going to turn around
>>>> and bite us in the *ss eventually, because someone will want to do
>>>> something crazy and the solution will not be scalable.
>>> Yep. That is what it looks like from here.
>>>
>>> Which is why I asked the question about scalability of the xattr
>>> implementations. It looks like trying to accommodate the general
>>> case just gets us in trouble, and sets unrealistic expectations.
>>>
>>> Which strongly suggests that Serge's previous version that
>>> just revved the format of security.capability so that a uid field
>>> could be added is likely to be the better approach.
>>>
>>> I want to see what Serge and Stefan have to say but the case looks
>>> pretty clear cut at the moment.
>> The approach of virtualizing the xattrs on the name side, which is
>> what this patch does, provides a more general approach than
>> virtualizing it on the value side, which is what Serge does in his
>> other patch for security.capability alone. With virtualizing on the
>> value side, virtualizing the xattr becomes an exercise that needs to
>> be repeated for every xattr name that one would want to virtualize.
>> With this patch you would just add another xattr name to a list, a
>> one-line patch in the end. Xattrs with prefixes like trusted.* need
>> a bit more work but this can be woven in as well
>> (https://github.com/stefanberger/linux/commit/397b1a3b24045c67405fc83465e544fc865d402f).
> Reusable code has merit, as it reduces the maintenance burden.
>
> My big question right now is can you implement Ted's suggested
> restriction. Only one security.foo or security.foo@... attribute?

We need to raw-list the xattrs and do the check before writing them. I
am fairly sure this can be done.

So now you want to allow security.foo *and one* security.foo@uid=<> or
just a single one security.foo(@[[:print:]]*)?

    Stefan
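
To make the two readings of the restriction concrete, here is a minimal userspace sketch in Python of the "raw-list, then check before writing" idea. It is not the patch's code: the helper names (base_name, may_write) and the allow_plain_plus_one flag are hypothetical, and it assumes the raw list of xattr names on the inode has already been obtained.

```python
from __future__ import annotations


def base_name(name: str) -> str:
    """Strip an optional '@...' suffix: 'security.foo@uid=1000' -> 'security.foo'."""
    return name.split("@", 1)[0]


def may_write(existing: list[str], new_name: str,
              allow_plain_plus_one: bool = False) -> bool:
    """Return True if writing new_name keeps the inode within the restriction.

    existing is the raw (unvirtualized) list of xattr names on the inode.
    Overwriting a name that is already present is always allowed.
    """
    base = base_name(new_name)
    # Other variants of the same base name that are already on the inode.
    others = [n for n in existing if base_name(n) == base and n != new_name]

    if not allow_plain_plus_one:
        # Strict reading: only a single security.foo(@...)? may exist.
        return not others

    # Relaxed reading: plain security.foo may coexist with one
    # security.foo@... variant, but not with two of them.
    suffixed_others = [n for n in others if "@" in n]
    if "@" in new_name:
        return not suffixed_others
    return len(suffixed_others) <= 1
```

With the default (strict) reading, may_write(["security.foo"], "security.foo@uid=1000") is rejected; with allow_plain_plus_one=True it is accepted, while a second security.foo@uid=... variant is rejected under either reading.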