Theodore Ts'o <tytso@mit.edu> writes:

> On Fri, Jul 14, 2017 at 01:39:59PM -0700, James Bottomley wrote:
>> but why?  That's partly the point of all of this: some security.
>> attributes can't be written by container root without some supervision
>> (the capability ones are the hugely problematic ones from this point of
>> view), but for some there's no reason they shouldn't be.  What would be
>> the reason that root in a container shouldn't be able to write the ima
>> xattr the same as host root could on its filesystem?
>
> So I'm happy to say, "Ix-nay on nested containerization; that way lies
> insanity".  But my understanding is that there will be people who want
> to run containers in containers in containers in containers...  and
> this is what scares me.

I am happy to say we need to bound the space we take in an inode.
So a design that needs more space the more containers you have is
suspicious.

I am not fond of decisions that don't allow nesting of containers.  That
just paints us into a corner.  I am in favor of things that require
little or bounded space.

Generally I will frown at a decision that won't allow nesting, because
nesting of containers happens naturally.

> What if someone in the Nth layer of containerization wants to allow
> the container root in the (N+1)th layer to set file capabilities that
> will not be honored in the Nth layer of containerization?

That works perfectly well with the design we have today.  And it only
needs a single security.capability attribute.  The actual design
associated with the security.capability attribute is to record (either
in the attribute or, in the most recent iteration, in the attribute
name *scowl*) the uid (from the filesystem's point of view) of the root
user of a user namespace.  Running that executable will give you those
capabilities if that uid matches the root user in your user namespace
(or a parent user namespace).
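
To illustrate the matching rule described above, here is a minimal
sketch in Python.  It is a toy model, not kernel code: the
UserNamespace class, the capabilities_honored() helper, and the uid
values 100000 and 200000 are all made up for the example.  The only
thing taken from the description is the rule itself: the file records
one root uid, and the capabilities apply if that uid is the root of
your user namespace or of one of its parents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserNamespace:
    """Hypothetical stand-in for a user namespace (not a kernel structure)."""
    name: str
    root_uid: int  # uid of this namespace's root, from the filesystem's point of view
    parent: Optional["UserNamespace"] = None


def capabilities_honored(file_root_uid: int, ns: UserNamespace) -> bool:
    """Return True if the uid recorded with the file's capabilities is the
    root of `ns` or of any ancestor of `ns`."""
    current: Optional[UserNamespace] = ns
    while current is not None:
        if current.root_uid == file_root_uid:
            return True
        current = current.parent
    return False


# Assumed example hierarchy: host -> layer N -> layer N+1.
host = UserNamespace("host", root_uid=0)
layer_n = UserNamespace("layer N", root_uid=100000, parent=host)
layer_n1 = UserNamespace("layer N+1", root_uid=200000, parent=layer_n)

# A file whose capabilities were set by the root of layer N+1.
for ns in (host, layer_n, layer_n1):
    print(ns.name, capabilities_honored(200000, ns))
```

In this model a file tagged with the root uid of layer N+1 is honored
inside layer N+1 but not in layer N or on the host, and the file still
carries only one fixed-size value no matter how deep the nesting goes.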

As anyone who can modify a file can remove a security.capability
attribute, just like anyone who can modify a file can remove the setuid
bit, this works fine and is sufficient.  Though perhaps a little different.

> Again I think that this is insane, and I'm happy for the answer to be,
> "No, that's not supported".  That the "Host container" can have
> capabilities that it won't honor, but will be honored by all
> subcontainers, but that same thing can't be done between a
> subsubsub-container and its child subsubsubsub-container.
>
> Are we OK with that?  Because how we would encode this in the xattr
> seems to be to be hopelessly not scalable.

That really isn't an issue right now.

The real question right now is what to do with security.ima and
security.evm, as it was proposed that we share a common code base for
them.  Right now it looks to me like the semantics are sufficiently
different that it doesn't make sense to share code between the two
implementations.  At which point all reason for storing any of this in
the xattr name goes away.  So we just have a single xattr.

Right now I am very much in favor of security xattrs continuing to have
well known names.  That easily limits how much space in the inode you
can have, and it makes thinking about things easier.  It doesn't
preclude having ACLs in your xattr.  That is exactly how POSIX
ACLs are implemented.  But I am not going to build generic support for
them, and I really don't expect they will be needed.

Eric