linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] Enable namespaced file capabilities
@ 2017-07-11 15:05 Stefan Berger
  2017-07-11 15:05 ` [PATCH v2] xattr: Enable security.capability in user namespaces Stefan Berger
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Berger @ 2017-07-11 15:05 UTC (permalink / raw)
  To: ebiederm, containers
  Cc: lkp, linux-kernel, zohar, tycho, serge, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey,
	Stefan Berger

From: Stefan Berger <stefanb@linux.vnet.ibm.com>

The primary goal of the following patch is to enable file capabilities
in user namespaces without affecting the file capabilities that are
effective on the host. This is to prevent that any unprivileged user
on the host maps his own uid to root in a private namespace, writes
the xattr, and executes the file with privilege on the host.

We achieve this goal by writing extended attributes with a different
name when a user namespace is used. If for example the root user
in a user namespace writes the security.capability xattr, the name
of the xattr that is actually written is encoded as
security.capability@uid=1000 for root mapped to uid 1000 on the host.
When listing the xattrs on the host, the existing security.capability
as well as the security.capability@uid=1000 will be shown. Inside the
namespace only 'security.capability', with the value of
security.capability@uid=1000, is visible.

To maintain compatibility with existing behavior, the value of
security.capability of the host is shown inside the user namespace
once the security.capability of the user namespace has been removed
(which really removes security.capability@uid=1000). Writing to
an extended attribute inside a user namespace effectively hides the
extended attribute of the host.

The general framework that is established with these patches can
be applied to other extended attributes as well, such as security.ima
or the 'trusted.' prefix.

Regards,
   Stefan & Serge

---
 v1->v2:
  - removed patch 3 related to security.selinux; no other xattr than
    security.capability is touched
  - reordered call to xattr_userns_name() to be before call to
    xattr_resolve_name() since the string passed into the handler must
    contain the full xattr name so that xattr_full_name() still works
    properly since only the xattr's suffix is passed to the handler
    function but xattr_resolve_name() may be called from that handler

Stefan Berger (1):
  xattr: Enable security.capability in user namespaces

 fs/xattr.c               | 509 +++++++++++++++++++++++++++++++++++++++++++++--
 security/commoncap.c     |  36 +++-
 security/selinux/hooks.c |   9 +-
 3 files changed, 523 insertions(+), 31 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-11 15:05 [PATCH v2] Enable namespaced file capabilities Stefan Berger
@ 2017-07-11 15:05 ` Stefan Berger
  2017-07-11 17:12   ` Serge E. Hallyn
                     ` (6 more replies)
  0 siblings, 7 replies; 76+ messages in thread
From: Stefan Berger @ 2017-07-11 15:05 UTC (permalink / raw)
  To: ebiederm, containers
  Cc: lkp, linux-kernel, zohar, tycho, serge, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey,
	Stefan Berger

From: Stefan Berger <stefanb@linux.vnet.ibm.com>

This patch enables security.capability in user namespaces but also
takes a more general approach to enabling extended attributes in user
namespaces.

The following rules describe the approach using security.foo as a
'user namespace enabled' extended attribute:

Reading of extended attributes:

1a) Reading security.foo from a user namespace will read
    security.foo@uid=<uid> of the parent user namespace instead with uid
    being the mapping of root in that parent user namespace. An
    exception is if root is mapped to uid 0 on the host, and in this case
    we will read security.foo directly.
    --> reading security.foo will read security.foo@uid=1000 for uid
        mapping of root to 1000.

1b) If security.foo@uid=<uid> is not available, the security.foo of the
    parent namespace is tried to be read. This procedure is repeated up to
    the init user namespace. This step only applies for reading of extended
    attributes and provides the same behavior as older system where the
    host's extended attributes applied to user namespaces.

2) All security.foo@uid=<uid> with valid uid mapping in the user namespace
   can be read. The uid within the user namespace will be mapped to the
   corresponding uid on the host and that uid will be used in the name of
   the extended attribute.
   -> reading security.foo@uid=1 will read security.foo@uid=1001 for uid
      mapping of root to 1000, size of at least 2.

   All security.foo@uid=<uid> can be read (by root) on the host with values
   of <uid> also being subject to checking for valid mappings.

3) No other security.foo* can be read.

The same rules for reading apply to writing and removing of user
namespace enabled extended attributes.

When listing extended attributes of a file, only those are presented
to the user namespace that have a valid mapping. Besides that, names
of the extended attributes are adjusted to represent the mapping.
This means that if root is mapped to uid 1000 on the host, the
security.foo@uid=1000 will be listed as security.foo in the user
namespace, security.foo@uid=1001 becomes security.foo@uid=1 and so on.

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
---
 fs/xattr.c               | 509 +++++++++++++++++++++++++++++++++++++++++++++--
 security/commoncap.c     |  36 +++-
 security/selinux/hooks.c |   9 +-
 3 files changed, 523 insertions(+), 31 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 464c94b..eacad9e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -133,20 +133,440 @@ xattr_permission(struct inode *inode, const char *name, int mask)
 	return inode_permission(inode, mask);
 }
 
+/*
+ * A list of extended attributes that are supported in user namespaces
+ */
+static const char *const userns_xattrs[] = {
+	XATTR_NAME_CAPS,
+	NULL
+};
+
+/*
+ * xattrs_is_userns_supported - Check whether an xattr is supported in userns
+ *
+ * @name:   full name of the extended attribute
+ * @prefix: do a prefix match (true) or a full match (false)
+ *
+ * This function returns < 0 if not supported, an index into userns_xattrs[]
+ * otherwise.
+ */
+static int
+xattr_is_userns_supported(const char *name, int prefix)
+{
+	int i;
+
+	if (!name)
+		return -1;
+
+	for (i = 0; userns_xattrs[i]; i++) {
+		if (prefix) {
+			if (!strncmp(userns_xattrs[i], name,
+				     strlen(userns_xattrs[i])))
+				return i;
+		} else {
+			if (!strcmp(userns_xattrs[i], name))
+				return i;
+		}
+	}
+	return -1;
+}
+
+/*
+ * xattr_write_uid - print a string in the format of "%s@uid=%u", which
+ *                   includes a prefix string
+ *
+ * @uid:     the uid
+ * @prefix:  prefix string; may be NULL
+ *
+ * This function returns a buffer with the string, or a NULL pointer in
+ * case of out-of-memory error.
+ */
+static char *
+xattr_write_uid(uid_t uid, const char *prefix)
+{
+	size_t buflen;
+	char *buffer;
+
+	buflen = sizeof("@uid=") - 1 + sizeof("4294967295") - 1 + 1;
+	if (prefix)
+		buflen += strlen(prefix);
+
+	buffer = kmalloc(buflen, GFP_KERNEL);
+	if (!buffer)
+		return NULL;
+
+	if (uid == 0)
+		*buffer = 0;
+	else
+		sprintf(buffer, "%s@uid=%u",
+			(prefix) ? prefix : "",
+			uid);
+
+	return buffer;
+}
+
+/*
+ * xattr_parse_uid_from_kuid - parse string in the format @uid=<uid>; consider
+ *                             user namespaces and check mappings
+ *
+ * @uidstr   : string in the format "@uid=<uid>"
+ * @userns   : the user namespace to consult for uid mappings
+ * @n_uidstr : returned pointer holding the rewritten @uid=<uid> string with
+ *             the uid remapped
+ *
+ * This function returns an error code or 0 in case of success. In case
+ * of success, 'n_uidstr' will hold a valid string.
+ */
+static int
+xattr_parse_uid_from_kuid(const char *uidstr, struct user_namespace *userns,
+			  char **n_uidstr)
+{
+	int n;
+	uid_t muid, p_uid;
+	char d;
+	kuid_t tuid;
+
+	*n_uidstr = NULL;
+
+	n = sscanf(uidstr, "@uid=%u%c", &p_uid, &d);
+	if (n != 1)
+		return -EINVAL;
+
+	/* do we have a mapping of the uid? */
+	tuid = KUIDT_INIT(p_uid);
+	muid = from_kuid(userns, tuid);
+	if (muid == -1)
+		return -ENOENT;
+
+	*n_uidstr = xattr_write_uid(muid, NULL);
+	if (!*n_uidstr)
+		return -ENOMEM;
+
+	return 0;
+}
+
+/*
+ * xattr_parse_uid_make_kuid - parse string in the format @uid=<uid>; consider
+ *                             user namespaces and check mappings
+ *
+ * @uidstr   : string in the format "@uid=<uid>"
+ * @userns   : the user namespace to consult for uid mappings
+ * @N_uidstr : returned pointer holding the rewritten @uid=<uid> string with
+ *             the uid remapped
+ *
+ * This function returns an error code or 0 in case of success. In case
+ * of success, 'n_uidstr' will hold a valid string.
+ */
+static int
+xattr_parse_uid_make_kuid(const char *uidstr, struct user_namespace *userns,
+			  char **n_uidstr)
+{
+	int n;
+	uid_t p_uid;
+	char d;
+	kuid_t tuid;
+
+	*n_uidstr = NULL;
+
+	n = sscanf(uidstr, "@uid=%u%c", &p_uid, &d);
+	if (n != 1)
+		return -EINVAL;
+
+	tuid = make_kuid(userns, p_uid);
+	if (!uid_valid(tuid))
+		return -ENOENT;
+
+	*n_uidstr = xattr_write_uid(__kuid_val(tuid), NULL);
+	if (!*n_uidstr)
+		return -ENOMEM;
+
+	return 0;
+}
+
+/*
+ * xattr_rewrite_userns_xattr - Rewrite and filter an extended attribute
+ *                              considering user namespace uid mappings and
+ *                              user namespace support extended attributes
+ *
+ * @name: full name of the extended attribute
+ *
+ * This function returns NULL if the name is to be filtered. Otherwise it can
+ * return the input buffer or a new buffer that the caller needs to free. The
+ * new buffer contains a rewritten extended attribute whose string length may
+ * exceed that of the given name.
+ */
+static char *
+xattr_rewrite_userns_xattr(char *name)
+{
+	int idx, error;
+	size_t len = 0, buflen;
+	char *buffer, *n_uidstr;
+
+	/* prefix-match name against supported attributes */
+	idx = xattr_is_userns_supported(name, true);
+	if (idx < 0) {
+		/* only rewrite those in userns_xattr[*] */
+		return name;
+	}
+
+	/* exact match ? */
+	len = strlen(userns_xattrs[idx]);
+	if (name[len] == 0)
+		return NULL;
+
+	/*
+	 * We must have a name[len] == '@'.
+	 */
+	error = xattr_parse_uid_from_kuid(&name[len], current_user_ns(),
+					  &n_uidstr);
+	if (error)
+		return NULL;
+
+	buflen = len + strlen(n_uidstr) + 1;
+	buffer = kmalloc(buflen, GFP_KERNEL);
+	if (!buffer) {
+		kfree(n_uidstr);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	name[len] = 0;
+
+	snprintf(buffer, buflen, "%s%s", name, n_uidstr);
+
+	name[len] = '@';
+
+	kfree(n_uidstr);
+
+	return buffer;
+}
+
+/*
+ * xattr_list_contains - check whether an xattr list already contains a needle
+ *
+ * @list    : 0-byte separated strings
+ * @listlen : length of the list
+ * @needle  : the needle to search for
+ */
+static int
+xattr_list_contains(const char *list, size_t listlen, const char *needle)
+{
+	size_t o = 0;
+
+	while (o < listlen) {
+		if (!strcmp(&list[o], needle))
+			return true;
+		o += strlen(&list[o]) + 1;
+	}
+	return false;
+}
+
+/*
+ * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
+ *                             or determine needed size for attribute list
+ *                             in case size == 0
+ *
+ * In a user namespace we do not present all extended attributes to the
+ * user. We filter out those that are in the list of userns supported xattr.
+ * Besides that we filter out those with @uid=<uid> when there is no mapping
+ * for that uid in the current user namespace.
+ *
+ * @list:        list of 0-byte separated xattr names
+ * @size:        the size of the list; may be 0 to determine needed list size
+ * @list_maxlen: allocated buffer size of list
+ */
+static ssize_t
+xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
+{
+	char *nlist = NULL;
+	size_t s_off, len, nlen;
+	ssize_t d_off;
+	char *name, *newname;
+
+	if (!list || size < 0 || current_user_ns() == &init_user_ns)
+		return size;
+
+	if (size) {
+		nlist = kmalloc(list_maxlen, GFP_KERNEL);
+		if (!nlist)
+			return -ENOMEM;
+	}
+
+	s_off = d_off = 0;
+	while (s_off < size || size == 0) {
+		name = &list[s_off];
+
+		len = strlen(name);
+		if (!len)
+			break;
+
+		if (xattr_is_userns_supported(name, false) >= 0)
+			newname = name;
+		else {
+			newname = xattr_rewrite_userns_xattr(name);
+			if (IS_ERR(newname)) {
+				d_off = PTR_ERR(newname);
+				goto out_free;
+			}
+		}
+		if (newname && !xattr_list_contains(nlist, d_off, newname)) {
+			nlen = strlen(newname);
+
+			if (nlist) {
+				if (nlen + 1 > list_maxlen)
+					break;
+				strcpy(&nlist[d_off], newname);
+			}
+
+			d_off += nlen + 1;
+			if (newname != name)
+				kfree(newname);
+		}
+		s_off += len + 1;
+	}
+	if (nlist)
+		memcpy(list, nlist, d_off);
+out_free:
+	kfree(nlist);
+
+	return d_off;
+}
+
+/*
+ * xattr_userns_name - modify the name of a user namespace supported
+ *                     extended attribute
+ *
+ * In a user namespace we prevent read/write accesses to the host's
+ * security.foo to protect these extended attributes.
+ *
+ * Reading:
+ * 1a) Reading security.foo from a user namespace will read
+ *     security.foo@uid=<uid> of the parent user namespace instead with uid
+ *     being the mapping of root in that parent user namespace. An
+ *     exception is if root is mapped to uid 0 on the host, and in this case
+ *     we will read security.foo directly.
+ *     -> reading security.foo will read security.foo@uid=1000 for a uid
+ *        mapping of root to 1000.
+ *
+ * 1b) If security.foo@uid=<uid> is not available, the security.foo of the
+ *     parent namespace is tried to be read. This procedure is repeated up to
+ *     the init user namespace. This step only applies for reading of extended
+ *     attributes and provides the same behavior as older systems where the
+ *     host's extended attributes applied to user namespaces.
+ *
+ * 2) All security.foo@uid=<uid> with valid uid mappings in the user namespace
+ *    an be read. The uid within the user namespace will be mapped to the
+ *    corresponding uid on the host and that uid will be used in the name of
+ *    the extended attribute.
+ *    -> reading security.foo@uid=1 will read security.foo@uid=1001 for a uid
+ *       mapping of root to 1000, size of at least 2.
+ *
+ *    All security.foo@uid=<uid> can be read (by root) on the host with values
+ *    of <uid> also being subject to checking for valid mappings.
+ *
+ * 3) No other security.foo* can be read.
+ *
+ * Writing and removing:
+ * The same rules for reading apply to writing and removing, except for 1b).
+ *
+ * This function returns a buffer with either the original name or the
+ * user namespace adjusted name of the extended attribute.
+ *
+ * @name:     the full name of the extended attribute, e.g. security.foo
+ */
+char *
+xattr_userns_name(const char *name, struct user_namespace *userns)
+{
+	size_t buflen;
+	char *buffer, *n_uidstr;
+	kuid_t root_uid = make_kuid(userns, 0);
+	int idx, error;
+	size_t len;
+
+	/* only security.foo will be changed here - prefix match here */
+	idx = xattr_is_userns_supported(name, true);
+	if (idx < 0)
+		goto out_copy;
+
+	/* read security.foo? --> read security.foo@uid=<uid> instead */
+	len = strlen(userns_xattrs[idx]);
+	if (name[len] == 0) {
+		/*
+		 * init_user_ns or userns with root mapped to uid 0
+		 * may read security.foo directly
+		 */
+		if (userns == &init_user_ns ||
+		    __kuid_val(root_uid) == 0)
+			goto out_copy;
+
+		if (!uid_valid(root_uid))
+			return ERR_PTR(-EINVAL);
+
+		buffer = xattr_write_uid(__kuid_val(root_uid), name);
+		if (!buffer)
+			return ERR_PTR(-ENOMEM);
+
+		return buffer;
+	}
+
+	/*
+	 * We must have name[len] == '@'.
+	 */
+	error = xattr_parse_uid_make_kuid(&name[len],
+					  userns,
+					  &n_uidstr);
+	if (error)
+		return ERR_PTR(error);
+
+	/* name[len] == '@' */
+	buflen = len + strlen(n_uidstr) + 1;
+	buffer = kmalloc(buflen, GFP_KERNEL);
+	if (!buffer) {
+		kfree(n_uidstr);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	snprintf(buffer, len + 1, "%s", name);
+	snprintf(&buffer[len], buflen - len, "%s", n_uidstr);
+	kfree(n_uidstr);
+
+	return buffer;
+
+out_copy:
+	buffer = kstrdup(name, GFP_KERNEL);
+	if (!buffer)
+		return ERR_PTR(-ENOMEM);
+
+	return buffer;
+}
+
 int
 __vfs_setxattr(struct dentry *dentry, struct inode *inode, const char *name,
 	       const void *value, size_t size, int flags)
 {
 	const struct xattr_handler *handler;
+	char *newname;
+	int ret;
 
+	newname = xattr_userns_name(name, current_user_ns());
+	if (IS_ERR(newname))
+		return PTR_ERR(newname);
+	name = newname;
 	handler = xattr_resolve_name(inode, &name);
-	if (IS_ERR(handler))
-		return PTR_ERR(handler);
-	if (!handler->set)
-		return -EOPNOTSUPP;
+	if (IS_ERR(handler)) {
+		ret = PTR_ERR(handler);
+		goto out;
+	}
+	if (!handler->set) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
 	if (size == 0)
 		value = "";  /* empty EA, do not remove */
-	return handler->set(handler, dentry, inode, name, value, size, flags);
+	ret = handler->set(handler, dentry, inode, name, value, size, flags);
+
+out:
+	kfree(newname);
+	return ret;
 }
 EXPORT_SYMBOL(__vfs_setxattr);
 
@@ -301,14 +721,39 @@ ssize_t
 __vfs_getxattr(struct dentry *dentry, struct inode *inode, const char *name,
 	       void *value, size_t size)
 {
-	const struct xattr_handler *handler;
+	const struct xattr_handler *handler = NULL;
+	char *newname =  NULL;
+	int ret, userns_supt_xattr;
+	struct user_namespace *userns = current_user_ns();
+
+	userns_supt_xattr = (xattr_is_userns_supported(name, false) >= 0);
+
+	do {
+		kfree(newname);
+
+		newname = xattr_userns_name(name, userns);
+		if (IS_ERR(newname))
+			return PTR_ERR(newname);
+
+		if (!handler) {
+			name = newname;
+			handler = xattr_resolve_name(inode, &name);
+			if (IS_ERR(handler)) {
+				ret = PTR_ERR(handler);
+				goto out;
+			}
+			if (!handler->get) {
+				ret = -EOPNOTSUPP;
+				goto out;
+			}
+		}
+		ret = handler->get(handler, dentry, inode, name, value, size);
+		userns = userns->parent;
+	} while ((ret == -ENODATA) && userns && userns_supt_xattr);
 
-	handler = xattr_resolve_name(inode, &name);
-	if (IS_ERR(handler))
-		return PTR_ERR(handler);
-	if (!handler->get)
-		return -EOPNOTSUPP;
-	return handler->get(handler, dentry, inode, name, value, size);
+out:
+	kfree(newname);
+	return ret;
 }
 EXPORT_SYMBOL(__vfs_getxattr);
 
@@ -328,8 +773,16 @@ vfs_getxattr(struct dentry *dentry, const char *name, void *value, size_t size)
 
 	if (!strncmp(name, XATTR_SECURITY_PREFIX,
 				XATTR_SECURITY_PREFIX_LEN)) {
-		const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
-		int ret = xattr_getsecurity(inode, suffix, value, size);
+		int ret;
+		const char *suffix;
+		char *newname = xattr_userns_name(name, current_user_ns());
+		if (IS_ERR(newname))
+			return PTR_ERR(newname);
+
+		suffix = newname + XATTR_SECURITY_PREFIX_LEN;
+
+		ret = xattr_getsecurity(inode, suffix, value, size);
+		kfree(newname);
 		/*
 		 * Only overwrite the return value if a security module
 		 * is actually active.
@@ -360,6 +813,9 @@ vfs_listxattr(struct dentry *dentry, char *list, size_t size)
 		if (size && error > size)
 			error = -ERANGE;
 	}
+	if (error > 0)
+		error = xattr_list_userns_rewrite(list, error, size);
+
 	return error;
 }
 EXPORT_SYMBOL_GPL(vfs_listxattr);
@@ -369,13 +825,28 @@ __vfs_removexattr(struct dentry *dentry, const char *name)
 {
 	struct inode *inode = d_inode(dentry);
 	const struct xattr_handler *handler;
+	char *newname;
+	int ret;
 
+	newname = xattr_userns_name(name, current_user_ns());
+	if (IS_ERR(newname))
+		return PTR_ERR(newname);
+	name = newname;
 	handler = xattr_resolve_name(inode, &name);
-	if (IS_ERR(handler))
-		return PTR_ERR(handler);
-	if (!handler->set)
-		return -EOPNOTSUPP;
-	return handler->set(handler, dentry, inode, name, NULL, 0, XATTR_REPLACE);
+	if (IS_ERR(handler)) {
+		ret = PTR_ERR(handler);
+		goto out;
+	}
+	if (!handler->set) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+	ret = handler->set(handler, dentry, inode, name, NULL, 0, XATTR_REPLACE);
+
+out:
+	kfree(newname);
+
+	return ret;
 }
 EXPORT_SYMBOL(__vfs_removexattr);
 
diff --git a/security/commoncap.c b/security/commoncap.c
index 7abebd7..c842690 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -660,15 +660,23 @@ int cap_bprm_secureexec(struct linux_binprm *bprm)
 int cap_inode_setxattr(struct dentry *dentry, const char *name,
 		       const void *value, size_t size, int flags)
 {
-	if (!strcmp(name, XATTR_NAME_CAPS)) {
-		if (!capable(CAP_SETFCAP))
+	if (strncmp(name, XATTR_SECURITY_PREFIX,
+		    sizeof(XATTR_SECURITY_PREFIX) - 1) != 0)
+		return 0;
+
+	if (strncmp(name, XATTR_NAME_CAPS,
+		    sizeof(XATTR_NAME_CAPS) - 1) == 0) {
+		struct inode *inode = d_backing_inode(dentry);
+
+		if (!inode)
+			return -EINVAL;
+		if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
 			return -EPERM;
+
 		return 0;
 	}
 
-	if (!strncmp(name, XATTR_SECURITY_PREFIX,
-		     sizeof(XATTR_SECURITY_PREFIX) - 1) &&
-	    !capable(CAP_SYS_ADMIN))
+	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 	return 0;
 }
@@ -686,15 +694,23 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
  */
 int cap_inode_removexattr(struct dentry *dentry, const char *name)
 {
-	if (!strcmp(name, XATTR_NAME_CAPS)) {
-		if (!capable(CAP_SETFCAP))
+	if (strncmp(name, XATTR_SECURITY_PREFIX,
+		    sizeof(XATTR_SECURITY_PREFIX) - 1) != 0)
+		return 0;
+
+	if (strncmp(name, XATTR_NAME_CAPS,
+		    sizeof(XATTR_NAME_CAPS) - 1) == 0) {
+		struct inode *inode = d_backing_inode(dentry);
+
+		if (!inode)
+			return -EINVAL;
+		if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
 			return -EPERM;
+
 		return 0;
 	}
 
-	if (!strncmp(name, XATTR_SECURITY_PREFIX,
-		     sizeof(XATTR_SECURITY_PREFIX) - 1) &&
-	    !capable(CAP_SYS_ADMIN))
+	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 	return 0;
 }
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 819fd68..702c225 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3091,8 +3091,13 @@ static int selinux_inode_setotherxattr(struct dentry *dentry, const char *name)
 
 	if (!strncmp(name, XATTR_SECURITY_PREFIX,
 		     sizeof XATTR_SECURITY_PREFIX - 1)) {
-		if (!strcmp(name, XATTR_NAME_CAPS)) {
-			if (!capable(CAP_SETFCAP))
+		if (!strncmp(name, XATTR_NAME_CAPS,
+			     sizeof(XATTR_NAME_CAPS) - 1)) {
+			struct inode *inode = d_backing_inode(dentry);
+
+			if (!inode)
+				return -EINVAL;
+			if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
 				return -EPERM;
 		} else if (!capable(CAP_SYS_ADMIN)) {
 			/* A different attribute in the security namespace.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-11 15:05 ` [PATCH v2] xattr: Enable security.capability in user namespaces Stefan Berger
@ 2017-07-11 17:12   ` Serge E. Hallyn
  2017-07-12  0:15     ` Stefan Berger
  2017-07-12  3:45   ` Serge E. Hallyn
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-11 17:12 UTC (permalink / raw)
  To: Stefan Berger
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey, Stefan Berger

Quoting Stefan Berger (Stefan Bergerstefanb@linux.vnet.ibm.com):
> er.kernel.org>
> X-Mailing-List: linux-kernel@vger.kernel.org
> Content-Length: 19839
> Lines: 700
> X-UID: 24770                                                 
> Status: RO
> 
> From: Stefan Berger <stefanb@linux.vnet.ibm.com>
> 
> This patch enables security.capability in user namespaces but also
> takes a more general approach to enabling extended attributes in user
> namespaces.
> 
> The following rules describe the approach using security.foo as a
> 'user namespace enabled' extended attribute:
> 
> Reading of extended attributes:
> 
> 1a) Reading security.foo from a user namespace will read
>     security.foo@uid=<uid> of the parent user namespace instead with uid
>     being the mapping of root in that parent user namespace. An
>     exception is if root is mapped to uid 0 on the host, and in this case
>     we will read security.foo directly.
>     --> reading security.foo will read security.foo@uid=1000 for uid
>         mapping of root to 1000.
> 
> 1b) If security.foo@uid=<uid> is not available, the security.foo of the
>     parent namespace is tried to be read. This procedure is repeated up to
>     the init user namespace. This step only applies for reading of extended
>     attributes and provides the same behavior as older system where the
>     host's extended attributes applied to user namespaces.
> 
> 2) All security.foo@uid=<uid> with valid uid mapping in the user namespace
>    can be read. The uid within the user namespace will be mapped to the
>    corresponding uid on the host and that uid will be used in the name of
>    the extended attribute.
>    -> reading security.foo@uid=1 will read security.foo@uid=1001 for uid
>       mapping of root to 1000, size of at least 2.
> 
>    All security.foo@uid=<uid> can be read (by root) on the host with values
>    of <uid> also being subject to checking for valid mappings.
> 
> 3) No other security.foo* can be read.
> 
> The same rules for reading apply to writing and removing of user
> namespace enabled extended attributes.
> 
> When listing extended attributes of a file, only those are presented
> to the user namespace that have a valid mapping. Besides that, names
> of the extended attributes are adjusted to represent the mapping.
> This means that if root is mapped to uid 1000 on the host, the
> security.foo@uid=1000 will be listed as security.foo in the user
> namespace, security.foo@uid=1001 becomes security.foo@uid=1 and so on.
> 
> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> Signed-off-by: Serge Hallyn <serge@hallyn.com>
> Reviewed-by: Serge Hallyn <serge@hallyn.com>
> ---
>  fs/xattr.c               | 509 +++++++++++++++++++++++++++++++++++++++++++++--
>  security/commoncap.c     |  36 +++-
>  security/selinux/hooks.c |   9 +-
>  3 files changed, 523 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 464c94b..eacad9e 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -133,20 +133,440 @@ xattr_permission(struct inode *inode, const char *name, int mask)
>  	return inode_permission(inode, mask);
>  }
>  
> +/*
> + * A list of extended attributes that are supported in user namespaces
> + */
> +static const char *const userns_xattrs[] = {
> +	XATTR_NAME_CAPS,
> +	NULL
> +};
> +
> +/*
> + * xattrs_is_userns_supported - Check whether an xattr is supported in userns
> + *
> + * @name:   full name of the extended attribute
> + * @prefix: do a prefix match (true) or a full match (false)
> + *
> + * This function returns < 0 if not supported, an index into userns_xattrs[]
> + * otherwise.
> + */
> +static int
> +xattr_is_userns_supported(const char *name, int prefix)
> +{
> +	int i;
> +
> +	if (!name)
> +		return -1;
> +
> +	for (i = 0; userns_xattrs[i]; i++) {
> +		if (prefix) {
> +			if (!strncmp(userns_xattrs[i], name,
> +				     strlen(userns_xattrs[i])))
> +				return i;

I think you here need to also check that the next char is either
'\0' or '.' (or maybe '@')

> +		} else {
> +			if (!strcmp(userns_xattrs[i], name))
> +				return i;
> +		}
> +	}
> +	return -1;
> +}
> +
> +/*
> + * xattr_write_uid - print a string in the format of "%s@uid=%u", which
> + *                   includes a prefix string
> + *
> + * @uid:     the uid
> + * @prefix:  prefix string; may be NULL
> + *
> + * This function returns a buffer with the string, or a NULL pointer in
> + * case of out-of-memory error.
> + */
> +static char *
> +xattr_write_uid(uid_t uid, const char *prefix)
> +{
> +	size_t buflen;
> +	char *buffer;
> +
> +	buflen = sizeof("@uid=") - 1 + sizeof("4294967295") - 1 + 1;
> +	if (prefix)
> +		buflen += strlen(prefix);
> +
> +	buffer = kmalloc(buflen, GFP_KERNEL);
> +	if (!buffer)
> +		return NULL;
> +
> +	if (uid == 0)
> +		*buffer = 0;

Do you need to print out the prefix here?

> +	else
> +		sprintf(buffer, "%s@uid=%u",
> +			(prefix) ? prefix : "",
> +			uid);
> +
> +	return buffer;
> +}

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-11 17:12   ` Serge E. Hallyn
@ 2017-07-12  0:15     ` Stefan Berger
  2017-07-12  0:47       ` Serge E. Hallyn
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Berger @ 2017-07-12  0:15 UTC (permalink / raw)
  To: Serge E. Hallyn, Stefan Berger
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

On 07/11/2017 01:12 PM, Serge E. Hallyn wrote:
> Quoting Stefan Berger (Stefan Bergerstefanb@linux.vnet.ibm.com):
>> er.kernel.org>
>> X-Mailing-List: linux-kernel@vger.kernel.org
>> Content-Length: 19839
>> Lines: 700
>> X-UID: 24770
>> Status: RO
>>
>> From: Stefan Berger <stefanb@linux.vnet.ibm.com>
>>
>> This patch enables security.capability in user namespaces but also
>> takes a more general approach to enabling extended attributes in user
>> namespaces.
>>
>> The following rules describe the approach using security.foo as a
>> 'user namespace enabled' extended attribute:
>>
>> Reading of extended attributes:
>>
>> 1a) Reading security.foo from a user namespace will read
>>      security.foo@uid=<uid> of the parent user namespace instead with uid
>>      being the mapping of root in that parent user namespace. An
>>      exception is if root is mapped to uid 0 on the host, and in this case
>>      we will read security.foo directly.
>>      --> reading security.foo will read security.foo@uid=1000 for uid
>>          mapping of root to 1000.
>>
>> 1b) If security.foo@uid=<uid> is not available, the security.foo of the
>>      parent namespace is tried to be read. This procedure is repeated up to
>>      the init user namespace. This step only applies for reading of extended
>>      attributes and provides the same behavior as older system where the
>>      host's extended attributes applied to user namespaces.
>>
>> 2) All security.foo@uid=<uid> with valid uid mapping in the user namespace
>>     can be read. The uid within the user namespace will be mapped to the
>>     corresponding uid on the host and that uid will be used in the name of
>>     the extended attribute.
>>     -> reading security.foo@uid=1 will read security.foo@uid=1001 for uid
>>        mapping of root to 1000, size of at least 2.
>>
>>     All security.foo@uid=<uid> can be read (by root) on the host with values
>>     of <uid> also being subject to checking for valid mappings.
>>
>> 3) No other security.foo* can be read.
>>
>> The same rules for reading apply to writing and removing of user
>> namespace enabled extended attributes.
>>
>> When listing extended attributes of a file, only those are presented
>> to the user namespace that have a valid mapping. Besides that, names
>> of the extended attributes are adjusted to represent the mapping.
>> This means that if root is mapped to uid 1000 on the host, the
>> security.foo@uid=1000 will be listed as security.foo in the user
>> namespace, security.foo@uid=1001 becomes security.foo@uid=1 and so on.
>>
>> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
>> Signed-off-by: Serge Hallyn <serge@hallyn.com>
>> Reviewed-by: Serge Hallyn <serge@hallyn.com>
>> ---
>>   fs/xattr.c               | 509 +++++++++++++++++++++++++++++++++++++++++++++--
>>   security/commoncap.c     |  36 +++-
>>   security/selinux/hooks.c |   9 +-
>>   3 files changed, 523 insertions(+), 31 deletions(-)
>>
>> diff --git a/fs/xattr.c b/fs/xattr.c
>> index 464c94b..eacad9e 100644
>> --- a/fs/xattr.c
>> +++ b/fs/xattr.c
>> @@ -133,20 +133,440 @@ xattr_permission(struct inode *inode, const char *name, int mask)
>>   	return inode_permission(inode, mask);
>>   }
>>   
>> +/*
>> + * A list of extended attributes that are supported in user namespaces
>> + */
>> +static const char *const userns_xattrs[] = {
>> +	XATTR_NAME_CAPS,
>> +	NULL
>> +};
>> +
>> +/*
>> + * xattrs_is_userns_supported - Check whether an xattr is supported in userns
>> + *
>> + * @name:   full name of the extended attribute
>> + * @prefix: do a prefix match (true) or a full match (false)
>> + *
>> + * This function returns < 0 if not supported, an index into userns_xattrs[]
>> + * otherwise.
>> + */
>> +static int
>> +xattr_is_userns_supported(const char *name, int prefix)
>> +{
>> +	int i;
>> +
>> +	if (!name)
>> +		return -1;
>> +
>> +	for (i = 0; userns_xattrs[i]; i++) {
>> +		if (prefix) {
>> +			if (!strncmp(userns_xattrs[i], name,
>> +				     strlen(userns_xattrs[i])))
>> +				return i;
> I think you here need to also check that the next char is either
> '\0' or '.' (or maybe '@')

I have the checks for '@' and '\0' done by the caller. With the current 
support of only security.capability I don't think we need to check for '.'.

>
>> +		} else {
>> +			if (!strcmp(userns_xattrs[i], name))
>> +				return i;
>> +		}
>> +	}
>> +	return -1;
>> +}
>> +
>> +/*
>> + * xattr_write_uid - print a string in the format of "%s@uid=%u", which
>> + *                   includes a prefix string
>> + *
>> + * @uid:     the uid
>> + * @prefix:  prefix string; may be NULL
>> + *
>> + * This function returns a buffer with the string, or a NULL pointer in
>> + * case of out-of-memory error.
>> + */
>> +static char *
>> +xattr_write_uid(uid_t uid, const char *prefix)
>> +{
>> +	size_t buflen;
>> +	char *buffer;
>> +
>> +	buflen = sizeof("@uid=") - 1 + sizeof("4294967295") - 1 + 1;
>> +	if (prefix)
>> +		buflen += strlen(prefix);
>> +
>> +	buffer = kmalloc(buflen, GFP_KERNEL);
>> +	if (!buffer)
>> +		return NULL;
>> +
>> +	if (uid == 0)
>> +		*buffer = 0;
> Do you need to print out the prefix here?

Right. Fixed.


>
>> +	else
>> +		sprintf(buffer, "%s@uid=%u",
>> +			(prefix) ? prefix : "",
>> +			uid);
>> +
>> +	return buffer;
>> +}


Thanks.

    Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-12  0:15     ` Stefan Berger
@ 2017-07-12  0:47       ` Serge E. Hallyn
  0 siblings, 0 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-12  0:47 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Serge E. Hallyn, Stefan Berger, ebiederm, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> On 07/11/2017 01:12 PM, Serge E. Hallyn wrote:
> >>diff --git a/fs/xattr.c b/fs/xattr.c
> >>index 464c94b..eacad9e 100644
> >>--- a/fs/xattr.c
> >>+++ b/fs/xattr.c
> >>@@ -133,20 +133,440 @@ xattr_permission(struct inode *inode, const char *name, int mask)
> >>  	return inode_permission(inode, mask);
> >>  }
> >>+/*
> >>+ * A list of extended attributes that are supported in user namespaces
> >>+ */
> >>+static const char *const userns_xattrs[] = {
> >>+	XATTR_NAME_CAPS,
> >>+	NULL
> >>+};
> >>+
> >>+/*
> >>+ * xattrs_is_userns_supported - Check whether an xattr is supported in userns
> >>+ *
> >>+ * @name:   full name of the extended attribute
> >>+ * @prefix: do a prefix match (true) or a full match (false)
> >>+ *
> >>+ * This function returns < 0 if not supported, an index into userns_xattrs[]
> >>+ * otherwise.
> >>+ */
> >>+static int
> >>+xattr_is_userns_supported(const char *name, int prefix)
> >>+{
> >>+	int i;
> >>+
> >>+	if (!name)
> >>+		return -1;
> >>+
> >>+	for (i = 0; userns_xattrs[i]; i++) {
> >>+		if (prefix) {
> >>+			if (!strncmp(userns_xattrs[i], name,
> >>+				     strlen(userns_xattrs[i])))
> >>+				return i;
> >I think you here need to also check that the next char is either
> >'\0' or '.' (or maybe '@')
> 
> I have the checks for '@' and '\0' done by the caller. With the
> current support of only security.capability I don't think we need to
> check for '.'.

Ah - ok, thanks.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-11 15:05 ` [PATCH v2] xattr: Enable security.capability in user namespaces Stefan Berger
  2017-07-11 17:12   ` Serge E. Hallyn
@ 2017-07-12  3:45   ` Serge E. Hallyn
  2017-07-12 11:35     ` Stefan Berger
  2017-07-12  7:59   ` James Morris
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-12  3:45 UTC (permalink / raw)
  To: Stefan Berger
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey, Stefan Berger

Quoting Stefan Berger (Stefan Bergerstefanb@linux.vnet.ibm.com):
> +/*
> + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
> + *                             or determine needed size for attribute list
> + *                             in case size == 0
> + *
> + * In a user namespace we do not present all extended attributes to the
> + * user. We filter out those that are in the list of userns supported xattr.
> + * Besides that we filter out those with @uid=<uid> when there is no mapping
> + * for that uid in the current user namespace.
> + *
> + * @list:        list of 0-byte separated xattr names
> + * @size:        the size of the list; may be 0 to determine needed list size
> + * @list_maxlen: allocated buffer size of list
> + */
> +static ssize_t
> +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
> +{
> +	char *nlist = NULL;
> +	size_t s_off, len, nlen;
> +	ssize_t d_off;
> +	char *name, *newname;
> +
> +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
> +		return size;
> +
> +	if (size) {
> +		nlist = kmalloc(list_maxlen, GFP_KERNEL);
> +		if (!nlist)
> +			return -ENOMEM;
> +	}
> +
> +	s_off = d_off = 0;
> +	while (s_off < size || size == 0) {
> +		name = &list[s_off];
> +
> +		len = strlen(name);
> +		if (!len)
> +			break;
> +
> +		if (xattr_is_userns_supported(name, false) >= 0)
> +			newname = name;
> +		else {
> +			newname = xattr_rewrite_userns_xattr(name);

Why are you doing this here?  If we get here it means that
xattr_is_userns_supported() returned < 0, meaning name is
not userns-supported.  So xattr_rewrite_userns_xattr() will
just return name.  Am I missing something?

> +			if (IS_ERR(newname)) {
> +				d_off = PTR_ERR(newname);
> +				goto out_free;
> +			}
> +		}
> +		if (newname && !xattr_list_contains(nlist, d_off, newname)) {

Now here, if name was recalculated to @newname, and @newname is
found in the nlist, that should raise an error right?  Something
fishy is going on?

> +			nlen = strlen(newname);
> +
> +			if (nlist) {
> +				if (nlen + 1 > list_maxlen)

d_off needs to be set to -ERANGE here.

> +					break;
> +				strcpy(&nlist[d_off], newname);
> +			}
> +
> +			d_off += nlen + 1;
> +			if (newname != name)
> +				kfree(newname);
> +		}
> +		s_off += len + 1;
> +	}
> +	if (nlist)
> +		memcpy(list, nlist, d_off);
> +out_free:
> +	kfree(nlist);
> +
> +	return d_off;
> +}

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-11 15:05 ` [PATCH v2] xattr: Enable security.capability in user namespaces Stefan Berger
  2017-07-11 17:12   ` Serge E. Hallyn
  2017-07-12  3:45   ` Serge E. Hallyn
@ 2017-07-12  7:59   ` James Morris
  2017-07-12 13:25   ` Eric W. Biederman
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 76+ messages in thread
From: James Morris @ 2017-07-12  7:59 UTC (permalink / raw)
  To: Stefan Berger
  Cc: ebiederm, containers, lkp, linux-kernel, James.Bottomley,
	linux-security-module, casey, zohar

On Tue, 11 Jul 2017, Stefan Berger wrote:

> +	buflen = sizeof("@uid=") - 1 + sizeof("4294967295") - 1 + 1;

Why not strlen() here?

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-12  3:45   ` Serge E. Hallyn
@ 2017-07-12 11:35     ` Stefan Berger
  2017-07-12 17:32       ` Serge E. Hallyn
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Berger @ 2017-07-12 11:35 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

On 07/11/2017 11:45 PM, Serge E. Hallyn wrote:
> Quoting Stefan Berger (Stefan Bergerstefanb@linux.vnet.ibm.com):
>> +/*
>> + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
>> + *                             or determine needed size for attribute list
>> + *                             in case size == 0
>> + *
>> + * In a user namespace we do not present all extended attributes to the
>> + * user. We filter out those that are in the list of userns supported xattr.
>> + * Besides that we filter out those with @uid=<uid> when there is no mapping
>> + * for that uid in the current user namespace.
>> + *
>> + * @list:        list of 0-byte separated xattr names
>> + * @size:        the size of the list; may be 0 to determine needed list size
>> + * @list_maxlen: allocated buffer size of list
>> + */
>> +static ssize_t
>> +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
>> +{
>> +	char *nlist = NULL;
>> +	size_t s_off, len, nlen;
>> +	ssize_t d_off;
>> +	char *name, *newname;
>> +
>> +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
>> +		return size;
>> +
>> +	if (size) {
>> +		nlist = kmalloc(list_maxlen, GFP_KERNEL);
>> +		if (!nlist)
>> +			return -ENOMEM;
>> +	}
>> +
>> +	s_off = d_off = 0;
>> +	while (s_off < size || size == 0) {
>> +		name = &list[s_off];
>> +
>> +		len = strlen(name);
>> +		if (!len)
>> +			break;
>> +
>> +		if (xattr_is_userns_supported(name, false) >= 0)
>> +			newname = name;
>> +		else {
>> +			newname = xattr_rewrite_userns_xattr(name);
> Why are you doing this here?  If we get here it means that
> xattr_is_userns_supported() returned < 0, meaning name is
> not userns-supported.  So xattr_rewrite_userns_xattr() will
> just return name.  Am I missing something?

xattr_is_userns_support(name, false) does a _full string match_ rather 
than a prefix match and will only return >= 0 for security.capability. 
This case handles the hosts's security.capability which  'shines 
through' for read and needs to be listed. Only in this case we set 
newname=name.

In the else branch we handle security.capability@uid=1000 and rewrite 
that to security.capability for root mapping to uid=1000.

>
>> +			if (IS_ERR(newname)) {
>> +				d_off = PTR_ERR(newname);
>> +				goto out_free;
>> +			}
>> +		}
>> +		if (newname && !xattr_list_contains(nlist, d_off, newname)) {
> Now here, if name was recalculated to @newname, and @newname is
> found in the nlist, that should raise an error right?  Something
> fishy is going on?

If security.capability is set on a file but the container doesn't have 
security.capability@uid=1000, we still need to list the former here. 
However, we end up with duplicates if security.capability is there and 
security.capability@uid=1000 is also there and root is mapped to 
uid=1000. Both would be shown as security.capability inside the 
container. In this case we need to filter.

I think the code is correct. More problematic is a memory leak in the 
error case. Will fix that.

>
>> +			nlen = strlen(newname);
>> +
>> +			if (nlist) {
>> +				if (nlen + 1 > list_maxlen)
> d_off needs to be set to -ERANGE here.

Fixed.

>
>> +					break;
>> +				strcpy(&nlist[d_off], newname);
>> +			}
>> +
>> +			d_off += nlen + 1;
>> +			if (newname != name)
>> +				kfree(newname);
>> +		}
>> +		s_off += len + 1;
>> +	}
>> +	if (nlist)
>> +		memcpy(list, nlist, d_off);
>> +out_free:
>> +	kfree(nlist);
>> +
>> +	return d_off;
>> +}

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-11 15:05 ` [PATCH v2] xattr: Enable security.capability in user namespaces Stefan Berger
                     ` (2 preceding siblings ...)
  2017-07-12  7:59   ` James Morris
@ 2017-07-12 13:25   ` Eric W. Biederman
  2017-07-12 17:03     ` Serge E. Hallyn
  2017-07-12 17:53   ` Vivek Goyal
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-12 13:25 UTC (permalink / raw)
  To: Stefan Berger
  Cc: containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey, Stefan Berger

Stefan Berger <"Stefan Bergerstefanb"@linux.vnet.ibm.com> writes:

> From: Stefan Berger <stefanb@linux.vnet.ibm.com>
>
> This patch enables security.capability in user namespaces but also
> takes a more general approach to enabling extended attributes in user
> namespaces.
>
> The following rules describe the approach using security.foo as a
> 'user namespace enabled' extended attribute:
>
> Reading of extended attributes:
>
> 1a) Reading security.foo from a user namespace will read
>     security.foo@uid=<uid> of the parent user namespace instead with uid
>     being the mapping of root in that parent user namespace. An
>     exception is if root is mapped to uid 0 on the host, and in this case
>     we will read security.foo directly.
>     --> reading security.foo will read security.foo@uid=1000 for uid
>         mapping of root to 1000.
>
> 1b) If security.foo@uid=<uid> is not available, the security.foo of the
>     parent namespace is tried to be read. This procedure is repeated up to
>     the init user namespace. This step only applies for reading of extended
>     attributes and provides the same behavior as older system where the
>     host's extended attributes applied to user namespaces.
>
> 2) All security.foo@uid=<uid> with valid uid mapping in the user namespace
>    can be read. The uid within the user namespace will be mapped to the
>    corresponding uid on the host and that uid will be used in the name of
>    the extended attribute.
>    -> reading security.foo@uid=1 will read security.foo@uid=1001 for uid
>       mapping of root to 1000, size of at least 2.
>
>    All security.foo@uid=<uid> can be read (by root) on the host with values
>    of <uid> also being subject to checking for valid mappings.
>
> 3) No other security.foo* can be read.
>
> The same rules for reading apply to writing and removing of user
> namespace enabled extended attributes.
>
> When listing extended attributes of a file, only those are presented
> to the user namespace that have a valid mapping. Besides that, names
> of the extended attributes are adjusted to represent the mapping.
> This means that if root is mapped to uid 1000 on the host, the
> security.foo@uid=1000 will be listed as security.foo in the user
> namespace, security.foo@uid=1001 becomes security.foo@uid=1 and so on.
>
> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> Signed-off-by: Serge Hallyn <serge@hallyn.com>
> Reviewed-by: Serge Hallyn <serge@hallyn.com>

It doesn't look like this is coming through Serge so I don't see how
the Signed-off-by tag is legtimate.

>From the replies to this it doesn't look like Serge has reviewed this
version either.

I am disappointed that all of my concerns about technical feasibility
remain unaddressed.

I hope my reading and review of the code goes better than my reading of
it's introduction.

Eric


> ---
>  fs/xattr.c               | 509 +++++++++++++++++++++++++++++++++++++++++++++--
>  security/commoncap.c     |  36 +++-
>  security/selinux/hooks.c |   9 +-
>  3 files changed, 523 insertions(+), 31 deletions(-)
>
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 464c94b..eacad9e 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -133,20 +133,440 @@ xattr_permission(struct inode *inode, const char *name, int mask)
>  	return inode_permission(inode, mask);
>  }
>  
> +/*
> + * A list of extended attributes that are supported in user namespaces
> + */
> +static const char *const userns_xattrs[] = {
> +	XATTR_NAME_CAPS,
> +	NULL
> +};
> +
> +/*
> + * xattrs_is_userns_supported - Check whether an xattr is supported in userns
> + *
> + * @name:   full name of the extended attribute
> + * @prefix: do a prefix match (true) or a full match (false)
> + *
> + * This function returns < 0 if not supported, an index into userns_xattrs[]
> + * otherwise.
> + */
> +static int
> +xattr_is_userns_supported(const char *name, int prefix)
> +{
> +	int i;
> +
> +	if (!name)
> +		return -1;
> +
> +	for (i = 0; userns_xattrs[i]; i++) {
> +		if (prefix) {
> +			if (!strncmp(userns_xattrs[i], name,
> +				     strlen(userns_xattrs[i])))
> +				return i;
> +		} else {
> +			if (!strcmp(userns_xattrs[i], name))
> +				return i;
> +		}
> +	}
> +	return -1;
> +}
> +
> +/*
> + * xattr_write_uid - print a string in the format of "%s@uid=%u", which
> + *                   includes a prefix string
> + *
> + * @uid:     the uid
> + * @prefix:  prefix string; may be NULL
> + *
> + * This function returns a buffer with the string, or a NULL pointer in
> + * case of out-of-memory error.
> + */
> +static char *
> +xattr_write_uid(uid_t uid, const char *prefix)
> +{
> +	size_t buflen;
> +	char *buffer;
> +
> +	buflen = sizeof("@uid=") - 1 + sizeof("4294967295") - 1 + 1;
> +	if (prefix)
> +		buflen += strlen(prefix);
> +
> +	buffer = kmalloc(buflen, GFP_KERNEL);
> +	if (!buffer)
> +		return NULL;
> +
> +	if (uid == 0)
> +		*buffer = 0;
> +	else
> +		sprintf(buffer, "%s@uid=%u",
> +			(prefix) ? prefix : "",
> +			uid);
> +
> +	return buffer;
> +}
> +
> +/*
> + * xattr_parse_uid_from_kuid - parse string in the format @uid=<uid>; consider
> + *                             user namespaces and check mappings
> + *
> + * @uidstr   : string in the format "@uid=<uid>"
> + * @userns   : the user namespace to consult for uid mappings
> + * @n_uidstr : returned pointer holding the rewritten @uid=<uid> string with
> + *             the uid remapped
> + *
> + * This function returns an error code or 0 in case of success. In case
> + * of success, 'n_uidstr' will hold a valid string.
> + */
> +static int
> +xattr_parse_uid_from_kuid(const char *uidstr, struct user_namespace *userns,
> +			  char **n_uidstr)
> +{
> +	int n;
> +	uid_t muid, p_uid;
> +	char d;
> +	kuid_t tuid;
> +
> +	*n_uidstr = NULL;
> +
> +	n = sscanf(uidstr, "@uid=%u%c", &p_uid, &d);
> +	if (n != 1)
> +		return -EINVAL;
> +
> +	/* do we have a mapping of the uid? */
> +	tuid = KUIDT_INIT(p_uid);
> +	muid = from_kuid(userns, tuid);
> +	if (muid == -1)
> +		return -ENOENT;
> +
> +	*n_uidstr = xattr_write_uid(muid, NULL);
> +	if (!*n_uidstr)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +/*
> + * xattr_parse_uid_make_kuid - parse string in the format @uid=<uid>; consider
> + *                             user namespaces and check mappings
> + *
> + * @uidstr   : string in the format "@uid=<uid>"
> + * @userns   : the user namespace to consult for uid mappings
> + * @N_uidstr : returned pointer holding the rewritten @uid=<uid> string with
> + *             the uid remapped
> + *
> + * This function returns an error code or 0 in case of success. In case
> + * of success, 'n_uidstr' will hold a valid string.
> + */
> +static int
> +xattr_parse_uid_make_kuid(const char *uidstr, struct user_namespace *userns,
> +			  char **n_uidstr)
> +{
> +	int n;
> +	uid_t p_uid;
> +	char d;
> +	kuid_t tuid;
> +
> +	*n_uidstr = NULL;
> +
> +	n = sscanf(uidstr, "@uid=%u%c", &p_uid, &d);
> +	if (n != 1)
> +		return -EINVAL;
> +
> +	tuid = make_kuid(userns, p_uid);
> +	if (!uid_valid(tuid))
> +		return -ENOENT;
> +
> +	*n_uidstr = xattr_write_uid(__kuid_val(tuid), NULL);
> +	if (!*n_uidstr)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +/*
> + * xattr_rewrite_userns_xattr - Rewrite and filter an extended attribute
> + *                              considering user namespace uid mappings and
> + *                              user namespace support extended attributes
> + *
> + * @name: full name of the extended attribute
> + *
> + * This function returns NULL if the name is to be filtered. Otherwise it can
> + * return the input buffer or a new buffer that the caller needs to free. The
> + * new buffer contains a rewritten extended attribute whose string length may
> + * exceed that of the given name.
> + */
> +static char *
> +xattr_rewrite_userns_xattr(char *name)
> +{
> +	int idx, error;
> +	size_t len = 0, buflen;
> +	char *buffer, *n_uidstr;
> +
> +	/* prefix-match name against supported attributes */
> +	idx = xattr_is_userns_supported(name, true);
> +	if (idx < 0) {
> +		/* only rewrite those in userns_xattr[*] */
> +		return name;
> +	}
> +
> +	/* exact match ? */
> +	len = strlen(userns_xattrs[idx]);
> +	if (name[len] == 0)
> +		return NULL;
> +
> +	/*
> +	 * We must have a name[len] == '@'.
> +	 */
> +	error = xattr_parse_uid_from_kuid(&name[len], current_user_ns(),
> +					  &n_uidstr);
> +	if (error)
> +		return NULL;
> +
> +	buflen = len + strlen(n_uidstr) + 1;
> +	buffer = kmalloc(buflen, GFP_KERNEL);
> +	if (!buffer) {
> +		kfree(n_uidstr);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	name[len] = 0;
> +
> +	snprintf(buffer, buflen, "%s%s", name, n_uidstr);
> +
> +	name[len] = '@';
> +
> +	kfree(n_uidstr);
> +
> +	return buffer;
> +}
> +
> +/*
> + * xattr_list_contains - check whether an xattr list already contains a needle
> + *
> + * @list    : 0-byte separated strings
> + * @listlen : length of the list
> + * @needle  : the needle to search for
> + */
> +static int
> +xattr_list_contains(const char *list, size_t listlen, const char *needle)
> +{
> +	size_t o = 0;
> +
> +	while (o < listlen) {
> +		if (!strcmp(&list[o], needle))
> +			return true;
> +		o += strlen(&list[o]) + 1;
> +	}
> +	return false;
> +}
> +
> +/*
> + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
> + *                             or determine needed size for attribute list
> + *                             in case size == 0
> + *
> + * In a user namespace we do not present all extended attributes to the
> + * user. We filter out those that are in the list of userns supported xattr.
> + * Besides that we filter out those with @uid=<uid> when there is no mapping
> + * for that uid in the current user namespace.
> + *
> + * @list:        list of 0-byte separated xattr names
> + * @size:        the size of the list; may be 0 to determine needed list size
> + * @list_maxlen: allocated buffer size of list
> + */
> +static ssize_t
> +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
> +{
> +	char *nlist = NULL;
> +	size_t s_off, len, nlen;
> +	ssize_t d_off;
> +	char *name, *newname;
> +
> +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
> +		return size;
> +
> +	if (size) {
> +		nlist = kmalloc(list_maxlen, GFP_KERNEL);
> +		if (!nlist)
> +			return -ENOMEM;
> +	}
> +
> +	s_off = d_off = 0;
> +	while (s_off < size || size == 0) {
> +		name = &list[s_off];
> +
> +		len = strlen(name);
> +		if (!len)
> +			break;
> +
> +		if (xattr_is_userns_supported(name, false) >= 0)
> +			newname = name;
> +		else {
> +			newname = xattr_rewrite_userns_xattr(name);
> +			if (IS_ERR(newname)) {
> +				d_off = PTR_ERR(newname);
> +				goto out_free;
> +			}
> +		}
> +		if (newname && !xattr_list_contains(nlist, d_off, newname)) {
> +			nlen = strlen(newname);
> +
> +			if (nlist) {
> +				if (nlen + 1 > list_maxlen)
> +					break;
> +				strcpy(&nlist[d_off], newname);
> +			}
> +
> +			d_off += nlen + 1;
> +			if (newname != name)
> +				kfree(newname);
> +		}
> +		s_off += len + 1;
> +	}
> +	if (nlist)
> +		memcpy(list, nlist, d_off);
> +out_free:
> +	kfree(nlist);
> +
> +	return d_off;
> +}
> +
> +/*
> + * xattr_userns_name - modify the name of a user namespace supported
> + *                     extended attribute
> + *
> + * In a user namespace we prevent read/write accesses to the host's
> + * security.foo to protect these extended attributes.
> + *
> + * Reading:
> + * 1a) Reading security.foo from a user namespace will read
> + *     security.foo@uid=<uid> of the parent user namespace instead with uid
> + *     being the mapping of root in that parent user namespace. An
> + *     exception is if root is mapped to uid 0 on the host, and in this case
> + *     we will read security.foo directly.
> + *     -> reading security.foo will read security.foo@uid=1000 for a uid
> + *        mapping of root to 1000.
> + *
> + * 1b) If security.foo@uid=<uid> is not available, the security.foo of the
> + *     parent namespace is tried to be read. This procedure is repeated up to
> + *     the init user namespace. This step only applies for reading of extended
> + *     attributes and provides the same behavior as older systems where the
> + *     host's extended attributes applied to user namespaces.
> + *
> + * 2) All security.foo@uid=<uid> with valid uid mappings in the user namespace
> + *    an be read. The uid within the user namespace will be mapped to the
> + *    corresponding uid on the host and that uid will be used in the name of
> + *    the extended attribute.
> + *    -> reading security.foo@uid=1 will read security.foo@uid=1001 for a uid
> + *       mapping of root to 1000, size of at least 2.
> + *
> + *    All security.foo@uid=<uid> can be read (by root) on the host with values
> + *    of <uid> also being subject to checking for valid mappings.
> + *
> + * 3) No other security.foo* can be read.
> + *
> + * Writing and removing:
> + * The same rules for reading apply to writing and removing, except for 1b).
> + *
> + * This function returns a buffer with either the original name or the
> + * user namespace adjusted name of the extended attribute.
> + *
> + * @name:     the full name of the extended attribute, e.g. security.foo
> + */
> +char *
> +xattr_userns_name(const char *name, struct user_namespace *userns)
> +{
> +	size_t buflen;
> +	char *buffer, *n_uidstr;
> +	kuid_t root_uid = make_kuid(userns, 0);
> +	int idx, error;
> +	size_t len;
> +
> +	/* only security.foo will be changed here - prefix match here */
> +	idx = xattr_is_userns_supported(name, true);
> +	if (idx < 0)
> +		goto out_copy;
> +
> +	/* read security.foo? --> read security.foo@uid=<uid> instead */
> +	len = strlen(userns_xattrs[idx]);
> +	if (name[len] == 0) {
> +		/*
> +		 * init_user_ns or userns with root mapped to uid 0
> +		 * may read security.foo directly
> +		 */
> +		if (userns == &init_user_ns ||
> +		    __kuid_val(root_uid) == 0)
> +			goto out_copy;
> +
> +		if (!uid_valid(root_uid))
> +			return ERR_PTR(-EINVAL);
> +
> +		buffer = xattr_write_uid(__kuid_val(root_uid), name);
> +		if (!buffer)
> +			return ERR_PTR(-ENOMEM);
> +
> +		return buffer;
> +	}
> +
> +	/*
> +	 * We must have name[len] == '@'.
> +	 */
> +	error = xattr_parse_uid_make_kuid(&name[len],
> +					  userns,
> +					  &n_uidstr);
> +	if (error)
> +		return ERR_PTR(error);
> +
> +	/* name[len] == '@' */
> +	buflen = len + strlen(n_uidstr) + 1;
> +	buffer = kmalloc(buflen, GFP_KERNEL);
> +	if (!buffer) {
> +		kfree(n_uidstr);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	snprintf(buffer, len + 1, "%s", name);
> +	snprintf(&buffer[len], buflen - len, "%s", n_uidstr);
> +	kfree(n_uidstr);
> +
> +	return buffer;
> +
> +out_copy:
> +	buffer = kstrdup(name, GFP_KERNEL);
> +	if (!buffer)
> +		return ERR_PTR(-ENOMEM);
> +
> +	return buffer;
> +}
> +
>  int
>  __vfs_setxattr(struct dentry *dentry, struct inode *inode, const char *name,
>  	       const void *value, size_t size, int flags)
>  {
>  	const struct xattr_handler *handler;
> +	char *newname;
> +	int ret;
>  
> +	newname = xattr_userns_name(name, current_user_ns());
> +	if (IS_ERR(newname))
> +		return PTR_ERR(newname);
> +	name = newname;
>  	handler = xattr_resolve_name(inode, &name);
> -	if (IS_ERR(handler))
> -		return PTR_ERR(handler);
> -	if (!handler->set)
> -		return -EOPNOTSUPP;
> +	if (IS_ERR(handler)) {
> +		ret = PTR_ERR(handler);
> +		goto out;
> +	}
> +	if (!handler->set) {
> +		ret = -EOPNOTSUPP;
> +		goto out;
> +	}
>  	if (size == 0)
>  		value = "";  /* empty EA, do not remove */
> -	return handler->set(handler, dentry, inode, name, value, size, flags);
> +	ret = handler->set(handler, dentry, inode, name, value, size, flags);
> +
> +out:
> +	kfree(newname);
> +	return ret;
>  }
>  EXPORT_SYMBOL(__vfs_setxattr);
>  
> @@ -301,14 +721,39 @@ ssize_t
>  __vfs_getxattr(struct dentry *dentry, struct inode *inode, const char *name,
>  	       void *value, size_t size)
>  {
> -	const struct xattr_handler *handler;
> +	const struct xattr_handler *handler = NULL;
> +	char *newname =  NULL;
> +	int ret, userns_supt_xattr;
> +	struct user_namespace *userns = current_user_ns();
> +
> +	userns_supt_xattr = (xattr_is_userns_supported(name, false) >= 0);
> +
> +	do {
> +		kfree(newname);
> +
> +		newname = xattr_userns_name(name, userns);
> +		if (IS_ERR(newname))
> +			return PTR_ERR(newname);
> +
> +		if (!handler) {
> +			name = newname;
> +			handler = xattr_resolve_name(inode, &name);
> +			if (IS_ERR(handler)) {
> +				ret = PTR_ERR(handler);
> +				goto out;
> +			}
> +			if (!handler->get) {
> +				ret = -EOPNOTSUPP;
> +				goto out;
> +			}
> +		}
> +		ret = handler->get(handler, dentry, inode, name, value, size);
> +		userns = userns->parent;
> +	} while ((ret == -ENODATA) && userns && userns_supt_xattr);
>  
> -	handler = xattr_resolve_name(inode, &name);
> -	if (IS_ERR(handler))
> -		return PTR_ERR(handler);
> -	if (!handler->get)
> -		return -EOPNOTSUPP;
> -	return handler->get(handler, dentry, inode, name, value, size);
> +out:
> +	kfree(newname);
> +	return ret;
>  }
>  EXPORT_SYMBOL(__vfs_getxattr);
>  
> @@ -328,8 +773,16 @@ vfs_getxattr(struct dentry *dentry, const char *name, void *value, size_t size)
>  
>  	if (!strncmp(name, XATTR_SECURITY_PREFIX,
>  				XATTR_SECURITY_PREFIX_LEN)) {
> -		const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
> -		int ret = xattr_getsecurity(inode, suffix, value, size);
> +		int ret;
> +		const char *suffix;
> +		char *newname = xattr_userns_name(name, current_user_ns());
> +		if (IS_ERR(newname))
> +			return PTR_ERR(newname);
> +
> +		suffix = newname + XATTR_SECURITY_PREFIX_LEN;
> +
> +		ret = xattr_getsecurity(inode, suffix, value, size);
> +		kfree(newname);
>  		/*
>  		 * Only overwrite the return value if a security module
>  		 * is actually active.
> @@ -360,6 +813,9 @@ vfs_listxattr(struct dentry *dentry, char *list, size_t size)
>  		if (size && error > size)
>  			error = -ERANGE;
>  	}
> +	if (error > 0)
> +		error = xattr_list_userns_rewrite(list, error, size);
> +
>  	return error;
>  }
>  EXPORT_SYMBOL_GPL(vfs_listxattr);
> @@ -369,13 +825,28 @@ __vfs_removexattr(struct dentry *dentry, const char *name)
>  {
>  	struct inode *inode = d_inode(dentry);
>  	const struct xattr_handler *handler;
> +	char *newname;
> +	int ret;
>  
> +	newname = xattr_userns_name(name, current_user_ns());
> +	if (IS_ERR(newname))
> +		return PTR_ERR(newname);
> +	name = newname;
>  	handler = xattr_resolve_name(inode, &name);
> -	if (IS_ERR(handler))
> -		return PTR_ERR(handler);
> -	if (!handler->set)
> -		return -EOPNOTSUPP;
> -	return handler->set(handler, dentry, inode, name, NULL, 0, XATTR_REPLACE);
> +	if (IS_ERR(handler)) {
> +		ret = PTR_ERR(handler);
> +		goto out;
> +	}
> +	if (!handler->set) {
> +		ret = -EOPNOTSUPP;
> +		goto out;
> +	}
> +	ret = handler->set(handler, dentry, inode, name, NULL, 0, XATTR_REPLACE);
> +
> +out:
> +	kfree(newname);
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL(__vfs_removexattr);
>  
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 7abebd7..c842690 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -660,15 +660,23 @@ int cap_bprm_secureexec(struct linux_binprm *bprm)
>  int cap_inode_setxattr(struct dentry *dentry, const char *name,
>  		       const void *value, size_t size, int flags)
>  {
> -	if (!strcmp(name, XATTR_NAME_CAPS)) {
> -		if (!capable(CAP_SETFCAP))
> +	if (strncmp(name, XATTR_SECURITY_PREFIX,
> +		    sizeof(XATTR_SECURITY_PREFIX) - 1) != 0)
> +		return 0;
> +
> +	if (strncmp(name, XATTR_NAME_CAPS,
> +		    sizeof(XATTR_NAME_CAPS) - 1) == 0) {
> +		struct inode *inode = d_backing_inode(dentry);
> +
> +		if (!inode)
> +			return -EINVAL;
> +		if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
>  			return -EPERM;
> +
>  		return 0;
>  	}
>  
> -	if (!strncmp(name, XATTR_SECURITY_PREFIX,
> -		     sizeof(XATTR_SECURITY_PREFIX) - 1) &&
> -	    !capable(CAP_SYS_ADMIN))
> +	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  	return 0;
>  }
> @@ -686,15 +694,23 @@ int cap_inode_setxattr(struct dentry *dentry, const char *name,
>   */
>  int cap_inode_removexattr(struct dentry *dentry, const char *name)
>  {
> -	if (!strcmp(name, XATTR_NAME_CAPS)) {
> -		if (!capable(CAP_SETFCAP))
> +	if (strncmp(name, XATTR_SECURITY_PREFIX,
> +		    sizeof(XATTR_SECURITY_PREFIX) - 1) != 0)
> +		return 0;
> +
> +	if (strncmp(name, XATTR_NAME_CAPS,
> +		    sizeof(XATTR_NAME_CAPS) - 1) == 0) {
> +		struct inode *inode = d_backing_inode(dentry);
> +
> +		if (!inode)
> +			return -EINVAL;
> +		if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
>  			return -EPERM;
> +
>  		return 0;
>  	}
>  
> -	if (!strncmp(name, XATTR_SECURITY_PREFIX,
> -		     sizeof(XATTR_SECURITY_PREFIX) - 1) &&
> -	    !capable(CAP_SYS_ADMIN))
> +	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  	return 0;
>  }
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 819fd68..702c225 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -3091,8 +3091,13 @@ static int selinux_inode_setotherxattr(struct dentry *dentry, const char *name)
>  
>  	if (!strncmp(name, XATTR_SECURITY_PREFIX,
>  		     sizeof XATTR_SECURITY_PREFIX - 1)) {
> -		if (!strcmp(name, XATTR_NAME_CAPS)) {
> -			if (!capable(CAP_SETFCAP))
> +		if (!strncmp(name, XATTR_NAME_CAPS,
> +			     sizeof(XATTR_NAME_CAPS) - 1)) {
> +			struct inode *inode = d_backing_inode(dentry);
> +
> +			if (!inode)
> +				return -EINVAL;
> +			if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP))
>  				return -EPERM;
>  		} else if (!capable(CAP_SYS_ADMIN)) {
>  			/* A different attribute in the security namespace.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-12 13:25   ` Eric W. Biederman
@ 2017-07-12 17:03     ` Serge E. Hallyn
  2017-07-12 22:20       ` James Morris
  2017-07-12 23:13       ` Eric W. Biederman
  0 siblings, 2 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-12 17:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Stefan Berger, containers, lkp, linux-kernel, zohar, tycho,
	serge, James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Stefan Berger <"Stefan Bergerstefanb"@linux.vnet.ibm.com> writes:
> > Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> > Signed-off-by: Serge Hallyn <serge@hallyn.com>
> > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> 
> It doesn't look like this is coming through Serge so I don't see how
> the Signed-off-by tag is legtimate.

This is mostly explained by the fact that there have been a *lot* of
changes, many of them discussed in private emails.

> >From the replies to this it doesn't look like Serge has reviewed this
> version either.
> 
> I am disappointed that all of my concerns about technical feasibility
> remain unaddressed.

Can you re-state those, or give a link to them?

I'd really like to get to a point where unprivileged containers can start
using filecaps - at this point if that means having an extra temporary
file format based on my earlier patchset while we hash this out, that
actually seems worthwhile.  But it would of course be ideal if we could
do the name based caps right in the first place.

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-12 11:35     ` Stefan Berger
@ 2017-07-12 17:32       ` Serge E. Hallyn
  0 siblings, 0 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-12 17:32 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Serge E. Hallyn, ebiederm, containers, lkp, linux-kernel, zohar,
	tycho, James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> On 07/11/2017 11:45 PM, Serge E. Hallyn wrote:
> >Quoting Stefan Berger (Stefan Bergerstefanb@linux.vnet.ibm.com):
> >>+/*
> >>+ * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
> >>+ *                             or determine needed size for attribute list
> >>+ *                             in case size == 0
> >>+ *
> >>+ * In a user namespace we do not present all extended attributes to the
> >>+ * user. We filter out those that are in the list of userns supported xattr.
> >>+ * Besides that we filter out those with @uid=<uid> when there is no mapping
> >>+ * for that uid in the current user namespace.
> >>+ *
> >>+ * @list:        list of 0-byte separated xattr names
> >>+ * @size:        the size of the list; may be 0 to determine needed list size
> >>+ * @list_maxlen: allocated buffer size of list
> >>+ */
> >>+static ssize_t
> >>+xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
> >>+{
> >>+	char *nlist = NULL;
> >>+	size_t s_off, len, nlen;
> >>+	ssize_t d_off;
> >>+	char *name, *newname;
> >>+
> >>+	if (!list || size < 0 || current_user_ns() == &init_user_ns)
> >>+		return size;
> >>+
> >>+	if (size) {
> >>+		nlist = kmalloc(list_maxlen, GFP_KERNEL);
> >>+		if (!nlist)
> >>+			return -ENOMEM;
> >>+	}
> >>+
> >>+	s_off = d_off = 0;
> >>+	while (s_off < size || size == 0) {
> >>+		name = &list[s_off];
> >>+
> >>+		len = strlen(name);
> >>+		if (!len)
> >>+			break;
> >>+
> >>+		if (xattr_is_userns_supported(name, false) >= 0)
> >>+			newname = name;
> >>+		else {
> >>+			newname = xattr_rewrite_userns_xattr(name);
> >Why are you doing this here?  If we get here it means that
> >xattr_is_userns_supported() returned < 0, meaning name is
> >not userns-supported.  So xattr_rewrite_userns_xattr() will
> >just return name.  Am I missing something?
> 
> xattr_is_userns_support(name, false) does a _full string match_
> rather than a prefix match and will only return >= 0 for
> security.capability. This case handles the hosts's
> security.capability which  'shines through' for read and needs to be
> listed. Only in this case we set newname=name.

Ah, right.

I think it would be worth #defining XATTR_PREFIX_SEARCH and
XATTR_FULLNAME_SEARCH or something.  Or maybe not, maybe I was
just being dense.

> In the else branch we handle security.capability@uid=1000 and
> rewrite that to security.capability for root mapping to uid=1000.
> 
> >
> >>+			if (IS_ERR(newname)) {
> >>+				d_off = PTR_ERR(newname);
> >>+				goto out_free;
> >>+			}
> >>+		}
> >>+		if (newname && !xattr_list_contains(nlist, d_off, newname)) {
> >Now here, if name was recalculated to @newname, and @newname is
> >found in the nlist, that should raise an error right?  Something
> >fishy is going on?
> 
> If security.capability is set on a file but the container doesn't
> have security.capability@uid=1000, we still need to list the former
> here. However, we end up with duplicates if security.capability is
> there and security.capability@uid=1000 is also there and root is
> mapped to uid=1000. Both would be shown as security.capability
> inside the container. In this case we need to filter.

Gotcha, thanks.

> I think the code is correct. More problematic is a memory leak in
> the error case. Will fix that.

Great.

> >
> >>+			nlen = strlen(newname);
> >>+
> >>+			if (nlist) {
> >>+				if (nlen + 1 > list_maxlen)
> >d_off needs to be set to -ERANGE here.
> 
> Fixed.

Great, thanks.

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-11 15:05 ` [PATCH v2] xattr: Enable security.capability in user namespaces Stefan Berger
                     ` (3 preceding siblings ...)
  2017-07-12 13:25   ` Eric W. Biederman
@ 2017-07-12 17:53   ` Vivek Goyal
  2017-07-12 19:19     ` Stefan Berger
  2017-07-14 23:41   ` Eric W. Biederman
  2017-07-17 18:58   ` Vivek Goyal
  6 siblings, 1 reply; 76+ messages in thread
From: Vivek Goyal @ 2017-07-12 17:53 UTC (permalink / raw)
  To: Stefan Berger
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey, Stefan Berger

On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:

[..]
> @@ -301,14 +721,39 @@ ssize_t
>  __vfs_getxattr(struct dentry *dentry, struct inode *inode, const char *name,
>  	       void *value, size_t size)
>  {
> -	const struct xattr_handler *handler;
> +	const struct xattr_handler *handler = NULL;
> +	char *newname =  NULL;
> +	int ret, userns_supt_xattr;
> +	struct user_namespace *userns = current_user_ns();
> +
> +	userns_supt_xattr = (xattr_is_userns_supported(name, false) >= 0);
> +

Hi Stephan,

> +	do {
> +		kfree(newname);
> +
> +		newname = xattr_userns_name(name, userns);
					    ^^^
Will name be pointing to a freed string in second iteration of loop. 

> +		if (IS_ERR(newname))
> +			return PTR_ERR(newname);
> +
> +		if (!handler) {
> +			name = newname;

Here we assign name and at the beginning of second iteration we free 
newname.

Also I am not sure why do we do this assignment only if handler is NULL.

BTW, I set cap_sys_admin on a file outside usernamespace and then launched
user namespace (mapping 1000 to 0). And then tried to do getcap on file
and I am not seeing security.capability set by host. Not sure what am I 
doing wrong. getxattr() seems to return -ENODATA. Still debugging it.

Also, have we resovled the question of stacked filesystem like overlayfs.
There we are switching creds to mounter's creds when doing operations on
underlying filesystem. I am concenrned does that mean, we will get and
return security.capability to caller in usernamespace instead of
security.capability@uid=1000. 

Vivek

> +			handler = xattr_resolve_name(inode, &name);
> +			if (IS_ERR(handler)) {
> +				ret = PTR_ERR(handler);
> +				goto out;
> +			}
> +			if (!handler->get) {
> +				ret = -EOPNOTSUPP;
> +				goto out;
> +			}
> +		}
> +		ret = handler->get(handler, dentry, inode, name, value, size);
> +		userns = userns->parent;
> +	} while ((ret == -ENODATA) && userns && userns_supt_xattr);
>  
> -	handler = xattr_resolve_name(inode, &name);
> -	if (IS_ERR(handler))
> -		return PTR_ERR(handler);
> -	if (!handler->get)
> -		return -EOPNOTSUPP;
> -	return handler->get(handler, dentry, inode, name, value, size);
> +out:
> +	kfree(newname);
> +	return ret;
>  }
>  EXPORT_SYMBOL(__vfs_getxattr);
>  

Thanks
Vivek

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-12 17:53   ` Vivek Goyal
@ 2017-07-12 19:19     ` Stefan Berger
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Berger @ 2017-07-12 19:19 UTC (permalink / raw)
  To: Vivek Goyal, Stefan Berger
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey

On 07/12/2017 01:53 PM, Vivek Goyal wrote:
> On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:
>
> [..]
>> @@ -301,14 +721,39 @@ ssize_t
>>   __vfs_getxattr(struct dentry *dentry, struct inode *inode, const char *name,
>>   	       void *value, size_t size)
>>   {
>> -	const struct xattr_handler *handler;
>> +	const struct xattr_handler *handler = NULL;
>> +	char *newname =  NULL;
>> +	int ret, userns_supt_xattr;
>> +	struct user_namespace *userns = current_user_ns();
>> +
>> +	userns_supt_xattr = (xattr_is_userns_supported(name, false) >= 0);
>> +
> Hi Stephan,
>
>> +	do {
>> +		kfree(newname);
>> +
>> +		newname = xattr_userns_name(name, userns);
> 					    ^^^
> Will name be pointing to a freed string in second iteration of loop.

Fixing for v3.

>
>> +		if (IS_ERR(newname))
>> +			return PTR_ERR(newname);
>> +
>> +		if (!handler) {
>> +			name = newname;
> Here we assign name and at the beginning of second iteration we free
> newname.
>
> Also I am not sure why do we do this assignment only if handler is NULL.

The handler shouldn't change but this optimization isn't helpful. Fixed 
through this patch:

https://github.com/stefanberger/linux/commit/10828401b29a13f8c56f8fad0c0fb2690e4af878


>
> BTW, I set cap_sys_admin on a file outside usernamespace and then launched
> user namespace (mapping 1000 to 0). And then tried to do getcap on file
> and I am not seeing security.capability set by host. Not sure what am I
> doing wrong. getxattr() seems to return -ENODATA. Still debugging it.

This was a regression due to the bug in the loop. I didn't have a test 
case (with runc) for it, now I do.

>
> Also, have we resovled the question of stacked filesystem like overlayfs.
> There we are switching creds to mounter's creds when doing operations on
> underlying filesystem. I am concenrned does that mean, we will get and
> return security.capability to caller in usernamespace instead of
> security.capability@uid=1000.

I would have to test this, otherwise I don't know. I'll try it out with 
Docker.

    Stefan

>
> Vivek
>
>> +			handler = xattr_resolve_name(inode, &name);
>> +			if (IS_ERR(handler)) {
>> +				ret = PTR_ERR(handler);
>> +				goto out;
>> +			}
>> +			if (!handler->get) {
>> +				ret = -EOPNOTSUPP;
>> +				goto out;
>> +			}
>> +		}
>> +		ret = handler->get(handler, dentry, inode, name, value, size);
>> +		userns = userns->parent;
>> +	} while ((ret == -ENODATA) && userns && userns_supt_xattr);
>>   
>> -	handler = xattr_resolve_name(inode, &name);
>> -	if (IS_ERR(handler))
>> -		return PTR_ERR(handler);
>> -	if (!handler->get)
>> -		return -EOPNOTSUPP;
>> -	return handler->get(handler, dentry, inode, name, value, size);
>> +out:
>> +	kfree(newname);
>> +	return ret;
>>   }
>>   EXPORT_SYMBOL(__vfs_getxattr);
>>   
> Thanks
> Vivek
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-12 17:03     ` Serge E. Hallyn
@ 2017-07-12 22:20       ` James Morris
  2017-07-13  0:33         ` Eric W. Biederman
  2017-07-12 23:13       ` Eric W. Biederman
  1 sibling, 1 reply; 76+ messages in thread
From: James Morris @ 2017-07-12 22:20 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, Stefan Berger, containers, lkp, linux-kernel,
	zohar, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

On Wed, 12 Jul 2017, Serge E. Hallyn wrote:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
> > Stefan Berger <"Stefan Bergerstefanb"@linux.vnet.ibm.com> writes:
> > > Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> > > Signed-off-by: Serge Hallyn <serge@hallyn.com>
> > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> > 
> > It doesn't look like this is coming through Serge so I don't see how
> > the Signed-off-by tag is legtimate.
> 
> This is mostly explained by the fact that there have been a *lot* of
> changes, many of them discussed in private emails.

Please try and keep technical discussions public or at least document them 
when reposting the patches.

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-12 17:03     ` Serge E. Hallyn
  2017-07-12 22:20       ` James Morris
@ 2017-07-12 23:13       ` Eric W. Biederman
  2017-07-13  0:43         ` Serge E. Hallyn
  2017-07-13  0:44         ` Stefan Berger
  1 sibling, 2 replies; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-12 23:13 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Stefan Berger, containers, lkp, linux-kernel, zohar, tycho,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> Stefan Berger <"Stefan Bergerstefanb"@linux.vnet.ibm.com> writes:
>> > Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
>> > Signed-off-by: Serge Hallyn <serge@hallyn.com>
>> > Reviewed-by: Serge Hallyn <serge@hallyn.com>
>> 
>> It doesn't look like this is coming through Serge so I don't see how
>> the Signed-off-by tag is legtimate.
>
> This is mostly explained by the fact that there have been a *lot* of
> changes, many of them discussed in private emails.
>
>> >From the replies to this it doesn't look like Serge has reviewed this
>> version either.
>> 
>> I am disappointed that all of my concerns about technical feasibility
>> remain unaddressed.
>
> Can you re-state those, or give a link to them?

Well I only posted about one substantive comment on the last round
so it should be easy to find that said.

The big question is how does this intereact with filesystems
xattr implementations?

There is the potential that we create many more security xattrs this
way.  How does that scale?  With more names etc.
What happens if we have one xattr per uid for 1000+ uids?

How does this interact with filesystems optimization of xattr names?
For some filesystems they optmize the xattr names, and don't store the
entire thing.

> I'd really like to get to a point where unprivileged containers can start
> using filecaps - at this point if that means having an extra temporary
> file format based on my earlier patchset while we hash this out, that
> actually seems worthwhile.  But it would of course be ideal if we could
> do the name based caps right in the first place.

This whole new version has set my review back to square one
unfortunately.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-12 22:20       ` James Morris
@ 2017-07-13  0:33         ` Eric W. Biederman
  2017-07-13  1:01           ` Serge E. Hallyn
  0 siblings, 1 reply; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-13  0:33 UTC (permalink / raw)
  To: James Morris
  Cc: Serge E. Hallyn, Stefan Berger, containers, lkp, linux-kernel,
	zohar, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

James Morris <jmorris@namei.org> writes:

> On Wed, 12 Jul 2017, Serge E. Hallyn wrote:
>
>> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> > Stefan Berger <"Stefan Bergerstefanb"@linux.vnet.ibm.com> writes:
>> > > Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
>> > > Signed-off-by: Serge Hallyn <serge@hallyn.com>
>> > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
>> > 
>> > It doesn't look like this is coming through Serge so I don't see how
>> > the Signed-off-by tag is legtimate.
>> 
>> This is mostly explained by the fact that there have been a *lot* of
>> changes, many of them discussed in private emails.
>
> Please try and keep technical discussions public or at least document them 
> when reposting the patches.

Yes please.

A public discussion helps to understand what the challenges are.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-12 23:13       ` Eric W. Biederman
@ 2017-07-13  0:43         ` Serge E. Hallyn
  2017-07-13  0:44         ` Stefan Berger
  1 sibling, 0 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-13  0:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Stefan Berger, containers, lkp, linux-kernel,
	zohar, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> Stefan Berger <"Stefan Bergerstefanb"@linux.vnet.ibm.com> writes:
> >> > Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> >> > Signed-off-by: Serge Hallyn <serge@hallyn.com>
> >> > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> >> 
> >> It doesn't look like this is coming through Serge so I don't see how
> >> the Signed-off-by tag is legtimate.
> >
> > This is mostly explained by the fact that there have been a *lot* of
> > changes, many of them discussed in private emails.
> >
> >> >From the replies to this it doesn't look like Serge has reviewed this
> >> version either.
> >> 
> >> I am disappointed that all of my concerns about technical feasibility
> >> remain unaddressed.
> >
> > Can you re-state those, or give a link to them?
> 
> Well I only posted about one substantive comment on the last round
> so it should be easy to find that said.

Ok so you are likely referring to http://lkml.org/lkml/2017/6/23/551 ,
thanks.  I had actually read that differently when you sent it, and
thought it was more to do with the suggestion of putting the nsid
tags in the middle of the xattr name versus putting it on the end.
As far as that is concerned, note that no other tags besides uid=
are currently supported, and only security.capability is being
namespaced.

> The big question is how does this intereact with filesystems
> xattr implementations?
> 
> There is the potential that we create many more security xattrs this
> way.  How does that scale?  With more names etc.
> What happens if we have one xattr per uid for 1000+ uids?

Well, that's not the intent here.  The goal is *not* to make one fs image
that satisfies 200k possible uid mappings.  The goal is to reconcile the
support for an unprivileged user to set uid agnostic (within the
container) file capabilities with the uid namespace's design goals -
namely root in a container is privileged over the container but
completely unprivileged wrt the host.

This is in part why in my previous version I only allowed a single
namespaced fscap.  But I don't think that we have to enforce a
single fscap - I think it's fair to tell users "go ahead and shoot
yourself in the foot" performance-wise, if they insist on doing
this.

The goal of now putting the root kuid in the name is not to support
multiple containers, but to have common code supporting
security.capability and security.ima, and maybe a few more.

> How does this interact with filesystems optimization of xattr names?
> For some filesystems they optmize the xattr names, and don't store the
> entire thing.

This I have no idea on.  Stefan, have you looked at this?

> > I'd really like to get to a point where unprivileged containers can start
> > using filecaps - at this point if that means having an extra temporary
> > file format based on my earlier patchset while we hash this out, that
> > actually seems worthwhile.  But it would of course be ideal if we could
> > do the name based caps right in the first place.
> 
> This whole new version has set my review back to square one
> unfortunately.

Well it is a whole new approach and whole new patch, so of course that's
to be expected :(

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-12 23:13       ` Eric W. Biederman
  2017-07-13  0:43         ` Serge E. Hallyn
@ 2017-07-13  0:44         ` Stefan Berger
  2017-07-13  1:15           ` Theodore Ts'o
  1 sibling, 1 reply; 76+ messages in thread
From: Stefan Berger @ 2017-07-13  0:44 UTC (permalink / raw)
  To: Eric W. Biederman, Serge E. Hallyn
  Cc: containers, lkp, linux-kernel, zohar, tycho, James.Bottomley,
	vgoyal, christian.brauner, amir73il, linux-security-module,
	casey

On 07/12/2017 07:13 PM, Eric W. Biederman wrote:
> "Serge E. Hallyn" <serge@hallyn.com> writes:
>
>> Quoting Eric W. Biederman (ebiederm@xmission.com):
>>> Stefan Berger <"Stefan Bergerstefanb"@linux.vnet.ibm.com> writes:
>>>> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
>>>> Signed-off-by: Serge Hallyn <serge@hallyn.com>
>>>> Reviewed-by: Serge Hallyn <serge@hallyn.com>
>>> It doesn't look like this is coming through Serge so I don't see how
>>> the Signed-off-by tag is legtimate.
>> This is mostly explained by the fact that there have been a *lot* of
>> changes, many of them discussed in private emails.
>>
>>> >From the replies to this it doesn't look like Serge has reviewed this
>>> version either.
>>>
>>> I am disappointed that all of my concerns about technical feasibility
>>> remain unaddressed.
>> Can you re-state those, or give a link to them?
> Well I only posted about one substantive comment on the last round
> so it should be easy to find that said.
>
> The big question is how does this intereact with filesystems
> xattr implementations?
>
> There is the potential that we create many more security xattrs this
> way.  How does that scale?  With more names etc.

It doesn't scale. Shared filesystems are a problem if many containers 
use them.

'man listxattr' also mentions this here as a BUG:

" As noted in xattr(7), the VFS imposes a limit of 64 kB on the size of
        the extended attribute name list returned by listxattr(7). If the
        total size of attribute names attached to a file exceeds this limit,
        it is no longer possible to retrieve the list of attribute names."

A simple test on ext4:

#> touch foo
#> for ((i = 0; i < 200; i++)); do setfattr -n user.foo${i} -v hello 
foo; done

user.foo126 was the last one created...

Depending on the size of the data the xattrs are writing, the limit is 
reached sooner. Writing 'hellohello' only goes up to 'user.foo112'. 
Maybe one could try to encode the data more efficiently or as Serge did 
write the uid on the xattr value side, but either way, it won't scale 
due to that VFS limit.

     Stefan

> What happens if we have one xattr per uid for 1000+ uids?

>
> How does this interact with filesystems optimization of xattr names?
> For some filesystems they optmize the xattr names, and don't store the
> entire thing.
>
>> I'd really like to get to a point where unprivileged containers can start
>> using filecaps - at this point if that means having an extra temporary
>> file format based on my earlier patchset while we hash this out, that
>> actually seems worthwhile.  But it would of course be ideal if we could
>> do the name based caps right in the first place.
> This whole new version has set my review back to square one
> unfortunately.
>
> Eric
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13  0:33         ` Eric W. Biederman
@ 2017-07-13  1:01           ` Serge E. Hallyn
  0 siblings, 0 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-13  1:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Morris, Serge E. Hallyn, Stefan Berger, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Quoting Eric W. Biederman (ebiederm@xmission.com):
> James Morris <jmorris@namei.org> writes:
> 
> > On Wed, 12 Jul 2017, Serge E. Hallyn wrote:
> >
> >> Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> > Stefan Berger <"Stefan Bergerstefanb"@linux.vnet.ibm.com> writes:
> >> > > Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> >> > > Signed-off-by: Serge Hallyn <serge@hallyn.com>
> >> > > Reviewed-by: Serge Hallyn <serge@hallyn.com>
> >> > 
> >> > It doesn't look like this is coming through Serge so I don't see how
> >> > the Signed-off-by tag is legtimate.
> >> 
> >> This is mostly explained by the fact that there have been a *lot* of
> >> changes, many of them discussed in private emails.

I wasn't clear here.  There were a lot of changes between the first
mention of the approach and the posting of v1.

My point was that I did in fact agree to Reviewed-by, and the fact that
I've found a few more things to point out only reflects my missing them
before.  I don't think my name is being mis-used.

> > Please try and keep technical discussions public or at least document them 
> > when reposting the patches.
> 
> Yes please.
> 
> A public discussion helps to understand what the challenges are.

Yes, all discussion (that I can find in my mbox) since v1 has been public.

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13  0:44         ` Stefan Berger
@ 2017-07-13  1:15           ` Theodore Ts'o
  2017-07-13  2:34             ` Serge E. Hallyn
  2017-07-13 12:11             ` Eric W. Biederman
  0 siblings, 2 replies; 76+ messages in thread
From: Theodore Ts'o @ 2017-07-13  1:15 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Eric W. Biederman, Serge E. Hallyn, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

I'm really confused what problem that is trying to be solved, here,
but it **feels** really, really wrong.

Why do we need to store all of this state on a per-file basis, instead
of some kind of per-file system or per-container data structure?

And how many of these security.foo@uid=bar xattrs do you expect there
to be?  How many "foo", and how many "bar"?

Maybe I missed the full write up, in which case please send me a link
to the full writeup --- ideally in the form of a design doc that
explains the problem statement, gives some examples of how it's going
to be used, what were the other alternatives that were considered, and
why they were rejected, etc.

Thanks,

					- Ted

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13  1:15           ` Theodore Ts'o
@ 2017-07-13  2:34             ` Serge E. Hallyn
  2017-07-13 12:11             ` Eric W. Biederman
  1 sibling, 0 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-13  2:34 UTC (permalink / raw)
  To: Theodore Ts'o, Stefan Berger, Eric W. Biederman,
	Serge E. Hallyn, containers, lkp, linux-kernel, zohar, tycho,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

Quoting Theodore Ts'o (tytso@mit.edu):
> I'm really confused what problem that is trying to be solved, here,
> but it **feels** really, really wrong.

Hi,

The intro to my original patch might help (or maybe not), as it
has a different motivating text:

http://lkml.org/lkml/2016/11/19/158

We want file capabilities to be supported in unprivileged containers,
so that a piece of software can count on them being available rather
than having to supporting multiple ways of getting+dropping privilege
(for instance, being installed as uid 1000 with cap_net_raw=pe, versus
being installed setuid-root and being expected to do PR_SET_KEEPCAPS
and setuid).

If subuids 10000-20000 are delegated to uid 1001 on the host, and uid
1001 sets up a container with subuid 100000 mapped to container uid 0,
then the container root should be able to write file capabilities
which affect (that is, delegate container root's privilege to) all ids
over which it has privilege (all uids mapped into the container), but
should not have privilege over any uids not mapped into the container.
With regular file capabilities, this is impossible, since any filecap
he writes can then be exercised on the host by uid 1000.

The point of this set (and the ones before it) is to make it so that
the filecap written by the container root is tagged on disk as belonging
to subuid 100000.

> Why do we need to store all of this state on a per-file basis, instead
> of some kind of per-file system or per-container data structure?

This needs to be writeable by an unprivileged user, with no help from
the admin.  AFAICS that rules out per-fs data structure.

Note we are not assuming a filesystem per container.  The typical case
is (for instance) ~/.local/share/lxc/c1/rootfs being the root of
container c1's filesystem.  Mounting a filesystem from inside a user
namespace is still mostly science fiction today.

> And how many of these security.foo@uid=bar xattrs do you expect there
> to be?  How many "foo", and how many "bar"?

For now I'm expecting two foos - security and ima.  The '@uid=bar' is
generic enough that it *can* be re-used for a different kind of
property if we decide to later, but I have no intention of adding
anything.

Casey has mentioned 'smack=', but i think only to keep the option open.
I don't believe he has concrete plans.

> Maybe I missed the full write up, in which case please send me a link
> to the full writeup --- ideally in the form of a design doc that
> explains the problem statement, gives some examples of how it's going
> to be used, what were the other alternatives that were considered, and
> why they were rejected, etc.

As I'd mentioned in an even older patch, http://lkml.org/lkml/2016/5/18/622 ,
I had considered using a completely separate xattr name, but that would
have required invasive userspace changes.

There's no design doc as such, mainly a progressive series of patches to
lkml.  I am very seriously considering writing a paper to detail both
this design and the user ns design in general, as it has become clear
(in unrelated conversations) there is still a lot of confusiong out
there regarding uid namespaces and targeted capabilities.  But it's not
written yet.

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13  1:15           ` Theodore Ts'o
  2017-07-13  2:34             ` Serge E. Hallyn
@ 2017-07-13 12:11             ` Eric W. Biederman
  2017-07-13 16:40               ` Theodore Ts'o
  1 sibling, 1 reply; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-13 12:11 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Stefan Berger, Serge E. Hallyn, containers, lkp, linux-kernel,
	zohar, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

Theodore Ts'o <tytso@mit.edu> writes:

> I'm really confused what problem that is trying to be solved, here,
> but it **feels** really, really wrong.
>
> Why do we need to store all of this state on a per-file basis, instead
> of some kind of per-file system or per-container data structure?
>
> And how many of these security.foo@uid=bar xattrs do you expect there
> to be?  How many "foo", and how many "bar"?
>
> Maybe I missed the full write up, in which case please send me a link
> to the full writeup --- ideally in the form of a design doc that
> explains the problem statement, gives some examples of how it's going
> to be used, what were the other alternatives that were considered, and
> why they were rejected, etc.

The concise summary:

Today we have the xattr security.capable that holds a set of
capabilities that an application gains when executed.  AKA setuid root exec
without actually being setuid root.

User namespaces have the concept of capabilities that are not global but
are limited to their user namespace.  We do not currently have
filesystem support for this concept.

We currently have two proposals on the table.  One is to bump the
revision number of security.capable and add more information in that xattr.
The other is to use a sligthly different capability name.

We are currently evaluating between the two proposals.

Given that it appears the IMA xattrs will want similar treatment coming
up with a good pattern to follow is part of the analysis here.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 12:11             ` Eric W. Biederman
@ 2017-07-13 16:40               ` Theodore Ts'o
  2017-07-13 17:05                 ` Stefan Berger
                                   ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Theodore Ts'o @ 2017-07-13 16:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Stefan Berger, Serge E. Hallyn, containers, lkp, linux-kernel,
	zohar, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
> The concise summary:
> 
> Today we have the xattr security.capable that holds a set of
> capabilities that an application gains when executed.  AKA setuid root exec
> without actually being setuid root.
> 
> User namespaces have the concept of capabilities that are not global but
> are limited to their user namespace.  We do not currently have
> filesystem support for this concept.

So correct me if I am wrong; in general, there will only be one
variant of the form:

   security.foo@uid=15000

It's not like there will be:

   security.foo@uid=1000
   security.foo@uid=2000

Except.... if you have an Distribution root directory which is shared
by many containers, you would need to put the xattrs in the overlay
inodes.  Worse, each time you launch a new container, with a new
subuid allocation, you will have to iterate over all files with
capabilities and do a copy-up operations on the xattrs in overlayfs.
So that's actually a bit of a disaster.

So for distribution overlays, you will need to do things a different
way, which is to map the distro subdirectory so you know that the
capability with the global uid 0 should be used for the container
"root" uid, right?

So this hack of using security.foo@uid=1000 is *only* useful when the
subcontainer root wants to create the privileged executable.  You
still have to do things the other way.

So can we make perhaps the assertion that *either*:

   security.foo

exists, *or*

   security.foo@uid=BAR

exists, but never both?  And there BAR is exclusive to only one
instances?

Otherwise, I suspect that the architecture is going to turn around and
bite us in the *ss eventually, because someone will want to do
something crazy and the solution will not be scalable.

	  	    		      	   -Ted

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 16:40               ` Theodore Ts'o
@ 2017-07-13 17:05                 ` Stefan Berger
  2017-07-13 17:39                   ` Eric W. Biederman
  2017-07-18  7:01                   ` James Morris
  2017-07-13 17:14                 ` Eric W. Biederman
  2017-07-13 21:13                 ` Serge E. Hallyn
  2 siblings, 2 replies; 76+ messages in thread
From: Stefan Berger @ 2017-07-13 17:05 UTC (permalink / raw)
  To: Theodore Ts'o, Eric W. Biederman, Serge E. Hallyn,
	containers, lkp, linux-kernel, zohar, tycho, James.Bottomley,
	vgoyal, christian.brauner, amir73il, linux-security-module,
	casey

On 07/13/2017 12:40 PM, Theodore Ts'o wrote:
> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
>> The concise summary:
>>
>> Today we have the xattr security.capable that holds a set of
>> capabilities that an application gains when executed.  AKA setuid root exec
>> without actually being setuid root.
>>
>> User namespaces have the concept of capabilities that are not global but
>> are limited to their user namespace.  We do not currently have
>> filesystem support for this concept.
> So correct me if I am wrong; in general, there will only be one
> variant of the form:
>
>     security.foo@uid=15000
>
> It's not like there will be:
>
>     security.foo@uid=1000
>     security.foo@uid=2000

A file shared by 2 containers, one mapping root to uid=1000, the other 
mapping root to uid=2000, will show these two xattrs on the host 
(init_user_ns) once these containers set xattrs on that file.

>
> Except.... if you have an Distribution root directory which is shared
> by many containers, you would need to put the xattrs in the overlay
> inodes.  Worse, each time you launch a new container, with a new
> subuid allocation, you will have to iterate over all files with
> capabilities and do a copy-up operations on the xattrs in overlayfs.
> So that's actually a bit of a disaster.

Note that we do keep compatibility to existing behavior. The 
security.foo of the host is visible inside any container for as long as 
the container root user doesn't set its own security.foo on that file, 
which then hides it. Does that address this concern?


>
> So for distribution overlays, you will need to do things a different
> way, which is to map the distro subdirectory so you know that the
> capability with the global uid 0 should be used for the container
> "root" uid, right?
>
> So this hack of using security.foo@uid=1000 is *only* useful when the
> subcontainer root wants to create the privileged executable.  You
> still have to do things the other way.
>
> So can we make perhaps the assertion that *either*:
>
>     security.foo
>
> exists, *or*
>
>     security.foo@uid=BAR
>
> exists, but never both?  And there BAR is exclusive to only one
> instances?

In the current implementation BAR is visible inside of any instance that 
'covers' this uid with the mapping range. Above example of 
security.foo@uid=1000 appears as security.foo inside the container with 
root mapping to uid 1000 (@uid=0 is suppressed) but also appears as 
security.foo@uid=100 with root uid mapping to 900 (and range of at least 
101).

>
> Otherwise, I suspect that the architecture is going to turn around and
> bite us in the *ss eventually, because someone will want to do
> something crazy and the solution will not be scalable.

Can you define what 'scalable' means for you in this context?
 From what I can see sharing a filesystem between multiple containers 
doesn't 'scale well' for virtualizing the xattrs primarily because of 
size limitations of xattrs per file.

      Stefan

>
> 	  	    		      	   -Ted
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 16:40               ` Theodore Ts'o
  2017-07-13 17:05                 ` Stefan Berger
@ 2017-07-13 17:14                 ` Eric W. Biederman
  2017-07-13 17:33                   ` Stefan Berger
  2017-07-13 21:13                 ` Serge E. Hallyn
  2 siblings, 1 reply; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-13 17:14 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Stefan Berger, Serge E. Hallyn, containers, lkp, linux-kernel,
	zohar, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

Theodore Ts'o <tytso@mit.edu> writes:

> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
>> The concise summary:
>> 
>> Today we have the xattr security.capable that holds a set of
>> capabilities that an application gains when executed.  AKA setuid root exec
>> without actually being setuid root.
>> 
>> User namespaces have the concept of capabilities that are not global but
>> are limited to their user namespace.  We do not currently have
>> filesystem support for this concept.
>
> So correct me if I am wrong; in general, there will only be one
> variant of the form:
>
>    security.foo@uid=15000
>
> It's not like there will be:
>
>    security.foo@uid=1000
>    security.foo@uid=2000
>
> Except.... if you have an Distribution root directory which is shared
> by many containers, you would need to put the xattrs in the overlay
> inodes.  Worse, each time you launch a new container, with a new
> subuid allocation, you will have to iterate over all files with
> capabilities and do a copy-up operations on the xattrs in overlayfs.
> So that's actually a bit of a disaster.
>
> So for distribution overlays, you will need to do things a different
> way, which is to map the distro subdirectory so you know that the
> capability with the global uid 0 should be used for the container
> "root" uid, right?
>
> So this hack of using security.foo@uid=1000 is *only* useful when the
> subcontainer root wants to create the privileged executable.  You
> still have to do things the other way.
>
> So can we make perhaps the assertion that *either*:
>
>    security.foo
>
> exists, *or*
>
>    security.foo@uid=BAR
>
> exists, but never both?  And there BAR is exclusive to only one
> instances?
>
> Otherwise, I suspect that the architecture is going to turn around and
> bite us in the *ss eventually, because someone will want to do
> something crazy and the solution will not be scalable.

Yep.  That is what it looks like from here.

Which is why I asked the question about scalability of the xattr
implementations.  It looks like trying to accomodate the general
case just gets us in trouble, and sets unrealistic expectations.

Which strongly suggests that Serge's previous version that
just reved the format of security.capable so that a uid field could
be added is likely to be the better approach.

I want to see what Serge and Stefan have to say but the case looks
pretty clear cut at the moment.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 17:14                 ` Eric W. Biederman
@ 2017-07-13 17:33                   ` Stefan Berger
  2017-07-13 17:49                     ` Eric W. Biederman
  2017-07-13 21:21                     ` Serge E. Hallyn
  0 siblings, 2 replies; 76+ messages in thread
From: Stefan Berger @ 2017-07-13 17:33 UTC (permalink / raw)
  To: Eric W. Biederman, Theodore Ts'o
  Cc: Serge E. Hallyn, containers, lkp, linux-kernel, zohar, tycho,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

On 07/13/2017 01:14 PM, Eric W. Biederman wrote:
> Theodore Ts'o <tytso@mit.edu> writes:
>
>> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
>>> The concise summary:
>>>
>>> Today we have the xattr security.capable that holds a set of
>>> capabilities that an application gains when executed.  AKA setuid root exec
>>> without actually being setuid root.
>>>
>>> User namespaces have the concept of capabilities that are not global but
>>> are limited to their user namespace.  We do not currently have
>>> filesystem support for this concept.
>> So correct me if I am wrong; in general, there will only be one
>> variant of the form:
>>
>>     security.foo@uid=15000
>>
>> It's not like there will be:
>>
>>     security.foo@uid=1000
>>     security.foo@uid=2000
>>
>> Except.... if you have an Distribution root directory which is shared
>> by many containers, you would need to put the xattrs in the overlay
>> inodes.  Worse, each time you launch a new container, with a new
>> subuid allocation, you will have to iterate over all files with
>> capabilities and do a copy-up operations on the xattrs in overlayfs.
>> So that's actually a bit of a disaster.
>>
>> So for distribution overlays, you will need to do things a different
>> way, which is to map the distro subdirectory so you know that the
>> capability with the global uid 0 should be used for the container
>> "root" uid, right?
>>
>> So this hack of using security.foo@uid=1000 is *only* useful when the
>> subcontainer root wants to create the privileged executable.  You
>> still have to do things the other way.
>>
>> So can we make perhaps the assertion that *either*:
>>
>>     security.foo
>>
>> exists, *or*
>>
>>     security.foo@uid=BAR
>>
>> exists, but never both?  And there BAR is exclusive to only one
>> instances?
>>
>> Otherwise, I suspect that the architecture is going to turn around and
>> bite us in the *ss eventually, because someone will want to do
>> something crazy and the solution will not be scalable.
> Yep.  That is what it looks like from here.
>
> Which is why I asked the question about scalability of the xattr
> implementations.  It looks like trying to accomodate the general
> case just gets us in trouble, and sets unrealistic expectations.
>
> Which strongly suggests that Serge's previous version that
> just reved the format of security.capable so that a uid field could
> be added is likely to be the better approach.
>
> I want to see what Serge and Stefan have to say but the case looks
> pretty clear cut at the moment.

The approach of virtualizing the xattrs on the name-side, which is what 
this patch does, provides a more general approach than to virtualizing 
it on the value side, which is what Serge does in his other patch for 
security.capability alone. With the virtualizing on-the-value side 
virtualizing the xattr becomes an exercise that needs to be repeated for 
every xattr name that one would want to virtualize. With this patch you 
would just add another xattr name to a list, a one-line patch in the 
end. Xattr with prefixes like trusted.* need a bit more work but this 
can be woven in as well 
(https://github.com/stefanberger/linux/commit/397b1a3b24045c67405fc83465e544fc865d402f).

For virtualizing the xattrs on the 'value' side I was looking for 
whether there's something like a 'wrapper' structure around the actual 
value of the xattr so that that wrapper could be extended to support 
different values at different uids and applied to any xattr. 
Unfortunately there's no such 'wrapper'.

    Stefan

>
> Eric
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 17:05                 ` Stefan Berger
@ 2017-07-13 17:39                   ` Eric W. Biederman
  2017-07-13 19:14                     ` Theodore Ts'o
  2017-07-13 21:17                     ` Serge E. Hallyn
  2017-07-18  7:01                   ` James Morris
  1 sibling, 2 replies; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-13 17:39 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Theodore Ts'o, Serge E. Hallyn, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Stefan Berger <stefanb@linux.vnet.ibm.com> writes:

> On 07/13/2017 12:40 PM, Theodore Ts'o wrote:
>> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
>>> The concise summary:
>>>
>>> Today we have the xattr security.capable that holds a set of
>>> capabilities that an application gains when executed.  AKA setuid root exec
>>> without actually being setuid root.
>>>
>>> User namespaces have the concept of capabilities that are not global but
>>> are limited to their user namespace.  We do not currently have
>>> filesystem support for this concept.
>> So correct me if I am wrong; in general, there will only be one
>> variant of the form:
>>
>>     security.foo@uid=15000
>>
>> It's not like there will be:
>>
>>     security.foo@uid=1000
>>     security.foo@uid=2000
>
> A file shared by 2 containers, one mapping root to uid=1000, the other
> mapping root to uid=2000, will show these two xattrs on the host
> (init_user_ns) once these containers set xattrs on that file.

There is an interesting solution for shared directory trees containing
executables.

Overlayfs is needed if you need those directory trees to be writable and
for the files to show up as owned by uid 0.  An overlayfs will have to
do something with the security.capable attribute.  So ignoring that case.

If you don't care about the ownership of the files, and read only is
acceptable, and you still don't want to give these executables
capabilities in the initial user namespace.  What you can do is
make everything owned by some non-zero uid including the security
capability.  Call this non-zero uid image-root.

When the container starts it creates two nested user namespaces first
with image-root mapped to 0.  Then with the containers choice of uid
mapped to 0 image-root unmapped.  This will ensure the capability
attributes work for all containers that share that root image.  And it
ensures the file are read-only from the container.

So I don't think there is ever a case where we would share a filesystem
image where we would need to set multiple security attributes on a file.

>> Otherwise, I suspect that the architecture is going to turn around and
>> bite us in the *ss eventually, because someone will want to do
>> something crazy and the solution will not be scalable.
>
> Can you define what 'scalable' means for you in this context?
> From what I can see sharing a filesystem between multiple containers
> doesn't 'scale well' for virtualizing the xattrs primarily because of
> size limitations of xattrs per file.

Worse than that I believe you will find that filesystems are built on
the assumption that there will be a small number of xattrs per file.
So even if the vfs limitations were lifted the filesystem performance
would suffer.

Even if the filesystem performed well I believe there are other issues
with stat, and simply not having so much meta-data that adminstrators
and tools get confused.

So I believe there are some very good fundamental reasons why we want to
limit the amount of meta-data per file.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 17:33                   ` Stefan Berger
@ 2017-07-13 17:49                     ` Eric W. Biederman
  2017-07-13 19:48                       ` Serge E. Hallyn
       [not found]                       ` <9a3010e5-ca2b-5e7a-656b-fcc14f7bec4e@linux.vnet.ibm.com>
  2017-07-13 21:21                     ` Serge E. Hallyn
  1 sibling, 2 replies; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-13 17:49 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Theodore Ts'o, Serge E. Hallyn, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Stefan Berger <stefanb@linux.vnet.ibm.com> writes:

> On 07/13/2017 01:14 PM, Eric W. Biederman wrote:
>> Theodore Ts'o <tytso@mit.edu> writes:
>>
>>> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
>>>> The concise summary:
>>>>
>>>> Today we have the xattr security.capable that holds a set of
>>>> capabilities that an application gains when executed.  AKA setuid root exec
>>>> without actually being setuid root.
>>>>
>>>> User namespaces have the concept of capabilities that are not global but
>>>> are limited to their user namespace.  We do not currently have
>>>> filesystem support for this concept.
>>> So correct me if I am wrong; in general, there will only be one
>>> variant of the form:
>>>
>>>     security.foo@uid=15000
>>>
>>> It's not like there will be:
>>>
>>>     security.foo@uid=1000
>>>     security.foo@uid=2000
>>>
>>> Except.... if you have an Distribution root directory which is shared
>>> by many containers, you would need to put the xattrs in the overlay
>>> inodes.  Worse, each time you launch a new container, with a new
>>> subuid allocation, you will have to iterate over all files with
>>> capabilities and do a copy-up operations on the xattrs in overlayfs.
>>> So that's actually a bit of a disaster.
>>>
>>> So for distribution overlays, you will need to do things a different
>>> way, which is to map the distro subdirectory so you know that the
>>> capability with the global uid 0 should be used for the container
>>> "root" uid, right?
>>>
>>> So this hack of using security.foo@uid=1000 is *only* useful when the
>>> subcontainer root wants to create the privileged executable.  You
>>> still have to do things the other way.
>>>
>>> So can we make perhaps the assertion that *either*:
>>>
>>>     security.foo
>>>
>>> exists, *or*
>>>
>>>     security.foo@uid=BAR
>>>
>>> exists, but never both?  And there BAR is exclusive to only one
>>> instances?
>>>
>>> Otherwise, I suspect that the architecture is going to turn around and
>>> bite us in the *ss eventually, because someone will want to do
>>> something crazy and the solution will not be scalable.
>> Yep.  That is what it looks like from here.
>>
>> Which is why I asked the question about scalability of the xattr
>> implementations.  It looks like trying to accomodate the general
>> case just gets us in trouble, and sets unrealistic expectations.
>>
>> Which strongly suggests that Serge's previous version that
>> just reved the format of security.capable so that a uid field could
>> be added is likely to be the better approach.
>>
>> I want to see what Serge and Stefan have to say but the case looks
>> pretty clear cut at the moment.
>
> The approach of virtualizing the xattrs on the name-side, which is
> what this patch does, provides a more general approach than to
> virtualizing it on the value side, which is what Serge does in his
> other patch for security.capability alone. With the virtualizing
> on-the-value side virtualizing the xattr becomes an exercise that
> needs to be repeated for every xattr name that one would want to
> virtualize. With this patch you would just add another xattr name to a
> list, a one-line patch in the end. Xattr with prefixes like trusted.*
> need a bit more work but this can be woven in as well
> (https://github.com/stefanberger/linux/commit/397b1a3b24045c67405fc83465e544fc865d402f).

Reusable code has merit, as it reduces the maintenance burden.

My big question right now is can you implement Ted's suggested
restriction.  Only one security.foo or secuirty.foo@... attribute ?

The maintenance gains are definitely worth taking if they do not
penalize the common case.

> For virtualizing the xattrs on the 'value' side I was looking for
> whether there's something like a 'wrapper' structure around the actual
> value of the xattr so that that wrapper could be extended to support
> different values at different uids and applied to any
> xattr. Unfortunately there's no such 'wrapper'.

Different values at different uids currently appear to be undesirable.
At least for security.capable it does not appear to be useful.

A wrapper structure is also a reasonable suggestion.  Put it's magic
number/version code where the existing version code is away we go.

Do you know of cases where we will truly want to have different
attributes for different containers?

The case that I can think of for IMA is that the signatures want to be
conntected to a key that goes with the filesystem image (so not a system
key) but that would not be something that would need to be changed
between containers.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 17:39                   ` Eric W. Biederman
@ 2017-07-13 19:14                     ` Theodore Ts'o
  2017-07-13 19:41                       ` Serge E. Hallyn
  2017-07-13 21:17                     ` Serge E. Hallyn
  1 sibling, 1 reply; 76+ messages in thread
From: Theodore Ts'o @ 2017-07-13 19:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Stefan Berger, Serge E. Hallyn, containers, lkp, linux-kernel,
	zohar, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

On Thu, Jul 13, 2017 at 12:39:10PM -0500, Eric W. Biederman wrote:
> > Can you define what 'scalable' means for you in this context?
> > From what I can see sharing a filesystem between multiple containers
> > doesn't 'scale well' for virtualizing the xattrs primarily because of
> > size limitations of xattrs per file.
> 
> Worse than that I believe you will find that filesystems are built on
> the assumption that there will be a small number of xattrs per file.
> So even if the vfs limitations were lifted the filesystem performance
> would suffer.

That's why I've been pushing here.  If people try to do

   security.capable@uid=1000
   security.capable@uid=2000
   security.capable@uid=3000
   security.capable@uid=4000
   security.capable@uid=5000
   security.capable@uid=6000
   security.capable@uid=7000
   security.capable@uid=8000
   security.capable@uid=9000
         .
	 .
	 .

... where the values of all of these will be the same, this is going
to be *awful* even if the file system can support it.

So maybe we are better off if we define an xattr

   security.capable@guest-container

... so the property is that it is ignored by the host ("real")
container, and in all of the subcontainers, it will be used if the
local container root is trying to execute the file.

Now, this doesn't support the solution of the "turtles all the way
down" insane containers configuraiton.

E.g., where in one container we boot a RHEL distro, which then
launches another container running another RHEL distro, and repeat
this 1000 times, for a very deeply nested subsubsubsubsub...container.

I think this is insane and we shouldn't support this, but I know there
are people who think this is a perfectly sane thing to do.


The other thing this doesn't support is someone who wants to use IMA,
and where the every single IMA is using a different signed HMAC:

   security.ima@uid=1000
   security.ima@uid=2000
   security.ima@uid=3000
   security.ima@uid=4000
   security.ima@uid=5000
   security.ima@uid=6000
   security.ima@uid=7000
   security.ima@uid=8000
   security.ima@uid=9000
	.
	.
	.

Where each security IMA could either be a 32 byte HMAC, or worse, a
256 byte RSA signed signature.  Now let's assume there are 10,000
containers, each of which needs a separate RSA signature.  This sounds
insane.... but I've seen things that I've thought were more insane
coming out of containerland, so it would be nice if we can get
something signed in blood promisng that no, we would *never* do
something that insane, or better yet, make it impossible to do from an
architectural standpoint.

					- Ted

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 19:14                     ` Theodore Ts'o
@ 2017-07-13 19:41                       ` Serge E. Hallyn
  0 siblings, 0 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-13 19:41 UTC (permalink / raw)
  To: Theodore Ts'o, Eric W. Biederman, Stefan Berger,
	Serge E. Hallyn, containers, lkp, linux-kernel, zohar, tycho,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

Quoting Theodore Ts'o (tytso@mit.edu):
> On Thu, Jul 13, 2017 at 12:39:10PM -0500, Eric W. Biederman wrote:
> > > Can you define what 'scalable' means for you in this context?
> > > From what I can see sharing a filesystem between multiple containers
> > > doesn't 'scale well' for virtualizing the xattrs primarily because of
> > > size limitations of xattrs per file.
> > 
> > Worse than that I believe you will find that filesystems are built on
> > the assumption that there will be a small number of xattrs per file.
> > So even if the vfs limitations were lifted the filesystem performance
> > would suffer.
> 
> That's why I've been pushing here.  If people try to do
> 
>    security.capable@uid=1000
>    security.capable@uid=2000
>    security.capable@uid=3000
>    security.capable@uid=4000
>    security.capable@uid=5000
>    security.capable@uid=6000
>    security.capable@uid=7000
>    security.capable@uid=8000
>    security.capable@uid=9000
>          .
> 	 .
> 	 .
> 
> ... where the values of all of these will be the same, this is going
> to be *awful* even if the file system can support it.

Typically users will be allocated a single range of ids, for instance
100000-200000.  We might therefore consider putting a range in the uid=,
i.e. security.capable@uid=100000-200000.  I don't think that's really
needed, but it's an option.

Consider that the executable will be owned by some kuid+kgid.  If we
have all the xattrs you list above, then who would we have actually
owning the file?  If we're chown'ing it anyway (to be root-owned but
not seutid-root), then this discussion is moot, because we'll have
to re-write the xattr after the chown.  So for this to matter, we
would have an fs owned by either uid nobody in the container, or
by some special user (mapped to 100000 in the container perhaps)
which is always special-case-mapped into the container.

> So maybe we are better off if we define an xattr
> 
>    security.capable@guest-container
> ... so the property is that it is ignored by the host ("real")
> container, and in all of the subcontainers, it will be used if the
> local container root is trying to execute the file.

In the previous discussion we considered having 'security.capable@uid='
with no following integer, meaning that it would take effect in all
user namespaces which do not have kuid 0 as root.

This could be useful for cases like docker hosts, but note that writing
this has to require either global CAP_SETFCAP, or CAP_SETFCAP in a
user namespace that has every kuid except 0 mapped.  If joe, uid 1000,
has subuids 100000-20000 delegated to him, then he must not be allowed
to write something that can affect someone with kuids 300000-400000.

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 17:49                     ` Eric W. Biederman
@ 2017-07-13 19:48                       ` Serge E. Hallyn
  2017-07-13 21:12                         ` Eric W. Biederman
       [not found]                       ` <9a3010e5-ca2b-5e7a-656b-fcc14f7bec4e@linux.vnet.ibm.com>
  1 sibling, 1 reply; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-13 19:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Stefan Berger, Theodore Ts'o, Serge E. Hallyn, containers,
	lkp, linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
> 
> > On 07/13/2017 01:14 PM, Eric W. Biederman wrote:
> >> Theodore Ts'o <tytso@mit.edu> writes:
> >>
> >>> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
> >>>> The concise summary:
> >>>>
> >>>> Today we have the xattr security.capable that holds a set of
> >>>> capabilities that an application gains when executed.  AKA setuid root exec
> >>>> without actually being setuid root.
> >>>>
> >>>> User namespaces have the concept of capabilities that are not global but
> >>>> are limited to their user namespace.  We do not currently have
> >>>> filesystem support for this concept.
> >>> So correct me if I am wrong; in general, there will only be one
> >>> variant of the form:
> >>>
> >>>     security.foo@uid=15000
> >>>
> >>> It's not like there will be:
> >>>
> >>>     security.foo@uid=1000
> >>>     security.foo@uid=2000
> >>>
> >>> Except.... if you have an Distribution root directory which is shared
> >>> by many containers, you would need to put the xattrs in the overlay
> >>> inodes.  Worse, each time you launch a new container, with a new
> >>> subuid allocation, you will have to iterate over all files with
> >>> capabilities and do a copy-up operations on the xattrs in overlayfs.
> >>> So that's actually a bit of a disaster.
> >>>
> >>> So for distribution overlays, you will need to do things a different
> >>> way, which is to map the distro subdirectory so you know that the
> >>> capability with the global uid 0 should be used for the container
> >>> "root" uid, right?
> >>>
> >>> So this hack of using security.foo@uid=1000 is *only* useful when the
> >>> subcontainer root wants to create the privileged executable.  You
> >>> still have to do things the other way.
> >>>
> >>> So can we make perhaps the assertion that *either*:
> >>>
> >>>     security.foo
> >>>
> >>> exists, *or*
> >>>
> >>>     security.foo@uid=BAR
> >>>
> >>> exists, but never both?  And there BAR is exclusive to only one
> >>> instances?
> >>>
> >>> Otherwise, I suspect that the architecture is going to turn around and
> >>> bite us in the *ss eventually, because someone will want to do
> >>> something crazy and the solution will not be scalable.
> >> Yep.  That is what it looks like from here.
> >>
> >> Which is why I asked the question about scalability of the xattr
> >> implementations.  It looks like trying to accomodate the general
> >> case just gets us in trouble, and sets unrealistic expectations.
> >>
> >> Which strongly suggests that Serge's previous version that
> >> just reved the format of security.capable so that a uid field could
> >> be added is likely to be the better approach.
> >>
> >> I want to see what Serge and Stefan have to say but the case looks
> >> pretty clear cut at the moment.

I'm fine with that.  Now, we'll be doing the enforcement at xattr
write time, meaning someone *can* come up with an fs image with >1
such xattrs.  Which is *fine*, I believe, it won't break anything
security-wise, and our goal is only to stop users from thinking it
is legitimate two write multiple such xattrs, so that they don't later
bug the fs folks like Ted saying "hey why can't I write 1000 of these,
I think that's a bug."

So at xattr write time,

	1. if there is already an xattr, and it is either the global
	non-namespaced xattr, or it has kuid=X where X is the kuid
	mapped to root in a parent of the container, then we refuse
	the write
	2. if there is already an xattr, and it is for a kuid=X where
	X is mapped into the container, then we overwrite the existing
	xattr.

At read/use time, we use the rules we have now.

Does that seem reasonable?

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 19:48                       ` Serge E. Hallyn
@ 2017-07-13 21:12                         ` Eric W. Biederman
  0 siblings, 0 replies; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-13 21:12 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Stefan Berger, Theodore Ts'o, containers, lkp, linux-kernel,
	zohar, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>> 
>> > On 07/13/2017 01:14 PM, Eric W. Biederman wrote:
>> >> Theodore Ts'o <tytso@mit.edu> writes:
>> >>
>> >>> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
>> >>>> The concise summary:
>> >>>>
>> >>>> Today we have the xattr security.capable that holds a set of
>> >>>> capabilities that an application gains when executed.  AKA setuid root exec
>> >>>> without actually being setuid root.
>> >>>>
>> >>>> User namespaces have the concept of capabilities that are not global but
>> >>>> are limited to their user namespace.  We do not currently have
>> >>>> filesystem support for this concept.
>> >>> So correct me if I am wrong; in general, there will only be one
>> >>> variant of the form:
>> >>>
>> >>>     security.foo@uid=15000
>> >>>
>> >>> It's not like there will be:
>> >>>
>> >>>     security.foo@uid=1000
>> >>>     security.foo@uid=2000
>> >>>
>> >>> Except.... if you have an Distribution root directory which is shared
>> >>> by many containers, you would need to put the xattrs in the overlay
>> >>> inodes.  Worse, each time you launch a new container, with a new
>> >>> subuid allocation, you will have to iterate over all files with
>> >>> capabilities and do a copy-up operations on the xattrs in overlayfs.
>> >>> So that's actually a bit of a disaster.
>> >>>
>> >>> So for distribution overlays, you will need to do things a different
>> >>> way, which is to map the distro subdirectory so you know that the
>> >>> capability with the global uid 0 should be used for the container
>> >>> "root" uid, right?
>> >>>
>> >>> So this hack of using security.foo@uid=1000 is *only* useful when the
>> >>> subcontainer root wants to create the privileged executable.  You
>> >>> still have to do things the other way.
>> >>>
>> >>> So can we make perhaps the assertion that *either*:
>> >>>
>> >>>     security.foo
>> >>>
>> >>> exists, *or*
>> >>>
>> >>>     security.foo@uid=BAR
>> >>>
>> >>> exists, but never both?  And there BAR is exclusive to only one
>> >>> instances?
>> >>>
>> >>> Otherwise, I suspect that the architecture is going to turn around and
>> >>> bite us in the *ss eventually, because someone will want to do
>> >>> something crazy and the solution will not be scalable.
>> >> Yep.  That is what it looks like from here.
>> >>
>> >> Which is why I asked the question about scalability of the xattr
>> >> implementations.  It looks like trying to accomodate the general
>> >> case just gets us in trouble, and sets unrealistic expectations.
>> >>
>> >> Which strongly suggests that Serge's previous version that
>> >> just reved the format of security.capable so that a uid field could
>> >> be added is likely to be the better approach.
>> >>
>> >> I want to see what Serge and Stefan have to say but the case looks
>> >> pretty clear cut at the moment.
>
> I'm fine with that.  Now, we'll be doing the enforcement at xattr
> write time, meaning someone *can* come up with an fs image with >1
> such xattrs.  Which is *fine*, I believe, it won't break anything
> security-wise, and our goal is only to stop users from thinking it
> is legitimate two write multiple such xattrs, so that they don't later
> bug the fs folks like Ted saying "hey why can't I write 1000 of these,
> I think that's a bug."
>
> So at xattr write time,
>
> 	1. if there is already an xattr, and it is either the global
> 	non-namespaced xattr, or it has kuid=X where X is the kuid
> 	mapped to root in a parent of the container, then we refuse
> 	the write
> 	2. if there is already an xattr, and it is for a kuid=X where
> 	X is mapped into the container, then we overwrite the existing
> 	xattr.
>
> At read/use time, we use the rules we have now.
>
> Does that seem reasonable?

That sounds like it would keep us to one xattr of any given type so yes.

It occurs to me while I am writing this that this is also important
for ima/evm.  There is an xattr that has a hash of all of the other
security relevant xattrs.   Without a limit on the number of xattrs
calculating that security xattr could become time prohibitive.


Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 16:40               ` Theodore Ts'o
  2017-07-13 17:05                 ` Stefan Berger
  2017-07-13 17:14                 ` Eric W. Biederman
@ 2017-07-13 21:13                 ` Serge E. Hallyn
  2 siblings, 0 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-13 21:13 UTC (permalink / raw)
  To: Theodore Ts'o, Eric W. Biederman, Stefan Berger,
	Serge E. Hallyn, containers, lkp, linux-kernel, zohar, tycho,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

Quoting Theodore Ts'o (tytso@mit.edu):
> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
> > The concise summary:
> > 
> > Today we have the xattr security.capable that holds a set of
> > capabilities that an application gains when executed.  AKA setuid root exec
> > without actually being setuid root.
> > 
> > User namespaces have the concept of capabilities that are not global but
> > are limited to their user namespace.  We do not currently have
> > filesystem support for this concept.
> 
> So correct me if I am wrong; in general, there will only be one
> variant of the form:
> 
>    security.foo@uid=15000
> 
> It's not like there will be:
> 
>    security.foo@uid=1000
>    security.foo@uid=2000
> 
> Except.... if you have an Distribution root directory which is shared
> by many containers, you would need to put the xattrs in the overlay
> inodes.

Is that a problem?  Essentially people who would try to do the
above also want to use 'shiftfs' stackable filesystem, which would
presumably eventually do this for you.

>   Worse, each time you launch a new container, with a new
> subuid allocation, you will have to iterate over all files with
> capabilities and do a copy-up operations on the xattrs in overlayfs.
> So that's actually a bit of a disaster.

Only if you create the container rootfs as a copy.

Note that generally they would want to walk the fs in that case anyway, to chown
the files into the container.  And said chown would clear out any existing file
capabilities (and suid/sgid bits).

On the other hand, unprivileged lxc containers are created by
untarring the distro image straight into the mapped user namespace.
So no chowning is needed, and - once we we have this properly supported -
the filecaps should be automatically written correctly for the container.

> So for distribution overlays, you will need to do things a different
> way, which is to map the distro subdirectory so you know that the
> capability with the global uid 0 should be used for the container
> "root" uid, right?
> 
> So this hack of using security.foo@uid=1000 is *only* useful when the
> subcontainer root wants to create the privileged executable.  You
> still have to do things the other way.
> 
> So can we make perhaps the assertion that *either*:
> 
>    security.foo
> 
> exists, *or*
> 
>    security.foo@uid=BAR
> 
> exists, but never both?  And there BAR is exclusive to only one
> instances?

I think that's fine.

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 17:39                   ` Eric W. Biederman
  2017-07-13 19:14                     ` Theodore Ts'o
@ 2017-07-13 21:17                     ` Serge E. Hallyn
  1 sibling, 0 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-13 21:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Stefan Berger, Theodore Ts'o, Serge E. Hallyn, containers,
	lkp, linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
> If you don't care about the ownership of the files, and read only is
> acceptable, and you still don't want to give these executables
> capabilities in the initial user namespace.  What you can do is
> make everything owned by some non-zero uid including the security
> capability.  Call this non-zero uid image-root.
> 
> When the container starts it creates two nested user namespaces first
> with image-root mapped to 0.  Then with the containers choice of uid
> mapped to 0 image-root unmapped.  This will ensure the capability
> attributes work for all containers that share that root image.  And it
> ensures the file are read-only from the container.
> 
> So I don't think there is ever a case where we would share a filesystem
> image where we would need to set multiple security attributes on a file.

Neat idea.  In fact, you can take it a step further and still have the
files be owned by valid uids in the containers.  The parent ns just
needs to have its *root* map to a common kuid not mapped into the child
namespaces, but the files can be owned by another kuid which *is* mapped
into the child containers.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 17:33                   ` Stefan Berger
  2017-07-13 17:49                     ` Eric W. Biederman
@ 2017-07-13 21:21                     ` Serge E. Hallyn
  1 sibling, 0 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-13 21:21 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Eric W. Biederman, Theodore Ts'o, Serge E. Hallyn,
	containers, lkp, linux-kernel, zohar, tycho, James.Bottomley,
	vgoyal, christian.brauner, amir73il, linux-security-module,
	casey

Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> For virtualizing the xattrs on the 'value' side I was looking for
> whether there's something like a 'wrapper' structure around the
> actual value of the xattr so that that wrapper could be extended to
> support different values at different uids and applied to any xattr.
> Unfortunately there's no such 'wrapper'.

I believe my very first implementation did essentially this - it used
the not uncommon structure of (mostly making this up):

struct ns_vfs_cap {
	int magic;
	int ncaps;
	struct ns_vfs_cap_data data[0];
};

with (ncaps * sizeof(ns_vfs_cap_data)) following that.

I didn't like it.

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
       [not found]                       ` <9a3010e5-ca2b-5e7a-656b-fcc14f7bec4e@linux.vnet.ibm.com>
@ 2017-07-14  0:38                         ` Eric W. Biederman
  2017-07-14 11:32                           ` Stefan Berger
  0 siblings, 1 reply; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-14  0:38 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Theodore Ts'o, Serge E. Hallyn, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Stefan Berger <stefanb@linux.vnet.ibm.com> writes:

> On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
>
> > My big question right now is can you implement Ted's suggested
> > restriction.  Only one security.foo or secuirty.foo@... attribute ?

> We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
>
> So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
>

The latter.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14  0:38                         ` Eric W. Biederman
@ 2017-07-14 11:32                           ` Stefan Berger
  2017-07-14 12:04                             ` Eric W. Biederman
  2017-07-14 13:34                             ` Serge E. Hallyn
  0 siblings, 2 replies; 76+ messages in thread
From: Stefan Berger @ 2017-07-14 11:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Theodore Ts'o, Serge E. Hallyn, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>
>> On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
>>
>>> My big question right now is can you implement Ted's suggested
>>> restriction.  Only one security.foo or secuirty.foo@... attribute ?
>> We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
>>
>> So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
>>
> The latter.

That case would prevent a container user from overriding the xattr on 
the host. Is that what we want? For limiting the number of xattrs and 
getting that functionality (override IMA signature for example) the 
former seems better...

For the former I now have the topmost patch here: 
https://github.com/stefanberger/linux/commits/xattr_for_userns.v3

    Stefan


>
> Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 11:32                           ` Stefan Berger
@ 2017-07-14 12:04                             ` Eric W. Biederman
  2017-07-14 12:39                               ` Stefan Berger
  2017-07-14 13:34                             ` Serge E. Hallyn
  1 sibling, 1 reply; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-14 12:04 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Theodore Ts'o, Serge E. Hallyn, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Stefan Berger <stefanb@linux.vnet.ibm.com> writes:

> On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
>> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>>
>>> On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
>>>
>>>> My big question right now is can you implement Ted's suggested
>>>> restriction.  Only one security.foo or secuirty.foo@... attribute ?
>>> We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
>>>
>>> So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
>>>
>> The latter.
>
> That case would prevent a container user from overriding the xattr on
> the host. Is that what we want?

Most definitely.  If a more privileged use has set secure.capable that
is better.  

> For limiting the number of xattrs and
> getting that functionality (override IMA signature for example) the
> former seems better...

I don't know about IMA.  But my feeling is that we will only be dealing
with a single signing key, so I don't see how having multiple IMA xattrs
make sense.  Could you explain that to me?

> For the former I now have the topmost patch here:
> https://github.com/stefanberger/linux/commits/xattr_for_userns.v3

Thank you.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 12:04                             ` Eric W. Biederman
@ 2017-07-14 12:39                               ` Stefan Berger
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Berger @ 2017-07-14 12:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Theodore Ts'o, Serge E. Hallyn, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

On 07/14/2017 08:04 AM, Eric W. Biederman wrote:
> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>
>> On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
>>> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>>>
>>>> On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
>>>>
>>>>> My big question right now is can you implement Ted's suggested
>>>>> restriction.  Only one security.foo or secuirty.foo@... attribute ?
>>>> We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
>>>>
>>>> So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
>>>>
>>> The latter.
>> That case would prevent a container user from overriding the xattr on
>> the host. Is that what we want?
> Most definitely.  If a more privileged use has set secure.capable that
> is better.
>
>> For limiting the number of xattrs and
>> getting that functionality (override IMA signature for example) the
>> former seems better...
> I don't know about IMA.  But my feeling is that we will only be dealing
> with a single signing key, so I don't see how having multiple IMA xattrs
> make sense.  Could you explain that to me?

Admittedly I would need to construct and example where the user inside 
the container doesn't want to share the public key with the host on a 
file mounted from the host for some reason.

An example related to security.capability could be a Fedora Docker 
container where the container is distributed with the ping tool 
installed. The ping tool is installed with cap_net_admin,cap_net_raw+ep. 
On a normal Fedora container I cannot use this tool due to my 
capabilities bounding set not including cap_net_admin. So, I overwrite 
this and set only cap_net_raw+ep and I can use for pinging. I may loose 
some functionality on the way due to the lost cap_net_admin but I can 
now use the tool. I guess the point is one can override the capabilities 
set of a distributed container if the container is started with less 
capabilities.

    Stefan


>
>> For the former I now have the topmost patch here:
>> https://github.com/stefanberger/linux/commits/xattr_for_userns.v3
> Thank you.
>
> Eric
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 11:32                           ` Stefan Berger
  2017-07-14 12:04                             ` Eric W. Biederman
@ 2017-07-14 13:34                             ` Serge E. Hallyn
  2017-07-14 15:22                               ` Stefan Berger
  1 sibling, 1 reply; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-14 13:34 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Eric W. Biederman, Theodore Ts'o, Serge E. Hallyn,
	containers, lkp, linux-kernel, zohar, tycho, James.Bottomley,
	vgoyal, christian.brauner, amir73il, linux-security-module,
	casey

Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
> >Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
> >
> >>On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
> >>
> >>>My big question right now is can you implement Ted's suggested
> >>>restriction.  Only one security.foo or secuirty.foo@... attribute ?
> >>We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
> >>
> >>So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
> >>
> >The latter.
> 
> That case would prevent a container user from overriding the xattr
> on the host. Is that what we want? For limiting the number of xattrs

Not really.  If the file is owned by a uid mapped into the container,
then the container root can chown the file which will clear the file
capability, after which he can set a new one.  If the file is not
owned by a uid mapped into the container, then container root could
not set a filecap anyway.

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 13:34                             ` Serge E. Hallyn
@ 2017-07-14 15:22                               ` Stefan Berger
  2017-07-14 17:35                                 ` Serge E. Hallyn
  2017-07-14 17:36                                 ` Eric W. Biederman
  0 siblings, 2 replies; 76+ messages in thread
From: Stefan Berger @ 2017-07-14 15:22 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, Theodore Ts'o, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

On 07/14/2017 09:34 AM, Serge E. Hallyn wrote:
> Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
>> On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
>>> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>>>
>>>> On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
>>>>
>>>>> My big question right now is can you implement Ted's suggested
>>>>> restriction.  Only one security.foo or secuirty.foo@... attribute ?
>>>> We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
>>>>
>>>> So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
>>>>
>>> The latter.
>> That case would prevent a container user from overriding the xattr
>> on the host. Is that what we want? For limiting the number of xattrs
> Not really.  If the file is owned by a uid mapped into the container,
> then the container root can chown the file which will clear the file
> capability, after which he can set a new one.  If the file is not
> owned by a uid mapped into the container, then container root could
> not set a filecap anyway.

Let's say I installed a container where all files are signed and thus 
have security.ima. Now for some reason I want to re-sign some or all 
files inside that container. How would I do that ? Would I need to get 
rid of security.ima first, possibly by copying each file, deleting the 
original file, and renaming the copied file to the original name, or 
should I just be able to write out a new signature, thus creating 
security.ima@uid=1000 besides the security.ima ?

    Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 15:22                               ` Stefan Berger
@ 2017-07-14 17:35                                 ` Serge E. Hallyn
  2017-07-14 18:17                                   ` Eric W. Biederman
                                                     ` (2 more replies)
  2017-07-14 17:36                                 ` Eric W. Biederman
  1 sibling, 3 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-14 17:35 UTC (permalink / raw)
  To: Stefan Berger, Mimi Zohar
  Cc: Serge E. Hallyn, Eric W. Biederman, Theodore Ts'o,
	containers, lkp, linux-kernel, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> On 07/14/2017 09:34 AM, Serge E. Hallyn wrote:
> >Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> >>On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
> >>>Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
> >>>
> >>>>On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
> >>>>
> >>>>>My big question right now is can you implement Ted's suggested
> >>>>>restriction.  Only one security.foo or secuirty.foo@... attribute ?
> >>>>We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
> >>>>
> >>>>So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
> >>>>
> >>>The latter.
> >>That case would prevent a container user from overriding the xattr
> >>on the host. Is that what we want? For limiting the number of xattrs
> >Not really.  If the file is owned by a uid mapped into the container,
> >then the container root can chown the file which will clear the file
> >capability, after which he can set a new one.  If the file is not
> >owned by a uid mapped into the container, then container root could
> >not set a filecap anyway.
> 
> Let's say I installed a container where all files are signed and
> thus have security.ima. Now for some reason I want to re-sign some
> or all files inside that container. How would I do that ? Would I
> need to get rid of security.ima first, possibly by copying each
> file, deleting the original file, and renaming the copied file to
> the original name, or should I just be able to write out a new
> signature, thus creating security.ima@uid=1000 besides the
> security.ima ?
> 
>    Stefan

Hi Mimi,

what do you think makes most sense for IMA?

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 15:22                               ` Stefan Berger
  2017-07-14 17:35                                 ` Serge E. Hallyn
@ 2017-07-14 17:36                                 ` Eric W. Biederman
  2017-07-14 19:22                                   ` Stefan Berger
  1 sibling, 1 reply; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-14 17:36 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Serge E. Hallyn, Theodore Ts'o, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Stefan Berger <stefanb@linux.vnet.ibm.com> writes:

> On 07/14/2017 09:34 AM, Serge E. Hallyn wrote:
>> Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
>>> On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
>>>> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>>>>
>>>>> On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
>>>>>
>>>>>> My big question right now is can you implement Ted's suggested
>>>>>> restriction.  Only one security.foo or secuirty.foo@... attribute ?
>>>>> We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
>>>>>
>>>>> So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
>>>>>
>>>> The latter.
>>> That case would prevent a container user from overriding the xattr
>>> on the host. Is that what we want? For limiting the number of xattrs
>> Not really.  If the file is owned by a uid mapped into the container,
>> then the container root can chown the file which will clear the file
>> capability, after which he can set a new one.  If the file is not
>> owned by a uid mapped into the container, then container root could
>> not set a filecap anyway.
>
> Let's say I installed a container where all files are signed and thus have
> security.ima. Now for some reason I want to re-sign some or all files inside
> that container. How would I do that ? Would I need to get rid of security.ima
> first, possibly by copying each file, deleting the original file, and renaming
> the copied file to the original name, or should I just be able to write out a
> new signature, thus creating security.ima@uid=1000 besides the security.ima ?

This gets us into some interesting territory, where the semantics of
these attributes matters.

The implementation of security.capable implements the security killpriv
hooks.  Anyone merely by changing the file can cause the security
capability to go away.  So it makes sense from the security.capable side
that anyone who has the capable_wrt_inode_uidgid(CAP_SETFCAP) will be
able to clear and set security.capable.

The integrity xattrs do not.  Which results in very big semantic
difference between these two kinds of attributes.  I am insufficiently
familiar with the rules for security.ima and security.evm to understand
what those rules should be.

That may be enough that we can not share code between these two cases.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 17:35                                 ` Serge E. Hallyn
@ 2017-07-14 18:17                                   ` Eric W. Biederman
  2017-07-14 18:48                                   ` Mimi Zohar
       [not found]                                   ` <xagsmtp2.20170714182525.6604@vmsdvm4.vnet.ibm.com>
  2 siblings, 0 replies; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-14 18:17 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Stefan Berger, Mimi Zohar, Theodore Ts'o, containers, lkp,
	linux-kernel, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
>> On 07/14/2017 09:34 AM, Serge E. Hallyn wrote:
>> >Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
>> >>On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
>> >>>Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>> >>>
>> >>>>On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
>> >>>>
>> >>>>>My big question right now is can you implement Ted's suggested
>> >>>>>restriction.  Only one security.foo or secuirty.foo@... attribute ?
>> >>>>We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
>> >>>>
>> >>>>So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
>> >>>>
>> >>>The latter.
>> >>That case would prevent a container user from overriding the xattr
>> >>on the host. Is that what we want? For limiting the number of xattrs
>> >Not really.  If the file is owned by a uid mapped into the container,
>> >then the container root can chown the file which will clear the file
>> >capability, after which he can set a new one.  If the file is not
>> >owned by a uid mapped into the container, then container root could
>> >not set a filecap anyway.
>> 
>> Let's say I installed a container where all files are signed and
>> thus have security.ima. Now for some reason I want to re-sign some
>> or all files inside that container. How would I do that ? Would I
>> need to get rid of security.ima first, possibly by copying each
>> file, deleting the original file, and renaming the copied file to
>> the original name, or should I just be able to write out a new
>> signature, thus creating security.ima@uid=1000 besides the
>> security.ima ?
>> 
>>    Stefan
>
> Hi Mimi,
>
> what do you think makes most sense for IMA?

I am going to give my two cents since I have been thinking about this.

First I think this entire scheme plays hobs with the security.evm
attribute as security.evm needs to know the names of the xattrs to
protect.

I forget which attributes has a hash and what has a message
athentication code.

If there is an attribute with a simple file hash I think it only make
sense for the kernel to touch it, and I don't see any sense in having
multiples.

If there is an attribute with a message authentication code (roughly a
signed hash) it makes sense to have that to be tied to the kernel key
ring that controlls the keys.  (Which probably means a per user
namespace thing at some point).  But again pretty untouchable otherwise.

Which brings us to the semantic question of would it be nice to have
stacked IMA/EVM on the same file.

I really don't think we do.  I think allowing multiple keys for
different part of trusting files is easy enough that we should have no
need to fight over which keys do which.

Looking at integrity.h I see signature_v2_hdr that has a keyid.  Any use
case I can think of for distributing a distribution image with ima/evm
xattrs will need to use asymmetric keys aka public/private keypairs so
that the originator of the content does not give away their private
keys.

Given that usefully we are talking about content that should be
connected to keys in one way or another I don't believe it even makes
sense at this point to attempt to use uids for dealing with ima and
evm content.

Further looking Serge's previous patch is 300 lines of code Setfan's
patch that provides the possibility of code resuse is 500 lines of code.

Increasingly it is looking to me that code reuse rather than concept
reuse is a false economy.  The code does not get smaller.  The semantic
differences make it problematic.  Possibly to the problematic to the
point where significant pieces may not be reused.  The format breaks
assumptions for other parts of the code like security.evm.  The format
by multiple names instead of a single attribute requires more disk
access so is less efficient.

In short I am seeing more code that runs slower and is harder to
maintain.  Please point out where I am wrong.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 17:35                                 ` Serge E. Hallyn
  2017-07-14 18:17                                   ` Eric W. Biederman
@ 2017-07-14 18:48                                   ` Mimi Zohar
  2017-07-14 18:52                                     ` James Bottomley
  2017-07-14 19:29                                     ` Theodore Ts'o
       [not found]                                   ` <xagsmtp2.20170714182525.6604@vmsdvm4.vnet.ibm.com>
  2 siblings, 2 replies; 76+ messages in thread
From: Mimi Zohar @ 2017-07-14 18:48 UTC (permalink / raw)
  To: Serge E. Hallyn, Stefan Berger, Mimi Zohar
  Cc: Eric W. Biederman, Theodore Ts'o, containers, lkp,
	linux-kernel, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

On Fri, 2017-07-14 at 12:35 -0500, Serge E. Hallyn wrote:
> Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> > On 07/14/2017 09:34 AM, Serge E. Hallyn wrote:
> > >Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> > >>On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
> > >>>Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
> > >>>
> > >>>>On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
> > >>>>
> > >>>>>My big question right now is can you implement Ted's suggested
> > >>>>>restriction.  Only one security.foo or secuirty.foo@... attribute ?
> > >>>>We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
> > >>>>
> > >>>>So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
> > >>>>
> > >>>The latter.
> > >>That case would prevent a container user from overriding the xattr
> > >>on the host. Is that what we want? For limiting the number of xattrs
> > >Not really.  If the file is owned by a uid mapped into the container,
> > >then the container root can chown the file which will clear the file
> > >capability, after which he can set a new one.  If the file is not
> > >owned by a uid mapped into the container, then container root could
> > >not set a filecap anyway.
> > 
> > Let's say I installed a container where all files are signed and
> > thus have security.ima. Now for some reason I want to re-sign some
> > or all files inside that container. How would I do that ? Would I
> > need to get rid of security.ima first, possibly by copying each
> > file, deleting the original file, and renaming the copied file to
> > the original name, or should I just be able to write out a new
> > signature, thus creating security.ima@uid=1000 besides the
> > security.ima ?
> > 
> >    Stefan
> 
> Hi Mimi,
> 
> what do you think makes most sense for IMA?

If I'm understanding the discussion correctly, this isn't an issue for
layered copy on write filesystems, as each fs layer could have it's
own set of xattrs.  The underlying and layered xattrs should be able
to co-exist.  Use the layered xattr if it exists, but fall back to
using the underlying xattr if it doesn't.

The concern is with a shared filesystems.  In that case, for IMA it
would make sense to support a native and a namespace xattr.  If due to
xattr space limitations we have to limit the number of xattrs, then we
should limit it to two - a native and a namespace version, with a
"uid=" tag - first namespace gets permission to write the namespace
xattr.  Again, like in the layered case, if the namespace xattr
doesn't exist, fall back to using the native xattr.

This allows most files to use the underlying xattrs, but allows a few
files to be re-signed inside the namespace, as needed.  For the
layered filesystem case, this would allow mutable file hashes to be
written.  (Unclear as to how shared filesystems would work in this
case.)

Mimi

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 18:48                                   ` Mimi Zohar
@ 2017-07-14 18:52                                     ` James Bottomley
  2017-07-14 20:03                                       ` Mimi Zohar
  2017-07-14 19:29                                     ` Theodore Ts'o
  1 sibling, 1 reply; 76+ messages in thread
From: James Bottomley @ 2017-07-14 18:52 UTC (permalink / raw)
  To: Mimi Zohar, Serge E. Hallyn, Stefan Berger, Mimi Zohar
  Cc: Eric W. Biederman, Theodore Ts'o, containers, lkp,
	linux-kernel, tycho, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

On Fri, 2017-07-14 at 14:48 -0400, Mimi Zohar wrote:
> The concern is with a shared filesystems.  In that case, for IMA it
> would make sense to support a native and a namespace xattr.  If due
> to xattr space limitations we have to limit the number of xattrs,
> then we should limit it to two - a native and a namespace version,
> with a "uid=" tag - first namespace gets permission to write the
> namespace xattr.  Again, like in the layered case, if the namespace
> xattr doesn't exist, fall back to using the native xattr.

Just on this point: if we're really concerned about the need on shared
filesystems to have multiple IMA signatures per file, might it not make
sense simply to support multiple signatures within the security.ima
xattr? The rules for writing signature updates within user namespaces
would be somewhat complex (say only able to replace a signature for
which you demonstrate you possess the key) but it would lead to an
implementation which would work for traditional shared filesystems
(like NFS) as well as containerised bind mounts.

James

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 17:36                                 ` Eric W. Biederman
@ 2017-07-14 19:22                                   ` Stefan Berger
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Berger @ 2017-07-14 19:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Theodore Ts'o, containers, lkp,
	linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

On 07/14/2017 01:36 PM, Eric W. Biederman wrote:
> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>
>> On 07/14/2017 09:34 AM, Serge E. Hallyn wrote:
>>> Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
>>>> On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
>>>>> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>>>>>
>>>>>> On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
>>>>>>
>>>>>>> My big question right now is can you implement Ted's suggested
>>>>>>> restriction.  Only one security.foo or secuirty.foo@... attribute ?
>>>>>> We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
>>>>>>
>>>>>> So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
>>>>>>
>>>>> The latter.
>>>> That case would prevent a container user from overriding the xattr
>>>> on the host. Is that what we want? For limiting the number of xattrs
>>> Not really.  If the file is owned by a uid mapped into the container,
>>> then the container root can chown the file which will clear the file
>>> capability, after which he can set a new one.  If the file is not
>>> owned by a uid mapped into the container, then container root could
>>> not set a filecap anyway.
>> Let's say I installed a container where all files are signed and thus have
>> security.ima. Now for some reason I want to re-sign some or all files inside
>> that container. How would I do that ? Would I need to get rid of security.ima
>> first, possibly by copying each file, deleting the original file, and renaming
>> the copied file to the original name, or should I just be able to write out a
>> new signature, thus creating security.ima@uid=1000 besides the security.ima ?
> This gets us into some interesting territory, where the semantics of
> these attributes matters.
>
> The implementation of security.capable implements the security killpriv
> hooks.  Anyone merely by changing the file can cause the security
> capability to go away.  So it makes sense from the security.capable side
> that anyone who has the capable_wrt_inode_uidgid(CAP_SETFCAP) will be
> able to clear and set security.capable.
>
> The integrity xattrs do not.  Which results in very big semantic
> difference between these two kinds of attributes.  I am insufficiently
> familiar with the rules for security.ima and security.evm to understand
> what those rules should be.
>
> That may be enough that we can not share code between these two cases.

On the host I can simply overwrite capabilities. I think the same model 
should apply to the virtualized world. The difference still is that 
removing an xattr, if written on the host, may only be possible by copy 
+ file move to original filename.

On IMA, when appending a letter to an executable, the executable doesn't 
run anymore when appraisal is used, but the signature is still there and 
needs to be re-written. Though I think this aspect on how they disappear 
doesn't matter as much if they can simply be overwritten.

Some things could certainly be solved with flags indicating behaviors of 
xattrs for as long as these flags only affect the reading, listing, and 
re-writing of the virtualized xattrs, which is what the patch does. For 
example a flag for security.capability could say that only a single 
'security.capability(@uid=<uid>)?' may exist while security.ima could 
have two.

    Stefan

>
> Eric
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
       [not found]                                   ` <xagsmtp2.20170714182525.6604@vmsdvm4.vnet.ibm.com>
@ 2017-07-14 19:26                                     ` Mimi Zohar
  2017-07-15  0:02                                       ` Eric W. Biederman
                                                         ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Mimi Zohar @ 2017-07-14 19:26 UTC (permalink / raw)
  To: Eric W. Biederman, Serge E. Hallyn
  Cc: Stefan Berger, Mimi Zohar, Theodore Ts'o, containers, lkp,
	linux-kernel, tycho, James.Bottomley, vgoyal, christian.brauner,
	amir73il, linux-security-module, casey

On Fri, 2017-07-14 at 13:17 -0500, Eric W. Biederman wrote:
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> 
> > Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> >> On 07/14/2017 09:34 AM, Serge E. Hallyn wrote:
> >> >Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> >> >>On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
> >> >>>Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
> >> >>>
> >> >>>>On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
> >> >>>>
> >> >>>>>My big question right now is can you implement Ted's suggested
> >> >>>>>restriction.  Only one security.foo or secuirty.foo@... attribute ?
> >> >>>>We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
> >> >>>>
> >> >>>>So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
> >> >>>>
> >> >>>The latter.
> >> >>That case would prevent a container user from overriding the xattr
> >> >>on the host. Is that what we want? For limiting the number of xattrs
> >> >Not really.  If the file is owned by a uid mapped into the container,
> >> >then the container root can chown the file which will clear the file
> >> >capability, after which he can set a new one.  If the file is not
> >> >owned by a uid mapped into the container, then container root could
> >> >not set a filecap anyway.
> >> 
> >> Let's say I installed a container where all files are signed and
> >> thus have security.ima. Now for some reason I want to re-sign some
> >> or all files inside that container. How would I do that ? Would I
> >> need to get rid of security.ima first, possibly by copying each
> >> file, deleting the original file, and renaming the copied file to
> >> the original name, or should I just be able to write out a new
> >> signature, thus creating security.ima@uid=1000 besides the
> >> security.ima ?
> >> 
> >>    Stefan
> >
> > Hi Mimi,
> >
> > what do you think makes most sense for IMA?
> 
> I am going to give my two cents since I have been thinking about this.
> 
> First I think this entire scheme plays hobs with the security.evm
> attribute as security.evm needs to know the names of the xattrs to
> protect.
> 
> I forget which attributes has a hash and what has a message
> athentication code.

security.ima contains either a file hash or a signature.  (file data)
security.evm contains either a signature or an hmac of the security
xattrs and other file metadata. (file meta-data)

The same rules would apply to security.evm, as described in my
response.  Based on it's view of the security xattrs, either the
native or namespace security.evm would be updated.

> If there is an attribute with a simple file hash I think it only make
> sense for the kernel to touch it, and I don't see any sense in having
> multiples.

Only files that are in the IMA-appraisal policy is the file hash
calculated and written out as security.ima.  Depending this policy,
does the security.ima exist.  So if the file is in policy for both the
native and namespace policies, agreed the same hash doesn't need to be
written as two different xattrs.

> If there is an attribute with a message authentication code (roughly a
> signed hash) it makes sense to have that to be tied to the kernel key
> ring that controlls the keys.  (Which probably means a per user
> namespace thing at some point).  But again pretty untouchable otherwise.

Right, the namespace would require it's own EVM key. 

> Which brings us to the semantic question of would it be nice to have
> stacked IMA/EVM on the same file.
> 
> I really don't think we do.  I think allowing multiple keys for
> different part of trusting files is easy enough that we should have no
> need to fight over which keys do which.

We definitely want to support different policies on the native and in
the namespace with different keys and keyrings.

Refer to Mehmet Kaaylap's recent post, which refers to a PoC version
of IMA namespacing - kernsec.org/pipermail/linux-security-module-
archive/2017-July/002286.html.

> Looking at integrity.h I see signature_v2_hdr that has a keyid.  Any use
> case I can think of for distributing a distribution image with ima/evm
> xattrs will need to use asymmetric keys aka public/private keypairs so
> that the originator of the content does not give away their private
> keys.

Agreed.

> Given that usefully we are talking about content that should be
> connected to keys in one way or another I don't believe it even makes
> sense at this point to attempt to use uids for dealing with ima and
> evm content.

We need to resolve the xattr issue in order to namespace IMA-
appraisal. 

Mimi

> Further looking Serge's previous patch is 300 lines of code Setfan's
> patch that provides the possibility of code resuse is 500 lines of code.
> 
> Increasingly it is looking to me that code reuse rather than concept
> reuse is a false economy.  The code does not get smaller.  The semantic
> differences make it problematic.  Possibly to the problematic to the
> point where significant pieces may not be reused.  The format breaks
> assumptions for other parts of the code like security.evm.  The format
> by multiple names instead of a single attribute requires more disk
> access so is less efficient.
> 
> In short I am seeing more code that runs slower and is harder to
> maintain.  Please point out where I am wrong.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 18:48                                   ` Mimi Zohar
  2017-07-14 18:52                                     ` James Bottomley
@ 2017-07-14 19:29                                     ` Theodore Ts'o
  2017-07-14 19:43                                       ` Mimi Zohar
  1 sibling, 1 reply; 76+ messages in thread
From: Theodore Ts'o @ 2017-07-14 19:29 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Serge E. Hallyn, Stefan Berger, Mimi Zohar, Eric W. Biederman,
	containers, lkp, linux-kernel, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

On Fri, Jul 14, 2017 at 02:48:10PM -0400, Mimi Zohar wrote:
> 
> If I'm understanding the discussion correctly, this isn't an issue for
> layered copy on write filesystems, as each fs layer could have it's
> own set of xattrs.  The underlying and layered xattrs should be able
> to co-exist.  Use the layered xattr if it exists, but fall back to
> using the underlying xattr if it doesn't.

Note that this assumes that it is possible to "copy up" the xattrs
without necessarily "copying up" all of the data blocks.  This might
be true for some layers, but I don't believe it is currently true for
overlayfs, for example.

					- Ted

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 19:29                                     ` Theodore Ts'o
@ 2017-07-14 19:43                                       ` Mimi Zohar
  0 siblings, 0 replies; 76+ messages in thread
From: Mimi Zohar @ 2017-07-14 19:43 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Serge E. Hallyn, Stefan Berger, Mimi Zohar, Eric W. Biederman,
	containers, lkp, linux-kernel, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

On Fri, 2017-07-14 at 15:29 -0400, Theodore Ts'o wrote:
> On Fri, Jul 14, 2017 at 02:48:10PM -0400, Mimi Zohar wrote:
> > 
> > If I'm understanding the discussion correctly, this isn't an issue for
> > layered copy on write filesystems, as each fs layer could have it's
> > own set of xattrs.  The underlying and layered xattrs should be able
> > to co-exist.  Use the layered xattr if it exists, but fall back to
> > using the underlying xattr if it doesn't.
> 
> Note that this assumes that it is possible to "copy up" the xattrs
> without necessarily "copying up" all of the data blocks.  This might
> be true for some layers, but I don't believe it is currently true for
> overlayfs, for example.

Ok, so for the use case scneario where the container owner is willing
to use the public key distributed with the files, then only those
files that are new or modified in the overlay would need to be signed
with a key local to the overlay.  In the worst case scenario, where
the container owner is only willing to trust their own public key, I
guess we can live with having to copy up the files.

Mimi

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 18:52                                     ` James Bottomley
@ 2017-07-14 20:03                                       ` Mimi Zohar
  2017-07-14 20:39                                         ` James Bottomley
  0 siblings, 1 reply; 76+ messages in thread
From: Mimi Zohar @ 2017-07-14 20:03 UTC (permalink / raw)
  To: James Bottomley, Serge E. Hallyn, Stefan Berger, Mimi Zohar
  Cc: Eric W. Biederman, Theodore Ts'o, containers, lkp,
	linux-kernel, tycho, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

On Fri, 2017-07-14 at 11:52 -0700, James Bottomley wrote:
> On Fri, 2017-07-14 at 14:48 -0400, Mimi Zohar wrote:
> > The concern is with a shared filesystems.  In that case, for IMA it
> > would make sense to support a native and a namespace xattr.  If due
> > to xattr space limitations we have to limit the number of xattrs,
> > then we should limit it to two - a native and a namespace version,
> > with a "uid=" tag - first namespace gets permission to write the
> > namespace xattr.  Again, like in the layered case, if the namespace
> > xattr doesn't exist, fall back to using the native xattr.
> 
> Just on this point: if we're really concerned about the need on shared
> filesystems to have multiple IMA signatures per file, might it not make
> sense simply to support multiple signatures within the security.ima
> xattr? The rules for writing signature updates within user namespaces
> would be somewhat complex (say only able to replace a signature for
> which you demonstrate you possess the key) but it would lead to an
> implementation which would work for traditional shared filesystems
> (like NFS) as well as containerised bind mounts.

Writing security.ima requires being root with CAP_SYS_ADMIN
privileges.  I wouldn't want to give root within the namespace
permission to over write or just extend the native security.ima.

Mimi

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 20:03                                       ` Mimi Zohar
@ 2017-07-14 20:39                                         ` James Bottomley
  2017-07-14 21:34                                           ` Theodore Ts'o
                                                             ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: James Bottomley @ 2017-07-14 20:39 UTC (permalink / raw)
  To: Mimi Zohar, Serge E. Hallyn, Stefan Berger, Mimi Zohar
  Cc: containers, linux-kernel, linux-security-module,
	Eric W. Biederman, casey, Theodore Ts'o, lkp

On Fri, 2017-07-14 at 16:03 -0400, Mimi Zohar wrote:
> On Fri, 2017-07-14 at 11:52 -0700, James Bottomley wrote:
> > 
> > On Fri, 2017-07-14 at 14:48 -0400, Mimi Zohar wrote:
> > > 
> > > The concern is with a shared filesystems.  In that case, for IMA
> > > it would make sense to support a native and a namespace xattr.
> > >  If due to xattr space limitations we have to limit the number of
> > > xattrs, then we should limit it to two - a native and a namespace
> > > version, with a "uid=" tag - first namespace gets permission to
> > > write the namespace xattr.  Again, like in the layered case, if
> > > the namespace xattr doesn't exist, fall back to using the native
> > > xattr.
> > 
> > Just on this point: if we're really concerned about the need on
> > shared filesystems to have multiple IMA signatures per file, might
> > it not make sense simply to support multiple signatures within the
> > security.ima xattr? The rules for writing signature updates within
> > user namespaces would be somewhat complex (say only able to replace
> > a signature for which you demonstrate you possess the key) but it
> > would lead to an implementation which would work for traditional
> > shared filesystems (like NFS) as well as containerised bind mounts.
> 
> Writing security.ima requires being root with CAP_SYS_ADMIN
> privileges.  I wouldn't want to give root within the namespace
> permission to over write or just extend the native security.ima.

but why?  That's partly the point of all of this: some security.
attributes can't be written by container root without some supervision
(the capability ones are the hugely problematic ones from this point of
view), but for some there's no reason they shouldn't be.  What would be
the reason that root in a container shouldn't be able to write the ima
xattr the same as host root could on its filesystem?

James

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 20:39                                         ` James Bottomley
@ 2017-07-14 21:34                                           ` Theodore Ts'o
  2017-07-14 23:22                                             ` Eric W. Biederman
  2017-07-14 23:29                                           ` Mimi Zohar
  2017-07-14 23:53                                           ` Eric W. Biederman
  2 siblings, 1 reply; 76+ messages in thread
From: Theodore Ts'o @ 2017-07-14 21:34 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mimi Zohar, Serge E. Hallyn, Stefan Berger, Mimi Zohar,
	containers, linux-kernel, linux-security-module,
	Eric W. Biederman, casey, lkp

On Fri, Jul 14, 2017 at 01:39:59PM -0700, James Bottomley wrote:
> but why?  That's partly the point of all of this: some security.
> attributes can't be written by container root without some supervision
> (the capability ones are the hugely problematic ones from this point of
> view), but for some there's no reason they shouldn't be.  What would be
> the reason that root in a container shouldn't be able to write the ima
> xattr the same as host root could on its filesystem?

So I'm happy to say, "Ix-nay on nested containerization; that way lies
insanity".  But my understanding is that there will be people who want
to run containers in containers in containers in containers...  and
this is what scares me.

What if someone in the Nth layer of containerization wants to allow
the container root in the (N+1)th layer to set file capabilities that
will not be honored in the Nth layer of containerization?

Again I think that this is insane, and I'm happy for the answer to be,
"No, that's not supported".  That the "Host container" can have
capabilities that it won't honor, but will be honored by all
subcontainers, but that same thing can't be done between a
subsubsub-container and its child subsubsubsub-container.

Are we OK with that?  Because how we would encode this in the xattr
seems to be to be hopelessly not scalable.

					- Ted

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 21:34                                           ` Theodore Ts'o
@ 2017-07-14 23:22                                             ` Eric W. Biederman
  0 siblings, 0 replies; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-14 23:22 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: James Bottomley, Mimi Zohar, Serge E. Hallyn, Stefan Berger,
	Mimi Zohar, containers, linux-kernel, linux-security-module,
	casey, lkp

Theodore Ts'o <tytso@mit.edu> writes:

> On Fri, Jul 14, 2017 at 01:39:59PM -0700, James Bottomley wrote:
>> but why?  That's partly the point of all of this: some security.
>> attributes can't be written by container root without some supervision
>> (the capability ones are the hugely problematic ones from this point of
>> view), but for some there's no reason they shouldn't be.  What would be
>> the reason that root in a container shouldn't be able to write the ima
>> xattr the same as host root could on its filesystem?
>
> So I'm happy to say, "Ix-nay on nested containerization; that way lies
> insanity".  But my understanding is that there will be people who want
> to run containers in containers in containers in containers...  and
> this is what scares me.

I am happy to say we need to bound the space we take in an inode.
So a design that needs more space the more containers you have is
suspicious.

I am not fond of decisions that don't allow nesting of containers.  That
just paints us into a corner.  I am in favor of things that require
little or bounded space.

Generally I will frown at a decision that won't allow nesting, because
nesting of containers happens naturally

> What if someone in the Nth layer of containerization wants to allow
> the container root in the (N+1)th layer to set file capabilities that
> will not be honored in the Nth layer of containerization?

That works perfectly well with the design we have today.  And it only
needs a single security.capability attribute.  The actual design is
associated with the security.capability attribute (either in the
attribute or in the most recent iteration in the attribute name *scowl*)
is to have the uid (from the filesystems point of view) of the root user
of a user namespace.  Running that executable will give you those
capabilities if the uid matches the root user in your user namespace (or
a parent user namespace).

As anyone can who can modify a file can remove a security.capable
attribute just like anyone who can modify a file can remove the setuid
bit this works fine is is sufficient.  Though perhaps a little different.

> Again I think that this is insane, and I'm happy for the answer to be,
> "No, that's not supported".  That the "Host container" can have
> capabilities that it won't honor, but will be honored by all
> subcontainers, but that same thing can't be done between a
> subsubsub-container and its child subsubsubsub-container.
>
> Are we OK with that?  Because how we would encode this in the xattr
> seems to be to be hopelessly not scalable.

That really isn't an issue right now.

The real question right now is what to do with security.ima and
security.evm.  As it was proposed that we share a common code base for
them.  Right now it looks to me like the semantics are sufficiently
different that it doesn't make sense to share code between the two
implementations.  At which point all reason for storing any of this in
the xattr name goes away.  So we just have a single xattr.

Right now I am very much in favor of security xattrs continuing to have
well known names.  That easily limits how much space in the inode you
can have, and it makes thinking about things easier.  It doesn't
preclude having acls in your xattr.  That is exactly how posix
acls are implemented.   But I am not going to build generic support for
them, and I really don't expect they will be needed. 

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 20:39                                         ` James Bottomley
  2017-07-14 21:34                                           ` Theodore Ts'o
@ 2017-07-14 23:29                                           ` Mimi Zohar
  2017-07-14 23:53                                           ` Eric W. Biederman
  2 siblings, 0 replies; 76+ messages in thread
From: Mimi Zohar @ 2017-07-14 23:29 UTC (permalink / raw)
  To: James Bottomley, Serge E. Hallyn, Stefan Berger, Mimi Zohar
  Cc: containers, linux-kernel, linux-security-module,
	Eric W. Biederman, casey, Theodore Ts'o, lkp

On Fri, 2017-07-14 at 13:39 -0700, James Bottomley wrote:
> On Fri, 2017-07-14 at 16:03 -0400, Mimi Zohar wrote:
> > On Fri, 2017-07-14 at 11:52 -0700, James Bottomley wrote:
> > > 
> > > On Fri, 2017-07-14 at 14:48 -0400, Mimi Zohar wrote:
> > > > 
> > > > The concern is with a shared filesystems.  In that case, for IMA
> > > > it would make sense to support a native and a namespace xattr.
> > > >  If due to xattr space limitations we have to limit the number of
> > > > xattrs, then we should limit it to two - a native and a namespace
> > > > version, with a "uid=" tag - first namespace gets permission to
> > > > write the namespace xattr.  Again, like in the layered case, if
> > > > the namespace xattr doesn't exist, fall back to using the native
> > > > xattr.
> > > 
> > > Just on this point: if we're really concerned about the need on
> > > shared filesystems to have multiple IMA signatures per file, might
> > > it not make sense simply to support multiple signatures within the
> > > security.ima xattr? The rules for writing signature updates within
> > > user namespaces would be somewhat complex (say only able to replace
> > > a signature for which you demonstrate you possess the key) but it
> > > would lead to an implementation which would work for traditional
> > > shared filesystems (like NFS) as well as containerised bind mounts.
> > 
> > Writing security.ima requires being root with CAP_SYS_ADMIN
> > privileges.  I wouldn't want to give root within the namespace
> > permission to over write or just extend the native security.ima.
> 
> but why?  That's partly the point of all of this: some security.
> attributes can't be written by container root without some supervision
> (the capability ones are the hugely problematic ones from this point of
> view), but for some there's no reason they shouldn't be.  What would be
> the reason that root in a container shouldn't be able to write the ima
> xattr the same as host root could on its filesystem?

Let's describe the different scenarios of a shared filesystem (not
layered).
1. The kernel updating the file hash as the file changes, written as
the security.ima xattr.
2. Root vs root in the namespace explicitly writing the security.ima
xattr. 
 
In the first case, neither root nor root in the namespace calculates
and writes the file hash as security.ima.  The kernel itself updates
the file hash.  Since the file hash is the same value, it doesn't need
to be written out as two separate security.ima xattrs.

The second case is where root or root in the namespace explicitly
writes out a file signature as security.ima.  Here different entities
might want to sign the file with different keys.  Up to now, only root
with CAP_SYS_ADMIN can write out security.ima.  That shouldn't change.
Nor should root in the namespace be allowed to change the native's
view, but should be only allowed to change its own view of the xattr.
Allowing root in the namespace to change the native's view isn't safe.

Remember writing security.ima also triggers security.evm to be
updated.  Root in the namespace should never be permitted to cause the
native's security.evm to be updated, only the namespaces's version.

Mimi

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-11 15:05 ` [PATCH v2] xattr: Enable security.capability in user namespaces Stefan Berger
                     ` (4 preceding siblings ...)
  2017-07-12 17:53   ` Vivek Goyal
@ 2017-07-14 23:41   ` Eric W. Biederman
  2017-07-15 21:27     ` Stefan Berger
  2017-07-17 18:58   ` Vivek Goyal
  6 siblings, 1 reply; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-14 23:41 UTC (permalink / raw)
  To: Stefan Berger
  Cc: containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey, Stefan Berger

Stefan Berger <"Stefan Bergerstefanb"@linux.vnet.ibm.com> writes:

> From: Stefan Berger <stefanb@linux.vnet.ibm.com>
>
> This patch enables security.capability in user namespaces but also
> takes a more general approach to enabling extended attributes in user
> namespaces.
>
> The following rules describe the approach using security.foo as a
> 'user namespace enabled' extended attribute:
>
> Reading of extended attributes:
>
> 1a) Reading security.foo from a user namespace will read
>     security.foo@uid=<uid> of the parent user namespace instead with uid
>     being the mapping of root in that parent user namespace. An
>     exception is if root is mapped to uid 0 on the host, and in this case
>     we will read security.foo directly.
>     --> reading security.foo will read security.foo@uid=1000 for uid
>         mapping of root to 1000.
>
> 1b) If security.foo@uid=<uid> is not available, the security.foo of the
>     parent namespace is tried to be read. This procedure is repeated up to
>     the init user namespace. This step only applies for reading of extended
>     attributes and provides the same behavior as older system where the
>     host's extended attributes applied to user namespaces.
>
> 2) All security.foo@uid=<uid> with valid uid mapping in the user namespace
>    can be read. The uid within the user namespace will be mapped to the
>    corresponding uid on the host and that uid will be used in the name of
>    the extended attribute.
>    -> reading security.foo@uid=1 will read security.foo@uid=1001 for uid
>       mapping of root to 1000, size of at least 2.
>
>    All security.foo@uid=<uid> can be read (by root) on the host with values
>    of <uid> also being subject to checking for valid mappings.
>
> 3) No other security.foo* can be read.
>
> The same rules for reading apply to writing and removing of user
> namespace enabled extended attributes.
>
> When listing extended attributes of a file, only those are presented
> to the user namespace that have a valid mapping. Besides that, names
> of the extended attributes are adjusted to represent the mapping.
> This means that if root is mapped to uid 1000 on the host, the
> security.foo@uid=1000 will be listed as security.foo in the user
> namespace, security.foo@uid=1001 becomes security.foo@uid=1 and so on.
>
> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
> Signed-off-by: Serge Hallyn <serge@hallyn.com>
> Reviewed-by: Serge Hallyn <serge@hallyn.com>
> ---
>  fs/xattr.c               | 509 +++++++++++++++++++++++++++++++++++++++++++++--
>  security/commoncap.c     |  36 +++-
>  security/selinux/hooks.c |   9 +-
>  3 files changed, 523 insertions(+), 31 deletions(-)

I am just going to quickly and publicly point out that as designed this
patch breaks evm inode metadata signing.  As evm_config_xattrnames is not
updated.

While not completely insurmountable that seems like a strong limitation of
this design.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 20:39                                         ` James Bottomley
  2017-07-14 21:34                                           ` Theodore Ts'o
  2017-07-14 23:29                                           ` Mimi Zohar
@ 2017-07-14 23:53                                           ` Eric W. Biederman
  2 siblings, 0 replies; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-14 23:53 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mimi Zohar, Serge E. Hallyn, Stefan Berger, Mimi Zohar,
	containers, linux-kernel, linux-security-module, casey,
	Theodore Ts'o, lkp

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Fri, 2017-07-14 at 16:03 -0400, Mimi Zohar wrote:
>> On Fri, 2017-07-14 at 11:52 -0700, James Bottomley wrote:
>> > 
>> > On Fri, 2017-07-14 at 14:48 -0400, Mimi Zohar wrote:
>> > > 
>> > > The concern is with a shared filesystems.  In that case, for IMA
>> > > it would make sense to support a native and a namespace xattr.
>> > >  If due to xattr space limitations we have to limit the number of
>> > > xattrs, then we should limit it to two - a native and a namespace
>> > > version, with a "uid=" tag - first namespace gets permission to
>> > > write the namespace xattr.  Again, like in the layered case, if
>> > > the namespace xattr doesn't exist, fall back to using the native
>> > > xattr.
>> > 
>> > Just on this point: if we're really concerned about the need on
>> > shared filesystems to have multiple IMA signatures per file, might
>> > it not make sense simply to support multiple signatures within the
>> > security.ima xattr? The rules for writing signature updates within
>> > user namespaces would be somewhat complex (say only able to replace
>> > a signature for which you demonstrate you possess the key) but it
>> > would lead to an implementation which would work for traditional
>> > shared filesystems (like NFS) as well as containerised bind mounts.
>> 
>> Writing security.ima requires being root with CAP_SYS_ADMIN
>> privileges.  I wouldn't want to give root within the namespace
>> permission to over write or just extend the native security.ima.
>
> but why?  That's partly the point of all of this: some security.
> attributes can't be written by container root without some supervision
> (the capability ones are the hugely problematic ones from this point of
> view), but for some there's no reason they shouldn't be.  What would be
> the reason that root in a container shouldn't be able to write the ima
> xattr the same as host root could on its filesystem?

Mimi said she ``native''.  It competely makes sense for the things that
the container doesn't ``own'' to not be allowed to be written/updated by
the container.

James you are making the case here for root in the container to write
to the ima and evm attributes that are for the container.

So I don't actually see any disagreement here except perhaps for
terminology.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 19:26                                     ` Mimi Zohar
@ 2017-07-15  0:02                                       ` Eric W. Biederman
       [not found]                                       ` <xagsmtp3.20170715001054.9173@uk1vsc.vnet.ibm.com>
  2017-07-26  3:00                                       ` Serge E. Hallyn
  2 siblings, 0 replies; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-15  0:02 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Serge E. Hallyn, Stefan Berger, Mimi Zohar, Theodore Ts'o,
	containers, lkp, linux-kernel, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Mimi Zohar <zohar@linux.vnet.ibm.com> writes:

> On Fri, 2017-07-14 at 13:17 -0500, Eric W. Biederman wrote:
>> "Serge E. Hallyn" <serge@hallyn.com> writes:
>> 
>> > Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
>> >> On 07/14/2017 09:34 AM, Serge E. Hallyn wrote:
>> >> >Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
>> >> >>On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
>> >> >>>Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
>> >> >>>
>> >> >>>>On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
>> >> >>>>
>> >> >>>>>My big question right now is can you implement Ted's suggested
>> >> >>>>>restriction.  Only one security.foo or secuirty.foo@... attribute ?
>> >> >>>>We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
>> >> >>>>
>> >> >>>>So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
>> >> >>>>
>> >> >>>The latter.
>> >> >>That case would prevent a container user from overriding the xattr
>> >> >>on the host. Is that what we want? For limiting the number of xattrs
>> >> >Not really.  If the file is owned by a uid mapped into the container,
>> >> >then the container root can chown the file which will clear the file
>> >> >capability, after which he can set a new one.  If the file is not
>> >> >owned by a uid mapped into the container, then container root could
>> >> >not set a filecap anyway.
>> >> 
>> >> Let's say I installed a container where all files are signed and
>> >> thus have security.ima. Now for some reason I want to re-sign some
>> >> or all files inside that container. How would I do that ? Would I
>> >> need to get rid of security.ima first, possibly by copying each
>> >> file, deleting the original file, and renaming the copied file to
>> >> the original name, or should I just be able to write out a new
>> >> signature, thus creating security.ima@uid=1000 besides the
>> >> security.ima ?
>> >> 
>> >>    Stefan
>> >
>> > Hi Mimi,
>> >
>> > what do you think makes most sense for IMA?
>> 
>> I am going to give my two cents since I have been thinking about this.
>> 
>> First I think this entire scheme plays hobs with the security.evm
>> attribute as security.evm needs to know the names of the xattrs to
>> protect.
>> 
>> I forget which attributes has a hash and what has a message
>> athentication code.
>
> security.ima contains either a file hash or a signature.  (file data)
> security.evm contains either a signature or an hmac of the security
> xattrs and other file metadata. (file meta-data)
>
> The same rules would apply to security.evm, as described in my
> response.  Based on it's view of the security xattrs, either the
> native or namespace security.evm would be updated.
>
>> If there is an attribute with a simple file hash I think it only make
>> sense for the kernel to touch it, and I don't see any sense in having
>> multiples.
>
> Only files that are in the IMA-appraisal policy is the file hash
> calculated and written out as security.ima.  Depending this policy,
> does the security.ima exist.  So if the file is in policy for both the
> native and namespace policies, agreed the same hash doesn't need to be
> written as two different xattrs.
>
>> If there is an attribute with a message authentication code (roughly a
>> signed hash) it makes sense to have that to be tied to the kernel key
>> ring that controlls the keys.  (Which probably means a per user
>> namespace thing at some point).  But again pretty untouchable otherwise.
>
> Right, the namespace would require it's own EVM key. 
>
>> Which brings us to the semantic question of would it be nice to have
>> stacked IMA/EVM on the same file.
>> 
>> I really don't think we do.  I think allowing multiple keys for
>> different part of trusting files is easy enough that we should have no
>> need to fight over which keys do which.
>
> We definitely want to support different policies on the native and in
> the namespace with different keys and keyrings.
>
> Refer to Mehmet Kaaylap's recent post, which refers to a PoC version
> of IMA namespacing - kernsec.org/pipermail/linux-security-module-
> archive/2017-July/002286.html.
>
>> Looking at integrity.h I see signature_v2_hdr that has a keyid.  Any use
>> case I can think of for distributing a distribution image with ima/evm
>> xattrs will need to use asymmetric keys aka public/private keypairs so
>> that the originator of the content does not give away their private
>> keys.
>
> Agreed.
>
>> Given that usefully we are talking about content that should be
>> connected to keys in one way or another I don't believe it even makes
>> sense at this point to attempt to use uids for dealing with ima and
>> evm content.
>
> We need to resolve the xattr issue in order to namespace IMA-
> appraisal. 


Mimi I have two questions:

a) Is the keyid enough to distinguish the security.ima and security.evm
   xattrs of one container from another container and from native?  Or
   do we have some important security xattrs that are associated with
   keys that don't have a keyid?

b) Can we reasonably live with a limitation that the native and the
   namespace'd policies don't intersect?  Or in the case of an
   interesection the native policy is the only one that is executed?

I submit that if the answer is keyids are always present, and we can
live with the native policy taking precedence over the container policy
then we have a solution to the IMA xattrs.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 23:41   ` Eric W. Biederman
@ 2017-07-15 21:27     ` Stefan Berger
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Berger @ 2017-07-15 21:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

On 07/14/2017 07:41 PM, Eric W. Biederman wrote:
> Stefan Berger <"Stefan Bergerstefanb"@linux.vnet.ibm.com> writes:
>
>> From: Stefan Berger <stefanb@linux.vnet.ibm.com>
>>
>> This patch enables security.capability in user namespaces but also
>> takes a more general approach to enabling extended attributes in user
>> namespaces.
>>
>> The following rules describe the approach using security.foo as a
>> 'user namespace enabled' extended attribute:
>>
>> Reading of extended attributes:
>>
>> 1a) Reading security.foo from a user namespace will read
>>      security.foo@uid=<uid> of the parent user namespace instead with uid
>>      being the mapping of root in that parent user namespace. An
>>      exception is if root is mapped to uid 0 on the host, and in this case
>>      we will read security.foo directly.
>>      --> reading security.foo will read security.foo@uid=1000 for uid
>>          mapping of root to 1000.
>>
>> 1b) If security.foo@uid=<uid> is not available, the security.foo of the
>>      parent namespace is tried to be read. This procedure is repeated up to
>>      the init user namespace. This step only applies for reading of extended
>>      attributes and provides the same behavior as older system where the
>>      host's extended attributes applied to user namespaces.
>>
>> 2) All security.foo@uid=<uid> with valid uid mapping in the user namespace
>>     can be read. The uid within the user namespace will be mapped to the
>>     corresponding uid on the host and that uid will be used in the name of
>>     the extended attribute.
>>     -> reading security.foo@uid=1 will read security.foo@uid=1001 for uid
>>        mapping of root to 1000, size of at least 2.
>>
>>     All security.foo@uid=<uid> can be read (by root) on the host with values
>>     of <uid> also being subject to checking for valid mappings.
>>
>> 3) No other security.foo* can be read.
>>
>> The same rules for reading apply to writing and removing of user
>> namespace enabled extended attributes.
>>
>> When listing extended attributes of a file, only those are presented
>> to the user namespace that have a valid mapping. Besides that, names
>> of the extended attributes are adjusted to represent the mapping.
>> This means that if root is mapped to uid 1000 on the host, the
>> security.foo@uid=1000 will be listed as security.foo in the user
>> namespace, security.foo@uid=1001 becomes security.foo@uid=1 and so on.
>>
>> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
>> Signed-off-by: Serge Hallyn <serge@hallyn.com>
>> Reviewed-by: Serge Hallyn <serge@hallyn.com>
>> ---
>>   fs/xattr.c               | 509 +++++++++++++++++++++++++++++++++++++++++++++--
>>   security/commoncap.c     |  36 +++-
>>   security/selinux/hooks.c |   9 +-
>>   3 files changed, 523 insertions(+), 31 deletions(-)
> I am just going to quickly and publicly point out that as designed this
> patch breaks evm inode metadata signing.  As evm_config_xattrnames is not
> updated.
>
> While not completely insurmountable that seems like a strong limitation of
> this design.

EVM could be converted to get the list of xattrs and prefix-compare it 
against the evm_config_xattrnames to do what it does now.

    Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
       [not found]                                       ` <xagsmtp3.20170715001054.9173@uk1vsc.vnet.ibm.com>
@ 2017-07-16 11:25                                         ` Mimi Zohar
  0 siblings, 0 replies; 76+ messages in thread
From: Mimi Zohar @ 2017-07-16 11:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Stefan Berger, Mimi Zohar, Theodore Ts'o,
	containers, lkp, linux-kernel, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

On Fri, 2017-07-14 at 19:02 -0500, Eric W. Biederman wrote:
> Mimi Zohar <zohar@linux.vnet.ibm.com> writes:
> 
> > On Fri, 2017-07-14 at 13:17 -0500, Eric W. Biederman wrote:
> >> "Serge E. Hallyn" <serge@hallyn.com> writes:
> >> 
> >> > Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> >> >> On 07/14/2017 09:34 AM, Serge E. Hallyn wrote:
> >> >> >Quoting Stefan Berger (stefanb@linux.vnet.ibm.com):
> >> >> >>On 07/13/2017 08:38 PM, Eric W. Biederman wrote:
> >> >> >>>Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
> >> >> >>>
> >> >> >>>>On 07/13/2017 01:49 PM, Eric W. Biederman wrote:
> >> >> >>>>
> >> >> >>>>>My big question right now is can you implement Ted's suggested
> >> >> >>>>>restriction.  Only one security.foo or secuirty.foo@... attribute ?
> >> >> >>>>We need to raw-list the xattrs and do the check before writing them. I am fairly sure this can be done.
> >> >> >>>>
> >> >> >>>>So now you want to allow security.foo and one security.foo@uid=<> or just a single one security.foo(@[[:print:]]*)?
> >> >> >>>>
> >> >> >>>The latter.
> >> >> >>That case would prevent a container user from overriding the xattr
> >> >> >>on the host. Is that what we want? For limiting the number of xattrs
> >> >> >Not really.  If the file is owned by a uid mapped into the container,
> >> >> >then the container root can chown the file which will clear the file
> >> >> >capability, after which he can set a new one.  If the file is not
> >> >> >owned by a uid mapped into the container, then container root could
> >> >> >not set a filecap anyway.
> >> >> 
> >> >> Let's say I installed a container where all files are signed and
> >> >> thus have security.ima. Now for some reason I want to re-sign some
> >> >> or all files inside that container. How would I do that ? Would I
> >> >> need to get rid of security.ima first, possibly by copying each
> >> >> file, deleting the original file, and renaming the copied file to
> >> >> the original name, or should I just be able to write out a new
> >> >> signature, thus creating security.ima@uid=1000 besides the
> >> >> security.ima ?
> >> >> 
> >> >>    Stefan
> >> >
> >> > Hi Mimi,
> >> >
> >> > what do you think makes most sense for IMA?
> >> 
> >> I am going to give my two cents since I have been thinking about this.
> >> 
> >> First I think this entire scheme plays hobs with the security.evm
> >> attribute as security.evm needs to know the names of the xattrs to
> >> protect.
> >> 
> >> I forget which attributes has a hash and what has a message
> >> athentication code.
> >
> > security.ima contains either a file hash or a signature.  (file data)
> > security.evm contains either a signature or an hmac of the security
> > xattrs and other file metadata. (file meta-data)
> >
> > The same rules would apply to security.evm, as described in my
> > response.  Based on it's view of the security xattrs, either the
> > native or namespace security.evm would be updated.
> >
> >> If there is an attribute with a simple file hash I think it only make
> >> sense for the kernel to touch it, and I don't see any sense in having
> >> multiples.
> >
> > Only files that are in the IMA-appraisal policy is the file hash
> > calculated and written out as security.ima.  Depending this policy,
> > does the security.ima exist.  So if the file is in policy for both the
> > native and namespace policies, agreed the same hash doesn't need to be
> > written as two different xattrs.
> >
> >> If there is an attribute with a message authentication code (roughly a
> >> signed hash) it makes sense to have that to be tied to the kernel key
> >> ring that controlls the keys.  (Which probably means a per user
> >> namespace thing at some point).  But again pretty untouchable otherwise.
> >
> > Right, the namespace would require it's own EVM key. 
> >
> >> Which brings us to the semantic question of would it be nice to have
> >> stacked IMA/EVM on the same file.
> >> 
> >> I really don't think we do.  I think allowing multiple keys for
> >> different part of trusting files is easy enough that we should have no
> >> need to fight over which keys do which.
> >
> > We definitely want to support different policies on the native and in
> > the namespace with different keys and keyrings.
> >
> > Refer to Mehmet Kaaylap's recent post, which refers to a PoC version
> > of IMA namespacing - kernsec.org/pipermail/linux-security-module-
> > archive/2017-July/002286.html.
> >
> >> Looking at integrity.h I see signature_v2_hdr that has a keyid.  Any use
> >> case I can think of for distributing a distribution image with ima/evm
> >> xattrs will need to use asymmetric keys aka public/private keypairs so
> >> that the originator of the content does not give away their private
> >> keys.
> >
> > Agreed.
> >
> >> Given that usefully we are talking about content that should be
> >> connected to keys in one way or another I don't believe it even makes
> >> sense at this point to attempt to use uids for dealing with ima and
> >> evm content.
> >
> > We need to resolve the xattr issue in order to namespace IMA-
> > appraisal. 
> 
> 
> Mimi I have two questions:
> 
> a) Is the keyid enough to distinguish the security.ima and security.evm
>    xattrs of one container from another container and from native?  Or
>    do we have some important security xattrs that are associated with
>    keys that don't have a keyid?
> 
> b) Can we reasonably live with a limitation that the native and the
>    namespace'd policies don't intersect?  Or in the case of an
>    interesection the native policy is the only one that is executed?
> 
> I submit that if the answer is keyids are always present, and we can
> live with the native policy taking precedence over the container policy
> then we have a solution to the IMA xattrs.

IMA-measurement is hierachical, meaning that the measurement policy
determines whether the measurement exists in the native, the
container, or both measurement lists.

One of the main namespacing use cases for IMA-appraisal is the ability
to limit running an executable to a particular container.  So unlike
IMA-measurement, which is hierarchical, the IMA-appraisal namespace
policy takes precedence over the native policy.

Mimi

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-11 15:05 ` [PATCH v2] xattr: Enable security.capability in user namespaces Stefan Berger
                     ` (5 preceding siblings ...)
  2017-07-14 23:41   ` Eric W. Biederman
@ 2017-07-17 18:58   ` Vivek Goyal
  2017-07-17 20:50     ` Stefan Berger
  6 siblings, 1 reply; 76+ messages in thread
From: Vivek Goyal @ 2017-07-17 18:58 UTC (permalink / raw)
  To: Stefan Berger
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey, Stefan Berger

On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:

[..]
> +/*
> + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
> + *                             or determine needed size for attribute list
> + *                             in case size == 0
> + *
> + * In a user namespace we do not present all extended attributes to the
> + * user. We filter out those that are in the list of userns supported xattr.
> + * Besides that we filter out those with @uid=<uid> when there is no mapping
> + * for that uid in the current user namespace.
> + *
> + * @list:        list of 0-byte separated xattr names
> + * @size:        the size of the list; may be 0 to determine needed list size
> + * @list_maxlen: allocated buffer size of list
> + */
> +static ssize_t
> +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
> +{
> +	char *nlist = NULL;
> +	size_t s_off, len, nlen;
> +	ssize_t d_off;
> +	char *name, *newname;
> +
> +	if (!list || size < 0 || current_user_ns() == &init_user_ns)

size will never be less than 0 here. Only caller calls this function only
if size is >0. So can we remove this?

What about case of "!list". So if user space called listxattr(foo, NULL,
0), we want to return the size of buffer as if all the xattrs will be
returned to user space. But in practice we probably will filter out some
xattrs so actually returned string will be smaller than size reported
previously.

Looks like that's the intent of "!list" condition here. Just wanted to
make sure, hence asking.

BTW, I am testing this with overlayfs and trying to figure out if
switching of creds will create issues. Simple operations like listxattr
and getxattr and setxattr so far worked for me. And reason seems to be
that name transformation we are doing in top layer based on creds of
caller (and not based on creds of mounter). 

Vivek

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-17 18:58   ` Vivek Goyal
@ 2017-07-17 20:50     ` Stefan Berger
  2017-07-18 11:48       ` Vivek Goyal
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Berger @ 2017-07-17 20:50 UTC (permalink / raw)
  To: Vivek Goyal, Stefan Berger
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey

On 07/17/2017 02:58 PM, Vivek Goyal wrote:
> On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:
>
> [..]
>> +/*
>> + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
>> + *                             or determine needed size for attribute list
>> + *                             in case size == 0
>> + *
>> + * In a user namespace we do not present all extended attributes to the
>> + * user. We filter out those that are in the list of userns supported xattr.
>> + * Besides that we filter out those with @uid=<uid> when there is no mapping
>> + * for that uid in the current user namespace.
>> + *
>> + * @list:        list of 0-byte separated xattr names
>> + * @size:        the size of the list; may be 0 to determine needed list size
>> + * @list_maxlen: allocated buffer size of list
>> + */
>> +static ssize_t
>> +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
>> +{
>> +	char *nlist = NULL;
>> +	size_t s_off, len, nlen;
>> +	ssize_t d_off;
>> +	char *name, *newname;
>> +
>> +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
> size will never be less than 0 here. Only caller calls this function only
> if size is >0. So can we remove this?

Correct.

>
> What about case of "!list". So if user space called listxattr(foo, NULL,
> 0), we want to return the size of buffer as if all the xattrs will be
> returned to user space. But in practice we probably will filter out some
> xattrs so actually returned string will be smaller than size reported
> previously.

This case of size=0 is a problem in userns. Depending on the mapping of 
the userid's the list can expand. A security.foo@uid=100 can become 
security.foo@uid=100000, if the mapping is set up so that uid 100 on the 
host becomes uid 100000 inside the container. So for now we only have 
security.capability and the way I solved this is by allocating a 65k 
buffer when calling from a userns. In this buffer where we gather the 
xattr names and then walk them to determine the size that's needed for 
the buffer by simulating the rewriting. It's not nice but I don't know 
of any other solution.


>
> Looks like that's the intent of "!list" condition here. Just wanted to
> make sure, hence asking.

Thanks for asking. I thought I had this case covered, but obviously I 
did not.

>
> BTW, I am testing this with overlayfs and trying to figure out if
> switching of creds will create issues. Simple operations like listxattr
> and getxattr and setxattr so far worked for me. And reason seems to be
> that name transformation we are doing in top layer based on creds of
> caller (and not based on creds of mounter).
>
> Vivek
>

    Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-13 17:05                 ` Stefan Berger
  2017-07-13 17:39                   ` Eric W. Biederman
@ 2017-07-18  7:01                   ` James Morris
  2017-07-18 12:12                     ` Stefan Berger
  1 sibling, 1 reply; 76+ messages in thread
From: James Morris @ 2017-07-18  7:01 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Theodore Ts'o, Eric W. Biederman, Serge E. Hallyn,
	containers, lkp, linux-kernel, zohar, tycho, James.Bottomley,
	vgoyal, christian.brauner, amir73il, linux-security-module,
	casey

On Thu, 13 Jul 2017, Stefan Berger wrote:

> A file shared by 2 containers, one mapping root to uid=1000, the other mapping
> root to uid=2000, will show these two xattrs on the host (init_user_ns) once
> these containers set xattrs on that file.

I may be missing something here, but what happens when say the uid=2000 
container and associated user is deleted from the system, then another is 
created with the same uid?

Won't this mean that you have unexpected capabilities turning up in the 
new container?

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-17 20:50     ` Stefan Berger
@ 2017-07-18 11:48       ` Vivek Goyal
  2017-07-18 12:05         ` Stefan Berger
  0 siblings, 1 reply; 76+ messages in thread
From: Vivek Goyal @ 2017-07-18 11:48 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Stefan Berger, ebiederm, containers, lkp, linux-kernel, zohar,
	tycho, serge, James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey

On Mon, Jul 17, 2017 at 04:50:22PM -0400, Stefan Berger wrote:
> On 07/17/2017 02:58 PM, Vivek Goyal wrote:
> > On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:
> > 
> > [..]
> > > +/*
> > > + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
> > > + *                             or determine needed size for attribute list
> > > + *                             in case size == 0
> > > + *
> > > + * In a user namespace we do not present all extended attributes to the
> > > + * user. We filter out those that are in the list of userns supported xattr.
> > > + * Besides that we filter out those with @uid=<uid> when there is no mapping
> > > + * for that uid in the current user namespace.
> > > + *
> > > + * @list:        list of 0-byte separated xattr names
> > > + * @size:        the size of the list; may be 0 to determine needed list size
> > > + * @list_maxlen: allocated buffer size of list
> > > + */
> > > +static ssize_t
> > > +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
> > > +{
> > > +	char *nlist = NULL;
> > > +	size_t s_off, len, nlen;
> > > +	ssize_t d_off;
> > > +	char *name, *newname;
> > > +
> > > +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
> > size will never be less than 0 here. Only caller calls this function only
> > if size is >0. So can we remove this?
> 
> Correct.
> 
> > 
> > What about case of "!list". So if user space called listxattr(foo, NULL,
> > 0), we want to return the size of buffer as if all the xattrs will be
> > returned to user space. But in practice we probably will filter out some
> > xattrs so actually returned string will be smaller than size reported
> > previously.
> 
> This case of size=0 is a problem in userns. Depending on the mapping of the
> userid's the list can expand. A security.foo@uid=100 can become
> security.foo@uid=100000, if the mapping is set up so that uid 100 on the
> host becomes uid 100000 inside the container. So for now we only have
> security.capability and the way I solved this is by allocating a 65k buffer
> when calling from a userns. In this buffer where we gather the xattr names
> and then walk them to determine the size that's needed for the buffer by
> simulating the rewriting. It's not nice but I don't know of any other
> solution.

Hi Stefan,

For the case of size==0, why don't we iterate through all the xattr,
filter them, remap them and then return the size to process in user
namespace. That should fix this? I thought that's what
xattr_list_userns_rewrite() was doing. But looks like this logic will not
kick in for the case of size==0 due to "!list" condition.

Also we could probably replace "!list" with "!size" wheverever required.
Its little easy to read and understand.

For the other case where some xattrs can get filtered out and we report
a buffer size bigger than actually needed, I am hoping that its acceptable
and none of the existing users are broken.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-18 11:48       ` Vivek Goyal
@ 2017-07-18 12:05         ` Stefan Berger
  2017-07-18 12:30           ` Vivek Goyal
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Berger @ 2017-07-18 12:05 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Stefan Berger, ebiederm, containers, lkp, linux-kernel, zohar,
	tycho, serge, James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey

On 07/18/2017 07:48 AM, Vivek Goyal wrote:
> On Mon, Jul 17, 2017 at 04:50:22PM -0400, Stefan Berger wrote:
>> On 07/17/2017 02:58 PM, Vivek Goyal wrote:
>>> On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:
>>>
>>> [..]
>>>> +/*
>>>> + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
>>>> + *                             or determine needed size for attribute list
>>>> + *                             in case size == 0
>>>> + *
>>>> + * In a user namespace we do not present all extended attributes to the
>>>> + * user. We filter out those that are in the list of userns supported xattr.
>>>> + * Besides that we filter out those with @uid=<uid> when there is no mapping
>>>> + * for that uid in the current user namespace.
>>>> + *
>>>> + * @list:        list of 0-byte separated xattr names
>>>> + * @size:        the size of the list; may be 0 to determine needed list size
>>>> + * @list_maxlen: allocated buffer size of list
>>>> + */
>>>> +static ssize_t
>>>> +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
>>>> +{
>>>> +	char *nlist = NULL;
>>>> +	size_t s_off, len, nlen;
>>>> +	ssize_t d_off;
>>>> +	char *name, *newname;
>>>> +
>>>> +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
>>> size will never be less than 0 here. Only caller calls this function only
>>> if size is >0. So can we remove this?
>> Correct.
>>
>>> What about case of "!list". So if user space called listxattr(foo, NULL,
>>> 0), we want to return the size of buffer as if all the xattrs will be
>>> returned to user space. But in practice we probably will filter out some
>>> xattrs so actually returned string will be smaller than size reported
>>> previously.
>> This case of size=0 is a problem in userns. Depending on the mapping of the
>> userid's the list can expand. A security.foo@uid=100 can become
>> security.foo@uid=100000, if the mapping is set up so that uid 100 on the
>> host becomes uid 100000 inside the container. So for now we only have
>> security.capability and the way I solved this is by allocating a 65k buffer
>> when calling from a userns. In this buffer where we gather the xattr names
>> and then walk them to determine the size that's needed for the buffer by
>> simulating the rewriting. It's not nice but I don't know of any other
>> solution.
> Hi Stefan,
>
> For the case of size==0, why don't we iterate through all the xattr,
> filter them, remap them and then return the size to process in user
> namespace. That should fix this? I thought that's what


For the size==0 we need a temp. buffer where the raw xattr names are 
written to so that the xattr_list_userns_rewrite() can actually rewrite 
what the filesystem drivers returned. Not knowing exactly how big that 
buffer should be, I allocate 65k for it. From what I read there is a 64k 
limit on the vfs layer for xattrs, probably including xattr values. So 
65k would for sure be enough also if each one of the xattr names becomes 
bigger.

@@ -922,10 +947,20 @@ vfs_listxattr(struct dentry *dentry, char *list, 
size_t size, bool rewrite)
  {
      struct inode *inode = d_inode(dentry);
      ssize_t error;
+    bool getsize = false;

      error = security_inode_listxattr(dentry);
      if (error)
          return error;
+
+    if (!size) {
+        if (current_user_ns() != &init_user_ns) {
+            size = 65 * 1024;
+            list = kmalloc(size, GFP_KERNEL);
+        }
+        getsize = true;
+    }
+
      if (inode->i_op->listxattr && (inode->i_opflags & IOP_XATTR)) {
          error = -EOPNOTSUPP;
          error = inode->i_op->listxattr(dentry, list, size);
@@ -937,6 +972,9 @@ vfs_listxattr(struct dentry *dentry, char *list, 
size_t size, bool rewrite)
      if (error > 0 && rewrite)
          error = xattr_list_userns_rewrite(list, error, size);

+    if (getsize)
+        kfree(list);
+
      return error;
  }
  EXPORT_SYMBOL_GPL(vfs_listxattr);


     Stefan

> xattr_list_userns_rewrite() was doing. But looks like this logic will not
> kick in for the case of size==0 due to "!list" condition.
>
> Also we could probably replace "!list" with "!size" wheverever required.
> Its little easy to read and understand.
>
> For the other case where some xattrs can get filtered out and we report
> a buffer size bigger than actually needed, I am hoping that its acceptable
> and none of the existing users are broken.
>
> Thanks
> Vivek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-18  7:01                   ` James Morris
@ 2017-07-18 12:12                     ` Stefan Berger
  2017-07-18 13:26                       ` Eric W. Biederman
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Berger @ 2017-07-18 12:12 UTC (permalink / raw)
  To: James Morris
  Cc: Theodore Ts'o, Eric W. Biederman, Serge E. Hallyn,
	containers, lkp, linux-kernel, zohar, tycho, James.Bottomley,
	vgoyal, christian.brauner, amir73il, linux-security-module,
	casey

On 07/18/2017 03:01 AM, James Morris wrote:
> On Thu, 13 Jul 2017, Stefan Berger wrote:
>
>> A file shared by 2 containers, one mapping root to uid=1000, the other mapping
>> root to uid=2000, will show these two xattrs on the host (init_user_ns) once
>> these containers set xattrs on that file.
> I may be missing something here, but what happens when say the uid=2000
> container and associated user is deleted from the system, then another is
> created with the same uid?
>
> Won't this mean that you have unexpected capabilities turning up in the
> new container?
>

Yes, that's right. I don't know any solution for that. We would have to 
walk the filesystems and find all 'stale' xattrs with such a uid. This 
is independent of whether the uid is encoded on the name side, as in 
this patch, or on the value side, as in Serge's original proposal. And 
uids of a mapped container root user don't necessarily have to have an 
account on the host so that an account deletion could trigger that.

     Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-18 12:05         ` Stefan Berger
@ 2017-07-18 12:30           ` Vivek Goyal
  2017-07-18 12:36             ` Vivek Goyal
  2017-07-18 13:21             ` Stefan Berger
  0 siblings, 2 replies; 76+ messages in thread
From: Vivek Goyal @ 2017-07-18 12:30 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Stefan Berger, ebiederm, containers, lkp, linux-kernel, zohar,
	tycho, serge, James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey

On Tue, Jul 18, 2017 at 08:05:18AM -0400, Stefan Berger wrote:
> On 07/18/2017 07:48 AM, Vivek Goyal wrote:
> > On Mon, Jul 17, 2017 at 04:50:22PM -0400, Stefan Berger wrote:
> > > On 07/17/2017 02:58 PM, Vivek Goyal wrote:
> > > > On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:
> > > > 
> > > > [..]
> > > > > +/*
> > > > > + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
> > > > > + *                             or determine needed size for attribute list
> > > > > + *                             in case size == 0
> > > > > + *
> > > > > + * In a user namespace we do not present all extended attributes to the
> > > > > + * user. We filter out those that are in the list of userns supported xattr.
> > > > > + * Besides that we filter out those with @uid=<uid> when there is no mapping
> > > > > + * for that uid in the current user namespace.
> > > > > + *
> > > > > + * @list:        list of 0-byte separated xattr names
> > > > > + * @size:        the size of the list; may be 0 to determine needed list size
> > > > > + * @list_maxlen: allocated buffer size of list
> > > > > + */
> > > > > +static ssize_t
> > > > > +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
> > > > > +{
> > > > > +	char *nlist = NULL;
> > > > > +	size_t s_off, len, nlen;
> > > > > +	ssize_t d_off;
> > > > > +	char *name, *newname;
> > > > > +
> > > > > +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
> > > > size will never be less than 0 here. Only caller calls this function only
> > > > if size is >0. So can we remove this?
> > > Correct.
> > > 
> > > > What about case of "!list". So if user space called listxattr(foo, NULL,
> > > > 0), we want to return the size of buffer as if all the xattrs will be
> > > > returned to user space. But in practice we probably will filter out some
> > > > xattrs so actually returned string will be smaller than size reported
> > > > previously.
> > > This case of size=0 is a problem in userns. Depending on the mapping of the
> > > userid's the list can expand. A security.foo@uid=100 can become
> > > security.foo@uid=100000, if the mapping is set up so that uid 100 on the
> > > host becomes uid 100000 inside the container. So for now we only have
> > > security.capability and the way I solved this is by allocating a 65k buffer
> > > when calling from a userns. In this buffer where we gather the xattr names
> > > and then walk them to determine the size that's needed for the buffer by
> > > simulating the rewriting. It's not nice but I don't know of any other
> > > solution.
> > Hi Stefan,
> > 
> > For the case of size==0, why don't we iterate through all the xattr,
> > filter them, remap them and then return the size to process in user
> > namespace. That should fix this? I thought that's what
> 
> 
> For the size==0 we need a temp. buffer where the raw xattr names are written
> to so that the xattr_list_userns_rewrite() can actually rewrite what the
> filesystem drivers returned.

I am probably missing something, but for the case of size==0, we don't
have to copy all xattrs. We just need to determine size. So we can walk
through each xattr, remap it and add to the size. I mean there should not
be a need to allocate this 65K buffer. Just enough space needed to be
able to store remapped xattr.

You are already doing it in xattr_parse_uid_from_kuid(). It returns the
buffer containing remapped xattr. So we should be able to just determine
the size and free the buffer. And do it for all the xattrs returned by
filesystem.

What am I missing?

Vivek

> Not knowing exactly how big that buffer should
> be, I allocate 65k for it. From what I read there is a 64k limit on the vfs
> layer for xattrs, probably including xattr values. So 65k would for sure be
> enough also if each one of the xattr names becomes bigger.
> 
> @@ -922,10 +947,20 @@ vfs_listxattr(struct dentry *dentry, char *list,
> size_t size, bool rewrite)
>  {
>      struct inode *inode = d_inode(dentry);
>      ssize_t error;
> +    bool getsize = false;
> 
>      error = security_inode_listxattr(dentry);
>      if (error)
>          return error;
> +
> +    if (!size) {
> +        if (current_user_ns() != &init_user_ns) {
> +            size = 65 * 1024;
> +            list = kmalloc(size, GFP_KERNEL);
> +        }
> +        getsize = true;
> +    }
> +
>      if (inode->i_op->listxattr && (inode->i_opflags & IOP_XATTR)) {
>          error = -EOPNOTSUPP;
>          error = inode->i_op->listxattr(dentry, list, size);
> @@ -937,6 +972,9 @@ vfs_listxattr(struct dentry *dentry, char *list, size_t
> size, bool rewrite)
>      if (error > 0 && rewrite)
>          error = xattr_list_userns_rewrite(list, error, size);
> 
> +    if (getsize)
> +        kfree(list);
> +
>      return error;
>  }
>  EXPORT_SYMBOL_GPL(vfs_listxattr);
> 
> 
>     Stefan
> 
> > xattr_list_userns_rewrite() was doing. But looks like this logic will not
> > kick in for the case of size==0 due to "!list" condition.
> > 
> > Also we could probably replace "!list" with "!size" wheverever required.
> > Its little easy to read and understand.
> > 
> > For the other case where some xattrs can get filtered out and we report
> > a buffer size bigger than actually needed, I am hoping that its acceptable
> > and none of the existing users are broken.
> > 
> > Thanks
> > Vivek
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-18 12:30           ` Vivek Goyal
@ 2017-07-18 12:36             ` Vivek Goyal
  2017-07-18 13:29               ` Eric W. Biederman
  2017-07-18 13:21             ` Stefan Berger
  1 sibling, 1 reply; 76+ messages in thread
From: Vivek Goyal @ 2017-07-18 12:36 UTC (permalink / raw)
  To: Stefan Berger
  Cc: Stefan Berger, ebiederm, containers, lkp, linux-kernel, zohar,
	tycho, serge, James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey

On Tue, Jul 18, 2017 at 08:30:09AM -0400, Vivek Goyal wrote:
> On Tue, Jul 18, 2017 at 08:05:18AM -0400, Stefan Berger wrote:
> > On 07/18/2017 07:48 AM, Vivek Goyal wrote:
> > > On Mon, Jul 17, 2017 at 04:50:22PM -0400, Stefan Berger wrote:
> > > > On 07/17/2017 02:58 PM, Vivek Goyal wrote:
> > > > > On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:
> > > > > 
> > > > > [..]
> > > > > > +/*
> > > > > > + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
> > > > > > + *                             or determine needed size for attribute list
> > > > > > + *                             in case size == 0
> > > > > > + *
> > > > > > + * In a user namespace we do not present all extended attributes to the
> > > > > > + * user. We filter out those that are in the list of userns supported xattr.
> > > > > > + * Besides that we filter out those with @uid=<uid> when there is no mapping
> > > > > > + * for that uid in the current user namespace.
> > > > > > + *
> > > > > > + * @list:        list of 0-byte separated xattr names
> > > > > > + * @size:        the size of the list; may be 0 to determine needed list size
> > > > > > + * @list_maxlen: allocated buffer size of list
> > > > > > + */
> > > > > > +static ssize_t
> > > > > > +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
> > > > > > +{
> > > > > > +	char *nlist = NULL;
> > > > > > +	size_t s_off, len, nlen;
> > > > > > +	ssize_t d_off;
> > > > > > +	char *name, *newname;
> > > > > > +
> > > > > > +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
> > > > > size will never be less than 0 here. Only caller calls this function only
> > > > > if size is >0. So can we remove this?
> > > > Correct.
> > > > 
> > > > > What about case of "!list". So if user space called listxattr(foo, NULL,
> > > > > 0), we want to return the size of buffer as if all the xattrs will be
> > > > > returned to user space. But in practice we probably will filter out some
> > > > > xattrs so actually returned string will be smaller than size reported
> > > > > previously.
> > > > This case of size=0 is a problem in userns. Depending on the mapping of the
> > > > userid's the list can expand. A security.foo@uid=100 can become
> > > > security.foo@uid=100000, if the mapping is set up so that uid 100 on the
> > > > host becomes uid 100000 inside the container. So for now we only have
> > > > security.capability and the way I solved this is by allocating a 65k buffer
> > > > when calling from a userns. In this buffer where we gather the xattr names
> > > > and then walk them to determine the size that's needed for the buffer by
> > > > simulating the rewriting. It's not nice but I don't know of any other
> > > > solution.
> > > Hi Stefan,
> > > 
> > > For the case of size==0, why don't we iterate through all the xattr,
> > > filter them, remap them and then return the size to process in user
> > > namespace. That should fix this? I thought that's what
> > 
> > 
> > For the size==0 we need a temp. buffer where the raw xattr names are written
> > to so that the xattr_list_userns_rewrite() can actually rewrite what the
> > filesystem drivers returned.
> 
> I am probably missing something, but for the case of size==0, we don't
> have to copy all xattrs. We just need to determine size. So we can walk
> through each xattr, remap it and add to the size. I mean there should not
> be a need to allocate this 65K buffer. Just enough space needed to be
> able to store remapped xattr.
> 
> You are already doing it in xattr_parse_uid_from_kuid(). It returns the
> buffer containing remapped xattr. So we should be able to just determine
> the size and free the buffer. And do it for all the xattrs returned by
> filesystem.
> 
> What am I missing?

Oh, I think I get it. If I don't pass a buffer to underlying driver, then
it will just return the size (and not actual list). So that's why you are
allocating that big buffer and getting the whole list internally, doing
mapping and returning size to user space. Hmm...

Vivek

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-18 12:30           ` Vivek Goyal
  2017-07-18 12:36             ` Vivek Goyal
@ 2017-07-18 13:21             ` Stefan Berger
  2017-07-18 14:57               ` Vivek Goyal
  1 sibling, 1 reply; 76+ messages in thread
From: Stefan Berger @ 2017-07-18 13:21 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey

On 07/18/2017 08:30 AM, Vivek Goyal wrote:
> On Tue, Jul 18, 2017 at 08:05:18AM -0400, Stefan Berger wrote:
>> On 07/18/2017 07:48 AM, Vivek Goyal wrote:
>>> On Mon, Jul 17, 2017 at 04:50:22PM -0400, Stefan Berger wrote:
>>>> On 07/17/2017 02:58 PM, Vivek Goyal wrote:
>>>>> On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:
>>>>>
>>>>> [..]
>>>>>> +/*
>>>>>> + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
>>>>>> + *                             or determine needed size for attribute list
>>>>>> + *                             in case size == 0
>>>>>> + *
>>>>>> + * In a user namespace we do not present all extended attributes to the
>>>>>> + * user. We filter out those that are in the list of userns supported xattr.
>>>>>> + * Besides that we filter out those with @uid=<uid> when there is no mapping
>>>>>> + * for that uid in the current user namespace.
>>>>>> + *
>>>>>> + * @list:        list of 0-byte separated xattr names
>>>>>> + * @size:        the size of the list; may be 0 to determine needed list size
>>>>>> + * @list_maxlen: allocated buffer size of list
>>>>>> + */
>>>>>> +static ssize_t
>>>>>> +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
>>>>>> +{
>>>>>> +	char *nlist = NULL;
>>>>>> +	size_t s_off, len, nlen;
>>>>>> +	ssize_t d_off;
>>>>>> +	char *name, *newname;
>>>>>> +
>>>>>> +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
>>>>> size will never be less than 0 here. Only caller calls this function only
>>>>> if size is >0. So can we remove this?
>>>> Correct.
>>>>
>>>>> What about case of "!list". So if user space called listxattr(foo, NULL,
>>>>> 0), we want to return the size of buffer as if all the xattrs will be
>>>>> returned to user space. But in practice we probably will filter out some
>>>>> xattrs so actually returned string will be smaller than size reported
>>>>> previously.
>>>> This case of size=0 is a problem in userns. Depending on the mapping of the
>>>> userid's the list can expand. A security.foo@uid=100 can become
>>>> security.foo@uid=100000, if the mapping is set up so that uid 100 on the
>>>> host becomes uid 100000 inside the container. So for now we only have
>>>> security.capability and the way I solved this is by allocating a 65k buffer
>>>> when calling from a userns. In this buffer where we gather the xattr names
>>>> and then walk them to determine the size that's needed for the buffer by
>>>> simulating the rewriting. It's not nice but I don't know of any other
>>>> solution.
>>> Hi Stefan,
>>>
>>> For the case of size==0, why don't we iterate through all the xattr,
>>> filter them, remap them and then return the size to process in user
>>> namespace. That should fix this? I thought that's what
>>
>> For the size==0 we need a temp. buffer where the raw xattr names are written
>> to so that the xattr_list_userns_rewrite() can actually rewrite what the
>> filesystem drivers returned.
> I am probably missing something, but for the case of size==0, we don't
> have to copy all xattrs. We just need to determine size. So we can walk
> through each xattr, remap it and add to the size. I mean there should not
> be a need to allocate this 65K buffer. Just enough space needed to be
> able to store remapped xattr.
>
> You are already doing it in xattr_parse_uid_from_kuid(). It returns the
> buffer containing remapped xattr. So we should be able to just determine
> the size and free the buffer. And do it for all the xattrs returned by
> filesystem.
>
> What am I missing?

The problem is that each filesystem has a function that collects the 
xattr names. These functions only return the needed size if size==0 and 
don't write anything into a buffer. If the buffer is empty or there is 
no buffer, I have nothing to remap and calculate size for. So I pass a 
buffer large enough to hold the xattr names to the filesystem functions 
so I can then subsequently walk the xattrs and remap them. The remapping 
only needs to be done in non-init_user_ns since there the uid parts 
(@uid=1000) may need to be rewritten and most importantly, the size of 
the needed buffer can increase, depending on how the uid mappings are.

I don't want to extend every filesystem's xattr name gathering function...


    Stefan



>
> Vivek
>
>> Not knowing exactly how big that buffer should
>> be, I allocate 65k for it. From what I read there is a 64k limit on the vfs
>> layer for xattrs, probably including xattr values. So 65k would for sure be
>> enough also if each one of the xattr names becomes bigger.
>>
>> @@ -922,10 +947,20 @@ vfs_listxattr(struct dentry *dentry, char *list,
>> size_t size, bool rewrite)
>>   {
>>       struct inode *inode = d_inode(dentry);
>>       ssize_t error;
>> +    bool getsize = false;
>>
>>       error = security_inode_listxattr(dentry);
>>       if (error)
>>           return error;
>> +
>> +    if (!size) {
>> +        if (current_user_ns() != &init_user_ns) {
>> +            size = 65 * 1024;
>> +            list = kmalloc(size, GFP_KERNEL);
>> +        }
>> +        getsize = true;
>> +    }
>> +
>>       if (inode->i_op->listxattr && (inode->i_opflags & IOP_XATTR)) {
>>           error = -EOPNOTSUPP;
>>           error = inode->i_op->listxattr(dentry, list, size);
>> @@ -937,6 +972,9 @@ vfs_listxattr(struct dentry *dentry, char *list, size_t
>> size, bool rewrite)
>>       if (error > 0 && rewrite)
>>           error = xattr_list_userns_rewrite(list, error, size);
>>
>> +    if (getsize)
>> +        kfree(list);
>> +
>>       return error;
>>   }
>>   EXPORT_SYMBOL_GPL(vfs_listxattr);
>>
>>
>>      Stefan
>>
>>> xattr_list_userns_rewrite() was doing. But looks like this logic will not
>>> kick in for the case of size==0 due to "!list" condition.
>>>
>>> Also we could probably replace "!list" with "!size" wheverever required.
>>> Its little easy to read and understand.
>>>
>>> For the other case where some xattrs can get filtered out and we report
>>> a buffer size bigger than actually needed, I am hoping that its acceptable
>>> and none of the existing users are broken.
>>>
>>> Thanks
>>> Vivek
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-18 12:12                     ` Stefan Berger
@ 2017-07-18 13:26                       ` Eric W. Biederman
  2017-07-18 23:13                         ` Serge E. Hallyn
  0 siblings, 1 reply; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-18 13:26 UTC (permalink / raw)
  To: Stefan Berger
  Cc: James Morris, Theodore Ts'o, Serge E. Hallyn, containers,
	lkp, linux-kernel, zohar, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

Stefan Berger <stefanb@linux.vnet.ibm.com> writes:

> On 07/18/2017 03:01 AM, James Morris wrote:
>> On Thu, 13 Jul 2017, Stefan Berger wrote:
>>
>>> A file shared by 2 containers, one mapping root to uid=1000, the other mapping
>>> root to uid=2000, will show these two xattrs on the host (init_user_ns) once
>>> these containers set xattrs on that file.
>> I may be missing something here, but what happens when say the uid=2000
>> container and associated user is deleted from the system, then another is
>> created with the same uid?
>>
>> Won't this mean that you have unexpected capabilities turning up in the
>> new container?
>>
>
> Yes, that's right. I don't know any solution for that. We would have to walk the
> filesystems and find all 'stale' xattrs with such a uid. This is independent of
> whether the uid is encoded on the name side, as in this patch, or on the value
> side, as in Serge's original proposal. And uids of a mapped container root user
> don't necessarily have to have an account on the host so that an account
> deletion could trigger that.

This problem is actually independent of this piece of code entirely.
Any lingering files owned by that uid have the same issue.

Uids are are persistent system property.  It must be arranged that they
are managed carefully.  If you want to reuse a uid userdel or the
equivalent must be able to go out and delete everything they have owned.

Which is to say fundamentally this is userspaces's responsibility.

I don't see this as being particularly subtle or novel.  We already
track which uids and which subordinate uids go together.  I will nack
anything that allows multiple capability attributes per file as we have
established they are unnecessary.  So I don't see any scenarios where
this is likely to come up that it would not be a problem without these
uid tagged security capabilities.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-18 12:36             ` Vivek Goyal
@ 2017-07-18 13:29               ` Eric W. Biederman
  0 siblings, 0 replies; 76+ messages in thread
From: Eric W. Biederman @ 2017-07-18 13:29 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Stefan Berger, Stefan Berger, containers, lkp, linux-kernel,
	zohar, tycho, serge, James.Bottomley, christian.brauner,
	amir73il, linux-security-module, casey

Vivek Goyal <vgoyal@redhat.com> writes:

> On Tue, Jul 18, 2017 at 08:30:09AM -0400, Vivek Goyal wrote:
>> On Tue, Jul 18, 2017 at 08:05:18AM -0400, Stefan Berger wrote:
>> > On 07/18/2017 07:48 AM, Vivek Goyal wrote:
>> > > On Mon, Jul 17, 2017 at 04:50:22PM -0400, Stefan Berger wrote:
>> > > > On 07/17/2017 02:58 PM, Vivek Goyal wrote:
>> > > > > On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:
>> > > > > 
>> > > > > [..]
>> > > > > > +/*
>> > > > > > + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
>> > > > > > + *                             or determine needed size for attribute list
>> > > > > > + *                             in case size == 0
>> > > > > > + *
>> > > > > > + * In a user namespace we do not present all extended attributes to the
>> > > > > > + * user. We filter out those that are in the list of userns supported xattr.
>> > > > > > + * Besides that we filter out those with @uid=<uid> when there is no mapping
>> > > > > > + * for that uid in the current user namespace.
>> > > > > > + *
>> > > > > > + * @list:        list of 0-byte separated xattr names
>> > > > > > + * @size:        the size of the list; may be 0 to determine needed list size
>> > > > > > + * @list_maxlen: allocated buffer size of list
>> > > > > > + */
>> > > > > > +static ssize_t
>> > > > > > +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
>> > > > > > +{
>> > > > > > +	char *nlist = NULL;
>> > > > > > +	size_t s_off, len, nlen;
>> > > > > > +	ssize_t d_off;
>> > > > > > +	char *name, *newname;
>> > > > > > +
>> > > > > > +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
>> > > > > size will never be less than 0 here. Only caller calls this function only
>> > > > > if size is >0. So can we remove this?
>> > > > Correct.
>> > > > 
>> > > > > What about case of "!list". So if user space called listxattr(foo, NULL,
>> > > > > 0), we want to return the size of buffer as if all the xattrs will be
>> > > > > returned to user space. But in practice we probably will filter out some
>> > > > > xattrs so actually returned string will be smaller than size reported
>> > > > > previously.
>> > > > This case of size=0 is a problem in userns. Depending on the mapping of the
>> > > > userid's the list can expand. A security.foo@uid=100 can become
>> > > > security.foo@uid=100000, if the mapping is set up so that uid 100 on the
>> > > > host becomes uid 100000 inside the container. So for now we only have
>> > > > security.capability and the way I solved this is by allocating a 65k buffer
>> > > > when calling from a userns. In this buffer where we gather the xattr names
>> > > > and then walk them to determine the size that's needed for the buffer by
>> > > > simulating the rewriting. It's not nice but I don't know of any other
>> > > > solution.
>> > > Hi Stefan,
>> > > 
>> > > For the case of size==0, why don't we iterate through all the xattr,
>> > > filter them, remap them and then return the size to process in user
>> > > namespace. That should fix this? I thought that's what
>> > 
>> > 
>> > For the size==0 we need a temp. buffer where the raw xattr names are written
>> > to so that the xattr_list_userns_rewrite() can actually rewrite what the
>> > filesystem drivers returned.
>> 
>> I am probably missing something, but for the case of size==0, we don't
>> have to copy all xattrs. We just need to determine size. So we can walk
>> through each xattr, remap it and add to the size. I mean there should not
>> be a need to allocate this 65K buffer. Just enough space needed to be
>> able to store remapped xattr.
>> 
>> You are already doing it in xattr_parse_uid_from_kuid(). It returns the
>> buffer containing remapped xattr. So we should be able to just determine
>> the size and free the buffer. And do it for all the xattrs returned by
>> filesystem.
>> 
>> What am I missing?
>
> Oh, I think I get it. If I don't pass a buffer to underlying driver, then
> it will just return the size (and not actual list). So that's why you are
> allocating that big buffer and getting the whole list internally, doing
> mapping and returning size to user space. Hmm...

A valid reason to be leary of storing attributs in the xattrs.

Eric

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-18 13:21             ` Stefan Berger
@ 2017-07-18 14:57               ` Vivek Goyal
  2017-07-18 16:11                 ` Stefan Berger
  0 siblings, 1 reply; 76+ messages in thread
From: Vivek Goyal @ 2017-07-18 14:57 UTC (permalink / raw)
  To: Stefan Berger
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey

On Tue, Jul 18, 2017 at 09:21:22AM -0400, Stefan Berger wrote:
> On 07/18/2017 08:30 AM, Vivek Goyal wrote:
> > On Tue, Jul 18, 2017 at 08:05:18AM -0400, Stefan Berger wrote:
> > > On 07/18/2017 07:48 AM, Vivek Goyal wrote:
> > > > On Mon, Jul 17, 2017 at 04:50:22PM -0400, Stefan Berger wrote:
> > > > > On 07/17/2017 02:58 PM, Vivek Goyal wrote:
> > > > > > On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:
> > > > > > 
> > > > > > [..]
> > > > > > > +/*
> > > > > > > + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
> > > > > > > + *                             or determine needed size for attribute list
> > > > > > > + *                             in case size == 0
> > > > > > > + *
> > > > > > > + * In a user namespace we do not present all extended attributes to the
> > > > > > > + * user. We filter out those that are in the list of userns supported xattr.
> > > > > > > + * Besides that we filter out those with @uid=<uid> when there is no mapping
> > > > > > > + * for that uid in the current user namespace.
> > > > > > > + *
> > > > > > > + * @list:        list of 0-byte separated xattr names
> > > > > > > + * @size:        the size of the list; may be 0 to determine needed list size
> > > > > > > + * @list_maxlen: allocated buffer size of list
> > > > > > > + */
> > > > > > > +static ssize_t
> > > > > > > +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
> > > > > > > +{
> > > > > > > +	char *nlist = NULL;
> > > > > > > +	size_t s_off, len, nlen;
> > > > > > > +	ssize_t d_off;
> > > > > > > +	char *name, *newname;
> > > > > > > +
> > > > > > > +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
> > > > > > size will never be less than 0 here. Only caller calls this function only
> > > > > > if size is >0. So can we remove this?
> > > > > Correct.
> > > > > 
> > > > > > What about case of "!list". So if user space called listxattr(foo, NULL,
> > > > > > 0), we want to return the size of buffer as if all the xattrs will be
> > > > > > returned to user space. But in practice we probably will filter out some
> > > > > > xattrs so actually returned string will be smaller than size reported
> > > > > > previously.
> > > > > This case of size=0 is a problem in userns. Depending on the mapping of the
> > > > > userid's the list can expand. A security.foo@uid=100 can become
> > > > > security.foo@uid=100000, if the mapping is set up so that uid 100 on the
> > > > > host becomes uid 100000 inside the container. So for now we only have
> > > > > security.capability and the way I solved this is by allocating a 65k buffer
> > > > > when calling from a userns. In this buffer where we gather the xattr names
> > > > > and then walk them to determine the size that's needed for the buffer by
> > > > > simulating the rewriting. It's not nice but I don't know of any other
> > > > > solution.
> > > > Hi Stefan,
> > > > 
> > > > For the case of size==0, why don't we iterate through all the xattr,
> > > > filter them, remap them and then return the size to process in user
> > > > namespace. That should fix this? I thought that's what
> > > 
> > > For the size==0 we need a temp. buffer where the raw xattr names are written
> > > to so that the xattr_list_userns_rewrite() can actually rewrite what the
> > > filesystem drivers returned.
> > I am probably missing something, but for the case of size==0, we don't
> > have to copy all xattrs. We just need to determine size. So we can walk
> > through each xattr, remap it and add to the size. I mean there should not
> > be a need to allocate this 65K buffer. Just enough space needed to be
> > able to store remapped xattr.
> > 
> > You are already doing it in xattr_parse_uid_from_kuid(). It returns the
> > buffer containing remapped xattr. So we should be able to just determine
> > the size and free the buffer. And do it for all the xattrs returned by
> > filesystem.
> > 
> > What am I missing?
> 
> The problem is that each filesystem has a function that collects the xattr
> names. These functions only return the needed size if size==0 and don't
> write anything into a buffer. If the buffer is empty or there is no buffer,
> I have nothing to remap and calculate size for.

How about calling listxattr() twice. In the first call you will get the
size of buffer to allocate. Allocate that buffer and call ->listxattr()
again, this time passing that buffer? That way you will not have to
hardcode the size of buffer.

Vivek

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-18 14:57               ` Vivek Goyal
@ 2017-07-18 16:11                 ` Stefan Berger
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Berger @ 2017-07-18 16:11 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: ebiederm, containers, lkp, linux-kernel, zohar, tycho, serge,
	James.Bottomley, christian.brauner, amir73il,
	linux-security-module, casey

On 07/18/2017 10:57 AM, Vivek Goyal wrote:
> On Tue, Jul 18, 2017 at 09:21:22AM -0400, Stefan Berger wrote:
>> On 07/18/2017 08:30 AM, Vivek Goyal wrote:
>>> On Tue, Jul 18, 2017 at 08:05:18AM -0400, Stefan Berger wrote:
>>>> On 07/18/2017 07:48 AM, Vivek Goyal wrote:
>>>>> On Mon, Jul 17, 2017 at 04:50:22PM -0400, Stefan Berger wrote:
>>>>>> On 07/17/2017 02:58 PM, Vivek Goyal wrote:
>>>>>>> On Tue, Jul 11, 2017 at 11:05:11AM -0400, Stefan Berger wrote:
>>>>>>>
>>>>>>> [..]
>>>>>>>> +/*
>>>>>>>> + * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
>>>>>>>> + *                             or determine needed size for attribute list
>>>>>>>> + *                             in case size == 0
>>>>>>>> + *
>>>>>>>> + * In a user namespace we do not present all extended attributes to the
>>>>>>>> + * user. We filter out those that are in the list of userns supported xattr.
>>>>>>>> + * Besides that we filter out those with @uid=<uid> when there is no mapping
>>>>>>>> + * for that uid in the current user namespace.
>>>>>>>> + *
>>>>>>>> + * @list:        list of 0-byte separated xattr names
>>>>>>>> + * @size:        the size of the list; may be 0 to determine needed list size
>>>>>>>> + * @list_maxlen: allocated buffer size of list
>>>>>>>> + */
>>>>>>>> +static ssize_t
>>>>>>>> +xattr_list_userns_rewrite(char *list, ssize_t size, size_t list_maxlen)
>>>>>>>> +{
>>>>>>>> +	char *nlist = NULL;
>>>>>>>> +	size_t s_off, len, nlen;
>>>>>>>> +	ssize_t d_off;
>>>>>>>> +	char *name, *newname;
>>>>>>>> +
>>>>>>>> +	if (!list || size < 0 || current_user_ns() == &init_user_ns)
>>>>>>> size will never be less than 0 here. Only caller calls this function only
>>>>>>> if size is >0. So can we remove this?
>>>>>> Correct.
>>>>>>
>>>>>>> What about case of "!list". So if user space called listxattr(foo, NULL,
>>>>>>> 0), we want to return the size of buffer as if all the xattrs will be
>>>>>>> returned to user space. But in practice we probably will filter out some
>>>>>>> xattrs so actually returned string will be smaller than size reported
>>>>>>> previously.
>>>>>> This case of size=0 is a problem in userns. Depending on the mapping of the
>>>>>> userid's the list can expand. A security.foo@uid=100 can become
>>>>>> security.foo@uid=100000, if the mapping is set up so that uid 100 on the
>>>>>> host becomes uid 100000 inside the container. So for now we only have
>>>>>> security.capability and the way I solved this is by allocating a 65k buffer
>>>>>> when calling from a userns. In this buffer where we gather the xattr names
>>>>>> and then walk them to determine the size that's needed for the buffer by
>>>>>> simulating the rewriting. It's not nice but I don't know of any other
>>>>>> solution.
>>>>> Hi Stefan,
>>>>>
>>>>> For the case of size==0, why don't we iterate through all the xattr,
>>>>> filter them, remap them and then return the size to process in user
>>>>> namespace. That should fix this? I thought that's what
>>>> For the size==0 we need a temp. buffer where the raw xattr names are written
>>>> to so that the xattr_list_userns_rewrite() can actually rewrite what the
>>>> filesystem drivers returned.
>>> I am probably missing something, but for the case of size==0, we don't
>>> have to copy all xattrs. We just need to determine size. So we can walk
>>> through each xattr, remap it and add to the size. I mean there should not
>>> be a need to allocate this 65K buffer. Just enough space needed to be
>>> able to store remapped xattr.
>>>
>>> You are already doing it in xattr_parse_uid_from_kuid(). It returns the
>>> buffer containing remapped xattr. So we should be able to just determine
>>> the size and free the buffer. And do it for all the xattrs returned by
>>> filesystem.
>>>
>>> What am I missing?
>> The problem is that each filesystem has a function that collects the xattr
>> names. These functions only return the needed size if size==0 and don't
>> write anything into a buffer. If the buffer is empty or there is no buffer,
>> I have nothing to remap and calculate size for.
> How about calling listxattr() twice. In the first call you will get the
> size of buffer to allocate. Allocate that buffer and call ->listxattr()
> again, this time passing that buffer? That way you will not have to
> hardcode the size of buffer.

Good idea. I modified the code to do this now. Thanks.

    Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-18 13:26                       ` Eric W. Biederman
@ 2017-07-18 23:13                         ` Serge E. Hallyn
  0 siblings, 0 replies; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-18 23:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Stefan Berger, James Morris, Theodore Ts'o, Serge E. Hallyn,
	containers, lkp, linux-kernel, zohar, tycho, James.Bottomley,
	vgoyal, christian.brauner, amir73il, linux-security-module,
	casey

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Stefan Berger <stefanb@linux.vnet.ibm.com> writes:
> 
> > On 07/18/2017 03:01 AM, James Morris wrote:
> >> On Thu, 13 Jul 2017, Stefan Berger wrote:
> >>
> >>> A file shared by 2 containers, one mapping root to uid=1000, the other mapping
> >>> root to uid=2000, will show these two xattrs on the host (init_user_ns) once
> >>> these containers set xattrs on that file.
> >> I may be missing something here, but what happens when say the uid=2000
> >> container and associated user is deleted from the system, then another is
> >> created with the same uid?
> >>
> >> Won't this mean that you have unexpected capabilities turning up in the
> >> new container?
> >>
> >
> > Yes, that's right. I don't know any solution for that. We would have to walk the
> > filesystems and find all 'stale' xattrs with such a uid. This is independent of
> > whether the uid is encoded on the name side, as in this patch, or on the value
> > side, as in Serge's original proposal. And uids of a mapped container root user
> > don't necessarily have to have an account on the host so that an account
> > deletion could trigger that.
> 
> This problem is actually independent of this piece of code entirely.
> Any lingering files owned by that uid have the same issue.

In particular, any setuid-root files in that container have the precisely
analogous issue.

-serge

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-14 19:26                                     ` Mimi Zohar
  2017-07-15  0:02                                       ` Eric W. Biederman
       [not found]                                       ` <xagsmtp3.20170715001054.9173@uk1vsc.vnet.ibm.com>
@ 2017-07-26  3:00                                       ` Serge E. Hallyn
  2017-07-26 13:57                                         ` Mimi Zohar
  2 siblings, 1 reply; 76+ messages in thread
From: Serge E. Hallyn @ 2017-07-26  3:00 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Eric W. Biederman, Serge E. Hallyn, Stefan Berger, Mimi Zohar,
	Theodore Ts'o, containers, lkp, linux-kernel, tycho,
	James.Bottomley, vgoyal, christian.brauner, amir73il,
	linux-security-module, casey

On Fri, Jul 14, 2017 at 03:26:14PM -0400, Mimi Zohar wrote:
> On Fri, 2017-07-14 at 13:17 -0500, Eric W. Biederman wrote:
> > Which brings us to the semantic question of would it be nice to have
> > stacked IMA/EVM on the same file.
> > 
> > I really don't think we do.  I think allowing multiple keys for
> > different part of trusting files is easy enough that we should have no
> > need to fight over which keys do which.
> 
> We definitely want to support different policies on the native and in
> the namespace with different keys and keyrings.

Ok, so Stefan's code to support userspace in a container reading
security.ima and getting back the value for security.ima@uid=1000
(if 1000 is the kuid of the container's root user) is in fact
useful to IMA?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v2] xattr: Enable security.capability in user namespaces
  2017-07-26  3:00                                       ` Serge E. Hallyn
@ 2017-07-26 13:57                                         ` Mimi Zohar
  0 siblings, 0 replies; 76+ messages in thread
From: Mimi Zohar @ 2017-07-26 13:57 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, Stefan Berger, Mimi Zohar, Theodore Ts'o,
	containers, lkp, linux-kernel, tycho, James.Bottomley, vgoyal,
	christian.brauner, amir73il, linux-security-module, casey

On Tue, 2017-07-25 at 22:00 -0500, Serge E. Hallyn wrote:
> On Fri, Jul 14, 2017 at 03:26:14PM -0400, Mimi Zohar wrote:
> > On Fri, 2017-07-14 at 13:17 -0500, Eric W. Biederman wrote:
> > > Which brings us to the semantic question of would it be nice to have
> > > stacked IMA/EVM on the same file.
> > > 
> > > I really don't think we do.  I think allowing multiple keys for
> > > different part of trusting files is easy enough that we should have no
> > > need to fight over which keys do which.
> > 
> > We definitely want to support different policies on the native and in
> > the namespace with different keys and keyrings.
> 
> Ok, so Stefan's code to support userspace in a container reading
> security.ima and getting back the value for security.ima@uid=1000
> (if 1000 is the kuid of the container's root user) is in fact
> useful to IMA?

Definitely!  Root within the namespace needs to be able to read and
write security.ima in order to (re)sign files, with a specific key
known to that container.  Stefan's code provides different views of
the security xattrs.

Mimi

^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2017-07-26 13:58 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-11 15:05 [PATCH v2] Enable namespaced file capabilities Stefan Berger
2017-07-11 15:05 ` [PATCH v2] xattr: Enable security.capability in user namespaces Stefan Berger
2017-07-11 17:12   ` Serge E. Hallyn
2017-07-12  0:15     ` Stefan Berger
2017-07-12  0:47       ` Serge E. Hallyn
2017-07-12  3:45   ` Serge E. Hallyn
2017-07-12 11:35     ` Stefan Berger
2017-07-12 17:32       ` Serge E. Hallyn
2017-07-12  7:59   ` James Morris
2017-07-12 13:25   ` Eric W. Biederman
2017-07-12 17:03     ` Serge E. Hallyn
2017-07-12 22:20       ` James Morris
2017-07-13  0:33         ` Eric W. Biederman
2017-07-13  1:01           ` Serge E. Hallyn
2017-07-12 23:13       ` Eric W. Biederman
2017-07-13  0:43         ` Serge E. Hallyn
2017-07-13  0:44         ` Stefan Berger
2017-07-13  1:15           ` Theodore Ts'o
2017-07-13  2:34             ` Serge E. Hallyn
2017-07-13 12:11             ` Eric W. Biederman
2017-07-13 16:40               ` Theodore Ts'o
2017-07-13 17:05                 ` Stefan Berger
2017-07-13 17:39                   ` Eric W. Biederman
2017-07-13 19:14                     ` Theodore Ts'o
2017-07-13 19:41                       ` Serge E. Hallyn
2017-07-13 21:17                     ` Serge E. Hallyn
2017-07-18  7:01                   ` James Morris
2017-07-18 12:12                     ` Stefan Berger
2017-07-18 13:26                       ` Eric W. Biederman
2017-07-18 23:13                         ` Serge E. Hallyn
2017-07-13 17:14                 ` Eric W. Biederman
2017-07-13 17:33                   ` Stefan Berger
2017-07-13 17:49                     ` Eric W. Biederman
2017-07-13 19:48                       ` Serge E. Hallyn
2017-07-13 21:12                         ` Eric W. Biederman
     [not found]                       ` <9a3010e5-ca2b-5e7a-656b-fcc14f7bec4e@linux.vnet.ibm.com>
2017-07-14  0:38                         ` Eric W. Biederman
2017-07-14 11:32                           ` Stefan Berger
2017-07-14 12:04                             ` Eric W. Biederman
2017-07-14 12:39                               ` Stefan Berger
2017-07-14 13:34                             ` Serge E. Hallyn
2017-07-14 15:22                               ` Stefan Berger
2017-07-14 17:35                                 ` Serge E. Hallyn
2017-07-14 18:17                                   ` Eric W. Biederman
2017-07-14 18:48                                   ` Mimi Zohar
2017-07-14 18:52                                     ` James Bottomley
2017-07-14 20:03                                       ` Mimi Zohar
2017-07-14 20:39                                         ` James Bottomley
2017-07-14 21:34                                           ` Theodore Ts'o
2017-07-14 23:22                                             ` Eric W. Biederman
2017-07-14 23:29                                           ` Mimi Zohar
2017-07-14 23:53                                           ` Eric W. Biederman
2017-07-14 19:29                                     ` Theodore Ts'o
2017-07-14 19:43                                       ` Mimi Zohar
     [not found]                                   ` <xagsmtp2.20170714182525.6604@vmsdvm4.vnet.ibm.com>
2017-07-14 19:26                                     ` Mimi Zohar
2017-07-15  0:02                                       ` Eric W. Biederman
     [not found]                                       ` <xagsmtp3.20170715001054.9173@uk1vsc.vnet.ibm.com>
2017-07-16 11:25                                         ` Mimi Zohar
2017-07-26  3:00                                       ` Serge E. Hallyn
2017-07-26 13:57                                         ` Mimi Zohar
2017-07-14 17:36                                 ` Eric W. Biederman
2017-07-14 19:22                                   ` Stefan Berger
2017-07-13 21:21                     ` Serge E. Hallyn
2017-07-13 21:13                 ` Serge E. Hallyn
2017-07-12 17:53   ` Vivek Goyal
2017-07-12 19:19     ` Stefan Berger
2017-07-14 23:41   ` Eric W. Biederman
2017-07-15 21:27     ` Stefan Berger
2017-07-17 18:58   ` Vivek Goyal
2017-07-17 20:50     ` Stefan Berger
2017-07-18 11:48       ` Vivek Goyal
2017-07-18 12:05         ` Stefan Berger
2017-07-18 12:30           ` Vivek Goyal
2017-07-18 12:36             ` Vivek Goyal
2017-07-18 13:29               ` Eric W. Biederman
2017-07-18 13:21             ` Stefan Berger
2017-07-18 14:57               ` Vivek Goyal
2017-07-18 16:11                 ` Stefan Berger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).