linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/4] cgroup: add xattr support
@ 2012-08-16 17:44 aris
  2012-08-16 17:44 ` [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs aris
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: aris @ 2012-08-16 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: cgroups, Li Zefan, Tejun Heo, Hugh Dickins, Hillf Danton,
	Lennart Poettering

This series are a refreshed version of a patchset submitted by Li Zefan back
in march:
	https://lkml.org/lkml/2012/3/1/13

With Li's permission, I refreshed the patches to apply over the latest upstream
and added the modifications suggested by others in the thread:
- using a mount option instead of config option to enable the xattr support
- reinitialize the list in kmem_xattrs_free()
- renamed functions to simple_xattr_*()

There're two users for this patchset:
- SELinux: to be able to control access to cgroupfs inside containers
- systemd: to store meta information such as main PID in a service cgroup,
  set specific services special options in the cgroup.

While the xattrs will use kernel memory like tmpfs, they're restricted to
'security' (which controls the format of the value) and 'trusted' (which
requires CAP_SYS_ADMIN). If kernel memory usage is still a concern, we're
not far from having memcg account for kernel memory.

v6:
- only allow trusted and security
- replace subsys_bits by something more meaningful
v5:
- check for permissions for user xattr namespace
v4:
- implemented requested changes by Tejun Heo in patch #2

Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>

-- 
Aristeu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs
  2012-08-16 17:44 [PATCH v6 0/4] cgroup: add xattr support aris
@ 2012-08-16 17:44 ` aris
  2012-08-16 19:58   ` Tejun Heo
  2012-08-16 17:44 ` [PATCH v6 2/4] cgroup: revise how we re-populate root directory aris
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: aris @ 2012-08-16 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: cgroups, Li Zefan, Tejun Heo, Hugh Dickins, Hillf Danton,
	Lennart Poettering

[-- Attachment #1: cgroup1.patch --]
[-- Type: text/plain, Size: 14942 bytes --]

From: Li Zefan <lizefan@huawei.com>

Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.

$ size vmlinux.o
   text    data     bss     dec     hex filename
4658782  880729 5195032 10734543         a3cbcf vmlinux.o
$ size vmlinux.o
   text    data     bss     dec     hex filename
4658957  880729 5195032 10734718         a3cc7e vmlinux.o

v6:
- no changes
v5:
- no changes
v4:
- move simple_xattrs_free() to fs/xattr.c
v3:
- in kmem_xattrs_free(), reinitialize the list
- use simple_xattr_* prefix
- introduce simple_xattr_add() to prevent direct list usage

Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>

---
 fs/xattr.c               |  200 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/shmem_fs.h |    3 
 include/linux/xattr.h    |   25 +++++
 mm/shmem.c               |  171 +++-------------------------------------
 4 files changed, 240 insertions(+), 159 deletions(-)

Index: github/fs/xattr.c
===================================================================
--- github.orig/fs/xattr.c	2012-08-16 11:28:12.719273435 -0400
+++ github/fs/xattr.c	2012-08-16 11:28:13.975307743 -0400
@@ -791,3 +791,203 @@
 EXPORT_SYMBOL(generic_listxattr);
 EXPORT_SYMBOL(generic_setxattr);
 EXPORT_SYMBOL(generic_removexattr);
+
+/*
+ * initialize the simple_xattrs structure
+ */
+void simple_xattrs_init(struct simple_xattrs *xattrs)
+{
+	INIT_LIST_HEAD(&xattrs->head);
+	spin_lock_init(&xattrs->lock);
+}
+
+/*
+ * Allocate new xattr and copy in the value; but leave the name to callers.
+ */
+struct simple_xattr *simple_xattr_alloc(const void *value, size_t size)
+{
+	struct simple_xattr *new_xattr;
+	size_t len;
+
+	/* wrap around? */
+	len = sizeof(*new_xattr) + size;
+	if (len <= sizeof(*new_xattr))
+		return NULL;
+
+	new_xattr = kmalloc(len, GFP_KERNEL);
+	if (!new_xattr)
+		return NULL;
+
+	new_xattr->size = size;
+	memcpy(new_xattr->value, value, size);
+	return new_xattr;
+}
+
+/*
+ * free all the xattrs
+ */
+void simple_xattrs_free(struct simple_xattrs *xattrs)
+{
+	struct simple_xattr *xattr, *node;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
+		kfree(xattr->name);
+		kfree(xattr);
+	}
+	INIT_LIST_HEAD(&xattrs->head);
+	spin_unlock(&xattrs->lock);
+}
+
+/*
+ * xattr GET operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
+		     void *buffer, size_t size)
+{
+	struct simple_xattr *xattr;
+	int ret = -ENODATA;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		if (strcmp(name, xattr->name))
+			continue;
+
+		ret = xattr->size;
+		if (buffer) {
+			if (size < xattr->size)
+				ret = -ERANGE;
+			else
+				memcpy(buffer, xattr->value, xattr->size);
+		}
+		break;
+	}
+	spin_unlock(&xattrs->lock);
+	return ret;
+}
+
+static int __simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+			      const void *value, size_t size, int flags)
+{
+	struct simple_xattr *xattr;
+	struct simple_xattr *new_xattr = NULL;
+	size_t len;
+	int err = 0;
+
+	/* value == NULL means remove */
+	if (value) {
+		/* wrap around? */
+		len = sizeof(*new_xattr) + size;
+		if (len <= sizeof(*new_xattr))
+			return -ENOMEM;
+
+		new_xattr = kmalloc(len, GFP_KERNEL);
+		if (!new_xattr)
+			return -ENOMEM;
+
+		new_xattr->name = kstrdup(name, GFP_KERNEL);
+		if (!new_xattr->name) {
+			kfree(new_xattr);
+			return -ENOMEM;
+		}
+
+		new_xattr->size = size;
+		memcpy(new_xattr->value, value, size);
+	}
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		if (!strcmp(name, xattr->name)) {
+			if (flags & XATTR_CREATE) {
+				xattr = new_xattr;
+				err = -EEXIST;
+			} else if (new_xattr) {
+				list_replace(&xattr->list, &new_xattr->list);
+			} else {
+				list_del(&xattr->list);
+			}
+			goto out;
+		}
+	}
+	if (flags & XATTR_REPLACE) {
+		xattr = new_xattr;
+		err = -ENODATA;
+	} else {
+		list_add(&new_xattr->list, &xattrs->head);
+		xattr = NULL;
+	}
+out:
+	spin_unlock(&xattrs->lock);
+	if (xattr) {
+		kfree(xattr->name);
+		kfree(xattr);
+	}
+	return err;
+
+}
+
+/*
+ * xattr SET operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+		     const void *value, size_t size, int flags)
+{
+	if (size == 0)
+		value = ""; /* empty EA, do not remove */
+	return __simple_xattr_set(xattrs, name, value, size, flags);
+}
+
+/*
+ * xattr REMOVE operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name)
+{
+        return __simple_xattr_set(xattrs, name, NULL, 0, XATTR_REPLACE);
+}
+
+static bool xattr_is_trusted(const char *name)
+{
+	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+}
+
+/*
+ * xattr LIST operation for in-memory/pseudo filesystems
+ */
+ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer, size_t size)
+{
+	bool trusted = capable(CAP_SYS_ADMIN);
+	struct simple_xattr *xattr;
+	size_t used = 0;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		size_t len;
+
+		/* skip "trusted." attributes for unprivileged callers */
+		if (!trusted && xattr_is_trusted(xattr->name))
+			continue;
+
+		len = strlen(xattr->name) + 1;
+		used += len;
+		if (buffer) {
+			if (size < used) {
+				used = -ERANGE;
+				break;
+			}
+			memcpy(buffer, xattr->name, len);
+			buffer += len;
+		}
+	}
+	spin_unlock(&xattrs->lock);
+
+	return used;
+}
+
+void simple_xattr_list_add(struct simple_xattrs *xattrs,
+			   struct simple_xattr *new_xattr)
+{
+	spin_lock(&xattrs->lock);
+	list_add(&new_xattr->list, &xattrs->head);
+	spin_unlock(&xattrs->lock);
+}
+
Index: github/include/linux/shmem_fs.h
===================================================================
--- github.orig/include/linux/shmem_fs.h	2012-08-16 11:28:12.719273435 -0400
+++ github/include/linux/shmem_fs.h	2012-08-16 11:28:13.975307743 -0400
@@ -5,6 +5,7 @@
 #include <linux/mempolicy.h>
 #include <linux/pagemap.h>
 #include <linux/percpu_counter.h>
+#include <linux/xattr.h>
 
 /* inode in-kernel data */
 
@@ -18,7 +19,7 @@
 	};
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
 	struct list_head	swaplist;	/* chain of maybes on swap */
-	struct list_head	xattr_list;	/* list of shmem_xattr */
+	struct simple_xattrs	xattrs;		/* list of xattrs */
 	struct inode		vfs_inode;
 };
 
Index: github/include/linux/xattr.h
===================================================================
--- github.orig/include/linux/xattr.h	2012-08-16 11:28:12.719273435 -0400
+++ github/include/linux/xattr.h	2012-08-16 11:28:13.975307743 -0400
@@ -60,6 +60,7 @@
 #ifdef  __KERNEL__
 
 #include <linux/types.h>
+#include <linux/spinlock.h>
 
 struct inode;
 struct dentry;
@@ -96,6 +97,30 @@
 			   char **xattr_value, size_t size, gfp_t flags);
 int vfs_xattr_cmp(struct dentry *dentry, const char *xattr_name,
 		  const char *value, size_t size, gfp_t flags);
+
+struct simple_xattrs {
+	struct list_head head;
+	spinlock_t lock;
+};
+
+struct simple_xattr {
+	struct list_head list;
+	char *name;
+	size_t size;
+	char value[0];
+};
+
+void simple_xattrs_init(struct simple_xattrs *xattrs);
+struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
+void simple_xattrs_free(struct simple_xattrs *xattrs);
+int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
+		     void *buffer, size_t size);
+int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+		     const void *value, size_t size, int flags);
+int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name);
+ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer, size_t size);
+void simple_xattr_list_add(struct simple_xattrs *xattrs, struct simple_xattr *new_xattr);
+
 #endif  /*  __KERNEL__  */
 
 #endif	/* _LINUX_XATTR_H */
Index: github/mm/shmem.c
===================================================================
--- github.orig/mm/shmem.c	2012-08-16 11:28:12.719273435 -0400
+++ github/mm/shmem.c	2012-08-16 11:28:13.975307743 -0400
@@ -77,13 +77,6 @@
 /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
 #define SHORT_SYMLINK_LEN 128
 
-struct shmem_xattr {
-	struct list_head list;	/* anchored by shmem_inode_info->xattr_list */
-	char *name;		/* xattr name */
-	size_t size;
-	char value[0];
-};
-
 /*
  * shmem_fallocate and shmem_writepage communicate via inode->i_private
  * (with i_mutex making sure that it has only one user at a time):
@@ -636,7 +629,6 @@
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr, *nxattr;
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -650,10 +642,7 @@
 	} else
 		kfree(info->symlink);
 
-	list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
-		kfree(xattr->name);
-		kfree(xattr);
-	}
+	simple_xattrs_free(&info->xattrs);
 	BUG_ON(inode->i_blocks);
 	shmem_free_inode(inode->i_sb);
 	clear_inode(inode);
@@ -1377,7 +1366,7 @@
 		spin_lock_init(&info->lock);
 		info->flags = flags & VM_NORESERVE;
 		INIT_LIST_HEAD(&info->swaplist);
-		INIT_LIST_HEAD(&info->xattr_list);
+		simple_xattrs_init(&info->xattrs);
 		cache_no_acl(inode);
 
 		switch (mode & S_IFMT) {
@@ -2060,28 +2049,6 @@
  */
 
 /*
- * Allocate new xattr and copy in the value; but leave the name to callers.
- */
-static struct shmem_xattr *shmem_xattr_alloc(const void *value, size_t size)
-{
-	struct shmem_xattr *new_xattr;
-	size_t len;
-
-	/* wrap around? */
-	len = sizeof(*new_xattr) + size;
-	if (len <= sizeof(*new_xattr))
-		return NULL;
-
-	new_xattr = kmalloc(len, GFP_KERNEL);
-	if (!new_xattr)
-		return NULL;
-
-	new_xattr->size = size;
-	memcpy(new_xattr->value, value, size);
-	return new_xattr;
-}
-
-/*
  * Callback for security_inode_init_security() for acquiring xattrs.
  */
 static int shmem_initxattrs(struct inode *inode,
@@ -2090,11 +2057,11 @@
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	const struct xattr *xattr;
-	struct shmem_xattr *new_xattr;
+	struct simple_xattr *new_xattr;
 	size_t len;
 
 	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
-		new_xattr = shmem_xattr_alloc(xattr->value, xattr->value_len);
+		new_xattr = simple_xattr_alloc(xattr->value, xattr->value_len);
 		if (!new_xattr)
 			return -ENOMEM;
 
@@ -2111,91 +2078,12 @@
 		memcpy(new_xattr->name + XATTR_SECURITY_PREFIX_LEN,
 		       xattr->name, len);
 
-		spin_lock(&info->lock);
-		list_add(&new_xattr->list, &info->xattr_list);
-		spin_unlock(&info->lock);
+		simple_xattr_list_add(&info->xattrs, new_xattr);
 	}
 
 	return 0;
 }
 
-static int shmem_xattr_get(struct dentry *dentry, const char *name,
-			   void *buffer, size_t size)
-{
-	struct shmem_inode_info *info;
-	struct shmem_xattr *xattr;
-	int ret = -ENODATA;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (strcmp(name, xattr->name))
-			continue;
-
-		ret = xattr->size;
-		if (buffer) {
-			if (size < xattr->size)
-				ret = -ERANGE;
-			else
-				memcpy(buffer, xattr->value, xattr->size);
-		}
-		break;
-	}
-	spin_unlock(&info->lock);
-	return ret;
-}
-
-static int shmem_xattr_set(struct inode *inode, const char *name,
-			   const void *value, size_t size, int flags)
-{
-	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr;
-	struct shmem_xattr *new_xattr = NULL;
-	int err = 0;
-
-	/* value == NULL means remove */
-	if (value) {
-		new_xattr = shmem_xattr_alloc(value, size);
-		if (!new_xattr)
-			return -ENOMEM;
-
-		new_xattr->name = kstrdup(name, GFP_KERNEL);
-		if (!new_xattr->name) {
-			kfree(new_xattr);
-			return -ENOMEM;
-		}
-	}
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (!strcmp(name, xattr->name)) {
-			if (flags & XATTR_CREATE) {
-				xattr = new_xattr;
-				err = -EEXIST;
-			} else if (new_xattr) {
-				list_replace(&xattr->list, &new_xattr->list);
-			} else {
-				list_del(&xattr->list);
-			}
-			goto out;
-		}
-	}
-	if (flags & XATTR_REPLACE) {
-		xattr = new_xattr;
-		err = -ENODATA;
-	} else {
-		list_add(&new_xattr->list, &info->xattr_list);
-		xattr = NULL;
-	}
-out:
-	spin_unlock(&info->lock);
-	if (xattr)
-		kfree(xattr->name);
-	kfree(xattr);
-	return err;
-}
-
 static const struct xattr_handler *shmem_xattr_handlers[] = {
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	&generic_acl_access_handler,
@@ -2226,6 +2114,7 @@
 static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
 			      void *buffer, size_t size)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2240,12 +2129,13 @@
 	if (err)
 		return err;
 
-	return shmem_xattr_get(dentry, name, buffer, size);
+	return simple_xattr_get(&info->xattrs, name, buffer, size);
 }
 
 static int shmem_setxattr(struct dentry *dentry, const char *name,
 			  const void *value, size_t size, int flags)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2260,15 +2150,12 @@
 	if (err)
 		return err;
 
-	if (size == 0)
-		value = "";  /* empty EA, do not remove */
-
-	return shmem_xattr_set(dentry->d_inode, name, value, size, flags);
-
+	return simple_xattr_set(&info->xattrs, name, value, size, flags);
 }
 
 static int shmem_removexattr(struct dentry *dentry, const char *name)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2283,45 +2170,13 @@
 	if (err)
 		return err;
 
-	return shmem_xattr_set(dentry->d_inode, name, NULL, 0, XATTR_REPLACE);
-}
-
-static bool xattr_is_trusted(const char *name)
-{
-	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+	return simple_xattr_remove(&info->xattrs, name);
 }
 
 static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
 {
-	bool trusted = capable(CAP_SYS_ADMIN);
-	struct shmem_xattr *xattr;
-	struct shmem_inode_info *info;
-	size_t used = 0;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		size_t len;
-
-		/* skip "trusted." attributes for unprivileged callers */
-		if (!trusted && xattr_is_trusted(xattr->name))
-			continue;
-
-		len = strlen(xattr->name) + 1;
-		used += len;
-		if (buffer) {
-			if (size < used) {
-				used = -ERANGE;
-				break;
-			}
-			memcpy(buffer, xattr->name, len);
-			buffer += len;
-		}
-	}
-	spin_unlock(&info->lock);
-
-	return used;
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
+	return simple_xattr_list(&info->xattrs, buffer, size);
 }
 #endif /* CONFIG_TMPFS_XATTR */
 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v6 2/4] cgroup: revise how we re-populate root directory
  2012-08-16 17:44 [PATCH v6 0/4] cgroup: add xattr support aris
  2012-08-16 17:44 ` [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs aris
@ 2012-08-16 17:44 ` aris
  2012-08-16 17:44 ` [PATCH v6 3/4] cgroup: add xattr support aris
  2012-08-16 17:44 ` [PATCH v6 4/4] cgroup: rename subsys_bits to subsys_mask aris
  3 siblings, 0 replies; 19+ messages in thread
From: aris @ 2012-08-16 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: cgroups, Li Zefan, Tejun Heo, Hugh Dickins, Hillf Danton,
	Lennart Poettering

[-- Attachment #1: cgroup2.patch --]
[-- Type: text/plain, Size: 5842 bytes --]

From: Li Zefan <lizefan@huawei.com>

When remounting cgroupfs with some subsystems added to it and some
removed, cgroup will remove all the files in root directory and then
re-popluate it.

What I'm doing here is, only remove files which belong to subsystems that
are to be unbinded, and only create files for newly-added subsystems.
The purpose is to have all other files untouched.

This is a preparation for cgroup xattr support.

v6:
- no changes
v5:
- no changes
v4:
- refactored cgroup_clear_directory() to not use cgroup_rm_file()
- instead of going thru the list of files, get the file list using the
  subsystems
- use 'subsys_mask' instead of {added,removed}_bits and made
  cgroup_populate_dir() to match the parameters with cgroup_clear_directory()
v3:
- refresh patches after recent refactoring

Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>

---
 kernel/cgroup.c |   61 ++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 48 insertions(+), 13 deletions(-)

Index: github/kernel/cgroup.c
===================================================================
--- github.orig/kernel/cgroup.c	2012-08-16 11:28:12.683272452 -0400
+++ github/kernel/cgroup.c	2012-08-16 11:28:42.592089681 -0400
@@ -824,7 +824,8 @@
 static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);
 static struct dentry *cgroup_lookup(struct inode *, struct dentry *, unsigned int);
 static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
-static int cgroup_populate_dir(struct cgroup *cgrp);
+static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files,
+			       unsigned long subsys_mask);
 static const struct inode_operations cgroup_dir_inode_operations;
 static const struct file_operations proc_cgroupstats_operations;
 
@@ -963,12 +964,29 @@
 	return -ENOENT;
 }
 
-static void cgroup_clear_directory(struct dentry *dir)
+/**
+ * cgroup_clear_directory - selective removal of base and subsystem files
+ * @dir: directory containing the files
+ * @base_files: true if the base files should be removed
+ * @subsys_mask: mask of the subsystem ids whose files should be removed
+ */
+static void cgroup_clear_directory(struct dentry *dir, bool base_files,
+				   unsigned long subsys_mask)
 {
 	struct cgroup *cgrp = __d_cgrp(dir);
+	struct cgroup_subsys *ss;
 
-	while (!list_empty(&cgrp->files))
-		cgroup_rm_file(cgrp, NULL);
+	for_each_subsys(cgrp->root, ss) {
+		struct cftype_set *set;
+		if (!test_bit(ss->subsys_id, &subsys_mask))
+			continue;
+		list_for_each_entry(set, &ss->cftsets, node)
+			cgroup_rm_file(cgrp, set->cfts);
+	}
+	if (base_files) {
+		while (!list_empty(&cgrp->files))
+			cgroup_rm_file(cgrp, NULL);
+	}
 }
 
 /*
@@ -977,8 +995,9 @@
 static void cgroup_d_remove_dir(struct dentry *dentry)
 {
 	struct dentry *parent;
+	struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
 
-	cgroup_clear_directory(dentry);
+	cgroup_clear_directory(dentry, true, root->subsys_bits);
 
 	parent = dentry->d_parent;
 	spin_lock(&parent->d_lock);
@@ -1339,6 +1358,7 @@
 	struct cgroupfs_root *root = sb->s_fs_info;
 	struct cgroup *cgrp = &root->top_cgroup;
 	struct cgroup_sb_opts opts;
+	unsigned long added_bits, removed_bits;
 
 	mutex_lock(&cgrp->dentry->d_inode->i_mutex);
 	mutex_lock(&cgroup_mutex);
@@ -1354,6 +1374,9 @@
 		pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
 			   task_tgid_nr(current), current->comm);
 
+	added_bits = opts.subsys_bits & ~root->subsys_bits;
+	removed_bits = root->subsys_bits & ~opts.subsys_bits;
+
 	/* Don't allow flags or name to change at remount */
 	if (opts.flags != root->flags ||
 	    (opts.name && strcmp(opts.name, root->name))) {
@@ -1369,8 +1392,9 @@
 	}
 
 	/* clear out any existing files and repopulate subsystem files */
-	cgroup_clear_directory(cgrp->dentry);
-	cgroup_populate_dir(cgrp);
+	cgroup_clear_directory(cgrp->dentry, false, removed_bits);
+	/* re-populate subsystem files */
+	cgroup_populate_dir(cgrp, false, added_bits);
 
 	if (opts.release_agent)
 		strcpy(root->release_agent_path, opts.release_agent);
@@ -1669,7 +1693,7 @@
 		BUG_ON(root->number_of_cgroups != 1);
 
 		cred = override_creds(&init_cred);
-		cgroup_populate_dir(root_cgrp);
+		cgroup_populate_dir(root_cgrp, true, root->subsys_bits);
 		revert_creds(cred);
 		mutex_unlock(&cgroup_root_mutex);
 		mutex_unlock(&cgroup_mutex);
@@ -3843,18 +3867,29 @@
 	{ }	/* terminate */
 };
 
-static int cgroup_populate_dir(struct cgroup *cgrp)
+/**
+ * cgroup_populate_dir - selectively creation of files in a directory
+ * @cgrp: target cgroup
+ * @base_files: true if the base files should be added
+ * @subsys_mask: mask of the subsystem ids whose files should be added
+ */
+static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files,
+			       unsigned long subsys_mask)
 {
 	int err;
 	struct cgroup_subsys *ss;
 
-	err = cgroup_addrm_files(cgrp, NULL, files, true);
-	if (err < 0)
-		return err;
+        if (base_files) {
+		err = cgroup_addrm_files(cgrp, NULL, files, true);
+		if (err < 0)
+			return err;
+	}
 
 	/* process cftsets of each subsystem */
 	for_each_subsys(cgrp->root, ss) {
 		struct cftype_set *set;
+		if (!test_bit(ss->subsys_id, &subsys_mask))
+			continue;
 
 		list_for_each_entry(set, &ss->cftsets, node)
 			cgroup_addrm_files(cgrp, ss, set->cfts, true);
@@ -3988,7 +4023,7 @@
 
 	list_add_tail(&cgrp->allcg_node, &root->allcg_list);
 
-	err = cgroup_populate_dir(cgrp);
+	err = cgroup_populate_dir(cgrp, true, root->subsys_bits);
 	/* If err < 0, we have a half-filled directory - oh well ;) */
 
 	mutex_unlock(&cgroup_mutex);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v6 3/4] cgroup: add xattr support
  2012-08-16 17:44 [PATCH v6 0/4] cgroup: add xattr support aris
  2012-08-16 17:44 ` [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs aris
  2012-08-16 17:44 ` [PATCH v6 2/4] cgroup: revise how we re-populate root directory aris
@ 2012-08-16 17:44 ` aris
  2012-08-16 20:00   ` Tejun Heo
  2012-08-16 17:44 ` [PATCH v6 4/4] cgroup: rename subsys_bits to subsys_mask aris
  3 siblings, 1 reply; 19+ messages in thread
From: aris @ 2012-08-16 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: cgroups, Li Zefan, Tejun Heo, Hugh Dickins, Hillf Danton,
	Lennart Poettering

[-- Attachment #1: cgroup3.patch --]
[-- Type: text/plain, Size: 10265 bytes --]

From: Li Zefan <lizefan@huawei.com>

This is one of the items in the plumber's wish list.

For use cases:

>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart

v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support

Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>

---
 include/linux/cgroup.h |   13 ++++--
 kernel/cgroup.c        |  100 +++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 103 insertions(+), 10 deletions(-)

Index: github/include/linux/cgroup.h
===================================================================
--- github.orig/include/linux/cgroup.h	2012-08-16 10:24:50.000000000 -0400
+++ github/include/linux/cgroup.h	2012-08-16 10:27:53.975223786 -0400
@@ -17,6 +17,7 @@
 #include <linux/rwsem.h>
 #include <linux/idr.h>
 #include <linux/workqueue.h>
+#include <linux/xattr.h>
 
 #ifdef CONFIG_CGROUPS
 
@@ -216,6 +217,9 @@
 	/* List of events which userspace want to receive */
 	struct list_head event_list;
 	spinlock_t event_list_lock;
+
+	/* directory xattrs */
+	struct simple_xattrs xattrs;
 };
 
 /*
@@ -309,6 +313,9 @@
 	/* CFTYPE_* flags */
 	unsigned int flags;
 
+	/* file xattrs */
+	struct simple_xattrs xattrs;
+
 	int (*open)(struct inode *inode, struct file *file);
 	ssize_t (*read)(struct cgroup *cgrp, struct cftype *cft,
 			struct file *file,
@@ -394,7 +401,7 @@
  */
 struct cftype_set {
 	struct list_head		node;	/* chained at subsys->cftsets */
-	const struct cftype		*cfts;
+	struct cftype			*cfts;
 };
 
 struct cgroup_scanner {
@@ -406,8 +413,8 @@
 	void *data;
 };
 
-int cgroup_add_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
-int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
+int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
+int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
 
 int cgroup_is_removed(const struct cgroup *cgrp);
 
Index: github/kernel/cgroup.c
===================================================================
--- github.orig/kernel/cgroup.c	2012-08-16 10:27:45.000000000 -0400
+++ github/kernel/cgroup.c	2012-08-16 11:10:37.470765933 -0400
@@ -276,7 +276,8 @@
 
 /* bits in struct cgroupfs_root flags field */
 enum {
-	ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
+	ROOT_NOPREFIX,	/* mounted subsystems have no named prefix */
+	ROOT_XATTR,	/* supports extended attributes */
 };
 
 static int cgroup_is_releasable(const struct cgroup *cgrp)
@@ -913,15 +914,19 @@
 		 */
 		BUG_ON(!list_empty(&cgrp->pidlists));
 
+		simple_xattrs_free(&cgrp->xattrs);
+
 		kfree_rcu(cgrp, rcu_head);
 	} else {
 		struct cfent *cfe = __d_cfe(dentry);
 		struct cgroup *cgrp = dentry->d_parent->d_fsdata;
+		struct cftype *cft = cfe->type;
 
 		WARN_ONCE(!list_empty(&cfe->node) &&
 			  cgrp != &cgrp->root->top_cgroup,
 			  "cfe still linked for %s\n", cfe->type->name);
 		kfree(cfe);
+		simple_xattrs_free(&cft->xattrs);
 	}
 	iput(inode);
 }
@@ -1140,6 +1145,8 @@
 		seq_printf(seq, ",%s", ss->name);
 	if (test_bit(ROOT_NOPREFIX, &root->flags))
 		seq_puts(seq, ",noprefix");
+	if (test_bit(ROOT_XATTR, &root->flags))
+		seq_puts(seq, ",xattr");
 	if (strlen(root->release_agent_path))
 		seq_printf(seq, ",release_agent=%s", root->release_agent_path);
 	if (clone_children(&root->top_cgroup))
@@ -1208,6 +1215,10 @@
 			opts->clone_children = true;
 			continue;
 		}
+		if (!strcmp(token, "xattr")) {
+			set_bit(ROOT_XATTR, &opts->flags);
+			continue;
+		}
 		if (!strncmp(token, "release_agent=", 14)) {
 			/* Specifying two release agents is forbidden */
 			if (opts->release_agent)
@@ -1425,6 +1436,7 @@
 	mutex_init(&cgrp->pidlist_mutex);
 	INIT_LIST_HEAD(&cgrp->event_list);
 	spin_lock_init(&cgrp->event_list_lock);
+	simple_xattrs_init(&cgrp->xattrs);
 }
 
 static void init_cgroup_root(struct cgroupfs_root *root)
@@ -1769,6 +1781,8 @@
 	mutex_unlock(&cgroup_root_mutex);
 	mutex_unlock(&cgroup_mutex);
 
+	simple_xattrs_free(&cgrp->xattrs);
+
 	kill_litter_super(sb);
 	cgroup_drop_root(root);
 }
@@ -2575,6 +2589,64 @@
 	return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
 }
 
+static struct simple_xattrs *__d_xattrs(struct dentry *dentry)
+{
+	if (S_ISDIR(dentry->d_inode->i_mode))
+		return &__d_cgrp(dentry)->xattrs;
+	else
+		return &__d_cft(dentry)->xattrs;
+}
+
+static inline int xattr_enabled(struct dentry *dentry)
+{
+	struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
+	return test_bit(ROOT_XATTR, &root->flags);
+}
+
+static bool is_valid_xattr(const char *name)
+{
+	if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN) ||
+	    !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN))
+		return true;
+	return false;
+}
+
+static int cgroup_setxattr(struct dentry *dentry, const char *name,
+			   const void *val, size_t size, int flags)
+{
+	if (!xattr_enabled(dentry))
+		return -EOPNOTSUPP;
+	if (!is_valid_xattr(name))
+		return -EINVAL;
+	return simple_xattr_set(__d_xattrs(dentry), name, val, size, flags);
+}
+
+static int cgroup_removexattr(struct dentry *dentry, const char *name)
+{
+	if (!xattr_enabled(dentry))
+		return -EOPNOTSUPP;
+	if (!is_valid_xattr(name))
+		return -EINVAL;
+	return simple_xattr_remove(__d_xattrs(dentry), name);
+}
+
+static ssize_t cgroup_getxattr(struct dentry *dentry, const char *name,
+			       void *buf, size_t size)
+{
+	if (!xattr_enabled(dentry))
+		return -EOPNOTSUPP;
+	if (!is_valid_xattr(name))
+		return -EINVAL;
+	return simple_xattr_get(__d_xattrs(dentry), name, buf, size);
+}
+
+static ssize_t cgroup_listxattr(struct dentry *dentry, char *buf, size_t size)
+{
+	if (!xattr_enabled(dentry))
+		return -EOPNOTSUPP;
+	return simple_xattr_list(__d_xattrs(dentry), buf, size);
+}
+
 static const struct file_operations cgroup_file_operations = {
 	.read = cgroup_file_read,
 	.write = cgroup_file_write,
@@ -2583,11 +2655,22 @@
 	.release = cgroup_file_release,
 };
 
+static const struct inode_operations cgroup_file_inode_operations = {
+	.setxattr = cgroup_setxattr,
+	.getxattr = cgroup_getxattr,
+	.listxattr = cgroup_listxattr,
+	.removexattr = cgroup_removexattr,
+};
+
 static const struct inode_operations cgroup_dir_inode_operations = {
 	.lookup = cgroup_lookup,
 	.mkdir = cgroup_mkdir,
 	.rmdir = cgroup_rmdir,
 	.rename = cgroup_rename,
+	.setxattr = cgroup_setxattr,
+	.getxattr = cgroup_getxattr,
+	.listxattr = cgroup_listxattr,
+	.removexattr = cgroup_removexattr,
 };
 
 static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
@@ -2635,6 +2718,7 @@
 	} else if (S_ISREG(mode)) {
 		inode->i_size = 0;
 		inode->i_fop = &cgroup_file_operations;
+		inode->i_op = &cgroup_file_inode_operations;
 	}
 	d_instantiate(dentry, inode);
 	dget(dentry);	/* Extra count - pin the dentry in core */
@@ -2695,7 +2779,7 @@
 }
 
 static int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
-			   const struct cftype *cft)
+			   struct cftype *cft)
 {
 	struct dentry *dir = cgrp->dentry;
 	struct cgroup *parent = __d_cgrp(dir);
@@ -2705,6 +2789,8 @@
 	umode_t mode;
 	char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
 
+	simple_xattrs_init(&cft->xattrs);
+
 	/* does @cft->flags tell us to skip creation on @cgrp? */
 	if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
 		return 0;
@@ -2745,9 +2831,9 @@
 }
 
 static int cgroup_addrm_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
-			      const struct cftype cfts[], bool is_add)
+			      struct cftype cfts[], bool is_add)
 {
-	const struct cftype *cft;
+	struct cftype *cft;
 	int err, ret = 0;
 
 	for (cft = cfts; cft->name[0] != '\0'; cft++) {
@@ -2781,7 +2867,7 @@
 }
 
 static void cgroup_cfts_commit(struct cgroup_subsys *ss,
-			       const struct cftype *cfts, bool is_add)
+			       struct cftype *cfts, bool is_add)
 	__releases(&cgroup_mutex) __releases(&cgroup_cft_mutex)
 {
 	LIST_HEAD(pending);
@@ -2832,7 +2918,7 @@
  * function currently returns 0 as long as @cfts registration is successful
  * even if some file creation attempts on existing cgroups fail.
  */
-int cgroup_add_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts)
+int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
 {
 	struct cftype_set *set;
 
@@ -2862,7 +2948,7 @@
  * Returns 0 on successful unregistration, -ENOENT if @cfts is not
  * registered with @ss.
  */
-int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts)
+int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
 {
 	struct cftype_set *set;
 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v6 4/4] cgroup: rename subsys_bits to subsys_mask
  2012-08-16 17:44 [PATCH v6 0/4] cgroup: add xattr support aris
                   ` (2 preceding siblings ...)
  2012-08-16 17:44 ` [PATCH v6 3/4] cgroup: add xattr support aris
@ 2012-08-16 17:44 ` aris
  3 siblings, 0 replies; 19+ messages in thread
From: aris @ 2012-08-16 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: cgroups, Li Zefan, Tejun Heo, Hugh Dickins, Hillf Danton,
	Lennart Poettering

[-- Attachment #1: cgroup-renaming.patch --]
[-- Type: text/plain, Size: 10214 bytes --]

In a previous discussion, Tejun Heo suggested to rename references to
subsys_bits (added_bits, removed_bits, etc) by something more meaningful.

Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com

---
 kernel/cgroup.c |   84 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 42 insertions(+), 42 deletions(-)

Index: github/kernel/cgroup.c
===================================================================
--- github.orig/kernel/cgroup.c	2012-08-16 11:33:43.276337899 -0400
+++ github/kernel/cgroup.c	2012-08-16 11:33:43.292338339 -0400
@@ -111,13 +111,13 @@
 	 * The bitmask of subsystems intended to be attached to this
 	 * hierarchy
 	 */
-	unsigned long subsys_bits;
+	unsigned long subsys_mask;
 
 	/* Unique id for this hierarchy. */
 	int hierarchy_id;
 
 	/* The bitmask of subsystems currently attached to this hierarchy */
-	unsigned long actual_subsys_bits;
+	unsigned long actual_subsys_mask;
 
 	/* A list running through the attached subsystems */
 	struct list_head subsys_list;
@@ -557,7 +557,7 @@
 	 * won't change, so no need for locking.
 	 */
 	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
-		if (root->subsys_bits & (1UL << i)) {
+		if (root->subsys_mask & (1UL << i)) {
 			/* Subsystem is in this hierarchy. So we want
 			 * the subsystem state from the new
 			 * cgroup */
@@ -1002,7 +1002,7 @@
 	struct dentry *parent;
 	struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
 
-	cgroup_clear_directory(dentry, true, root->subsys_bits);
+	cgroup_clear_directory(dentry, true, root->subsys_mask);
 
 	parent = dentry->d_parent;
 	spin_lock(&parent->d_lock);
@@ -1046,22 +1046,22 @@
  * returns an error, no reference counts are touched.
  */
 static int rebind_subsystems(struct cgroupfs_root *root,
-			      unsigned long final_bits)
+			      unsigned long final_subsys_mask)
 {
-	unsigned long added_bits, removed_bits;
+	unsigned long added_mask, removed_mask;
 	struct cgroup *cgrp = &root->top_cgroup;
 	int i;
 
 	BUG_ON(!mutex_is_locked(&cgroup_mutex));
 	BUG_ON(!mutex_is_locked(&cgroup_root_mutex));
 
-	removed_bits = root->actual_subsys_bits & ~final_bits;
-	added_bits = final_bits & ~root->actual_subsys_bits;
+	removed_mask = root->actual_subsys_mask & ~final_subsys_mask;
+	added_mask = final_subsys_mask & ~root->actual_subsys_mask;
 	/* Check that any added subsystems are currently free */
 	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		unsigned long bit = 1UL << i;
 		struct cgroup_subsys *ss = subsys[i];
-		if (!(bit & added_bits))
+		if (!(bit & added_mask))
 			continue;
 		/*
 		 * Nobody should tell us to do a subsys that doesn't exist:
@@ -1086,7 +1086,7 @@
 	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		struct cgroup_subsys *ss = subsys[i];
 		unsigned long bit = 1UL << i;
-		if (bit & added_bits) {
+		if (bit & added_mask) {
 			/* We're binding this subsystem to this hierarchy */
 			BUG_ON(ss == NULL);
 			BUG_ON(cgrp->subsys[i]);
@@ -1099,7 +1099,7 @@
 			if (ss->bind)
 				ss->bind(cgrp);
 			/* refcount was already taken, and we're keeping it */
-		} else if (bit & removed_bits) {
+		} else if (bit & removed_mask) {
 			/* We're removing this subsystem */
 			BUG_ON(ss == NULL);
 			BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
@@ -1112,7 +1112,7 @@
 			list_move(&ss->sibling, &rootnode.subsys_list);
 			/* subsystem is now free - drop reference on module */
 			module_put(ss->module);
-		} else if (bit & final_bits) {
+		} else if (bit & final_subsys_mask) {
 			/* Subsystem state should already exist */
 			BUG_ON(ss == NULL);
 			BUG_ON(!cgrp->subsys[i]);
@@ -1129,7 +1129,7 @@
 			BUG_ON(cgrp->subsys[i]);
 		}
 	}
-	root->subsys_bits = root->actual_subsys_bits = final_bits;
+	root->subsys_mask = root->actual_subsys_mask = final_subsys_mask;
 	synchronize_rcu();
 
 	return 0;
@@ -1158,7 +1158,7 @@
 }
 
 struct cgroup_sb_opts {
-	unsigned long subsys_bits;
+	unsigned long subsys_mask;
 	unsigned long flags;
 	char *release_agent;
 	bool clone_children;
@@ -1267,7 +1267,7 @@
 			/* Mutually exclusive option 'all' + subsystem name */
 			if (all_ss)
 				return -EINVAL;
-			set_bit(i, &opts->subsys_bits);
+			set_bit(i, &opts->subsys_mask);
 			one_ss = true;
 
 			break;
@@ -1288,7 +1288,7 @@
 				continue;
 			if (ss->disabled)
 				continue;
-			set_bit(i, &opts->subsys_bits);
+			set_bit(i, &opts->subsys_mask);
 		}
 	}
 
@@ -1300,19 +1300,19 @@
 	 * the cpuset subsystem.
 	 */
 	if (test_bit(ROOT_NOPREFIX, &opts->flags) &&
-	    (opts->subsys_bits & mask))
+	    (opts->subsys_mask & mask))
 		return -EINVAL;
 
 
 	/* Can't specify "none" and some subsystems */
-	if (opts->subsys_bits && opts->none)
+	if (opts->subsys_mask && opts->none)
 		return -EINVAL;
 
 	/*
 	 * We either have to specify by name or by subsystems. (So all
 	 * empty hierarchies must have a name).
 	 */
-	if (!opts->subsys_bits && !opts->name)
+	if (!opts->subsys_mask && !opts->name)
 		return -EINVAL;
 
 	/*
@@ -1324,7 +1324,7 @@
 	for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
 		unsigned long bit = 1UL << i;
 
-		if (!(bit & opts->subsys_bits))
+		if (!(bit & opts->subsys_mask))
 			continue;
 		if (!try_module_get(subsys[i]->module)) {
 			module_pin_failed = true;
@@ -1341,7 +1341,7 @@
 			/* drop refcounts only on the ones we took */
 			unsigned long bit = 1UL << i;
 
-			if (!(bit & opts->subsys_bits))
+			if (!(bit & opts->subsys_mask))
 				continue;
 			module_put(subsys[i]->module);
 		}
@@ -1351,13 +1351,13 @@
 	return 0;
 }
 
-static void drop_parsed_module_refcounts(unsigned long subsys_bits)
+static void drop_parsed_module_refcounts(unsigned long subsys_mask)
 {
 	int i;
 	for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
 		unsigned long bit = 1UL << i;
 
-		if (!(bit & subsys_bits))
+		if (!(bit & subsys_mask))
 			continue;
 		module_put(subsys[i]->module);
 	}
@@ -1369,7 +1369,7 @@
 	struct cgroupfs_root *root = sb->s_fs_info;
 	struct cgroup *cgrp = &root->top_cgroup;
 	struct cgroup_sb_opts opts;
-	unsigned long added_bits, removed_bits;
+	unsigned long added_mask, removed_mask;
 
 	mutex_lock(&cgrp->dentry->d_inode->i_mutex);
 	mutex_lock(&cgroup_mutex);
@@ -1381,31 +1381,31 @@
 		goto out_unlock;
 
 	/* See feature-removal-schedule.txt */
-	if (opts.subsys_bits != root->actual_subsys_bits || opts.release_agent)
+	if (opts.subsys_mask != root->actual_subsys_mask || opts.release_agent)
 		pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
 			   task_tgid_nr(current), current->comm);
 
-	added_bits = opts.subsys_bits & ~root->subsys_bits;
-	removed_bits = root->subsys_bits & ~opts.subsys_bits;
+	added_mask = opts.subsys_mask & ~root->subsys_mask;
+	removed_mask = root->subsys_mask & ~opts.subsys_mask;
 
 	/* Don't allow flags or name to change at remount */
 	if (opts.flags != root->flags ||
 	    (opts.name && strcmp(opts.name, root->name))) {
 		ret = -EINVAL;
-		drop_parsed_module_refcounts(opts.subsys_bits);
+		drop_parsed_module_refcounts(opts.subsys_mask);
 		goto out_unlock;
 	}
 
-	ret = rebind_subsystems(root, opts.subsys_bits);
+	ret = rebind_subsystems(root, opts.subsys_mask);
 	if (ret) {
-		drop_parsed_module_refcounts(opts.subsys_bits);
+		drop_parsed_module_refcounts(opts.subsys_mask);
 		goto out_unlock;
 	}
 
 	/* clear out any existing files and repopulate subsystem files */
-	cgroup_clear_directory(cgrp->dentry, false, removed_bits);
+	cgroup_clear_directory(cgrp->dentry, false, removed_mask);
 	/* re-populate subsystem files */
-	cgroup_populate_dir(cgrp, false, added_bits);
+	cgroup_populate_dir(cgrp, false, added_mask);
 
 	if (opts.release_agent)
 		strcpy(root->release_agent_path, opts.release_agent);
@@ -1491,8 +1491,8 @@
 	 * If we asked for subsystems (or explicitly for no
 	 * subsystems) then they must match
 	 */
-	if ((opts->subsys_bits || opts->none)
-	    && (opts->subsys_bits != root->subsys_bits))
+	if ((opts->subsys_mask || opts->none)
+	    && (opts->subsys_mask != root->subsys_mask))
 		return 0;
 
 	return 1;
@@ -1502,7 +1502,7 @@
 {
 	struct cgroupfs_root *root;
 
-	if (!opts->subsys_bits && !opts->none)
+	if (!opts->subsys_mask && !opts->none)
 		return NULL;
 
 	root = kzalloc(sizeof(*root), GFP_KERNEL);
@@ -1515,7 +1515,7 @@
 	}
 	init_cgroup_root(root);
 
-	root->subsys_bits = opts->subsys_bits;
+	root->subsys_mask = opts->subsys_mask;
 	root->flags = opts->flags;
 	if (opts->release_agent)
 		strcpy(root->release_agent_path, opts->release_agent);
@@ -1547,7 +1547,7 @@
 	if (!opts->new_root)
 		return -EINVAL;
 
-	BUG_ON(!opts->subsys_bits && !opts->none);
+	BUG_ON(!opts->subsys_mask && !opts->none);
 
 	ret = set_anon_super(sb, NULL);
 	if (ret)
@@ -1665,7 +1665,7 @@
 		if (ret)
 			goto unlock_drop;
 
-		ret = rebind_subsystems(root, root->subsys_bits);
+		ret = rebind_subsystems(root, root->subsys_mask);
 		if (ret == -EBUSY) {
 			free_cg_links(&tmp_cg_links);
 			goto unlock_drop;
@@ -1705,7 +1705,7 @@
 		BUG_ON(root->number_of_cgroups != 1);
 
 		cred = override_creds(&init_cred);
-		cgroup_populate_dir(root_cgrp, true, root->subsys_bits);
+		cgroup_populate_dir(root_cgrp, true, root->subsys_mask);
 		revert_creds(cred);
 		mutex_unlock(&cgroup_root_mutex);
 		mutex_unlock(&cgroup_mutex);
@@ -1717,7 +1717,7 @@
 		 */
 		cgroup_drop_root(opts.new_root);
 		/* no subsys rebinding, so refcounts don't change */
-		drop_parsed_module_refcounts(opts.subsys_bits);
+		drop_parsed_module_refcounts(opts.subsys_mask);
 	}
 
 	kfree(opts.release_agent);
@@ -1731,7 +1731,7 @@
  drop_new_super:
 	deactivate_locked_super(sb);
  drop_modules:
-	drop_parsed_module_refcounts(opts.subsys_bits);
+	drop_parsed_module_refcounts(opts.subsys_mask);
  out_err:
 	kfree(opts.release_agent);
 	kfree(opts.name);
@@ -4109,7 +4109,7 @@
 
 	list_add_tail(&cgrp->allcg_node, &root->allcg_list);
 
-	err = cgroup_populate_dir(cgrp, true, root->subsys_bits);
+	err = cgroup_populate_dir(cgrp, true, root->subsys_mask);
 	/* If err < 0, we have a half-filled directory - oh well ;) */
 
 	mutex_unlock(&cgroup_mutex);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs
  2012-08-16 17:44 ` [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs aris
@ 2012-08-16 19:58   ` Tejun Heo
  2012-08-20  7:10     ` Hugh Dickins
  0 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2012-08-16 19:58 UTC (permalink / raw)
  To: aris
  Cc: linux-kernel, cgroups, Li Zefan, Hugh Dickins, Hillf Danton,
	Lennart Poettering

On Thu, Aug 16, 2012 at 01:44:54PM -0400, aris@redhat.com wrote:
> From: Li Zefan <lizefan@huawei.com>
> 
> Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.
> 
> $ size vmlinux.o
>    text    data     bss     dec     hex filename
> 4658782  880729 5195032 10734543         a3cbcf vmlinux.o
> $ size vmlinux.o
>    text    data     bss     dec     hex filename
> 4658957  880729 5195032 10734718         a3cc7e vmlinux.o
> 
> v6:
> - no changes
> v5:
> - no changes
> v4:
> - move simple_xattrs_free() to fs/xattr.c
> v3:
> - in kmem_xattrs_free(), reinitialize the list
> - use simple_xattr_* prefix
> - introduce simple_xattr_add() to prevent direct list usage
> 
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>

Hugh, can you please review and ack this one?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] cgroup: add xattr support
  2012-08-16 17:44 ` [PATCH v6 3/4] cgroup: add xattr support aris
@ 2012-08-16 20:00   ` Tejun Heo
  2012-08-21 21:43     ` Lennart Poettering
  0 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2012-08-16 20:00 UTC (permalink / raw)
  To: aris
  Cc: linux-kernel, cgroups, Li Zefan, Hugh Dickins, Hillf Danton,
	Lennart Poettering

On Thu, Aug 16, 2012 at 01:44:56PM -0400, aris@redhat.com wrote:
> From: Li Zefan <lizefan@huawei.com>
> 
> This is one of the items in the plumber's wish list.
> 
> For use cases:
> 
> >> What would the use case be for this?
> >
> > Attaching meta information to services, in an easily discoverable
> > way. For example, in systemd we create one cgroup for each service, and
> > could then store data like the main pid of the specific service as an
> > xattr on the cgroup itself. That way we'd have almost all service state
> > in the cgroupfs, which would make it possible to terminate systemd and
> > later restart it without losing any state information. But there's more:
> > for example, some very peculiar services cannot be terminated on
> > shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> > services in question could just mark that on their cgroup, by setting an
> > xattr. On the more desktopy side of things there are other
> > possibilities: for example there are plans defining what an application
> > is along the lines of a cgroup (i.e. an app being a collection of
> > processes). With xattrs one could then attach an icon or human readable
> > program name on the cgroup.
> >
> > The key idea is that this would allow attaching runtime meta information
> > to cgroups and everything they model (services, apps, vms), that doesn't
> > need any complex userspace infrastructure, has good access control
> > (i.e. because the file system enforces that anyway, and there's the
> > "trusted." xattr namespace), notifications (inotify), and can easily be
> > shared among applications.
> >
> > Lennart
> 
> v6:
> - remove user xattr namespace, only allow trusted and security
> v5:
> - check for capabilities before setting/removing xattrs
> v4:
> - no changes
> v3:
> - instead of config option, use mount option to enable xattr support
> 
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Hillf Danton <dhillf@gmail.com>
> Cc: Lennart Poettering <lpoetter@redhat.com>
> Signed-off-by: Li Zefan <lizefan@huawei.com>
> Signed-off-by: Aristeu Rozanski <aris@redhat.com>

I'm not against this but unsure whether using kmem is enough for the
suggested use case.  Lennart, would this suit systemd?  How much
metadata are we talking about?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs
  2012-08-16 19:58   ` Tejun Heo
@ 2012-08-20  7:10     ` Hugh Dickins
  2012-08-20 19:00       ` Aristeu Rozanski
  0 siblings, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2012-08-20  7:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: aris, linux-kernel, cgroups, Li Zefan, Hillf Danton, Lennart Poettering

On Thu, 16 Aug 2012, Tejun Heo wrote:
> On Thu, Aug 16, 2012 at 01:44:54PM -0400, aris@redhat.com wrote:
> > From: Li Zefan <lizefan@huawei.com>
> > 
> > Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.
> > 
> > $ size vmlinux.o
> >    text    data     bss     dec     hex filename
> > 4658782  880729 5195032 10734543         a3cbcf vmlinux.o
> > $ size vmlinux.o
> >    text    data     bss     dec     hex filename
> > 4658957  880729 5195032 10734718         a3cc7e vmlinux.o
> > 
> > v6:
> > - no changes
> > v5:
> > - no changes
> > v4:
> > - move simple_xattrs_free() to fs/xattr.c
> > v3:
> > - in kmem_xattrs_free(), reinitialize the list
> > - use simple_xattr_* prefix
> > - introduce simple_xattr_add() to prevent direct list usage
> > 
> > Cc: Li Zefan <lizefan@huawei.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Hugh Dickins <hughd@google.com>
> 
> Hugh, can you please review and ack this one?

Yes, it looks nice to me.  I might have preferred more as inlines in
the header file to lower the slight init/evict overhead, and I don't
see why __simple_xattr_set() isn't using simple_xattr_alloc() in the
same way that shmem_xattr_set() used shmem_xattr_alloc().  But none
of that matters:

Acked-by: Hugh Dickins <hughd@google.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs
  2012-08-20  7:10     ` Hugh Dickins
@ 2012-08-20 19:00       ` Aristeu Rozanski
  2012-08-21  4:47         ` Hugh Dickins
  0 siblings, 1 reply; 19+ messages in thread
From: Aristeu Rozanski @ 2012-08-20 19:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Tejun Heo, linux-kernel, cgroups, Li Zefan, Hillf Danton,
	Lennart Poettering

On Mon, Aug 20, 2012 at 12:10:09AM -0700, Hugh Dickins wrote:
> Yes, it looks nice to me.  I might have preferred more as inlines in
> the header file to lower the slight init/evict overhead, and I don't
> see why __simple_xattr_set() isn't using simple_xattr_alloc() in the
> same way that shmem_xattr_set() used shmem_xattr_alloc().  But none
> of that matters:
> 
> Acked-by: Hugh Dickins <hughd@google.com>

I can submit additional patches to fix these. What functions you want
inlined?

On why __simple_xattr_set() is not using simple_xattr_alloc(), there's
no reason to be that way, I missed it.

Thanks for reviewing!

-- 
Aristeu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs
  2012-08-20 19:00       ` Aristeu Rozanski
@ 2012-08-21  4:47         ` Hugh Dickins
  2012-08-22 20:07           ` Aristeu Rozanski
  0 siblings, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2012-08-21  4:47 UTC (permalink / raw)
  To: Aristeu Rozanski
  Cc: Tejun Heo, linux-kernel, cgroups, Li Zefan, Hillf Danton,
	Lennart Poettering

On Mon, 20 Aug 2012, Aristeu Rozanski wrote:
> On Mon, Aug 20, 2012 at 12:10:09AM -0700, Hugh Dickins wrote:
> > Yes, it looks nice to me.  I might have preferred more as inlines in
> > the header file to lower the slight init/evict overhead, and I don't
> > see why __simple_xattr_set() isn't using simple_xattr_alloc() in the
> > same way that shmem_xattr_set() used shmem_xattr_alloc().  But none
> > of that matters:
> > 
> > Acked-by: Hugh Dickins <hughd@google.com>
> 
> I can submit additional patches to fix these. What functions you want
> inlined?

Oh, thank you.  I was thinking that it's uncommon for tmpfs files to
have xattrs (and the same probably true of other filesystems), so it's
best to minimize xattrs impact on shared paths.  If simple_xattrs_init()
and simple_xattrs_free() can be static inline functions in linux/xattr.h,
that would be nice.

Probably more important would be to remove spin_lock() and spin_unlock()
(and INIT_LIST_HEAD) from simple_xattrs_free() - those are unnecessary
in shmem_evict_inode(), and wouldn't they be unnecessary whenever
simple_xattrs_free() gets called?

Hugh

> 
> On why __simple_xattr_set() is not using simple_xattr_alloc(), there's
> no reason to be that way, I missed it.
> 
> Thanks for reviewing!
> 
> -- 
> Aristeu
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] cgroup: add xattr support
  2012-08-16 20:00   ` Tejun Heo
@ 2012-08-21 21:43     ` Lennart Poettering
  2012-08-21 21:48       ` Tejun Heo
  2012-08-24  0:02       ` Eric W. Biederman
  0 siblings, 2 replies; 19+ messages in thread
From: Lennart Poettering @ 2012-08-21 21:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: aris, linux-kernel, cgroups, Li Zefan, Hugh Dickins, Hillf Danton

Heya,

(sorry for the late reply)

On 16.08.2012 22:00, Tejun Heo wrote:
> On Thu, Aug 16, 2012 at 01:44:56PM -0400, aris@redhat.com wrote:

>>> Attaching meta information to services, in an easily discoverable
>>> way. For example, in systemd we create one cgroup for each service, and
>>> could then store data like the main pid of the specific service as an
>>> xattr on the cgroup itself. That way we'd have almost all service state
>>> in the cgroupfs, which would make it possible to terminate systemd and
>>> later restart it without losing any state information. But there's more:
>>> for example, some very peculiar services cannot be terminated on
>>> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
>>> services in question could just mark that on their cgroup, by setting an
>>> xattr. On the more desktopy side of things there are other
>>> possibilities: for example there are plans defining what an application
>>> is along the lines of a cgroup (i.e. an app being a collection of
>>> processes). With xattrs one could then attach an icon or human readable
>>> program name on the cgroup.
>>>
>>> The key idea is that this would allow attaching runtime meta information
>>> to cgroups and everything they model (services, apps, vms), that doesn't
>>> need any complex userspace infrastructure, has good access control
>>> (i.e. because the file system enforces that anyway, and there's the
>>> "trusted." xattr namespace), notifications (inotify), and can easily be
>>> shared among applications.

>
> I'm not against this but unsure whether using kmem is enough for the
> suggested use case.  Lennart, would this suit systemd?  How much
> metadata are we talking about?

Just small things, like values, PIDs, i.e. a few 100 bytes or so per 
cgroup should be more than sufficient for our needs.

Lennart

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] cgroup: add xattr support
  2012-08-21 21:43     ` Lennart Poettering
@ 2012-08-21 21:48       ` Tejun Heo
  2012-08-21 23:29         ` Hugh Dickins
  2012-08-24  0:02       ` Eric W. Biederman
  1 sibling, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2012-08-21 21:48 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: aris, linux-kernel, cgroups, Li Zefan, Hugh Dickins, Hillf Danton

Hello,

On Tue, Aug 21, 2012 at 11:43:44PM +0200, Lennart Poettering wrote:
> >I'm not against this but unsure whether using kmem is enough for the
> >suggested use case.  Lennart, would this suit systemd?  How much
> >metadata are we talking about?
> 
> Just small things, like values, PIDs, i.e. a few 100 bytes or so per
> cgroup should be more than sufficient for our needs.

Alright, then.  I think there's gonna be one more round to address
Hugh's comments.  Hugh, how should this be routed?  Is there some git
branch that tmpfs changes can go in so that cgroup tree can pull?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] cgroup: add xattr support
  2012-08-21 21:48       ` Tejun Heo
@ 2012-08-21 23:29         ` Hugh Dickins
  2012-08-23 19:44           ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2012-08-21 23:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Lennart Poettering, aris, linux-kernel, cgroups,
	Li Zefan, Hillf Danton

On Tue, 21 Aug 2012, Tejun Heo wrote:
> On Tue, Aug 21, 2012 at 11:43:44PM +0200, Lennart Poettering wrote:
> > >I'm not against this but unsure whether using kmem is enough for the
> > >suggested use case.  Lennart, would this suit systemd?  How much
> > >metadata are we talking about?
> > 
> > Just small things, like values, PIDs, i.e. a few 100 bytes or so per
> > cgroup should be more than sufficient for our needs.

That is reasonable.

> 
> Alright, then.  I think there's gonna be one more round to address
> Hugh's comments.  Hugh, how should this be routed?  Is there some git
> branch that tmpfs changes can go in so that cgroup tree can pull?

No git tree, but we can easily handle it in one of two ways.

include/linux/shmem_fs.h and mm/shmem.c usually go to Linus from Andrew
from his mmotm tree (which includes and is included in linux-next,
by some magic escaping infinite recursion).

Are we expecting Aristeu+Zefan's simple_xattr patches to go into 3.7?
I don't have anything planned for shmem.c for 3.7 beyond a bugfix,
which shouldn't interact with the simple_xattr changes at all
(I could remove info->lock, but will not do so this time around,
precisely so as not to interfere with those patches).

So it should be perfectly workable for you to take Aristeu+Zefan's
shmem patches into your cgroup tree, then any further mods from
mmotm will get layered on top.

But if you prefer to leave shmem.c changes to Andrew, then it would
also be perfectly workable for Aristeu to split the 1/4 into two:
one for you which updates fs/xattr.c and include/linux/xattr.h with
simple_xattr code stolen from mm/shmem.c and include/linux/shmem_fs.h;
and one for Andrew which updates mm/shmem.c and include/linux/shmem_fs.h
to delete its shmem_xattr stuff and use simple_xattr interfaces instead.

Either approach is fine with me.

Hugh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs
  2012-08-21  4:47         ` Hugh Dickins
@ 2012-08-22 20:07           ` Aristeu Rozanski
  2012-08-22 20:25             ` Hugh Dickins
  0 siblings, 1 reply; 19+ messages in thread
From: Aristeu Rozanski @ 2012-08-22 20:07 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Tejun Heo, linux-kernel, cgroups, Li Zefan, Hillf Danton,
	Lennart Poettering

On Mon, Aug 20, 2012 at 09:47:15PM -0700, Hugh Dickins wrote:
> On Mon, 20 Aug 2012, Aristeu Rozanski wrote:
> > On Mon, Aug 20, 2012 at 12:10:09AM -0700, Hugh Dickins wrote:
> > > Yes, it looks nice to me.  I might have preferred more as inlines in
> > > the header file to lower the slight init/evict overhead, and I don't
> > > see why __simple_xattr_set() isn't using simple_xattr_alloc() in the
> > > same way that shmem_xattr_set() used shmem_xattr_alloc().  But none
> > > of that matters:
> > > 
> > > Acked-by: Hugh Dickins <hughd@google.com>
> > 
> > I can submit additional patches to fix these. What functions you want
> > inlined?
> 
> Oh, thank you.  I was thinking that it's uncommon for tmpfs files to
> have xattrs (and the same probably true of other filesystems), so it's
> best to minimize xattrs impact on shared paths.  If simple_xattrs_init()
> and simple_xattrs_free() can be static inline functions in linux/xattr.h,
> that would be nice.

ok, done. and since Tejun is waiting for those before integrating, I'll
merge them in the original patches and resubmit everything.

> Probably more important would be to remove spin_lock() and spin_unlock()
> (and INIT_LIST_HEAD) from simple_xattrs_free() - those are unnecessary
> in shmem_evict_inode(), and wouldn't they be unnecessary whenever
> simple_xattrs_free() gets called?

Removing INIT_LIST_HEAD() it's possible by actually unlinking each xattr
inside the loop before freeing them. still, it'll have to check if the list is
empty or not, which might end up being the same?

About the locking, I'm not sure, I'm investigating it.

-- 
Aristeu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs
  2012-08-22 20:07           ` Aristeu Rozanski
@ 2012-08-22 20:25             ` Hugh Dickins
  2012-08-22 20:55               ` Aristeu Rozanski
  0 siblings, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2012-08-22 20:25 UTC (permalink / raw)
  To: Aristeu Rozanski
  Cc: Tejun Heo, linux-kernel, cgroups, Li Zefan, Hillf Danton,
	Lennart Poettering

On Wed, 22 Aug 2012, Aristeu Rozanski wrote:
> On Mon, Aug 20, 2012 at 09:47:15PM -0700, Hugh Dickins wrote:
> > On Mon, 20 Aug 2012, Aristeu Rozanski wrote:
> > > On Mon, Aug 20, 2012 at 12:10:09AM -0700, Hugh Dickins wrote:
> > > > Yes, it looks nice to me.  I might have preferred more as inlines in
> > > > the header file to lower the slight init/evict overhead, and I don't
> > > > see why __simple_xattr_set() isn't using simple_xattr_alloc() in the
> > > > same way that shmem_xattr_set() used shmem_xattr_alloc().  But none
> > > > of that matters:
> > > > 
> > > > Acked-by: Hugh Dickins <hughd@google.com>
> > > 
> > > I can submit additional patches to fix these. What functions you want
> > > inlined?
> > 
> > Oh, thank you.  I was thinking that it's uncommon for tmpfs files to
> > have xattrs (and the same probably true of other filesystems), so it's
> > best to minimize xattrs impact on shared paths.  If simple_xattrs_init()
> > and simple_xattrs_free() can be static inline functions in linux/xattr.h,
> > that would be nice.
> 
> ok, done. and since Tejun is waiting for those before integrating, I'll
> merge them in the original patches and resubmit everything.

Thanks.

> 
> > Probably more important would be to remove spin_lock() and spin_unlock()
> > (and INIT_LIST_HEAD) from simple_xattrs_free() - those are unnecessary
> > in shmem_evict_inode(), and wouldn't they be unnecessary whenever
> > simple_xattrs_free() gets called?
> 
> Removing INIT_LIST_HEAD() it's possible by actually unlinking each xattr
> inside the loop before freeing them. still, it'll have to check if the list is
> empty or not, which might end up being the same?
> 
> About the locking, I'm not sure, I'm investigating it.

I think we have a misunderstanding.

INIT_LIST_HEAD() is not expensive, I just meant to remove it because
I thought it unnecessary by that point.

Do you envisage anywhere that would call simple_xattrs_free() except
a filesystem's evict_inode()?

By that point, the inode is on its way out of the system: nothing
much (yes, I am being a bit vague there ;) can get to it any more,
there's no need to reinitialize the list head and there's no need for
locking, because nothing else can be playing with those xattrs now.

If I'm wrong, then the current shmem_evict_inode() is already wrong
not to be locking there.

Would renaming simple_xattrs_free() to simple_xattrs_evict() help?

Hugh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs
  2012-08-22 20:25             ` Hugh Dickins
@ 2012-08-22 20:55               ` Aristeu Rozanski
  0 siblings, 0 replies; 19+ messages in thread
From: Aristeu Rozanski @ 2012-08-22 20:55 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Tejun Heo, linux-kernel, cgroups, Li Zefan, Hillf Danton,
	Lennart Poettering

On Wed, Aug 22, 2012 at 01:25:06PM -0700, Hugh Dickins wrote:
> > > Probably more important would be to remove spin_lock() and spin_unlock()
> > > (and INIT_LIST_HEAD) from simple_xattrs_free() - those are unnecessary
> > > in shmem_evict_inode(), and wouldn't they be unnecessary whenever
> > > simple_xattrs_free() gets called?
> > 
> > Removing INIT_LIST_HEAD() it's possible by actually unlinking each xattr
> > inside the loop before freeing them. still, it'll have to check if the list is
> > empty or not, which might end up being the same?
> > 
> > About the locking, I'm not sure, I'm investigating it.
> 
> I think we have a misunderstanding.
> 
> INIT_LIST_HEAD() is not expensive, I just meant to remove it because
> I thought it unnecessary by that point.

ah, I see.

> Do you envisage anywhere that would call simple_xattrs_free() except
> a filesystem's evict_inode()?

cgroup does it differently and it's called in d_iput() path (see cgroup_diput),
because it needs to selectively remove files upon remount.

> By that point, the inode is on its way out of the system: nothing
> much (yes, I am being a bit vague there ;) can get to it any more,
> there's no need to reinitialize the list head and there's no need for
> locking, because nothing else can be playing with those xattrs now.

I agree with you. That's why I'm looking into it because I'm pretty sure
I removed it at some point in the past and decided to put it back after
investigating the easily reproducible oops. Sadly I managed to forget
the analisys I did at the time.

-- 
Aristeu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] cgroup: add xattr support
  2012-08-21 23:29         ` Hugh Dickins
@ 2012-08-23 19:44           ` Tejun Heo
  2012-08-23 19:58             ` Aristeu Rozanski
  0 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2012-08-23 19:44 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Lennart Poettering, aris, linux-kernel, cgroups,
	Li Zefan, Hillf Danton

Hello, Hugh.

On Tue, Aug 21, 2012 at 04:29:53PM -0700, Hugh Dickins wrote:
> Are we expecting Aristeu+Zefan's simple_xattr patches to go into 3.7?

Yeah, probably.

> I don't have anything planned for shmem.c for 3.7 beyond a bugfix,
> which shouldn't interact with the simple_xattr changes at all
> (I could remove info->lock, but will not do so this time around,
> precisely so as not to interfere with those patches).
> 
> So it should be perfectly workable for you to take Aristeu+Zefan's
> shmem patches into your cgroup tree, then any further mods from
> mmotm will get layered on top.

I think this approach would be simpler.  Once Aristeu posts the
updated version, I'll route the whole series through cgroup/for-3.7.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] cgroup: add xattr support
  2012-08-23 19:44           ` Tejun Heo
@ 2012-08-23 19:58             ` Aristeu Rozanski
  0 siblings, 0 replies; 19+ messages in thread
From: Aristeu Rozanski @ 2012-08-23 19:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Hugh Dickins, Andrew Morton, Lennart Poettering, linux-kernel,
	cgroups, Li Zefan, Hillf Danton

On Thu, Aug 23, 2012 at 12:44:23PM -0700, Tejun Heo wrote:
> Hello, Hugh.
> 
> On Tue, Aug 21, 2012 at 04:29:53PM -0700, Hugh Dickins wrote:
> > Are we expecting Aristeu+Zefan's simple_xattr patches to go into 3.7?
> 
> Yeah, probably.
> 
> > I don't have anything planned for shmem.c for 3.7 beyond a bugfix,
> > which shouldn't interact with the simple_xattr changes at all
> > (I could remove info->lock, but will not do so this time around,
> > precisely so as not to interfere with those patches).
> > 
> > So it should be perfectly workable for you to take Aristeu+Zefan's
> > shmem patches into your cgroup tree, then any further mods from
> > mmotm will get layered on top.
> 
> I think this approach would be simpler.  Once Aristeu posts the
> updated version, I'll route the whole series through cgroup/for-3.7.

I'm about to submit it, just doing last build and testing round. Merged
back the changes Hugh asked in patch 1.

Also found why the list reinitialization was needed in the past; one of
the old iterations had a bug in the remount code and was cleaning the
xattrs out of the cgroup directory but not actually removing the
directory, thus new xattrs got added to a list of freed xattrs.

So I believe we're good to go.

-- 
Aristeu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] cgroup: add xattr support
  2012-08-21 21:43     ` Lennart Poettering
  2012-08-21 21:48       ` Tejun Heo
@ 2012-08-24  0:02       ` Eric W. Biederman
  1 sibling, 0 replies; 19+ messages in thread
From: Eric W. Biederman @ 2012-08-24  0:02 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Tejun Heo, aris, linux-kernel, cgroups, Li Zefan, Hugh Dickins,
	Hillf Danton

Lennart Poettering <lpoetter@redhat.com> writes:

> Heya,
>
> (sorry for the late reply)
>
> On 16.08.2012 22:00, Tejun Heo wrote:
>> On Thu, Aug 16, 2012 at 01:44:56PM -0400, aris@redhat.com wrote:
>
>>>> Attaching meta information to services, in an easily discoverable
>>>> way. For example, in systemd we create one cgroup for each service, and
>>>> could then store data like the main pid of the specific service as an
>>>> xattr on the cgroup itself. That way we'd have almost all service state
>>>> in the cgroupfs, which would make it possible to terminate systemd and
>>>> later restart it without losing any state information. But there's more:
>>>> for example, some very peculiar services cannot be terminated on
>>>> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
>>>> services in question could just mark that on their cgroup, by setting an
>>>> xattr. On the more desktopy side of things there are other
>>>> possibilities: for example there are plans defining what an application
>>>> is along the lines of a cgroup (i.e. an app being a collection of
>>>> processes). With xattrs one could then attach an icon or human readable
>>>> program name on the cgroup.
>>>>
>>>> The key idea is that this would allow attaching runtime meta information
>>>> to cgroups and everything they model (services, apps, vms), that doesn't
>>>> need any complex userspace infrastructure, has good access control
>>>> (i.e. because the file system enforces that anyway, and there's the
>>>> "trusted." xattr namespace), notifications (inotify), and can easily be
>>>> shared among applications.
>
>>
>> I'm not against this but unsure whether using kmem is enough for the
>> suggested use case.  Lennart, would this suit systemd?  How much
>> metadata are we talking about?
>
> Just small things, like values, PIDs, i.e. a few 100 bytes or so per cgroup
> should be more than sufficient for our needs.

I have a really silly question.  Why is storing these things in xattrs
in a cgroup better than simply implementing a file in a cgroup?

It is most definitely going to be a real pain to discover as unix tools
do not support xattrs well.

Furthermore I am having nasty visiions that storing pids is going to
start breaking cgroups the way storing pids already breaks futexes.
Which pid namespace is that pid relative to?

Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-08-24  0:02 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-16 17:44 [PATCH v6 0/4] cgroup: add xattr support aris
2012-08-16 17:44 ` [PATCH v6 1/4] xattr: extract simple_xattr code from tmpfs aris
2012-08-16 19:58   ` Tejun Heo
2012-08-20  7:10     ` Hugh Dickins
2012-08-20 19:00       ` Aristeu Rozanski
2012-08-21  4:47         ` Hugh Dickins
2012-08-22 20:07           ` Aristeu Rozanski
2012-08-22 20:25             ` Hugh Dickins
2012-08-22 20:55               ` Aristeu Rozanski
2012-08-16 17:44 ` [PATCH v6 2/4] cgroup: revise how we re-populate root directory aris
2012-08-16 17:44 ` [PATCH v6 3/4] cgroup: add xattr support aris
2012-08-16 20:00   ` Tejun Heo
2012-08-21 21:43     ` Lennart Poettering
2012-08-21 21:48       ` Tejun Heo
2012-08-21 23:29         ` Hugh Dickins
2012-08-23 19:44           ` Tejun Heo
2012-08-23 19:58             ` Aristeu Rozanski
2012-08-24  0:02       ` Eric W. Biederman
2012-08-16 17:44 ` [PATCH v6 4/4] cgroup: rename subsys_bits to subsys_mask aris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).