[PATCH v3 0/3] cgroup: add xattr support

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/3] cgroup: add xattr support
@ 2012-07-02 14:29 Aristeu Rozanski
  2012-07-02 14:29 ` [PATCH v3 1/3] xattr: extract simple_xattr code from tmpfs Aristeu Rozanski
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-02 14:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Li Zefan, aris, Tejun Heo, Hugh Dickins, Hillf Danton

This series are a refreshed version of a patchset submitted by Li Zefan back
in march:
	https://lkml.org/lkml/2012/3/1/13

With Li's permission, I refreshed the patches to apply over the latest upstream
and added the modifications suggested by others in the thread:
- using a mount option instead of config option to enable the xattr support
- reinitialize the list in kmem_xattrs_free()
- renamed functions to simple_xattr_*()

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>

 fs/xattr.c               |  178 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/cgroup.h   |   16 +++-
 include/linux/shmem_fs.h |    3 
 include/linux/xattr.h    |   24 ++++++
 kernel/cgroup.c          |  167 +++++++++++++++++++++++++++++++++++++-------
 mm/shmem.c               |  153 +++-------------------------------------
 6 files changed, 375 insertions(+), 166 deletions(-)

-- 
Aristeu

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 1/3] xattr: extract simple_xattr code from tmpfs
  2012-07-02 14:29 [PATCH v3 0/3] cgroup: add xattr support Aristeu Rozanski
@ 2012-07-02 14:29 ` Aristeu Rozanski
  2012-07-03 16:53   ` [PATCH v4 " Aristeu Rozanski
  2012-07-02 14:29 ` [PATCH v3 2/3] cgroup: revise how we re-populate root directory Aristeu Rozanski
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-02 14:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Li Zefan, aris, Tejun Heo, Hugh Dickins, Hillf Danton

[-- Attachment #1: cgroup1.patch --]
[-- Type: text/plain, Size: 14024 bytes --]

From: Li Zefan <lizefan@huawei.com>

Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.

$ size vmlinux.o
   text    data     bss     dec     hex filename
4658782  880729 5195032 10734543         a3cbcf vmlinux.o
$ size vmlinux.o
   text    data     bss     dec     hex filename
4658957  880729 5195032 10734718         a3cc7e vmlinux.o

v3:
- in simple_xattrs_free(), reinitialize the list
- use simple_xattr_* prefix
- introduce simple_xattr_add() to prevent direct list usage

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>

---
 fs/xattr.c               |  178 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/shmem_fs.h |    3 
 include/linux/xattr.h    |   24 ++++++
 mm/shmem.c               |  153 +++-------------------------------------
 4 files changed, 219 insertions(+), 139 deletions(-)

--- linus-2.6.orig/fs/xattr.c	2012-07-02 10:21:39.582897061 -0400
+++ linus-2.6/fs/xattr.c	2012-07-02 10:21:40.778709426 -0400
@@ -783,3 +783,181 @@ EXPORT_SYMBOL(generic_getxattr);
 EXPORT_SYMBOL(generic_listxattr);
 EXPORT_SYMBOL(generic_setxattr);
 EXPORT_SYMBOL(generic_removexattr);
+
+/*
+ * initialize the simple_xattrs structure
+ */
+void simple_xattrs_init(struct simple_xattrs *xattrs)
+{
+	INIT_LIST_HEAD(&xattrs->head);
+	spin_lock_init(&xattrs->lock);
+}
+
+/*
+ * free all the xattrs
+ */
+void simple_xattrs_free(struct simple_xattrs *xattrs)
+{
+	struct simple_xattr *xattr, *node;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
+		kfree(xattr->name);
+		kfree(xattr);
+	}
+	INIT_LIST_HEAD(&xattrs->head);
+	spin_unlock(&xattrs->lock);
+}
+
+/*
+ * xattr GET operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
+		     void *buffer, size_t size)
+{
+	struct simple_xattr *xattr;
+	int ret = -ENODATA;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		if (strcmp(name, xattr->name))
+			continue;
+
+		ret = xattr->size;
+		if (buffer) {
+			if (size < xattr->size)
+				ret = -ERANGE;
+			else
+				memcpy(buffer, xattr->value, xattr->size);
+		}
+		break;
+	}
+	spin_unlock(&xattrs->lock);
+	return ret;
+}
+
+static int __simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+			      const void *value, size_t size, int flags)
+{
+	struct simple_xattr *xattr;
+	struct simple_xattr *new_xattr = NULL;
+	size_t len;
+	int err = 0;
+
+	/* value == NULL means remove */
+	if (value) {
+		/* wrap around? */
+		len = sizeof(*new_xattr) + size;
+		if (len <= sizeof(*new_xattr))
+			return -ENOMEM;
+
+		new_xattr = kmalloc(len, GFP_KERNEL);
+		if (!new_xattr)
+			return -ENOMEM;
+
+		new_xattr->name = kstrdup(name, GFP_KERNEL);
+		if (!new_xattr->name) {
+			kfree(new_xattr);
+			return -ENOMEM;
+		}
+
+		new_xattr->size = size;
+		memcpy(new_xattr->value, value, size);
+	}
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		if (!strcmp(name, xattr->name)) {
+			if (flags & XATTR_CREATE) {
+				xattr = new_xattr;
+				err = -EEXIST;
+			} else if (new_xattr) {
+				list_replace(&xattr->list, &new_xattr->list);
+			} else {
+				list_del(&xattr->list);
+			}
+			goto out;
+		}
+	}
+	if (flags & XATTR_REPLACE) {
+		xattr = new_xattr;
+		err = -ENODATA;
+	} else {
+		list_add(&new_xattr->list, &xattrs->head);
+		xattr = NULL;
+	}
+out:
+	spin_unlock(&xattrs->lock);
+	if (xattr) {
+		kfree(xattr->name);
+		kfree(xattr);
+	}
+	return err;
+
+}
+
+/*
+ * xattr SET operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+		     const void *value, size_t size, int flags)
+{
+	if (size == 0)
+		value = ""; /* empty EA, do not remove */
+	return __simple_xattr_set(xattrs, name, value, size, flags);
+}
+
+/*
+ * xattr REMOVE operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name)
+{
+        return __simple_xattr_set(xattrs, name, NULL, 0, XATTR_REPLACE);
+}
+
+static bool xattr_is_trusted(const char *name)
+{
+	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+}
+
+/*
+ * xattr LIST operation for in-memory/pseudo filesystems
+ */
+ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer, size_t size)
+{
+	bool trusted = capable(CAP_SYS_ADMIN);
+	struct simple_xattr *xattr;
+	size_t used = 0;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		size_t len;
+
+		/* skip "trusted." attributes for unprivileged callers */
+		if (!trusted && xattr_is_trusted(xattr->name))
+			continue;
+
+		len = strlen(xattr->name) + 1;
+		used += len;
+		if (buffer) {
+			if (size < used) {
+				used = -ERANGE;
+				break;
+			}
+			memcpy(buffer, xattr->name, len);
+			buffer += len;
+		}
+	}
+	spin_unlock(&xattrs->lock);
+
+	return used;
+}
+
+void simple_xattr_list_add(struct simple_xattrs *xattrs,
+			   struct simple_xattr *new_xattr)
+{
+	spin_lock(&xattrs->lock);
+	list_add(&new_xattr->list, &xattrs->head);
+	spin_unlock(&xattrs->lock);
+}
+
--- linus-2.6.orig/include/linux/shmem_fs.h	2012-07-02 10:21:39.581897084 -0400
+++ linus-2.6/include/linux/shmem_fs.h	2012-07-02 10:21:40.779709427 -0400
@@ -5,6 +5,7 @@
 #include <linux/mempolicy.h>
 #include <linux/pagemap.h>
 #include <linux/percpu_counter.h>
+#include <linux/xattr.h>
 
 /* inode in-kernel data */
 
@@ -18,7 +19,7 @@ struct shmem_inode_info {
 	};
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
 	struct list_head	swaplist;	/* chain of maybes on swap */
-	struct list_head	xattr_list;	/* list of shmem_xattr */
+	struct simple_xattrs	xattrs;		/* list of xattrs */
 	struct inode		vfs_inode;
 };
 
--- linus-2.6.orig/include/linux/xattr.h	2012-07-02 10:21:39.581897084 -0400
+++ linus-2.6/include/linux/xattr.h	2012-07-02 10:21:40.781709159 -0400
@@ -60,6 +60,7 @@ #define XATTR_REPLACE	0x2	/* set value, 
 #ifdef  __KERNEL__
 
 #include <linux/types.h>
+#include <linux/spinlock.h>
 
 struct inode;
 struct dentry;
@@ -96,6 +97,29 @@ ssize_t vfs_getxattr_alloc(struct dentry
 			   char **xattr_value, size_t size, gfp_t flags);
 int vfs_xattr_cmp(struct dentry *dentry, const char *xattr_name,
 		  const char *value, size_t size, gfp_t flags);
+
+struct simple_xattrs {
+	struct list_head head;
+	spinlock_t lock;
+};
+
+struct simple_xattr {
+	struct list_head list;
+	char *name;
+	size_t size;
+	char value[0];
+};
+
+void simple_xattrs_init(struct simple_xattrs *xattrs);
+void simple_xattrs_free(struct simple_xattrs *xattrs);
+int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
+		     void *buffer, size_t size);
+int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+		     const void *value, size_t size, int flags);
+int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name);
+ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer, size_t size);
+void simple_xattr_list_add(struct simple_xattrs *xattrs, struct simple_xattr *new_xattr);
+
 #endif  /*  __KERNEL__  */
 
 #endif	/* _LINUX_XATTR_H */
--- linus-2.6.orig/mm/shmem.c	2012-07-02 10:21:39.582897061 -0400
+++ linus-2.6/mm/shmem.c	2012-07-02 10:21:40.784709469 -0400
@@ -77,13 +77,6 @@ #define BOGO_DIRENT_SIZE 20
 /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
 #define SHORT_SYMLINK_LEN 128
 
-struct shmem_xattr {
-	struct list_head list;	/* anchored by shmem_inode_info->xattr_list */
-	char *name;		/* xattr name */
-	size_t size;
-	char value[0];
-};
-
 /*
  * shmem_fallocate and shmem_writepage communicate via inode->i_private
  * (with i_mutex making sure that it has only one user at a time):
@@ -627,7 +620,6 @@ 			unmap_mapping_range(inode->i_mapping,
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr, *nxattr;
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -641,10 +633,7 @@ 		shmem_truncate_range(inode, 0, (loff_t
 	} else
 		kfree(info->symlink);
 
-	list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
-		kfree(xattr->name);
-		kfree(xattr);
-	}
+	simple_xattrs_free(&info->xattrs);
 	BUG_ON(inode->i_blocks);
 	shmem_free_inode(inode->i_sb);
 	clear_inode(inode);
@@ -1360,7 +1349,7 @@ 		memset(info, 0, (char *)inode - (char 
 		spin_lock_init(&info->lock);
 		info->flags = flags & VM_NORESERVE;
 		INIT_LIST_HEAD(&info->swaplist);
-		INIT_LIST_HEAD(&info->xattr_list);
+		simple_xattrs_init(&info->xattrs);
 		cache_no_acl(inode);
 
 		switch (mode & S_IFMT) {
@@ -2136,9 +2125,9 @@ static void shmem_put_link(struct dentry
 /*
  * Allocate new xattr and copy in the value; but leave the name to callers.
  */
-static struct shmem_xattr *shmem_xattr_alloc(const void *value, size_t size)
+static struct simple_xattr *simple_xattr_alloc(const void *value, size_t size)
 {
-	struct shmem_xattr *new_xattr;
+	struct simple_xattr *new_xattr;
 	size_t len;
 
 	/* wrap around? */
@@ -2164,11 +2153,11 @@ static int shmem_initxattrs(struct inode
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	const struct xattr *xattr;
-	struct shmem_xattr *new_xattr;
+	struct simple_xattr *new_xattr;
 	size_t len;
 
 	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
-		new_xattr = shmem_xattr_alloc(xattr->value, xattr->value_len);
+		new_xattr = simple_xattr_alloc(xattr->value, xattr->value_len);
 		if (!new_xattr)
 			return -ENOMEM;
 
@@ -2185,91 +2174,12 @@ static int shmem_initxattrs(struct inode
 		memcpy(new_xattr->name + XATTR_SECURITY_PREFIX_LEN,
 		       xattr->name, len);
 
-		spin_lock(&info->lock);
-		list_add(&new_xattr->list, &info->xattr_list);
-		spin_unlock(&info->lock);
+		simple_xattr_list_add(&info->xattrs, new_xattr);
 	}
 
 	return 0;
 }
 
-static int shmem_xattr_get(struct dentry *dentry, const char *name,
-			   void *buffer, size_t size)
-{
-	struct shmem_inode_info *info;
-	struct shmem_xattr *xattr;
-	int ret = -ENODATA;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (strcmp(name, xattr->name))
-			continue;
-
-		ret = xattr->size;
-		if (buffer) {
-			if (size < xattr->size)
-				ret = -ERANGE;
-			else
-				memcpy(buffer, xattr->value, xattr->size);
-		}
-		break;
-	}
-	spin_unlock(&info->lock);
-	return ret;
-}
-
-static int shmem_xattr_set(struct inode *inode, const char *name,
-			   const void *value, size_t size, int flags)
-{
-	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr;
-	struct shmem_xattr *new_xattr = NULL;
-	int err = 0;
-
-	/* value == NULL means remove */
-	if (value) {
-		new_xattr = shmem_xattr_alloc(value, size);
-		if (!new_xattr)
-			return -ENOMEM;
-
-		new_xattr->name = kstrdup(name, GFP_KERNEL);
-		if (!new_xattr->name) {
-			kfree(new_xattr);
-			return -ENOMEM;
-		}
-	}
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (!strcmp(name, xattr->name)) {
-			if (flags & XATTR_CREATE) {
-				xattr = new_xattr;
-				err = -EEXIST;
-			} else if (new_xattr) {
-				list_replace(&xattr->list, &new_xattr->list);
-			} else {
-				list_del(&xattr->list);
-			}
-			goto out;
-		}
-	}
-	if (flags & XATTR_REPLACE) {
-		xattr = new_xattr;
-		err = -ENODATA;
-	} else {
-		list_add(&new_xattr->list, &info->xattr_list);
-		xattr = NULL;
-	}
-out:
-	spin_unlock(&info->lock);
-	if (xattr)
-		kfree(xattr->name);
-	kfree(xattr);
-	return err;
-}
-
 static const struct xattr_handler *shmem_xattr_handlers[] = {
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	&generic_acl_access_handler,
@@ -2300,6 +2210,7 @@ 			return 0;
 static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
 			      void *buffer, size_t size)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2314,12 +2225,13 @@ static ssize_t shmem_getxattr(struct den
 	if (err)
 		return err;
 
-	return shmem_xattr_get(dentry, name, buffer, size);
+	return simple_xattr_get(&info->xattrs, name, buffer, size);
 }
 
 static int shmem_setxattr(struct dentry *dentry, const char *name,
 			  const void *value, size_t size, int flags)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2334,15 +2246,12 @@ static int shmem_setxattr(struct dentry 
 	if (err)
 		return err;
 
-	if (size == 0)
-		value = "";  /* empty EA, do not remove */
-
-	return shmem_xattr_set(dentry->d_inode, name, value, size, flags);
-
+	return simple_xattr_set(&info->xattrs, name, value, size, flags);
 }
 
 static int shmem_removexattr(struct dentry *dentry, const char *name)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2357,45 +2266,13 @@ static int shmem_removexattr(struct dent
 	if (err)
 		return err;
 
-	return shmem_xattr_set(dentry->d_inode, name, NULL, 0, XATTR_REPLACE);
-}
-
-static bool xattr_is_trusted(const char *name)
-{
-	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+	return simple_xattr_remove(&info->xattrs, name);
 }
 
 static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
 {
-	bool trusted = capable(CAP_SYS_ADMIN);
-	struct shmem_xattr *xattr;
-	struct shmem_inode_info *info;
-	size_t used = 0;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		size_t len;
-
-		/* skip "trusted." attributes for unprivileged callers */
-		if (!trusted && xattr_is_trusted(xattr->name))
-			continue;
-
-		len = strlen(xattr->name) + 1;
-		used += len;
-		if (buffer) {
-			if (size < used) {
-				used = -ERANGE;
-				break;
-			}
-			memcpy(buffer, xattr->name, len);
-			buffer += len;
-		}
-	}
-	spin_unlock(&info->lock);
-
-	return used;
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
+	return simple_xattr_list(&info->xattrs, buffer, size);
 }
 #endif /* CONFIG_TMPFS_XATTR */
 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-02 14:29 [PATCH v3 0/3] cgroup: add xattr support Aristeu Rozanski
  2012-07-02 14:29 ` [PATCH v3 1/3] xattr: extract simple_xattr code from tmpfs Aristeu Rozanski
@ 2012-07-02 14:29 ` Aristeu Rozanski
  2012-07-09 17:17   ` Tejun Heo
  2012-07-02 14:29 ` [PATCH v3 3/3] cgroup: add xattr support Aristeu Rozanski
  2012-07-17 20:41 ` [PATCH v3 0/3] " Tejun Heo
  3 siblings, 1 reply; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-02 14:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Li Zefan, aris, Tejun Heo, Hugh Dickins, Hillf Danton

[-- Attachment #1: cgroup2.patch --]
[-- Type: text/plain, Size: 5610 bytes --]

From: Li Zefan <lizefan@huawei.com>

When remounting cgroupfs with some subsystems added to it and some
removed, cgroup will remove all the files in root directory and then
re-popluate it.

What I'm doing here is, only remove files which belong to subsystems that
are to be unbinded, and only create files for newly-added subsystems.
The purpose is to have all other files untouched.

This is a preparation for cgroup xattr support.

v3:
- refresh patches after recent refactoring

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>

---
 include/linux/cgroup.h |    3 +++
 kernel/cgroup.c        |   49 +++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 44 insertions(+), 8 deletions(-)

--- linus-2.6.orig/include/linux/cgroup.h	2012-07-02 10:21:39.505647330 -0400
+++ linus-2.6/include/linux/cgroup.h	2012-07-02 10:21:58.055646941 -0400
@@ -306,6 +306,9 @@ 	 * If not 0, file mode is set to this v
 	 */
 	size_t max_write_len;
 
+	/* The subsystem this cgroup file belongs to */
+	struct cgroup_subsys *subsys;
+
 	/* CFTYPE_* flags */
 	unsigned int flags;
 
--- linus-2.6.orig/kernel/cgroup.c	2012-07-02 10:21:39.505647330 -0400
+++ linus-2.6/kernel/cgroup.c	2012-07-02 10:21:58.059646980 -0400
@@ -825,6 +825,7 @@ static int cgroup_mkdir(struct inode *di
 static struct dentry *cgroup_lookup(struct inode *, struct dentry *, struct nameidata *);
 static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
 static int cgroup_populate_dir(struct cgroup *cgrp);
+static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits);
 static const struct inode_operations cgroup_dir_inode_operations;
 static const struct file_operations proc_cgroupstats_operations;
 
@@ -949,7 +950,8 @@ static void remove_dir(struct dentry *d)
 	dput(parent);
 }
 
-static int cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft)
+static int __cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft,
+			  bool remove_all, unsigned long removed_bits)
 {
 	struct cfent *cfe;
 
@@ -958,8 +960,13 @@ static int cgroup_rm_file(struct cgroup 
 
 	list_for_each_entry(cfe, &cgrp->files, node) {
 		struct dentry *d = cfe->dentry;
+		struct cftype *cft2 = cfe->type;
+
+		if (cft && cft2 != cft)
+			continue;
 
-		if (cft && cfe->type != cft)
+		if (!remove_all && cft2->subsys &&
+		    !test_bit(cft2->subsys->subsys_id, &removed_bits))
 			continue;
 
 		dget(d);
@@ -973,12 +980,19 @@ 		return 0;
 	return -ENOENT;
 }
 
-static void cgroup_clear_directory(struct dentry *dir)
+static int cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft)
+{
+	return __cgroup_rm_file(cgrp, cft, false, 0);
+}
+
+static void cgroup_clear_directory(struct dentry *dir, bool remove_all,
+				   unsigned long removed_bits)
 {
 	struct cgroup *cgrp = __d_cgrp(dir);
 
 	while (!list_empty(&cgrp->files))
-		cgroup_rm_file(cgrp, NULL);
+		if (__cgroup_rm_file(cgrp, NULL, remove_all, removed_bits))
+			break;
 }
 
 /*
@@ -988,7 +1002,7 @@ static void cgroup_d_remove_dir(struct d
 {
 	struct dentry *parent;
 
-	cgroup_clear_directory(dentry);
+	cgroup_clear_directory(dentry, true, 0);
 
 	parent = dentry->d_parent;
 	spin_lock(&parent->d_lock);
@@ -1353,6 +1367,7 @@ 	int ret = 0;
 	struct cgroupfs_root *root = sb->s_fs_info;
 	struct cgroup *cgrp = &root->top_cgroup;
 	struct cgroup_sb_opts opts;
+	unsigned long added_bits, removed_bits;
 
 	mutex_lock(&cgrp->dentry->d_inode->i_mutex);
 	mutex_lock(&cgroup_mutex);
@@ -1368,6 +1383,9 @@ 	int ret = 0;
 		pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
 			   task_tgid_nr(current), current->comm);
 
+	added_bits = opts.subsys_bits & ~root->subsys_bits;
+	removed_bits = root->subsys_bits & ~opts.subsys_bits;
+
 	/* Don't allow flags or name to change at remount */
 	if (opts.flags != root->flags ||
 	    (opts.name && strcmp(opts.name, root->name))) {
@@ -1383,8 +1401,9 @@ 	int ret = 0;
 	}
 
 	/* clear out any existing files and repopulate subsystem files */
-	cgroup_clear_directory(cgrp->dentry);
-	cgroup_populate_dir(cgrp);
+	cgroup_clear_directory(cgrp->dentry, false, removed_bits);
+	/* re-populate subsystem files */
+	cgroup_repopulate_dir(cgrp, added_bits);
 
 	if (opts.release_agent)
 		strcpy(root->release_agent_path, opts.release_agent);
@@ -2696,6 +2715,8 @@ static int cgroup_add_file(struct cgroup
 	umode_t mode;
 	char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
 
+	cft->subsys = subsys;
+
 	/* does @cft->flags tell us to skip creation on @cgrp? */
 	if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
 		return 0;
@@ -3858,7 +3879,7 @@ static struct cftype files[] = {
 	{ }	/* terminate */
 };
 
-static int cgroup_populate_dir(struct cgroup *cgrp)
+static int __cgroup_populate_dir(struct cgroup *cgrp, unsigned long added_bits)
 {
 	int err;
 	struct cgroup_subsys *ss;
@@ -3870,6 +3891,8 @@ 	if (err < 0)
 	/* process cftsets of each subsystem */
 	for_each_subsys(cgrp->root, ss) {
 		struct cftype_set *set;
+		if (!test_bit(ss->subsys_id, &added_bits))
+			continue;
 
 		list_for_each_entry(set, &ss->cftsets, node)
 			cgroup_addrm_files(cgrp, ss, set->cfts, true);
@@ -3890,6 +3913,16 @@ 	if (err < 0)
 	return 0;
 }
 
+static int cgroup_populate_dir(struct cgroup *cgrp)
+{
+	return __cgroup_populate_dir(cgrp, cgrp->root->subsys_bits);
+}
+
+static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits)
+{
+	return __cgroup_populate_dir(cgrp, added_bits);
+}
+
 static void css_dput_fn(struct work_struct *work)
 {
 	struct cgroup_subsys_state *css =


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 3/3] cgroup: add xattr support
  2012-07-02 14:29 [PATCH v3 0/3] cgroup: add xattr support Aristeu Rozanski
  2012-07-02 14:29 ` [PATCH v3 1/3] xattr: extract simple_xattr code from tmpfs Aristeu Rozanski
  2012-07-02 14:29 ` [PATCH v3 2/3] cgroup: revise how we re-populate root directory Aristeu Rozanski
@ 2012-07-02 14:29 ` Aristeu Rozanski
  2012-07-17 20:41 ` [PATCH v3 0/3] " Tejun Heo
  3 siblings, 0 replies; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-02 14:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Li Zefan, aris, Tejun Heo, Hugh Dickins, Hillf Danton

[-- Attachment #1: cgroup3.patch --]
[-- Type: text/plain, Size: 10791 bytes --]

From: Li Zefan <lizefan@huawei.com>

This is one of the items in the plumber's wish list.

For use cases:

>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart

v3:
- instead of config option, use mount option to enable xattr support

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>

---
 include/linux/cgroup.h |   13 ++++-
 kernel/cgroup.c        |  118 ++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 112 insertions(+), 19 deletions(-)

--- linus-2.6.orig/include/linux/cgroup.h	2012-07-02 10:21:58.055646941 -0400
+++ linus-2.6/include/linux/cgroup.h	2012-07-02 10:22:22.012584730 -0400
@@ -17,6 +17,7 @@  *  Copyright (C) 2004-2006 Silicon Grap
 #include <linux/rwsem.h>
 #include <linux/idr.h>
 #include <linux/workqueue.h>
+#include <linux/xattr.h>
 
 #ifdef CONFIG_CGROUPS
 
@@ -216,6 +217,9 @@ 	 * count users of this cgroup. >0 means
 	/* List of events which userspace want to receive */
 	struct list_head event_list;
 	spinlock_t event_list_lock;
+
+	/* directory xattrs */
+	struct simple_xattrs xattrs;
 };
 
 /*
@@ -312,6 +316,9 @@ 	 * If not 0, file mode is set to this v
 	/* CFTYPE_* flags */
 	unsigned int flags;
 
+	/* file xattrs */
+	struct simple_xattrs xattrs;
+
 	int (*open)(struct inode *inode, struct file *file);
 	ssize_t (*read)(struct cgroup *cgrp, struct cftype *cft,
 			struct file *file,
@@ -397,7 +404,7 @@ 	 * Returns 0 or -ve error code.
  */
 struct cftype_set {
 	struct list_head		node;	/* chained at subsys->cftsets */
-	const struct cftype		*cfts;
+	struct cftype			*cfts;
 };
 
 struct cgroup_scanner {
@@ -409,8 +416,8 @@ struct cgroup_scanner {
 	void *data;
 };
 
-int cgroup_add_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
-int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
+int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
+int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
 
 int cgroup_is_removed(const struct cgroup *cgrp);
 
--- linus-2.6.orig/kernel/cgroup.c	2012-07-02 10:21:58.059646980 -0400
+++ linus-2.6/kernel/cgroup.c	2012-07-02 10:22:22.017584857 -0400
@@ -276,7 +276,8 @@ inline int cgroup_is_removed(const struc
 
 /* bits in struct cgroupfs_root flags field */
 enum {
-	ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
+	ROOT_NOPREFIX,	/* mounted subsystems have no named prefix */
+	ROOT_XATTR,	/* supports extended attributes */
 };
 
 static int cgroup_is_releasable(const struct cgroup *cgrp)
@@ -916,15 +917,19 @@ static void cgroup_diput(struct dentry *
 		 */
 		BUG_ON(!list_empty(&cgrp->pidlists));
 
+		simple_xattrs_free(&cgrp->xattrs);
+
 		kfree_rcu(cgrp, rcu_head);
 	} else {
 		struct cfent *cfe = __d_cfe(dentry);
 		struct cgroup *cgrp = dentry->d_parent->d_fsdata;
+		struct cftype *cft = cfe->type;
 
 		WARN_ONCE(!list_empty(&cfe->node) &&
 			  cgrp != &cgrp->root->top_cgroup,
 			  "cfe still linked for %s\n", cfe->type->name);
 		kfree(cfe);
+		simple_xattrs_free(&cft->xattrs);
 	}
 	iput(inode);
 }
@@ -1149,6 +1154,8 @@ static int cgroup_show_options(struct se
 		seq_printf(seq, ",%s", ss->name);
 	if (test_bit(ROOT_NOPREFIX, &root->flags))
 		seq_puts(seq, ",noprefix");
+	if (test_bit(ROOT_XATTR, &root->flags))
+		seq_puts(seq, ",xattr");
 	if (strlen(root->release_agent_path))
 		seq_printf(seq, ",release_agent=%s", root->release_agent_path);
 	if (clone_children(&root->top_cgroup))
@@ -1217,6 +1224,10 @@ 	memset(opts, 0, sizeof(*opts));
 			opts->clone_children = true;
 			continue;
 		}
+		if (!strcmp(token, "xattr")) {
+			set_bit(ROOT_XATTR, &opts->flags);
+			continue;
+		}
 		if (!strncmp(token, "release_agent=", 14)) {
 			/* Specifying two release agents is forbidden */
 			if (opts->release_agent)
@@ -1434,6 +1445,7 @@ static void init_cgroup_housekeeping(str
 	mutex_init(&cgrp->pidlist_mutex);
 	INIT_LIST_HEAD(&cgrp->event_list);
 	spin_lock_init(&cgrp->event_list_lock);
+	simple_xattrs_init(&cgrp->xattrs);
 }
 
 static void init_cgroup_root(struct cgroupfs_root *root)
@@ -1779,6 +1791,8 @@ 	ret = rebind_subsystems(root, 0);
 	mutex_unlock(&cgroup_root_mutex);
 	mutex_unlock(&cgroup_mutex);
 
+	simple_xattrs_free(&cgrp->xattrs);
+
 	kill_litter_super(sb);
 	cgroup_drop_root(root);
 }
@@ -2585,19 +2599,89 @@ static int cgroup_rename(struct inode *o
 	return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
 }
 
+static struct simple_xattrs *__d_xattrs(struct dentry *dentry)
+{
+	if (S_ISDIR(dentry->d_inode->i_mode))
+		return &__d_cgrp(dentry)->xattrs;
+	else
+		return &__d_cft(dentry)->xattrs;
+}
+
+static inline int xattr_enabled(struct dentry *dentry)
+{
+	struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
+	return test_bit(ROOT_XATTR, &root->flags);
+}
+
+static bool is_valid_xattr(const char *name)
+{
+	if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) ||
+	    !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN) ||
+	    !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN))
+		return true;
+	return false;
+}
+
+static int cgroup_setxattr(struct dentry *dentry, const char *name,
+			   const void *val, size_t size, int flags)
+{
+	if (!xattr_enabled(dentry))
+		return -EOPNOTSUPP;
+	if (!is_valid_xattr(name))
+		return -EINVAL;
+	return simple_xattr_set(__d_xattrs(dentry), name, val, size, flags);
+}
+
+static int cgroup_removexattr(struct dentry *dentry, const char *name)
+{
+	if (!xattr_enabled(dentry))
+		return -EOPNOTSUPP;
+	if (!is_valid_xattr(name))
+		return -EINVAL;
+	return simple_xattr_remove(__d_xattrs(dentry), name);
+}
+
+static ssize_t cgroup_getxattr(struct dentry *dentry, const char *name,
+			       void *buf, size_t size)
+{
+	if (!xattr_enabled(dentry))
+		return -EOPNOTSUPP;
+	if (!is_valid_xattr(name))
+		return -EINVAL;
+	return simple_xattr_get(__d_xattrs(dentry), name, buf, size);
+}
+
+static ssize_t cgroup_listxattr(struct dentry *dentry, char *buf, size_t size)
+{
+	if (!xattr_enabled(dentry))
+		return -EOPNOTSUPP;
+	return simple_xattr_list(__d_xattrs(dentry), buf, size);
+}
+
 static const struct file_operations cgroup_file_operations = {
-	.read = cgroup_file_read,
-	.write = cgroup_file_write,
-	.llseek = generic_file_llseek,
-	.open = cgroup_file_open,
-	.release = cgroup_file_release,
+	.read		= cgroup_file_read,
+	.write		= cgroup_file_write,
+	.llseek		= generic_file_llseek,
+	.open		= cgroup_file_open,
+	.release	= cgroup_file_release,
+};
+
+static const struct inode_operations cgroup_file_inode_operations = {
+	.setxattr	= cgroup_setxattr,
+	.getxattr	= cgroup_getxattr,
+	.listxattr	= cgroup_listxattr,
+	.removexattr	= cgroup_removexattr,
 };
 
 static const struct inode_operations cgroup_dir_inode_operations = {
-	.lookup = cgroup_lookup,
-	.mkdir = cgroup_mkdir,
-	.rmdir = cgroup_rmdir,
-	.rename = cgroup_rename,
+	.lookup		= cgroup_lookup,
+	.mkdir		= cgroup_mkdir,
+	.rmdir		= cgroup_rmdir,
+	.rename		= cgroup_rename,
+	.setxattr	= cgroup_setxattr,
+	.getxattr	= cgroup_getxattr,
+	.listxattr	= cgroup_listxattr,
+	.removexattr	= cgroup_removexattr,
 };
 
 static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd)
@@ -2645,6 +2729,7 @@ static int cgroup_create_file(struct den
 	} else if (S_ISREG(mode)) {
 		inode->i_size = 0;
 		inode->i_fop = &cgroup_file_operations;
+		inode->i_op = &cgroup_file_inode_operations;
 	}
 	d_instantiate(dentry, inode);
 	dget(dentry);	/* Extra count - pin the dentry in core */
@@ -2705,7 +2790,7 @@ 	umode_t mode = 0;
 }
 
 static int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
-			   const struct cftype *cft)
+			   struct cftype *cft)
 {
 	struct dentry *dir = cgrp->dentry;
 	struct cgroup *parent = __d_cgrp(dir);
@@ -2716,6 +2801,7 @@ static int cgroup_add_file(struct cgroup
 	char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
 
 	cft->subsys = subsys;
+	simple_xattrs_init(&cft->xattrs);
 
 	/* does @cft->flags tell us to skip creation on @cgrp? */
 	if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
@@ -2757,9 +2843,9 @@ out:
 }
 
 static int cgroup_addrm_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
-			      const struct cftype cfts[], bool is_add)
+			      struct cftype cfts[], bool is_add)
 {
-	const struct cftype *cft;
+	struct cftype *cft;
 	int err, ret = 0;
 
 	for (cft = cfts; cft->name[0] != '\0'; cft++) {
@@ -2793,7 +2879,7 @@ static void cgroup_cfts_prepare(void)
 }
 
 static void cgroup_cfts_commit(struct cgroup_subsys *ss,
-			       const struct cftype *cfts, bool is_add)
+			       struct cftype *cfts, bool is_add)
 	__releases(&cgroup_mutex) __releases(&cgroup_cft_mutex)
 {
 	LIST_HEAD(pending);
@@ -2844,7 +2930,7 @@  * Returns 0 on successful registration,
  * function currently returns 0 as long as @cfts registration is successful
  * even if some file creation attempts on existing cgroups fail.
  */
-int cgroup_add_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts)
+int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
 {
 	struct cftype_set *set;
 
@@ -2874,7 +2960,7 @@ EXPORT_SYMBOL_GPL(cgroup_add_cftypes);
  * Returns 0 on successful unregistration, -ENOENT if @cfts is not
  * registered with @ss.
  */
-int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts)
+int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
 {
 	struct cftype_set *set;
 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 1/3] xattr: extract simple_xattr code from tmpfs
  2012-07-02 14:29 ` [PATCH v3 1/3] xattr: extract simple_xattr code from tmpfs Aristeu Rozanski
@ 2012-07-03 16:53   ` Aristeu Rozanski
  0 siblings, 0 replies; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-03 16:53 UTC (permalink / raw)
  To: Aristeu Rozanski
  Cc: linux-kernel, Li Zefan, Tejun Heo, Hugh Dickins, Hillf Danton

From: Li Zefan <lizefan@huawei.com>

Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.

$ size vmlinux.o
   text    data     bss     dec     hex filename
4658782  880729 5195032 10734543         a3cbcf vmlinux.o
$ size vmlinux.o
   text    data     bss     dec     hex filename
4658957  880729 5195032 10734718         a3cc7e vmlinux.o

v4:
- move simple_xattrs_free() to fs/xattr.c
v3:
- in kmem_xattrs_free(), reinitialize the list
- use simple_xattr_* prefix
- introduce simple_xattr_add() to prevent direct list usage

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>

---
 fs/xattr.c               |  200 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/shmem_fs.h |    3 
 include/linux/xattr.h    |   25 +++++
 mm/shmem.c               |  171 +++-------------------------------------
 4 files changed, 240 insertions(+), 159 deletions(-)

Index: xattr/fs/xattr.c
===================================================================
--- xattr.orig/fs/xattr.c	2012-07-03 11:34:51.641674167 -0400
+++ xattr/fs/xattr.c	2012-07-03 11:57:47.079419043 -0400
@@ -783,3 +783,203 @@
 EXPORT_SYMBOL(generic_listxattr);
 EXPORT_SYMBOL(generic_setxattr);
 EXPORT_SYMBOL(generic_removexattr);
+
+/*
+ * initialize the simple_xattrs structure
+ */
+void simple_xattrs_init(struct simple_xattrs *xattrs)
+{
+	INIT_LIST_HEAD(&xattrs->head);
+	spin_lock_init(&xattrs->lock);
+}
+
+/*
+ * Allocate new xattr and copy in the value; but leave the name to callers.
+ */
+struct simple_xattr *simple_xattr_alloc(const void *value, size_t size)
+{
+	struct simple_xattr *new_xattr;
+	size_t len;
+
+	/* wrap around? */
+	len = sizeof(*new_xattr) + size;
+	if (len <= sizeof(*new_xattr))
+		return NULL;
+
+	new_xattr = kmalloc(len, GFP_KERNEL);
+	if (!new_xattr)
+		return NULL;
+
+	new_xattr->size = size;
+	memcpy(new_xattr->value, value, size);
+	return new_xattr;
+}
+
+/*
+ * free all the xattrs
+ */
+void simple_xattrs_free(struct simple_xattrs *xattrs)
+{
+	struct simple_xattr *xattr, *node;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
+		kfree(xattr->name);
+		kfree(xattr);
+	}
+	INIT_LIST_HEAD(&xattrs->head);
+	spin_unlock(&xattrs->lock);
+}
+
+/*
+ * xattr GET operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
+		     void *buffer, size_t size)
+{
+	struct simple_xattr *xattr;
+	int ret = -ENODATA;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		if (strcmp(name, xattr->name))
+			continue;
+
+		ret = xattr->size;
+		if (buffer) {
+			if (size < xattr->size)
+				ret = -ERANGE;
+			else
+				memcpy(buffer, xattr->value, xattr->size);
+		}
+		break;
+	}
+	spin_unlock(&xattrs->lock);
+	return ret;
+}
+
+static int __simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+			      const void *value, size_t size, int flags)
+{
+	struct simple_xattr *xattr;
+	struct simple_xattr *new_xattr = NULL;
+	size_t len;
+	int err = 0;
+
+	/* value == NULL means remove */
+	if (value) {
+		/* wrap around? */
+		len = sizeof(*new_xattr) + size;
+		if (len <= sizeof(*new_xattr))
+			return -ENOMEM;
+
+		new_xattr = kmalloc(len, GFP_KERNEL);
+		if (!new_xattr)
+			return -ENOMEM;
+
+		new_xattr->name = kstrdup(name, GFP_KERNEL);
+		if (!new_xattr->name) {
+			kfree(new_xattr);
+			return -ENOMEM;
+		}
+
+		new_xattr->size = size;
+		memcpy(new_xattr->value, value, size);
+	}
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		if (!strcmp(name, xattr->name)) {
+			if (flags & XATTR_CREATE) {
+				xattr = new_xattr;
+				err = -EEXIST;
+			} else if (new_xattr) {
+				list_replace(&xattr->list, &new_xattr->list);
+			} else {
+				list_del(&xattr->list);
+			}
+			goto out;
+		}
+	}
+	if (flags & XATTR_REPLACE) {
+		xattr = new_xattr;
+		err = -ENODATA;
+	} else {
+		list_add(&new_xattr->list, &xattrs->head);
+		xattr = NULL;
+	}
+out:
+	spin_unlock(&xattrs->lock);
+	if (xattr) {
+		kfree(xattr->name);
+		kfree(xattr);
+	}
+	return err;
+
+}
+
+/*
+ * xattr SET operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+		     const void *value, size_t size, int flags)
+{
+	if (size == 0)
+		value = ""; /* empty EA, do not remove */
+	return __simple_xattr_set(xattrs, name, value, size, flags);
+}
+
+/*
+ * xattr REMOVE operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name)
+{
+        return __simple_xattr_set(xattrs, name, NULL, 0, XATTR_REPLACE);
+}
+
+static bool xattr_is_trusted(const char *name)
+{
+	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+}
+
+/*
+ * xattr LIST operation for in-memory/pseudo filesystems
+ */
+ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer, size_t size)
+{
+	bool trusted = capable(CAP_SYS_ADMIN);
+	struct simple_xattr *xattr;
+	size_t used = 0;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		size_t len;
+
+		/* skip "trusted." attributes for unprivileged callers */
+		if (!trusted && xattr_is_trusted(xattr->name))
+			continue;
+
+		len = strlen(xattr->name) + 1;
+		used += len;
+		if (buffer) {
+			if (size < used) {
+				used = -ERANGE;
+				break;
+			}
+			memcpy(buffer, xattr->name, len);
+			buffer += len;
+		}
+	}
+	spin_unlock(&xattrs->lock);
+
+	return used;
+}
+
+void simple_xattr_list_add(struct simple_xattrs *xattrs,
+			   struct simple_xattr *new_xattr)
+{
+	spin_lock(&xattrs->lock);
+	list_add(&new_xattr->list, &xattrs->head);
+	spin_unlock(&xattrs->lock);
+}
+
Index: xattr/include/linux/shmem_fs.h
===================================================================
--- xattr.orig/include/linux/shmem_fs.h	2012-07-03 11:34:51.641674167 -0400
+++ xattr/include/linux/shmem_fs.h	2012-07-03 11:34:54.273746249 -0400
@@ -5,6 +5,7 @@
 #include <linux/mempolicy.h>
 #include <linux/pagemap.h>
 #include <linux/percpu_counter.h>
+#include <linux/xattr.h>
 
 /* inode in-kernel data */
 
@@ -18,7 +19,7 @@
 	};
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
 	struct list_head	swaplist;	/* chain of maybes on swap */
-	struct list_head	xattr_list;	/* list of shmem_xattr */
+	struct simple_xattrs	xattrs;		/* list of xattrs */
 	struct inode		vfs_inode;
 };
 
Index: xattr/include/linux/xattr.h
===================================================================
--- xattr.orig/include/linux/xattr.h	2012-07-03 11:34:51.641674167 -0400
+++ xattr/include/linux/xattr.h	2012-07-03 11:36:21.900146346 -0400
@@ -60,6 +60,7 @@
 #ifdef  __KERNEL__
 
 #include <linux/types.h>
+#include <linux/spinlock.h>
 
 struct inode;
 struct dentry;
@@ -96,6 +97,30 @@
 			   char **xattr_value, size_t size, gfp_t flags);
 int vfs_xattr_cmp(struct dentry *dentry, const char *xattr_name,
 		  const char *value, size_t size, gfp_t flags);
+
+struct simple_xattrs {
+	struct list_head head;
+	spinlock_t lock;
+};
+
+struct simple_xattr {
+	struct list_head list;
+	char *name;
+	size_t size;
+	char value[0];
+};
+
+void simple_xattrs_init(struct simple_xattrs *xattrs);
+struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
+void simple_xattrs_free(struct simple_xattrs *xattrs);
+int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
+		     void *buffer, size_t size);
+int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+		     const void *value, size_t size, int flags);
+int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name);
+ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer, size_t size);
+void simple_xattr_list_add(struct simple_xattrs *xattrs, struct simple_xattr *new_xattr);
+
 #endif  /*  __KERNEL__  */
 
 #endif	/* _LINUX_XATTR_H */
Index: xattr/mm/shmem.c
===================================================================
--- xattr.orig/mm/shmem.c	2012-07-03 11:34:51.641674167 -0400
+++ xattr/mm/shmem.c	2012-07-03 11:57:55.159641189 -0400
@@ -77,13 +77,6 @@
 /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
 #define SHORT_SYMLINK_LEN 128
 
-struct shmem_xattr {
-	struct list_head list;	/* anchored by shmem_inode_info->xattr_list */
-	char *name;		/* xattr name */
-	size_t size;
-	char value[0];
-};
-
 /*
  * shmem_fallocate and shmem_writepage communicate via inode->i_private
  * (with i_mutex making sure that it has only one user at a time):
@@ -627,7 +620,6 @@
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr, *nxattr;
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -641,10 +633,7 @@
 	} else
 		kfree(info->symlink);
 
-	list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
-		kfree(xattr->name);
-		kfree(xattr);
-	}
+	simple_xattrs_free(&info->xattrs);
 	BUG_ON(inode->i_blocks);
 	shmem_free_inode(inode->i_sb);
 	clear_inode(inode);
@@ -1360,7 +1349,7 @@
 		spin_lock_init(&info->lock);
 		info->flags = flags & VM_NORESERVE;
 		INIT_LIST_HEAD(&info->swaplist);
-		INIT_LIST_HEAD(&info->xattr_list);
+		simple_xattrs_init(&info->xattrs);
 		cache_no_acl(inode);
 
 		switch (mode & S_IFMT) {
@@ -2134,28 +2123,6 @@
  */
 
 /*
- * Allocate new xattr and copy in the value; but leave the name to callers.
- */
-static struct shmem_xattr *shmem_xattr_alloc(const void *value, size_t size)
-{
-	struct shmem_xattr *new_xattr;
-	size_t len;
-
-	/* wrap around? */
-	len = sizeof(*new_xattr) + size;
-	if (len <= sizeof(*new_xattr))
-		return NULL;
-
-	new_xattr = kmalloc(len, GFP_KERNEL);
-	if (!new_xattr)
-		return NULL;
-
-	new_xattr->size = size;
-	memcpy(new_xattr->value, value, size);
-	return new_xattr;
-}
-
-/*
  * Callback for security_inode_init_security() for acquiring xattrs.
  */
 static int shmem_initxattrs(struct inode *inode,
@@ -2164,11 +2131,11 @@
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	const struct xattr *xattr;
-	struct shmem_xattr *new_xattr;
+	struct simple_xattr *new_xattr;
 	size_t len;
 
 	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
-		new_xattr = shmem_xattr_alloc(xattr->value, xattr->value_len);
+		new_xattr = simple_xattr_alloc(xattr->value, xattr->value_len);
 		if (!new_xattr)
 			return -ENOMEM;
 
@@ -2185,91 +2152,12 @@
 		memcpy(new_xattr->name + XATTR_SECURITY_PREFIX_LEN,
 		       xattr->name, len);
 
-		spin_lock(&info->lock);
-		list_add(&new_xattr->list, &info->xattr_list);
-		spin_unlock(&info->lock);
+		simple_xattr_list_add(&info->xattrs, new_xattr);
 	}
 
 	return 0;
 }
 
-static int shmem_xattr_get(struct dentry *dentry, const char *name,
-			   void *buffer, size_t size)
-{
-	struct shmem_inode_info *info;
-	struct shmem_xattr *xattr;
-	int ret = -ENODATA;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (strcmp(name, xattr->name))
-			continue;
-
-		ret = xattr->size;
-		if (buffer) {
-			if (size < xattr->size)
-				ret = -ERANGE;
-			else
-				memcpy(buffer, xattr->value, xattr->size);
-		}
-		break;
-	}
-	spin_unlock(&info->lock);
-	return ret;
-}
-
-static int shmem_xattr_set(struct inode *inode, const char *name,
-			   const void *value, size_t size, int flags)
-{
-	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr;
-	struct shmem_xattr *new_xattr = NULL;
-	int err = 0;
-
-	/* value == NULL means remove */
-	if (value) {
-		new_xattr = shmem_xattr_alloc(value, size);
-		if (!new_xattr)
-			return -ENOMEM;
-
-		new_xattr->name = kstrdup(name, GFP_KERNEL);
-		if (!new_xattr->name) {
-			kfree(new_xattr);
-			return -ENOMEM;
-		}
-	}
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (!strcmp(name, xattr->name)) {
-			if (flags & XATTR_CREATE) {
-				xattr = new_xattr;
-				err = -EEXIST;
-			} else if (new_xattr) {
-				list_replace(&xattr->list, &new_xattr->list);
-			} else {
-				list_del(&xattr->list);
-			}
-			goto out;
-		}
-	}
-	if (flags & XATTR_REPLACE) {
-		xattr = new_xattr;
-		err = -ENODATA;
-	} else {
-		list_add(&new_xattr->list, &info->xattr_list);
-		xattr = NULL;
-	}
-out:
-	spin_unlock(&info->lock);
-	if (xattr)
-		kfree(xattr->name);
-	kfree(xattr);
-	return err;
-}
-
 static const struct xattr_handler *shmem_xattr_handlers[] = {
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	&generic_acl_access_handler,
@@ -2300,6 +2188,7 @@
 static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
 			      void *buffer, size_t size)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2314,12 +2203,13 @@
 	if (err)
 		return err;
 
-	return shmem_xattr_get(dentry, name, buffer, size);
+	return simple_xattr_get(&info->xattrs, name, buffer, size);
 }
 
 static int shmem_setxattr(struct dentry *dentry, const char *name,
 			  const void *value, size_t size, int flags)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2334,15 +2224,12 @@
 	if (err)
 		return err;
 
-	if (size == 0)
-		value = "";  /* empty EA, do not remove */
-
-	return shmem_xattr_set(dentry->d_inode, name, value, size, flags);
-
+	return simple_xattr_set(&info->xattrs, name, value, size, flags);
 }
 
 static int shmem_removexattr(struct dentry *dentry, const char *name)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2357,45 +2244,13 @@
 	if (err)
 		return err;
 
-	return shmem_xattr_set(dentry->d_inode, name, NULL, 0, XATTR_REPLACE);
-}
-
-static bool xattr_is_trusted(const char *name)
-{
-	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+	return simple_xattr_remove(&info->xattrs, name);
 }
 
 static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
 {
-	bool trusted = capable(CAP_SYS_ADMIN);
-	struct shmem_xattr *xattr;
-	struct shmem_inode_info *info;
-	size_t used = 0;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		size_t len;
-
-		/* skip "trusted." attributes for unprivileged callers */
-		if (!trusted && xattr_is_trusted(xattr->name))
-			continue;
-
-		len = strlen(xattr->name) + 1;
-		used += len;
-		if (buffer) {
-			if (size < used) {
-				used = -ERANGE;
-				break;
-			}
-			memcpy(buffer, xattr->name, len);
-			buffer += len;
-		}
-	}
-	spin_unlock(&info->lock);
-
-	return used;
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
+	return simple_xattr_list(&info->xattrs, buffer, size);
 }
 #endif /* CONFIG_TMPFS_XATTR */
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-02 14:29 ` [PATCH v3 2/3] cgroup: revise how we re-populate root directory Aristeu Rozanski
@ 2012-07-09 17:17   ` Tejun Heo
  2012-07-09 17:22     ` Tejun Heo
  0 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2012-07-09 17:17 UTC (permalink / raw)
  To: Aristeu Rozanski; +Cc: linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

On Mon, Jul 02, 2012 at 10:29:27AM -0400, Aristeu Rozanski wrote:
> From: Li Zefan <lizefan@huawei.com>
> 
> When remounting cgroupfs with some subsystems added to it and some
> removed, cgroup will remove all the files in root directory and then
> re-popluate it.
> 
> What I'm doing here is, only remove files which belong to subsystems that
> are to be unbinded, and only create files for newly-added subsystems.
> The purpose is to have all other files untouched.
> 
> This is a preparation for cgroup xattr support.
> 
> v3:
> - refresh patches after recent refactoring

Changing subsys_bits via remount is being deprecated.  No need to
worry about this.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-09 17:17   ` Tejun Heo
@ 2012-07-09 17:22     ` Tejun Heo
  2012-07-09 17:28       ` Tejun Heo
  0 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2012-07-09 17:22 UTC (permalink / raw)
  To: Aristeu Rozanski; +Cc: linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

On Mon, Jul 09, 2012 at 10:17:48AM -0700, Tejun Heo wrote:
> On Mon, Jul 02, 2012 at 10:29:27AM -0400, Aristeu Rozanski wrote:
> > From: Li Zefan <lizefan@huawei.com>
> > 
> > When remounting cgroupfs with some subsystems added to it and some
> > removed, cgroup will remove all the files in root directory and then
> > re-popluate it.
> > 
> > What I'm doing here is, only remove files which belong to subsystems that
> > are to be unbinded, and only create files for newly-added subsystems.
> > The purpose is to have all other files untouched.
> > 
> > This is a preparation for cgroup xattr support.
> > 
> > v3:
> > - refresh patches after recent refactoring
> 
> Changing subsys_bits via remount is being deprecated.  No need to
> worry about this.

Also, why does this matter?  The xattrs are kept in cgrp anyway.  Why
does keeping dentry/inode around make difference?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-09 17:22     ` Tejun Heo
@ 2012-07-09 17:28       ` Tejun Heo
  2012-07-10 19:27         ` Aristeu Rozanski
  0 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2012-07-09 17:28 UTC (permalink / raw)
  To: Aristeu Rozanski; +Cc: linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

On Mon, Jul 09, 2012 at 10:22:20AM -0700, Tejun Heo wrote:
> > Changing subsys_bits via remount is being deprecated.  No need to
> > worry about this.
> 
> Also, why does this matter?  The xattrs are kept in cgrp anyway.  Why
> does keeping dentry/inode around make difference?

Ah, okay, the file xattrs are kept in cfe which goes away on file
deletion.  :)

While changing subsys via remount is going away, I think removing
files selectively is still nice to have (even for single hierarchy).
I wish it were implemented differently tho.  As cgroup_rm_file()
already has @cft argument, wouldn't it be better to e.g. add @subsys
arg to cgroup_clear_directory() and let that walk subsys->cftsets
instead?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-09 17:28       ` Tejun Heo
@ 2012-07-10 19:27         ` Aristeu Rozanski
  2012-07-17 13:56           ` Aristeu Rozanski
  2012-07-17 18:38           ` Tejun Heo
  0 siblings, 2 replies; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-10 19:27 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

Hi Tejun,
On Mon, Jul 09, 2012 at 10:28:31AM -0700, Tejun Heo wrote:
> On Mon, Jul 09, 2012 at 10:22:20AM -0700, Tejun Heo wrote:
> > > Changing subsys_bits via remount is being deprecated.  No need to
> > > worry about this.
> > 
> > Also, why does this matter?  The xattrs are kept in cgrp anyway.  Why
> > does keeping dentry/inode around make difference?
> 
> Ah, okay, the file xattrs are kept in cfe which goes away on file
> deletion.  :)
> 
> While changing subsys via remount is going away, I think removing
> files selectively is still nice to have (even for single hierarchy).
> I wish it were implemented differently tho.  As cgroup_rm_file()
> already has @cft argument, wouldn't it be better to e.g. add @subsys
> arg to cgroup_clear_directory() and let that walk subsys->cftsets
> instead?

something like this? (will resubmit if you're ok with this version)

Index: xattr/include/linux/cgroup.h
===================================================================
--- xattr.orig/include/linux/cgroup.h	2012-07-03 15:43:43.404334484 -0400
+++ xattr/include/linux/cgroup.h	2012-07-10 13:17:32.285119770 -0400
@@ -306,6 +306,9 @@
 	 */
 	size_t max_write_len;
 
+	/* The subsystem this cgroup file belongs to */
+	struct cgroup_subsys *subsys;
+
 	/* CFTYPE_* flags */
 	unsigned int flags;
 
Index: xattr/kernel/cgroup.c
===================================================================
--- xattr.orig/kernel/cgroup.c	2012-07-03 15:43:43.404334484 -0400
+++ xattr/kernel/cgroup.c	2012-07-10 13:28:18.424657727 -0400
@@ -825,6 +825,7 @@
 static struct dentry *cgroup_lookup(struct inode *, struct dentry *, struct nameidata *);
 static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
 static int cgroup_populate_dir(struct cgroup *cgrp);
+static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits);
 static const struct inode_operations cgroup_dir_inode_operations;
 static const struct file_operations proc_cgroupstats_operations;
 
@@ -973,12 +974,23 @@
 	return -ENOENT;
 }
 
-static void cgroup_clear_directory(struct dentry *dir)
+static void cgroup_clear_directory(struct dentry *dir, bool remove_all,
+				   unsigned long removed_bits)
 {
 	struct cgroup *cgrp = __d_cgrp(dir);
+	struct cgroup_subsys *ss;
 
-	while (!list_empty(&cgrp->files))
-		cgroup_rm_file(cgrp, NULL);
+	for_each_subsys(cgrp->root, ss) {
+		struct cftype_set *set;
+		if (!remove_all && !test_bit(ss->subsys_id, &removed_bits))
+			continue;
+		list_for_each_entry(set, &ss->cftsets, node)
+			cgroup_rm_file(cgrp, set->cfts);
+	}
+	if (remove_all) {
+		while (!list_empty(&cgrp->files))
+			cgroup_rm_file(cgrp, NULL);
+	}
 }
 
 /*
@@ -988,7 +1000,7 @@
 {
 	struct dentry *parent;
 
-	cgroup_clear_directory(dentry);
+	cgroup_clear_directory(dentry, true, 0);
 
 	parent = dentry->d_parent;
 	spin_lock(&parent->d_lock);
@@ -1353,6 +1365,7 @@
 	struct cgroupfs_root *root = sb->s_fs_info;
 	struct cgroup *cgrp = &root->top_cgroup;
 	struct cgroup_sb_opts opts;
+	unsigned long added_bits, removed_bits;
 
 	mutex_lock(&cgrp->dentry->d_inode->i_mutex);
 	mutex_lock(&cgroup_mutex);
@@ -1368,6 +1381,9 @@
 		pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
 			   task_tgid_nr(current), current->comm);
 
+	added_bits = opts.subsys_bits & ~root->subsys_bits;
+	removed_bits = root->subsys_bits & ~opts.subsys_bits;
+
 	/* Don't allow flags or name to change at remount */
 	if (opts.flags != root->flags ||
 	    (opts.name && strcmp(opts.name, root->name))) {
@@ -1383,8 +1399,9 @@
 	}
 
 	/* clear out any existing files and repopulate subsystem files */
-	cgroup_clear_directory(cgrp->dentry);
-	cgroup_populate_dir(cgrp);
+	cgroup_clear_directory(cgrp->dentry, false, removed_bits);
+	/* re-populate subsystem files */
+	cgroup_repopulate_dir(cgrp, added_bits);
 
 	if (opts.release_agent)
 		strcpy(root->release_agent_path, opts.release_agent);
@@ -2696,6 +2713,8 @@
 	umode_t mode;
 	char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
 
+	cft->subsys = subsys;
+
 	/* does @cft->flags tell us to skip creation on @cgrp? */
 	if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
 		return 0;
@@ -3858,18 +3877,16 @@
 	{ }	/* terminate */
 };
 
-static int cgroup_populate_dir(struct cgroup *cgrp)
+static int __cgroup_populate_dir(struct cgroup *cgrp, unsigned long added_bits)
 {
 	int err;
 	struct cgroup_subsys *ss;
 
-	err = cgroup_addrm_files(cgrp, NULL, files, true);
-	if (err < 0)
-		return err;
-
 	/* process cftsets of each subsystem */
 	for_each_subsys(cgrp->root, ss) {
 		struct cftype_set *set;
+		if (!test_bit(ss->subsys_id, &added_bits))
+			continue;
 
 		list_for_each_entry(set, &ss->cftsets, node)
 			cgroup_addrm_files(cgrp, ss, set->cfts, true);
@@ -3890,6 +3907,22 @@
 	return 0;
 }
 
+static int cgroup_populate_dir(struct cgroup *cgrp)
+{
+	int err;
+
+	err = cgroup_addrm_files(cgrp, NULL, files, true);
+	if (err < 0)
+		return err;
+
+	return __cgroup_populate_dir(cgrp, cgrp->root->subsys_bits);
+}
+
+static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits)
+{
+	return __cgroup_populate_dir(cgrp, added_bits);
+}
+
 static void css_dput_fn(struct work_struct *work)
 {
 	struct cgroup_subsys_state *css =

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-10 19:27         ` Aristeu Rozanski
@ 2012-07-17 13:56           ` Aristeu Rozanski
  2012-07-17 18:38           ` Tejun Heo
  1 sibling, 0 replies; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-17 13:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Aristeu Rozanski, linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

Hi Tejun,
On Tue, Jul 10, 2012 at 03:27:22PM -0400, Aristeu Rozanski wrote:
> Hi Tejun,
> On Mon, Jul 09, 2012 at 10:28:31AM -0700, Tejun Heo wrote:
> > On Mon, Jul 09, 2012 at 10:22:20AM -0700, Tejun Heo wrote:
> > > > Changing subsys_bits via remount is being deprecated.  No need to
> > > > worry about this.
> > > 
> > > Also, why does this matter?  The xattrs are kept in cgrp anyway.  Why
> > > does keeping dentry/inode around make difference?
> > 
> > Ah, okay, the file xattrs are kept in cfe which goes away on file
> > deletion.  :)
> > 
> > While changing subsys via remount is going away, I think removing
> > files selectively is still nice to have (even for single hierarchy).
> > I wish it were implemented differently tho.  As cgroup_rm_file()
> > already has @cft argument, wouldn't it be better to e.g. add @subsys
> > arg to cgroup_clear_directory() and let that walk subsys->cftsets
> > instead?
> 
> something like this? (will resubmit if you're ok with this version)

had some time to look on this?

> Index: xattr/include/linux/cgroup.h
> ===================================================================
> --- xattr.orig/include/linux/cgroup.h	2012-07-03 15:43:43.404334484 -0400
> +++ xattr/include/linux/cgroup.h	2012-07-10 13:17:32.285119770 -0400
> @@ -306,6 +306,9 @@
>  	 */
>  	size_t max_write_len;
>  
> +	/* The subsystem this cgroup file belongs to */
> +	struct cgroup_subsys *subsys;
> +
>  	/* CFTYPE_* flags */
>  	unsigned int flags;
>  
> Index: xattr/kernel/cgroup.c
> ===================================================================
> --- xattr.orig/kernel/cgroup.c	2012-07-03 15:43:43.404334484 -0400
> +++ xattr/kernel/cgroup.c	2012-07-10 13:28:18.424657727 -0400
> @@ -825,6 +825,7 @@
>  static struct dentry *cgroup_lookup(struct inode *, struct dentry *, struct nameidata *);
>  static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
>  static int cgroup_populate_dir(struct cgroup *cgrp);
> +static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits);
>  static const struct inode_operations cgroup_dir_inode_operations;
>  static const struct file_operations proc_cgroupstats_operations;
>  
> @@ -973,12 +974,23 @@
>  	return -ENOENT;
>  }
>  
> -static void cgroup_clear_directory(struct dentry *dir)
> +static void cgroup_clear_directory(struct dentry *dir, bool remove_all,
> +				   unsigned long removed_bits)
>  {
>  	struct cgroup *cgrp = __d_cgrp(dir);
> +	struct cgroup_subsys *ss;
>  
> -	while (!list_empty(&cgrp->files))
> -		cgroup_rm_file(cgrp, NULL);
> +	for_each_subsys(cgrp->root, ss) {
> +		struct cftype_set *set;
> +		if (!remove_all && !test_bit(ss->subsys_id, &removed_bits))
> +			continue;
> +		list_for_each_entry(set, &ss->cftsets, node)
> +			cgroup_rm_file(cgrp, set->cfts);
> +	}
> +	if (remove_all) {
> +		while (!list_empty(&cgrp->files))
> +			cgroup_rm_file(cgrp, NULL);
> +	}
>  }
>  
>  /*
> @@ -988,7 +1000,7 @@
>  {
>  	struct dentry *parent;
>  
> -	cgroup_clear_directory(dentry);
> +	cgroup_clear_directory(dentry, true, 0);
>  
>  	parent = dentry->d_parent;
>  	spin_lock(&parent->d_lock);
> @@ -1353,6 +1365,7 @@
>  	struct cgroupfs_root *root = sb->s_fs_info;
>  	struct cgroup *cgrp = &root->top_cgroup;
>  	struct cgroup_sb_opts opts;
> +	unsigned long added_bits, removed_bits;
>  
>  	mutex_lock(&cgrp->dentry->d_inode->i_mutex);
>  	mutex_lock(&cgroup_mutex);
> @@ -1368,6 +1381,9 @@
>  		pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
>  			   task_tgid_nr(current), current->comm);
>  
> +	added_bits = opts.subsys_bits & ~root->subsys_bits;
> +	removed_bits = root->subsys_bits & ~opts.subsys_bits;
> +
>  	/* Don't allow flags or name to change at remount */
>  	if (opts.flags != root->flags ||
>  	    (opts.name && strcmp(opts.name, root->name))) {
> @@ -1383,8 +1399,9 @@
>  	}
>  
>  	/* clear out any existing files and repopulate subsystem files */
> -	cgroup_clear_directory(cgrp->dentry);
> -	cgroup_populate_dir(cgrp);
> +	cgroup_clear_directory(cgrp->dentry, false, removed_bits);
> +	/* re-populate subsystem files */
> +	cgroup_repopulate_dir(cgrp, added_bits);
>  
>  	if (opts.release_agent)
>  		strcpy(root->release_agent_path, opts.release_agent);
> @@ -2696,6 +2713,8 @@
>  	umode_t mode;
>  	char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
>  
> +	cft->subsys = subsys;
> +
>  	/* does @cft->flags tell us to skip creation on @cgrp? */
>  	if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
>  		return 0;
> @@ -3858,18 +3877,16 @@
>  	{ }	/* terminate */
>  };
>  
> -static int cgroup_populate_dir(struct cgroup *cgrp)
> +static int __cgroup_populate_dir(struct cgroup *cgrp, unsigned long added_bits)
>  {
>  	int err;
>  	struct cgroup_subsys *ss;
>  
> -	err = cgroup_addrm_files(cgrp, NULL, files, true);
> -	if (err < 0)
> -		return err;
> -
>  	/* process cftsets of each subsystem */
>  	for_each_subsys(cgrp->root, ss) {
>  		struct cftype_set *set;
> +		if (!test_bit(ss->subsys_id, &added_bits))
> +			continue;
>  
>  		list_for_each_entry(set, &ss->cftsets, node)
>  			cgroup_addrm_files(cgrp, ss, set->cfts, true);
> @@ -3890,6 +3907,22 @@
>  	return 0;
>  }
>  
> +static int cgroup_populate_dir(struct cgroup *cgrp)
> +{
> +	int err;
> +
> +	err = cgroup_addrm_files(cgrp, NULL, files, true);
> +	if (err < 0)
> +		return err;
> +
> +	return __cgroup_populate_dir(cgrp, cgrp->root->subsys_bits);
> +}
> +
> +static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits)
> +{
> +	return __cgroup_populate_dir(cgrp, added_bits);
> +}
> +
>  static void css_dput_fn(struct work_struct *work)
>  {
>  	struct cgroup_subsys_state *css =
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-10 19:27         ` Aristeu Rozanski
  2012-07-17 13:56           ` Aristeu Rozanski
@ 2012-07-17 18:38           ` Tejun Heo
  2012-07-17 21:29             ` Aristeu Rozanski
  1 sibling, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2012-07-17 18:38 UTC (permalink / raw)
  To: Aristeu Rozanski; +Cc: linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

Hello,

Sorry about the delay.

On Tue, Jul 10, 2012 at 03:27:22PM -0400, Aristeu Rozanski wrote:
> Index: xattr/include/linux/cgroup.h
> ===================================================================
> --- xattr.orig/include/linux/cgroup.h	2012-07-03 15:43:43.404334484 -0400
> +++ xattr/include/linux/cgroup.h	2012-07-10 13:17:32.285119770 -0400
> @@ -306,6 +306,9 @@
>  	 */
>  	size_t max_write_len;
>  
> +	/* The subsystem this cgroup file belongs to */
> +	struct cgroup_subsys *subsys;

I couldn't find how this is used.  How is it used?

> Index: xattr/kernel/cgroup.c
> ===================================================================
> --- xattr.orig/kernel/cgroup.c	2012-07-03 15:43:43.404334484 -0400
> +++ xattr/kernel/cgroup.c	2012-07-10 13:28:18.424657727 -0400
> @@ -825,6 +825,7 @@
>  static struct dentry *cgroup_lookup(struct inode *, struct dentry *, struct nameidata *);
>  static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
>  static int cgroup_populate_dir(struct cgroup *cgrp);
> +static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits);
>  static const struct inode_operations cgroup_dir_inode_operations;
>  static const struct file_operations proc_cgroupstats_operations;
>  
> @@ -973,12 +974,23 @@
>  	return -ENOENT;
>  }
>  
> -static void cgroup_clear_directory(struct dentry *dir)
> +static void cgroup_clear_directory(struct dentry *dir, bool remove_all,
> +				   unsigned long removed_bits)

Maybe @subsys_mask is better?  Also, why is @remove_all a separate
argument?  Can't we define, say, CGRP_ALL_SUBSYS and pass it in as
@subsys_mask?

>  {
>  	struct cgroup *cgrp = __d_cgrp(dir);
> +	struct cgroup_subsys *ss;
>  
> -	while (!list_empty(&cgrp->files))
> -		cgroup_rm_file(cgrp, NULL);
> +	for_each_subsys(cgrp->root, ss) {
> +		struct cftype_set *set;
> +		if (!remove_all && !test_bit(ss->subsys_id, &removed_bits))
> +			continue;
> +		list_for_each_entry(set, &ss->cftsets, node)
> +			cgroup_rm_file(cgrp, set->cfts);
> +	}
> +	if (remove_all) {
> +		while (!list_empty(&cgrp->files))
> +			cgroup_rm_file(cgrp, NULL);
> +	}

Ah, I see.  We don't have a bit for the base files.  I think it's a
bit confusing that there are two params controlling subsys file
removal.  Maybe reserve a bit for NULL subsys or use
@remove_base_files argument instead?

> +static int cgroup_populate_dir(struct cgroup *cgrp)
> +{
> +	int err;
> +
> +	err = cgroup_addrm_files(cgrp, NULL, files, true);
> +	if (err < 0)
> +		return err;
> +
> +	return __cgroup_populate_dir(cgrp, cgrp->root->subsys_bits);
> +}

I think it would be best to keep the clear and populate interface more
or less symmetrical so that both have @subsys_mask (and maybe
@base_files).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-02 14:29 [PATCH v3 0/3] cgroup: add xattr support Aristeu Rozanski
                   ` (2 preceding siblings ...)
  2012-07-02 14:29 ` [PATCH v3 3/3] cgroup: add xattr support Aristeu Rozanski
@ 2012-07-17 20:41 ` Tejun Heo
  2012-07-18 20:02   ` Hugh Dickins
  3 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2012-07-17 20:41 UTC (permalink / raw)
  To: Aristeu Rozanski; +Cc: linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

Hello,

On Mon, Jul 02, 2012 at 10:29:25AM -0400, Aristeu Rozanski wrote:
> This series are a refreshed version of a patchset submitted by Li Zefan back
> in march:
> 	https://lkml.org/lkml/2012/3/1/13
> 
> With Li's permission, I refreshed the patches to apply over the latest upstream
> and added the modifications suggested by others in the thread:
> - using a mount option instead of config option to enable the xattr support
> - reinitialize the list in kmem_xattrs_free()
> - renamed functions to simple_xattr_*()
> 
> Signed-off-by: Li Zefan <lizefan@huawei.com>
> Signed-off-by: Aristeu Rozanski <aris@redhat.com>

I raised this point before but I'm worried about directly exposing
kernel memory through xattr interface to userland.  Maybe it's okay as
long as !root users are kept from creating them.  I don't know.  I
really hope it used anonymous page cache instead of kmem tho.  Hugh,
would something like that be difficult?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-17 18:38           ` Tejun Heo
@ 2012-07-17 21:29             ` Aristeu Rozanski
  2012-07-17 21:40               ` Tejun Heo
  0 siblings, 1 reply; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-17 21:29 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

On Tue, Jul 17, 2012 at 11:38:51AM -0700, Tejun Heo wrote:
> Hello,
> 
> Sorry about the delay.

no worries, thanks for doing this

> On Tue, Jul 10, 2012 at 03:27:22PM -0400, Aristeu Rozanski wrote:
> > Index: xattr/include/linux/cgroup.h
> > ===================================================================
> > --- xattr.orig/include/linux/cgroup.h	2012-07-03 15:43:43.404334484 -0400
> > +++ xattr/include/linux/cgroup.h	2012-07-10 13:17:32.285119770 -0400
> > @@ -306,6 +306,9 @@
> >  	 */
> >  	size_t max_write_len;
> >  
> > +	/* The subsystem this cgroup file belongs to */
> > +	struct cgroup_subsys *subsys;
> 
> I couldn't find how this is used.  How is it used?

ignore this, I forgot it there from a previous version of the patch

> Maybe @subsys_mask is better?  Also, why is @remove_all a separate
> argument?  Can't we define, say, CGRP_ALL_SUBSYS and pass it in as
> @subsys_mask?

(...)

> Ah, I see.  We don't have a bit for the base files.  I think it's a
> bit confusing that there are two params controlling subsys file
> removal.  Maybe reserve a bit for NULL subsys or use
> @remove_base_files argument instead?

if we'd create one subsys id for base, a lot of code would have to be
changed in order to accomodate this. I think
remove_base_files/base_files will look better.

> 
> > +static int cgroup_populate_dir(struct cgroup *cgrp)
> > +{
> > +	int err;
> > +
> > +	err = cgroup_addrm_files(cgrp, NULL, files, true);
> > +	if (err < 0)
> > +		return err;
> > +
> > +	return __cgroup_populate_dir(cgrp, cgrp->root->subsys_bits);
> > +}
> 
> I think it would be best to keep the clear and populate interface more
> or less symmetrical so that both have @subsys_mask (and maybe
> @base_files).

what about this version:

--- xattr.orig/kernel/cgroup.c	2012-07-03 15:43:43.404334484 -0400
+++ xattr/kernel/cgroup.c	2012-07-17 15:49:10.536825559 -0400
@@ -824,7 +824,8 @@
 static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);
 static struct dentry *cgroup_lookup(struct inode *, struct dentry *, struct nameidata *);
 static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
-static int cgroup_populate_dir(struct cgroup *cgrp);
+static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files,
+			       unsigned long added_bits);
 static const struct inode_operations cgroup_dir_inode_operations;
 static const struct file_operations proc_cgroupstats_operations;
 
@@ -973,12 +974,23 @@
 	return -ENOENT;
 }
 
-static void cgroup_clear_directory(struct dentry *dir)
+static void cgroup_clear_directory(struct dentry *dir, bool base_files,
+				   unsigned long removed_bits)
 {
 	struct cgroup *cgrp = __d_cgrp(dir);
+	struct cgroup_subsys *ss;
 
-	while (!list_empty(&cgrp->files))
-		cgroup_rm_file(cgrp, NULL);
+	for_each_subsys(cgrp->root, ss) {
+		struct cftype_set *set;
+		if (!test_bit(ss->subsys_id, &removed_bits))
+			continue;
+		list_for_each_entry(set, &ss->cftsets, node)
+			cgroup_rm_file(cgrp, set->cfts);
+	}
+	if (base_files) {
+		while (!list_empty(&cgrp->files))
+			cgroup_rm_file(cgrp, NULL);
+	}
 }
 
 /*
@@ -987,8 +999,9 @@
 static void cgroup_d_remove_dir(struct dentry *dentry)
 {
 	struct dentry *parent;
+	struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
 
-	cgroup_clear_directory(dentry);
+	cgroup_clear_directory(dentry, true, root->subsys_bits);
 
 	parent = dentry->d_parent;
 	spin_lock(&parent->d_lock);
@@ -1353,6 +1366,7 @@
 	struct cgroupfs_root *root = sb->s_fs_info;
 	struct cgroup *cgrp = &root->top_cgroup;
 	struct cgroup_sb_opts opts;
+	unsigned long added_bits, removed_bits;
 
 	mutex_lock(&cgrp->dentry->d_inode->i_mutex);
 	mutex_lock(&cgroup_mutex);
@@ -1368,6 +1382,9 @@
 		pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
 			   task_tgid_nr(current), current->comm);
 
+	added_bits = opts.subsys_bits & ~root->subsys_bits;
+	removed_bits = root->subsys_bits & ~opts.subsys_bits;
+
 	/* Don't allow flags or name to change at remount */
 	if (opts.flags != root->flags ||
 	    (opts.name && strcmp(opts.name, root->name))) {
@@ -1383,8 +1400,9 @@
 	}
 
 	/* clear out any existing files and repopulate subsystem files */
-	cgroup_clear_directory(cgrp->dentry);
-	cgroup_populate_dir(cgrp);
+	cgroup_clear_directory(cgrp->dentry, false, removed_bits);
+	/* re-populate subsystem files */
+	cgroup_populate_dir(cgrp, false, added_bits);
 
 	if (opts.release_agent)
 		strcpy(root->release_agent_path, opts.release_agent);
@@ -1684,7 +1702,7 @@
 		BUG_ON(root->number_of_cgroups != 1);
 
 		cred = override_creds(&init_cred);
-		cgroup_populate_dir(root_cgrp);
+		cgroup_populate_dir(root_cgrp, true, root->subsys_bits);
 		revert_creds(cred);
 		mutex_unlock(&cgroup_root_mutex);
 		mutex_unlock(&cgroup_mutex);
@@ -3858,18 +3876,23 @@
 	{ }	/* terminate */
 };
 
-static int cgroup_populate_dir(struct cgroup *cgrp)
+static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files,
+			       unsigned long added_bits)
 {
 	int err;
 	struct cgroup_subsys *ss;
 
-	err = cgroup_addrm_files(cgrp, NULL, files, true);
-	if (err < 0)
-		return err;
+        if (base_files) {
+		err = cgroup_addrm_files(cgrp, NULL, files, true);
+		if (err < 0)
+			return err;
+	}
 
 	/* process cftsets of each subsystem */
 	for_each_subsys(cgrp->root, ss) {
 		struct cftype_set *set;
+		if (!test_bit(ss->subsys_id, &added_bits))
+			continue;
 
 		list_for_each_entry(set, &ss->cftsets, node)
 			cgroup_addrm_files(cgrp, ss, set->cfts, true);
@@ -4032,7 +4055,7 @@
 
 	list_add_tail(&cgrp->allcg_node, &root->allcg_list);
 
-	err = cgroup_populate_dir(cgrp);
+	err = cgroup_populate_dir(cgrp, true, root->subsys_bits);
 	/* If err < 0, we have a half-filled directory - oh well ;) */
 
 	mutex_unlock(&cgroup_mutex);


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-17 21:29             ` Aristeu Rozanski
@ 2012-07-17 21:40               ` Tejun Heo
  2012-07-18 14:16                 ` Aristeu Rozanski
  0 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2012-07-17 21:40 UTC (permalink / raw)
  To: Aristeu Rozanski; +Cc: linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

Hello,

On Tue, Jul 17, 2012 at 05:29:27PM -0400, Aristeu Rozanski wrote:
> what about this version:

Yeah, generally looks good to me although @added/removed_bits argument
names irk me a bit.  The name may be okay for local variables but I
keep thinking "what bits?".  @subsys_mask or something indicating that
it's mask of subsystems would better.  Also, can you please add /**
function comment explaining the clear/populate functions and their
arguments?

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-17 21:40               ` Tejun Heo
@ 2012-07-18 14:16                 ` Aristeu Rozanski
  2012-07-18 16:32                   ` Tejun Heo
  0 siblings, 1 reply; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-18 14:16 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

Hi Tejun,
On Tue, Jul 17, 2012 at 02:40:10PM -0700, Tejun Heo wrote:
> On Tue, Jul 17, 2012 at 05:29:27PM -0400, Aristeu Rozanski wrote:
> > what about this version:
> 
> Yeah, generally looks good to me although @added/removed_bits argument
> names irk me a bit.  The name may be okay for local variables but I
> keep thinking "what bits?".  @subsys_mask or something indicating that
> it's mask of subsystems would better.  Also, can you please add /**
> function comment explaining the clear/populate functions and their
> arguments?

agreed. although the "_bits" thing is spread all around the code. you
want that cleaned up too? (I can post a followup patch after this
patchset is done)

--- xattr.orig/kernel/cgroup.c	2012-07-03 15:43:43.404334484 -0400
+++ xattr/kernel/cgroup.c	2012-07-18 09:48:19.914320313 -0400
@@ -824,7 +824,8 @@
 static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);
 static struct dentry *cgroup_lookup(struct inode *, struct dentry *, struct nameidata *);
 static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
-static int cgroup_populate_dir(struct cgroup *cgrp);
+static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files,
+			       unsigned long subsys_mask);
 static const struct inode_operations cgroup_dir_inode_operations;
 static const struct file_operations proc_cgroupstats_operations;
 
@@ -973,12 +974,29 @@
 	return -ENOENT;
 }
 
-static void cgroup_clear_directory(struct dentry *dir)
+/**
+ * cgroup_clear_directory - selective removal of base and subsystem files
+ * @dir: directory containing the files
+ * @base_files: true if the base files should be removed
+ * @subsys_mask: mask of the subsystem ids whose files should be removed
+ */
+static void cgroup_clear_directory(struct dentry *dir, bool base_files,
+				   unsigned long subsys_mask)
 {
 	struct cgroup *cgrp = __d_cgrp(dir);
+	struct cgroup_subsys *ss;
 
-	while (!list_empty(&cgrp->files))
-		cgroup_rm_file(cgrp, NULL);
+	for_each_subsys(cgrp->root, ss) {
+		struct cftype_set *set;
+		if (!test_bit(ss->subsys_id, &subsys_mask))
+			continue;
+		list_for_each_entry(set, &ss->cftsets, node)
+			cgroup_rm_file(cgrp, set->cfts);
+	}
+	if (base_files) {
+		while (!list_empty(&cgrp->files))
+			cgroup_rm_file(cgrp, NULL);
+	}
 }
 
 /*
@@ -987,8 +1005,9 @@
 static void cgroup_d_remove_dir(struct dentry *dentry)
 {
 	struct dentry *parent;
+	struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
 
-	cgroup_clear_directory(dentry);
+	cgroup_clear_directory(dentry, true, root->subsys_bits);
 
 	parent = dentry->d_parent;
 	spin_lock(&parent->d_lock);
@@ -1353,6 +1372,7 @@
 	struct cgroupfs_root *root = sb->s_fs_info;
 	struct cgroup *cgrp = &root->top_cgroup;
 	struct cgroup_sb_opts opts;
+	unsigned long added_bits, removed_bits;
 
 	mutex_lock(&cgrp->dentry->d_inode->i_mutex);
 	mutex_lock(&cgroup_mutex);
@@ -1368,6 +1388,9 @@
 		pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
 			   task_tgid_nr(current), current->comm);
 
+	added_bits = opts.subsys_bits & ~root->subsys_bits;
+	removed_bits = root->subsys_bits & ~opts.subsys_bits;
+
 	/* Don't allow flags or name to change at remount */
 	if (opts.flags != root->flags ||
 	    (opts.name && strcmp(opts.name, root->name))) {
@@ -1383,8 +1406,9 @@
 	}
 
 	/* clear out any existing files and repopulate subsystem files */
-	cgroup_clear_directory(cgrp->dentry);
-	cgroup_populate_dir(cgrp);
+	cgroup_clear_directory(cgrp->dentry, false, removed_bits);
+	/* re-populate subsystem files */
+	cgroup_populate_dir(cgrp, false, added_bits);
 
 	if (opts.release_agent)
 		strcpy(root->release_agent_path, opts.release_agent);
@@ -1684,7 +1708,7 @@
 		BUG_ON(root->number_of_cgroups != 1);
 
 		cred = override_creds(&init_cred);
-		cgroup_populate_dir(root_cgrp);
+		cgroup_populate_dir(root_cgrp, true, root->subsys_bits);
 		revert_creds(cred);
 		mutex_unlock(&cgroup_root_mutex);
 		mutex_unlock(&cgroup_mutex);
@@ -3858,18 +3882,29 @@
 	{ }	/* terminate */
 };
 
-static int cgroup_populate_dir(struct cgroup *cgrp)
+/**
+ * cgroup_populate_dir - selectively creation of files in a directory
+ * @cgrp: target cgroup
+ * @base_files: true if the base files should be added
+ * @subsys_mask: mask of the subsystem ids whose files should be added
+ */
+static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files,
+			       unsigned long subsys_mask)
 {
 	int err;
 	struct cgroup_subsys *ss;
 
-	err = cgroup_addrm_files(cgrp, NULL, files, true);
-	if (err < 0)
-		return err;
+        if (base_files) {
+		err = cgroup_addrm_files(cgrp, NULL, files, true);
+		if (err < 0)
+			return err;
+	}
 
 	/* process cftsets of each subsystem */
 	for_each_subsys(cgrp->root, ss) {
 		struct cftype_set *set;
+		if (!test_bit(ss->subsys_id, &subsys_mask))
+			continue;
 
 		list_for_each_entry(set, &ss->cftsets, node)
 			cgroup_addrm_files(cgrp, ss, set->cfts, true);
@@ -4032,7 +4067,7 @@
 
 	list_add_tail(&cgrp->allcg_node, &root->allcg_list);
 
-	err = cgroup_populate_dir(cgrp);
+	err = cgroup_populate_dir(cgrp, true, root->subsys_bits);
 	/* If err < 0, we have a half-filled directory - oh well ;) */
 
 	mutex_unlock(&cgroup_mutex);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/3] cgroup: revise how we re-populate root directory
  2012-07-18 14:16                 ` Aristeu Rozanski
@ 2012-07-18 16:32                   ` Tejun Heo
  0 siblings, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2012-07-18 16:32 UTC (permalink / raw)
  To: Aristeu Rozanski; +Cc: linux-kernel, Li Zefan, Hugh Dickins, Hillf Danton

Hello,

On Wed, Jul 18, 2012 at 10:16:05AM -0400, Aristeu Rozanski wrote:
> > Yeah, generally looks good to me although @added/removed_bits argument
> > names irk me a bit.  The name may be okay for local variables but I
> > keep thinking "what bits?".  @subsys_mask or something indicating that
> > it's mask of subsystems would better.  Also, can you please add /**
> > function comment explaining the clear/populate functions and their
> > arguments?
> 
> agreed. although the "_bits" thing is spread all around the code. you
> want that cleaned up too? (I can post a followup patch after this
> patchset is done)

Yeah, that would be great.

Thanks a lot!

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-17 20:41 ` [PATCH v3 0/3] " Tejun Heo
@ 2012-07-18 20:02   ` Hugh Dickins
  2012-07-18 22:10     ` Tejun Heo
  0 siblings, 1 reply; 27+ messages in thread
From: Hugh Dickins @ 2012-07-18 20:02 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Aristeu Rozanski, linux-kernel, Li Zefan, Hillf Danton

On Tue, 17 Jul 2012, Tejun Heo wrote:
> On Mon, Jul 02, 2012 at 10:29:25AM -0400, Aristeu Rozanski wrote:
> > This series are a refreshed version of a patchset submitted by Li Zefan back
> > in march:
> > 	https://lkml.org/lkml/2012/3/1/13
> > 
> > With Li's permission, I refreshed the patches to apply over the latest upstream
> > and added the modifications suggested by others in the thread:
> > - using a mount option instead of config option to enable the xattr support
> > - reinitialize the list in kmem_xattrs_free()
> > - renamed functions to simple_xattr_*()
> > 
> > Signed-off-by: Li Zefan <lizefan@huawei.com>
> > Signed-off-by: Aristeu Rozanski <aris@redhat.com>
> 
> I raised this point before but I'm worried about directly exposing
> kernel memory through xattr interface to userland.

Quite rightly.

> Maybe it's okay as long as !root users are kept from creating them.

That's how I see it with tmpfs.

> I don't know.  I
> really hope it used anonymous page cache instead of kmem tho.  Hugh,
> would something like that be difficult?

Yes, it would be difficult.

You don't use the word "swappable", but I take that to be implicit
when you say "anonymous ... instead of kmem": it might as well be
kmem if it cannot be swapped.

So you're wondering whether to introduce a new category of swappable
memory: not the original anon pages mapped into userspace, not shmem
use of swappable pages, but xattrs in the cgroup filesystem?

I think it would be foolish to add that dimension of complication
just for cgroupfs xattrs - the shmem/tmpfs case gives enough surprises
as it is.

If we were going to consider swappable kernel memory, we'd be choosing
easier targets (page tables and stacks), than something that at present
is being served by slab objects.

Or am I misunderstanding, or looking at this from the wrong angle?

If there's not a reasonable upper bound on what gets stored here
(did I see the word "icon" earlier in the thread?  which made me
think people would be loading their photo albums into these xattrs),
maybe the problem does need turning around.

Let userspace manage a tmpfs hierarchy parallel to the cgroupfs one?

Hugh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-18 20:02   ` Hugh Dickins
@ 2012-07-18 22:10     ` Tejun Heo
  2012-07-19  1:11       ` Hugh Dickins
  0 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2012-07-18 22:10 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Aristeu Rozanski, linux-kernel, Li Zefan, Hillf Danton

Hello, Hugh.

On Wed, Jul 18, 2012 at 01:02:03PM -0700, Hugh Dickins wrote:
> > I don't know.  I
> > really hope it used anonymous page cache instead of kmem tho.  Hugh,
> > would something like that be difficult?
> 
> Yes, it would be difficult.
> 
> You don't use the word "swappable", but I take that to be implicit
> when you say "anonymous ... instead of kmem": it might as well be
> kmem if it cannot be swapped.

Yeah, I just am not mm-savvy enough to be able to use the correct
term. :)

> So you're wondering whether to introduce a new category of swappable
> memory: not the original anon pages mapped into userspace, not shmem
> use of swappable pages, but xattrs in the cgroup filesystem?

But why do we need something completely new?  Can't we hijack some
inodes used by tmpfs and use them for xattr storage?  ie. Would it be
difficult to use tmpfs as backend storage for on-memory xattr?  With
that, we would already have the mechanism and interface(!) for
limiting the size.

> Or am I misunderstanding, or looking at this from the wrong angle?
> 
> If there's not a reasonable upper bound on what gets stored here
> (did I see the word "icon" earlier in the thread?  which made me
> think people would be loading their photo albums into these xattrs),
> maybe the problem does need turning around.
> 
> Let userspace manage a tmpfs hierarchy parallel to the cgroupfs one?

IIRC xattr for cgroupfs was suggested from systemd which currently is
using tmpfs to manage parallel hierarchy.  It's reportedly cumbersome
to keep in sync and they would much prefer if they can store metadata
right inside cgroup.  If we can have a shared implementation which
doesn't complicate each pseudo filesystem much, I think why not.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-18 22:10     ` Tejun Heo
@ 2012-07-19  1:11       ` Hugh Dickins
  2012-07-20 17:59         ` Aristeu Rozanski
  2012-07-20 18:10         ` Tejun Heo
  0 siblings, 2 replies; 27+ messages in thread
From: Hugh Dickins @ 2012-07-19  1:11 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Aristeu Rozanski, linux-kernel, Li Zefan, Hillf Danton

On Wed, 18 Jul 2012, Tejun Heo wrote:
> On Wed, Jul 18, 2012 at 01:02:03PM -0700, Hugh Dickins wrote:
> > > I don't know.  I
> > > really hope it used anonymous page cache instead of kmem tho.  Hugh,
> > > would something like that be difficult?
> > 
> > Yes, it would be difficult.
> > 
> > You don't use the word "swappable", but I take that to be implicit
> > when you say "anonymous ... instead of kmem": it might as well be
> > kmem if it cannot be swapped.
> 
> Yeah, I just am not mm-savvy enough to be able to use the correct
> term. :)

No, I wasn't trying to correct you, just making that aspect explicit;
I don't often hear anyone else use the term "swappable" at all.

> 
> > So you're wondering whether to introduce a new category of swappable
> > memory: not the original anon pages mapped into userspace, not shmem
> > use of swappable pages, but xattrs in the cgroup filesystem?
> 
> But why do we need something completely new?  Can't we hijack some
> inodes used by tmpfs and use them for xattr storage?  ie. Would it be
> difficult to use tmpfs as backend storage for on-memory xattr?  With
> that, we would already have the mechanism and interface(!) for
> limiting the size.

That sounds just like what I was suggesting in my last sentence:
let userspace manage a tmpfs hierarchy parallel to the cgroupfs one.

Except, perhaps, where I assume "userspace" should be doing the hard work.

> 
> > Or am I misunderstanding, or looking at this from the wrong angle?
> > 
> > If there's not a reasonable upper bound on what gets stored here
> > (did I see the word "icon" earlier in the thread?  which made me
> > think people would be loading their photo albums into these xattrs),
> > maybe the problem does need turning around.
> > 
> > Let userspace manage a tmpfs hierarchy parallel to the cgroupfs one?
> 
> IIRC xattr for cgroupfs was suggested from systemd which currently is
> using tmpfs to manage parallel hierarchy.  It's reportedly cumbersome
> to keep in sync and they would much prefer if they can store metadata
> right inside cgroup.  If we can have a shared implementation which
> doesn't complicate each pseudo filesystem much, I think why not.

So systemd is already doing it as I suggested, but finds that awkward in
some respects, and wants more help from the kernel in synchronization.

Stuffing an unbounded amount of data into cgroupfs xattrs doesn't sound
the right way to go.

I wonder if it could be turned completely on its head, and the cgroupfs
parts be represented in xattrs on a tmpfs hierarchy?  The data contents of
the tmpfs files being entirely up to userspace i.e. systemd in ths case.

Hugh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-19  1:11       ` Hugh Dickins
@ 2012-07-20 17:59         ` Aristeu Rozanski
  2012-07-20 18:04           ` Tejun Heo
  2012-07-22 19:12           ` Hugh Dickins
  2012-07-20 18:10         ` Tejun Heo
  1 sibling, 2 replies; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-20 17:59 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Tejun Heo, Aristeu Rozanski, linux-kernel, Li Zefan, Hillf Danton

Hugh,
On Wed, Jul 18, 2012 at 06:11:32PM -0700, Hugh Dickins wrote:
> > But why do we need something completely new?  Can't we hijack some
> > inodes used by tmpfs and use them for xattr storage?  ie. Would it be
> > difficult to use tmpfs as backend storage for on-memory xattr?  With
> > that, we would already have the mechanism and interface(!) for
> > limiting the size.
> 
> That sounds just like what I was suggesting in my last sentence:
> let userspace manage a tmpfs hierarchy parallel to the cgroupfs one.
> 
> Except, perhaps, where I assume "userspace" should be doing the hard work.

hm, not sure that's what Tejun meant. tmpfs uses anonymous memory for the file
contents, so reuse that infrastructure to allocate space for the extended
attributes the same way, instead of using kmem.

First thing I can think of is to use whole pages for it to prevent further
complexity. Shouldn't make much difference considering the usecases we have
now (systemd and containers), right?

--
Aristeu


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-20 17:59         ` Aristeu Rozanski
@ 2012-07-20 18:04           ` Tejun Heo
  2012-08-07 15:22             ` Aristeu Rozanski
  2012-07-22 19:12           ` Hugh Dickins
  1 sibling, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2012-07-20 18:04 UTC (permalink / raw)
  To: Aristeu Rozanski
  Cc: Hugh Dickins, Aristeu Rozanski, linux-kernel, Li Zefan, Hillf Danton

Hello, Hugh, Aristeu.

On Fri, Jul 20, 2012 at 01:59:59PM -0400, Aristeu Rozanski wrote:
> hm, not sure that's what Tejun meant. tmpfs uses anonymous memory for the file
> contents, so reuse that infrastructure to allocate space for the extended
> attributes the same way, instead of using kmem.
> 
> First thing I can think of is to use whole pages for it to prevent further
> complexity. Shouldn't make much difference considering the usecases we have
> now (systemd and containers), right?

Yeah, that's what I meant.  The internal fragmentation is ugly but I
think that should do for now at least.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-19  1:11       ` Hugh Dickins
  2012-07-20 17:59         ` Aristeu Rozanski
@ 2012-07-20 18:10         ` Tejun Heo
  1 sibling, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2012-07-20 18:10 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Aristeu Rozanski, linux-kernel, Li Zefan, Hillf Danton

Hello, Hugh.

On Wed, Jul 18, 2012 at 06:11:32PM -0700, Hugh Dickins wrote:
> So systemd is already doing it as I suggested, but finds that awkward in
> some respects, and wants more help from the kernel in synchronization.

Yeah, pretty much.

> Stuffing an unbounded amount of data into cgroupfs xattrs doesn't sound
> the right way to go.
> 
> I wonder if it could be turned completely on its head, and the cgroupfs
> parts be represented in xattrs on a tmpfs hierarchy?  The data contents of
> the tmpfs files being entirely up to userspace i.e. systemd in ths case.

Probably not.  We can't change the userland-visible cgroup interface
and the cgroup filesystem is unfortunately ***DEEPLY*** (yes, it's
crazily deep) entangled with cgroup core implementation. :(

Longer term goal is to factor out sysfs from kobject / driver model
and share it between cgroup and sysfs.  I think it's generally not a
bad idea to support xattr on pseudo filesystems if it isn't too hairy
to support.  Low level system management software should be able to
make pretty good use of them.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-20 17:59         ` Aristeu Rozanski
  2012-07-20 18:04           ` Tejun Heo
@ 2012-07-22 19:12           ` Hugh Dickins
  2012-07-23 18:12             ` Aristeu Rozanski
  1 sibling, 1 reply; 27+ messages in thread
From: Hugh Dickins @ 2012-07-22 19:12 UTC (permalink / raw)
  To: Aristeu Rozanski
  Cc: Tejun Heo, Aristeu Rozanski, linux-kernel, Li Zefan, Hillf Danton

On Fri, 20 Jul 2012, Aristeu Rozanski wrote:
> On Wed, Jul 18, 2012 at 06:11:32PM -0700, Hugh Dickins wrote:
> > > But why do we need something completely new?  Can't we hijack some
> > > inodes used by tmpfs and use them for xattr storage?  ie. Would it be
> > > difficult to use tmpfs as backend storage for on-memory xattr?  With
> > > that, we would already have the mechanism and interface(!) for
> > > limiting the size.
> > 
> > That sounds just like what I was suggesting in my last sentence:
> > let userspace manage a tmpfs hierarchy parallel to the cgroupfs one.
> > 
> > Except, perhaps, where I assume "userspace" should be doing the hard work.
> 
> hm, not sure that's what Tejun meant. tmpfs uses anonymous memory for the file
> contents, so reuse that infrastructure to allocate space for the extended
> attributes the same way, instead of using kmem.
> 
> First thing I can think of is to use whole pages for it to prevent further
> complexity. Shouldn't make much difference considering the usecases we have
> now (systemd and containers), right?

Please, do not do this.

It may be fun to implement, but not to review and maintain.

If we're going to start supporting swappable kernel memory, tmpfs
xattrs is not the right place to start, and libfs xattrs certainly not:
they are a poor fit for swappable memory.  (You contemplate using whole
pages above: that will not be very kind to those without swap.)

By all means continue Zefan's work to move xattr support from tmpfs
to libfs (ah, to fs/xattr.c actually, okay), but keep them as kmem.

Support setting and removing user xattrs only if the user has the
appropriate capability (which root will have): looking through the
list of existing capabilities, CAP_IPC_LOCK actually looks appropriate,
although I admit its name certainly does not - it's the "lock down
unlimited amounts of memory" capability.

And support setting and removing user xattrs only if the filesystem
opts in to that: so cgroupfs can opt in, everything else stay out,
and we know where to look when memory goes missing.

Will "lsattr -R" in the cgroupfs mountpoint do enough to judge how
much memory is being used in this way?  I expect not, but I'm
unfamliar with it: you may need to show counts elsewhere.

If we keep an eye on those counts as systemd starts to make use of
this feature, perhaps a real case for making this memory swappable
will emerge; but more likely, a case for systemd to be economical
with them - they may be good for storing paths to data blobs, but
I doubt they're good for large blobs.

Hugh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-22 19:12           ` Hugh Dickins
@ 2012-07-23 18:12             ` Aristeu Rozanski
  2012-07-24 18:28               ` Tejun Heo
  0 siblings, 1 reply; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-23 18:12 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Tejun Heo, Aristeu Rozanski, linux-kernel, Li Zefan, Hillf Danton

On Sun, Jul 22, 2012 at 12:12:07PM -0700, Hugh Dickins wrote:
> Please, do not do this.
> 
> It may be fun to implement, but not to review and maintain.
> 
> If we're going to start supporting swappable kernel memory, tmpfs
> xattrs is not the right place to start, and libfs xattrs certainly not:
> they are a poor fit for swappable memory.  (You contemplate using whole
> pages above: that will not be very kind to those without swap.)
> 
> By all means continue Zefan's work to move xattr support from tmpfs
> to libfs (ah, to fs/xattr.c actually, okay), but keep them as kmem.
> 
> Support setting and removing user xattrs only if the user has the
> appropriate capability (which root will have): looking through the
> list of existing capabilities, CAP_IPC_LOCK actually looks appropriate,
> although I admit its name certainly does not - it's the "lock down
> unlimited amounts of memory" capability.

ok, will have to do it in cgroupfs because it allows user prefixes, not the
case on tmpfs.

> And support setting and removing user xattrs only if the filesystem
> opts in to that: so cgroupfs can opt in, everything else stay out,
> and we know where to look when memory goes missing.

hm, for tmpfs, there's a config option for it, while cgroup has a mount
option, default off.

> Will "lsattr -R" in the cgroupfs mountpoint do enough to judge how
> much memory is being used in this way?  I expect not, but I'm
> unfamliar with it: you may need to show counts elsewhere.

that's for ext{2,3,4} file attributes, not extended attributes. but agreed,
there's a need to have this stat somewhere. Tejun, any ideas?

-- 
Aristeu


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-23 18:12             ` Aristeu Rozanski
@ 2012-07-24 18:28               ` Tejun Heo
  2012-07-24 21:44                 ` Aristeu Rozanski
  0 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2012-07-24 18:28 UTC (permalink / raw)
  To: Aristeu Rozanski
  Cc: Hugh Dickins, Aristeu Rozanski, linux-kernel, Li Zefan, Hillf Danton

Hello,

On Mon, Jul 23, 2012 at 02:12:52PM -0400, Aristeu Rozanski wrote:
> > Will "lsattr -R" in the cgroupfs mountpoint do enough to judge how
> > much memory is being used in this way?  I expect not, but I'm
> > unfamliar with it: you may need to show counts elsewhere.
> 
> that's for ext{2,3,4} file attributes, not extended attributes. but agreed,
> there's a need to have this stat somewhere. Tejun, any ideas?

No idea.  Don't we need some mechanism to limit the amount of memory
consumed too?  Also, do you know what type of metadata systemd is
trying to store in cgroupfs?  Depending on the size requirement, it
might not be worth it to implement it using kernel memory.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-24 18:28               ` Tejun Heo
@ 2012-07-24 21:44                 ` Aristeu Rozanski
  0 siblings, 0 replies; 27+ messages in thread
From: Aristeu Rozanski @ 2012-07-24 21:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Hugh Dickins, Aristeu Rozanski, linux-kernel, Li Zefan,
	Hillf Danton, Kay Sievers

On Tue, Jul 24, 2012 at 11:28:09AM -0700, Tejun Heo wrote:
> On Mon, Jul 23, 2012 at 02:12:52PM -0400, Aristeu Rozanski wrote:
> > > Will "lsattr -R" in the cgroupfs mountpoint do enough to judge how
> > > much memory is being used in this way?  I expect not, but I'm
> > > unfamliar with it: you may need to show counts elsewhere.
> > 
> > that's for ext{2,3,4} file attributes, not extended attributes. but agreed,
> > there's a need to have this stat somewhere. Tejun, any ideas?
> 
> No idea.  Don't we need some mechanism to limit the amount of memory
> consumed too?  Also, do you know what type of metadata systemd is
> trying to store in cgroupfs?  Depending on the size requirement, it
> might not be worth it to implement it using kernel memory.

AFAIK, the memcg should be able to account for slab usage too already or soon.
Talked briefly with one of systemd developers, Kay Sievers (Cc'd), and they
initially would have the main PID of the cgroup (systemd uses a cgroup per
service), so I suspect the usage will be low.

Another use is trying to use selinux to limit the access to cgroups for using
with containers.

-- 
Aristeu


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/3] cgroup: add xattr support
  2012-07-20 18:04           ` Tejun Heo
@ 2012-08-07 15:22             ` Aristeu Rozanski
  0 siblings, 0 replies; 27+ messages in thread
From: Aristeu Rozanski @ 2012-08-07 15:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Hugh Dickins, Aristeu Rozanski, linux-kernel, Li Zefan,
	Hillf Danton, lennart

(Added Lennart to the discussion so he can help explaining how systemd will
use the xattrs)

On Fri, Jul 20, 2012 at 11:04:59AM -0700, Tejun Heo wrote:
> Hello, Hugh, Aristeu.
> 
> On Fri, Jul 20, 2012 at 01:59:59PM -0400, Aristeu Rozanski wrote:
> > hm, not sure that's what Tejun meant. tmpfs uses anonymous memory for the file
> > contents, so reuse that infrastructure to allocate space for the extended
> > attributes the same way, instead of using kmem.
> > 
> > First thing I can think of is to use whole pages for it to prevent further
> > complexity. Shouldn't make much difference considering the usecases we have
> > now (systemd and containers), right?
> 
> Yeah, that's what I meant.  The internal fragmentation is ugly but I
> think that should do for now at least.

while talking to Lennart, it seems there're a lot of ways the xattrs can be
used in systemd. If we're concerned with the amount of memory used, why not
create a new namespace just for this, limiting by definition the maximum value
size?

-- 
Aristeu


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2012-08-07 15:22 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-02 14:29 [PATCH v3 0/3] cgroup: add xattr support Aristeu Rozanski
2012-07-02 14:29 ` [PATCH v3 1/3] xattr: extract simple_xattr code from tmpfs Aristeu Rozanski
2012-07-03 16:53   ` [PATCH v4 " Aristeu Rozanski
2012-07-02 14:29 ` [PATCH v3 2/3] cgroup: revise how we re-populate root directory Aristeu Rozanski
2012-07-09 17:17   ` Tejun Heo
2012-07-09 17:22     ` Tejun Heo
2012-07-09 17:28       ` Tejun Heo
2012-07-10 19:27         ` Aristeu Rozanski
2012-07-17 13:56           ` Aristeu Rozanski
2012-07-17 18:38           ` Tejun Heo
2012-07-17 21:29             ` Aristeu Rozanski
2012-07-17 21:40               ` Tejun Heo
2012-07-18 14:16                 ` Aristeu Rozanski
2012-07-18 16:32                   ` Tejun Heo
2012-07-02 14:29 ` [PATCH v3 3/3] cgroup: add xattr support Aristeu Rozanski
2012-07-17 20:41 ` [PATCH v3 0/3] " Tejun Heo
2012-07-18 20:02   ` Hugh Dickins
2012-07-18 22:10     ` Tejun Heo
2012-07-19  1:11       ` Hugh Dickins
2012-07-20 17:59         ` Aristeu Rozanski
2012-07-20 18:04           ` Tejun Heo
2012-08-07 15:22             ` Aristeu Rozanski
2012-07-22 19:12           ` Hugh Dickins
2012-07-23 18:12             ` Aristeu Rozanski
2012-07-24 18:28               ` Tejun Heo
2012-07-24 21:44                 ` Aristeu Rozanski
2012-07-20 18:10         ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.