All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] cgroup: add xattr support
@ 2012-03-01  6:16 Li Zefan
  2012-03-01  6:17   ` Li Zefan
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-01  6:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Lennart Poettering, Kay Sievers, Hugh Dickins,
	LKML, Cgroups, Eric Paris

This can be used to attach metadata to a cgroup, which is required
by systemd.

As lazy as I am, here just quoted from Lennart for the detailed use
cases:

>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart

Changelog v1->v2:
- add kmem_xattr APIs
- use these new APIs in both tmpfs and cgroup

Early discussions can be found in this thread:

https://lkml.org/lkml/2012/1/16/51

Note: this patchset is based on Tejun's cgroup.git/for-next

--
 fs/xattr.c               |  167 +++++++++++++++++++++++++++++++++++++++++
 include/linux/cgroup.h   |   18 +++-
 include/linux/shmem_fs.h |    3 +-
 include/linux/xattr.h    |   23 ++++++
 init/Kconfig             |   12 +++
 kernel/cgroup.c          |  184 ++++++++++++++++++++++++++++++++++++----------
 mm/shmem.c               |  151 +++-----------------------------------
 7 files changed, 371 insertions(+), 187 deletions(-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/3] xattr: extract kmem_xattr code from tmpfs
@ 2012-03-01  6:17   ` Li Zefan
  0 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-01  6:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Lennart Poettering, Kay Sievers, Hugh Dickins,
	LKML, Cgroups, Eric Paris

Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.

$ size vmlinux.o
   text    data     bss     dec     hex filename
5480650  655744 7039512 13175906         c90c62 vmlinux.o
$ size vmlinux.o
   text    data     bss     dec     hex filename
5480427  655744 7039448 13175619         c90b43 vmlinux.o

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 fs/xattr.c               |  167 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/shmem_fs.h |    3 +-
 include/linux/xattr.h    |   23 ++++++
 mm/shmem.c               |  151 +++--------------------------------------
 4 files changed, 202 insertions(+), 142 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 82f4337..8af1b7f 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -757,3 +757,170 @@ EXPORT_SYMBOL(generic_getxattr);
 EXPORT_SYMBOL(generic_listxattr);
 EXPORT_SYMBOL(generic_setxattr);
 EXPORT_SYMBOL(generic_removexattr);
+
+/*
+ * initialize the kmem_xattrs structure
+ */
+void kmem_xattrs_init(struct kmem_xattrs *xattrs)
+{
+	INIT_LIST_HEAD(&xattrs->head);
+	spin_lock_init(&xattrs->lock);
+}
+
+/*
+ * free all the xattrs
+ */
+void kmem_xattrs_free(struct kmem_xattrs *xattrs)
+{
+	struct kmem_xattr *xattr, *node;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
+		kfree(xattr->name);
+		kfree(xattr);
+	}
+	spin_unlock(&xattrs->lock);
+}
+
+/*
+ * xattr GET operation for in-memory/pseudo filesystems
+ */
+int kmem_xattr_get(struct kmem_xattrs *xattrs, const char *name,
+		   void *buffer, size_t size)
+{
+	struct kmem_xattr *xattr;
+	int ret = -ENODATA;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		if (strcmp(name, xattr->name))
+			continue;
+
+		ret = xattr->size;
+		if (buffer) {
+			if (size < xattr->size)
+				ret = -ERANGE;
+			else
+				memcpy(buffer, xattr->value, xattr->size);
+		}
+		break;
+	}
+	spin_unlock(&xattrs->lock);
+	return ret;
+}
+
+static int __kmem_xattr_set(struct kmem_xattrs *xattrs, const char *name,
+			    const void *value, size_t size, int flags)
+{
+	struct kmem_xattr *xattr;
+	struct kmem_xattr *new_xattr = NULL;
+	size_t len;
+	int err = 0;
+
+	/* value == NULL means remove */
+	if (value) {
+		/* wrap around? */
+		len = sizeof(*new_xattr) + size;
+		if (len <= sizeof(*new_xattr))
+			return -ENOMEM;
+
+		new_xattr = kmalloc(len, GFP_KERNEL);
+		if (!new_xattr)
+			return -ENOMEM;
+
+		new_xattr->name = kstrdup(name, GFP_KERNEL);
+		if (!new_xattr->name) {
+			kfree(new_xattr);
+			return -ENOMEM;
+		}
+
+		new_xattr->size = size;
+		memcpy(new_xattr->value, value, size);
+	}
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		if (!strcmp(name, xattr->name)) {
+			if (flags & XATTR_CREATE) {
+				xattr = new_xattr;
+				err = -EEXIST;
+			} else if (new_xattr) {
+				list_replace(&xattr->list, &new_xattr->list);
+			} else {
+				list_del(&xattr->list);
+			}
+			goto out;
+		}
+	}
+	if (flags & XATTR_REPLACE) {
+		xattr = new_xattr;
+		err = -ENODATA;
+	} else {
+		list_add(&new_xattr->list, &xattrs->head);
+		xattr = NULL;
+	}
+out:
+	spin_unlock(&xattrs->lock);
+	if (xattr)
+		kfree(xattr->name);
+	kfree(xattr);
+	return err;
+
+}
+
+/*
+ * xattr SET operation for in-memory/pseudo filesystems
+ */
+int kmem_xattr_set(struct kmem_xattrs *xattrs, const char *name,
+		   const void *value, size_t size, int flags)
+{
+	if (size == 0)
+		value = ""; /* empty EA, do not remove */
+	return __kmem_xattr_set(xattrs, name, value, size, flags);
+}
+
+/*
+ * xattr REMOVE operation for in-memory/pseudo filesystems
+ */
+int kmem_xattr_remove(struct kmem_xattrs *xattrs, const char *name)
+{
+        return __kmem_xattr_set(xattrs, name, NULL, 0, XATTR_REPLACE);
+}
+
+static bool xattr_is_trusted(const char *name)
+{
+	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+}
+
+/*
+ * xattr LIST operation for in-memory/pseudo filesystems
+ */
+ssize_t kmem_xattr_list(struct kmem_xattrs *xattrs, char *buffer, size_t size)
+{
+	bool trusted = capable(CAP_SYS_ADMIN);
+	struct kmem_xattr *xattr;
+	size_t used = 0;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		size_t len;
+
+		/* skip "trusted." attributes for unprivileged callers */
+		if (!trusted && xattr_is_trusted(xattr->name))
+			continue;
+
+		len = strlen(xattr->name) + 1;
+		used += len;
+		if (buffer) {
+			if (size < used) {
+				used = -ERANGE;
+				break;
+			}
+			memcpy(buffer, xattr->name, len);
+			buffer += len;
+		}
+	}
+	spin_unlock(&xattrs->lock);
+
+	return used;
+}
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index e4c711c..6d006b6 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -5,6 +5,7 @@
 #include <linux/mempolicy.h>
 #include <linux/pagemap.h>
 #include <linux/percpu_counter.h>
+#include <linux/xattr.h>
 
 /* inode in-kernel data */
 
@@ -18,7 +19,7 @@ struct shmem_inode_info {
 	};
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
 	struct list_head	swaplist;	/* chain of maybes on swap */
-	struct list_head	xattr_list;	/* list of shmem_xattr */
+	struct kmem_xattrs	xattrs;		/* list of xattrs */
 	struct inode		vfs_inode;
 };
 
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index e5d1220..4d07f97 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -60,6 +60,7 @@
 #ifdef  __KERNEL__
 
 #include <linux/types.h>
+#include <linux/spinlock.h>
 
 struct inode;
 struct dentry;
@@ -96,6 +97,28 @@ ssize_t vfs_getxattr_alloc(struct dentry *dentry, const char *name,
 			   char **xattr_value, size_t size, gfp_t flags);
 int vfs_xattr_cmp(struct dentry *dentry, const char *xattr_name,
 		  const char *value, size_t size, gfp_t flags);
+
+struct kmem_xattrs {
+	struct list_head head;
+	spinlock_t lock;
+};
+
+struct kmem_xattr {
+	struct list_head list;
+	char *name;
+	size_t size;
+	char value[0];
+};
+
+void kmem_xattrs_init(struct kmem_xattrs *xattrs);
+void kmem_xattrs_free(struct kmem_xattrs *xattrs);
+int kmem_xattr_get(struct kmem_xattrs *xattrs, const char *name,
+		   void *buffer, size_t size);
+int kmem_xattr_set(struct kmem_xattrs *xattrs, const char *name,
+		   const void *value, size_t size, int flags);
+int kmem_xattr_remove(struct kmem_xattrs *xattrs, const char *name);
+ssize_t kmem_xattr_list(struct kmem_xattrs *xattrs, char *buffer, size_t size);
+
 #endif  /*  __KERNEL__  */
 
 #endif	/* _LINUX_XATTR_H */
diff --git a/mm/shmem.c b/mm/shmem.c
index feead19..2e1e416 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -76,13 +76,6 @@ static struct vfsmount *shm_mnt;
 /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
 #define SHORT_SYMLINK_LEN 128
 
-struct shmem_xattr {
-	struct list_head list;	/* anchored by shmem_inode_info->xattr_list */
-	char *name;		/* xattr name */
-	size_t size;
-	char value[0];
-};
-
 /* Flag allocation requirements to shmem_getpage */
 enum sgp_type {
 	SGP_READ,	/* don't exceed i_size, don't allocate page */
@@ -545,7 +538,6 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr, *nxattr;
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -559,10 +551,7 @@ static void shmem_evict_inode(struct inode *inode)
 	} else
 		kfree(info->symlink);
 
-	list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
-		kfree(xattr->name);
-		kfree(xattr);
-	}
+	kmem_xattrs_free(&info->xattrs);
 	BUG_ON(inode->i_blocks);
 	shmem_free_inode(inode->i_sb);
 	end_writeback(inode);
@@ -1114,7 +1103,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 		spin_lock_init(&info->lock);
 		info->flags = flags & VM_NORESERVE;
 		INIT_LIST_HEAD(&info->swaplist);
-		INIT_LIST_HEAD(&info->xattr_list);
+		kmem_xattrs_init(&info->xattrs);
 		cache_no_acl(inode);
 
 		switch (mode & S_IFMT) {
@@ -1679,93 +1668,6 @@ static void shmem_put_link(struct dentry *dentry, struct nameidata *nd, void *co
  * filesystem level, though.
  */
 
-static int shmem_xattr_get(struct dentry *dentry, const char *name,
-			   void *buffer, size_t size)
-{
-	struct shmem_inode_info *info;
-	struct shmem_xattr *xattr;
-	int ret = -ENODATA;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (strcmp(name, xattr->name))
-			continue;
-
-		ret = xattr->size;
-		if (buffer) {
-			if (size < xattr->size)
-				ret = -ERANGE;
-			else
-				memcpy(buffer, xattr->value, xattr->size);
-		}
-		break;
-	}
-	spin_unlock(&info->lock);
-	return ret;
-}
-
-static int shmem_xattr_set(struct dentry *dentry, const char *name,
-			   const void *value, size_t size, int flags)
-{
-	struct inode *inode = dentry->d_inode;
-	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr;
-	struct shmem_xattr *new_xattr = NULL;
-	size_t len;
-	int err = 0;
-
-	/* value == NULL means remove */
-	if (value) {
-		/* wrap around? */
-		len = sizeof(*new_xattr) + size;
-		if (len <= sizeof(*new_xattr))
-			return -ENOMEM;
-
-		new_xattr = kmalloc(len, GFP_KERNEL);
-		if (!new_xattr)
-			return -ENOMEM;
-
-		new_xattr->name = kstrdup(name, GFP_KERNEL);
-		if (!new_xattr->name) {
-			kfree(new_xattr);
-			return -ENOMEM;
-		}
-
-		new_xattr->size = size;
-		memcpy(new_xattr->value, value, size);
-	}
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (!strcmp(name, xattr->name)) {
-			if (flags & XATTR_CREATE) {
-				xattr = new_xattr;
-				err = -EEXIST;
-			} else if (new_xattr) {
-				list_replace(&xattr->list, &new_xattr->list);
-			} else {
-				list_del(&xattr->list);
-			}
-			goto out;
-		}
-	}
-	if (flags & XATTR_REPLACE) {
-		xattr = new_xattr;
-		err = -ENODATA;
-	} else {
-		list_add(&new_xattr->list, &info->xattr_list);
-		xattr = NULL;
-	}
-out:
-	spin_unlock(&info->lock);
-	if (xattr)
-		kfree(xattr->name);
-	kfree(xattr);
-	return err;
-}
-
 static const struct xattr_handler *shmem_xattr_handlers[] = {
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	&generic_acl_access_handler,
@@ -1796,6 +1698,7 @@ static int shmem_xattr_validate(const char *name)
 static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
 			      void *buffer, size_t size)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -1810,12 +1713,13 @@ static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
 	if (err)
 		return err;
 
-	return shmem_xattr_get(dentry, name, buffer, size);
+	return kmem_xattr_get(&info->xattrs, name, buffer, size);
 }
 
 static int shmem_setxattr(struct dentry *dentry, const char *name,
 			  const void *value, size_t size, int flags)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -1830,15 +1734,12 @@ static int shmem_setxattr(struct dentry *dentry, const char *name,
 	if (err)
 		return err;
 
-	if (size == 0)
-		value = "";  /* empty EA, do not remove */
-
-	return shmem_xattr_set(dentry, name, value, size, flags);
-
+	return kmem_xattr_set(&info->xattrs, name, value, size, flags);
 }
 
 static int shmem_removexattr(struct dentry *dentry, const char *name)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -1853,45 +1754,13 @@ static int shmem_removexattr(struct dentry *dentry, const char *name)
 	if (err)
 		return err;
 
-	return shmem_xattr_set(dentry, name, NULL, 0, XATTR_REPLACE);
-}
-
-static bool xattr_is_trusted(const char *name)
-{
-	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+	return kmem_xattr_remove(&info->xattrs, name);
 }
 
 static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
 {
-	bool trusted = capable(CAP_SYS_ADMIN);
-	struct shmem_xattr *xattr;
-	struct shmem_inode_info *info;
-	size_t used = 0;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		size_t len;
-
-		/* skip "trusted." attributes for unprivileged callers */
-		if (!trusted && xattr_is_trusted(xattr->name))
-			continue;
-
-		len = strlen(xattr->name) + 1;
-		used += len;
-		if (buffer) {
-			if (size < used) {
-				used = -ERANGE;
-				break;
-			}
-			memcpy(buffer, xattr->name, len);
-			buffer += len;
-		}
-	}
-	spin_unlock(&info->lock);
-
-	return used;
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
+	return kmem_xattr_list(&info->xattrs, buffer, size);
 }
 #endif /* CONFIG_TMPFS_XATTR */
 
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 1/3] xattr: extract kmem_xattr code from tmpfs
@ 2012-03-01  6:17   ` Li Zefan
  0 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-01  6:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Lennart Poettering, Kay Sievers, Hugh Dickins,
	LKML, Cgroups, Eric Paris

Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.

$ size vmlinux.o
   text    data     bss     dec     hex filename
5480650  655744 7039512 13175906         c90c62 vmlinux.o
$ size vmlinux.o
   text    data     bss     dec     hex filename
5480427  655744 7039448 13175619         c90b43 vmlinux.o

Signed-off-by: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
 fs/xattr.c               |  167 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/shmem_fs.h |    3 +-
 include/linux/xattr.h    |   23 ++++++
 mm/shmem.c               |  151 +++--------------------------------------
 4 files changed, 202 insertions(+), 142 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 82f4337..8af1b7f 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -757,3 +757,170 @@ EXPORT_SYMBOL(generic_getxattr);
 EXPORT_SYMBOL(generic_listxattr);
 EXPORT_SYMBOL(generic_setxattr);
 EXPORT_SYMBOL(generic_removexattr);
+
+/*
+ * initialize the kmem_xattrs structure
+ */
+void kmem_xattrs_init(struct kmem_xattrs *xattrs)
+{
+	INIT_LIST_HEAD(&xattrs->head);
+	spin_lock_init(&xattrs->lock);
+}
+
+/*
+ * free all the xattrs
+ */
+void kmem_xattrs_free(struct kmem_xattrs *xattrs)
+{
+	struct kmem_xattr *xattr, *node;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
+		kfree(xattr->name);
+		kfree(xattr);
+	}
+	spin_unlock(&xattrs->lock);
+}
+
+/*
+ * xattr GET operation for in-memory/pseudo filesystems
+ */
+int kmem_xattr_get(struct kmem_xattrs *xattrs, const char *name,
+		   void *buffer, size_t size)
+{
+	struct kmem_xattr *xattr;
+	int ret = -ENODATA;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		if (strcmp(name, xattr->name))
+			continue;
+
+		ret = xattr->size;
+		if (buffer) {
+			if (size < xattr->size)
+				ret = -ERANGE;
+			else
+				memcpy(buffer, xattr->value, xattr->size);
+		}
+		break;
+	}
+	spin_unlock(&xattrs->lock);
+	return ret;
+}
+
+static int __kmem_xattr_set(struct kmem_xattrs *xattrs, const char *name,
+			    const void *value, size_t size, int flags)
+{
+	struct kmem_xattr *xattr;
+	struct kmem_xattr *new_xattr = NULL;
+	size_t len;
+	int err = 0;
+
+	/* value == NULL means remove */
+	if (value) {
+		/* wrap around? */
+		len = sizeof(*new_xattr) + size;
+		if (len <= sizeof(*new_xattr))
+			return -ENOMEM;
+
+		new_xattr = kmalloc(len, GFP_KERNEL);
+		if (!new_xattr)
+			return -ENOMEM;
+
+		new_xattr->name = kstrdup(name, GFP_KERNEL);
+		if (!new_xattr->name) {
+			kfree(new_xattr);
+			return -ENOMEM;
+		}
+
+		new_xattr->size = size;
+		memcpy(new_xattr->value, value, size);
+	}
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		if (!strcmp(name, xattr->name)) {
+			if (flags & XATTR_CREATE) {
+				xattr = new_xattr;
+				err = -EEXIST;
+			} else if (new_xattr) {
+				list_replace(&xattr->list, &new_xattr->list);
+			} else {
+				list_del(&xattr->list);
+			}
+			goto out;
+		}
+	}
+	if (flags & XATTR_REPLACE) {
+		xattr = new_xattr;
+		err = -ENODATA;
+	} else {
+		list_add(&new_xattr->list, &xattrs->head);
+		xattr = NULL;
+	}
+out:
+	spin_unlock(&xattrs->lock);
+	if (xattr)
+		kfree(xattr->name);
+	kfree(xattr);
+	return err;
+
+}
+
+/*
+ * xattr SET operation for in-memory/pseudo filesystems
+ */
+int kmem_xattr_set(struct kmem_xattrs *xattrs, const char *name,
+		   const void *value, size_t size, int flags)
+{
+	if (size == 0)
+		value = ""; /* empty EA, do not remove */
+	return __kmem_xattr_set(xattrs, name, value, size, flags);
+}
+
+/*
+ * xattr REMOVE operation for in-memory/pseudo filesystems
+ */
+int kmem_xattr_remove(struct kmem_xattrs *xattrs, const char *name)
+{
+        return __kmem_xattr_set(xattrs, name, NULL, 0, XATTR_REPLACE);
+}
+
+static bool xattr_is_trusted(const char *name)
+{
+	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+}
+
+/*
+ * xattr LIST operation for in-memory/pseudo filesystems
+ */
+ssize_t kmem_xattr_list(struct kmem_xattrs *xattrs, char *buffer, size_t size)
+{
+	bool trusted = capable(CAP_SYS_ADMIN);
+	struct kmem_xattr *xattr;
+	size_t used = 0;
+
+	spin_lock(&xattrs->lock);
+	list_for_each_entry(xattr, &xattrs->head, list) {
+		size_t len;
+
+		/* skip "trusted." attributes for unprivileged callers */
+		if (!trusted && xattr_is_trusted(xattr->name))
+			continue;
+
+		len = strlen(xattr->name) + 1;
+		used += len;
+		if (buffer) {
+			if (size < used) {
+				used = -ERANGE;
+				break;
+			}
+			memcpy(buffer, xattr->name, len);
+			buffer += len;
+		}
+	}
+	spin_unlock(&xattrs->lock);
+
+	return used;
+}
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index e4c711c..6d006b6 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -5,6 +5,7 @@
 #include <linux/mempolicy.h>
 #include <linux/pagemap.h>
 #include <linux/percpu_counter.h>
+#include <linux/xattr.h>
 
 /* inode in-kernel data */
 
@@ -18,7 +19,7 @@ struct shmem_inode_info {
 	};
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
 	struct list_head	swaplist;	/* chain of maybes on swap */
-	struct list_head	xattr_list;	/* list of shmem_xattr */
+	struct kmem_xattrs	xattrs;		/* list of xattrs */
 	struct inode		vfs_inode;
 };
 
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index e5d1220..4d07f97 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -60,6 +60,7 @@
 #ifdef  __KERNEL__
 
 #include <linux/types.h>
+#include <linux/spinlock.h>
 
 struct inode;
 struct dentry;
@@ -96,6 +97,28 @@ ssize_t vfs_getxattr_alloc(struct dentry *dentry, const char *name,
 			   char **xattr_value, size_t size, gfp_t flags);
 int vfs_xattr_cmp(struct dentry *dentry, const char *xattr_name,
 		  const char *value, size_t size, gfp_t flags);
+
+struct kmem_xattrs {
+	struct list_head head;
+	spinlock_t lock;
+};
+
+struct kmem_xattr {
+	struct list_head list;
+	char *name;
+	size_t size;
+	char value[0];
+};
+
+void kmem_xattrs_init(struct kmem_xattrs *xattrs);
+void kmem_xattrs_free(struct kmem_xattrs *xattrs);
+int kmem_xattr_get(struct kmem_xattrs *xattrs, const char *name,
+		   void *buffer, size_t size);
+int kmem_xattr_set(struct kmem_xattrs *xattrs, const char *name,
+		   const void *value, size_t size, int flags);
+int kmem_xattr_remove(struct kmem_xattrs *xattrs, const char *name);
+ssize_t kmem_xattr_list(struct kmem_xattrs *xattrs, char *buffer, size_t size);
+
 #endif  /*  __KERNEL__  */
 
 #endif	/* _LINUX_XATTR_H */
diff --git a/mm/shmem.c b/mm/shmem.c
index feead19..2e1e416 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -76,13 +76,6 @@ static struct vfsmount *shm_mnt;
 /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
 #define SHORT_SYMLINK_LEN 128
 
-struct shmem_xattr {
-	struct list_head list;	/* anchored by shmem_inode_info->xattr_list */
-	char *name;		/* xattr name */
-	size_t size;
-	char value[0];
-};
-
 /* Flag allocation requirements to shmem_getpage */
 enum sgp_type {
 	SGP_READ,	/* don't exceed i_size, don't allocate page */
@@ -545,7 +538,6 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr, *nxattr;
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -559,10 +551,7 @@ static void shmem_evict_inode(struct inode *inode)
 	} else
 		kfree(info->symlink);
 
-	list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
-		kfree(xattr->name);
-		kfree(xattr);
-	}
+	kmem_xattrs_free(&info->xattrs);
 	BUG_ON(inode->i_blocks);
 	shmem_free_inode(inode->i_sb);
 	end_writeback(inode);
@@ -1114,7 +1103,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 		spin_lock_init(&info->lock);
 		info->flags = flags & VM_NORESERVE;
 		INIT_LIST_HEAD(&info->swaplist);
-		INIT_LIST_HEAD(&info->xattr_list);
+		kmem_xattrs_init(&info->xattrs);
 		cache_no_acl(inode);
 
 		switch (mode & S_IFMT) {
@@ -1679,93 +1668,6 @@ static void shmem_put_link(struct dentry *dentry, struct nameidata *nd, void *co
  * filesystem level, though.
  */
 
-static int shmem_xattr_get(struct dentry *dentry, const char *name,
-			   void *buffer, size_t size)
-{
-	struct shmem_inode_info *info;
-	struct shmem_xattr *xattr;
-	int ret = -ENODATA;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (strcmp(name, xattr->name))
-			continue;
-
-		ret = xattr->size;
-		if (buffer) {
-			if (size < xattr->size)
-				ret = -ERANGE;
-			else
-				memcpy(buffer, xattr->value, xattr->size);
-		}
-		break;
-	}
-	spin_unlock(&info->lock);
-	return ret;
-}
-
-static int shmem_xattr_set(struct dentry *dentry, const char *name,
-			   const void *value, size_t size, int flags)
-{
-	struct inode *inode = dentry->d_inode;
-	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr;
-	struct shmem_xattr *new_xattr = NULL;
-	size_t len;
-	int err = 0;
-
-	/* value == NULL means remove */
-	if (value) {
-		/* wrap around? */
-		len = sizeof(*new_xattr) + size;
-		if (len <= sizeof(*new_xattr))
-			return -ENOMEM;
-
-		new_xattr = kmalloc(len, GFP_KERNEL);
-		if (!new_xattr)
-			return -ENOMEM;
-
-		new_xattr->name = kstrdup(name, GFP_KERNEL);
-		if (!new_xattr->name) {
-			kfree(new_xattr);
-			return -ENOMEM;
-		}
-
-		new_xattr->size = size;
-		memcpy(new_xattr->value, value, size);
-	}
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (!strcmp(name, xattr->name)) {
-			if (flags & XATTR_CREATE) {
-				xattr = new_xattr;
-				err = -EEXIST;
-			} else if (new_xattr) {
-				list_replace(&xattr->list, &new_xattr->list);
-			} else {
-				list_del(&xattr->list);
-			}
-			goto out;
-		}
-	}
-	if (flags & XATTR_REPLACE) {
-		xattr = new_xattr;
-		err = -ENODATA;
-	} else {
-		list_add(&new_xattr->list, &info->xattr_list);
-		xattr = NULL;
-	}
-out:
-	spin_unlock(&info->lock);
-	if (xattr)
-		kfree(xattr->name);
-	kfree(xattr);
-	return err;
-}
-
 static const struct xattr_handler *shmem_xattr_handlers[] = {
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	&generic_acl_access_handler,
@@ -1796,6 +1698,7 @@ static int shmem_xattr_validate(const char *name)
 static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
 			      void *buffer, size_t size)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -1810,12 +1713,13 @@ static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
 	if (err)
 		return err;
 
-	return shmem_xattr_get(dentry, name, buffer, size);
+	return kmem_xattr_get(&info->xattrs, name, buffer, size);
 }
 
 static int shmem_setxattr(struct dentry *dentry, const char *name,
 			  const void *value, size_t size, int flags)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -1830,15 +1734,12 @@ static int shmem_setxattr(struct dentry *dentry, const char *name,
 	if (err)
 		return err;
 
-	if (size == 0)
-		value = "";  /* empty EA, do not remove */
-
-	return shmem_xattr_set(dentry, name, value, size, flags);
-
+	return kmem_xattr_set(&info->xattrs, name, value, size, flags);
 }
 
 static int shmem_removexattr(struct dentry *dentry, const char *name)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -1853,45 +1754,13 @@ static int shmem_removexattr(struct dentry *dentry, const char *name)
 	if (err)
 		return err;
 
-	return shmem_xattr_set(dentry, name, NULL, 0, XATTR_REPLACE);
-}
-
-static bool xattr_is_trusted(const char *name)
-{
-	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+	return kmem_xattr_remove(&info->xattrs, name);
 }
 
 static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
 {
-	bool trusted = capable(CAP_SYS_ADMIN);
-	struct shmem_xattr *xattr;
-	struct shmem_inode_info *info;
-	size_t used = 0;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		size_t len;
-
-		/* skip "trusted." attributes for unprivileged callers */
-		if (!trusted && xattr_is_trusted(xattr->name))
-			continue;
-
-		len = strlen(xattr->name) + 1;
-		used += len;
-		if (buffer) {
-			if (size < used) {
-				used = -ERANGE;
-				break;
-			}
-			memcpy(buffer, xattr->name, len);
-			buffer += len;
-		}
-	}
-	spin_unlock(&info->lock);
-
-	return used;
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
+	return kmem_xattr_list(&info->xattrs, buffer, size);
 }
 #endif /* CONFIG_TMPFS_XATTR */
 
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/3] cgroup: add xattr support
  2012-03-01  6:16 [PATCH v2 0/3] cgroup: add xattr support Li Zefan
  2012-03-01  6:17   ` Li Zefan
@ 2012-03-01  6:17 ` Li Zefan
  2012-03-01  6:17   ` Li Zefan
  2012-03-04 18:05   ` Tejun Heo
  3 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-01  6:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Lennart Poettering, Kay Sievers, Hugh Dickins,
	LKML, Cgroups, Eric Paris

This is one of the items in the plumber's wish list.

For use cases:

>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/cgroup.h |    7 ++++
 init/Kconfig           |   12 ++++++
 kernel/cgroup.c        |   92 +++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 102 insertions(+), 9 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9b93c9a..141c3ad 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -16,6 +16,7 @@
 #include <linux/prio_heap.h>
 #include <linux/rwsem.h>
 #include <linux/idr.h>
+#include <linux/xattr.h>
 
 #ifdef CONFIG_CGROUPS
 
@@ -211,6 +212,9 @@ struct cgroup {
 	/* List of events which userspace want to receive */
 	struct list_head event_list;
 	spinlock_t event_list_lock;
+
+	/* directory xattrs */
+	struct kmem_xattrs xattrs;
 };
 
 /*
@@ -298,6 +302,9 @@ struct cftype {
 	/* The subsystem this cgroup file belongs to */
 	struct cgroup_subsys *subsys;
 
+	/* file xattrs */
+	struct kmem_xattrs xattrs;
+
 	int (*open)(struct inode *inode, struct file *file);
 	ssize_t (*read)(struct cgroup *cgrp, struct cftype *cft,
 			struct file *file,
diff --git a/init/Kconfig b/init/Kconfig
index 3f42cd6..ba9a9dc 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -601,6 +601,18 @@ menuconfig CGROUPS
 
 if CGROUPS
 
+config CGROUP_XATTR
+	bool "Cgroup extended attributes"
+	default n
+	help
+	  Extended attributes are name:value pairs associated with inodes by
+	  the kernel or by users (see the attr(5) manual page, or visit
+	  <http://acl.bestbits.at/> for details).
+
+	  Currently the system.* namespace is not supported.
+
+	  If unsure, say N.
+
 config CGROUP_DEBUG
 	bool "Example debug cgroup subsystem"
 	default n
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5ec9048..57f3b79 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -865,7 +865,12 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		 */
 		BUG_ON(!list_empty(&cgrp->pidlists));
 
+		kmem_xattrs_free(&cgrp->xattrs);
+
 		kfree_rcu(cgrp, rcu_head);
+	} else {
+		struct cftype *cft = dentry->d_fsdata;
+		kmem_xattrs_free(&cft->xattrs);
 	}
 	iput(inode);
 }
@@ -1355,6 +1360,7 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	mutex_init(&cgrp->pidlist_mutex);
 	INIT_LIST_HEAD(&cgrp->event_list);
 	spin_lock_init(&cgrp->event_list_lock);
+	kmem_xattrs_init(&cgrp->xattrs);
 }
 
 static void init_cgroup_root(struct cgroupfs_root *root)
@@ -1700,6 +1706,8 @@ static void cgroup_kill_sb(struct super_block *sb) {
 	mutex_unlock(&cgroup_root_mutex);
 	mutex_unlock(&cgroup_mutex);
 
+	kmem_xattrs_free(&cgrp->xattrs);
+
 	kill_litter_super(sb);
 	cgroup_drop_root(root);
 }
@@ -2504,19 +2512,83 @@ static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry,
 	return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
 }
 
+#ifdef CONFIG_CGROUP_XATTR
+
+static struct kmem_xattrs *__d_xattrs(struct dentry *dentry)
+{
+	if (S_ISDIR(dentry->d_inode->i_mode))
+		return &__d_cgrp(dentry)->xattrs;
+	else
+		return &__d_cft(dentry)->xattrs;
+}
+
+static bool is_valid_xattr(const char *name)
+{
+	if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) ||
+	    !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN) ||
+	    !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN))
+		return true;
+	return false;
+}
+
+static int cgroup_setxattr(struct dentry *dentry, const char *name,
+			   const void *val, size_t size, int flags)
+{
+	if (!is_valid_xattr(name))
+		return -EINVAL;
+	return kmem_xattr_set(__d_xattrs(dentry), name, val, size, flags);
+}
+
+static int cgroup_removexattr(struct dentry *dentry, const char *name)
+{
+	if (!is_valid_xattr(name))
+		return -EINVAL;
+	return kmem_xattr_remove(__d_xattrs(dentry), name);
+}
+
+static ssize_t cgroup_getxattr(struct dentry *dentry, const char *name,
+			       void *buf, size_t size)
+{
+	if (!is_valid_xattr(name))
+		return -EINVAL;
+	return kmem_xattr_get(__d_xattrs(dentry), name, buf, size);
+}
+
+static ssize_t cgroup_listxattr(struct dentry *dentry, char *buf, size_t size)
+{
+	return kmem_xattr_list(__d_xattrs(dentry), buf, size);
+}
+
+#endif /* CONFIG_CGROUP_XATTR */
+
 static const struct file_operations cgroup_file_operations = {
-	.read = cgroup_file_read,
-	.write = cgroup_file_write,
-	.llseek = generic_file_llseek,
-	.open = cgroup_file_open,
-	.release = cgroup_file_release,
+	.read		= cgroup_file_read,
+	.write		= cgroup_file_write,
+	.llseek		= generic_file_llseek,
+	.open		= cgroup_file_open,
+	.release	= cgroup_file_release,
+};
+
+static const struct inode_operations cgroup_file_inode_operations = {
+#ifdef CONFIG_CGROUP_XATTR
+	.setxattr	= cgroup_setxattr,
+	.getxattr	= cgroup_getxattr,
+	.listxattr	= cgroup_listxattr,
+	.removexattr	= cgroup_removexattr,
+#endif
 };
 
 static const struct inode_operations cgroup_dir_inode_operations = {
-	.lookup = cgroup_lookup,
-	.mkdir = cgroup_mkdir,
-	.rmdir = cgroup_rmdir,
-	.rename = cgroup_rename,
+	.lookup		= cgroup_lookup,
+	.mkdir		= cgroup_mkdir,
+	.rmdir		= cgroup_rmdir,
+	.rename		= cgroup_rename,
+#ifdef CONFIG_CGROUP_XATTR
+	.setxattr	= cgroup_setxattr,
+	.getxattr	= cgroup_getxattr,
+	.listxattr	= cgroup_listxattr,
+	.removexattr	= cgroup_removexattr,
+#endif
 };
 
 static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd)
@@ -2564,6 +2636,7 @@ static int cgroup_create_file(struct dentry *dentry, umode_t mode,
 	} else if (S_ISREG(mode)) {
 		inode->i_size = 0;
 		inode->i_fop = &cgroup_file_operations;
+		inode->i_op = &cgroup_file_inode_operations;
 	}
 	d_instantiate(dentry, inode);
 	dget(dentry);	/* Extra count - pin the dentry in core */
@@ -2633,6 +2706,7 @@ int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
 	char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
 
 	cft->subsys = subsys;
+	kmem_xattrs_init(&cft->xattrs);
 
 	if (subsys && !test_bit(ROOT_NOPREFIX, &cgrp->root->flags)) {
 		strcpy(name, subsys->name);
-- 
1.7.3.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/3] cgroup: revise how we re-populate root directory
@ 2012-03-01  6:17   ` Li Zefan
  0 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-01  6:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Lennart Poettering, Kay Sievers, Hugh Dickins,
	LKML, Cgroups, Eric Paris

When remounting cgroupfs with some subsystems added to it and some
removed, cgroup will remove all the files in root directory and then
re-popluate it.

What I'm doing here is, only remove files which belong to subsystems that
are to be unbinded, and only create files for newly-added subsystems.
The purpose is to have all other files untouched.

This is a preparation for cgroup xattr support.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 include/linux/cgroup.h |   11 +++---
 kernel/cgroup.c        |   92 +++++++++++++++++++++++++++++++----------------
 2 files changed, 67 insertions(+), 36 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 501adb1..9b93c9a 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -295,6 +295,9 @@ struct cftype {
 	 */
 	size_t max_write_len;
 
+	/* The subsystem this cgroup file belongs to */
+	struct cgroup_subsys *subsys;
+
 	int (*open)(struct inode *inode, struct file *file);
 	ssize_t (*read)(struct cgroup *cgrp, struct cftype *cft,
 			struct file *file,
@@ -387,16 +390,14 @@ struct cgroup_scanner {
  * called by subsystems from within a populate() method
  */
 int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
-		       const struct cftype *cft);
+		    struct cftype *cft);
 
 /*
  * Add a set of new files to the given cgroup directory. Should
  * only be called by subsystems from within a populate() method
  */
-int cgroup_add_files(struct cgroup *cgrp,
-			struct cgroup_subsys *subsys,
-			const struct cftype cft[],
-			int count);
+int cgroup_add_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
+		     struct cftype cft[], int count);
 
 int cgroup_is_removed(const struct cgroup *cgrp);
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c6877fe..5ec9048 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -781,6 +781,8 @@ static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);
 static struct dentry *cgroup_lookup(struct inode *, struct dentry *, struct nameidata *);
 static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
 static int cgroup_populate_dir(struct cgroup *cgrp);
+static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits,
+				 unsigned long removed_bits);
 static const struct inode_operations cgroup_dir_inode_operations;
 static const struct file_operations proc_cgroupstats_operations;
 
@@ -882,34 +884,44 @@ static void remove_dir(struct dentry *d)
 	dput(parent);
 }
 
-static void cgroup_clear_directory(struct dentry *dentry)
+static void cgroup_clear_directory(struct dentry *dentry, bool remove_all,
+				   unsigned long removed_bits)
 {
-	struct list_head *node;
+	LIST_HEAD(head);
+	struct dentry *d, *node;
 
 	BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
+
 	spin_lock(&dentry->d_lock);
-	node = dentry->d_subdirs.next;
-	while (node != &dentry->d_subdirs) {
-		struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
+	list_for_each_entry_safe(d, node, &dentry->d_subdirs, d_u.d_child) {
+		struct cftype *cft = d->d_fsdata;
+
+		if (!remove_all && cft->subsys &&
+		    !test_bit(cft->subsys->subsys_id, &removed_bits))
+			continue;
 
 		spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
-		list_del_init(node);
+		list_move(&d->d_u.d_child, &head);
 		if (d->d_inode) {
 			/* This should never be called on a cgroup
 			 * directory with child cgroups */
 			BUG_ON(d->d_inode->i_mode & S_IFDIR);
 			dget_dlock(d);
-			spin_unlock(&d->d_lock);
-			spin_unlock(&dentry->d_lock);
+		}
+		spin_unlock(&d->d_lock);
+	}
+	spin_unlock(&dentry->d_lock);
+
+	list_for_each_entry_safe(d, node, &head, d_u.d_child) {
+		spin_lock(&d->d_lock);
+		list_del_init(&d->d_u.d_child);
+		spin_unlock(&d->d_lock);
+		if (d->d_inode) {
 			d_delete(d);
 			simple_unlink(dentry->d_inode, d);
 			dput(d);
-			spin_lock(&dentry->d_lock);
-		} else
-			spin_unlock(&d->d_lock);
-		node = dentry->d_subdirs.next;
+		}
 	}
-	spin_unlock(&dentry->d_lock);
 }
 
 /*
@@ -919,7 +931,7 @@ static void cgroup_d_remove_dir(struct dentry *dentry)
 {
 	struct dentry *parent;
 
-	cgroup_clear_directory(dentry);
+	cgroup_clear_directory(dentry, true, 0);
 
 	parent = dentry->d_parent;
 	spin_lock(&parent->d_lock);
@@ -1284,6 +1296,7 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
 	struct cgroupfs_root *root = sb->s_fs_info;
 	struct cgroup *cgrp = &root->top_cgroup;
 	struct cgroup_sb_opts opts;
+	unsigned long added_bits, removed_bits;
 
 	mutex_lock(&cgrp->dentry->d_inode->i_mutex);
 	mutex_lock(&cgroup_mutex);
@@ -1294,6 +1307,9 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
 	if (ret)
 		goto out_unlock;
 
+	added_bits = opts.subsys_bits & ~root->subsys_bits;
+	removed_bits = root->subsys_bits & ~opts.subsys_bits;
+
 	/* Don't allow flags or name to change at remount */
 	if (opts.flags != root->flags ||
 	    (opts.name && strcmp(opts.name, root->name))) {
@@ -1308,8 +1324,8 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
 		goto out_unlock;
 	}
 
-	/* (re)populate subsystem files */
-	cgroup_populate_dir(cgrp);
+	/* re-populate subsystem files */
+	cgroup_repopulate_dir(cgrp, added_bits, removed_bits);
 
 	if (opts.release_agent)
 		strcpy(root->release_agent_path, opts.release_agent);
@@ -2607,16 +2623,17 @@ static umode_t cgroup_file_mode(const struct cftype *cft)
 	return mode;
 }
 
-int cgroup_add_file(struct cgroup *cgrp,
-		       struct cgroup_subsys *subsys,
-		       const struct cftype *cft)
+int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
+		    struct cftype *cft)
 {
 	struct dentry *dir = cgrp->dentry;
 	struct dentry *dentry;
 	int error;
 	umode_t mode;
-
 	char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
+
+	cft->subsys = subsys;
+
 	if (subsys && !test_bit(ROOT_NOPREFIX, &cgrp->root->flags)) {
 		strcpy(name, subsys->name);
 		strcat(name, ".");
@@ -2637,10 +2654,8 @@ int cgroup_add_file(struct cgroup *cgrp,
 }
 EXPORT_SYMBOL_GPL(cgroup_add_file);
 
-int cgroup_add_files(struct cgroup *cgrp,
-			struct cgroup_subsys *subsys,
-			const struct cftype cft[],
-			int count)
+int cgroup_add_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
+		     struct cftype cft[], int count)
 {
 	int i, err;
 	for (i = 0; i < count; i++) {
@@ -3638,26 +3653,27 @@ static struct cftype cft_release_agent = {
 	.max_write_len = PATH_MAX,
 };
 
-static int cgroup_populate_dir(struct cgroup *cgrp)
+static int __cgroup_populate_dir(struct cgroup *cgrp, unsigned long added_bits)
 {
 	int err;
 	struct cgroup_subsys *ss;
 
-	/* First clear out any existing files */
-	cgroup_clear_directory(cgrp->dentry);
-
 	err = cgroup_add_files(cgrp, NULL, files, ARRAY_SIZE(files));
 	if (err < 0)
 		return err;
 
 	if (cgrp == cgrp->top_cgroup) {
-		if ((err = cgroup_add_file(cgrp, NULL, &cft_release_agent)) < 0)
+		err = cgroup_add_file(cgrp, NULL, &cft_release_agent);
+		if (err < 0)
 			return err;
 	}
 
 	for_each_subsys(cgrp->root, ss) {
-		if (ss->populate && (err = ss->populate(ss, cgrp)) < 0)
-			return err;
+		if (test_bit(ss->subsys_id, &added_bits) && ss->populate) {
+			err = ss->populate(ss, cgrp);
+			if (err < 0)
+				return err;
+		}
 	}
 	/* This cgroup is ready now */
 	for_each_subsys(cgrp->root, ss) {
@@ -3674,6 +3690,20 @@ static int cgroup_populate_dir(struct cgroup *cgrp)
 	return 0;
 }
 
+static int cgroup_populate_dir(struct cgroup *cgrp)
+{
+	return __cgroup_populate_dir(cgrp, cgrp->root->subsys_bits);
+}
+
+static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits,
+				 unsigned long removed_bits)
+{
+	/* First clear out files */
+	cgroup_clear_directory(cgrp->dentry, false, removed_bits);
+
+	return __cgroup_populate_dir(cgrp, added_bits);
+}
+
 static void init_cgroup_css(struct cgroup_subsys_state *css,
 			       struct cgroup_subsys *ss,
 			       struct cgroup *cgrp)
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/3] cgroup: revise how we re-populate root directory
@ 2012-03-01  6:17   ` Li Zefan
  0 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-01  6:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Lennart Poettering, Kay Sievers, Hugh Dickins,
	LKML, Cgroups, Eric Paris

When remounting cgroupfs with some subsystems added to it and some
removed, cgroup will remove all the files in root directory and then
re-popluate it.

What I'm doing here is, only remove files which belong to subsystems that
are to be unbinded, and only create files for newly-added subsystems.
The purpose is to have all other files untouched.

This is a preparation for cgroup xattr support.

Signed-off-by: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
 include/linux/cgroup.h |   11 +++---
 kernel/cgroup.c        |   92 +++++++++++++++++++++++++++++++----------------
 2 files changed, 67 insertions(+), 36 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 501adb1..9b93c9a 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -295,6 +295,9 @@ struct cftype {
 	 */
 	size_t max_write_len;
 
+	/* The subsystem this cgroup file belongs to */
+	struct cgroup_subsys *subsys;
+
 	int (*open)(struct inode *inode, struct file *file);
 	ssize_t (*read)(struct cgroup *cgrp, struct cftype *cft,
 			struct file *file,
@@ -387,16 +390,14 @@ struct cgroup_scanner {
  * called by subsystems from within a populate() method
  */
 int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
-		       const struct cftype *cft);
+		    struct cftype *cft);
 
 /*
  * Add a set of new files to the given cgroup directory. Should
  * only be called by subsystems from within a populate() method
  */
-int cgroup_add_files(struct cgroup *cgrp,
-			struct cgroup_subsys *subsys,
-			const struct cftype cft[],
-			int count);
+int cgroup_add_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
+		     struct cftype cft[], int count);
 
 int cgroup_is_removed(const struct cgroup *cgrp);
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c6877fe..5ec9048 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -781,6 +781,8 @@ static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);
 static struct dentry *cgroup_lookup(struct inode *, struct dentry *, struct nameidata *);
 static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
 static int cgroup_populate_dir(struct cgroup *cgrp);
+static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits,
+				 unsigned long removed_bits);
 static const struct inode_operations cgroup_dir_inode_operations;
 static const struct file_operations proc_cgroupstats_operations;
 
@@ -882,34 +884,44 @@ static void remove_dir(struct dentry *d)
 	dput(parent);
 }
 
-static void cgroup_clear_directory(struct dentry *dentry)
+static void cgroup_clear_directory(struct dentry *dentry, bool remove_all,
+				   unsigned long removed_bits)
 {
-	struct list_head *node;
+	LIST_HEAD(head);
+	struct dentry *d, *node;
 
 	BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
+
 	spin_lock(&dentry->d_lock);
-	node = dentry->d_subdirs.next;
-	while (node != &dentry->d_subdirs) {
-		struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
+	list_for_each_entry_safe(d, node, &dentry->d_subdirs, d_u.d_child) {
+		struct cftype *cft = d->d_fsdata;
+
+		if (!remove_all && cft->subsys &&
+		    !test_bit(cft->subsys->subsys_id, &removed_bits))
+			continue;
 
 		spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
-		list_del_init(node);
+		list_move(&d->d_u.d_child, &head);
 		if (d->d_inode) {
 			/* This should never be called on a cgroup
 			 * directory with child cgroups */
 			BUG_ON(d->d_inode->i_mode & S_IFDIR);
 			dget_dlock(d);
-			spin_unlock(&d->d_lock);
-			spin_unlock(&dentry->d_lock);
+		}
+		spin_unlock(&d->d_lock);
+	}
+	spin_unlock(&dentry->d_lock);
+
+	list_for_each_entry_safe(d, node, &head, d_u.d_child) {
+		spin_lock(&d->d_lock);
+		list_del_init(&d->d_u.d_child);
+		spin_unlock(&d->d_lock);
+		if (d->d_inode) {
 			d_delete(d);
 			simple_unlink(dentry->d_inode, d);
 			dput(d);
-			spin_lock(&dentry->d_lock);
-		} else
-			spin_unlock(&d->d_lock);
-		node = dentry->d_subdirs.next;
+		}
 	}
-	spin_unlock(&dentry->d_lock);
 }
 
 /*
@@ -919,7 +931,7 @@ static void cgroup_d_remove_dir(struct dentry *dentry)
 {
 	struct dentry *parent;
 
-	cgroup_clear_directory(dentry);
+	cgroup_clear_directory(dentry, true, 0);
 
 	parent = dentry->d_parent;
 	spin_lock(&parent->d_lock);
@@ -1284,6 +1296,7 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
 	struct cgroupfs_root *root = sb->s_fs_info;
 	struct cgroup *cgrp = &root->top_cgroup;
 	struct cgroup_sb_opts opts;
+	unsigned long added_bits, removed_bits;
 
 	mutex_lock(&cgrp->dentry->d_inode->i_mutex);
 	mutex_lock(&cgroup_mutex);
@@ -1294,6 +1307,9 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
 	if (ret)
 		goto out_unlock;
 
+	added_bits = opts.subsys_bits & ~root->subsys_bits;
+	removed_bits = root->subsys_bits & ~opts.subsys_bits;
+
 	/* Don't allow flags or name to change at remount */
 	if (opts.flags != root->flags ||
 	    (opts.name && strcmp(opts.name, root->name))) {
@@ -1308,8 +1324,8 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
 		goto out_unlock;
 	}
 
-	/* (re)populate subsystem files */
-	cgroup_populate_dir(cgrp);
+	/* re-populate subsystem files */
+	cgroup_repopulate_dir(cgrp, added_bits, removed_bits);
 
 	if (opts.release_agent)
 		strcpy(root->release_agent_path, opts.release_agent);
@@ -2607,16 +2623,17 @@ static umode_t cgroup_file_mode(const struct cftype *cft)
 	return mode;
 }
 
-int cgroup_add_file(struct cgroup *cgrp,
-		       struct cgroup_subsys *subsys,
-		       const struct cftype *cft)
+int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
+		    struct cftype *cft)
 {
 	struct dentry *dir = cgrp->dentry;
 	struct dentry *dentry;
 	int error;
 	umode_t mode;
-
 	char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
+
+	cft->subsys = subsys;
+
 	if (subsys && !test_bit(ROOT_NOPREFIX, &cgrp->root->flags)) {
 		strcpy(name, subsys->name);
 		strcat(name, ".");
@@ -2637,10 +2654,8 @@ int cgroup_add_file(struct cgroup *cgrp,
 }
 EXPORT_SYMBOL_GPL(cgroup_add_file);
 
-int cgroup_add_files(struct cgroup *cgrp,
-			struct cgroup_subsys *subsys,
-			const struct cftype cft[],
-			int count)
+int cgroup_add_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
+		     struct cftype cft[], int count)
 {
 	int i, err;
 	for (i = 0; i < count; i++) {
@@ -3638,26 +3653,27 @@ static struct cftype cft_release_agent = {
 	.max_write_len = PATH_MAX,
 };
 
-static int cgroup_populate_dir(struct cgroup *cgrp)
+static int __cgroup_populate_dir(struct cgroup *cgrp, unsigned long added_bits)
 {
 	int err;
 	struct cgroup_subsys *ss;
 
-	/* First clear out any existing files */
-	cgroup_clear_directory(cgrp->dentry);
-
 	err = cgroup_add_files(cgrp, NULL, files, ARRAY_SIZE(files));
 	if (err < 0)
 		return err;
 
 	if (cgrp == cgrp->top_cgroup) {
-		if ((err = cgroup_add_file(cgrp, NULL, &cft_release_agent)) < 0)
+		err = cgroup_add_file(cgrp, NULL, &cft_release_agent);
+		if (err < 0)
 			return err;
 	}
 
 	for_each_subsys(cgrp->root, ss) {
-		if (ss->populate && (err = ss->populate(ss, cgrp)) < 0)
-			return err;
+		if (test_bit(ss->subsys_id, &added_bits) && ss->populate) {
+			err = ss->populate(ss, cgrp);
+			if (err < 0)
+				return err;
+		}
 	}
 	/* This cgroup is ready now */
 	for_each_subsys(cgrp->root, ss) {
@@ -3674,6 +3690,20 @@ static int cgroup_populate_dir(struct cgroup *cgrp)
 	return 0;
 }
 
+static int cgroup_populate_dir(struct cgroup *cgrp)
+{
+	return __cgroup_populate_dir(cgrp, cgrp->root->subsys_bits);
+}
+
+static int cgroup_repopulate_dir(struct cgroup *cgrp, unsigned long added_bits,
+				 unsigned long removed_bits)
+{
+	/* First clear out files */
+	cgroup_clear_directory(cgrp->dentry, false, removed_bits);
+
+	return __cgroup_populate_dir(cgrp, added_bits);
+}
+
 static void init_cgroup_css(struct cgroup_subsys_state *css,
 			       struct cgroup_subsys *ss,
 			       struct cgroup *cgrp)
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] xattr: extract kmem_xattr code from tmpfs
@ 2012-03-01 12:42     ` Hillf Danton
  0 siblings, 0 replies; 20+ messages in thread
From: Hillf Danton @ 2012-03-01 12:42 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, Andrew Morton, Lennart Poettering, Kay Sievers,
	Hugh Dickins, LKML, Cgroups, Eric Paris

On Thu, Mar 1, 2012 at 2:17 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
> +/*
> + * free all the xattrs
> + */
> +void kmem_xattrs_free(struct kmem_xattrs *xattrs)
> +{
> +       struct kmem_xattr *xattr, *node;
> +
> +       spin_lock(&xattrs->lock);
> +       list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
> +               kfree(xattr->name);
> +               kfree(xattr);
> +       }
> +       spin_unlock(&xattrs->lock);

In your work it is a library function, operations on &xattrs->head
are no longer allowed, say list_empty(), after it called, though I
dunno if such operations exist.

And kmem_xattrs is too close to kmem_xattr, kmem_xattr_list
is not pron of typo.

-hd

> +}
> +
[...]
> @@ -559,10 +551,7 @@ static void shmem_evict_inode(struct inode *inode)
>        } else
>                kfree(info->symlink);
>
> -       list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
> -               kfree(xattr->name);
> -               kfree(xattr);
> -       }
> +       kmem_xattrs_free(&info->xattrs);
>        BUG_ON(inode->i_blocks);
>        shmem_free_inode(inode->i_sb);
>        end_writeback(inode);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] xattr: extract kmem_xattr code from tmpfs
@ 2012-03-01 12:42     ` Hillf Danton
  0 siblings, 0 replies; 20+ messages in thread
From: Hillf Danton @ 2012-03-01 12:42 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, Andrew Morton, Lennart Poettering, Kay Sievers,
	Hugh Dickins, LKML, Cgroups, Eric Paris

On Thu, Mar 1, 2012 at 2:17 PM, Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org> wrote:
> +/*
> + * free all the xattrs
> + */
> +void kmem_xattrs_free(struct kmem_xattrs *xattrs)
> +{
> +       struct kmem_xattr *xattr, *node;
> +
> +       spin_lock(&xattrs->lock);
> +       list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
> +               kfree(xattr->name);
> +               kfree(xattr);
> +       }
> +       spin_unlock(&xattrs->lock);

In your work it is a library function, operations on &xattrs->head
are no longer allowed, say list_empty(), after it called, though I
dunno if such operations exist.

And kmem_xattrs is too close to kmem_xattr, kmem_xattr_list
is not pron of typo.

-hd

> +}
> +
[...]
> @@ -559,10 +551,7 @@ static void shmem_evict_inode(struct inode *inode)
>        } else
>                kfree(info->symlink);
>
> -       list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
> -               kfree(xattr->name);
> -               kfree(xattr);
> -       }
> +       kmem_xattrs_free(&info->xattrs);
>        BUG_ON(inode->i_blocks);
>        shmem_free_inode(inode->i_sb);
>        end_writeback(inode);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] cgroup: revise how we re-populate root directory
@ 2012-03-01 12:48     ` Hillf Danton
  0 siblings, 0 replies; 20+ messages in thread
From: Hillf Danton @ 2012-03-01 12:48 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, Andrew Morton, Lennart Poettering, Kay Sievers,
	Hugh Dickins, LKML, Cgroups, Eric Paris

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 2780 bytes --]

On Thu, Mar 1, 2012 at 2:17 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>
> -static void cgroup_clear_directory(struct dentry *dentry)
> +static void cgroup_clear_directory(struct dentry *dentry, bool remove_all,
> +                                  unsigned long removed_bits)
>  {
> -       struct list_head *node;
> +       LIST_HEAD(head);
> +       struct dentry *d, *node;
>
>        BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
> +
>        spin_lock(&dentry->d_lock);
> -       node = dentry->d_subdirs.next;
> -       while (node != &dentry->d_subdirs) {
> -               struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
> +       list_for_each_entry_safe(d, node, &dentry->d_subdirs, d_u.d_child) {
> +               struct cftype *cft = d->d_fsdata;
> +
> +               if (!remove_all && cft->subsys &&
> +                   !test_bit(cft->subsys->subsys_id, &removed_bits))
> +                       continue;
>
>                spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
> -               list_del_init(node);
> +               list_move(&d->d_u.d_child, &head);
>                if (d->d_inode) {
>                        /* This should never be called on a cgroup
>                         * directory with child cgroups */
>                        BUG_ON(d->d_inode->i_mode & S_IFDIR);
>                        dget_dlock(d);
> -                       spin_unlock(&d->d_lock);
> -                       spin_unlock(&dentry->d_lock);
> +               }
> +               spin_unlock(&d->d_lock);
> +       }
> +       spin_unlock(&dentry->d_lock);
> +

Safe to process isolated entries without dentry->d_lock held?

-hd

> +       list_for_each_entry_safe(d, node, &head, d_u.d_child) {
> +               spin_lock(&d->d_lock);
> +               list_del_init(&d->d_u.d_child);
> +               spin_unlock(&d->d_lock);
> +               if (d->d_inode) {
>                        d_delete(d);
>                        simple_unlink(dentry->d_inode, d);
>                        dput(d);
> -                       spin_lock(&dentry->d_lock);
> -               } else
> -                       spin_unlock(&d->d_lock);
> -               node = dentry->d_subdirs.next;
> +               }
>        }
> -       spin_unlock(&dentry->d_lock);
>  }
>
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] cgroup: revise how we re-populate root directory
@ 2012-03-01 12:48     ` Hillf Danton
  0 siblings, 0 replies; 20+ messages in thread
From: Hillf Danton @ 2012-03-01 12:48 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, Andrew Morton, Lennart Poettering, Kay Sievers,
	Hugh Dickins, LKML, Cgroups, Eric Paris

On Thu, Mar 1, 2012 at 2:17 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>
> -static void cgroup_clear_directory(struct dentry *dentry)
> +static void cgroup_clear_directory(struct dentry *dentry, bool remove_all,
> +                                  unsigned long removed_bits)
>  {
> -       struct list_head *node;
> +       LIST_HEAD(head);
> +       struct dentry *d, *node;
>
>        BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
> +
>        spin_lock(&dentry->d_lock);
> -       node = dentry->d_subdirs.next;
> -       while (node != &dentry->d_subdirs) {
> -               struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
> +       list_for_each_entry_safe(d, node, &dentry->d_subdirs, d_u.d_child) {
> +               struct cftype *cft = d->d_fsdata;
> +
> +               if (!remove_all && cft->subsys &&
> +                   !test_bit(cft->subsys->subsys_id, &removed_bits))
> +                       continue;
>
>                spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
> -               list_del_init(node);
> +               list_move(&d->d_u.d_child, &head);
>                if (d->d_inode) {
>                        /* This should never be called on a cgroup
>                         * directory with child cgroups */
>                        BUG_ON(d->d_inode->i_mode & S_IFDIR);
>                        dget_dlock(d);
> -                       spin_unlock(&d->d_lock);
> -                       spin_unlock(&dentry->d_lock);
> +               }
> +               spin_unlock(&d->d_lock);
> +       }
> +       spin_unlock(&dentry->d_lock);
> +

Safe to process isolated entries without dentry->d_lock held?

-hd

> +       list_for_each_entry_safe(d, node, &head, d_u.d_child) {
> +               spin_lock(&d->d_lock);
> +               list_del_init(&d->d_u.d_child);
> +               spin_unlock(&d->d_lock);
> +               if (d->d_inode) {
>                        d_delete(d);
>                        simple_unlink(dentry->d_inode, d);
>                        dput(d);
> -                       spin_lock(&dentry->d_lock);
> -               } else
> -                       spin_unlock(&d->d_lock);
> -               node = dentry->d_subdirs.next;
> +               }
>        }
> -       spin_unlock(&dentry->d_lock);
>  }
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] xattr: extract kmem_xattr code from tmpfs
@ 2012-03-02  5:23       ` Li Zefan
  0 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-02  5:23 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Tejun Heo, Andrew Morton, Lennart Poettering, Kay Sievers,
	Hugh Dickins, LKML, Cgroups, Eric Paris

Hillf Danton wrote:
> On Thu, Mar 1, 2012 at 2:17 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>> +/*
>> + * free all the xattrs
>> + */
>> +void kmem_xattrs_free(struct kmem_xattrs *xattrs)
>> +{
>> +       struct kmem_xattr *xattr, *node;
>> +
>> +       spin_lock(&xattrs->lock);
>> +       list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
>> +               kfree(xattr->name);
>> +               kfree(xattr);
>> +       }
>> +       spin_unlock(&xattrs->lock);
> 
> In your work it is a library function, operations on &xattrs->head
> are no longer allowed, say list_empty(), after it called, though I
> dunno if such operations exist.
> 

I'd prefer to make it safer to use, so I'll re-init the list head
after freeing all the list items.

> And kmem_xattrs is too close to kmem_xattr, kmem_xattr_list
> is not pron of typo.
> 

Then you won't pass compile. ;)

kmem_xattr_list exposed the data structure we used internally.
We might change to use an rbtree.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] xattr: extract kmem_xattr code from tmpfs
@ 2012-03-02  5:23       ` Li Zefan
  0 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-02  5:23 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Tejun Heo, Andrew Morton, Lennart Poettering, Kay Sievers,
	Hugh Dickins, LKML, Cgroups, Eric Paris

Hillf Danton wrote:
> On Thu, Mar 1, 2012 at 2:17 PM, Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org> wrote:
>> +/*
>> + * free all the xattrs
>> + */
>> +void kmem_xattrs_free(struct kmem_xattrs *xattrs)
>> +{
>> +       struct kmem_xattr *xattr, *node;
>> +
>> +       spin_lock(&xattrs->lock);
>> +       list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
>> +               kfree(xattr->name);
>> +               kfree(xattr);
>> +       }
>> +       spin_unlock(&xattrs->lock);
> 
> In your work it is a library function, operations on &xattrs->head
> are no longer allowed, say list_empty(), after it called, though I
> dunno if such operations exist.
> 

I'd prefer to make it safer to use, so I'll re-init the list head
after freeing all the list items.

> And kmem_xattrs is too close to kmem_xattr, kmem_xattr_list
> is not pron of typo.
> 

Then you won't pass compile. ;)

kmem_xattr_list exposed the data structure we used internally.
We might change to use an rbtree.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] cgroup: revise how we re-populate root directory
@ 2012-03-02  5:49       ` Li Zefan
  0 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-02  5:49 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Tejun Heo, Andrew Morton, Lennart Poettering, Kay Sievers,
	Hugh Dickins, LKML, Cgroups, Eric Paris

20:48, Hillf Danton wrote:
> On Thu, Mar 1, 2012 at 2:17 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>>
>> -static void cgroup_clear_directory(struct dentry *dentry)
>> +static void cgroup_clear_directory(struct dentry *dentry, bool remove_all,
>> +                                  unsigned long removed_bits)
>>  {
>> -       struct list_head *node;
>> +       LIST_HEAD(head);
>> +       struct dentry *d, *node;
>>
>>        BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
>> +
>>        spin_lock(&dentry->d_lock);
>> -       node = dentry->d_subdirs.next;
>> -       while (node != &dentry->d_subdirs) {
>> -               struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
>> +       list_for_each_entry_safe(d, node, &dentry->d_subdirs, d_u.d_child) {
>> +               struct cftype *cft = d->d_fsdata;
>> +
>> +               if (!remove_all && cft->subsys &&
>> +                   !test_bit(cft->subsys->subsys_id, &removed_bits))
>> +                       continue;
>>
>>                spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
>> -               list_del_init(node);
>> +               list_move(&d->d_u.d_child, &head);
>>                if (d->d_inode) {
>>                        /* This should never be called on a cgroup
>>                         * directory with child cgroups */
>>                        BUG_ON(d->d_inode->i_mode & S_IFDIR);
>>                        dget_dlock(d);
>> -                       spin_unlock(&d->d_lock);
>> -                       spin_unlock(&dentry->d_lock);
>> +               }
>> +               spin_unlock(&d->d_lock);
>> +       }
>> +       spin_unlock(&dentry->d_lock);
>> +
> 
> Safe to process isolated entries without dentry->d_lock held?
> 

no vfs expert, I need to check into the vfs code.

> -hd
> 
>> +       list_for_each_entry_safe(d, node, &head, d_u.d_child) {
>> +               spin_lock(&d->d_lock);
>> +               list_del_init(&d->d_u.d_child);
>> +               spin_unlock(&d->d_lock);
>> +               if (d->d_inode) {
>>                        d_delete(d);
>>                        simple_unlink(dentry->d_inode, d);
>>                        dput(d);
>> -                       spin_lock(&dentry->d_lock);
>> -               } else
>> -                       spin_unlock(&d->d_lock);
>> -               node = dentry->d_subdirs.next;
>> +               }
>>        }
>> -       spin_unlock(&dentry->d_lock);
>>  }
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] cgroup: revise how we re-populate root directory
@ 2012-03-02  5:49       ` Li Zefan
  0 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-02  5:49 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Tejun Heo, Andrew Morton, Lennart Poettering, Kay Sievers,
	Hugh Dickins, LKML, Cgroups, Eric Paris

20:48, Hillf Danton wrote:
> On Thu, Mar 1, 2012 at 2:17 PM, Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org> wrote:
>>
>> -static void cgroup_clear_directory(struct dentry *dentry)
>> +static void cgroup_clear_directory(struct dentry *dentry, bool remove_all,
>> +                                  unsigned long removed_bits)
>>  {
>> -       struct list_head *node;
>> +       LIST_HEAD(head);
>> +       struct dentry *d, *node;
>>
>>        BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
>> +
>>        spin_lock(&dentry->d_lock);
>> -       node = dentry->d_subdirs.next;
>> -       while (node != &dentry->d_subdirs) {
>> -               struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
>> +       list_for_each_entry_safe(d, node, &dentry->d_subdirs, d_u.d_child) {
>> +               struct cftype *cft = d->d_fsdata;
>> +
>> +               if (!remove_all && cft->subsys &&
>> +                   !test_bit(cft->subsys->subsys_id, &removed_bits))
>> +                       continue;
>>
>>                spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
>> -               list_del_init(node);
>> +               list_move(&d->d_u.d_child, &head);
>>                if (d->d_inode) {
>>                        /* This should never be called on a cgroup
>>                         * directory with child cgroups */
>>                        BUG_ON(d->d_inode->i_mode & S_IFDIR);
>>                        dget_dlock(d);
>> -                       spin_unlock(&d->d_lock);
>> -                       spin_unlock(&dentry->d_lock);
>> +               }
>> +               spin_unlock(&d->d_lock);
>> +       }
>> +       spin_unlock(&dentry->d_lock);
>> +
> 
> Safe to process isolated entries without dentry->d_lock held?
> 

no vfs expert, I need to check into the vfs code.

> -hd
> 
>> +       list_for_each_entry_safe(d, node, &head, d_u.d_child) {
>> +               spin_lock(&d->d_lock);
>> +               list_del_init(&d->d_u.d_child);
>> +               spin_unlock(&d->d_lock);
>> +               if (d->d_inode) {
>>                        d_delete(d);
>>                        simple_unlink(dentry->d_inode, d);
>>                        dput(d);
>> -                       spin_lock(&dentry->d_lock);
>> -               } else
>> -                       spin_unlock(&d->d_lock);
>> -               node = dentry->d_subdirs.next;
>> +               }
>>        }
>> -       spin_unlock(&dentry->d_lock);
>>  }
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/3] cgroup: add xattr support
@ 2012-03-04 18:05   ` Tejun Heo
  0 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2012-03-04 18:05 UTC (permalink / raw)
  To: Li Zefan
  Cc: Andrew Morton, Lennart Poettering, Kay Sievers, Hugh Dickins,
	LKML, Cgroups, Eric Paris

Hello, Li.

Sorry about the delay.  Was on vacation.

On Thu, Mar 01, 2012 at 02:16:57PM +0800, Li Zefan wrote:
> This can be used to attach metadata to a cgroup, which is required
> by systemd.
> 
> As lazy as I am, here just quoted from Lennart for the detailed use
> cases:

I much prefer this version but would prefer to deal with this in the
next merge window as we're already on rc6.  On a cursory look...

* I'm not sure kmem_ is a good prefix.  It sucks that xattr values are
  stored in kmem w/o size limit and we probably want to improve on
  both fronts in the future.  Wouldn't it be better to name them
  simple_*() and put them in fs/libfs.c?

* I don't think we want CONFIG option for cgroupfs xattr.  Let's just
  add a mount option.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/3] cgroup: add xattr support
@ 2012-03-04 18:05   ` Tejun Heo
  0 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2012-03-04 18:05 UTC (permalink / raw)
  To: Li Zefan
  Cc: Andrew Morton, Lennart Poettering, Kay Sievers, Hugh Dickins,
	LKML, Cgroups, Eric Paris

Hello, Li.

Sorry about the delay.  Was on vacation.

On Thu, Mar 01, 2012 at 02:16:57PM +0800, Li Zefan wrote:
> This can be used to attach metadata to a cgroup, which is required
> by systemd.
> 
> As lazy as I am, here just quoted from Lennart for the detailed use
> cases:

I much prefer this version but would prefer to deal with this in the
next merge window as we're already on rc6.  On a cursory look...

* I'm not sure kmem_ is a good prefix.  It sucks that xattr values are
  stored in kmem w/o size limit and we probably want to improve on
  both fronts in the future.  Wouldn't it be better to name them
  simple_*() and put them in fs/libfs.c?

* I don't think we want CONFIG option for cgroupfs xattr.  Let's just
  add a mount option.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/3] cgroup: add xattr support
@ 2012-03-06 21:37     ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2012-03-06 21:37 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, Andrew Morton, Lennart Poettering, Kay Sievers,
	Jarkko Sakkinen, Miklos Szeredi, LKML, Cgroups, Eric Paris

On Sun, 4 Mar 2012, Tejun Heo wrote:
> On Thu, Mar 01, 2012 at 02:16:57PM +0800, Li Zefan wrote:
> > This can be used to attach metadata to a cgroup, which is required
> > by systemd.

I've not reviewed your patches in detail, but I'm very happy for most
of the shmem xattr code to move out to a library for use elsewhere.

Note that the linux-next mm/shmem.c now includes Jarkko Sakkinen's
shmem_initxattrs() work, maybe you can draw from that too.

> > 
> > As lazy as I am, here just quoted from Lennart for the detailed use
> > cases:
> 
> I much prefer this version but would prefer to deal with this in the
> next merge window as we're already on rc6.  On a cursory look...
> 
> * I'm not sure kmem_ is a good prefix.  It sucks that xattr values are
>   stored in kmem w/o size limit and we probably want to improve on
>   both fronts in the future.  Wouldn't it be better to name them
>   simple_*() and put them in fs/libfs.c?

Yes, simple_*() in fs/libfs.c sounds much better to me than kmem_*().

On the size limit issue, shmem has its own shmem_xattr_validate(),
which rejects XATTR_USER_PREFIX, and the patches leave that unchanged.

I don't know whether the cgroup xattrs need to support USER or not,
and I don't know what permissions would be involved if they do;
but it's an important concern.

Hugh

> 
> * I don't think we want CONFIG option for cgroupfs xattr.  Let's just
>   add a mount option.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/3] cgroup: add xattr support
@ 2012-03-06 21:37     ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2012-03-06 21:37 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, Andrew Morton, Lennart Poettering, Kay Sievers,
	Jarkko Sakkinen, Miklos Szeredi, LKML, Cgroups, Eric Paris

On Sun, 4 Mar 2012, Tejun Heo wrote:
> On Thu, Mar 01, 2012 at 02:16:57PM +0800, Li Zefan wrote:
> > This can be used to attach metadata to a cgroup, which is required
> > by systemd.

I've not reviewed your patches in detail, but I'm very happy for most
of the shmem xattr code to move out to a library for use elsewhere.

Note that the linux-next mm/shmem.c now includes Jarkko Sakkinen's
shmem_initxattrs() work, maybe you can draw from that too.

> > 
> > As lazy as I am, here just quoted from Lennart for the detailed use
> > cases:
> 
> I much prefer this version but would prefer to deal with this in the
> next merge window as we're already on rc6.  On a cursory look...
> 
> * I'm not sure kmem_ is a good prefix.  It sucks that xattr values are
>   stored in kmem w/o size limit and we probably want to improve on
>   both fronts in the future.  Wouldn't it be better to name them
>   simple_*() and put them in fs/libfs.c?

Yes, simple_*() in fs/libfs.c sounds much better to me than kmem_*().

On the size limit issue, shmem has its own shmem_xattr_validate(),
which rejects XATTR_USER_PREFIX, and the patches leave that unchanged.

I don't know whether the cgroup xattrs need to support USER or not,
and I don't know what permissions would be involved if they do;
but it's an important concern.

Hugh

> 
> * I don't think we want CONFIG option for cgroupfs xattr.  Let's just
>   add a mount option.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/3] cgroup: add xattr support
@ 2012-03-08  9:03     ` Li Zefan
  0 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-08  9:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Lennart Poettering, Kay Sievers, Hugh Dickins,
	LKML, Cgroups, Eric Paris

Tejun Heo wrote:
> Hello, Li.
> 
> Sorry about the delay.  Was on vacation.
> 

Me too. ;)

> On Thu, Mar 01, 2012 at 02:16:57PM +0800, Li Zefan wrote:
>> This can be used to attach metadata to a cgroup, which is required
>> by systemd.
>>
>> As lazy as I am, here just quoted from Lennart for the detailed use
>> cases:
> 
> I much prefer this version but would prefer to deal with this in the
> next merge window as we're already on rc6.  On a cursory look...
> 

I'm fine with this, especially I'm busy with other stuff.

> * I'm not sure kmem_ is a good prefix.  It sucks that xattr values are
>   stored in kmem w/o size limit and we probably want to improve on
>   both fronts in the future.  Wouldn't it be better to name them
>   simple_*() and put them in fs/libfs.c?
> 

Ok.

> * I don't think we want CONFIG option for cgroupfs xattr.  Let's just
>   add a mount option.

Ok.

> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/3] cgroup: add xattr support
@ 2012-03-08  9:03     ` Li Zefan
  0 siblings, 0 replies; 20+ messages in thread
From: Li Zefan @ 2012-03-08  9:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Lennart Poettering, Kay Sievers, Hugh Dickins,
	LKML, Cgroups, Eric Paris

Tejun Heo wrote:
> Hello, Li.
> 
> Sorry about the delay.  Was on vacation.
> 

Me too. ;)

> On Thu, Mar 01, 2012 at 02:16:57PM +0800, Li Zefan wrote:
>> This can be used to attach metadata to a cgroup, which is required
>> by systemd.
>>
>> As lazy as I am, here just quoted from Lennart for the detailed use
>> cases:
> 
> I much prefer this version but would prefer to deal with this in the
> next merge window as we're already on rc6.  On a cursory look...
> 

I'm fine with this, especially I'm busy with other stuff.

> * I'm not sure kmem_ is a good prefix.  It sucks that xattr values are
>   stored in kmem w/o size limit and we probably want to improve on
>   both fronts in the future.  Wouldn't it be better to name them
>   simple_*() and put them in fs/libfs.c?
> 

Ok.

> * I don't think we want CONFIG option for cgroupfs xattr.  Let's just
>   add a mount option.

Ok.

> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-03-08  9:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-01  6:16 [PATCH v2 0/3] cgroup: add xattr support Li Zefan
2012-03-01  6:17 ` [PATCH v2 1/3] xattr: extract kmem_xattr code from tmpfs Li Zefan
2012-03-01  6:17   ` Li Zefan
2012-03-01 12:42   ` Hillf Danton
2012-03-01 12:42     ` Hillf Danton
2012-03-02  5:23     ` Li Zefan
2012-03-02  5:23       ` Li Zefan
2012-03-01  6:17 ` [PATCH v2 3/3] cgroup: add xattr support Li Zefan
2012-03-01  6:17 ` [PATCH v2 2/3] cgroup: revise how we re-populate root directory Li Zefan
2012-03-01  6:17   ` Li Zefan
2012-03-01 12:48   ` Hillf Danton
2012-03-01 12:48     ` Hillf Danton
2012-03-02  5:49     ` Li Zefan
2012-03-02  5:49       ` Li Zefan
2012-03-04 18:05 ` [PATCH v2 0/3] cgroup: add xattr support Tejun Heo
2012-03-04 18:05   ` Tejun Heo
2012-03-06 21:37   ` Hugh Dickins
2012-03-06 21:37     ` Hugh Dickins
2012-03-08  9:03   ` Li Zefan
2012-03-08  9:03     ` Li Zefan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.