All of lore.kernel.org
 help / color / mirror / Atom feed
* CGroup Namespaces (v4)
@ 2015-11-16 19:51 ` serge
  0 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hi,

following is a revised set of the CGroup Namespace patchset which Aditya
Kali has previously sent.  The code can also be found in the cgroupns.v4
branch of

https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/

To summarize the semantics:

1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED

2. unsharing a cgroup namespace makes all your current cgroups your new
cgroup root.

3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
cgroup namespce root.  A task outside of  your cgroup looks like

	8:memory:/../../..

4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
on the mounting task's  cgroup namespace.

5. setns to a cgroup namespace switches your cgroup namespace but not
your cgroups.

With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.

This is completely backward compatible and will be completely invisible
to any existing cgroup users (except for those running inside a cgroup
namespace and looking at /proc/pid/cgroup of tasks outside their
namespace.)

Changes from V3:
1. Rebased onto latest cgroup changes.  In particular switch to
   css_set_lock and ns_common.
2. Support all hierarchies.

Changes from V2:
1. Added documentation in Documentation/cgroups/namespace.txt
2. Fixed a bug that caused crash
3. Incorporated some other suggestions from last patchset:
   - removed use of threadgroup_lock() while creating new cgroupns
   - use task_lock() instead of rcu_read_lock() while accessing
     task->nsproxy
   - optimized setns() to own cgroupns
   - simplified code around sane-behavior mount option parsing
4. Restored ACKs from Serge Hallyn from v1 on few patches that have
   not changed since then.

Changes from V1:
1. No pinning of processes within cgroupns. Tasks can be freely moved
   across cgroups even outside of their cgroupns-root. Usual DAC/MAC policies
   apply as before.
2. Path in /proc/<pid>/cgroup is now always shown and is relative to
   cgroupns-root. So path can contain '/..' strings depending on cgroupns-root
   of the reader and cgroup of <pid>.
3. setns() does not require the process to first move under target
   cgroupns-root.

Changes form RFC (V0):
1. setns support for cgroupns
2. 'mount -t cgroup cgroup <mntpt>' from inside a cgroupns now
   mounts the cgroup hierarcy with cgroupns-root as the filesystem root.
3. writes to cgroup files outside of cgroupns-root are not allowed
4. visibility of /proc/<pid>/cgroup is further restricted by not showing
   anything if the <pid> is in a sibling cgroupns and its cgroup falls outside
   your cgroupns-root.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* CGroup Namespaces (v4)
@ 2015-11-16 19:51 ` serge
  0 siblings, 0 replies; 180+ messages in thread
From: serge @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm

Hi,

following is a revised set of the CGroup Namespace patchset which Aditya
Kali has previously sent.  The code can also be found in the cgroupns.v4
branch of

https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/

To summarize the semantics:

1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED

2. unsharing a cgroup namespace makes all your current cgroups your new
cgroup root.

3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
cgroup namespce root.  A task outside of  your cgroup looks like

	8:memory:/../../..

4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
on the mounting task's  cgroup namespace.

5. setns to a cgroup namespace switches your cgroup namespace but not
your cgroups.

With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.

This is completely backward compatible and will be completely invisible
to any existing cgroup users (except for those running inside a cgroup
namespace and looking at /proc/pid/cgroup of tasks outside their
namespace.)

Changes from V3:
1. Rebased onto latest cgroup changes.  In particular switch to
   css_set_lock and ns_common.
2. Support all hierarchies.

Changes from V2:
1. Added documentation in Documentation/cgroups/namespace.txt
2. Fixed a bug that caused crash
3. Incorporated some other suggestions from last patchset:
   - removed use of threadgroup_lock() while creating new cgroupns
   - use task_lock() instead of rcu_read_lock() while accessing
     task->nsproxy
   - optimized setns() to own cgroupns
   - simplified code around sane-behavior mount option parsing
4. Restored ACKs from Serge Hallyn from v1 on few patches that have
   not changed since then.

Changes from V1:
1. No pinning of processes within cgroupns. Tasks can be freely moved
   across cgroups even outside of their cgroupns-root. Usual DAC/MAC policies
   apply as before.
2. Path in /proc/<pid>/cgroup is now always shown and is relative to
   cgroupns-root. So path can contain '/..' strings depending on cgroupns-root
   of the reader and cgroup of <pid>.
3. setns() does not require the process to first move under target
   cgroupns-root.

Changes form RFC (V0):
1. setns support for cgroupns
2. 'mount -t cgroup cgroup <mntpt>' from inside a cgroupns now
   mounts the cgroup hierarcy with cgroupns-root as the filesystem root.
3. writes to cgroup files outside of cgroupns-root are not allowed
4. visibility of /proc/<pid>/cgroup is further restricted by not showing
   anything if the <pid> is in a sibling cgroupns and its cgroup falls outside
   your cgroupns-root.


^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
  2015-11-16 19:51 ` serge
@ 2015-11-16 19:51     ` serge
  -1 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

The new function kernfs_path_from_node() generates and returns
kernfs path of a given kernfs_node relative to a given parent
kernfs_node.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 fs/kernfs/dir.c        |  195 ++++++++++++++++++++++++++++++++++++++++++------
 include/linux/kernfs.h |    3 +
 2 files changed, 177 insertions(+), 21 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 91e0045..dba0d42 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,159 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/**
+ * kernfs_node_depth - compute depth of the kernfs node from root.
+ * The root node itself is considered to be at depth 0.
+ */
+static size_t kernfs_node_depth(struct kernfs_node *kn)
 {
-	char *p = buf + buflen;
+	size_t depth = 0;
+
+	BUG_ON(!kn);
+	while (kn->parent) {
+		depth++;
+		kn = kn->parent;
+	}
+	return depth;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a relative path from @kn_from to @kn_to
+ * @kn_from: reference node of the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ */
+static char * __must_check kernfs_path_from_node_locked(
+	struct kernfs_node *kn_from,
+	struct kernfs_node *kn_to,
+	char *buf,
+	size_t buflen)
+{
+	char *p = buf;
+	struct kernfs_node *kn;
+	size_t depth_from = 0, depth_to, d;
 	int len;
 
-	*--p = '\0';
+	/* We atleast need 2 bytes to write "/\0". */
+	BUG_ON(buflen < 2);
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
+	/* Short-circuit the easy case - kn_to is the root node. */
+	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
+		*p = '/';
+		*(p + 1) = '\0';
+		return p;
+	}
+
+	/* We can find the relative path only if both the nodes belong to the
+	 * same kernfs root.
+	 */
+	if (kn_from) {
+		BUG_ON(kernfs_root(kn_from) != kernfs_root(kn_to));
+		depth_from = kernfs_node_depth(kn_from);
+	}
+
+	depth_to = kernfs_node_depth(kn_to);
+
+	/* We compose path from left to right. So first write out all possible
+	 * "/.." strings needed to reach from 'kn_from' to the common ancestor.
+	 */
+	if (kn_from) {
+		while (depth_from > depth_to) {
+			len = strlen("/..");
+			if ((buflen - (p - buf)) < len + 1) {
+				/* buffer not big enough. */
+				buf[0] = '\0';
+				return NULL;
+			}
+			memcpy(p, "/..", len);
+			p += len;
+			*p = '\0';
+			--depth_from;
+			kn_from = kn_from->parent;
 		}
+
+		d = depth_to;
+		kn = kn_to;
+		while (depth_from < d) {
+			kn = kn->parent;
+			d--;
+		}
+
+		/* Now we have 'depth_from == depth_to' at this point. Add more
+		 * "/.."s until we reach common ancestor. In the worst case,
+		 * root node will be the common ancestor.
+		 */
+		while (depth_from > 0) {
+			/* If we reached common ancestor, stop. */
+			if (kn_from == kn)
+				break;
+			len = strlen("/..");
+			if ((buflen - (p - buf)) < len + 1) {
+				/* buffer not big enough. */
+				buf[0] = '\0';
+				return NULL;
+			}
+			memcpy(p, "/..", len);
+			p += len;
+			*p = '\0';
+			--depth_from;
+			kn_from = kn_from->parent;
+			kn = kn->parent;
+		}
+	}
+
+	/* Figure out how many bytes we need to write the path.
+	 */
+	d = depth_to;
+	kn = kn_to;
+	len = 0;
+	while (depth_from < d) {
+		/* Account for "/<name>". */
+		len += strlen(kn->name) + 1;
+		kn = kn->parent;
+		--d;
+	}
+
+	if ((buflen - (p - buf)) < len + 1) {
+		/* buffer not big enough. */
+		buf[0] = '\0';
+		return NULL;
+	}
+
+	/* We have enough space. Move 'p' ahead by computed length and start
+	 * writing node names into buffer.
+	 */
+	p += len;
+	*p = '\0';
+	d = depth_to;
+	kn = kn_to;
+	while (d > depth_from) {
+		len = strlen(kn->name);
 		p -= len;
 		memcpy(p, kn->name, len);
 		*--p = '/';
 		kn = kn->parent;
-	} while (kn && kn->parent);
+		--d;
+	}
 
-	return p;
+	return buf;
 }
 
 /**
@@ -115,26 +246,48 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
- * kernfs_path - build full path of a given node
+ * kernfs_path_from_node - build path of node @kn relative to @kn_root.
+ * @kn_root: parent kernfs_node relative to which we need to build the path
  * @kn: kernfs_node of interest
- * @buf: buffer to copy @kn's name into
+ * @buf: buffer to copy @kn's path into
  * @buflen: size of @buf
  *
- * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
- * path is built from the end of @buf so the returned pointer usually
- * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * Builds and returns @kn's path relative to @kn_root. @kn_root and @kn must
+ * be on the same kernfs-root. If @kn_root is not parent of @kn, then a relative
+ * path (which includes '..'s) as needed to reach from @kn_root to @kn is
+ * returned.
+ * The path may be built from the end of @buf so the returned pointer may not
+ * match @buf.  If @buf isn't long enough, @buf is nul terminated
  * and %NULL is returned.
  */
-char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+char *kernfs_path_from_node(struct kernfs_node *kn_root, struct kernfs_node *kn,
+			    char *buf, size_t buflen)
 {
 	unsigned long flags;
 	char *p;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
+	p = kernfs_path_from_node_locked(kn_root, kn, buf, buflen);
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 	return p;
 }
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
+ * kernfs_path - build full path of a given node
+ * @kn: kernfs_node of interest
+ * @buf: buffer to copy @kn's name into
+ * @buflen: size of @buf
+ *
+ * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
+ * path is built from the end of @buf so the returned pointer usually
+ * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * and %NULL is returned.
+ */
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+{
+	return kernfs_path_from_node(NULL, kn, buf, buflen);
+}
 EXPORT_SYMBOL_GPL(kernfs_path);
 
 /**
@@ -168,8 +321,8 @@ void pr_cont_kernfs_path(struct kernfs_node *kn)
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
+	p = kernfs_path_from_node_locked(NULL, kn, kernfs_pr_cont_buf,
+					 sizeof(kernfs_pr_cont_buf));
 	if (p)
 		pr_cont("%s", p);
 	else
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5d4e9c4..d025ebd 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
+char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
+					  struct kernfs_node *kn, char *buf,
+					  size_t buflen);
 char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
 				size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-11-16 19:51     ` serge
  0 siblings, 0 replies; 180+ messages in thread
From: serge @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm

From: Aditya Kali <adityakali@google.com>

The new function kernfs_path_from_node() generates and returns
kernfs path of a given kernfs_node relative to a given parent
kernfs_node.

Signed-off-by: Aditya Kali <adityakali@google.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
---
 fs/kernfs/dir.c        |  195 ++++++++++++++++++++++++++++++++++++++++++------
 include/linux/kernfs.h |    3 +
 2 files changed, 177 insertions(+), 21 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 91e0045..dba0d42 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,159 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/**
+ * kernfs_node_depth - compute depth of the kernfs node from root.
+ * The root node itself is considered to be at depth 0.
+ */
+static size_t kernfs_node_depth(struct kernfs_node *kn)
 {
-	char *p = buf + buflen;
+	size_t depth = 0;
+
+	BUG_ON(!kn);
+	while (kn->parent) {
+		depth++;
+		kn = kn->parent;
+	}
+	return depth;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a relative path from @kn_from to @kn_to
+ * @kn_from: reference node of the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ */
+static char * __must_check kernfs_path_from_node_locked(
+	struct kernfs_node *kn_from,
+	struct kernfs_node *kn_to,
+	char *buf,
+	size_t buflen)
+{
+	char *p = buf;
+	struct kernfs_node *kn;
+	size_t depth_from = 0, depth_to, d;
 	int len;
 
-	*--p = '\0';
+	/* We atleast need 2 bytes to write "/\0". */
+	BUG_ON(buflen < 2);
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
+	/* Short-circuit the easy case - kn_to is the root node. */
+	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
+		*p = '/';
+		*(p + 1) = '\0';
+		return p;
+	}
+
+	/* We can find the relative path only if both the nodes belong to the
+	 * same kernfs root.
+	 */
+	if (kn_from) {
+		BUG_ON(kernfs_root(kn_from) != kernfs_root(kn_to));
+		depth_from = kernfs_node_depth(kn_from);
+	}
+
+	depth_to = kernfs_node_depth(kn_to);
+
+	/* We compose path from left to right. So first write out all possible
+	 * "/.." strings needed to reach from 'kn_from' to the common ancestor.
+	 */
+	if (kn_from) {
+		while (depth_from > depth_to) {
+			len = strlen("/..");
+			if ((buflen - (p - buf)) < len + 1) {
+				/* buffer not big enough. */
+				buf[0] = '\0';
+				return NULL;
+			}
+			memcpy(p, "/..", len);
+			p += len;
+			*p = '\0';
+			--depth_from;
+			kn_from = kn_from->parent;
 		}
+
+		d = depth_to;
+		kn = kn_to;
+		while (depth_from < d) {
+			kn = kn->parent;
+			d--;
+		}
+
+		/* Now we have 'depth_from == depth_to' at this point. Add more
+		 * "/.."s until we reach common ancestor. In the worst case,
+		 * root node will be the common ancestor.
+		 */
+		while (depth_from > 0) {
+			/* If we reached common ancestor, stop. */
+			if (kn_from == kn)
+				break;
+			len = strlen("/..");
+			if ((buflen - (p - buf)) < len + 1) {
+				/* buffer not big enough. */
+				buf[0] = '\0';
+				return NULL;
+			}
+			memcpy(p, "/..", len);
+			p += len;
+			*p = '\0';
+			--depth_from;
+			kn_from = kn_from->parent;
+			kn = kn->parent;
+		}
+	}
+
+	/* Figure out how many bytes we need to write the path.
+	 */
+	d = depth_to;
+	kn = kn_to;
+	len = 0;
+	while (depth_from < d) {
+		/* Account for "/<name>". */
+		len += strlen(kn->name) + 1;
+		kn = kn->parent;
+		--d;
+	}
+
+	if ((buflen - (p - buf)) < len + 1) {
+		/* buffer not big enough. */
+		buf[0] = '\0';
+		return NULL;
+	}
+
+	/* We have enough space. Move 'p' ahead by computed length and start
+	 * writing node names into buffer.
+	 */
+	p += len;
+	*p = '\0';
+	d = depth_to;
+	kn = kn_to;
+	while (d > depth_from) {
+		len = strlen(kn->name);
 		p -= len;
 		memcpy(p, kn->name, len);
 		*--p = '/';
 		kn = kn->parent;
-	} while (kn && kn->parent);
+		--d;
+	}
 
-	return p;
+	return buf;
 }
 
 /**
@@ -115,26 +246,48 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
- * kernfs_path - build full path of a given node
+ * kernfs_path_from_node - build path of node @kn relative to @kn_root.
+ * @kn_root: parent kernfs_node relative to which we need to build the path
  * @kn: kernfs_node of interest
- * @buf: buffer to copy @kn's name into
+ * @buf: buffer to copy @kn's path into
  * @buflen: size of @buf
  *
- * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
- * path is built from the end of @buf so the returned pointer usually
- * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * Builds and returns @kn's path relative to @kn_root. @kn_root and @kn must
+ * be on the same kernfs-root. If @kn_root is not parent of @kn, then a relative
+ * path (which includes '..'s) as needed to reach from @kn_root to @kn is
+ * returned.
+ * The path may be built from the end of @buf so the returned pointer may not
+ * match @buf.  If @buf isn't long enough, @buf is nul terminated
  * and %NULL is returned.
  */
-char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+char *kernfs_path_from_node(struct kernfs_node *kn_root, struct kernfs_node *kn,
+			    char *buf, size_t buflen)
 {
 	unsigned long flags;
 	char *p;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
+	p = kernfs_path_from_node_locked(kn_root, kn, buf, buflen);
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 	return p;
 }
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
+ * kernfs_path - build full path of a given node
+ * @kn: kernfs_node of interest
+ * @buf: buffer to copy @kn's name into
+ * @buflen: size of @buf
+ *
+ * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
+ * path is built from the end of @buf so the returned pointer usually
+ * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * and %NULL is returned.
+ */
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+{
+	return kernfs_path_from_node(NULL, kn, buf, buflen);
+}
 EXPORT_SYMBOL_GPL(kernfs_path);
 
 /**
@@ -168,8 +321,8 @@ void pr_cont_kernfs_path(struct kernfs_node *kn)
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
+	p = kernfs_path_from_node_locked(NULL, kn, kernfs_pr_cont_buf,
+					 sizeof(kernfs_pr_cont_buf));
 	if (p)
 		pr_cont("%s", p);
 	else
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5d4e9c4..d025ebd 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
+char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
+					  struct kernfs_node *kn, char *buf,
+					  size_t buflen);
 char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
 				size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace
       [not found] ` <1447703505-29672-1-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
  2015-11-16 19:51     ` serge
@ 2015-11-16 19:51   ` serge-A9i7LUbDfNHQT0dZR+AlfA
  2015-11-16 19:51     ` serge
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

CLONE_NEWCGROUP will be used to create new cgroup namespace.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 include/uapi/linux/sched.h |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index cc89dde..5f0fe01 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -21,8 +21,7 @@
 #define CLONE_DETACHED		0x00400000	/* Unused, ignored */
 #define CLONE_UNTRACED		0x00800000	/* set if the tracing process can't force CLONE_PTRACE on this clone */
 #define CLONE_CHILD_SETTID	0x01000000	/* set the TID in the child */
-/* 0x02000000 was previously the unused CLONE_STOPPED (Start in stopped state)
-   and is now available for re-use. */
+#define CLONE_NEWCGROUP		0x02000000	/* New cgroup namespace */
 #define CLONE_NEWUTS		0x04000000	/* New utsname namespace */
 #define CLONE_NEWIPC		0x08000000	/* New ipc namespace */
 #define CLONE_NEWUSER		0x10000000	/* New user namespace */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace
       [not found] ` <1447703505-29672-1-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-16 19:51   ` serge-A9i7LUbDfNHQT0dZR+AlfA
  2015-11-16 19:51   ` [PATCH 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace serge-A9i7LUbDfNHQT0dZR+AlfA
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 180+ messages in thread
From: serge @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm

From: Aditya Kali <adityakali@google.com>

CLONE_NEWCGROUP will be used to create new cgroup namespace.

Signed-off-by: Aditya Kali <adityakali@google.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 include/uapi/linux/sched.h |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index cc89dde..5f0fe01 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -21,8 +21,7 @@
 #define CLONE_DETACHED		0x00400000	/* Unused, ignored */
 #define CLONE_UNTRACED		0x00800000	/* set if the tracing process can't force CLONE_PTRACE on this clone */
 #define CLONE_CHILD_SETTID	0x01000000	/* set the TID in the child */
-/* 0x02000000 was previously the unused CLONE_STOPPED (Start in stopped state)
-   and is now available for re-use. */
+#define CLONE_NEWCGROUP		0x02000000	/* New cgroup namespace */
 #define CLONE_NEWUTS		0x04000000	/* New utsname namespace */
 #define CLONE_NEWIPC		0x08000000	/* New ipc namespace */
 #define CLONE_NEWUSER		0x10000000	/* New user namespace */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace
@ 2015-11-16 19:51   ` serge-A9i7LUbDfNHQT0dZR+AlfA
  0 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: adityakali-hpIqsD4AKlfQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

CLONE_NEWCGROUP will be used to create new cgroup namespace.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 include/uapi/linux/sched.h |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index cc89dde..5f0fe01 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -21,8 +21,7 @@
 #define CLONE_DETACHED		0x00400000	/* Unused, ignored */
 #define CLONE_UNTRACED		0x00800000	/* set if the tracing process can't force CLONE_PTRACE on this clone */
 #define CLONE_CHILD_SETTID	0x01000000	/* set the TID in the child */
-/* 0x02000000 was previously the unused CLONE_STOPPED (Start in stopped state)
-   and is now available for re-use. */
+#define CLONE_NEWCGROUP		0x02000000	/* New cgroup namespace */
 #define CLONE_NEWUTS		0x04000000	/* New utsname namespace */
 #define CLONE_NEWIPC		0x08000000	/* New ipc namespace */
 #define CLONE_NEWUSER		0x10000000	/* New user namespace */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 3/8] cgroup: add function to get task's cgroup
  2015-11-16 19:51 ` serge
@ 2015-11-16 19:51     ` serge
  -1 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

get_task_cgroup() returns the (reference counted) cgroup of the
given task.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 include/linux/cgroup.h |    1 +
 kernel/cgroup.c        |   25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 22e3754..29f0b02 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -326,6 +326,7 @@ static inline bool css_tryget_online(struct cgroup_subsys_state *css)
 		return percpu_ref_tryget_live(&css->refcnt);
 	return true;
 }
+struct cgroup *get_task_cgroup(struct task_struct *task);
 
 /**
  * css_put - put a css reference
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f1603c1..e29c346 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2210,6 +2210,31 @@ char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
 }
 EXPORT_SYMBOL_GPL(task_cgroup_path);
 
+/*
+ * get_task_cgroup - returns the cgroup of the task in the default cgroup
+ * hierarchy.
+ *
+ * @task: target task
+ * This function returns the @task's cgroup on the default cgroup hierarchy. The
+ * returned cgroup has its reference incremented (by calling cgroup_get()). So
+ * the caller must cgroup_put() the obtained reference once it is done with it.
+ */
+struct cgroup *get_task_cgroup(struct task_struct *task)
+{
+	struct cgroup *cgrp;
+
+	mutex_lock(&cgroup_mutex);
+	spin_lock_bh(&css_set_lock);
+
+	cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
+	cgroup_get(cgrp);
+
+	spin_unlock_bh(&css_set_lock);
+	mutex_unlock(&cgroup_mutex);
+	return cgrp;
+}
+EXPORT_SYMBOL_GPL(get_task_cgroup);
+
 /* used to track tasks and other necessary states during migration */
 struct cgroup_taskset {
 	/* the src and dst cset list running through cset->mg_node */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 3/8] cgroup: add function to get task's cgroup
@ 2015-11-16 19:51     ` serge
  0 siblings, 0 replies; 180+ messages in thread
From: serge @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm

From: Aditya Kali <adityakali@google.com>

get_task_cgroup() returns the (reference counted) cgroup of the
given task.

Signed-off-by: Aditya Kali <adityakali@google.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 include/linux/cgroup.h |    1 +
 kernel/cgroup.c        |   25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 22e3754..29f0b02 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -326,6 +326,7 @@ static inline bool css_tryget_online(struct cgroup_subsys_state *css)
 		return percpu_ref_tryget_live(&css->refcnt);
 	return true;
 }
+struct cgroup *get_task_cgroup(struct task_struct *task);
 
 /**
  * css_put - put a css reference
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f1603c1..e29c346 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2210,6 +2210,31 @@ char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
 }
 EXPORT_SYMBOL_GPL(task_cgroup_path);
 
+/*
+ * get_task_cgroup - returns the cgroup of the task in the default cgroup
+ * hierarchy.
+ *
+ * @task: target task
+ * This function returns the @task's cgroup on the default cgroup hierarchy. The
+ * returned cgroup has its reference incremented (by calling cgroup_get()). So
+ * the caller must cgroup_put() the obtained reference once it is done with it.
+ */
+struct cgroup *get_task_cgroup(struct task_struct *task)
+{
+	struct cgroup *cgrp;
+
+	mutex_lock(&cgroup_mutex);
+	spin_lock_bh(&css_set_lock);
+
+	cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
+	cgroup_get(cgrp);
+
+	spin_unlock_bh(&css_set_lock);
+	mutex_unlock(&cgroup_mutex);
+	return cgrp;
+}
+EXPORT_SYMBOL_GPL(get_task_cgroup);
+
 /* used to track tasks and other necessary states during migration */
 struct cgroup_taskset {
 	/* the src and dst cset list running through cset->mg_node */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 4/8] cgroup: export cgroup_get() and cgroup_put()
       [not found] ` <1447703505-29672-1-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-11-16 19:51     ` serge
@ 2015-11-16 19:51   ` serge-A9i7LUbDfNHQT0dZR+AlfA
  2015-11-16 19:51     ` serge
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

move cgroup_get() and cgroup_put() into cgroup.h so that
they can be called from other places.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 include/linux/cgroup.h |   21 +++++++++++++++++++++
 kernel/cgroup.c        |   22 ----------------------
 2 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 29f0b02..99096be 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -231,6 +231,27 @@ void css_task_iter_end(struct css_task_iter *it);
 #define css_for_each_descendant_post(pos, css)				\
 	for ((pos) = css_next_descendant_post(NULL, (css)); (pos);	\
 	     (pos) = css_next_descendant_post((pos), (css)))
+/* convenient tests for these bits */
+static inline bool cgroup_is_dead(const struct cgroup *cgrp)
+{
+	return !(cgrp->self.flags & CSS_ONLINE);
+}
+
+static inline void cgroup_get(struct cgroup *cgrp)
+{
+	WARN_ON_ONCE(cgroup_is_dead(cgrp));
+	css_get(&cgrp->self);
+}
+
+static inline bool cgroup_tryget(struct cgroup *cgrp)
+{
+	return css_tryget(&cgrp->self);
+}
+
+static inline void cgroup_put(struct cgroup *cgrp)
+{
+	css_put(&cgrp->self);
+}
 
 /**
  * cgroup_taskset_for_each - iterate cgroup_taskset
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e29c346..e972259 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -417,28 +417,6 @@ out_unlock:
 	return css;
 }
 
-/* convenient tests for these bits */
-static inline bool cgroup_is_dead(const struct cgroup *cgrp)
-{
-	return !(cgrp->self.flags & CSS_ONLINE);
-}
-
-static void cgroup_get(struct cgroup *cgrp)
-{
-	WARN_ON_ONCE(cgroup_is_dead(cgrp));
-	css_get(&cgrp->self);
-}
-
-static bool cgroup_tryget(struct cgroup *cgrp)
-{
-	return css_tryget(&cgrp->self);
-}
-
-static void cgroup_put(struct cgroup *cgrp)
-{
-	css_put(&cgrp->self);
-}
-
 struct cgroup_subsys_state *of_css(struct kernfs_open_file *of)
 {
 	struct cgroup *cgrp = of->kn->parent->priv;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 4/8] cgroup: export cgroup_get() and cgroup_put()
       [not found] ` <1447703505-29672-1-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-16 19:51   ` serge-A9i7LUbDfNHQT0dZR+AlfA
  2015-11-16 19:51   ` [PATCH 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace serge-A9i7LUbDfNHQT0dZR+AlfA
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 180+ messages in thread
From: serge @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm

From: Aditya Kali <adityakali@google.com>

move cgroup_get() and cgroup_put() into cgroup.h so that
they can be called from other places.

Signed-off-by: Aditya Kali <adityakali@google.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 include/linux/cgroup.h |   21 +++++++++++++++++++++
 kernel/cgroup.c        |   22 ----------------------
 2 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 29f0b02..99096be 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -231,6 +231,27 @@ void css_task_iter_end(struct css_task_iter *it);
 #define css_for_each_descendant_post(pos, css)				\
 	for ((pos) = css_next_descendant_post(NULL, (css)); (pos);	\
 	     (pos) = css_next_descendant_post((pos), (css)))
+/* convenient tests for these bits */
+static inline bool cgroup_is_dead(const struct cgroup *cgrp)
+{
+	return !(cgrp->self.flags & CSS_ONLINE);
+}
+
+static inline void cgroup_get(struct cgroup *cgrp)
+{
+	WARN_ON_ONCE(cgroup_is_dead(cgrp));
+	css_get(&cgrp->self);
+}
+
+static inline bool cgroup_tryget(struct cgroup *cgrp)
+{
+	return css_tryget(&cgrp->self);
+}
+
+static inline void cgroup_put(struct cgroup *cgrp)
+{
+	css_put(&cgrp->self);
+}
 
 /**
  * cgroup_taskset_for_each - iterate cgroup_taskset
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e29c346..e972259 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -417,28 +417,6 @@ out_unlock:
 	return css;
 }
 
-/* convenient tests for these bits */
-static inline bool cgroup_is_dead(const struct cgroup *cgrp)
-{
-	return !(cgrp->self.flags & CSS_ONLINE);
-}
-
-static void cgroup_get(struct cgroup *cgrp)
-{
-	WARN_ON_ONCE(cgroup_is_dead(cgrp));
-	css_get(&cgrp->self);
-}
-
-static bool cgroup_tryget(struct cgroup *cgrp)
-{
-	return css_tryget(&cgrp->self);
-}
-
-static void cgroup_put(struct cgroup *cgrp)
-{
-	css_put(&cgrp->self);
-}
-
 struct cgroup_subsys_state *of_css(struct kernfs_open_file *of)
 {
 	struct cgroup *cgrp = of->kn->parent->priv;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 4/8] cgroup: export cgroup_get() and cgroup_put()
@ 2015-11-16 19:51   ` serge-A9i7LUbDfNHQT0dZR+AlfA
  0 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: adityakali-hpIqsD4AKlfQT0dZR+AlfA, tj-DgEjT+Ai2ygdnm+yROfE0A,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

move cgroup_get() and cgroup_put() into cgroup.h so that
they can be called from other places.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 include/linux/cgroup.h |   21 +++++++++++++++++++++
 kernel/cgroup.c        |   22 ----------------------
 2 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 29f0b02..99096be 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -231,6 +231,27 @@ void css_task_iter_end(struct css_task_iter *it);
 #define css_for_each_descendant_post(pos, css)				\
 	for ((pos) = css_next_descendant_post(NULL, (css)); (pos);	\
 	     (pos) = css_next_descendant_post((pos), (css)))
+/* convenient tests for these bits */
+static inline bool cgroup_is_dead(const struct cgroup *cgrp)
+{
+	return !(cgrp->self.flags & CSS_ONLINE);
+}
+
+static inline void cgroup_get(struct cgroup *cgrp)
+{
+	WARN_ON_ONCE(cgroup_is_dead(cgrp));
+	css_get(&cgrp->self);
+}
+
+static inline bool cgroup_tryget(struct cgroup *cgrp)
+{
+	return css_tryget(&cgrp->self);
+}
+
+static inline void cgroup_put(struct cgroup *cgrp)
+{
+	css_put(&cgrp->self);
+}
 
 /**
  * cgroup_taskset_for_each - iterate cgroup_taskset
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e29c346..e972259 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -417,28 +417,6 @@ out_unlock:
 	return css;
 }
 
-/* convenient tests for these bits */
-static inline bool cgroup_is_dead(const struct cgroup *cgrp)
-{
-	return !(cgrp->self.flags & CSS_ONLINE);
-}
-
-static void cgroup_get(struct cgroup *cgrp)
-{
-	WARN_ON_ONCE(cgroup_is_dead(cgrp));
-	css_get(&cgrp->self);
-}
-
-static bool cgroup_tryget(struct cgroup *cgrp)
-{
-	return css_tryget(&cgrp->self);
-}
-
-static void cgroup_put(struct cgroup *cgrp)
-{
-	css_put(&cgrp->self);
-}
-
 struct cgroup_subsys_state *of_css(struct kernfs_open_file *of)
 {
 	struct cgroup *cgrp = of->kn->parent->priv;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 5/8] cgroup: introduce cgroup namespaces
  2015-11-16 19:51 ` serge
@ 2015-11-16 19:51     ` serge
  -1 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

Introduce the ability to create new cgroup namespace. The newly created
cgroup namespace remembers the cgroup of the process at the point
of creation of the cgroup namespace (referred as cgroupns-root).
The main purpose of cgroup namespace is to virtualize the contents
of /proc/self/cgroup file. Processes inside a cgroup namespace
are only able to see paths relative to their namespace root
(unless they are moved outside of their cgroupns-root, at which point
 they will see a relative path from their cgroupns-root).
For a correctly setup container this enables container-tools
(like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
containers without leaking system level cgroup hierarchy to the task.
This patch only implements the 'unshare' part of the cgroupns.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 fs/proc/namespaces.c             |    3 +
 include/linux/cgroup.h           |   21 ++++---
 include/linux/cgroup_namespace.h |   46 ++++++++++++++
 include/linux/nsproxy.h          |    2 +
 include/linux/proc_ns.h          |    4 ++
 kernel/Makefile                  |    2 +-
 kernel/cgroup.c                  |   39 +++++++++++-
 kernel/cgroup_namespace.c        |  127 ++++++++++++++++++++++++++++++++++++++
 kernel/fork.c                    |    2 +-
 kernel/nsproxy.c                 |   21 ++++++-
 10 files changed, 252 insertions(+), 15 deletions(-)
 create mode 100644 include/linux/cgroup_namespace.h
 create mode 100644 kernel/cgroup_namespace.c

diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index f6e8354..bd61075 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -28,6 +28,9 @@ static const struct proc_ns_operations *ns_entries[] = {
 	&userns_operations,
 #endif
 	&mntns_operations,
+#ifdef CONFIG_CGROUPS
+	&cgroupns_operations,
+#endif
 };
 
 static const char *proc_ns_follow_link(struct dentry *dentry, void **cookie)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 99096be..b3ce9d9 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -17,6 +17,9 @@
 #include <linux/seq_file.h>
 #include <linux/kernfs.h>
 #include <linux/jump_label.h>
+#include <linux/nsproxy.h>
+#include <linux/types.h>
+#include <linux/ns_common.h>
 
 #include <linux/cgroup-defs.h>
 
@@ -237,6 +240,10 @@ static inline bool cgroup_is_dead(const struct cgroup *cgrp)
 	return !(cgrp->self.flags & CSS_ONLINE);
 }
 
+static inline void css_get(struct cgroup_subsys_state *css);
+static inline void css_put(struct cgroup_subsys_state *css);
+static inline bool css_tryget(struct cgroup_subsys_state *css);
+
 static inline void cgroup_get(struct cgroup *cgrp)
 {
 	WARN_ON_ONCE(cgroup_is_dead(cgrp));
@@ -284,9 +291,11 @@ static inline void cgroup_put(struct cgroup *cgrp)
 			;						\
 		else
 
-/*
- * Inline functions.
- */
+extern char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
+		struct cgroup *cgrp, char *buf, size_t buflen);
+
+extern char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
+		size_t buflen);
 
 /**
  * css_get - obtain a reference on the specified css
@@ -522,12 +531,6 @@ static inline int cgroup_name(struct cgroup *cgrp, char *buf, size_t buflen)
 	return kernfs_name(cgrp->kn, buf, buflen);
 }
 
-static inline char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
-					      size_t buflen)
-{
-	return kernfs_path(cgrp->kn, buf, buflen);
-}
-
 static inline void pr_cont_cgroup_name(struct cgroup *cgrp)
 {
 	pr_cont_kernfs_name(cgrp->kn);
diff --git a/include/linux/cgroup_namespace.h b/include/linux/cgroup_namespace.h
new file mode 100644
index 0000000..ed181c3
--- /dev/null
+++ b/include/linux/cgroup_namespace.h
@@ -0,0 +1,46 @@
+#ifndef _LINUX_CGROUP_NAMESPACE_H
+#define _LINUX_CGROUP_NAMESPACE_H
+
+#include <linux/nsproxy.h>
+#include <linux/cgroup.h>
+#include <linux/types.h>
+#include <linux/user_namespace.h>
+
+struct css_set;
+struct cgroup_namespace {
+	atomic_t		count;
+	struct ns_common	ns;
+	struct user_namespace	*user_ns;
+	struct css_set          *root_cgrps;
+};
+
+extern struct cgroup_namespace init_cgroup_ns;
+
+static inline struct cgroup_namespace *get_cgroup_ns(
+		struct cgroup_namespace *ns)
+{
+	if (ns)
+		atomic_inc(&ns->count);
+	return ns;
+}
+
+#ifdef CONFIG_CGROUPS
+extern void free_cgroup_ns(struct cgroup_namespace *ns);
+extern struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
+					       struct user_namespace *user_ns,
+					       struct cgroup_namespace *old_ns);
+#else /* CONFIG_CGROUP */
+static inline void free_cgroup_ns(struct cgroup_namespace *ns) { }
+static inline struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
+					struct user_namespace *user_ns,
+					struct cgroup_namespace *old_ns)
+{ return old_ns; }
+#endif
+
+static inline void put_cgroup_ns(struct cgroup_namespace *ns)
+{
+	if (ns && atomic_dec_and_test(&ns->count))
+		free_cgroup_ns(ns);
+}
+
+#endif  /* _LINUX_CGROUP_NAMESPACE_H */
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 35fa08f..ac0d65b 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -8,6 +8,7 @@ struct mnt_namespace;
 struct uts_namespace;
 struct ipc_namespace;
 struct pid_namespace;
+struct cgroup_namespace;
 struct fs_struct;
 
 /*
@@ -33,6 +34,7 @@ struct nsproxy {
 	struct mnt_namespace *mnt_ns;
 	struct pid_namespace *pid_ns_for_children;
 	struct net 	     *net_ns;
+	struct cgroup_namespace *cgroup_ns;
 };
 extern struct nsproxy init_nsproxy;
 
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 42dfc61..de0e771 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -9,6 +9,8 @@
 struct pid_namespace;
 struct nsproxy;
 struct path;
+struct task_struct;
+struct inode;
 
 struct proc_ns_operations {
 	const char *name;
@@ -24,6 +26,7 @@ extern const struct proc_ns_operations ipcns_operations;
 extern const struct proc_ns_operations pidns_operations;
 extern const struct proc_ns_operations userns_operations;
 extern const struct proc_ns_operations mntns_operations;
+extern const struct proc_ns_operations cgroupns_operations;
 
 /*
  * We always define these enumerators
@@ -34,6 +37,7 @@ enum {
 	PROC_UTS_INIT_INO	= 0xEFFFFFFEU,
 	PROC_USER_INIT_INO	= 0xEFFFFFFDU,
 	PROC_PID_INIT_INO	= 0xEFFFFFFCU,
+	PROC_CGROUP_INIT_INO	= 0xEFFFFFFBU,
 };
 
 #ifdef CONFIG_PROC_FS
diff --git a/kernel/Makefile b/kernel/Makefile
index 53abf00..1dce664 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -54,7 +54,7 @@ obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
-obj-$(CONFIG_CGROUPS) += cgroup.o
+obj-$(CONFIG_CGROUPS) += cgroup.o cgroup_namespace.o
 obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += cgroup_pids.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e972259..1d696de 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -57,6 +57,8 @@
 #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
 #include <linux/kthread.h>
 #include <linux/delay.h>
+#include <linux/proc_ns.h>
+#include <linux/cgroup_namespace.h>
 
 #include <linux/atomic.h>
 
@@ -290,6 +292,15 @@ static bool cgroup_on_dfl(const struct cgroup *cgrp)
 {
 	return cgrp->root == &cgrp_dfl_root;
 }
+struct cgroup_namespace init_cgroup_ns = {
+	.count = {
+		.counter = 1,
+	},
+	.user_ns = &init_user_ns,
+	.ns.ops = &cgroupns_operations,
+	.ns.inum = PROC_CGROUP_INIT_INO,
+	.root_cgrps = &init_css_set,
+};
 
 /* IDR wrappers which synchronize using cgroup_idr_lock */
 static int cgroup_idr_alloc(struct idr *idr, void *ptr, int start, int end,
@@ -749,7 +760,7 @@ static void put_css_set_locked(struct css_set *cset)
 	kfree_rcu(cset, rcu_head);
 }
 
-static void put_css_set(struct css_set *cset)
+void put_css_set(struct css_set *cset)
 {
 	/*
 	 * Ensure that the refcount doesn't hit zero while any readers
@@ -767,7 +778,7 @@ static void put_css_set(struct css_set *cset)
 /*
  * refcounted get/put for css_set objects
  */
-static inline void get_css_set(struct css_set *cset)
+void get_css_set(struct css_set *cset)
 {
 	atomic_inc(&cset->refcount);
 }
@@ -2148,6 +2159,28 @@ static struct file_system_type cgroup_fs_type = {
 	.kill_sb = cgroup_kill_sb,
 };
 
+char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
+						 struct cgroup *cgrp, char *buf,
+						 size_t buflen)
+{
+	if (ns) {
+		struct cgroup *root;
+		root = cset_cgroup_from_root(ns->root_cgrps, cgrp->root);
+		return kernfs_path_from_node(root->kn, cgrp->kn, buf,
+					     buflen);
+	} else {
+		return kernfs_path(cgrp->kn, buf, buflen);
+	}
+}
+
+char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
+					      size_t buflen)
+{
+	return cgroup_path_ns(current->nsproxy->cgroup_ns, cgrp, buf,
+				      buflen);
+}
+EXPORT_SYMBOL_GPL(cgroup_path);
+
 /**
  * task_cgroup_path - cgroup path of a task in the first cgroup hierarchy
  * @task: target task
@@ -5237,6 +5270,8 @@ int __init cgroup_init(void)
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));
 
+	get_user_ns(init_cgroup_ns.user_ns);
+
 	mutex_lock(&cgroup_mutex);
 
 	/* Add init_css_set to the hash table */
diff --git a/kernel/cgroup_namespace.c b/kernel/cgroup_namespace.c
new file mode 100644
index 0000000..ef20777
--- /dev/null
+++ b/kernel/cgroup_namespace.c
@@ -0,0 +1,127 @@
+/*
+ *  Copyright (C) 2014 Google Inc.
+ *
+ *  Author: Aditya Kali (adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org)
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation, version 2 of the License.
+ */
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_namespace.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/nsproxy.h>
+#include <linux/proc_ns.h>
+
+const struct proc_ns_operations cgroupns_operations;
+
+static struct cgroup_namespace *alloc_cgroup_ns(void)
+{
+	struct cgroup_namespace *new_ns;
+	int ret;
+
+	new_ns = kzalloc(sizeof(struct cgroup_namespace), GFP_KERNEL);
+	if (!new_ns)
+		return ERR_PTR(-ENOMEM);
+	ret = ns_alloc_inum(&new_ns->ns);
+	if (ret) {
+		kfree(new_ns);
+		return ERR_PTR(ret);
+	}
+	atomic_set(&new_ns->count, 1);
+	new_ns->ns.ops = &cgroupns_operations;
+	return new_ns;
+}
+
+extern void put_css_set(struct css_set *cset);
+extern  void get_css_set(struct css_set *cset);
+
+void free_cgroup_ns(struct cgroup_namespace *ns)
+{
+	put_css_set(ns->root_cgrps);
+	put_user_ns(ns->user_ns);
+	ns_free_inum(&ns->ns);
+	kfree(ns);
+}
+EXPORT_SYMBOL(free_cgroup_ns);
+
+struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
+					struct user_namespace *user_ns,
+					struct cgroup_namespace *old_ns)
+{
+	struct cgroup_namespace *new_ns = NULL;
+	struct css_set *cgrps = NULL;
+	int err;
+
+	BUG_ON(!old_ns);
+
+	if (!(flags & CLONE_NEWCGROUP))
+		return get_cgroup_ns(old_ns);
+
+	/* Allow only sysadmin to create cgroup namespace. */
+	err = -EPERM;
+	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
+		goto err_out;
+
+	cgrps = task_css_set(current);
+	get_css_set(cgrps);
+
+	err = -ENOMEM;
+	new_ns = alloc_cgroup_ns();
+	if (!new_ns)
+		goto err_out;
+
+	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->root_cgrps = cgrps;
+
+	return new_ns;
+
+err_out:
+	if (cgrps)
+		put_css_set(cgrps);
+	kfree(new_ns);
+	return ERR_PTR(err);
+}
+
+static int cgroupns_install(struct nsproxy *nsproxy, void *ns)
+{
+	pr_info("setns not supported for cgroup namespace");
+	return -EINVAL;
+}
+
+static struct ns_common *cgroupns_get(struct task_struct *task)
+{
+	struct cgroup_namespace *ns = NULL;
+	struct nsproxy *nsproxy;
+
+	task_lock(task);
+	nsproxy = task->nsproxy;
+	if (nsproxy) {
+		ns = nsproxy->cgroup_ns;
+		get_cgroup_ns(ns);
+	}
+	task_unlock(task);
+
+	return ns ? &ns->ns : NULL;
+}
+
+static void cgroupns_put(struct ns_common *ns)
+{
+	put_cgroup_ns(to_cg_ns(ns));
+}
+
+const struct proc_ns_operations cgroupns_operations = {
+	.name		= "cgroup",
+	.type		= CLONE_NEWCGROUP,
+	.get		= cgroupns_get,
+	.put		= cgroupns_put,
+	.install	= cgroupns_install,
+};
+
+static __init int cgroup_namespaces_init(void)
+{
+	return 0;
+}
+subsys_initcall(cgroup_namespaces_init);
diff --git a/kernel/fork.c b/kernel/fork.c
index f97f2c4..c16e6a3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1884,7 +1884,7 @@ static int check_unshare_flags(unsigned long unshare_flags)
 	if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
 				CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
 				CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
-				CLONE_NEWUSER|CLONE_NEWPID))
+				CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWCGROUP))
 		return -EINVAL;
 	/*
 	 * Not implemented, but pretend it works if there is nothing
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 49746c8..7f51796 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -25,6 +25,7 @@
 #include <linux/proc_ns.h>
 #include <linux/file.h>
 #include <linux/syscalls.h>
+#include <linux/cgroup_namespace.h>
 
 static struct kmem_cache *nsproxy_cachep;
 
@@ -39,6 +40,9 @@ struct nsproxy init_nsproxy = {
 #ifdef CONFIG_NET
 	.net_ns			= &init_net,
 #endif
+#ifdef CONFIG_CGROUPS
+	.cgroup_ns		= &init_cgroup_ns,
+#endif
 };
 
 static inline struct nsproxy *create_nsproxy(void)
@@ -92,6 +96,13 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 		goto out_pid;
 	}
 
+	new_nsp->cgroup_ns = copy_cgroup_ns(flags, user_ns,
+					    tsk->nsproxy->cgroup_ns);
+	if (IS_ERR(new_nsp->cgroup_ns)) {
+		err = PTR_ERR(new_nsp->cgroup_ns);
+		goto out_cgroup;
+	}
+
 	new_nsp->net_ns = copy_net_ns(flags, user_ns, tsk->nsproxy->net_ns);
 	if (IS_ERR(new_nsp->net_ns)) {
 		err = PTR_ERR(new_nsp->net_ns);
@@ -101,6 +112,9 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 	return new_nsp;
 
 out_net:
+	if (new_nsp->cgroup_ns)
+		put_cgroup_ns(new_nsp->cgroup_ns);
+out_cgroup:
 	if (new_nsp->pid_ns_for_children)
 		put_pid_ns(new_nsp->pid_ns_for_children);
 out_pid:
@@ -128,7 +142,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	struct nsproxy *new_ns;
 
 	if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			      CLONE_NEWPID | CLONE_NEWNET)))) {
+			      CLONE_NEWPID | CLONE_NEWNET |
+			      CLONE_NEWCGROUP)))) {
 		get_nsproxy(old_ns);
 		return 0;
 	}
@@ -165,6 +180,8 @@ void free_nsproxy(struct nsproxy *ns)
 		put_ipc_ns(ns->ipc_ns);
 	if (ns->pid_ns_for_children)
 		put_pid_ns(ns->pid_ns_for_children);
+	if (ns->cgroup_ns)
+		put_cgroup_ns(ns->cgroup_ns);
 	put_net(ns->net_ns);
 	kmem_cache_free(nsproxy_cachep, ns);
 }
@@ -180,7 +197,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
 	int err = 0;
 
 	if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			       CLONE_NEWNET | CLONE_NEWPID)))
+			       CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWCGROUP)))
 		return 0;
 
 	user_ns = new_cred ? new_cred->user_ns : current_user_ns();
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 5/8] cgroup: introduce cgroup namespaces
@ 2015-11-16 19:51     ` serge
  0 siblings, 0 replies; 180+ messages in thread
From: serge @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm, Serge Hallyn

From: Aditya Kali <adityakali@google.com>

Introduce the ability to create new cgroup namespace. The newly created
cgroup namespace remembers the cgroup of the process at the point
of creation of the cgroup namespace (referred as cgroupns-root).
The main purpose of cgroup namespace is to virtualize the contents
of /proc/self/cgroup file. Processes inside a cgroup namespace
are only able to see paths relative to their namespace root
(unless they are moved outside of their cgroupns-root, at which point
 they will see a relative path from their cgroupns-root).
For a correctly setup container this enables container-tools
(like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
containers without leaking system level cgroup hierarchy to the task.
This patch only implements the 'unshare' part of the cgroupns.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
 fs/proc/namespaces.c             |    3 +
 include/linux/cgroup.h           |   21 ++++---
 include/linux/cgroup_namespace.h |   46 ++++++++++++++
 include/linux/nsproxy.h          |    2 +
 include/linux/proc_ns.h          |    4 ++
 kernel/Makefile                  |    2 +-
 kernel/cgroup.c                  |   39 +++++++++++-
 kernel/cgroup_namespace.c        |  127 ++++++++++++++++++++++++++++++++++++++
 kernel/fork.c                    |    2 +-
 kernel/nsproxy.c                 |   21 ++++++-
 10 files changed, 252 insertions(+), 15 deletions(-)
 create mode 100644 include/linux/cgroup_namespace.h
 create mode 100644 kernel/cgroup_namespace.c

diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index f6e8354..bd61075 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -28,6 +28,9 @@ static const struct proc_ns_operations *ns_entries[] = {
 	&userns_operations,
 #endif
 	&mntns_operations,
+#ifdef CONFIG_CGROUPS
+	&cgroupns_operations,
+#endif
 };
 
 static const char *proc_ns_follow_link(struct dentry *dentry, void **cookie)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 99096be..b3ce9d9 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -17,6 +17,9 @@
 #include <linux/seq_file.h>
 #include <linux/kernfs.h>
 #include <linux/jump_label.h>
+#include <linux/nsproxy.h>
+#include <linux/types.h>
+#include <linux/ns_common.h>
 
 #include <linux/cgroup-defs.h>
 
@@ -237,6 +240,10 @@ static inline bool cgroup_is_dead(const struct cgroup *cgrp)
 	return !(cgrp->self.flags & CSS_ONLINE);
 }
 
+static inline void css_get(struct cgroup_subsys_state *css);
+static inline void css_put(struct cgroup_subsys_state *css);
+static inline bool css_tryget(struct cgroup_subsys_state *css);
+
 static inline void cgroup_get(struct cgroup *cgrp)
 {
 	WARN_ON_ONCE(cgroup_is_dead(cgrp));
@@ -284,9 +291,11 @@ static inline void cgroup_put(struct cgroup *cgrp)
 			;						\
 		else
 
-/*
- * Inline functions.
- */
+extern char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
+		struct cgroup *cgrp, char *buf, size_t buflen);
+
+extern char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
+		size_t buflen);
 
 /**
  * css_get - obtain a reference on the specified css
@@ -522,12 +531,6 @@ static inline int cgroup_name(struct cgroup *cgrp, char *buf, size_t buflen)
 	return kernfs_name(cgrp->kn, buf, buflen);
 }
 
-static inline char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
-					      size_t buflen)
-{
-	return kernfs_path(cgrp->kn, buf, buflen);
-}
-
 static inline void pr_cont_cgroup_name(struct cgroup *cgrp)
 {
 	pr_cont_kernfs_name(cgrp->kn);
diff --git a/include/linux/cgroup_namespace.h b/include/linux/cgroup_namespace.h
new file mode 100644
index 0000000..ed181c3
--- /dev/null
+++ b/include/linux/cgroup_namespace.h
@@ -0,0 +1,46 @@
+#ifndef _LINUX_CGROUP_NAMESPACE_H
+#define _LINUX_CGROUP_NAMESPACE_H
+
+#include <linux/nsproxy.h>
+#include <linux/cgroup.h>
+#include <linux/types.h>
+#include <linux/user_namespace.h>
+
+struct css_set;
+struct cgroup_namespace {
+	atomic_t		count;
+	struct ns_common	ns;
+	struct user_namespace	*user_ns;
+	struct css_set          *root_cgrps;
+};
+
+extern struct cgroup_namespace init_cgroup_ns;
+
+static inline struct cgroup_namespace *get_cgroup_ns(
+		struct cgroup_namespace *ns)
+{
+	if (ns)
+		atomic_inc(&ns->count);
+	return ns;
+}
+
+#ifdef CONFIG_CGROUPS
+extern void free_cgroup_ns(struct cgroup_namespace *ns);
+extern struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
+					       struct user_namespace *user_ns,
+					       struct cgroup_namespace *old_ns);
+#else /* CONFIG_CGROUP */
+static inline void free_cgroup_ns(struct cgroup_namespace *ns) { }
+static inline struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
+					struct user_namespace *user_ns,
+					struct cgroup_namespace *old_ns)
+{ return old_ns; }
+#endif
+
+static inline void put_cgroup_ns(struct cgroup_namespace *ns)
+{
+	if (ns && atomic_dec_and_test(&ns->count))
+		free_cgroup_ns(ns);
+}
+
+#endif  /* _LINUX_CGROUP_NAMESPACE_H */
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 35fa08f..ac0d65b 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -8,6 +8,7 @@ struct mnt_namespace;
 struct uts_namespace;
 struct ipc_namespace;
 struct pid_namespace;
+struct cgroup_namespace;
 struct fs_struct;
 
 /*
@@ -33,6 +34,7 @@ struct nsproxy {
 	struct mnt_namespace *mnt_ns;
 	struct pid_namespace *pid_ns_for_children;
 	struct net 	     *net_ns;
+	struct cgroup_namespace *cgroup_ns;
 };
 extern struct nsproxy init_nsproxy;
 
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 42dfc61..de0e771 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -9,6 +9,8 @@
 struct pid_namespace;
 struct nsproxy;
 struct path;
+struct task_struct;
+struct inode;
 
 struct proc_ns_operations {
 	const char *name;
@@ -24,6 +26,7 @@ extern const struct proc_ns_operations ipcns_operations;
 extern const struct proc_ns_operations pidns_operations;
 extern const struct proc_ns_operations userns_operations;
 extern const struct proc_ns_operations mntns_operations;
+extern const struct proc_ns_operations cgroupns_operations;
 
 /*
  * We always define these enumerators
@@ -34,6 +37,7 @@ enum {
 	PROC_UTS_INIT_INO	= 0xEFFFFFFEU,
 	PROC_USER_INIT_INO	= 0xEFFFFFFDU,
 	PROC_PID_INIT_INO	= 0xEFFFFFFCU,
+	PROC_CGROUP_INIT_INO	= 0xEFFFFFFBU,
 };
 
 #ifdef CONFIG_PROC_FS
diff --git a/kernel/Makefile b/kernel/Makefile
index 53abf00..1dce664 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -54,7 +54,7 @@ obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
-obj-$(CONFIG_CGROUPS) += cgroup.o
+obj-$(CONFIG_CGROUPS) += cgroup.o cgroup_namespace.o
 obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += cgroup_pids.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index e972259..1d696de 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -57,6 +57,8 @@
 #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
 #include <linux/kthread.h>
 #include <linux/delay.h>
+#include <linux/proc_ns.h>
+#include <linux/cgroup_namespace.h>
 
 #include <linux/atomic.h>
 
@@ -290,6 +292,15 @@ static bool cgroup_on_dfl(const struct cgroup *cgrp)
 {
 	return cgrp->root == &cgrp_dfl_root;
 }
+struct cgroup_namespace init_cgroup_ns = {
+	.count = {
+		.counter = 1,
+	},
+	.user_ns = &init_user_ns,
+	.ns.ops = &cgroupns_operations,
+	.ns.inum = PROC_CGROUP_INIT_INO,
+	.root_cgrps = &init_css_set,
+};
 
 /* IDR wrappers which synchronize using cgroup_idr_lock */
 static int cgroup_idr_alloc(struct idr *idr, void *ptr, int start, int end,
@@ -749,7 +760,7 @@ static void put_css_set_locked(struct css_set *cset)
 	kfree_rcu(cset, rcu_head);
 }
 
-static void put_css_set(struct css_set *cset)
+void put_css_set(struct css_set *cset)
 {
 	/*
 	 * Ensure that the refcount doesn't hit zero while any readers
@@ -767,7 +778,7 @@ static void put_css_set(struct css_set *cset)
 /*
  * refcounted get/put for css_set objects
  */
-static inline void get_css_set(struct css_set *cset)
+void get_css_set(struct css_set *cset)
 {
 	atomic_inc(&cset->refcount);
 }
@@ -2148,6 +2159,28 @@ static struct file_system_type cgroup_fs_type = {
 	.kill_sb = cgroup_kill_sb,
 };
 
+char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
+						 struct cgroup *cgrp, char *buf,
+						 size_t buflen)
+{
+	if (ns) {
+		struct cgroup *root;
+		root = cset_cgroup_from_root(ns->root_cgrps, cgrp->root);
+		return kernfs_path_from_node(root->kn, cgrp->kn, buf,
+					     buflen);
+	} else {
+		return kernfs_path(cgrp->kn, buf, buflen);
+	}
+}
+
+char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
+					      size_t buflen)
+{
+	return cgroup_path_ns(current->nsproxy->cgroup_ns, cgrp, buf,
+				      buflen);
+}
+EXPORT_SYMBOL_GPL(cgroup_path);
+
 /**
  * task_cgroup_path - cgroup path of a task in the first cgroup hierarchy
  * @task: target task
@@ -5237,6 +5270,8 @@ int __init cgroup_init(void)
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));
 
+	get_user_ns(init_cgroup_ns.user_ns);
+
 	mutex_lock(&cgroup_mutex);
 
 	/* Add init_css_set to the hash table */
diff --git a/kernel/cgroup_namespace.c b/kernel/cgroup_namespace.c
new file mode 100644
index 0000000..ef20777
--- /dev/null
+++ b/kernel/cgroup_namespace.c
@@ -0,0 +1,127 @@
+/*
+ *  Copyright (C) 2014 Google Inc.
+ *
+ *  Author: Aditya Kali (adityakali@google.com)
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the Free
+ *  Software Foundation, version 2 of the License.
+ */
+
+#include <linux/cgroup.h>
+#include <linux/cgroup_namespace.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/nsproxy.h>
+#include <linux/proc_ns.h>
+
+const struct proc_ns_operations cgroupns_operations;
+
+static struct cgroup_namespace *alloc_cgroup_ns(void)
+{
+	struct cgroup_namespace *new_ns;
+	int ret;
+
+	new_ns = kzalloc(sizeof(struct cgroup_namespace), GFP_KERNEL);
+	if (!new_ns)
+		return ERR_PTR(-ENOMEM);
+	ret = ns_alloc_inum(&new_ns->ns);
+	if (ret) {
+		kfree(new_ns);
+		return ERR_PTR(ret);
+	}
+	atomic_set(&new_ns->count, 1);
+	new_ns->ns.ops = &cgroupns_operations;
+	return new_ns;
+}
+
+extern void put_css_set(struct css_set *cset);
+extern  void get_css_set(struct css_set *cset);
+
+void free_cgroup_ns(struct cgroup_namespace *ns)
+{
+	put_css_set(ns->root_cgrps);
+	put_user_ns(ns->user_ns);
+	ns_free_inum(&ns->ns);
+	kfree(ns);
+}
+EXPORT_SYMBOL(free_cgroup_ns);
+
+struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
+					struct user_namespace *user_ns,
+					struct cgroup_namespace *old_ns)
+{
+	struct cgroup_namespace *new_ns = NULL;
+	struct css_set *cgrps = NULL;
+	int err;
+
+	BUG_ON(!old_ns);
+
+	if (!(flags & CLONE_NEWCGROUP))
+		return get_cgroup_ns(old_ns);
+
+	/* Allow only sysadmin to create cgroup namespace. */
+	err = -EPERM;
+	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
+		goto err_out;
+
+	cgrps = task_css_set(current);
+	get_css_set(cgrps);
+
+	err = -ENOMEM;
+	new_ns = alloc_cgroup_ns();
+	if (!new_ns)
+		goto err_out;
+
+	new_ns->user_ns = get_user_ns(user_ns);
+	new_ns->root_cgrps = cgrps;
+
+	return new_ns;
+
+err_out:
+	if (cgrps)
+		put_css_set(cgrps);
+	kfree(new_ns);
+	return ERR_PTR(err);
+}
+
+static int cgroupns_install(struct nsproxy *nsproxy, void *ns)
+{
+	pr_info("setns not supported for cgroup namespace");
+	return -EINVAL;
+}
+
+static struct ns_common *cgroupns_get(struct task_struct *task)
+{
+	struct cgroup_namespace *ns = NULL;
+	struct nsproxy *nsproxy;
+
+	task_lock(task);
+	nsproxy = task->nsproxy;
+	if (nsproxy) {
+		ns = nsproxy->cgroup_ns;
+		get_cgroup_ns(ns);
+	}
+	task_unlock(task);
+
+	return ns ? &ns->ns : NULL;
+}
+
+static void cgroupns_put(struct ns_common *ns)
+{
+	put_cgroup_ns(to_cg_ns(ns));
+}
+
+const struct proc_ns_operations cgroupns_operations = {
+	.name		= "cgroup",
+	.type		= CLONE_NEWCGROUP,
+	.get		= cgroupns_get,
+	.put		= cgroupns_put,
+	.install	= cgroupns_install,
+};
+
+static __init int cgroup_namespaces_init(void)
+{
+	return 0;
+}
+subsys_initcall(cgroup_namespaces_init);
diff --git a/kernel/fork.c b/kernel/fork.c
index f97f2c4..c16e6a3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1884,7 +1884,7 @@ static int check_unshare_flags(unsigned long unshare_flags)
 	if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
 				CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
 				CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
-				CLONE_NEWUSER|CLONE_NEWPID))
+				CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWCGROUP))
 		return -EINVAL;
 	/*
 	 * Not implemented, but pretend it works if there is nothing
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 49746c8..7f51796 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -25,6 +25,7 @@
 #include <linux/proc_ns.h>
 #include <linux/file.h>
 #include <linux/syscalls.h>
+#include <linux/cgroup_namespace.h>
 
 static struct kmem_cache *nsproxy_cachep;
 
@@ -39,6 +40,9 @@ struct nsproxy init_nsproxy = {
 #ifdef CONFIG_NET
 	.net_ns			= &init_net,
 #endif
+#ifdef CONFIG_CGROUPS
+	.cgroup_ns		= &init_cgroup_ns,
+#endif
 };
 
 static inline struct nsproxy *create_nsproxy(void)
@@ -92,6 +96,13 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 		goto out_pid;
 	}
 
+	new_nsp->cgroup_ns = copy_cgroup_ns(flags, user_ns,
+					    tsk->nsproxy->cgroup_ns);
+	if (IS_ERR(new_nsp->cgroup_ns)) {
+		err = PTR_ERR(new_nsp->cgroup_ns);
+		goto out_cgroup;
+	}
+
 	new_nsp->net_ns = copy_net_ns(flags, user_ns, tsk->nsproxy->net_ns);
 	if (IS_ERR(new_nsp->net_ns)) {
 		err = PTR_ERR(new_nsp->net_ns);
@@ -101,6 +112,9 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 	return new_nsp;
 
 out_net:
+	if (new_nsp->cgroup_ns)
+		put_cgroup_ns(new_nsp->cgroup_ns);
+out_cgroup:
 	if (new_nsp->pid_ns_for_children)
 		put_pid_ns(new_nsp->pid_ns_for_children);
 out_pid:
@@ -128,7 +142,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	struct nsproxy *new_ns;
 
 	if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			      CLONE_NEWPID | CLONE_NEWNET)))) {
+			      CLONE_NEWPID | CLONE_NEWNET |
+			      CLONE_NEWCGROUP)))) {
 		get_nsproxy(old_ns);
 		return 0;
 	}
@@ -165,6 +180,8 @@ void free_nsproxy(struct nsproxy *ns)
 		put_ipc_ns(ns->ipc_ns);
 	if (ns->pid_ns_for_children)
 		put_pid_ns(ns->pid_ns_for_children);
+	if (ns->cgroup_ns)
+		put_cgroup_ns(ns->cgroup_ns);
 	put_net(ns->net_ns);
 	kmem_cache_free(nsproxy_cachep, ns);
 }
@@ -180,7 +197,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
 	int err = 0;
 
 	if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			       CLONE_NEWNET | CLONE_NEWPID)))
+			       CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWCGROUP)))
 		return 0;
 
 	user_ns = new_cred ? new_cred->user_ns : current_user_ns();
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 6/8] cgroup: cgroup namespace setns support
  2015-11-16 19:51 ` serge
@ 2015-11-16 19:51     ` serge
  -1 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

setns on a cgroup namespace is allowed only if
task has CAP_SYS_ADMIN in its current user-namespace and
over the user-namespace associated with target cgroupns.
No implicit cgroup changes happen with attaching to another
cgroupns. It is expected that the somone moves the attaching
process under the target cgroupns-root.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup_namespace.c |   23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup_namespace.c b/kernel/cgroup_namespace.c
index ef20777..9651478 100644
--- a/kernel/cgroup_namespace.c
+++ b/kernel/cgroup_namespace.c
@@ -85,10 +85,27 @@ err_out:
 	return ERR_PTR(err);
 }
 
-static int cgroupns_install(struct nsproxy *nsproxy, void *ns)
+static inline struct cgroup_namespace *to_cg_ns(struct ns_common *ns) {
+	return container_of(ns, struct cgroup_namespace, ns);
+}
+
+static int cgroupns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 {
-	pr_info("setns not supported for cgroup namespace");
-	return -EINVAL;
+	struct cgroup_namespace *cgroup_ns = to_cg_ns(ns);
+
+	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN) ||
+	    !ns_capable(cgroup_ns->user_ns, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/* Don't need to do anything if we are attaching to our own cgroupns. */
+	if (cgroup_ns == nsproxy->cgroup_ns)
+		return 0;
+
+	get_cgroup_ns(cgroup_ns);
+	put_cgroup_ns(nsproxy->cgroup_ns);
+	nsproxy->cgroup_ns = cgroup_ns;
+
+	return 0;
 }
 
 static struct ns_common *cgroupns_get(struct task_struct *task)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 6/8] cgroup: cgroup namespace setns support
@ 2015-11-16 19:51     ` serge
  0 siblings, 0 replies; 180+ messages in thread
From: serge @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm

From: Aditya Kali <adityakali@google.com>

setns on a cgroup namespace is allowed only if
task has CAP_SYS_ADMIN in its current user-namespace and
over the user-namespace associated with target cgroupns.
No implicit cgroup changes happen with attaching to another
cgroupns. It is expected that the somone moves the attaching
process under the target cgroupns-root.

Signed-off-by: Aditya Kali <adityakali@google.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
---
 kernel/cgroup_namespace.c |   23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup_namespace.c b/kernel/cgroup_namespace.c
index ef20777..9651478 100644
--- a/kernel/cgroup_namespace.c
+++ b/kernel/cgroup_namespace.c
@@ -85,10 +85,27 @@ err_out:
 	return ERR_PTR(err);
 }
 
-static int cgroupns_install(struct nsproxy *nsproxy, void *ns)
+static inline struct cgroup_namespace *to_cg_ns(struct ns_common *ns) {
+	return container_of(ns, struct cgroup_namespace, ns);
+}
+
+static int cgroupns_install(struct nsproxy *nsproxy, struct ns_common *ns)
 {
-	pr_info("setns not supported for cgroup namespace");
-	return -EINVAL;
+	struct cgroup_namespace *cgroup_ns = to_cg_ns(ns);
+
+	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN) ||
+	    !ns_capable(cgroup_ns->user_ns, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/* Don't need to do anything if we are attaching to our own cgroupns. */
+	if (cgroup_ns == nsproxy->cgroup_ns)
+		return 0;
+
+	get_cgroup_ns(cgroup_ns);
+	put_cgroup_ns(nsproxy->cgroup_ns);
+	nsproxy->cgroup_ns = cgroup_ns;
+
+	return 0;
 }
 
 static struct ns_common *cgroupns_get(struct task_struct *task)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
  2015-11-16 19:51 ` serge
@ 2015-11-16 19:51     ` serge
  -1 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

This patch enables cgroup mounting inside userns when a process
as appropriate privileges. The cgroup filesystem mounted is
rooted at the cgroupns-root. Thus, in a container-setup, only
the hierarchy under the cgroupns-root is exposed inside the container.
This allows container management tools to run inside the containers
without depending on any global state.
In order to support this, a new kernfs api is added to lookup the
dentry for the cgroupns-root.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 fs/kernfs/mount.c      |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/kernfs.h |    2 ++
 kernel/cgroup.c        |   32 +++++++++++++++++++++++++++++++-
 3 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 8eaf417..64613864 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -62,6 +62,54 @@ struct kernfs_root *kernfs_root_from_sb(struct super_block *sb)
 	return NULL;
 }
 
+/**
+ * kernfs_obtain_root - get a dentry for the given kernfs_node
+ * @sb: the kernfs super_block
+ * @kn: kernfs_node for which a dentry is needed
+ *
+ * This can used used by callers which want to mount only a part of the kernfs
+ * as root of the filesystem.
+ */
+struct dentry *kernfs_obtain_root(struct super_block *sb,
+				  struct kernfs_node *kn)
+{
+	struct dentry *dentry;
+	struct inode *inode;
+
+	BUG_ON(sb->s_op != &kernfs_sops);
+
+	/* inode for the given kernfs_node should already exist. */
+	inode = ilookup(sb, kn->ino);
+	if (!inode) {
+		pr_debug("kernfs: could not get inode for '");
+		pr_cont_kernfs_path(kn);
+		pr_cont("'.\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	/* instantiate and link root dentry */
+	dentry = d_obtain_root(inode);
+	if (!dentry) {
+		pr_debug("kernfs: could not get dentry for '");
+		pr_cont_kernfs_path(kn);
+		pr_cont("'.\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	/* If this is a new dentry, set it up. We need kernfs_mutex because this
+	 * may be called by callers other than kernfs_fill_super. */
+	mutex_lock(&kernfs_mutex);
+	if (!dentry->d_fsdata) {
+		kernfs_get(kn);
+		dentry->d_fsdata = kn;
+	} else {
+		WARN_ON(dentry->d_fsdata != kn);
+	}
+	mutex_unlock(&kernfs_mutex);
+
+	return dentry;
+}
+
 static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 {
 	struct kernfs_super_info *info = kernfs_info(sb);
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index d025ebd..1903777 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -284,6 +284,8 @@ struct kernfs_node *kernfs_node_from_dentry(struct dentry *dentry);
 struct kernfs_root *kernfs_root_from_sb(struct super_block *sb);
 struct inode *kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn);
 
+struct dentry *kernfs_obtain_root(struct super_block *sb,
+				  struct kernfs_node *kn);
 struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops,
 				       unsigned int flags, void *priv);
 void kernfs_destroy_root(struct kernfs_root *root);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1d696de..0a3e893 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1980,6 +1980,14 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	int ret;
 	int i;
 	bool new_sb;
+	struct cgroup_namespace *ns =
+		get_cgroup_ns(current->nsproxy->cgroup_ns);
+
+	/* Check if the caller has permission to mount. */
+	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN)) {
+		put_cgroup_ns(ns);
+		return ERR_PTR(-EPERM);
+	}
 
 	/*
 	 * The first time anyone tries to mount a cgroup, enable the list
@@ -2112,11 +2120,31 @@ out_free:
 	kfree(opts.release_agent);
 	kfree(opts.name);
 
-	if (ret)
+	if (ret) {
+		put_cgroup_ns(ns);
 		return ERR_PTR(ret);
+	}
 
 	dentry = kernfs_mount(fs_type, flags, root->kf_root,
 				CGROUP_SUPER_MAGIC, &new_sb);
+
+	if (!IS_ERR(dentry)) {
+		/* In non-init cgroup namespace, instead of root cgroup's
+		 * dentry, we return the dentry corresponding to the
+		 * cgroupns->root_cgrp.
+		 */
+		if (ns != &init_cgroup_ns) {
+			struct dentry *nsdentry;
+			struct cgroup *cgrp;
+
+			cgrp = cset_cgroup_from_root(ns->root_cgrps, root);
+			nsdentry = kernfs_obtain_root(dentry->d_sb,
+				cgrp->kn);
+			dput(dentry);
+			dentry = nsdentry;
+		}
+	}
+
 	if (IS_ERR(dentry) || !new_sb)
 		cgroup_put(&root->cgrp);
 
@@ -2129,6 +2157,7 @@ out_free:
 		deactivate_super(pinned_sb);
 	}
 
+	put_cgroup_ns(ns);
 	return dentry;
 }
 
@@ -2157,6 +2186,7 @@ static struct file_system_type cgroup_fs_type = {
 	.name = "cgroup",
 	.mount = cgroup_mount,
 	.kill_sb = cgroup_kill_sb,
+	.fs_flags = FS_USERNS_MOUNT,
 };
 
 char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-11-16 19:51     ` serge
  0 siblings, 0 replies; 180+ messages in thread
From: serge @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm

From: Aditya Kali <adityakali@google.com>

This patch enables cgroup mounting inside userns when a process
as appropriate privileges. The cgroup filesystem mounted is
rooted at the cgroupns-root. Thus, in a container-setup, only
the hierarchy under the cgroupns-root is exposed inside the container.
This allows container management tools to run inside the containers
without depending on any global state.
In order to support this, a new kernfs api is added to lookup the
dentry for the cgroupns-root.

Signed-off-by: Aditya Kali <adityakali@google.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
---
 fs/kernfs/mount.c      |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/kernfs.h |    2 ++
 kernel/cgroup.c        |   32 +++++++++++++++++++++++++++++++-
 3 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 8eaf417..64613864 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -62,6 +62,54 @@ struct kernfs_root *kernfs_root_from_sb(struct super_block *sb)
 	return NULL;
 }
 
+/**
+ * kernfs_obtain_root - get a dentry for the given kernfs_node
+ * @sb: the kernfs super_block
+ * @kn: kernfs_node for which a dentry is needed
+ *
+ * This can used used by callers which want to mount only a part of the kernfs
+ * as root of the filesystem.
+ */
+struct dentry *kernfs_obtain_root(struct super_block *sb,
+				  struct kernfs_node *kn)
+{
+	struct dentry *dentry;
+	struct inode *inode;
+
+	BUG_ON(sb->s_op != &kernfs_sops);
+
+	/* inode for the given kernfs_node should already exist. */
+	inode = ilookup(sb, kn->ino);
+	if (!inode) {
+		pr_debug("kernfs: could not get inode for '");
+		pr_cont_kernfs_path(kn);
+		pr_cont("'.\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	/* instantiate and link root dentry */
+	dentry = d_obtain_root(inode);
+	if (!dentry) {
+		pr_debug("kernfs: could not get dentry for '");
+		pr_cont_kernfs_path(kn);
+		pr_cont("'.\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	/* If this is a new dentry, set it up. We need kernfs_mutex because this
+	 * may be called by callers other than kernfs_fill_super. */
+	mutex_lock(&kernfs_mutex);
+	if (!dentry->d_fsdata) {
+		kernfs_get(kn);
+		dentry->d_fsdata = kn;
+	} else {
+		WARN_ON(dentry->d_fsdata != kn);
+	}
+	mutex_unlock(&kernfs_mutex);
+
+	return dentry;
+}
+
 static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
 {
 	struct kernfs_super_info *info = kernfs_info(sb);
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index d025ebd..1903777 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -284,6 +284,8 @@ struct kernfs_node *kernfs_node_from_dentry(struct dentry *dentry);
 struct kernfs_root *kernfs_root_from_sb(struct super_block *sb);
 struct inode *kernfs_get_inode(struct super_block *sb, struct kernfs_node *kn);
 
+struct dentry *kernfs_obtain_root(struct super_block *sb,
+				  struct kernfs_node *kn);
 struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops,
 				       unsigned int flags, void *priv);
 void kernfs_destroy_root(struct kernfs_root *root);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1d696de..0a3e893 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1980,6 +1980,14 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	int ret;
 	int i;
 	bool new_sb;
+	struct cgroup_namespace *ns =
+		get_cgroup_ns(current->nsproxy->cgroup_ns);
+
+	/* Check if the caller has permission to mount. */
+	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN)) {
+		put_cgroup_ns(ns);
+		return ERR_PTR(-EPERM);
+	}
 
 	/*
 	 * The first time anyone tries to mount a cgroup, enable the list
@@ -2112,11 +2120,31 @@ out_free:
 	kfree(opts.release_agent);
 	kfree(opts.name);
 
-	if (ret)
+	if (ret) {
+		put_cgroup_ns(ns);
 		return ERR_PTR(ret);
+	}
 
 	dentry = kernfs_mount(fs_type, flags, root->kf_root,
 				CGROUP_SUPER_MAGIC, &new_sb);
+
+	if (!IS_ERR(dentry)) {
+		/* In non-init cgroup namespace, instead of root cgroup's
+		 * dentry, we return the dentry corresponding to the
+		 * cgroupns->root_cgrp.
+		 */
+		if (ns != &init_cgroup_ns) {
+			struct dentry *nsdentry;
+			struct cgroup *cgrp;
+
+			cgrp = cset_cgroup_from_root(ns->root_cgrps, root);
+			nsdentry = kernfs_obtain_root(dentry->d_sb,
+				cgrp->kn);
+			dput(dentry);
+			dentry = nsdentry;
+		}
+	}
+
 	if (IS_ERR(dentry) || !new_sb)
 		cgroup_put(&root->cgrp);
 
@@ -2129,6 +2157,7 @@ out_free:
 		deactivate_super(pinned_sb);
 	}
 
+	put_cgroup_ns(ns);
 	return dentry;
 }
 
@@ -2157,6 +2186,7 @@ static struct file_system_type cgroup_fs_type = {
 	.name = "cgroup",
 	.mount = cgroup_mount,
 	.kill_sb = cgroup_kill_sb,
+	.fs_flags = FS_USERNS_MOUNT,
 };
 
 char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 8/8] cgroup: Add documentation for cgroup namespaces
  2015-11-16 19:51 ` serge
@ 2015-11-16 19:51     ` serge
  -1 siblings, 0 replies; 180+ messages in thread
From: serge-A9i7LUbDfNHQT0dZR+AlfA @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 Documentation/cgroups/namespace.txt |  142 +++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 Documentation/cgroups/namespace.txt

diff --git a/Documentation/cgroups/namespace.txt b/Documentation/cgroups/namespace.txt
new file mode 100644
index 0000000..a5b80e8
--- /dev/null
+++ b/Documentation/cgroups/namespace.txt
@@ -0,0 +1,142 @@
+			CGroup Namespaces
+
+CGroup Namespace provides a mechanism to virtualize the view of the
+/proc/<pid>/cgroup file. The CLONE_NEWCGROUP clone-flag can be used with
+clone() and unshare() syscalls to create a new cgroup namespace.
+The process running inside the cgroup namespace will have its /proc/<pid>/cgroup
+output restricted to cgroupns-root. cgroupns-root is the cgroup of the process
+at the time of creation of the cgroup namespace.
+
+Prior to CGroup Namespace, the /proc/<pid>/cgroup file used to show complete
+path of the cgroup of a process. In a container setup (where a set of cgroups
+and namespaces are intended to isolate processes), the /proc/<pid>/cgroup file
+may leak potential system level information to the isolated processes.
+
+For Example:
+  $ cat /proc/self/cgroup
+  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
+
+The path '/batchjobs/container_id1' can generally be considered as system-data
+and its desirable to not expose it to the isolated process.
+
+CGroup Namespaces can be used to restrict visibility of this path.
+For Example:
+  # Before creating cgroup namespace
+  $ ls -l /proc/self/ns/cgroup
+  lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
+  $ cat /proc/self/cgroup
+  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
+
+  # unshare(CLONE_NEWCGROUP) and exec /bin/bash
+  $ ~/unshare -c
+  [ns]$ ls -l /proc/self/ns/cgroup
+  lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
+  # From within new cgroupns, process sees that its in the root cgroup
+  [ns]$ cat /proc/self/cgroup
+  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/
+
+  # From global cgroupns:
+  $ cat /proc/<pid>/cgroup
+  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
+
+  # Unshare cgroupns along with userns and mountns
+  # Following calls unshare(CLONE_NEWCGROUP|CLONE_NEWUSER|CLONE_NEWNS), then
+  # sets up uid/gid map and execs /bin/bash
+  $ ~/unshare -c -u -m
+  # Originally, we were in /batchjobs/container_id1 cgroup. Mount our own cgroup
+  # hierarchy.
+  [ns]$ mount -t cgroup cgroup /tmp/cgroup
+  [ns]$ ls -l /tmp/cgroup
+  total 0
+  -r--r--r-- 1 root root 0 2014-10-13 09:32 cgroup.controllers
+  -r--r--r-- 1 root root 0 2014-10-13 09:32 cgroup.populated
+  -rw-r--r-- 1 root root 0 2014-10-13 09:25 cgroup.procs
+  -rw-r--r-- 1 root root 0 2014-10-13 09:32 cgroup.subtree_control
+
+The cgroupns-root (/batchjobs/container_id1 in above example) becomes the
+filesystem root for the namespace specific cgroupfs mount.
+
+The virtualization of /proc/self/cgroup file combined with restricting
+the view of cgroup hierarchy by namespace-private cgroupfs mount
+should provide a completely isolated cgroup view inside the container.
+
+In its current form, the cgroup namespaces patcheset provides following
+behavior:
+
+(1) The 'cgroupns-root' for a cgroup namespace is the cgroup in which
+    the process calling unshare is running.
+    For ex. if a process in /batchjobs/container_id1 cgroup calls unshare,
+    cgroup /batchjobs/container_id1 becomes the cgroupns-root.
+    For the init_cgroup_ns, this is the real root ('/') cgroup
+    (identified in code as cgrp_dfl_root.cgrp).
+
+(2) The cgroupns-root cgroup does not change even if the namespace
+    creator process later moves to a different cgroup.
+    $ ~/unshare -c # unshare cgroupns in some cgroup
+    [ns]$ cat /proc/self/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/
+    [ns]$ mkdir sub_cgrp_1
+    [ns]$ echo 0 > sub_cgrp_1/cgroup.procs
+    [ns]$ cat /proc/self/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1
+
+(3) Each process gets its CGROUPNS specific view of /proc/<pid>/cgroup
+(a) Processes running inside the cgroup namespace will be able to see
+    cgroup paths (in /proc/self/cgroup) only inside their root cgroup
+    [ns]$ sleep 100000 &  # From within unshared cgroupns
+    [1] 7353
+    [ns]$ echo 7353 > sub_cgrp_1/cgroup.procs
+    [ns]$ cat /proc/7353/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1
+
+(b) From global cgroupns, the real cgroup path will be visible:
+    $ cat /proc/7353/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1/sub_cgrp_1
+
+(c) From a sibling cgroupns (cgroupns root-ed at a different cgroup), cgroup
+    path relative to its own cgroupns-root will be shown:
+    # ns2's cgroupns-root is at '/batchjobs/container_id2'
+    [ns2]$ cat /proc/7353/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/../container_id2/sub_cgrp_1
+
+    Note that the relative path always starts with '/' to indicate that its
+    relative to the cgroupns-root of the caller.
+
+(4) Processes inside a cgroupns can move in-and-out of the cgroupns-root
+    (if they have proper access to external cgroups).
+    # From inside cgroupns (with cgroupns-root at /batchjobs/container_id1), and
+    # assuming that the global hierarchy is still accessible inside cgroupns:
+    $ cat /proc/7353/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1
+    $ echo 7353 > batchjobs/container_id2/cgroup.procs
+    $ cat /proc/7353/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/../container_id2
+
+    Note that this kind of setup is not encouraged. A task inside cgroupns
+    should only be exposed to its own cgroupns hierarchy. Otherwise it makes
+    the virtualization of /proc/<pid>/cgroup less useful.
+
+(5) Setns to another cgroup namespace is allowed when:
+    (a) the process has CAP_SYS_ADMIN in its current userns
+    (b) the process has CAP_SYS_ADMIN in the target cgroupns' userns
+    No implicit cgroup changes happen with attaching to another cgroupns. It
+    is expected that the somone moves the attaching process under the target
+    cgroupns-root.
+
+(6) When some thread from a multi-threaded process unshares its
+    cgroup-namespace, the new cgroupns gets applied to the entire process (all
+    the threads). For the unified-hierarchy this is expected as it only allows
+    process-level containerization.  For the legacy hierarchies this may be
+    unexpected.  So all the threads in the process will have the same cgroup.
+
+(7) The cgroup namespace is alive as long as there is atleast 1
+    process inside it. When the last process exits, the cgroup
+    namespace is destroyed. The cgroupns-root and the actual cgroups
+    remain though.
+
+(8) Namespace specific cgroup hierarchy can be mounted by a process running
+    inside cgroupns:
+    $ mount -t cgroup -o __DEVEL__sane_behavior cgroup $MOUNT_POINT
+
+    This will mount the unified cgroup hierarchy with cgroupns-root as the
+    filesystem root. The process needs CAP_SYS_ADMIN in its userns and mntns.
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 8/8] cgroup: Add documentation for cgroup namespaces
@ 2015-11-16 19:51     ` serge
  0 siblings, 0 replies; 180+ messages in thread
From: serge @ 2015-11-16 19:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm, Serge Hallyn

From: Aditya Kali <adityakali@google.com>

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
 Documentation/cgroups/namespace.txt |  142 +++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 Documentation/cgroups/namespace.txt

diff --git a/Documentation/cgroups/namespace.txt b/Documentation/cgroups/namespace.txt
new file mode 100644
index 0000000..a5b80e8
--- /dev/null
+++ b/Documentation/cgroups/namespace.txt
@@ -0,0 +1,142 @@
+			CGroup Namespaces
+
+CGroup Namespace provides a mechanism to virtualize the view of the
+/proc/<pid>/cgroup file. The CLONE_NEWCGROUP clone-flag can be used with
+clone() and unshare() syscalls to create a new cgroup namespace.
+The process running inside the cgroup namespace will have its /proc/<pid>/cgroup
+output restricted to cgroupns-root. cgroupns-root is the cgroup of the process
+at the time of creation of the cgroup namespace.
+
+Prior to CGroup Namespace, the /proc/<pid>/cgroup file used to show complete
+path of the cgroup of a process. In a container setup (where a set of cgroups
+and namespaces are intended to isolate processes), the /proc/<pid>/cgroup file
+may leak potential system level information to the isolated processes.
+
+For Example:
+  $ cat /proc/self/cgroup
+  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
+
+The path '/batchjobs/container_id1' can generally be considered as system-data
+and its desirable to not expose it to the isolated process.
+
+CGroup Namespaces can be used to restrict visibility of this path.
+For Example:
+  # Before creating cgroup namespace
+  $ ls -l /proc/self/ns/cgroup
+  lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
+  $ cat /proc/self/cgroup
+  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
+
+  # unshare(CLONE_NEWCGROUP) and exec /bin/bash
+  $ ~/unshare -c
+  [ns]$ ls -l /proc/self/ns/cgroup
+  lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
+  # From within new cgroupns, process sees that its in the root cgroup
+  [ns]$ cat /proc/self/cgroup
+  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/
+
+  # From global cgroupns:
+  $ cat /proc/<pid>/cgroup
+  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
+
+  # Unshare cgroupns along with userns and mountns
+  # Following calls unshare(CLONE_NEWCGROUP|CLONE_NEWUSER|CLONE_NEWNS), then
+  # sets up uid/gid map and execs /bin/bash
+  $ ~/unshare -c -u -m
+  # Originally, we were in /batchjobs/container_id1 cgroup. Mount our own cgroup
+  # hierarchy.
+  [ns]$ mount -t cgroup cgroup /tmp/cgroup
+  [ns]$ ls -l /tmp/cgroup
+  total 0
+  -r--r--r-- 1 root root 0 2014-10-13 09:32 cgroup.controllers
+  -r--r--r-- 1 root root 0 2014-10-13 09:32 cgroup.populated
+  -rw-r--r-- 1 root root 0 2014-10-13 09:25 cgroup.procs
+  -rw-r--r-- 1 root root 0 2014-10-13 09:32 cgroup.subtree_control
+
+The cgroupns-root (/batchjobs/container_id1 in above example) becomes the
+filesystem root for the namespace specific cgroupfs mount.
+
+The virtualization of /proc/self/cgroup file combined with restricting
+the view of cgroup hierarchy by namespace-private cgroupfs mount
+should provide a completely isolated cgroup view inside the container.
+
+In its current form, the cgroup namespaces patcheset provides following
+behavior:
+
+(1) The 'cgroupns-root' for a cgroup namespace is the cgroup in which
+    the process calling unshare is running.
+    For ex. if a process in /batchjobs/container_id1 cgroup calls unshare,
+    cgroup /batchjobs/container_id1 becomes the cgroupns-root.
+    For the init_cgroup_ns, this is the real root ('/') cgroup
+    (identified in code as cgrp_dfl_root.cgrp).
+
+(2) The cgroupns-root cgroup does not change even if the namespace
+    creator process later moves to a different cgroup.
+    $ ~/unshare -c # unshare cgroupns in some cgroup
+    [ns]$ cat /proc/self/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/
+    [ns]$ mkdir sub_cgrp_1
+    [ns]$ echo 0 > sub_cgrp_1/cgroup.procs
+    [ns]$ cat /proc/self/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1
+
+(3) Each process gets its CGROUPNS specific view of /proc/<pid>/cgroup
+(a) Processes running inside the cgroup namespace will be able to see
+    cgroup paths (in /proc/self/cgroup) only inside their root cgroup
+    [ns]$ sleep 100000 &  # From within unshared cgroupns
+    [1] 7353
+    [ns]$ echo 7353 > sub_cgrp_1/cgroup.procs
+    [ns]$ cat /proc/7353/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1
+
+(b) From global cgroupns, the real cgroup path will be visible:
+    $ cat /proc/7353/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1/sub_cgrp_1
+
+(c) From a sibling cgroupns (cgroupns root-ed at a different cgroup), cgroup
+    path relative to its own cgroupns-root will be shown:
+    # ns2's cgroupns-root is at '/batchjobs/container_id2'
+    [ns2]$ cat /proc/7353/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/../container_id2/sub_cgrp_1
+
+    Note that the relative path always starts with '/' to indicate that its
+    relative to the cgroupns-root of the caller.
+
+(4) Processes inside a cgroupns can move in-and-out of the cgroupns-root
+    (if they have proper access to external cgroups).
+    # From inside cgroupns (with cgroupns-root at /batchjobs/container_id1), and
+    # assuming that the global hierarchy is still accessible inside cgroupns:
+    $ cat /proc/7353/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1
+    $ echo 7353 > batchjobs/container_id2/cgroup.procs
+    $ cat /proc/7353/cgroup
+    0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/../container_id2
+
+    Note that this kind of setup is not encouraged. A task inside cgroupns
+    should only be exposed to its own cgroupns hierarchy. Otherwise it makes
+    the virtualization of /proc/<pid>/cgroup less useful.
+
+(5) Setns to another cgroup namespace is allowed when:
+    (a) the process has CAP_SYS_ADMIN in its current userns
+    (b) the process has CAP_SYS_ADMIN in the target cgroupns' userns
+    No implicit cgroup changes happen with attaching to another cgroupns. It
+    is expected that the somone moves the attaching process under the target
+    cgroupns-root.
+
+(6) When some thread from a multi-threaded process unshares its
+    cgroup-namespace, the new cgroupns gets applied to the entire process (all
+    the threads). For the unified-hierarchy this is expected as it only allows
+    process-level containerization.  For the legacy hierarchies this may be
+    unexpected.  So all the threads in the process will have the same cgroup.
+
+(7) The cgroup namespace is alive as long as there is atleast 1
+    process inside it. When the last process exits, the cgroup
+    namespace is destroyed. The cgroupns-root and the actual cgroups
+    remain though.
+
+(8) Namespace specific cgroup hierarchy can be mounted by a process running
+    inside cgroupns:
+    $ mount -t cgroup -o __DEVEL__sane_behavior cgroup $MOUNT_POINT
+
+    This will mount the unified cgroup hierarchy with cgroupns-root as the
+    filesystem root. The process needs CAP_SYS_ADMIN in its userns and mntns.
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found] ` <1447703505-29672-1-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
                     ` (7 preceding siblings ...)
  2015-11-16 19:51     ` serge
@ 2015-11-16 20:41   ` Richard Weinberger
  8 siblings, 0 replies; 180+ messages in thread
From: Richard Weinberger @ 2015-11-16 20:41 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: open list:ABI/API, Linux Containers, LKML, Eric W. Biederman,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

Serge,

On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> To summarize the semantics:
>
> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
>
> 2. unsharing a cgroup namespace makes all your current cgroups your new
> cgroup root.
>
> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> cgroup namespce root.  A task outside of  your cgroup looks like
>
>         8:memory:/../../..
>
> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> on the mounting task's  cgroup namespace.
>
> 5. setns to a cgroup namespace switches your cgroup namespace but not
> your cgroups.
>
> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
>
> This is completely backward compatible and will be completely invisible
> to any existing cgroup users (except for those running inside a cgroup
> namespace and looking at /proc/pid/cgroup of tasks outside their
> namespace.)
>    cgroupns-root.

IIRC one downside of this series was that only the new "sane" cgroup
layout was supported
and hence it was useless for everything which expected the default layout.
Hence, still no systemd for us. :)

Is this now different?

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found] ` <1447703505-29672-1-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-16 20:41   ` Richard Weinberger
  2015-11-16 19:51   ` [PATCH 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace serge-A9i7LUbDfNHQT0dZR+AlfA
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 180+ messages in thread
From: Richard Weinberger @ 2015-11-16 20:41 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: LKML, open list:ABI/API, Linux Containers, Eric W. Biederman,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

Serge,

On Mon, Nov 16, 2015 at 8:51 PM,  <serge@hallyn.com> wrote:
> To summarize the semantics:
>
> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
>
> 2. unsharing a cgroup namespace makes all your current cgroups your new
> cgroup root.
>
> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> cgroup namespce root.  A task outside of  your cgroup looks like
>
>         8:memory:/../../..
>
> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> on the mounting task's  cgroup namespace.
>
> 5. setns to a cgroup namespace switches your cgroup namespace but not
> your cgroups.
>
> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
>
> This is completely backward compatible and will be completely invisible
> to any existing cgroup users (except for those running inside a cgroup
> namespace and looking at /proc/pid/cgroup of tasks outside their
> namespace.)
>    cgroupns-root.

IIRC one downside of this series was that only the new "sane" cgroup
layout was supported
and hence it was useless for everything which expected the default layout.
Hence, still no systemd for us. :)

Is this now different?

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-16 20:41   ` Richard Weinberger
  0 siblings, 0 replies; 180+ messages in thread
From: Richard Weinberger @ 2015-11-16 20:41 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: LKML, open list:ABI/API, Linux Containers, Eric W. Biederman,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

Serge,

On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> To summarize the semantics:
>
> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
>
> 2. unsharing a cgroup namespace makes all your current cgroups your new
> cgroup root.
>
> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> cgroup namespce root.  A task outside of  your cgroup looks like
>
>         8:memory:/../../..
>
> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> on the mounting task's  cgroup namespace.
>
> 5. setns to a cgroup namespace switches your cgroup namespace but not
> your cgroups.
>
> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
>
> This is completely backward compatible and will be completely invisible
> to any existing cgroup users (except for those running inside a cgroup
> namespace and looking at /proc/pid/cgroup of tasks outside their
> namespace.)
>    cgroupns-root.

IIRC one downside of this series was that only the new "sane" cgroup
layout was supported
and hence it was useless for everything which expected the default layout.
Hence, still no systemd for us. :)

Is this now different?

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]   ` <CAFLxGvzVmbZHrpaTmXUAK03hsnVPwEs3SJGNFNXfthh3NL8EDg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-11-16 20:46     ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-16 20:46 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: open list:ABI/API, Linux Containers, LKML, Eric W. Biederman,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> Serge,
> 
> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> > To summarize the semantics:
> >
> > 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> >
> > 2. unsharing a cgroup namespace makes all your current cgroups your new
> > cgroup root.
> >
> > 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> > cgroup namespce root.  A task outside of  your cgroup looks like
> >
> >         8:memory:/../../..
> >
> > 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> > on the mounting task's  cgroup namespace.
> >
> > 5. setns to a cgroup namespace switches your cgroup namespace but not
> > your cgroups.
> >
> > With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> > github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> > proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> >
> > This is completely backward compatible and will be completely invisible
> > to any existing cgroup users (except for those running inside a cgroup
> > namespace and looking at /proc/pid/cgroup of tasks outside their
> > namespace.)
> >    cgroupns-root.
> 
> IIRC one downside of this series was that only the new "sane" cgroup
> layout was supported
> and hence it was useless for everything which expected the default layout.
> Hence, still no systemd for us. :)
> 
> Is this now different?

Yes, all hierachies are no supported.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]   ` <CAFLxGvzVmbZHrpaTmXUAK03hsnVPwEs3SJGNFNXfthh3NL8EDg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-11-16 20:46     ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-16 20:46 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Serge E. Hallyn, LKML, open list:ABI/API, Linux Containers,
	Eric W. Biederman, LXC development mailing-list, Tejun Heo,
	cgroups mailinglist, Andrew Morton

On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> Serge,
> 
> On Mon, Nov 16, 2015 at 8:51 PM,  <serge@hallyn.com> wrote:
> > To summarize the semantics:
> >
> > 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> >
> > 2. unsharing a cgroup namespace makes all your current cgroups your new
> > cgroup root.
> >
> > 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> > cgroup namespce root.  A task outside of  your cgroup looks like
> >
> >         8:memory:/../../..
> >
> > 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> > on the mounting task's  cgroup namespace.
> >
> > 5. setns to a cgroup namespace switches your cgroup namespace but not
> > your cgroups.
> >
> > With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> > github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> > proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> >
> > This is completely backward compatible and will be completely invisible
> > to any existing cgroup users (except for those running inside a cgroup
> > namespace and looking at /proc/pid/cgroup of tasks outside their
> > namespace.)
> >    cgroupns-root.
> 
> IIRC one downside of this series was that only the new "sane" cgroup
> layout was supported
> and hence it was useless for everything which expected the default layout.
> Hence, still no systemd for us. :)
> 
> Is this now different?

Yes, all hierachies are no supported.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-16 20:46     ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-16 20:46 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Serge E. Hallyn, LKML, open list:ABI/API, Linux Containers,
	Eric W. Biederman, LXC development mailing-list, Tejun Heo,
	cgroups mailinglist, Andrew Morton

On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> Serge,
> 
> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> > To summarize the semantics:
> >
> > 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> >
> > 2. unsharing a cgroup namespace makes all your current cgroups your new
> > cgroup root.
> >
> > 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> > cgroup namespce root.  A task outside of  your cgroup looks like
> >
> >         8:memory:/../../..
> >
> > 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> > on the mounting task's  cgroup namespace.
> >
> > 5. setns to a cgroup namespace switches your cgroup namespace but not
> > your cgroups.
> >
> > With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> > github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> > proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> >
> > This is completely backward compatible and will be completely invisible
> > to any existing cgroup users (except for those running inside a cgroup
> > namespace and looking at /proc/pid/cgroup of tasks outside their
> > namespace.)
> >    cgroupns-root.
> 
> IIRC one downside of this series was that only the new "sane" cgroup
> layout was supported
> and hence it was useless for everything which expected the default layout.
> Hence, still no systemd for us. :)
> 
> Is this now different?

Yes, all hierachies are no supported.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]     ` <20151116204606.GA30681-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-16 20:50       ` Richard Weinberger
  0 siblings, 0 replies; 180+ messages in thread
From: Richard Weinberger @ 2015-11-16 20:50 UTC (permalink / raw)
  To: Serge E. Hallyn, Richard Weinberger
  Cc: open list:ABI/API, Linux Containers, LKML, Eric W. Biederman,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
>> Serge,
>>
>> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
>>> To summarize the semantics:
>>>
>>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
>>>
>>> 2. unsharing a cgroup namespace makes all your current cgroups your new
>>> cgroup root.
>>>
>>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
>>> cgroup namespce root.  A task outside of  your cgroup looks like
>>>
>>>         8:memory:/../../..
>>>
>>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
>>> on the mounting task's  cgroup namespace.
>>>
>>> 5. setns to a cgroup namespace switches your cgroup namespace but not
>>> your cgroups.
>>>
>>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
>>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
>>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
>>>
>>> This is completely backward compatible and will be completely invisible
>>> to any existing cgroup users (except for those running inside a cgroup
>>> namespace and looking at /proc/pid/cgroup of tasks outside their
>>> namespace.)
>>>    cgroupns-root.
>>
>> IIRC one downside of this series was that only the new "sane" cgroup
>> layout was supported
>> and hence it was useless for everything which expected the default layout.
>> Hence, still no systemd for us. :)
>>
>> Is this now different?
> 
> Yes, all hierachies are no supported.
> 

Should read "now"? :-)
If so, *awesome*!

Thanks,
//richard

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]     ` <20151116204606.GA30681-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-16 20:50       ` Richard Weinberger
  0 siblings, 0 replies; 180+ messages in thread
From: Richard Weinberger @ 2015-11-16 20:50 UTC (permalink / raw)
  To: Serge E. Hallyn, Richard Weinberger
  Cc: LKML, open list:ABI/API, Linux Containers, Eric W. Biederman,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
>> Serge,
>>
>> On Mon, Nov 16, 2015 at 8:51 PM,  <serge@hallyn.com> wrote:
>>> To summarize the semantics:
>>>
>>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
>>>
>>> 2. unsharing a cgroup namespace makes all your current cgroups your new
>>> cgroup root.
>>>
>>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
>>> cgroup namespce root.  A task outside of  your cgroup looks like
>>>
>>>         8:memory:/../../..
>>>
>>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
>>> on the mounting task's  cgroup namespace.
>>>
>>> 5. setns to a cgroup namespace switches your cgroup namespace but not
>>> your cgroups.
>>>
>>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
>>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
>>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
>>>
>>> This is completely backward compatible and will be completely invisible
>>> to any existing cgroup users (except for those running inside a cgroup
>>> namespace and looking at /proc/pid/cgroup of tasks outside their
>>> namespace.)
>>>    cgroupns-root.
>>
>> IIRC one downside of this series was that only the new "sane" cgroup
>> layout was supported
>> and hence it was useless for everything which expected the default layout.
>> Hence, still no systemd for us. :)
>>
>> Is this now different?
> 
> Yes, all hierachies are no supported.
> 

Should read "now"? :-)
If so, *awesome*!

Thanks,
//richard

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-16 20:50       ` Richard Weinberger
  0 siblings, 0 replies; 180+ messages in thread
From: Richard Weinberger @ 2015-11-16 20:50 UTC (permalink / raw)
  To: Serge E. Hallyn, Richard Weinberger
  Cc: LKML, open list:ABI/API, Linux Containers, Eric W. Biederman,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
>> Serge,
>>
>> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
>>> To summarize the semantics:
>>>
>>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
>>>
>>> 2. unsharing a cgroup namespace makes all your current cgroups your new
>>> cgroup root.
>>>
>>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
>>> cgroup namespce root.  A task outside of  your cgroup looks like
>>>
>>>         8:memory:/../../..
>>>
>>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
>>> on the mounting task's  cgroup namespace.
>>>
>>> 5. setns to a cgroup namespace switches your cgroup namespace but not
>>> your cgroups.
>>>
>>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
>>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
>>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
>>>
>>> This is completely backward compatible and will be completely invisible
>>> to any existing cgroup users (except for those running inside a cgroup
>>> namespace and looking at /proc/pid/cgroup of tasks outside their
>>> namespace.)
>>>    cgroupns-root.
>>
>> IIRC one downside of this series was that only the new "sane" cgroup
>> layout was supported
>> and hence it was useless for everything which expected the default layout.
>> Hence, still no systemd for us. :)
>>
>> Is this now different?
> 
> Yes, all hierachies are no supported.
> 

Should read "now"? :-)
If so, *awesome*!

Thanks,
//richard

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]       ` <564A41AF.4040208-/L3Ra7n9ekc@public.gmane.org>
@ 2015-11-16 20:54         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-16 20:54 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: open list:ABI/API, Linux Containers, LKML, Eric W. Biederman,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> >> Serge,
> >>
> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> >>> To summarize the semantics:
> >>>
> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> >>>
> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
> >>> cgroup root.
> >>>
> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> >>> cgroup namespce root.  A task outside of  your cgroup looks like
> >>>
> >>>         8:memory:/../../..
> >>>
> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> >>> on the mounting task's  cgroup namespace.
> >>>
> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
> >>> your cgroups.
> >>>
> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> >>>
> >>> This is completely backward compatible and will be completely invisible
> >>> to any existing cgroup users (except for those running inside a cgroup
> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
> >>> namespace.)
> >>>    cgroupns-root.
> >>
> >> IIRC one downside of this series was that only the new "sane" cgroup
> >> layout was supported
> >> and hence it was useless for everything which expected the default layout.
> >> Hence, still no systemd for us. :)
> >>
> >> Is this now different?
> > 
> > Yes, all hierachies are no supported.
> > 
> 
> Should read "now"? :-)
> If so, *awesome*!

D'oh!  Yes, now :-)

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]       ` <564A41AF.4040208-/L3Ra7n9ekc@public.gmane.org>
@ 2015-11-16 20:54         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-16 20:54 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Serge E. Hallyn, Richard Weinberger, LKML, open list:ABI/API,
	Linux Containers, Eric W. Biederman,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> >> Serge,
> >>
> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge@hallyn.com> wrote:
> >>> To summarize the semantics:
> >>>
> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> >>>
> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
> >>> cgroup root.
> >>>
> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> >>> cgroup namespce root.  A task outside of  your cgroup looks like
> >>>
> >>>         8:memory:/../../..
> >>>
> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> >>> on the mounting task's  cgroup namespace.
> >>>
> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
> >>> your cgroups.
> >>>
> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> >>>
> >>> This is completely backward compatible and will be completely invisible
> >>> to any existing cgroup users (except for those running inside a cgroup
> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
> >>> namespace.)
> >>>    cgroupns-root.
> >>
> >> IIRC one downside of this series was that only the new "sane" cgroup
> >> layout was supported
> >> and hence it was useless for everything which expected the default layout.
> >> Hence, still no systemd for us. :)
> >>
> >> Is this now different?
> > 
> > Yes, all hierachies are no supported.
> > 
> 
> Should read "now"? :-)
> If so, *awesome*!

D'oh!  Yes, now :-)

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-16 20:54         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-16 20:54 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Serge E. Hallyn, Richard Weinberger, LKML, open list:ABI/API,
	Linux Containers, Eric W. Biederman,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> >> Serge,
> >>
> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> >>> To summarize the semantics:
> >>>
> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> >>>
> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
> >>> cgroup root.
> >>>
> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> >>> cgroup namespce root.  A task outside of  your cgroup looks like
> >>>
> >>>         8:memory:/../../..
> >>>
> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> >>> on the mounting task's  cgroup namespace.
> >>>
> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
> >>> your cgroups.
> >>>
> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> >>>
> >>> This is completely backward compatible and will be completely invisible
> >>> to any existing cgroup users (except for those running inside a cgroup
> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
> >>> namespace.)
> >>>    cgroupns-root.
> >>
> >> IIRC one downside of this series was that only the new "sane" cgroup
> >> layout was supported
> >> and hence it was useless for everything which expected the default layout.
> >> Hence, still no systemd for us. :)
> >>
> >> Is this now different?
> > 
> > Yes, all hierachies are no supported.
> > 
> 
> Should read "now"? :-)
> If so, *awesome*!

D'oh!  Yes, now :-)

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]         ` <20151116205452.GA30975-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-16 22:24           ` Eric W. Biederman
  0 siblings, 0 replies; 180+ messages in thread
From: Eric W. Biederman @ 2015-11-16 22:24 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Richard Weinberger, Linux Containers, LKML,
	LXC development mailing-list, open list:ABI/API, Tejun Heo,
	cgroups mailinglist, Andrew Morton

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
>> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
>> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
>> >> Serge,
>> >>
>> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
>> >>> To summarize the semantics:
>> >>>
>> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
>> >>>
>> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
>> >>> cgroup root.
>> >>>
>> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
>> >>> cgroup namespce root.  A task outside of  your cgroup looks like
>> >>>
>> >>>         8:memory:/../../..
>> >>>
>> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
>> >>> on the mounting task's  cgroup namespace.
>> >>>
>> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
>> >>> your cgroups.
>> >>>
>> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
>> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
>> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
>> >>>
>> >>> This is completely backward compatible and will be completely invisible
>> >>> to any existing cgroup users (except for those running inside a cgroup
>> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
>> >>> namespace.)
>> >>>    cgroupns-root.
>> >>
>> >> IIRC one downside of this series was that only the new "sane" cgroup
>> >> layout was supported
>> >> and hence it was useless for everything which expected the default layout.
>> >> Hence, still no systemd for us. :)
>> >>
>> >> Is this now different?
>> > 
>> > Yes, all hierachies are no supported.
>> > 
>> 
>> Should read "now"? :-)
>> If so, *awesome*!
>
> D'oh!  Yes, now :-)

I am glad to see multiple hierarchy support, that is something people
can use today.

A couple of quick questions before I delve into a review.

Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
-t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
container? and still have reasonable things happen?  I suspect the
semantics of cgroups prevent this but I am interested to know what happens.

Similary have you considered what it required to be able to safely set
FS_USERNS_MOUNT?

Eric

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]         ` <20151116205452.GA30975-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-16 22:24           ` Eric W. Biederman
  0 siblings, 0 replies; 180+ messages in thread
From: Eric W. Biederman @ 2015-11-16 22:24 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Richard Weinberger, Richard Weinberger, LKML, open list:ABI/API,
	Linux Containers, LXC development mailing-list, Tejun Heo,
	cgroups mailinglist, Andrew Morton

"Serge E. Hallyn" <serge@hallyn.com> writes:

> On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
>> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
>> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
>> >> Serge,
>> >>
>> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge@hallyn.com> wrote:
>> >>> To summarize the semantics:
>> >>>
>> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
>> >>>
>> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
>> >>> cgroup root.
>> >>>
>> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
>> >>> cgroup namespce root.  A task outside of  your cgroup looks like
>> >>>
>> >>>         8:memory:/../../..
>> >>>
>> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
>> >>> on the mounting task's  cgroup namespace.
>> >>>
>> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
>> >>> your cgroups.
>> >>>
>> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
>> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
>> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
>> >>>
>> >>> This is completely backward compatible and will be completely invisible
>> >>> to any existing cgroup users (except for those running inside a cgroup
>> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
>> >>> namespace.)
>> >>>    cgroupns-root.
>> >>
>> >> IIRC one downside of this series was that only the new "sane" cgroup
>> >> layout was supported
>> >> and hence it was useless for everything which expected the default layout.
>> >> Hence, still no systemd for us. :)
>> >>
>> >> Is this now different?
>> > 
>> > Yes, all hierachies are no supported.
>> > 
>> 
>> Should read "now"? :-)
>> If so, *awesome*!
>
> D'oh!  Yes, now :-)

I am glad to see multiple hierarchy support, that is something people
can use today.

A couple of quick questions before I delve into a review.

Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
-t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
container? and still have reasonable things happen?  I suspect the
semantics of cgroups prevent this but I am interested to know what happens.

Similary have you considered what it required to be able to safely set
FS_USERNS_MOUNT?

Eric

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-16 22:24           ` Eric W. Biederman
  0 siblings, 0 replies; 180+ messages in thread
From: Eric W. Biederman @ 2015-11-16 22:24 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Richard Weinberger, Richard Weinberger, LKML, open list:ABI/API,
	Linux Containers, LXC development mailing-list, Tejun Heo,
	cgroups mailinglist, Andrew Morton

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
>> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
>> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
>> >> Serge,
>> >>
>> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
>> >>> To summarize the semantics:
>> >>>
>> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
>> >>>
>> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
>> >>> cgroup root.
>> >>>
>> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
>> >>> cgroup namespce root.  A task outside of  your cgroup looks like
>> >>>
>> >>>         8:memory:/../../..
>> >>>
>> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
>> >>> on the mounting task's  cgroup namespace.
>> >>>
>> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
>> >>> your cgroups.
>> >>>
>> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
>> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
>> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
>> >>>
>> >>> This is completely backward compatible and will be completely invisible
>> >>> to any existing cgroup users (except for those running inside a cgroup
>> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
>> >>> namespace.)
>> >>>    cgroupns-root.
>> >>
>> >> IIRC one downside of this series was that only the new "sane" cgroup
>> >> layout was supported
>> >> and hence it was useless for everything which expected the default layout.
>> >> Hence, still no systemd for us. :)
>> >>
>> >> Is this now different?
>> > 
>> > Yes, all hierachies are no supported.
>> > 
>> 
>> Should read "now"? :-)
>> If so, *awesome*!
>
> D'oh!  Yes, now :-)

I am glad to see multiple hierarchy support, that is something people
can use today.

A couple of quick questions before I delve into a review.

Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
-t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
container? and still have reasonable things happen?  I suspect the
semantics of cgroups prevent this but I am interested to know what happens.

Similary have you considered what it required to be able to safely set
FS_USERNS_MOUNT?

Eric

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]           ` <87y4dxh9b8.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
@ 2015-11-16 22:37             ` Tejun Heo
  2015-11-17  1:13               ` Serge E. Hallyn
  2015-11-18  2:30               ` Serge E. Hallyn
  2 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-16 22:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Linux Containers, LKML,
	LXC development mailing-list, open list:ABI/API,
	cgroups mailinglist, Andrew Morton

Hello, Eric.

On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
> -t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
> container? and still have reasonable things happen?  I suspect the
> semantics of cgroups prevent this but I am interested to know what happens.

cgroup v1 and v2 are just separate hierarchies.  They can't nest each
other but co-existing and namespacing on their own is completely fine.
The caveat is that a given controller can be on only one hierarchy but
that's the same among v1 hierarchies too.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]           ` <87y4dxh9b8.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
@ 2015-11-16 22:37             ` Tejun Heo
  2015-11-17  1:13               ` Serge E. Hallyn
  2015-11-18  2:30               ` Serge E. Hallyn
  2 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-16 22:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Richard Weinberger, Richard Weinberger, LKML,
	open list:ABI/API, Linux Containers,
	LXC development mailing-list, cgroups mailinglist, Andrew Morton

Hello, Eric.

On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
> -t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
> container? and still have reasonable things happen?  I suspect the
> semantics of cgroups prevent this but I am interested to know what happens.

cgroup v1 and v2 are just separate hierarchies.  They can't nest each
other but co-existing and namespacing on their own is completely fine.
The caveat is that a given controller can be on only one hierarchy but
that's the same among v1 hierarchies too.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-16 22:37             ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-16 22:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Richard Weinberger, Richard Weinberger, LKML,
	open list:ABI/API, Linux Containers,
	LXC development mailing-list, cgroups mailinglist, Andrew Morton

Hello, Eric.

On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
> -t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
> container? and still have reasonable things happen?  I suspect the
> semantics of cgroups prevent this but I am interested to know what happens.

cgroup v1 and v2 are just separate hierarchies.  They can't nest each
other but co-existing and namespacing on their own is completely fine.
The caveat is that a given controller can be on only one hierarchy but
that's the same among v1 hierarchies too.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
  2015-11-16 22:24           ` Eric W. Biederman
  (?)
@ 2015-11-17  1:13               ` Serge E. Hallyn
  -1 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-17  1:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Linux Containers, LKML,
	LXC development mailing-list, open list:ABI/API, Tejun Heo,
	cgroups mailinglist, Andrew Morton

On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
> >> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> >> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> >> >> Serge,
> >> >>
> >> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> >> >>> To summarize the semantics:
> >> >>>
> >> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> >> >>>
> >> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
> >> >>> cgroup root.
> >> >>>
> >> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> >> >>> cgroup namespce root.  A task outside of  your cgroup looks like
> >> >>>
> >> >>>         8:memory:/../../..
> >> >>>
> >> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> >> >>> on the mounting task's  cgroup namespace.
> >> >>>
> >> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
> >> >>> your cgroups.
> >> >>>
> >> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> >> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> >> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> >> >>>
> >> >>> This is completely backward compatible and will be completely invisible
> >> >>> to any existing cgroup users (except for those running inside a cgroup
> >> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
> >> >>> namespace.)
> >> >>>    cgroupns-root.
> >> >>
> >> >> IIRC one downside of this series was that only the new "sane" cgroup
> >> >> layout was supported
> >> >> and hence it was useless for everything which expected the default layout.
> >> >> Hence, still no systemd for us. :)
> >> >>
> >> >> Is this now different?
> >> > 
> >> > Yes, all hierachies are no supported.
> >> > 
> >> 
> >> Should read "now"? :-)
> >> If so, *awesome*!
> >
> > D'oh!  Yes, now :-)
> 
> I am glad to see multiple hierarchy support, that is something people
> can use today.
> 
> A couple of quick questions before I delve into a review.
> 
> Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
> -t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
> container? and still have reasonable things happen?  I suspect the
> semantics of cgroups prevent this but I am interested to know what happens.

As Tejun said, this is not an issue.  There's not an actual separate cgroupfs2
filesystem, it's just a separate hierarchy which controllers can be bound to
or not, which has its own set of semantics (like no tasks on leafnodes).  So
a legacy application would never be able to run on the unified hierarchy, but
this does not change that.

> Similary have you considered what it required to be able to safely set
> FS_USERNS_MOUNT?

I think the only thing we need to do is

1. go through and make sure that any ability to change mount flags is under
capable() (which I have not yet done).  The cgroup_mount() itself checks that
flags are not changed, but there may be some subtle way to effect a change
that I'm not aware of yet.

2. Make sure that to bind a new controller you must be true root.  It's
possible that a patch like the one below would suffice.

-serge

From 37699aa868cba3efb6ea0aa2e53e0b85b619f02d Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
Date: Mon, 16 Nov 2015 19:11:07 -0600
Subject: [PATCH 1/1] Don't allow user namespaces to bind new subsystems

If memory was not mounted on the host, then root in a container
should not be able to mount it.

Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0a3e893..db514b4 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2102,6 +2102,11 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		goto out_unlock;
 	}
 
+	if (!opts.none && !capable(CAP_SYS_ADMIN)) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
 	root = kzalloc(sizeof(*root), GFP_KERNEL);
 	if (!root) {
 		ret = -ENOMEM;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-17  1:13               ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-17  1:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Richard Weinberger, Richard Weinberger, LKML,
	open list:ABI/API, Linux Containers,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> 
> > On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
> >> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> >> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> >> >> Serge,
> >> >>
> >> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge@hallyn.com> wrote:
> >> >>> To summarize the semantics:
> >> >>>
> >> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> >> >>>
> >> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
> >> >>> cgroup root.
> >> >>>
> >> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> >> >>> cgroup namespce root.  A task outside of  your cgroup looks like
> >> >>>
> >> >>>         8:memory:/../../..
> >> >>>
> >> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> >> >>> on the mounting task's  cgroup namespace.
> >> >>>
> >> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
> >> >>> your cgroups.
> >> >>>
> >> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> >> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> >> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> >> >>>
> >> >>> This is completely backward compatible and will be completely invisible
> >> >>> to any existing cgroup users (except for those running inside a cgroup
> >> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
> >> >>> namespace.)
> >> >>>    cgroupns-root.
> >> >>
> >> >> IIRC one downside of this series was that only the new "sane" cgroup
> >> >> layout was supported
> >> >> and hence it was useless for everything which expected the default layout.
> >> >> Hence, still no systemd for us. :)
> >> >>
> >> >> Is this now different?
> >> > 
> >> > Yes, all hierachies are no supported.
> >> > 
> >> 
> >> Should read "now"? :-)
> >> If so, *awesome*!
> >
> > D'oh!  Yes, now :-)
> 
> I am glad to see multiple hierarchy support, that is something people
> can use today.
> 
> A couple of quick questions before I delve into a review.
> 
> Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
> -t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
> container? and still have reasonable things happen?  I suspect the
> semantics of cgroups prevent this but I am interested to know what happens.

As Tejun said, this is not an issue.  There's not an actual separate cgroupfs2
filesystem, it's just a separate hierarchy which controllers can be bound to
or not, which has its own set of semantics (like no tasks on leafnodes).  So
a legacy application would never be able to run on the unified hierarchy, but
this does not change that.

> Similary have you considered what it required to be able to safely set
> FS_USERNS_MOUNT?

I think the only thing we need to do is

1. go through and make sure that any ability to change mount flags is under
capable() (which I have not yet done).  The cgroup_mount() itself checks that
flags are not changed, but there may be some subtle way to effect a change
that I'm not aware of yet.

2. Make sure that to bind a new controller you must be true root.  It's
possible that a patch like the one below would suffice.

-serge

>From 37699aa868cba3efb6ea0aa2e53e0b85b619f02d Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serge.hallyn@ubuntu.com>
Date: Mon, 16 Nov 2015 19:11:07 -0600
Subject: [PATCH 1/1] Don't allow user namespaces to bind new subsystems

If memory was not mounted on the host, then root in a container
should not be able to mount it.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
 kernel/cgroup.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0a3e893..db514b4 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2102,6 +2102,11 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		goto out_unlock;
 	}
 
+	if (!opts.none && !capable(CAP_SYS_ADMIN)) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
 	root = kzalloc(sizeof(*root), GFP_KERNEL);
 	if (!root) {
 		ret = -ENOMEM;
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-17  1:13               ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-17  1:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Linux Containers, LKML,
	LXC development mailing-list, open list:ABI/API, Tejun Heo,
	cgroups mailinglist, Andrew Morton

On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
> >> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> >> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> >> >> Serge,
> >> >>
> >> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> >> >>> To summarize the semantics:
> >> >>>
> >> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> >> >>>
> >> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
> >> >>> cgroup root.
> >> >>>
> >> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> >> >>> cgroup namespce root.  A task outside of  your cgroup looks like
> >> >>>
> >> >>>         8:memory:/../../..
> >> >>>
> >> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> >> >>> on the mounting task's  cgroup namespace.
> >> >>>
> >> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
> >> >>> your cgroups.
> >> >>>
> >> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> >> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> >> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> >> >>>
> >> >>> This is completely backward compatible and will be completely invisible
> >> >>> to any existing cgroup users (except for those running inside a cgroup
> >> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
> >> >>> namespace.)
> >> >>>    cgroupns-root.
> >> >>
> >> >> IIRC one downside of this series was that only the new "sane" cgroup
> >> >> layout was supported
> >> >> and hence it was useless for everything which expected the default layout.
> >> >> Hence, still no systemd for us. :)
> >> >>
> >> >> Is this now different?
> >> > 
> >> > Yes, all hierachies are no supported.
> >> > 
> >> 
> >> Should read "now"? :-)
> >> If so, *awesome*!
> >
> > D'oh!  Yes, now :-)
> 
> I am glad to see multiple hierarchy support, that is something people
> can use today.
> 
> A couple of quick questions before I delve into a review.
> 
> Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
> -t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
> container? and still have reasonable things happen?  I suspect the
> semantics of cgroups prevent this but I am interested to know what happens.

As Tejun said, this is not an issue.  There's not an actual separate cgroupfs2
filesystem, it's just a separate hierarchy which controllers can be bound to
or not, which has its own set of semantics (like no tasks on leafnodes).  So
a legacy application would never be able to run on the unified hierarchy, but
this does not change that.

> Similary have you considered what it required to be able to safely set
> FS_USERNS_MOUNT?

I think the only thing we need to do is

1. go through and make sure that any ability to change mount flags is under
capable() (which I have not yet done).  The cgroup_mount() itself checks that
flags are not changed, but there may be some subtle way to effect a change
that I'm not aware of yet.

2. Make sure that to bind a new controller you must be true root.  It's
possible that a patch like the one below would suffice.

-serge

>From 37699aa868cba3efb6ea0aa2e53e0b85b619f02d Mon Sep 17 00:00:00 2001
From: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
Date: Mon, 16 Nov 2015 19:11:07 -0600
Subject: [PATCH 1/1] Don't allow user namespaces to bind new subsystems

If memory was not mounted on the host, then root in a container
should not be able to mount it.

Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0a3e893..db514b4 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2102,6 +2102,11 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		goto out_unlock;
 	}
 
+	if (!opts.none && !capable(CAP_SYS_ADMIN)) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
 	root = kzalloc(sizeof(*root), GFP_KERNEL);
 	if (!root) {
 		ret = -ENOMEM;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]               ` <20151117011349.GA1958-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-17  1:40                 ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-17  1:40 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Richard Weinberger, Linux Containers, LKML, Eric W. Biederman,
	LXC development mailing-list, open list:ABI/API, Tejun Heo,
	cgroups mailinglist, Andrew Morton

On Mon, Nov 16, 2015 at 07:13:49PM -0600, Serge E. Hallyn wrote:
> On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> > "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> > 
> > > On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
> > >> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> > >> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> > >> >> Serge,
> > >> >>
> > >> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> > >> >>> To summarize the semantics:
> > >> >>>
> > >> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> > >> >>>
> > >> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
> > >> >>> cgroup root.
> > >> >>>
> > >> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> > >> >>> cgroup namespce root.  A task outside of  your cgroup looks like
> > >> >>>
> > >> >>>         8:memory:/../../..
> > >> >>>
> > >> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> > >> >>> on the mounting task's  cgroup namespace.
> > >> >>>
> > >> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
> > >> >>> your cgroups.
> > >> >>>
> > >> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> > >> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> > >> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> > >> >>>
> > >> >>> This is completely backward compatible and will be completely invisible
> > >> >>> to any existing cgroup users (except for those running inside a cgroup
> > >> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
> > >> >>> namespace.)
> > >> >>>    cgroupns-root.
> > >> >>
> > >> >> IIRC one downside of this series was that only the new "sane" cgroup
> > >> >> layout was supported
> > >> >> and hence it was useless for everything which expected the default layout.
> > >> >> Hence, still no systemd for us. :)
> > >> >>
> > >> >> Is this now different?
> > >> > 
> > >> > Yes, all hierachies are no supported.
> > >> > 
> > >> 
> > >> Should read "now"? :-)
> > >> If so, *awesome*!
> > >
> > > D'oh!  Yes, now :-)
> > 
> > I am glad to see multiple hierarchy support, that is something people
> > can use today.
> > 
> > A couple of quick questions before I delve into a review.
> > 
> > Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
> > -t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
> > container? and still have reasonable things happen?  I suspect the
> > semantics of cgroups prevent this but I am interested to know what happens.
> 
> As Tejun said, this is not an issue.  There's not an actual separate cgroupfs2
> filesystem, it's just a separate hierarchy which controllers can be bound to
> or not, which has its own set of semantics (like no tasks on leafnodes).  So
> a legacy application would never be able to run on the unified hierarchy, but
> this does not change that.
> 
> > Similary have you considered what it required to be able to safely set
> > FS_USERNS_MOUNT?
> 
> I think the only thing we need to do is
> 
> 1. go through and make sure that any ability to change mount flags is under
> capable() (which I have not yet done).  The cgroup_mount() itself checks that
> flags are not changed, but there may be some subtle way to effect a change
> that I'm not aware of yet.
> 

At least the ability to change the clone_children and release agent through
remount need to be restricted to init_user_ns root.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]               ` <20151117011349.GA1958-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-17  1:40                 ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-17  1:40 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, Richard Weinberger, Richard Weinberger, LKML,
	open list:ABI/API, Linux Containers,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Mon, Nov 16, 2015 at 07:13:49PM -0600, Serge E. Hallyn wrote:
> On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> > "Serge E. Hallyn" <serge@hallyn.com> writes:
> > 
> > > On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
> > >> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> > >> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> > >> >> Serge,
> > >> >>
> > >> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge@hallyn.com> wrote:
> > >> >>> To summarize the semantics:
> > >> >>>
> > >> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> > >> >>>
> > >> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
> > >> >>> cgroup root.
> > >> >>>
> > >> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> > >> >>> cgroup namespce root.  A task outside of  your cgroup looks like
> > >> >>>
> > >> >>>         8:memory:/../../..
> > >> >>>
> > >> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> > >> >>> on the mounting task's  cgroup namespace.
> > >> >>>
> > >> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
> > >> >>> your cgroups.
> > >> >>>
> > >> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> > >> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> > >> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> > >> >>>
> > >> >>> This is completely backward compatible and will be completely invisible
> > >> >>> to any existing cgroup users (except for those running inside a cgroup
> > >> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
> > >> >>> namespace.)
> > >> >>>    cgroupns-root.
> > >> >>
> > >> >> IIRC one downside of this series was that only the new "sane" cgroup
> > >> >> layout was supported
> > >> >> and hence it was useless for everything which expected the default layout.
> > >> >> Hence, still no systemd for us. :)
> > >> >>
> > >> >> Is this now different?
> > >> > 
> > >> > Yes, all hierachies are no supported.
> > >> > 
> > >> 
> > >> Should read "now"? :-)
> > >> If so, *awesome*!
> > >
> > > D'oh!  Yes, now :-)
> > 
> > I am glad to see multiple hierarchy support, that is something people
> > can use today.
> > 
> > A couple of quick questions before I delve into a review.
> > 
> > Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
> > -t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
> > container? and still have reasonable things happen?  I suspect the
> > semantics of cgroups prevent this but I am interested to know what happens.
> 
> As Tejun said, this is not an issue.  There's not an actual separate cgroupfs2
> filesystem, it's just a separate hierarchy which controllers can be bound to
> or not, which has its own set of semantics (like no tasks on leafnodes).  So
> a legacy application would never be able to run on the unified hierarchy, but
> this does not change that.
> 
> > Similary have you considered what it required to be able to safely set
> > FS_USERNS_MOUNT?
> 
> I think the only thing we need to do is
> 
> 1. go through and make sure that any ability to change mount flags is under
> capable() (which I have not yet done).  The cgroup_mount() itself checks that
> flags are not changed, but there may be some subtle way to effect a change
> that I'm not aware of yet.
> 

At least the ability to change the clone_children and release agent through
remount need to be restricted to init_user_ns root.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-17  1:40                 ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-17  1:40 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, Richard Weinberger, Richard Weinberger, LKML,
	open list:ABI/API, Linux Containers,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Mon, Nov 16, 2015 at 07:13:49PM -0600, Serge E. Hallyn wrote:
> On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> > "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> > 
> > > On Mon, Nov 16, 2015 at 09:50:55PM +0100, Richard Weinberger wrote:
> > >> Am 16.11.2015 um 21:46 schrieb Serge E. Hallyn:
> > >> > On Mon, Nov 16, 2015 at 09:41:15PM +0100, Richard Weinberger wrote:
> > >> >> Serge,
> > >> >>
> > >> >> On Mon, Nov 16, 2015 at 8:51 PM,  <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> > >> >>> To summarize the semantics:
> > >> >>>
> > >> >>> 1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED
> > >> >>>
> > >> >>> 2. unsharing a cgroup namespace makes all your current cgroups your new
> > >> >>> cgroup root.
> > >> >>>
> > >> >>> 3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
> > >> >>> cgroup namespce root.  A task outside of  your cgroup looks like
> > >> >>>
> > >> >>>         8:memory:/../../..
> > >> >>>
> > >> >>> 4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
> > >> >>> on the mounting task's  cgroup namespace.
> > >> >>>
> > >> >>> 5. setns to a cgroup namespace switches your cgroup namespace but not
> > >> >>> your cgroups.
> > >> >>>
> > >> >>> With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
> > >> >>> github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
> > >> >>> proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.
> > >> >>>
> > >> >>> This is completely backward compatible and will be completely invisible
> > >> >>> to any existing cgroup users (except for those running inside a cgroup
> > >> >>> namespace and looking at /proc/pid/cgroup of tasks outside their
> > >> >>> namespace.)
> > >> >>>    cgroupns-root.
> > >> >>
> > >> >> IIRC one downside of this series was that only the new "sane" cgroup
> > >> >> layout was supported
> > >> >> and hence it was useless for everything which expected the default layout.
> > >> >> Hence, still no systemd for us. :)
> > >> >>
> > >> >> Is this now different?
> > >> > 
> > >> > Yes, all hierachies are no supported.
> > >> > 
> > >> 
> > >> Should read "now"? :-)
> > >> If so, *awesome*!
> > >
> > > D'oh!  Yes, now :-)
> > 
> > I am glad to see multiple hierarchy support, that is something people
> > can use today.
> > 
> > A couple of quick questions before I delve into a review.
> > 
> > Does this allow mixing of cgroupfs and cgroupfs2?  That is can I: "mount
> > -t cgroupfs" inside a container and "mount -t cgroupfs2" outside a
> > container? and still have reasonable things happen?  I suspect the
> > semantics of cgroups prevent this but I am interested to know what happens.
> 
> As Tejun said, this is not an issue.  There's not an actual separate cgroupfs2
> filesystem, it's just a separate hierarchy which controllers can be bound to
> or not, which has its own set of semantics (like no tasks on leafnodes).  So
> a legacy application would never be able to run on the unified hierarchy, but
> this does not change that.
> 
> > Similary have you considered what it required to be able to safely set
> > FS_USERNS_MOUNT?
> 
> I think the only thing we need to do is
> 
> 1. go through and make sure that any ability to change mount flags is under
> capable() (which I have not yet done).  The cgroup_mount() itself checks that
> flags are not changed, but there may be some subtle way to effect a change
> that I'm not aware of yet.
> 

At least the ability to change the clone_children and release agent through
remount need to be restricted to init_user_ns root.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]                 ` <20151117014026.GA2331-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-17  3:54                   ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-17  3:54 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Richard Weinberger, Linux Containers, LKML, Eric W. Biederman,
	LXC development mailing-list, open list:ABI/API, Tejun Heo,
	cgroups mailinglist, Andrew Morton

On Mon, Nov 16, 2015 at 07:40:26PM -0600, Serge E. Hallyn wrote:
> On Mon, Nov 16, 2015 at 07:13:49PM -0600, Serge E. Hallyn wrote:
> > On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
...
> > > Similary have you considered what it required to be able to safely set
> > > FS_USERNS_MOUNT?
> > 
> > I think the only thing we need to do is
> > 
> > 1. go through and make sure that any ability to change mount flags is under
> > capable() (which I have not yet done).  The cgroup_mount() itself checks that
> > flags are not changed, but there may be some subtle way to effect a change
> > that I'm not aware of yet.
> > 
> 
> At least the ability to change the clone_children and release agent through
> remount need to be restricted to init_user_ns root.

No, they can only be changed on a new mount, so these are fine.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
       [not found]                 ` <20151117014026.GA2331-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-17  3:54                   ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-17  3:54 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, Richard Weinberger, Richard Weinberger, LKML,
	open list:ABI/API, Linux Containers,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Mon, Nov 16, 2015 at 07:40:26PM -0600, Serge E. Hallyn wrote:
> On Mon, Nov 16, 2015 at 07:13:49PM -0600, Serge E. Hallyn wrote:
> > On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
...
> > > Similary have you considered what it required to be able to safely set
> > > FS_USERNS_MOUNT?
> > 
> > I think the only thing we need to do is
> > 
> > 1. go through and make sure that any ability to change mount flags is under
> > capable() (which I have not yet done).  The cgroup_mount() itself checks that
> > flags are not changed, but there may be some subtle way to effect a change
> > that I'm not aware of yet.
> > 
> 
> At least the ability to change the clone_children and release agent through
> remount need to be restricted to init_user_ns root.

No, they can only be changed on a new mount, so these are fine.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-17  3:54                   ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-17  3:54 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, Richard Weinberger, Richard Weinberger, LKML,
	open list:ABI/API, Linux Containers,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Mon, Nov 16, 2015 at 07:40:26PM -0600, Serge E. Hallyn wrote:
> On Mon, Nov 16, 2015 at 07:13:49PM -0600, Serge E. Hallyn wrote:
> > On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
...
> > > Similary have you considered what it required to be able to safely set
> > > FS_USERNS_MOUNT?
> > 
> > I think the only thing we need to do is
> > 
> > 1. go through and make sure that any ability to change mount flags is under
> > capable() (which I have not yet done).  The cgroup_mount() itself checks that
> > flags are not changed, but there may be some subtle way to effect a change
> > that I'm not aware of yet.
> > 
> 
> At least the ability to change the clone_children and release agent through
> remount need to be restricted to init_user_ns root.

No, they can only be changed on a new mount, so these are fine.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
  2015-11-16 22:24           ` Eric W. Biederman
@ 2015-11-18  2:30               ` Serge E. Hallyn
  -1 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-18  2:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Linux Containers, LKML,
	LXC development mailing-list, open list:ABI/API, Tejun Heo,
	cgroups mailinglist, Andrew Morton

On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> Similary have you considered what it required to be able to safely set
> FS_USERNS_MOUNT?

I pushed the one patch which I feel is needed to my branch (it's also
included in another reply).  Aditya had already added FS_USERNS_MOUNT to
the cgroup fs flags, so I think we're now all set.  I can start
unprivileged containers which mount cgroupfs (which make systemd happy).

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-18  2:30               ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-18  2:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Richard Weinberger, Richard Weinberger, LKML,
	open list:ABI/API, Linux Containers,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> Similary have you considered what it required to be able to safely set
> FS_USERNS_MOUNT?

I pushed the one patch which I feel is needed to my branch (it's also
included in another reply).  Aditya had already added FS_USERNS_MOUNT to
the cgroup fs flags, so I think we're now all set.  I can start
unprivileged containers which mount cgroupfs (which make systemd happy).

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
  2015-11-18  2:30               ` Serge E. Hallyn
@ 2015-11-18  9:18                   ` Eric W. Biederman
  -1 siblings, 0 replies; 180+ messages in thread
From: Eric W. Biederman @ 2015-11-18  9:18 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Richard Weinberger, Linux Containers, LKML,
	LXC development mailing-list, open list:ABI/API, Tejun Heo,
	cgroups mailinglist, Andrew Morton

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
>> Similary have you considered what it required to be able to safely set
>> FS_USERNS_MOUNT?
>
> I pushed the one patch which I feel is needed to my branch (it's also
> included in another reply).  Aditya had already added FS_USERNS_MOUNT to
> the cgroup fs flags, so I think we're now all set.  I can start
> unprivileged containers which mount cgroupfs (which make systemd happy).

In principle that sounds very good, and I am glad to see that.

Let's hold off on merging the unprivileged part until everything else is
reviewed and merged and we have performed an extra hard look at the
security implications as it can be easy to overlook something when
relaxing the permissions.

Eric

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-18  9:18                   ` Eric W. Biederman
  0 siblings, 0 replies; 180+ messages in thread
From: Eric W. Biederman @ 2015-11-18  9:18 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Richard Weinberger, Richard Weinberger, LKML, open list:ABI/API,
	Linux Containers, LXC development mailing-list, Tejun Heo,
	cgroups mailinglist, Andrew Morton

"Serge E. Hallyn" <serge@hallyn.com> writes:

> On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
>> Similary have you considered what it required to be able to safely set
>> FS_USERNS_MOUNT?
>
> I pushed the one patch which I feel is needed to my branch (it's also
> included in another reply).  Aditya had already added FS_USERNS_MOUNT to
> the cgroup fs flags, so I think we're now all set.  I can start
> unprivileged containers which mount cgroupfs (which make systemd happy).

In principle that sounds very good, and I am glad to see that.

Let's hold off on merging the unprivileged part until everything else is
reviewed and merged and we have performed an extra hard look at the
security implications as it can be easy to overlook something when
relaxing the permissions.

Eric


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
  2015-11-18  9:18                   ` Eric W. Biederman
@ 2015-11-18 15:43                       ` Serge E. Hallyn
  -1 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-18 15:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Linux Containers, LKML,
	LXC development mailing-list, open list:ABI/API, Tejun Heo,
	cgroups mailinglist, Andrew Morton

On Wed, Nov 18, 2015 at 03:18:44AM -0600, Eric W. Biederman wrote:
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> >> Similary have you considered what it required to be able to safely set
> >> FS_USERNS_MOUNT?
> >
> > I pushed the one patch which I feel is needed to my branch (it's also
> > included in another reply).  Aditya had already added FS_USERNS_MOUNT to
> > the cgroup fs flags, so I think we're now all set.  I can start
> > unprivileged containers which mount cgroupfs (which make systemd happy).
> 
> In principle that sounds very good, and I am glad to see that.
> 
> Let's hold off on merging the unprivileged part until everything else is
> reviewed and merged and we have performed an extra hard look at the
> security implications as it can be easy to overlook something when
> relaxing the permissions.

I'll break out the FS_USERNS_MOUNT flag into the very last patch.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: CGroup Namespaces (v4)
@ 2015-11-18 15:43                       ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-18 15:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Richard Weinberger, Richard Weinberger, LKML,
	open list:ABI/API, Linux Containers,
	LXC development mailing-list, Tejun Heo, cgroups mailinglist,
	Andrew Morton

On Wed, Nov 18, 2015 at 03:18:44AM -0600, Eric W. Biederman wrote:
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> 
> > On Mon, Nov 16, 2015 at 04:24:27PM -0600, Eric W. Biederman wrote:
> >> Similary have you considered what it required to be able to safely set
> >> FS_USERNS_MOUNT?
> >
> > I pushed the one patch which I feel is needed to my branch (it's also
> > included in another reply).  Aditya had already added FS_USERNS_MOUNT to
> > the cgroup fs flags, so I think we're now all set.  I can start
> > unprivileged containers which mount cgroupfs (which make systemd happy).
> 
> In principle that sounds very good, and I am glad to see that.
> 
> Let's hold off on merging the unprivileged part until everything else is
> reviewed and merged and we have performed an extra hard look at the
> security implications as it can be easy to overlook something when
> relaxing the permissions.

I'll break out the FS_USERNS_MOUNT flag into the very last patch.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]     ` <1447703505-29672-2-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-24 16:16       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:16 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello,

On Mon, Nov 16, 2015 at 01:51:38PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> +static char * __must_check kernfs_path_from_node_locked(
> +	struct kernfs_node *kn_from,
> +	struct kernfs_node *kn_to,
> +	char *buf,
> +	size_t buflen)
> +{
> +	char *p = buf;
> +	struct kernfs_node *kn;
> +	size_t depth_from = 0, depth_to, d;
>  	int len;
>  
> +	/* We atleast need 2 bytes to write "/\0". */
> +	BUG_ON(buflen < 2);

I don't think this is BUG worthy.  Just return NULL?  Also, the only
reason the original function returned char * was because the starting
point may not be the start of the buffer which helps keeping the
implementation simple.  If this function is gonna be complex anyway, a
better approach would be returning ssize_t and implement a simliar
behavior to strlcpy().

> +	/* Short-circuit the easy case - kn_to is the root node. */
> +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> +		*p = '/';
> +		*(p + 1) = '\0';

Hmm... so if kn_from == kn_to, the output is "/"?

> +		return p;
> +	}
> +
> +	/* We can find the relative path only if both the nodes belong to the
> +	 * same kernfs root.
> +	 */
> +	if (kn_from) {
> +		BUG_ON(kernfs_root(kn_from) != kernfs_root(kn_to));

Ditto, just return NULL and maybe trigger WARN_ON_ONCE().

> +		depth_from = kernfs_node_depth(kn_from);
> +	}
> +
> +	depth_to = kernfs_node_depth(kn_to);
> +
> +	/* We compose path from left to right. So first write out all possible
                                             ^
					     , so

> +	 * "/.." strings needed to reach from 'kn_from' to the common ancestor.
> +	 */

Please fully-wing multiline comments.

> +	if (kn_from) {
> +		while (depth_from > depth_to) {
> +			len = strlen("/..");

Maybe do something like the following instead?

			const char parent_str[] = "/..";
			size_t len = sizeof(parent_str) - 1;

> +			if ((buflen - (p - buf)) < len + 1) {
> +				/* buffer not big enough. */
> +				buf[0] = '\0';
> +				return NULL;
> +			}
> +			memcpy(p, "/..", len);
> +			p += len;
> +			*p = '\0';
> +			--depth_from;
> +			kn_from = kn_from->parent;
>  		}
> +
> +		d = depth_to;
> +		kn = kn_to;
> +		while (depth_from < d) {
> +			kn = kn->parent;
> +			d--;
> +		}
> +
> +		/* Now we have 'depth_from == depth_to' at this point. Add more

Ditto with winging.

> +		 * "/.."s until we reach common ancestor. In the worst case,
> +		 * root node will be the common ancestor.
> +		 */
> +		while (depth_from > 0) {
> +			/* If we reached common ancestor, stop. */
> +			if (kn_from == kn)
> +				break;
> +			len = strlen("/..");
> +			if ((buflen - (p - buf)) < len + 1) {
> +				/* buffer not big enough. */
> +				buf[0] = '\0';
> +				return NULL;
> +			}
> +			memcpy(p, "/..", len);
> +			p += len;
> +			*p = '\0';
> +			--depth_from;
> +			kn_from = kn_from->parent;
> +			kn = kn->parent;
> +		}

Hmmm... I wonder whether this and the above block can be merged.
Wouldn't it be simpler to calculate common ancestor and generate
/.. till it reached that point?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]     ` <1447703505-29672-2-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-24 16:16       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:16 UTC (permalink / raw)
  To: serge
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Hello,

On Mon, Nov 16, 2015 at 01:51:38PM -0600, serge@hallyn.com wrote:
> +static char * __must_check kernfs_path_from_node_locked(
> +	struct kernfs_node *kn_from,
> +	struct kernfs_node *kn_to,
> +	char *buf,
> +	size_t buflen)
> +{
> +	char *p = buf;
> +	struct kernfs_node *kn;
> +	size_t depth_from = 0, depth_to, d;
>  	int len;
>  
> +	/* We atleast need 2 bytes to write "/\0". */
> +	BUG_ON(buflen < 2);

I don't think this is BUG worthy.  Just return NULL?  Also, the only
reason the original function returned char * was because the starting
point may not be the start of the buffer which helps keeping the
implementation simple.  If this function is gonna be complex anyway, a
better approach would be returning ssize_t and implement a simliar
behavior to strlcpy().

> +	/* Short-circuit the easy case - kn_to is the root node. */
> +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> +		*p = '/';
> +		*(p + 1) = '\0';

Hmm... so if kn_from == kn_to, the output is "/"?

> +		return p;
> +	}
> +
> +	/* We can find the relative path only if both the nodes belong to the
> +	 * same kernfs root.
> +	 */
> +	if (kn_from) {
> +		BUG_ON(kernfs_root(kn_from) != kernfs_root(kn_to));

Ditto, just return NULL and maybe trigger WARN_ON_ONCE().

> +		depth_from = kernfs_node_depth(kn_from);
> +	}
> +
> +	depth_to = kernfs_node_depth(kn_to);
> +
> +	/* We compose path from left to right. So first write out all possible
                                             ^
					     , so

> +	 * "/.." strings needed to reach from 'kn_from' to the common ancestor.
> +	 */

Please fully-wing multiline comments.

> +	if (kn_from) {
> +		while (depth_from > depth_to) {
> +			len = strlen("/..");

Maybe do something like the following instead?

			const char parent_str[] = "/..";
			size_t len = sizeof(parent_str) - 1;

> +			if ((buflen - (p - buf)) < len + 1) {
> +				/* buffer not big enough. */
> +				buf[0] = '\0';
> +				return NULL;
> +			}
> +			memcpy(p, "/..", len);
> +			p += len;
> +			*p = '\0';
> +			--depth_from;
> +			kn_from = kn_from->parent;
>  		}
> +
> +		d = depth_to;
> +		kn = kn_to;
> +		while (depth_from < d) {
> +			kn = kn->parent;
> +			d--;
> +		}
> +
> +		/* Now we have 'depth_from == depth_to' at this point. Add more

Ditto with winging.

> +		 * "/.."s until we reach common ancestor. In the worst case,
> +		 * root node will be the common ancestor.
> +		 */
> +		while (depth_from > 0) {
> +			/* If we reached common ancestor, stop. */
> +			if (kn_from == kn)
> +				break;
> +			len = strlen("/..");
> +			if ((buflen - (p - buf)) < len + 1) {
> +				/* buffer not big enough. */
> +				buf[0] = '\0';
> +				return NULL;
> +			}
> +			memcpy(p, "/..", len);
> +			p += len;
> +			*p = '\0';
> +			--depth_from;
> +			kn_from = kn_from->parent;
> +			kn = kn->parent;
> +		}

Hmmm... I wonder whether this and the above block can be merged.
Wouldn't it be simpler to calculate common ancestor and generate
/.. till it reached that point?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-11-24 16:16       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:16 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Hello,

On Mon, Nov 16, 2015 at 01:51:38PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> +static char * __must_check kernfs_path_from_node_locked(
> +	struct kernfs_node *kn_from,
> +	struct kernfs_node *kn_to,
> +	char *buf,
> +	size_t buflen)
> +{
> +	char *p = buf;
> +	struct kernfs_node *kn;
> +	size_t depth_from = 0, depth_to, d;
>  	int len;
>  
> +	/* We atleast need 2 bytes to write "/\0". */
> +	BUG_ON(buflen < 2);

I don't think this is BUG worthy.  Just return NULL?  Also, the only
reason the original function returned char * was because the starting
point may not be the start of the buffer which helps keeping the
implementation simple.  If this function is gonna be complex anyway, a
better approach would be returning ssize_t and implement a simliar
behavior to strlcpy().

> +	/* Short-circuit the easy case - kn_to is the root node. */
> +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> +		*p = '/';
> +		*(p + 1) = '\0';

Hmm... so if kn_from == kn_to, the output is "/"?

> +		return p;
> +	}
> +
> +	/* We can find the relative path only if both the nodes belong to the
> +	 * same kernfs root.
> +	 */
> +	if (kn_from) {
> +		BUG_ON(kernfs_root(kn_from) != kernfs_root(kn_to));

Ditto, just return NULL and maybe trigger WARN_ON_ONCE().

> +		depth_from = kernfs_node_depth(kn_from);
> +	}
> +
> +	depth_to = kernfs_node_depth(kn_to);
> +
> +	/* We compose path from left to right. So first write out all possible
                                             ^
					     , so

> +	 * "/.." strings needed to reach from 'kn_from' to the common ancestor.
> +	 */

Please fully-wing multiline comments.

> +	if (kn_from) {
> +		while (depth_from > depth_to) {
> +			len = strlen("/..");

Maybe do something like the following instead?

			const char parent_str[] = "/..";
			size_t len = sizeof(parent_str) - 1;

> +			if ((buflen - (p - buf)) < len + 1) {
> +				/* buffer not big enough. */
> +				buf[0] = '\0';
> +				return NULL;
> +			}
> +			memcpy(p, "/..", len);
> +			p += len;
> +			*p = '\0';
> +			--depth_from;
> +			kn_from = kn_from->parent;
>  		}
> +
> +		d = depth_to;
> +		kn = kn_to;
> +		while (depth_from < d) {
> +			kn = kn->parent;
> +			d--;
> +		}
> +
> +		/* Now we have 'depth_from == depth_to' at this point. Add more

Ditto with winging.

> +		 * "/.."s until we reach common ancestor. In the worst case,
> +		 * root node will be the common ancestor.
> +		 */
> +		while (depth_from > 0) {
> +			/* If we reached common ancestor, stop. */
> +			if (kn_from == kn)
> +				break;
> +			len = strlen("/..");
> +			if ((buflen - (p - buf)) < len + 1) {
> +				/* buffer not big enough. */
> +				buf[0] = '\0';
> +				return NULL;
> +			}
> +			memcpy(p, "/..", len);
> +			p += len;
> +			*p = '\0';
> +			--depth_from;
> +			kn_from = kn_from->parent;
> +			kn = kn->parent;
> +		}

Hmmm... I wonder whether this and the above block can be merged.
Wouldn't it be simpler to calculate common ancestor and generate
/.. till it reached that point?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]       ` <20151124161630.GL17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-24 16:17         ` Tejun Heo
  2015-11-27  5:25         ` Serge E. Hallyn
  1 sibling, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:17 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Oops, also please cc Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
on kernfs changes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]       ` <20151124161630.GL17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-24 16:17         ` Tejun Heo
  2015-11-27  5:25         ` Serge E. Hallyn
  1 sibling, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:17 UTC (permalink / raw)
  To: serge
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Oops, also please cc Greg Kroah-Hartman <gregkh@linuxfoundation.org>
on kernfs changes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-11-24 16:17         ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:17 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Oops, also please cc Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
on kernfs changes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 3/8] cgroup: add function to get task's cgroup
  2015-11-16 19:51     ` serge
@ 2015-11-24 16:27         ` Tejun Heo
  -1 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:27 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello,

On Mon, Nov 16, 2015 at 01:51:40PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 22e3754..29f0b02 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -326,6 +326,7 @@ static inline bool css_tryget_online(struct cgroup_subsys_state *css)
>  		return percpu_ref_tryget_live(&css->refcnt);
>  	return true;
>  }
> +struct cgroup *get_task_cgroup(struct task_struct *task);

Please move this where other prototypes are.

> +/*
> + * get_task_cgroup - returns the cgroup of the task in the default cgroup
> + * hierarchy.
> + *
> + * @task: target task
> + * This function returns the @task's cgroup on the default cgroup hierarchy. The
> + * returned cgroup has its reference incremented (by calling cgroup_get()). So
> + * the caller must cgroup_put() the obtained reference once it is done with it.
> + */
> +struct cgroup *get_task_cgroup(struct task_struct *task)
> +{
> +	struct cgroup *cgrp;
> +
> +	mutex_lock(&cgroup_mutex);
> +	spin_lock_bh(&css_set_lock);
> +
> +	cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
> +	cgroup_get(cgrp);
> +
> +	spin_unlock_bh(&css_set_lock);
> +	mutex_unlock(&cgroup_mutex);
> +	return cgrp;
> +}
> +EXPORT_SYMBOL_GPL(get_task_cgroup);

So, exposing cgroup_mutex this way can lead to ugly lock dependency
issues as cgroup_mutex is expected to be outside of pretty much
everything.  task_cgroup_path() does it but it has no users (should
prolly removed) and cgroup_attach_task_all() is pretty specific.

Hmmm... cc'ing Li (btw, please cc him and Johannes from the next
posting).  Li, I don't think cset_cgroup_from_root() really needs
cgroup_mutex.  css_set_lock seems to be enough.  What do you think?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 3/8] cgroup: add function to get task's cgroup
@ 2015-11-24 16:27         ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:27 UTC (permalink / raw)
  To: serge
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm, Li Zefan

Hello,

On Mon, Nov 16, 2015 at 01:51:40PM -0600, serge@hallyn.com wrote:
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 22e3754..29f0b02 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -326,6 +326,7 @@ static inline bool css_tryget_online(struct cgroup_subsys_state *css)
>  		return percpu_ref_tryget_live(&css->refcnt);
>  	return true;
>  }
> +struct cgroup *get_task_cgroup(struct task_struct *task);

Please move this where other prototypes are.

> +/*
> + * get_task_cgroup - returns the cgroup of the task in the default cgroup
> + * hierarchy.
> + *
> + * @task: target task
> + * This function returns the @task's cgroup on the default cgroup hierarchy. The
> + * returned cgroup has its reference incremented (by calling cgroup_get()). So
> + * the caller must cgroup_put() the obtained reference once it is done with it.
> + */
> +struct cgroup *get_task_cgroup(struct task_struct *task)
> +{
> +	struct cgroup *cgrp;
> +
> +	mutex_lock(&cgroup_mutex);
> +	spin_lock_bh(&css_set_lock);
> +
> +	cgrp = task_cgroup_from_root(task, &cgrp_dfl_root);
> +	cgroup_get(cgrp);
> +
> +	spin_unlock_bh(&css_set_lock);
> +	mutex_unlock(&cgroup_mutex);
> +	return cgrp;
> +}
> +EXPORT_SYMBOL_GPL(get_task_cgroup);

So, exposing cgroup_mutex this way can lead to ugly lock dependency
issues as cgroup_mutex is expected to be outside of pretty much
everything.  task_cgroup_path() does it but it has no users (should
prolly removed) and cgroup_attach_task_all() is pretty specific.

Hmmm... cc'ing Li (btw, please cc him and Johannes from the next
posting).  Li, I don't think cset_cgroup_from_root() really needs
cgroup_mutex.  css_set_lock seems to be enough.  What do you think?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/8] cgroup: export cgroup_get() and cgroup_put()
  2015-11-16 19:51   ` serge-A9i7LUbDfNHQT0dZR+AlfA
@ 2015-11-24 16:30       ` Tejun Heo
  -1 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:30 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello,

On Mon, Nov 16, 2015 at 01:51:41PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> 
> move cgroup_get() and cgroup_put() into cgroup.h so that
> they can be called from other places.
> 
> Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> ---
>  include/linux/cgroup.h |   21 +++++++++++++++++++++
>  kernel/cgroup.c        |   22 ----------------------
>  2 files changed, 21 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 29f0b02..99096be 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -231,6 +231,27 @@ void css_task_iter_end(struct css_task_iter *it);
>  #define css_for_each_descendant_post(pos, css)				\
>  	for ((pos) = css_next_descendant_post(NULL, (css)); (pos);	\
>  	     (pos) = css_next_descendant_post((pos), (css)))

Please insert a blank line here.

> +/* convenient tests for these bits */

And I don't think the comment makes sense here.

> +static inline bool cgroup_is_dead(const struct cgroup *cgrp)
> +{
> +	return !(cgrp->self.flags & CSS_ONLINE);
> +}
> +
> +static inline void cgroup_get(struct cgroup *cgrp)
> +{
> +	WARN_ON_ONCE(cgroup_is_dead(cgrp));
> +	css_get(&cgrp->self);
> +}
> +
> +static inline bool cgroup_tryget(struct cgroup *cgrp)
> +{
> +	return css_tryget(&cgrp->self);
> +}
> +
> +static inline void cgroup_put(struct cgroup *cgrp)
> +{
> +	css_put(&cgrp->self);
> +}

So these are being exposed for cgroup NS support.  Hmmm... idk, does
cgroup NS support needs to be in a spearate file?  The added amount
isn't that big.  If we split cgroup.c, I'd much prefer to have
cgroup-internal.h for internally shared stuff than pushing them out to
cgroup.h.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/8] cgroup: export cgroup_get() and cgroup_put()
@ 2015-11-24 16:30       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:30 UTC (permalink / raw)
  To: serge
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Hello,

On Mon, Nov 16, 2015 at 01:51:41PM -0600, serge@hallyn.com wrote:
> From: Aditya Kali <adityakali@google.com>
> 
> move cgroup_get() and cgroup_put() into cgroup.h so that
> they can be called from other places.
> 
> Signed-off-by: Aditya Kali <adityakali@google.com>
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
> ---
>  include/linux/cgroup.h |   21 +++++++++++++++++++++
>  kernel/cgroup.c        |   22 ----------------------
>  2 files changed, 21 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 29f0b02..99096be 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -231,6 +231,27 @@ void css_task_iter_end(struct css_task_iter *it);
>  #define css_for_each_descendant_post(pos, css)				\
>  	for ((pos) = css_next_descendant_post(NULL, (css)); (pos);	\
>  	     (pos) = css_next_descendant_post((pos), (css)))

Please insert a blank line here.

> +/* convenient tests for these bits */

And I don't think the comment makes sense here.

> +static inline bool cgroup_is_dead(const struct cgroup *cgrp)
> +{
> +	return !(cgrp->self.flags & CSS_ONLINE);
> +}
> +
> +static inline void cgroup_get(struct cgroup *cgrp)
> +{
> +	WARN_ON_ONCE(cgroup_is_dead(cgrp));
> +	css_get(&cgrp->self);
> +}
> +
> +static inline bool cgroup_tryget(struct cgroup *cgrp)
> +{
> +	return css_tryget(&cgrp->self);
> +}
> +
> +static inline void cgroup_put(struct cgroup *cgrp)
> +{
> +	css_put(&cgrp->self);
> +}

So these are being exposed for cgroup NS support.  Hmmm... idk, does
cgroup NS support needs to be in a spearate file?  The added amount
isn't that big.  If we split cgroup.c, I'd much prefer to have
cgroup-internal.h for internally shared stuff than pushing them out to
cgroup.h.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/8] cgroup: introduce cgroup namespaces
       [not found]     ` <1447703505-29672-6-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-24 16:49       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:49 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello,

On Mon, Nov 16, 2015 at 01:51:42PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 99096be..b3ce9d9 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -17,6 +17,9 @@
>  #include <linux/seq_file.h>
>  #include <linux/kernfs.h>
>  #include <linux/jump_label.h>
> +#include <linux/nsproxy.h>
> +#include <linux/types.h>
> +#include <linux/ns_common.h>
>  
>  #include <linux/cgroup-defs.h>
>  
> @@ -237,6 +240,10 @@ static inline bool cgroup_is_dead(const struct cgroup *cgrp)
>  	return !(cgrp->self.flags & CSS_ONLINE);
>  }
>  
> +static inline void css_get(struct cgroup_subsys_state *css);
> +static inline void css_put(struct cgroup_subsys_state *css);
> +static inline bool css_tryget(struct cgroup_subsys_state *css);

Heh, what's going on here?

> +
>  static inline void cgroup_get(struct cgroup *cgrp)
>  {
>  	WARN_ON_ONCE(cgroup_is_dead(cgrp));
> @@ -284,9 +291,11 @@ static inline void cgroup_put(struct cgroup *cgrp)
>  			;						\
>  		else
>  
> -/*
> - * Inline functions.
> - */
> +extern char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
> +		struct cgroup *cgrp, char *buf, size_t buflen);
> +
> +extern char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
> +		size_t buflen);

Please move them next to other prototypes and drop extern.

> diff --git a/include/linux/cgroup_namespace.h b/include/linux/cgroup_namespace.h
> new file mode 100644
> index 0000000..ed181c3
> --- /dev/null
> +++ b/include/linux/cgroup_namespace.h
> @@ -0,0 +1,46 @@
> +#ifndef _LINUX_CGROUP_NAMESPACE_H
> +#define _LINUX_CGROUP_NAMESPACE_H
> +
> +#include <linux/nsproxy.h>
> +#include <linux/cgroup.h>
> +#include <linux/types.h>
> +#include <linux/user_namespace.h>
> +
> +struct css_set;

Blank line here or linux/cgroup-defs.h can be included.

> +struct cgroup_namespace {
> +	atomic_t		count;
> +	struct ns_common	ns;
> +	struct user_namespace	*user_ns;
> +	struct css_set          *root_cgrps;
> +};
> +
> +extern struct cgroup_namespace init_cgroup_ns;
> +
> +static inline struct cgroup_namespace *get_cgroup_ns(
> +		struct cgroup_namespace *ns)

I personally prefer putting just the return type on a separate line
when things get too long.

static inline struct cgroup_namespace *
get_cgroup_ns(struct cgroup_namespace *ns)

> +{
> +	if (ns)
> +		atomic_inc(&ns->count);
> +	return ns;
> +}

Ugh... if the function doesn't do anything about the return type,
please make it a void function.  We tried the above style with kobj
and driver model and it ended up pretty horrible.

> +#ifdef CONFIG_CGROUPS
> +extern void free_cgroup_ns(struct cgroup_namespace *ns);
> +extern struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
> +					       struct user_namespace *user_ns,
> +					       struct cgroup_namespace *old_ns);

Please drop extern.

> +#else /* CONFIG_CGROUP */
> +static inline void free_cgroup_ns(struct cgroup_namespace *ns) { }
> +static inline struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
> +					struct user_namespace *user_ns,
> +					struct cgroup_namespace *old_ns)
> +{ return old_ns; }
> +#endif
> +
> +static inline void put_cgroup_ns(struct cgroup_namespace *ns)
> +{
> +	if (ns && atomic_dec_and_test(&ns->count))
> +		free_cgroup_ns(ns);
> +}
> +
> +#endif  /* _LINUX_CGROUP_NAMESPACE_H */

I don't know.  Does this warrant a separate file?

> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index e972259..1d696de 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -57,6 +57,8 @@
>  #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
>  #include <linux/kthread.h>
>  #include <linux/delay.h>
> +#include <linux/proc_ns.h>
> +#include <linux/cgroup_namespace.h>
>  
>  #include <linux/atomic.h>
>  
> @@ -290,6 +292,15 @@ static bool cgroup_on_dfl(const struct cgroup *cgrp)
>  {
>  	return cgrp->root == &cgrp_dfl_root;
>  }
> +struct cgroup_namespace init_cgroup_ns = {
> +	.count = {
> +		.counter = 1,
> +	},
> +	.user_ns = &init_user_ns,
> +	.ns.ops = &cgroupns_operations,
> +	.ns.inum = PROC_CGROUP_INIT_INO,
> +	.root_cgrps = &init_css_set,
> +};

Can you please tab align the assignments?

> @@ -2148,6 +2159,28 @@ static struct file_system_type cgroup_fs_type = {
>  	.kill_sb = cgroup_kill_sb,
>  };
>  
> +char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
> +						 struct cgroup *cgrp, char *buf,
> +						 size_t buflen)

Please align to the same column as the argument on the first line and
make the optional @ns the last argument.

> +{
> +	if (ns) {
> +		struct cgroup *root;
> +		root = cset_cgroup_from_root(ns->root_cgrps, cgrp->root);
> +		return kernfs_path_from_node(root->kn, cgrp->kn, buf,
> +					     buflen);
> +	} else {
> +		return kernfs_path(cgrp->kn, buf, buflen);
> +	}
> +}
> +
> +char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
> +					      size_t buflen)
> +{
> +	return cgroup_path_ns(current->nsproxy->cgroup_ns, cgrp, buf,
> +				      buflen);

Ditto with alignment.

> diff --git a/kernel/cgroup_namespace.c b/kernel/cgroup_namespace.c
> new file mode 100644
> index 0000000..ef20777
> --- /dev/null
> +++ b/kernel/cgroup_namespace.c
> @@ -0,0 +1,127 @@
> +/*
> + *  Copyright (C) 2014 Google Inc.
> + *
> + *  Author: Aditya Kali (adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org)
> + *
> + *  This program is free software; you can redistribute it and/or modify it
> + *  under the terms of the GNU General Public License as published by the Free
> + *  Software Foundation, version 2 of the License.
> + */
> +
> +#include <linux/cgroup.h>
> +#include <linux/cgroup_namespace.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/nsproxy.h>
> +#include <linux/proc_ns.h>
> +
> +const struct proc_ns_operations cgroupns_operations;
> +
> +static struct cgroup_namespace *alloc_cgroup_ns(void)
> +{
> +	struct cgroup_namespace *new_ns;
> +	int ret;
> +
> +	new_ns = kzalloc(sizeof(struct cgroup_namespace), GFP_KERNEL);
> +	if (!new_ns)
> +		return ERR_PTR(-ENOMEM);
> +	ret = ns_alloc_inum(&new_ns->ns);
> +	if (ret) {
> +		kfree(new_ns);
> +		return ERR_PTR(ret);
> +	}
> +	atomic_set(&new_ns->count, 1);
> +	new_ns->ns.ops = &cgroupns_operations;
> +	return new_ns;
> +}
> +
> +extern void put_css_set(struct css_set *cset);
> +extern  void get_css_set(struct css_set *cset);

Heh, idk, so we're moving cgroup_get/put() to cgroup.h while
redclaring css_set functions in this file?

> +struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
> +					struct user_namespace *user_ns,
> +					struct cgroup_namespace *old_ns)
> +{
> +	struct cgroup_namespace *new_ns = NULL;
> +	struct css_set *cgrps = NULL;
> +	int err;
> +
> +	BUG_ON(!old_ns);
> +
> +	if (!(flags & CLONE_NEWCGROUP))
> +		return get_cgroup_ns(old_ns);
> +
> +	/* Allow only sysadmin to create cgroup namespace. */
> +	err = -EPERM;
> +	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
> +		goto err_out;
> +
> +	cgrps = task_css_set(current);
> +	get_css_set(cgrps);
> +
> +	err = -ENOMEM;
> +	new_ns = alloc_cgroup_ns();
> +	if (!new_ns)
> +		goto err_out;
> +
> +	new_ns->user_ns = get_user_ns(user_ns);
> +	new_ns->root_cgrps = cgrps;

Let's name it ->root_cset.  The data structures involved are already
really confusing.  No need to add more to it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/8] cgroup: introduce cgroup namespaces
       [not found]     ` <1447703505-29672-6-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-24 16:49       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:49 UTC (permalink / raw)
  To: serge
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm, Serge Hallyn

Hello,

On Mon, Nov 16, 2015 at 01:51:42PM -0600, serge@hallyn.com wrote:
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 99096be..b3ce9d9 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -17,6 +17,9 @@
>  #include <linux/seq_file.h>
>  #include <linux/kernfs.h>
>  #include <linux/jump_label.h>
> +#include <linux/nsproxy.h>
> +#include <linux/types.h>
> +#include <linux/ns_common.h>
>  
>  #include <linux/cgroup-defs.h>
>  
> @@ -237,6 +240,10 @@ static inline bool cgroup_is_dead(const struct cgroup *cgrp)
>  	return !(cgrp->self.flags & CSS_ONLINE);
>  }
>  
> +static inline void css_get(struct cgroup_subsys_state *css);
> +static inline void css_put(struct cgroup_subsys_state *css);
> +static inline bool css_tryget(struct cgroup_subsys_state *css);

Heh, what's going on here?

> +
>  static inline void cgroup_get(struct cgroup *cgrp)
>  {
>  	WARN_ON_ONCE(cgroup_is_dead(cgrp));
> @@ -284,9 +291,11 @@ static inline void cgroup_put(struct cgroup *cgrp)
>  			;						\
>  		else
>  
> -/*
> - * Inline functions.
> - */
> +extern char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
> +		struct cgroup *cgrp, char *buf, size_t buflen);
> +
> +extern char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
> +		size_t buflen);

Please move them next to other prototypes and drop extern.

> diff --git a/include/linux/cgroup_namespace.h b/include/linux/cgroup_namespace.h
> new file mode 100644
> index 0000000..ed181c3
> --- /dev/null
> +++ b/include/linux/cgroup_namespace.h
> @@ -0,0 +1,46 @@
> +#ifndef _LINUX_CGROUP_NAMESPACE_H
> +#define _LINUX_CGROUP_NAMESPACE_H
> +
> +#include <linux/nsproxy.h>
> +#include <linux/cgroup.h>
> +#include <linux/types.h>
> +#include <linux/user_namespace.h>
> +
> +struct css_set;

Blank line here or linux/cgroup-defs.h can be included.

> +struct cgroup_namespace {
> +	atomic_t		count;
> +	struct ns_common	ns;
> +	struct user_namespace	*user_ns;
> +	struct css_set          *root_cgrps;
> +};
> +
> +extern struct cgroup_namespace init_cgroup_ns;
> +
> +static inline struct cgroup_namespace *get_cgroup_ns(
> +		struct cgroup_namespace *ns)

I personally prefer putting just the return type on a separate line
when things get too long.

static inline struct cgroup_namespace *
get_cgroup_ns(struct cgroup_namespace *ns)

> +{
> +	if (ns)
> +		atomic_inc(&ns->count);
> +	return ns;
> +}

Ugh... if the function doesn't do anything about the return type,
please make it a void function.  We tried the above style with kobj
and driver model and it ended up pretty horrible.

> +#ifdef CONFIG_CGROUPS
> +extern void free_cgroup_ns(struct cgroup_namespace *ns);
> +extern struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
> +					       struct user_namespace *user_ns,
> +					       struct cgroup_namespace *old_ns);

Please drop extern.

> +#else /* CONFIG_CGROUP */
> +static inline void free_cgroup_ns(struct cgroup_namespace *ns) { }
> +static inline struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
> +					struct user_namespace *user_ns,
> +					struct cgroup_namespace *old_ns)
> +{ return old_ns; }
> +#endif
> +
> +static inline void put_cgroup_ns(struct cgroup_namespace *ns)
> +{
> +	if (ns && atomic_dec_and_test(&ns->count))
> +		free_cgroup_ns(ns);
> +}
> +
> +#endif  /* _LINUX_CGROUP_NAMESPACE_H */

I don't know.  Does this warrant a separate file?

> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index e972259..1d696de 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -57,6 +57,8 @@
>  #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
>  #include <linux/kthread.h>
>  #include <linux/delay.h>
> +#include <linux/proc_ns.h>
> +#include <linux/cgroup_namespace.h>
>  
>  #include <linux/atomic.h>
>  
> @@ -290,6 +292,15 @@ static bool cgroup_on_dfl(const struct cgroup *cgrp)
>  {
>  	return cgrp->root == &cgrp_dfl_root;
>  }
> +struct cgroup_namespace init_cgroup_ns = {
> +	.count = {
> +		.counter = 1,
> +	},
> +	.user_ns = &init_user_ns,
> +	.ns.ops = &cgroupns_operations,
> +	.ns.inum = PROC_CGROUP_INIT_INO,
> +	.root_cgrps = &init_css_set,
> +};

Can you please tab align the assignments?

> @@ -2148,6 +2159,28 @@ static struct file_system_type cgroup_fs_type = {
>  	.kill_sb = cgroup_kill_sb,
>  };
>  
> +char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
> +						 struct cgroup *cgrp, char *buf,
> +						 size_t buflen)

Please align to the same column as the argument on the first line and
make the optional @ns the last argument.

> +{
> +	if (ns) {
> +		struct cgroup *root;
> +		root = cset_cgroup_from_root(ns->root_cgrps, cgrp->root);
> +		return kernfs_path_from_node(root->kn, cgrp->kn, buf,
> +					     buflen);
> +	} else {
> +		return kernfs_path(cgrp->kn, buf, buflen);
> +	}
> +}
> +
> +char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
> +					      size_t buflen)
> +{
> +	return cgroup_path_ns(current->nsproxy->cgroup_ns, cgrp, buf,
> +				      buflen);

Ditto with alignment.

> diff --git a/kernel/cgroup_namespace.c b/kernel/cgroup_namespace.c
> new file mode 100644
> index 0000000..ef20777
> --- /dev/null
> +++ b/kernel/cgroup_namespace.c
> @@ -0,0 +1,127 @@
> +/*
> + *  Copyright (C) 2014 Google Inc.
> + *
> + *  Author: Aditya Kali (adityakali@google.com)
> + *
> + *  This program is free software; you can redistribute it and/or modify it
> + *  under the terms of the GNU General Public License as published by the Free
> + *  Software Foundation, version 2 of the License.
> + */
> +
> +#include <linux/cgroup.h>
> +#include <linux/cgroup_namespace.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/nsproxy.h>
> +#include <linux/proc_ns.h>
> +
> +const struct proc_ns_operations cgroupns_operations;
> +
> +static struct cgroup_namespace *alloc_cgroup_ns(void)
> +{
> +	struct cgroup_namespace *new_ns;
> +	int ret;
> +
> +	new_ns = kzalloc(sizeof(struct cgroup_namespace), GFP_KERNEL);
> +	if (!new_ns)
> +		return ERR_PTR(-ENOMEM);
> +	ret = ns_alloc_inum(&new_ns->ns);
> +	if (ret) {
> +		kfree(new_ns);
> +		return ERR_PTR(ret);
> +	}
> +	atomic_set(&new_ns->count, 1);
> +	new_ns->ns.ops = &cgroupns_operations;
> +	return new_ns;
> +}
> +
> +extern void put_css_set(struct css_set *cset);
> +extern  void get_css_set(struct css_set *cset);

Heh, idk, so we're moving cgroup_get/put() to cgroup.h while
redclaring css_set functions in this file?

> +struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
> +					struct user_namespace *user_ns,
> +					struct cgroup_namespace *old_ns)
> +{
> +	struct cgroup_namespace *new_ns = NULL;
> +	struct css_set *cgrps = NULL;
> +	int err;
> +
> +	BUG_ON(!old_ns);
> +
> +	if (!(flags & CLONE_NEWCGROUP))
> +		return get_cgroup_ns(old_ns);
> +
> +	/* Allow only sysadmin to create cgroup namespace. */
> +	err = -EPERM;
> +	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
> +		goto err_out;
> +
> +	cgrps = task_css_set(current);
> +	get_css_set(cgrps);
> +
> +	err = -ENOMEM;
> +	new_ns = alloc_cgroup_ns();
> +	if (!new_ns)
> +		goto err_out;
> +
> +	new_ns->user_ns = get_user_ns(user_ns);
> +	new_ns->root_cgrps = cgrps;

Let's name it ->root_cset.  The data structures involved are already
really confusing.  No need to add more to it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 5/8] cgroup: introduce cgroup namespaces
@ 2015-11-24 16:49       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:49 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, Serge Hallyn

Hello,

On Mon, Nov 16, 2015 at 01:51:42PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 99096be..b3ce9d9 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -17,6 +17,9 @@
>  #include <linux/seq_file.h>
>  #include <linux/kernfs.h>
>  #include <linux/jump_label.h>
> +#include <linux/nsproxy.h>
> +#include <linux/types.h>
> +#include <linux/ns_common.h>
>  
>  #include <linux/cgroup-defs.h>
>  
> @@ -237,6 +240,10 @@ static inline bool cgroup_is_dead(const struct cgroup *cgrp)
>  	return !(cgrp->self.flags & CSS_ONLINE);
>  }
>  
> +static inline void css_get(struct cgroup_subsys_state *css);
> +static inline void css_put(struct cgroup_subsys_state *css);
> +static inline bool css_tryget(struct cgroup_subsys_state *css);

Heh, what's going on here?

> +
>  static inline void cgroup_get(struct cgroup *cgrp)
>  {
>  	WARN_ON_ONCE(cgroup_is_dead(cgrp));
> @@ -284,9 +291,11 @@ static inline void cgroup_put(struct cgroup *cgrp)
>  			;						\
>  		else
>  
> -/*
> - * Inline functions.
> - */
> +extern char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
> +		struct cgroup *cgrp, char *buf, size_t buflen);
> +
> +extern char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
> +		size_t buflen);

Please move them next to other prototypes and drop extern.

> diff --git a/include/linux/cgroup_namespace.h b/include/linux/cgroup_namespace.h
> new file mode 100644
> index 0000000..ed181c3
> --- /dev/null
> +++ b/include/linux/cgroup_namespace.h
> @@ -0,0 +1,46 @@
> +#ifndef _LINUX_CGROUP_NAMESPACE_H
> +#define _LINUX_CGROUP_NAMESPACE_H
> +
> +#include <linux/nsproxy.h>
> +#include <linux/cgroup.h>
> +#include <linux/types.h>
> +#include <linux/user_namespace.h>
> +
> +struct css_set;

Blank line here or linux/cgroup-defs.h can be included.

> +struct cgroup_namespace {
> +	atomic_t		count;
> +	struct ns_common	ns;
> +	struct user_namespace	*user_ns;
> +	struct css_set          *root_cgrps;
> +};
> +
> +extern struct cgroup_namespace init_cgroup_ns;
> +
> +static inline struct cgroup_namespace *get_cgroup_ns(
> +		struct cgroup_namespace *ns)

I personally prefer putting just the return type on a separate line
when things get too long.

static inline struct cgroup_namespace *
get_cgroup_ns(struct cgroup_namespace *ns)

> +{
> +	if (ns)
> +		atomic_inc(&ns->count);
> +	return ns;
> +}

Ugh... if the function doesn't do anything about the return type,
please make it a void function.  We tried the above style with kobj
and driver model and it ended up pretty horrible.

> +#ifdef CONFIG_CGROUPS
> +extern void free_cgroup_ns(struct cgroup_namespace *ns);
> +extern struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
> +					       struct user_namespace *user_ns,
> +					       struct cgroup_namespace *old_ns);

Please drop extern.

> +#else /* CONFIG_CGROUP */
> +static inline void free_cgroup_ns(struct cgroup_namespace *ns) { }
> +static inline struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
> +					struct user_namespace *user_ns,
> +					struct cgroup_namespace *old_ns)
> +{ return old_ns; }
> +#endif
> +
> +static inline void put_cgroup_ns(struct cgroup_namespace *ns)
> +{
> +	if (ns && atomic_dec_and_test(&ns->count))
> +		free_cgroup_ns(ns);
> +}
> +
> +#endif  /* _LINUX_CGROUP_NAMESPACE_H */

I don't know.  Does this warrant a separate file?

> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index e972259..1d696de 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -57,6 +57,8 @@
>  #include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
>  #include <linux/kthread.h>
>  #include <linux/delay.h>
> +#include <linux/proc_ns.h>
> +#include <linux/cgroup_namespace.h>
>  
>  #include <linux/atomic.h>
>  
> @@ -290,6 +292,15 @@ static bool cgroup_on_dfl(const struct cgroup *cgrp)
>  {
>  	return cgrp->root == &cgrp_dfl_root;
>  }
> +struct cgroup_namespace init_cgroup_ns = {
> +	.count = {
> +		.counter = 1,
> +	},
> +	.user_ns = &init_user_ns,
> +	.ns.ops = &cgroupns_operations,
> +	.ns.inum = PROC_CGROUP_INIT_INO,
> +	.root_cgrps = &init_css_set,
> +};

Can you please tab align the assignments?

> @@ -2148,6 +2159,28 @@ static struct file_system_type cgroup_fs_type = {
>  	.kill_sb = cgroup_kill_sb,
>  };
>  
> +char * __must_check cgroup_path_ns(struct cgroup_namespace *ns,
> +						 struct cgroup *cgrp, char *buf,
> +						 size_t buflen)

Please align to the same column as the argument on the first line and
make the optional @ns the last argument.

> +{
> +	if (ns) {
> +		struct cgroup *root;
> +		root = cset_cgroup_from_root(ns->root_cgrps, cgrp->root);
> +		return kernfs_path_from_node(root->kn, cgrp->kn, buf,
> +					     buflen);
> +	} else {
> +		return kernfs_path(cgrp->kn, buf, buflen);
> +	}
> +}
> +
> +char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
> +					      size_t buflen)
> +{
> +	return cgroup_path_ns(current->nsproxy->cgroup_ns, cgrp, buf,
> +				      buflen);

Ditto with alignment.

> diff --git a/kernel/cgroup_namespace.c b/kernel/cgroup_namespace.c
> new file mode 100644
> index 0000000..ef20777
> --- /dev/null
> +++ b/kernel/cgroup_namespace.c
> @@ -0,0 +1,127 @@
> +/*
> + *  Copyright (C) 2014 Google Inc.
> + *
> + *  Author: Aditya Kali (adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org)
> + *
> + *  This program is free software; you can redistribute it and/or modify it
> + *  under the terms of the GNU General Public License as published by the Free
> + *  Software Foundation, version 2 of the License.
> + */
> +
> +#include <linux/cgroup.h>
> +#include <linux/cgroup_namespace.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/nsproxy.h>
> +#include <linux/proc_ns.h>
> +
> +const struct proc_ns_operations cgroupns_operations;
> +
> +static struct cgroup_namespace *alloc_cgroup_ns(void)
> +{
> +	struct cgroup_namespace *new_ns;
> +	int ret;
> +
> +	new_ns = kzalloc(sizeof(struct cgroup_namespace), GFP_KERNEL);
> +	if (!new_ns)
> +		return ERR_PTR(-ENOMEM);
> +	ret = ns_alloc_inum(&new_ns->ns);
> +	if (ret) {
> +		kfree(new_ns);
> +		return ERR_PTR(ret);
> +	}
> +	atomic_set(&new_ns->count, 1);
> +	new_ns->ns.ops = &cgroupns_operations;
> +	return new_ns;
> +}
> +
> +extern void put_css_set(struct css_set *cset);
> +extern  void get_css_set(struct css_set *cset);

Heh, idk, so we're moving cgroup_get/put() to cgroup.h while
redclaring css_set functions in this file?

> +struct cgroup_namespace *copy_cgroup_ns(unsigned long flags,
> +					struct user_namespace *user_ns,
> +					struct cgroup_namespace *old_ns)
> +{
> +	struct cgroup_namespace *new_ns = NULL;
> +	struct css_set *cgrps = NULL;
> +	int err;
> +
> +	BUG_ON(!old_ns);
> +
> +	if (!(flags & CLONE_NEWCGROUP))
> +		return get_cgroup_ns(old_ns);
> +
> +	/* Allow only sysadmin to create cgroup namespace. */
> +	err = -EPERM;
> +	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
> +		goto err_out;
> +
> +	cgrps = task_css_set(current);
> +	get_css_set(cgrps);
> +
> +	err = -ENOMEM;
> +	new_ns = alloc_cgroup_ns();
> +	if (!new_ns)
> +		goto err_out;
> +
> +	new_ns->user_ns = get_user_ns(user_ns);
> +	new_ns->root_cgrps = cgrps;

Let's name it ->root_cset.  The data structures involved are already
really confusing.  No need to add more to it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 6/8] cgroup: cgroup namespace setns support
       [not found]     ` <1447703505-29672-7-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-24 16:52       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:52 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Mon, Nov 16, 2015 at 01:51:43PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> @@ -85,10 +85,27 @@ err_out:
>  	return ERR_PTR(err);
>  }
>  
> -static int cgroupns_install(struct nsproxy *nsproxy, void *ns)
> +static inline struct cgroup_namespace *to_cg_ns(struct ns_common *ns) {
> +	return container_of(ns, struct cgroup_namespace, ns);
> +}

Heh, what's up with the formatting?  Please update the patches to
conform to Documentation/CodingStyle in general.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 6/8] cgroup: cgroup namespace setns support
       [not found]     ` <1447703505-29672-7-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-24 16:52       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:52 UTC (permalink / raw)
  To: serge
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

On Mon, Nov 16, 2015 at 01:51:43PM -0600, serge@hallyn.com wrote:
> @@ -85,10 +85,27 @@ err_out:
>  	return ERR_PTR(err);
>  }
>  
> -static int cgroupns_install(struct nsproxy *nsproxy, void *ns)
> +static inline struct cgroup_namespace *to_cg_ns(struct ns_common *ns) {
> +	return container_of(ns, struct cgroup_namespace, ns);
> +}

Heh, what's up with the formatting?  Please update the patches to
conform to Documentation/CodingStyle in general.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 6/8] cgroup: cgroup namespace setns support
@ 2015-11-24 16:52       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:52 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Mon, Nov 16, 2015 at 01:51:43PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> @@ -85,10 +85,27 @@ err_out:
>  	return ERR_PTR(err);
>  }
>  
> -static int cgroupns_install(struct nsproxy *nsproxy, void *ns)
> +static inline struct cgroup_namespace *to_cg_ns(struct ns_common *ns) {
> +	return container_of(ns, struct cgroup_namespace, ns);
> +}

Heh, what's up with the formatting?  Please update the patches to
conform to Documentation/CodingStyle in general.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 3/8] cgroup: add function to get task's cgroup
       [not found]         ` <20151124162728.GN17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-24 16:54           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:54 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Tue, Nov 24, 2015 at 11:27:28AM -0500, Tejun Heo wrote:
> > +struct cgroup *get_task_cgroup(struct task_struct *task)

Umm... is this function even used?

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 3/8] cgroup: add function to get task's cgroup
       [not found]         ` <20151124162728.GN17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-24 16:54           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:54 UTC (permalink / raw)
  To: serge
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm, Li Zefan

On Tue, Nov 24, 2015 at 11:27:28AM -0500, Tejun Heo wrote:
> > +struct cgroup *get_task_cgroup(struct task_struct *task)

Umm... is this function even used?

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 3/8] cgroup: add function to get task's cgroup
@ 2015-11-24 16:54           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 16:54 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Tue, Nov 24, 2015 at 11:27:28AM -0500, Tejun Heo wrote:
> > +struct cgroup *get_task_cgroup(struct task_struct *task)

Umm... is this function even used?

-- 
tejun
_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]     ` <1447703505-29672-8-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-24 17:16       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 17:16 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello,

On Mon, Nov 16, 2015 at 01:51:44PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> +struct dentry *kernfs_obtain_root(struct super_block *sb,
> +				  struct kernfs_node *kn)
> +{
> +	struct dentry *dentry;
> +	struct inode *inode;
> +
> +	BUG_ON(sb->s_op != &kernfs_sops);
> +
> +	/* inode for the given kernfs_node should already exist. */
> +	inode = ilookup(sb, kn->ino);
> +	if (!inode) {
> +		pr_debug("kernfs: could not get inode for '");
> +		pr_cont_kernfs_path(kn);
> +		pr_cont("'.\n");
> +		return ERR_PTR(-EINVAL);
> +	}

Hmmm... but inode might not have been instantiated yet.  Why not use
kernfs_get_inode()?

> +	/* instantiate and link root dentry */
> +	dentry = d_obtain_root(inode);
> +	if (!dentry) {
> +		pr_debug("kernfs: could not get dentry for '");
> +		pr_cont_kernfs_path(kn);
> +		pr_cont("'.\n");
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	/* If this is a new dentry, set it up. We need kernfs_mutex because this
> +	 * may be called by callers other than kernfs_fill_super. */

Formatting.

> +	mutex_lock(&kernfs_mutex);
> +	if (!dentry->d_fsdata) {
> +		kernfs_get(kn);
> +		dentry->d_fsdata = kn;
> +	} else {
> +		WARN_ON(dentry->d_fsdata != kn);
> +	}
> +	mutex_unlock(&kernfs_mutex);
> +
> +	return dentry;
> +}

Wouldn't it be simpler to walk dentry from kernfs root than
duplicating dentry instantiation?

> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 1d696de..0a3e893 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2112,11 +2120,31 @@ out_free:
>  	kfree(opts.release_agent);
>  	kfree(opts.name);
>  
> -	if (ret)
> +	if (ret) {
> +		put_cgroup_ns(ns);
>  		return ERR_PTR(ret);
> +	}
>  
>  	dentry = kernfs_mount(fs_type, flags, root->kf_root,
>  				CGROUP_SUPER_MAGIC, &new_sb);
> +
> +	if (!IS_ERR(dentry)) {
> +		/* In non-init cgroup namespace, instead of root cgroup's
> +		 * dentry, we return the dentry corresponding to the
> +		 * cgroupns->root_cgrp.
> +		 */

Formatting.

> +		if (ns != &init_cgroup_ns) {
> +			struct dentry *nsdentry;
> +			struct cgroup *cgrp;
> +
> +			cgrp = cset_cgroup_from_root(ns->root_cgrps, root);
> +			nsdentry = kernfs_obtain_root(dentry->d_sb,
> +				cgrp->kn);
> +			dput(dentry);
> +			dentry = nsdentry;
> +		}
> +	}

So, this would effectively allow namespace mounts to claim controllers
which aren't configured otherwise which doesn't seem like a good idea.
I think the right thing to do for namespace mounts is to always
require an existing superblock.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]     ` <1447703505-29672-8-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-24 17:16       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 17:16 UTC (permalink / raw)
  To: serge
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Hello,

On Mon, Nov 16, 2015 at 01:51:44PM -0600, serge@hallyn.com wrote:
> +struct dentry *kernfs_obtain_root(struct super_block *sb,
> +				  struct kernfs_node *kn)
> +{
> +	struct dentry *dentry;
> +	struct inode *inode;
> +
> +	BUG_ON(sb->s_op != &kernfs_sops);
> +
> +	/* inode for the given kernfs_node should already exist. */
> +	inode = ilookup(sb, kn->ino);
> +	if (!inode) {
> +		pr_debug("kernfs: could not get inode for '");
> +		pr_cont_kernfs_path(kn);
> +		pr_cont("'.\n");
> +		return ERR_PTR(-EINVAL);
> +	}

Hmmm... but inode might not have been instantiated yet.  Why not use
kernfs_get_inode()?

> +	/* instantiate and link root dentry */
> +	dentry = d_obtain_root(inode);
> +	if (!dentry) {
> +		pr_debug("kernfs: could not get dentry for '");
> +		pr_cont_kernfs_path(kn);
> +		pr_cont("'.\n");
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	/* If this is a new dentry, set it up. We need kernfs_mutex because this
> +	 * may be called by callers other than kernfs_fill_super. */

Formatting.

> +	mutex_lock(&kernfs_mutex);
> +	if (!dentry->d_fsdata) {
> +		kernfs_get(kn);
> +		dentry->d_fsdata = kn;
> +	} else {
> +		WARN_ON(dentry->d_fsdata != kn);
> +	}
> +	mutex_unlock(&kernfs_mutex);
> +
> +	return dentry;
> +}

Wouldn't it be simpler to walk dentry from kernfs root than
duplicating dentry instantiation?

> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 1d696de..0a3e893 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2112,11 +2120,31 @@ out_free:
>  	kfree(opts.release_agent);
>  	kfree(opts.name);
>  
> -	if (ret)
> +	if (ret) {
> +		put_cgroup_ns(ns);
>  		return ERR_PTR(ret);
> +	}
>  
>  	dentry = kernfs_mount(fs_type, flags, root->kf_root,
>  				CGROUP_SUPER_MAGIC, &new_sb);
> +
> +	if (!IS_ERR(dentry)) {
> +		/* In non-init cgroup namespace, instead of root cgroup's
> +		 * dentry, we return the dentry corresponding to the
> +		 * cgroupns->root_cgrp.
> +		 */

Formatting.

> +		if (ns != &init_cgroup_ns) {
> +			struct dentry *nsdentry;
> +			struct cgroup *cgrp;
> +
> +			cgrp = cset_cgroup_from_root(ns->root_cgrps, root);
> +			nsdentry = kernfs_obtain_root(dentry->d_sb,
> +				cgrp->kn);
> +			dput(dentry);
> +			dentry = nsdentry;
> +		}
> +	}

So, this would effectively allow namespace mounts to claim controllers
which aren't configured otherwise which doesn't seem like a good idea.
I think the right thing to do for namespace mounts is to always
require an existing superblock.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-11-24 17:16       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 17:16 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Hello,

On Mon, Nov 16, 2015 at 01:51:44PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> +struct dentry *kernfs_obtain_root(struct super_block *sb,
> +				  struct kernfs_node *kn)
> +{
> +	struct dentry *dentry;
> +	struct inode *inode;
> +
> +	BUG_ON(sb->s_op != &kernfs_sops);
> +
> +	/* inode for the given kernfs_node should already exist. */
> +	inode = ilookup(sb, kn->ino);
> +	if (!inode) {
> +		pr_debug("kernfs: could not get inode for '");
> +		pr_cont_kernfs_path(kn);
> +		pr_cont("'.\n");
> +		return ERR_PTR(-EINVAL);
> +	}

Hmmm... but inode might not have been instantiated yet.  Why not use
kernfs_get_inode()?

> +	/* instantiate and link root dentry */
> +	dentry = d_obtain_root(inode);
> +	if (!dentry) {
> +		pr_debug("kernfs: could not get dentry for '");
> +		pr_cont_kernfs_path(kn);
> +		pr_cont("'.\n");
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	/* If this is a new dentry, set it up. We need kernfs_mutex because this
> +	 * may be called by callers other than kernfs_fill_super. */

Formatting.

> +	mutex_lock(&kernfs_mutex);
> +	if (!dentry->d_fsdata) {
> +		kernfs_get(kn);
> +		dentry->d_fsdata = kn;
> +	} else {
> +		WARN_ON(dentry->d_fsdata != kn);
> +	}
> +	mutex_unlock(&kernfs_mutex);
> +
> +	return dentry;
> +}

Wouldn't it be simpler to walk dentry from kernfs root than
duplicating dentry instantiation?

> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 1d696de..0a3e893 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2112,11 +2120,31 @@ out_free:
>  	kfree(opts.release_agent);
>  	kfree(opts.name);
>  
> -	if (ret)
> +	if (ret) {
> +		put_cgroup_ns(ns);
>  		return ERR_PTR(ret);
> +	}
>  
>  	dentry = kernfs_mount(fs_type, flags, root->kf_root,
>  				CGROUP_SUPER_MAGIC, &new_sb);
> +
> +	if (!IS_ERR(dentry)) {
> +		/* In non-init cgroup namespace, instead of root cgroup's
> +		 * dentry, we return the dentry corresponding to the
> +		 * cgroupns->root_cgrp.
> +		 */

Formatting.

> +		if (ns != &init_cgroup_ns) {
> +			struct dentry *nsdentry;
> +			struct cgroup *cgrp;
> +
> +			cgrp = cset_cgroup_from_root(ns->root_cgrps, root);
> +			nsdentry = kernfs_obtain_root(dentry->d_sb,
> +				cgrp->kn);
> +			dput(dentry);
> +			dentry = nsdentry;
> +		}
> +	}

So, this would effectively allow namespace mounts to claim controllers
which aren't configured otherwise which doesn't seem like a good idea.
I think the right thing to do for namespace mounts is to always
require an existing superblock.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 8/8] cgroup: Add documentation for cgroup namespaces
       [not found]     ` <1447703505-29672-9-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-24 17:16       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 17:16 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Mon, Nov 16, 2015 at 01:51:45PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> 
> Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> ---
>  Documentation/cgroups/namespace.txt |  142 +++++++++++++++++++++++++++++++++++

Please refresh on top of cgroup/for-4.5 branch.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 8/8] cgroup: Add documentation for cgroup namespaces
       [not found]     ` <1447703505-29672-9-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
@ 2015-11-24 17:16       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 17:16 UTC (permalink / raw)
  To: serge
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm, Serge Hallyn

On Mon, Nov 16, 2015 at 01:51:45PM -0600, serge@hallyn.com wrote:
> From: Aditya Kali <adityakali@google.com>
> 
> Signed-off-by: Aditya Kali <adityakali@google.com>
> Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
> ---
>  Documentation/cgroups/namespace.txt |  142 +++++++++++++++++++++++++++++++++++

Please refresh on top of cgroup/for-4.5 branch.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 8/8] cgroup: Add documentation for cgroup namespaces
@ 2015-11-24 17:16       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-24 17:16 UTC (permalink / raw)
  To: serge-A9i7LUbDfNHQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, Serge Hallyn

On Mon, Nov 16, 2015 at 01:51:45PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> 
> Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> ---
>  Documentation/cgroups/namespace.txt |  142 +++++++++++++++++++++++++++++++++++

Please refresh on top of cgroup/for-4.5 branch.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]         ` <20151124161709.GM17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-24 17:43           ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-24 17:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Tue, Nov 24, 2015 at 11:17:09AM -0500, Tejun Heo wrote:
> Oops, also please cc Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
> on kernfs changes.

Will do.  Thank you for all the feedback.  I'll send out a new set
when I get it all addressed.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]         ` <20151124161709.GM17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-24 17:43           ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-24 17:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

On Tue, Nov 24, 2015 at 11:17:09AM -0500, Tejun Heo wrote:
> Oops, also please cc Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> on kernfs changes.

Will do.  Thank you for all the feedback.  I'll send out a new set
when I get it all addressed.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-11-24 17:43           ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-24 17:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Tue, Nov 24, 2015 at 11:17:09AM -0500, Tejun Heo wrote:
> Oops, also please cc Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
> on kernfs changes.

Will do.  Thank you for all the feedback.  I'll send out a new set
when I get it all addressed.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/8] cgroup: export cgroup_get() and cgroup_put()
       [not found]       ` <20151124163056.GO17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-24 22:35         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-24 22:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Tue, Nov 24, 2015 at 11:30:56AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Mon, Nov 16, 2015 at 01:51:41PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> > From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > 
> > move cgroup_get() and cgroup_put() into cgroup.h so that
> > they can be called from other places.
> > 
> > Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> > ---
> >  include/linux/cgroup.h |   21 +++++++++++++++++++++
> >  kernel/cgroup.c        |   22 ----------------------
> >  2 files changed, 21 insertions(+), 22 deletions(-)
> > 
> > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> > index 29f0b02..99096be 100644
> > --- a/include/linux/cgroup.h
> > +++ b/include/linux/cgroup.h
> > @@ -231,6 +231,27 @@ void css_task_iter_end(struct css_task_iter *it);
> >  #define css_for_each_descendant_post(pos, css)				\
> >  	for ((pos) = css_next_descendant_post(NULL, (css)); (pos);	\
> >  	     (pos) = css_next_descendant_post((pos), (css)))
> 
> Please insert a blank line here.
> 
> > +/* convenient tests for these bits */
> 
> And I don't think the comment makes sense here.
> 
> > +static inline bool cgroup_is_dead(const struct cgroup *cgrp)
> > +{
> > +	return !(cgrp->self.flags & CSS_ONLINE);
> > +}
> > +
> > +static inline void cgroup_get(struct cgroup *cgrp)
> > +{
> > +	WARN_ON_ONCE(cgroup_is_dead(cgrp));
> > +	css_get(&cgrp->self);
> > +}
> > +
> > +static inline bool cgroup_tryget(struct cgroup *cgrp)
> > +{
> > +	return css_tryget(&cgrp->self);
> > +}
> > +
> > +static inline void cgroup_put(struct cgroup *cgrp)
> > +{
> > +	css_put(&cgrp->self);
> > +}
> 
> So these are being exposed for cgroup NS support.  Hmmm... idk, does
> cgroup NS support needs to be in a spearate file?  The added amount
> isn't that big.  If we split cgroup.c, I'd much prefer to have
> cgroup-internal.h for internally shared stuff than pushing them out to
> cgroup.h.

Yeah, I think it makes more sense to merge them.

thanks,
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/8] cgroup: export cgroup_get() and cgroup_put()
       [not found]       ` <20151124163056.GO17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-24 22:35         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-24 22:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

On Tue, Nov 24, 2015 at 11:30:56AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Mon, Nov 16, 2015 at 01:51:41PM -0600, serge@hallyn.com wrote:
> > From: Aditya Kali <adityakali@google.com>
> > 
> > move cgroup_get() and cgroup_put() into cgroup.h so that
> > they can be called from other places.
> > 
> > Signed-off-by: Aditya Kali <adityakali@google.com>
> > Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
> > ---
> >  include/linux/cgroup.h |   21 +++++++++++++++++++++
> >  kernel/cgroup.c        |   22 ----------------------
> >  2 files changed, 21 insertions(+), 22 deletions(-)
> > 
> > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> > index 29f0b02..99096be 100644
> > --- a/include/linux/cgroup.h
> > +++ b/include/linux/cgroup.h
> > @@ -231,6 +231,27 @@ void css_task_iter_end(struct css_task_iter *it);
> >  #define css_for_each_descendant_post(pos, css)				\
> >  	for ((pos) = css_next_descendant_post(NULL, (css)); (pos);	\
> >  	     (pos) = css_next_descendant_post((pos), (css)))
> 
> Please insert a blank line here.
> 
> > +/* convenient tests for these bits */
> 
> And I don't think the comment makes sense here.
> 
> > +static inline bool cgroup_is_dead(const struct cgroup *cgrp)
> > +{
> > +	return !(cgrp->self.flags & CSS_ONLINE);
> > +}
> > +
> > +static inline void cgroup_get(struct cgroup *cgrp)
> > +{
> > +	WARN_ON_ONCE(cgroup_is_dead(cgrp));
> > +	css_get(&cgrp->self);
> > +}
> > +
> > +static inline bool cgroup_tryget(struct cgroup *cgrp)
> > +{
> > +	return css_tryget(&cgrp->self);
> > +}
> > +
> > +static inline void cgroup_put(struct cgroup *cgrp)
> > +{
> > +	css_put(&cgrp->self);
> > +}
> 
> So these are being exposed for cgroup NS support.  Hmmm... idk, does
> cgroup NS support needs to be in a spearate file?  The added amount
> isn't that big.  If we split cgroup.c, I'd much prefer to have
> cgroup-internal.h for internally shared stuff than pushing them out to
> cgroup.h.

Yeah, I think it makes more sense to merge them.

thanks,
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 4/8] cgroup: export cgroup_get() and cgroup_put()
@ 2015-11-24 22:35         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-24 22:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Tue, Nov 24, 2015 at 11:30:56AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Mon, Nov 16, 2015 at 01:51:41PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> > From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > 
> > move cgroup_get() and cgroup_put() into cgroup.h so that
> > they can be called from other places.
> > 
> > Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> > ---
> >  include/linux/cgroup.h |   21 +++++++++++++++++++++
> >  kernel/cgroup.c        |   22 ----------------------
> >  2 files changed, 21 insertions(+), 22 deletions(-)
> > 
> > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> > index 29f0b02..99096be 100644
> > --- a/include/linux/cgroup.h
> > +++ b/include/linux/cgroup.h
> > @@ -231,6 +231,27 @@ void css_task_iter_end(struct css_task_iter *it);
> >  #define css_for_each_descendant_post(pos, css)				\
> >  	for ((pos) = css_next_descendant_post(NULL, (css)); (pos);	\
> >  	     (pos) = css_next_descendant_post((pos), (css)))
> 
> Please insert a blank line here.
> 
> > +/* convenient tests for these bits */
> 
> And I don't think the comment makes sense here.
> 
> > +static inline bool cgroup_is_dead(const struct cgroup *cgrp)
> > +{
> > +	return !(cgrp->self.flags & CSS_ONLINE);
> > +}
> > +
> > +static inline void cgroup_get(struct cgroup *cgrp)
> > +{
> > +	WARN_ON_ONCE(cgroup_is_dead(cgrp));
> > +	css_get(&cgrp->self);
> > +}
> > +
> > +static inline bool cgroup_tryget(struct cgroup *cgrp)
> > +{
> > +	return css_tryget(&cgrp->self);
> > +}
> > +
> > +static inline void cgroup_put(struct cgroup *cgrp)
> > +{
> > +	css_put(&cgrp->self);
> > +}
> 
> So these are being exposed for cgroup NS support.  Hmmm... idk, does
> cgroup NS support needs to be in a spearate file?  The added amount
> isn't that big.  If we split cgroup.c, I'd much prefer to have
> cgroup-internal.h for internally shared stuff than pushing them out to
> cgroup.h.

Yeah, I think it makes more sense to merge them.

thanks,
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]       ` <20151124171610.GS17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-25  6:01         ` Serge E. Hallyn
  2015-11-27  5:17         ` Serge E. Hallyn
  1 sibling, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-25  6:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Tue, Nov 24, 2015 at 12:16:10PM -0500, Tejun Heo wrote:
...
> > +		if (ns != &init_cgroup_ns) {
> > +			struct dentry *nsdentry;
> > +			struct cgroup *cgrp;
> > +
> > +			cgrp = cset_cgroup_from_root(ns->root_cgrps, root);
> > +			nsdentry = kernfs_obtain_root(dentry->d_sb,
> > +				cgrp->kn);
> > +			dput(dentry);
> > +			dentry = nsdentry;
> > +		}
> > +	}
> 
> So, this would effectively allow namespace mounts to claim controllers
> which aren't configured otherwise which doesn't seem like a good idea.
> I think the right thing to do for namespace mounts is to always
> require an existing superblock.

that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
(which was sent inline earlier in this thread in response to Eric)  Does
that look sufficient?

thanks,
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
  2015-11-24 17:16       ` Tejun Heo
  (?)
@ 2015-11-25  6:01       ` Serge E. Hallyn
       [not found]         ` <20151125060156.GA678-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  2015-11-25 19:10           ` Tejun Heo
  -1 siblings, 2 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-25  6:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

On Tue, Nov 24, 2015 at 12:16:10PM -0500, Tejun Heo wrote:
...
> > +		if (ns != &init_cgroup_ns) {
> > +			struct dentry *nsdentry;
> > +			struct cgroup *cgrp;
> > +
> > +			cgrp = cset_cgroup_from_root(ns->root_cgrps, root);
> > +			nsdentry = kernfs_obtain_root(dentry->d_sb,
> > +				cgrp->kn);
> > +			dput(dentry);
> > +			dentry = nsdentry;
> > +		}
> > +	}
> 
> So, this would effectively allow namespace mounts to claim controllers
> which aren't configured otherwise which doesn't seem like a good idea.
> I think the right thing to do for namespace mounts is to always
> require an existing superblock.

that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
(which was sent inline earlier in this thread in response to Eric)  Does
that look sufficient?

thanks,
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]         ` <20151125060156.GA678-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-25 19:10           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-25 19:10 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello, Serge.

On Wed, Nov 25, 2015 at 12:01:56AM -0600, Serge E. Hallyn wrote:
> that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
> (which was sent inline earlier in this thread in response to Eric)  Does
> that look sufficient?

Hmmm... but that wouldn't work with non-root and user ns.  I think
what's necessary is ensuring that namespace scoped mount never creates
a new hierarchy but always reuses an existing one.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]         ` <20151125060156.GA678-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-25 19:10           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-25 19:10 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Hello, Serge.

On Wed, Nov 25, 2015 at 12:01:56AM -0600, Serge E. Hallyn wrote:
> that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
> (which was sent inline earlier in this thread in response to Eric)  Does
> that look sufficient?

Hmmm... but that wouldn't work with non-root and user ns.  I think
what's necessary is ensuring that namespace scoped mount never creates
a new hierarchy but always reuses an existing one.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-11-25 19:10           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-25 19:10 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Hello, Serge.

On Wed, Nov 25, 2015 at 12:01:56AM -0600, Serge E. Hallyn wrote:
> that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
> (which was sent inline earlier in this thread in response to Eric)  Does
> that look sufficient?

Hmmm... but that wouldn't work with non-root and user ns.  I think
what's necessary is ensuring that namespace scoped mount never creates
a new hierarchy but always reuses an existing one.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]           ` <20151125191041.GB14240-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2015-11-25 19:55             ` Serge Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge Hallyn @ 2015-11-25 19:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org):
> Hello, Serge.
> 
> On Wed, Nov 25, 2015 at 12:01:56AM -0600, Serge E. Hallyn wrote:
> > that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
> > (which was sent inline earlier in this thread in response to Eric)  Does
> > that look sufficient?
> 
> Hmmm... but that wouldn't work with non-root and user ns.  I think

Are you sure?  IIUC that code block is only hit when we didn't find
an already-mounted subsystem.

> what's necessary is ensuring that namespace scoped mount never creates
> a new hierarchy but always reuses an existing one.
> 
> Thanks.
> 
> -- 
> tejun
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]           ` <20151125191041.GB14240-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
@ 2015-11-25 19:55             ` Serge Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge Hallyn @ 2015-11-25 19:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api, containers, linux-kernel, ebiederm, lxc-devel, cgroups, akpm

Quoting Tejun Heo (tj@kernel.org):
> Hello, Serge.
> 
> On Wed, Nov 25, 2015 at 12:01:56AM -0600, Serge E. Hallyn wrote:
> > that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
> > (which was sent inline earlier in this thread in response to Eric)  Does
> > that look sufficient?
> 
> Hmmm... but that wouldn't work with non-root and user ns.  I think

Are you sure?  IIUC that code block is only hit when we didn't find
an already-mounted subsystem.

> what's necessary is ensuring that namespace scoped mount never creates
> a new hierarchy but always reuses an existing one.
> 
> Thanks.
> 
> -- 
> tejun
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-11-25 19:55             ` Serge Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge Hallyn @ 2015-11-25 19:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org):
> Hello, Serge.
> 
> On Wed, Nov 25, 2015 at 12:01:56AM -0600, Serge E. Hallyn wrote:
> > that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
> > (which was sent inline earlier in this thread in response to Eric)  Does
> > that look sufficient?
> 
> Hmmm... but that wouldn't work with non-root and user ns.  I think

Are you sure?  IIUC that code block is only hit when we didn't find
an already-mounted subsystem.

> what's necessary is ensuring that namespace scoped mount never creates
> a new hierarchy but always reuses an existing one.
> 
> Thanks.
> 
> -- 
> tejun
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
  2015-11-25 19:55             ` Serge Hallyn
  (?)
  (?)
@ 2015-11-25 19:57             ` Tejun Heo
  -1 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-25 19:57 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Nov 25, 2015 at 07:55:53PM +0000, Serge Hallyn wrote:
> Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org):
> > Hello, Serge.
> > 
> > On Wed, Nov 25, 2015 at 12:01:56AM -0600, Serge E. Hallyn wrote:
> > > that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
> > > (which was sent inline earlier in this thread in response to Eric)  Does
> > > that look sufficient?
> > 
> > Hmmm... but that wouldn't work with non-root and user ns.  I think
> 
> Are you sure?  IIUC that code block is only hit when we didn't find
> an already-mounted subsystem.

Heh, you're right.  This should work.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
  2015-11-25 19:55             ` Serge Hallyn
@ 2015-11-25 19:57               ` Tejun Heo
  -1 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-25 19:57 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-api, containers, linux-kernel, ebiederm, lxc-devel, cgroups, akpm

On Wed, Nov 25, 2015 at 07:55:53PM +0000, Serge Hallyn wrote:
> Quoting Tejun Heo (tj@kernel.org):
> > Hello, Serge.
> > 
> > On Wed, Nov 25, 2015 at 12:01:56AM -0600, Serge E. Hallyn wrote:
> > > that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
> > > (which was sent inline earlier in this thread in response to Eric)  Does
> > > that look sufficient?
> > 
> > Hmmm... but that wouldn't work with non-root and user ns.  I think
> 
> Are you sure?  IIUC that code block is only hit when we didn't find
> an already-mounted subsystem.

Heh, you're right.  This should work.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-11-25 19:57               ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-25 19:57 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Nov 25, 2015 at 07:55:53PM +0000, Serge Hallyn wrote:
> Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org):
> > Hello, Serge.
> > 
> > On Wed, Nov 25, 2015 at 12:01:56AM -0600, Serge E. Hallyn wrote:
> > > that was my goal with https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/commit/?h=cgroupns.v4&id=8eb75d2bb24df59e262f050dce567d2332adc5f3
> > > (which was sent inline earlier in this thread in response to Eric)  Does
> > > that look sufficient?
> > 
> > Hmmm... but that wouldn't work with non-root and user ns.  I think
> 
> Are you sure?  IIUC that code block is only hit when we didn't find
> an already-mounted subsystem.

Heh, you're right.  This should work.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]       ` <20151124171610.GS17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
  2015-11-25  6:01         ` Serge E. Hallyn
@ 2015-11-27  5:17         ` Serge E. Hallyn
  1 sibling, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-27  5:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Tue, Nov 24, 2015 at 12:16:10PM -0500, Tejun Heo wrote:
> Hello,
> 
> On Mon, Nov 16, 2015 at 01:51:44PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> > +struct dentry *kernfs_obtain_root(struct super_block *sb,
> > +				  struct kernfs_node *kn)
> > +{
> > +	struct dentry *dentry;
> > +	struct inode *inode;
> > +
> > +	BUG_ON(sb->s_op != &kernfs_sops);
> > +
> > +	/* inode for the given kernfs_node should already exist. */
> > +	inode = ilookup(sb, kn->ino);
> > +	if (!inode) {
> > +		pr_debug("kernfs: could not get inode for '");
> > +		pr_cont_kernfs_path(kn);
> > +		pr_cont("'.\n");
> > +		return ERR_PTR(-EINVAL);
> > +	}
> 
> Hmmm... but inode might not have been instantiated yet.  Why not use
> kernfs_get_inode()?
> 
> > +	/* instantiate and link root dentry */
> > +	dentry = d_obtain_root(inode);
> > +	if (!dentry) {
> > +		pr_debug("kernfs: could not get dentry for '");
> > +		pr_cont_kernfs_path(kn);
> > +		pr_cont("'.\n");
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	/* If this is a new dentry, set it up. We need kernfs_mutex because this
> > +	 * may be called by callers other than kernfs_fill_super. */
> 
> Formatting.
> 
> > +	mutex_lock(&kernfs_mutex);
> > +	if (!dentry->d_fsdata) {
> > +		kernfs_get(kn);
> > +		dentry->d_fsdata = kn;
> > +	} else {
> > +		WARN_ON(dentry->d_fsdata != kn);
> > +	}
> > +	mutex_unlock(&kernfs_mutex);
> > +
> > +	return dentry;
> > +}
> 
> Wouldn't it be simpler to walk dentry from kernfs root than
> duplicating dentry instantiation?

Sorry I don't think I'm following.  Are you suggesting walking the
kn->parent chain backward and doing d_lookup() at each point starting
with sb->s_root?

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]       ` <20151124171610.GS17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-27  5:17         ` Serge E. Hallyn
  2015-11-27  5:17         ` Serge E. Hallyn
  1 sibling, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-27  5:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

On Tue, Nov 24, 2015 at 12:16:10PM -0500, Tejun Heo wrote:
> Hello,
> 
> On Mon, Nov 16, 2015 at 01:51:44PM -0600, serge@hallyn.com wrote:
> > +struct dentry *kernfs_obtain_root(struct super_block *sb,
> > +				  struct kernfs_node *kn)
> > +{
> > +	struct dentry *dentry;
> > +	struct inode *inode;
> > +
> > +	BUG_ON(sb->s_op != &kernfs_sops);
> > +
> > +	/* inode for the given kernfs_node should already exist. */
> > +	inode = ilookup(sb, kn->ino);
> > +	if (!inode) {
> > +		pr_debug("kernfs: could not get inode for '");
> > +		pr_cont_kernfs_path(kn);
> > +		pr_cont("'.\n");
> > +		return ERR_PTR(-EINVAL);
> > +	}
> 
> Hmmm... but inode might not have been instantiated yet.  Why not use
> kernfs_get_inode()?
> 
> > +	/* instantiate and link root dentry */
> > +	dentry = d_obtain_root(inode);
> > +	if (!dentry) {
> > +		pr_debug("kernfs: could not get dentry for '");
> > +		pr_cont_kernfs_path(kn);
> > +		pr_cont("'.\n");
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	/* If this is a new dentry, set it up. We need kernfs_mutex because this
> > +	 * may be called by callers other than kernfs_fill_super. */
> 
> Formatting.
> 
> > +	mutex_lock(&kernfs_mutex);
> > +	if (!dentry->d_fsdata) {
> > +		kernfs_get(kn);
> > +		dentry->d_fsdata = kn;
> > +	} else {
> > +		WARN_ON(dentry->d_fsdata != kn);
> > +	}
> > +	mutex_unlock(&kernfs_mutex);
> > +
> > +	return dentry;
> > +}
> 
> Wouldn't it be simpler to walk dentry from kernfs root than
> duplicating dentry instantiation?

Sorry I don't think I'm following.  Are you suggesting walking the
kn->parent chain backward and doing d_lookup() at each point starting
with sb->s_root?

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-11-27  5:17         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-27  5:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Tue, Nov 24, 2015 at 12:16:10PM -0500, Tejun Heo wrote:
> Hello,
> 
> On Mon, Nov 16, 2015 at 01:51:44PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> > +struct dentry *kernfs_obtain_root(struct super_block *sb,
> > +				  struct kernfs_node *kn)
> > +{
> > +	struct dentry *dentry;
> > +	struct inode *inode;
> > +
> > +	BUG_ON(sb->s_op != &kernfs_sops);
> > +
> > +	/* inode for the given kernfs_node should already exist. */
> > +	inode = ilookup(sb, kn->ino);
> > +	if (!inode) {
> > +		pr_debug("kernfs: could not get inode for '");
> > +		pr_cont_kernfs_path(kn);
> > +		pr_cont("'.\n");
> > +		return ERR_PTR(-EINVAL);
> > +	}
> 
> Hmmm... but inode might not have been instantiated yet.  Why not use
> kernfs_get_inode()?
> 
> > +	/* instantiate and link root dentry */
> > +	dentry = d_obtain_root(inode);
> > +	if (!dentry) {
> > +		pr_debug("kernfs: could not get dentry for '");
> > +		pr_cont_kernfs_path(kn);
> > +		pr_cont("'.\n");
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	/* If this is a new dentry, set it up. We need kernfs_mutex because this
> > +	 * may be called by callers other than kernfs_fill_super. */
> 
> Formatting.
> 
> > +	mutex_lock(&kernfs_mutex);
> > +	if (!dentry->d_fsdata) {
> > +		kernfs_get(kn);
> > +		dentry->d_fsdata = kn;
> > +	} else {
> > +		WARN_ON(dentry->d_fsdata != kn);
> > +	}
> > +	mutex_unlock(&kernfs_mutex);
> > +
> > +	return dentry;
> > +}
> 
> Wouldn't it be simpler to walk dentry from kernfs root than
> duplicating dentry instantiation?

Sorry I don't think I'm following.  Are you suggesting walking the
kn->parent chain backward and doing d_lookup() at each point starting
with sb->s_root?

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]       ` <20151124161630.GL17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
  2015-11-24 16:17         ` Tejun Heo
@ 2015-11-27  5:25         ` Serge E. Hallyn
  1 sibling, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-27  5:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Tue, Nov 24, 2015 at 11:16:30AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Mon, Nov 16, 2015 at 01:51:38PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> > +static char * __must_check kernfs_path_from_node_locked(

(Note I've rewritten this to find a common ancestor and walk back to
and from that, as you suggested later in this email)

> > +	/* Short-circuit the easy case - kn_to is the root node. */
> > +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> > +		*p = '/';
> > +		*(p + 1) = '\0';
> 
> Hmm... so if kn_from == kn_to, the output is "/"?

Yes, that's what seems to make the most sense for cgroup namespaces.  I
could see a case for '.' being used instead in general, but for cgroup
namespaces I think we'd have to convert those back to '/'.  Otherwise
we'll fail in being able to run legacy software, which would get
confused.

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]       ` <20151124161630.GL17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-27  5:25         ` Serge E. Hallyn
  2015-11-27  5:25         ` Serge E. Hallyn
  1 sibling, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-27  5:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

On Tue, Nov 24, 2015 at 11:16:30AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Mon, Nov 16, 2015 at 01:51:38PM -0600, serge@hallyn.com wrote:
> > +static char * __must_check kernfs_path_from_node_locked(

(Note I've rewritten this to find a common ancestor and walk back to
and from that, as you suggested later in this email)

> > +	/* Short-circuit the easy case - kn_to is the root node. */
> > +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> > +		*p = '/';
> > +		*(p + 1) = '\0';
> 
> Hmm... so if kn_from == kn_to, the output is "/"?

Yes, that's what seems to make the most sense for cgroup namespaces.  I
could see a case for '.' being used instead in general, but for cgroup
namespaces I think we'd have to convert those back to '/'.  Otherwise
we'll fail in being able to run legacy software, which would get
confused.

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-11-27  5:25         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-27  5:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Tue, Nov 24, 2015 at 11:16:30AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Mon, Nov 16, 2015 at 01:51:38PM -0600, serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org wrote:
> > +static char * __must_check kernfs_path_from_node_locked(

(Note I've rewritten this to find a common ancestor and walk back to
and from that, as you suggested later in this email)

> > +	/* Short-circuit the easy case - kn_to is the root node. */
> > +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> > +		*p = '/';
> > +		*(p + 1) = '\0';
> 
> Hmm... so if kn_from == kn_to, the output is "/"?

Yes, that's what seems to make the most sense for cgroup namespaces.  I
could see a case for '.' being used instead in general, but for cgroup
namespaces I think we'd have to convert those back to '/'.  Otherwise
we'll fail in being able to run legacy software, which would get
confused.

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]         ` <20151127051745.GA24521-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-30 15:09           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-30 15:09 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello, Serge.

On Thu, Nov 26, 2015 at 11:17:45PM -0600, Serge E. Hallyn wrote:
> > Wouldn't it be simpler to walk dentry from kernfs root than
> > duplicating dentry instantiation?
> 
> Sorry I don't think I'm following.  Are you suggesting walking the
> kn->parent chain backward and doing d_lookup() at each point starting
> with sb->s_root?

Yeah, something like that.  I wonder whether there are already code
paths doing that.  What we need is a straight path walk.  I could be
wrong but it shouldn't be that complex and if it works out we can
avoid introducing another instantiation / lookup path.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]         ` <20151127051745.GA24521-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-30 15:09           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-30 15:09 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Hello, Serge.

On Thu, Nov 26, 2015 at 11:17:45PM -0600, Serge E. Hallyn wrote:
> > Wouldn't it be simpler to walk dentry from kernfs root than
> > duplicating dentry instantiation?
> 
> Sorry I don't think I'm following.  Are you suggesting walking the
> kn->parent chain backward and doing d_lookup() at each point starting
> with sb->s_root?

Yeah, something like that.  I wonder whether there are already code
paths doing that.  What we need is a straight path walk.  I could be
wrong but it shouldn't be that complex and if it works out we can
avoid introducing another instantiation / lookup path.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-11-30 15:09           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-30 15:09 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Hello, Serge.

On Thu, Nov 26, 2015 at 11:17:45PM -0600, Serge E. Hallyn wrote:
> > Wouldn't it be simpler to walk dentry from kernfs root than
> > duplicating dentry instantiation?
> 
> Sorry I don't think I'm following.  Are you suggesting walking the
> kn->parent chain backward and doing d_lookup() at each point starting
> with sb->s_root?

Yeah, something like that.  I wonder whether there are already code
paths doing that.  What we need is a straight path walk.  I could be
wrong but it shouldn't be that complex and if it works out we can
avoid introducing another instantiation / lookup path.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]         ` <20151127052511.GA25490-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-30 15:11           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-30 15:11 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello,

On Thu, Nov 26, 2015 at 11:25:11PM -0600, Serge E. Hallyn wrote:
> > > +	/* Short-circuit the easy case - kn_to is the root node. */
> > > +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> > > +		*p = '/';
> > > +		*(p + 1) = '\0';
> > 
> > Hmm... so if kn_from == kn_to, the output is "/"?
> 
> Yes, that's what seems to make the most sense for cgroup namespaces.  I
> could see a case for '.' being used instead in general, but for cgroup
> namespaces I think we'd have to convert those back to '/'.  Otherwise
> we'll fail in being able to run legacy software, which would get
> confused.

Yeah, I agree but the name is kinda misleading tho.  The output isn't
really a relative path but rather absolute path against the specified
root.  Maybe updating the function and parameter names would be
helpful?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]         ` <20151127052511.GA25490-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-30 15:11           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-30 15:11 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Hello,

On Thu, Nov 26, 2015 at 11:25:11PM -0600, Serge E. Hallyn wrote:
> > > +	/* Short-circuit the easy case - kn_to is the root node. */
> > > +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> > > +		*p = '/';
> > > +		*(p + 1) = '\0';
> > 
> > Hmm... so if kn_from == kn_to, the output is "/"?
> 
> Yes, that's what seems to make the most sense for cgroup namespaces.  I
> could see a case for '.' being used instead in general, but for cgroup
> namespaces I think we'd have to convert those back to '/'.  Otherwise
> we'll fail in being able to run legacy software, which would get
> confused.

Yeah, I agree but the name is kinda misleading tho.  The output isn't
really a relative path but rather absolute path against the specified
root.  Maybe updating the function and parameter names would be
helpful?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-11-30 15:11           ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-30 15:11 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	serge-A9i7LUbDfNHQT0dZR+AlfA

Hello,

On Thu, Nov 26, 2015 at 11:25:11PM -0600, Serge E. Hallyn wrote:
> > > +	/* Short-circuit the easy case - kn_to is the root node. */
> > > +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> > > +		*p = '/';
> > > +		*(p + 1) = '\0';
> > 
> > Hmm... so if kn_from == kn_to, the output is "/"?
> 
> Yes, that's what seems to make the most sense for cgroup namespaces.  I
> could see a case for '.' being used instead in general, but for cgroup
> namespaces I think we'd have to convert those back to '/'.  Otherwise
> we'll fail in being able to run legacy software, which would get
> confused.

Yeah, I agree but the name is kinda misleading tho.  The output isn't
really a relative path but rather absolute path against the specified
root.  Maybe updating the function and parameter names would be
helpful?

Thanks.

-- 
tejun
_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]           ` <20151130151147.GG3535-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-30 18:37             ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-30 18:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge E. Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Mon, Nov 30, 2015 at 10:11:47AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Thu, Nov 26, 2015 at 11:25:11PM -0600, Serge E. Hallyn wrote:
> > > > +	/* Short-circuit the easy case - kn_to is the root node. */
> > > > +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> > > > +		*p = '/';
> > > > +		*(p + 1) = '\0';
> > > 
> > > Hmm... so if kn_from == kn_to, the output is "/"?
> > 
> > Yes, that's what seems to make the most sense for cgroup namespaces.  I
> > could see a case for '.' being used instead in general, but for cgroup
> > namespaces I think we'd have to convert those back to '/'.  Otherwise
> > we'll fail in being able to run legacy software, which would get
> > confused.
> 
> Yeah, I agree but the name is kinda misleading tho.  The output isn't
> really a relative path but rather absolute path against the specified
> root.  Maybe updating the function and parameter names would be
> helpful?
> 
> Thanks.

Ok - updating the comment is simple enough.  Though the name/params
kernfs_path_from_node_locked(from, to) still seem to make sense.  Would
you prefer something like kernfs_absolute_path_from node_locked()?  I
hesitate to call 'from' 'root' since kernfs_root is a thing and this
is not that.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]           ` <20151130151147.GG3535-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-11-30 18:37             ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-30 18:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge, linux-kernel, adityakali, linux-api,
	containers, cgroups, lxc-devel, akpm, ebiederm

On Mon, Nov 30, 2015 at 10:11:47AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Thu, Nov 26, 2015 at 11:25:11PM -0600, Serge E. Hallyn wrote:
> > > > +	/* Short-circuit the easy case - kn_to is the root node. */
> > > > +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> > > > +		*p = '/';
> > > > +		*(p + 1) = '\0';
> > > 
> > > Hmm... so if kn_from == kn_to, the output is "/"?
> > 
> > Yes, that's what seems to make the most sense for cgroup namespaces.  I
> > could see a case for '.' being used instead in general, but for cgroup
> > namespaces I think we'd have to convert those back to '/'.  Otherwise
> > we'll fail in being able to run legacy software, which would get
> > confused.
> 
> Yeah, I agree but the name is kinda misleading tho.  The output isn't
> really a relative path but rather absolute path against the specified
> root.  Maybe updating the function and parameter names would be
> helpful?
> 
> Thanks.

Ok - updating the comment is simple enough.  Though the name/params
kernfs_path_from_node_locked(from, to) still seem to make sense.  Would
you prefer something like kernfs_absolute_path_from node_locked()?  I
hesitate to call 'from' 'root' since kernfs_root is a thing and this
is not that.


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-11-30 18:37             ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-11-30 18:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Mon, Nov 30, 2015 at 10:11:47AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Thu, Nov 26, 2015 at 11:25:11PM -0600, Serge E. Hallyn wrote:
> > > > +	/* Short-circuit the easy case - kn_to is the root node. */
> > > > +	if ((kn_from == kn_to) || (!kn_from && !kn_to->parent)) {
> > > > +		*p = '/';
> > > > +		*(p + 1) = '\0';
> > > 
> > > Hmm... so if kn_from == kn_to, the output is "/"?
> > 
> > Yes, that's what seems to make the most sense for cgroup namespaces.  I
> > could see a case for '.' being used instead in general, but for cgroup
> > namespaces I think we'd have to convert those back to '/'.  Otherwise
> > we'll fail in being able to run legacy software, which would get
> > confused.
> 
> Yeah, I agree but the name is kinda misleading tho.  The output isn't
> really a relative path but rather absolute path against the specified
> root.  Maybe updating the function and parameter names would be
> helpful?
> 
> Thanks.

Ok - updating the comment is simple enough.  Though the name/params
kernfs_path_from_node_locked(from, to) still seem to make sense.  Would
you prefer something like kernfs_absolute_path_from node_locked()?  I
hesitate to call 'from' 'root' since kernfs_root is a thing and this
is not that.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]             ` <20151130183758.GA25433-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-11-30 22:53               ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-30 22:53 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello, Serge.

On Mon, Nov 30, 2015 at 12:37:58PM -0600, Serge E. Hallyn wrote:
> > Yeah, I agree but the name is kinda misleading tho.  The output isn't
> > really a relative path but rather absolute path against the specified
> > root.  Maybe updating the function and parameter names would be
> > helpful?
> > 
> 
> Ok - updating the comment is simple enough.  Though the name/params
> kernfs_path_from_node_locked(from, to) still seem to make sense.  Would
> you prefer something like kernfs_absolute_path_from node_locked()?  I
> hesitate to call 'from' 'root' since kernfs_root is a thing and this
> is not that.

Hmmm... I see.  Let's just make sure that the comment is clear about
the fact that it calculates (pseudo) absolute path rather than
relative path.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
  2015-11-30 18:37             ` Serge E. Hallyn
  (?)
@ 2015-11-30 22:53             ` Tejun Heo
       [not found]               ` <20151130225318.GD9039-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
  2015-12-01  2:08                 ` Serge E. Hallyn
  -1 siblings, 2 replies; 180+ messages in thread
From: Tejun Heo @ 2015-11-30 22:53 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Hello, Serge.

On Mon, Nov 30, 2015 at 12:37:58PM -0600, Serge E. Hallyn wrote:
> > Yeah, I agree but the name is kinda misleading tho.  The output isn't
> > really a relative path but rather absolute path against the specified
> > root.  Maybe updating the function and parameter names would be
> > helpful?
> > 
> 
> Ok - updating the comment is simple enough.  Though the name/params
> kernfs_path_from_node_locked(from, to) still seem to make sense.  Would
> you prefer something like kernfs_absolute_path_from node_locked()?  I
> hesitate to call 'from' 'root' since kernfs_root is a thing and this
> is not that.

Hmmm... I see.  Let's just make sure that the comment is clear about
the fact that it calculates (pseudo) absolute path rather than
relative path.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]               ` <20151130225318.GD9039-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-01  2:08                 ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-01  2:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge E. Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Mon, Nov 30, 2015 at 05:53:18PM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Mon, Nov 30, 2015 at 12:37:58PM -0600, Serge E. Hallyn wrote:
> > > Yeah, I agree but the name is kinda misleading tho.  The output isn't
> > > really a relative path but rather absolute path against the specified
> > > root.  Maybe updating the function and parameter names would be
> > > helpful?
> > > 
> > 
> > Ok - updating the comment is simple enough.  Though the name/params
> > kernfs_path_from_node_locked(from, to) still seem to make sense.  Would
> > you prefer something like kernfs_absolute_path_from node_locked()?  I
> > hesitate to call 'from' 'root' since kernfs_root is a thing and this
> > is not that.
> 
> Hmmm... I see.  Let's just make sure that the comment is clear about
> the fact that it calculates (pseudo) absolute path rather than
> relative path.
> 
> Thanks.

Ok, new patch follows (and is pushed at
 https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/log/?h=2015-11-30/cgroupns)

[PATCH 1/7] kernfs: Add API to generate relative kernfs path

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 fs/kernfs/dir.c        | 182 +++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |   3 +
 2 files changed, 158 insertions(+), 27 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 91e0045..7cd4bb4 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,134 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_node_distance(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	BUG_ON(!to);
+	BUG_ON(!from);
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	return p;
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+		struct kernfs_node *b)
+{
+	size_t da = kernfs_node_distance(kernfs_root(a)->kn, a);
+	size_t db = kernfs_node_distance(kernfs_root(b)->kn, b);
+
+	if (da == 0)
+		return a;
+	if (db == 0)
+		return b;
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ */
+static char *
+__must_check kernfs_path_from_node_locked(struct kernfs_node *kn_from,
+					  struct kernfs_node *kn_to, char *buf,
+					  size_t buflen)
+{
+	char *p = buf;
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	int i;
+	size_t depth_from, depth_to, len = 0, nlen = 0,
+	       plen = sizeof(parent_str) - 1;
+
+	/* We atleast need 2 bytes to write "/\0". */
+	if (buflen < 2)
+		return NULL;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to) {
+		*p = '/';
+		*(++p) = '\0';
+		return buf;
+	}
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (!common) {
+		WARN_ONCE("%s: kn_from and kn_to on different roots\n",
+			__func__);
+		return NULL;
+	}
+
+	depth_to = kernfs_node_distance(common, kn_to);
+	depth_from = kernfs_node_distance(common, kn_from);
+
+	for (i = 0; i < depth_from; i++) {
+		if (len + plen + 1 > buflen)
+			return NULL;
+		strcpy(p, parent_str);
+		p += plen;
+		len += plen;
+	}
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
+
+	if (len + nlen + 1 > buflen)
+		return NULL;
+
+	p += nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return buf;
 }
 
 /**
@@ -115,26 +221,48 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
- * kernfs_path - build full path of a given node
+ * kernfs_path_from_node - build path of node @kn relative to @kn_root.
+ * @kn_root: parent kernfs_node relative to which we need to build the path
  * @kn: kernfs_node of interest
- * @buf: buffer to copy @kn's name into
+ * @buf: buffer to copy @kn's path into
  * @buflen: size of @buf
  *
- * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
- * path is built from the end of @buf so the returned pointer usually
- * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * Builds and returns @kn's path relative to @kn_root. @kn_root and @kn must
+ * be on the same kernfs-root. If @kn_root is not parent of @kn, then a relative
+ * path (which includes '..'s) as needed to reach from @kn_root to @kn is
+ * returned.
+ * The path may be built from the end of @buf so the returned pointer may not
+ * match @buf.  If @buf isn't long enough, @buf is nul terminated
  * and %NULL is returned.
  */
-char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+char *kernfs_path_from_node(struct kernfs_node *kn_root, struct kernfs_node *kn,
+			    char *buf, size_t buflen)
 {
 	unsigned long flags;
 	char *p;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
+	p = kernfs_path_from_node_locked(kn_root, kn, buf, buflen);
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 	return p;
 }
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
+ * kernfs_path - build full path of a given node
+ * @kn: kernfs_node of interest
+ * @buf: buffer to copy @kn's name into
+ * @buflen: size of @buf
+ *
+ * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
+ * path is built from the end of @buf so the returned pointer usually
+ * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * and %NULL is returned.
+ */
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+{
+	return kernfs_path_from_node(NULL, kn, buf, buflen);
+}
 EXPORT_SYMBOL_GPL(kernfs_path);
 
 /**
@@ -168,8 +296,8 @@ void pr_cont_kernfs_path(struct kernfs_node *kn)
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
+	p = kernfs_path_from_node_locked(NULL, kn, kernfs_pr_cont_buf,
+					 sizeof(kernfs_pr_cont_buf));
 	if (p)
 		pr_cont("%s", p);
 	else
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5d4e9c4..d025ebd 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
+char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
+					  struct kernfs_node *kn, char *buf,
+					  size_t buflen);
 char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
 				size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]               ` <20151130225318.GD9039-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-01  2:08                 ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-01  2:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge, linux-kernel, adityakali, linux-api,
	containers, cgroups, lxc-devel, akpm, ebiederm

On Mon, Nov 30, 2015 at 05:53:18PM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Mon, Nov 30, 2015 at 12:37:58PM -0600, Serge E. Hallyn wrote:
> > > Yeah, I agree but the name is kinda misleading tho.  The output isn't
> > > really a relative path but rather absolute path against the specified
> > > root.  Maybe updating the function and parameter names would be
> > > helpful?
> > > 
> > 
> > Ok - updating the comment is simple enough.  Though the name/params
> > kernfs_path_from_node_locked(from, to) still seem to make sense.  Would
> > you prefer something like kernfs_absolute_path_from node_locked()?  I
> > hesitate to call 'from' 'root' since kernfs_root is a thing and this
> > is not that.
> 
> Hmmm... I see.  Let's just make sure that the comment is clear about
> the fact that it calculates (pseudo) absolute path rather than
> relative path.
> 
> Thanks.

Ok, new patch follows (and is pushed at
 https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/log/?h=2015-11-30/cgroupns)

[PATCH 1/7] kernfs: Add API to generate relative kernfs path

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment

Signed-off-by: Aditya Kali <adityakali@google.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
---
 fs/kernfs/dir.c        | 182 +++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |   3 +
 2 files changed, 158 insertions(+), 27 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 91e0045..7cd4bb4 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,134 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_node_distance(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	BUG_ON(!to);
+	BUG_ON(!from);
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	return p;
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+		struct kernfs_node *b)
+{
+	size_t da = kernfs_node_distance(kernfs_root(a)->kn, a);
+	size_t db = kernfs_node_distance(kernfs_root(b)->kn, b);
+
+	if (da == 0)
+		return a;
+	if (db == 0)
+		return b;
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ */
+static char *
+__must_check kernfs_path_from_node_locked(struct kernfs_node *kn_from,
+					  struct kernfs_node *kn_to, char *buf,
+					  size_t buflen)
+{
+	char *p = buf;
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	int i;
+	size_t depth_from, depth_to, len = 0, nlen = 0,
+	       plen = sizeof(parent_str) - 1;
+
+	/* We atleast need 2 bytes to write "/\0". */
+	if (buflen < 2)
+		return NULL;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to) {
+		*p = '/';
+		*(++p) = '\0';
+		return buf;
+	}
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (!common) {
+		WARN_ONCE("%s: kn_from and kn_to on different roots\n",
+			__func__);
+		return NULL;
+	}
+
+	depth_to = kernfs_node_distance(common, kn_to);
+	depth_from = kernfs_node_distance(common, kn_from);
+
+	for (i = 0; i < depth_from; i++) {
+		if (len + plen + 1 > buflen)
+			return NULL;
+		strcpy(p, parent_str);
+		p += plen;
+		len += plen;
+	}
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
+
+	if (len + nlen + 1 > buflen)
+		return NULL;
+
+	p += nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return buf;
 }
 
 /**
@@ -115,26 +221,48 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
- * kernfs_path - build full path of a given node
+ * kernfs_path_from_node - build path of node @kn relative to @kn_root.
+ * @kn_root: parent kernfs_node relative to which we need to build the path
  * @kn: kernfs_node of interest
- * @buf: buffer to copy @kn's name into
+ * @buf: buffer to copy @kn's path into
  * @buflen: size of @buf
  *
- * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
- * path is built from the end of @buf so the returned pointer usually
- * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * Builds and returns @kn's path relative to @kn_root. @kn_root and @kn must
+ * be on the same kernfs-root. If @kn_root is not parent of @kn, then a relative
+ * path (which includes '..'s) as needed to reach from @kn_root to @kn is
+ * returned.
+ * The path may be built from the end of @buf so the returned pointer may not
+ * match @buf.  If @buf isn't long enough, @buf is nul terminated
  * and %NULL is returned.
  */
-char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+char *kernfs_path_from_node(struct kernfs_node *kn_root, struct kernfs_node *kn,
+			    char *buf, size_t buflen)
 {
 	unsigned long flags;
 	char *p;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
+	p = kernfs_path_from_node_locked(kn_root, kn, buf, buflen);
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 	return p;
 }
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
+ * kernfs_path - build full path of a given node
+ * @kn: kernfs_node of interest
+ * @buf: buffer to copy @kn's name into
+ * @buflen: size of @buf
+ *
+ * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
+ * path is built from the end of @buf so the returned pointer usually
+ * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * and %NULL is returned.
+ */
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+{
+	return kernfs_path_from_node(NULL, kn, buf, buflen);
+}
 EXPORT_SYMBOL_GPL(kernfs_path);
 
 /**
@@ -168,8 +296,8 @@ void pr_cont_kernfs_path(struct kernfs_node *kn)
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
+	p = kernfs_path_from_node_locked(NULL, kn, kernfs_pr_cont_buf,
+					 sizeof(kernfs_pr_cont_buf));
 	if (p)
 		pr_cont("%s", p);
 	else
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5d4e9c4..d025ebd 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
+char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
+					  struct kernfs_node *kn, char *buf,
+					  size_t buflen);
 char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
 				size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-12-01  2:08                 ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-01  2:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Mon, Nov 30, 2015 at 05:53:18PM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Mon, Nov 30, 2015 at 12:37:58PM -0600, Serge E. Hallyn wrote:
> > > Yeah, I agree but the name is kinda misleading tho.  The output isn't
> > > really a relative path but rather absolute path against the specified
> > > root.  Maybe updating the function and parameter names would be
> > > helpful?
> > > 
> > 
> > Ok - updating the comment is simple enough.  Though the name/params
> > kernfs_path_from_node_locked(from, to) still seem to make sense.  Would
> > you prefer something like kernfs_absolute_path_from node_locked()?  I
> > hesitate to call 'from' 'root' since kernfs_root is a thing and this
> > is not that.
> 
> Hmmm... I see.  Let's just make sure that the comment is clear about
> the fact that it calculates (pseudo) absolute path rather than
> relative path.
> 
> Thanks.

Ok, new patch follows (and is pushed at
 https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/log/?h=2015-11-30/cgroupns)

[PATCH 1/7] kernfs: Add API to generate relative kernfs path

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 fs/kernfs/dir.c        | 182 +++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |   3 +
 2 files changed, 158 insertions(+), 27 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 91e0045..7cd4bb4 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,134 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_node_distance(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	BUG_ON(!to);
+	BUG_ON(!from);
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	return p;
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+		struct kernfs_node *b)
+{
+	size_t da = kernfs_node_distance(kernfs_root(a)->kn, a);
+	size_t db = kernfs_node_distance(kernfs_root(b)->kn, b);
+
+	if (da == 0)
+		return a;
+	if (db == 0)
+		return b;
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ */
+static char *
+__must_check kernfs_path_from_node_locked(struct kernfs_node *kn_from,
+					  struct kernfs_node *kn_to, char *buf,
+					  size_t buflen)
+{
+	char *p = buf;
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	int i;
+	size_t depth_from, depth_to, len = 0, nlen = 0,
+	       plen = sizeof(parent_str) - 1;
+
+	/* We atleast need 2 bytes to write "/\0". */
+	if (buflen < 2)
+		return NULL;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to) {
+		*p = '/';
+		*(++p) = '\0';
+		return buf;
+	}
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (!common) {
+		WARN_ONCE("%s: kn_from and kn_to on different roots\n",
+			__func__);
+		return NULL;
+	}
+
+	depth_to = kernfs_node_distance(common, kn_to);
+	depth_from = kernfs_node_distance(common, kn_from);
+
+	for (i = 0; i < depth_from; i++) {
+		if (len + plen + 1 > buflen)
+			return NULL;
+		strcpy(p, parent_str);
+		p += plen;
+		len += plen;
+	}
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
+
+	if (len + nlen + 1 > buflen)
+		return NULL;
+
+	p += nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return buf;
 }
 
 /**
@@ -115,26 +221,48 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
- * kernfs_path - build full path of a given node
+ * kernfs_path_from_node - build path of node @kn relative to @kn_root.
+ * @kn_root: parent kernfs_node relative to which we need to build the path
  * @kn: kernfs_node of interest
- * @buf: buffer to copy @kn's name into
+ * @buf: buffer to copy @kn's path into
  * @buflen: size of @buf
  *
- * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
- * path is built from the end of @buf so the returned pointer usually
- * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * Builds and returns @kn's path relative to @kn_root. @kn_root and @kn must
+ * be on the same kernfs-root. If @kn_root is not parent of @kn, then a relative
+ * path (which includes '..'s) as needed to reach from @kn_root to @kn is
+ * returned.
+ * The path may be built from the end of @buf so the returned pointer may not
+ * match @buf.  If @buf isn't long enough, @buf is nul terminated
  * and %NULL is returned.
  */
-char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+char *kernfs_path_from_node(struct kernfs_node *kn_root, struct kernfs_node *kn,
+			    char *buf, size_t buflen)
 {
 	unsigned long flags;
 	char *p;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
+	p = kernfs_path_from_node_locked(kn_root, kn, buf, buflen);
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 	return p;
 }
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
+ * kernfs_path - build full path of a given node
+ * @kn: kernfs_node of interest
+ * @buf: buffer to copy @kn's name into
+ * @buflen: size of @buf
+ *
+ * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
+ * path is built from the end of @buf so the returned pointer usually
+ * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * and %NULL is returned.
+ */
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+{
+	return kernfs_path_from_node(NULL, kn, buf, buflen);
+}
 EXPORT_SYMBOL_GPL(kernfs_path);
 
 /**
@@ -168,8 +296,8 @@ void pr_cont_kernfs_path(struct kernfs_node *kn)
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
+	p = kernfs_path_from_node_locked(NULL, kn, kernfs_pr_cont_buf,
+					 sizeof(kernfs_pr_cont_buf));
 	if (p)
 		pr_cont("%s", p);
 	else
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5d4e9c4..d025ebd 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
+char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
+					  struct kernfs_node *kn, char *buf,
+					  size_t buflen);
 char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
 				size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]           ` <20151130150938.GF3535-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-01  4:07             ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-01  4:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge E. Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Mon, Nov 30, 2015 at 10:09:38AM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Thu, Nov 26, 2015 at 11:17:45PM -0600, Serge E. Hallyn wrote:
> > > Wouldn't it be simpler to walk dentry from kernfs root than
> > > duplicating dentry instantiation?
> > 
> > Sorry I don't think I'm following.  Are you suggesting walking the
> > kn->parent chain backward and doing d_lookup() at each point starting
> > with sb->s_root?
> 
> Yeah, something like that.  I wonder whether there are already code
> paths doing that.  What we need is a straight path walk.  I could be
> wrong but it shouldn't be that complex and if it works out we can
> avoid introducing another instantiation / lookup path.
> 
> Thanks.

So actually the way the code is now, the first mount cannot
be done from a non-init user namespace; and kernfs_obtain_root()
is only called from non-init user namespace.  So can we assume
that the root dentry will be instantiated?  (or can it get
evicted?)

If we can assume that then most of that fn can go away.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]           ` <20151130150938.GF3535-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-01  4:07             ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-01  4:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge, linux-kernel, adityakali, linux-api,
	containers, cgroups, lxc-devel, akpm, ebiederm

On Mon, Nov 30, 2015 at 10:09:38AM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Thu, Nov 26, 2015 at 11:17:45PM -0600, Serge E. Hallyn wrote:
> > > Wouldn't it be simpler to walk dentry from kernfs root than
> > > duplicating dentry instantiation?
> > 
> > Sorry I don't think I'm following.  Are you suggesting walking the
> > kn->parent chain backward and doing d_lookup() at each point starting
> > with sb->s_root?
> 
> Yeah, something like that.  I wonder whether there are already code
> paths doing that.  What we need is a straight path walk.  I could be
> wrong but it shouldn't be that complex and if it works out we can
> avoid introducing another instantiation / lookup path.
> 
> Thanks.

So actually the way the code is now, the first mount cannot
be done from a non-init user namespace; and kernfs_obtain_root()
is only called from non-init user namespace.  So can we assume
that the root dentry will be instantiated?  (or can it get
evicted?)

If we can assume that then most of that fn can go away.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-01  4:07             ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-01  4:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Mon, Nov 30, 2015 at 10:09:38AM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Thu, Nov 26, 2015 at 11:17:45PM -0600, Serge E. Hallyn wrote:
> > > Wouldn't it be simpler to walk dentry from kernfs root than
> > > duplicating dentry instantiation?
> > 
> > Sorry I don't think I'm following.  Are you suggesting walking the
> > kn->parent chain backward and doing d_lookup() at each point starting
> > with sb->s_root?
> 
> Yeah, something like that.  I wonder whether there are already code
> paths doing that.  What we need is a straight path walk.  I could be
> wrong but it shouldn't be that complex and if it works out we can
> avoid introducing another instantiation / lookup path.
> 
> Thanks.

So actually the way the code is now, the first mount cannot
be done from a non-init user namespace; and kernfs_obtain_root()
is only called from non-init user namespace.  So can we assume
that the root dentry will be instantiated?  (or can it get
evicted?)

If we can assume that then most of that fn can go away.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]             ` <20151201040704.GA31067-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-12-01 16:46               ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-01 16:46 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hey, Serge.

On Mon, Nov 30, 2015 at 10:07:04PM -0600, Serge E. Hallyn wrote:
> So actually the way the code is now, the first mount cannot
> be done from a non-init user namespace; and kernfs_obtain_root()
> is only called from non-init user namespace.  So can we assume
> that the root dentry will be instantiated?  (or can it get
> evicted?)
> 
> If we can assume that then most of that fn can go away.

The v2 hierarchy is always mounted and non-init ns shouldn't be able
to create new v1 hierarchies, so the root dentry should always be
there.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]             ` <20151201040704.GA31067-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-12-01 16:46               ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-01 16:46 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Hey, Serge.

On Mon, Nov 30, 2015 at 10:07:04PM -0600, Serge E. Hallyn wrote:
> So actually the way the code is now, the first mount cannot
> be done from a non-init user namespace; and kernfs_obtain_root()
> is only called from non-init user namespace.  So can we assume
> that the root dentry will be instantiated?  (or can it get
> evicted?)
> 
> If we can assume that then most of that fn can go away.

The v2 hierarchy is always mounted and non-init ns shouldn't be able
to create new v1 hierarchies, so the root dentry should always be
there.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-01 16:46               ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-01 16:46 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Hey, Serge.

On Mon, Nov 30, 2015 at 10:07:04PM -0600, Serge E. Hallyn wrote:
> So actually the way the code is now, the first mount cannot
> be done from a non-init user namespace; and kernfs_obtain_root()
> is only called from non-init user namespace.  So can we assume
> that the root dentry will be instantiated?  (or can it get
> evicted?)
> 
> If we can assume that then most of that fn can go away.

The v2 hierarchy is always mounted and non-init ns shouldn't be able
to create new v1 hierarchies, so the root dentry should always be
there.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]               ` <20151201164649.GD12922-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-01 21:58                 ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-01 21:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge E. Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Tue, Dec 01, 2015 at 11:46:49AM -0500, Tejun Heo wrote:
> Hey, Serge.
> 
> On Mon, Nov 30, 2015 at 10:07:04PM -0600, Serge E. Hallyn wrote:
> > So actually the way the code is now, the first mount cannot
> > be done from a non-init user namespace; and kernfs_obtain_root()
> > is only called from non-init user namespace.  So can we assume
> > that the root dentry will be instantiated?  (or can it get
> > evicted?)
> > 
> > If we can assume that then most of that fn can go away.
> 
> The v2 hierarchy is always mounted and non-init ns shouldn't be able
> to create new v1 hierarchies, so the root dentry should always be
> there.

I mispoke before though - it's not the hierarchy's root dentry,
but rather a dentry for a descendent cgroup which will become the
root dentry for the new superblock.  We do know that there must be
a css_set with a cgroup.  I'm still trying to track down whether
that cgrou's inode's dentry can ever be flushed.  I would think
not but am not sure.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]               ` <20151201164649.GD12922-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-01 21:58                 ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-01 21:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge, linux-kernel, adityakali, linux-api,
	containers, cgroups, lxc-devel, akpm, ebiederm

On Tue, Dec 01, 2015 at 11:46:49AM -0500, Tejun Heo wrote:
> Hey, Serge.
> 
> On Mon, Nov 30, 2015 at 10:07:04PM -0600, Serge E. Hallyn wrote:
> > So actually the way the code is now, the first mount cannot
> > be done from a non-init user namespace; and kernfs_obtain_root()
> > is only called from non-init user namespace.  So can we assume
> > that the root dentry will be instantiated?  (or can it get
> > evicted?)
> > 
> > If we can assume that then most of that fn can go away.
> 
> The v2 hierarchy is always mounted and non-init ns shouldn't be able
> to create new v1 hierarchies, so the root dentry should always be
> there.

I mispoke before though - it's not the hierarchy's root dentry,
but rather a dentry for a descendent cgroup which will become the
root dentry for the new superblock.  We do know that there must be
a css_set with a cgroup.  I'm still trying to track down whether
that cgrou's inode's dentry can ever be flushed.  I would think
not but am not sure.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-01 21:58                 ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-01 21:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Tue, Dec 01, 2015 at 11:46:49AM -0500, Tejun Heo wrote:
> Hey, Serge.
> 
> On Mon, Nov 30, 2015 at 10:07:04PM -0600, Serge E. Hallyn wrote:
> > So actually the way the code is now, the first mount cannot
> > be done from a non-init user namespace; and kernfs_obtain_root()
> > is only called from non-init user namespace.  So can we assume
> > that the root dentry will be instantiated?  (or can it get
> > evicted?)
> > 
> > If we can assume that then most of that fn can go away.
> 
> The v2 hierarchy is always mounted and non-init ns shouldn't be able
> to create new v1 hierarchies, so the root dentry should always be
> there.

I mispoke before though - it's not the hierarchy's root dentry,
but rather a dentry for a descendent cgroup which will become the
root dentry for the new superblock.  We do know that there must be
a css_set with a cgroup.  I'm still trying to track down whether
that cgrou's inode's dentry can ever be flushed.  I would think
not but am not sure.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                 ` <20151201215853.GA9153-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-12-02 16:53                   ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-02 16:53 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello, Serge.

On Tue, Dec 01, 2015 at 03:58:53PM -0600, Serge E. Hallyn wrote:
> I mispoke before though - it's not the hierarchy's root dentry,
> but rather a dentry for a descendent cgroup which will become the
> root dentry for the new superblock.  We do know that there must be
> a css_set with a cgroup.  I'm still trying to track down whether
> that cgrou's inode's dentry can ever be flushed.  I would think
> not but am not sure.

Hmmm... I'm not really following.  The inode can be flushed and that's
why it needs to be walked down from root.  What am I missing here?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                 ` <20151201215853.GA9153-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-12-02 16:53                   ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-02 16:53 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Hello, Serge.

On Tue, Dec 01, 2015 at 03:58:53PM -0600, Serge E. Hallyn wrote:
> I mispoke before though - it's not the hierarchy's root dentry,
> but rather a dentry for a descendent cgroup which will become the
> root dentry for the new superblock.  We do know that there must be
> a css_set with a cgroup.  I'm still trying to track down whether
> that cgrou's inode's dentry can ever be flushed.  I would think
> not but am not sure.

Hmmm... I'm not really following.  The inode can be flushed and that's
why it needs to be walked down from root.  What am I missing here?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-02 16:53                   ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-02 16:53 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

Hello, Serge.

On Tue, Dec 01, 2015 at 03:58:53PM -0600, Serge E. Hallyn wrote:
> I mispoke before though - it's not the hierarchy's root dentry,
> but rather a dentry for a descendent cgroup which will become the
> root dentry for the new superblock.  We do know that there must be
> a css_set with a cgroup.  I'm still trying to track down whether
> that cgrou's inode's dentry can ever be flushed.  I would think
> not but am not sure.

Hmmm... I'm not really following.  The inode can be flushed and that's
why it needs to be walked down from root.  What am I missing here?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                   ` <20151202165312.GB19878-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-02 16:56                     ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-02 16:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge E. Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Dec 02, 2015 at 11:53:12AM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Tue, Dec 01, 2015 at 03:58:53PM -0600, Serge E. Hallyn wrote:
> > I mispoke before though - it's not the hierarchy's root dentry,
> > but rather a dentry for a descendent cgroup which will become the
> > root dentry for the new superblock.  We do know that there must be
> > a css_set with a cgroup.  I'm still trying to track down whether
> > that cgrou's inode's dentry can ever be flushed.  I would think
> > not but am not sure.
> 
> Hmmm... I'm not really following.  The inode can be flushed and that's
> why it needs to be walked down from root.  What am I missing here?

Can it be flushed when we know that the cgroup is being pinned by
a css_set?  (There's either a task or a cgroup_namespace pinning it
or we wouldn't get here)

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                   ` <20151202165312.GB19878-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-02 16:56                     ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-02 16:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge, linux-kernel, adityakali, linux-api,
	containers, cgroups, lxc-devel, akpm, ebiederm

On Wed, Dec 02, 2015 at 11:53:12AM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Tue, Dec 01, 2015 at 03:58:53PM -0600, Serge E. Hallyn wrote:
> > I mispoke before though - it's not the hierarchy's root dentry,
> > but rather a dentry for a descendent cgroup which will become the
> > root dentry for the new superblock.  We do know that there must be
> > a css_set with a cgroup.  I'm still trying to track down whether
> > that cgrou's inode's dentry can ever be flushed.  I would think
> > not but am not sure.
> 
> Hmmm... I'm not really following.  The inode can be flushed and that's
> why it needs to be walked down from root.  What am I missing here?

Can it be flushed when we know that the cgroup is being pinned by
a css_set?  (There's either a task or a cgroup_namespace pinning it
or we wouldn't get here)

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-02 16:56                     ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-02 16:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Wed, Dec 02, 2015 at 11:53:12AM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Tue, Dec 01, 2015 at 03:58:53PM -0600, Serge E. Hallyn wrote:
> > I mispoke before though - it's not the hierarchy's root dentry,
> > but rather a dentry for a descendent cgroup which will become the
> > root dentry for the new superblock.  We do know that there must be
> > a css_set with a cgroup.  I'm still trying to track down whether
> > that cgrou's inode's dentry can ever be flushed.  I would think
> > not but am not sure.
> 
> Hmmm... I'm not really following.  The inode can be flushed and that's
> why it needs to be walked down from root.  What am I missing here?

Can it be flushed when we know that the cgroup is being pinned by
a css_set?  (There's either a task or a cgroup_namespace pinning it
or we wouldn't get here)

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                     ` <20151202165637.GA20840-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-12-02 16:58                       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-02 16:58 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> Can it be flushed when we know that the cgroup is being pinned by
> a css_set?  (There's either a task or a cgroup_namespace pinning it
> or we wouldn't get here)

Yeap, it can be flushed.  There's no ref coming out of cgroup to the
vfs objects.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                     ` <20151202165637.GA20840-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-12-02 16:58                       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-02 16:58 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> Can it be flushed when we know that the cgroup is being pinned by
> a css_set?  (There's either a task or a cgroup_namespace pinning it
> or we wouldn't get here)

Yeap, it can be flushed.  There's no ref coming out of cgroup to the
vfs objects.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-02 16:58                       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-02 16:58 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> Can it be flushed when we know that the cgroup is being pinned by
> a css_set?  (There's either a task or a cgroup_namespace pinning it
> or we wouldn't get here)

Yeap, it can be flushed.  There's no ref coming out of cgroup to the
vfs objects.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                       ` <20151202165839.GD19878-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-02 17:02                         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-02 17:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge E. Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Dec 02, 2015 at 11:58:39AM -0500, Tejun Heo wrote:
> On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> > Can it be flushed when we know that the cgroup is being pinned by
> > a css_set?  (There's either a task or a cgroup_namespace pinning it
> > or we wouldn't get here)
> 
> Yeap, it can be flushed.  There's no ref coming out of cgroup to the
> vfs objects.

Ok, thanks.  Still seems to me to be more work to actually walk the
path ourselves, but I'll go that route and see what it looks like :)

thanks
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                       ` <20151202165839.GD19878-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-02 17:02                         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-02 17:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge, linux-kernel, adityakali, linux-api,
	containers, cgroups, lxc-devel, akpm, ebiederm

On Wed, Dec 02, 2015 at 11:58:39AM -0500, Tejun Heo wrote:
> On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> > Can it be flushed when we know that the cgroup is being pinned by
> > a css_set?  (There's either a task or a cgroup_namespace pinning it
> > or we wouldn't get here)
> 
> Yeap, it can be flushed.  There's no ref coming out of cgroup to the
> vfs objects.

Ok, thanks.  Still seems to me to be more work to actually walk the
path ourselves, but I'll go that route and see what it looks like :)

thanks
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-02 17:02                         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-02 17:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	serge-A9i7LUbDfNHQT0dZR+AlfA

On Wed, Dec 02, 2015 at 11:58:39AM -0500, Tejun Heo wrote:
> On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> > Can it be flushed when we know that the cgroup is being pinned by
> > a css_set?  (There's either a task or a cgroup_namespace pinning it
> > or we wouldn't get here)
> 
> Yeap, it can be flushed.  There's no ref coming out of cgroup to the
> vfs objects.

Ok, thanks.  Still seems to me to be more work to actually walk the
path ourselves, but I'll go that route and see what it looks like :)

thanks
-serge
_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
  2015-12-02 17:02                         ` Serge E. Hallyn
@ 2015-12-02 17:05                             ` Tejun Heo
  -1 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-02 17:05 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Dec 02, 2015 at 11:02:39AM -0600, Serge E. Hallyn wrote:
> On Wed, Dec 02, 2015 at 11:58:39AM -0500, Tejun Heo wrote:
> > On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> > > Can it be flushed when we know that the cgroup is being pinned by
> > > a css_set?  (There's either a task or a cgroup_namespace pinning it
> > > or we wouldn't get here)
> > 
> > Yeap, it can be flushed.  There's no ref coming out of cgroup to the
> > vfs objects.
> 
> Ok, thanks.  Still seems to me to be more work to actually walk the
> path ourselves, but I'll go that route and see what it looks like :)

I just dislike having two separate paths instantiating the same
objects and would prefer doing it the same way userland would do if
that isn't too complex but yeah it might turn out to be a lot more
work.

Thanks a lot!

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-02 17:05                             ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-02 17:05 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

On Wed, Dec 02, 2015 at 11:02:39AM -0600, Serge E. Hallyn wrote:
> On Wed, Dec 02, 2015 at 11:58:39AM -0500, Tejun Heo wrote:
> > On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> > > Can it be flushed when we know that the cgroup is being pinned by
> > > a css_set?  (There's either a task or a cgroup_namespace pinning it
> > > or we wouldn't get here)
> > 
> > Yeap, it can be flushed.  There's no ref coming out of cgroup to the
> > vfs objects.
> 
> Ok, thanks.  Still seems to me to be more work to actually walk the
> path ourselves, but I'll go that route and see what it looks like :)

I just dislike having two separate paths instantiating the same
objects and would prefer doing it the same way userland would do if
that isn't too complex but yeah it might turn out to be a lot more
work.

Thanks a lot!

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                             ` <20151202170551.GE19878-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-03 22:47                               ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-03 22:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge E. Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Dec 02, 2015 at 12:05:51PM -0500, Tejun Heo wrote:
> On Wed, Dec 02, 2015 at 11:02:39AM -0600, Serge E. Hallyn wrote:
> > On Wed, Dec 02, 2015 at 11:58:39AM -0500, Tejun Heo wrote:
> > > On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> > > > Can it be flushed when we know that the cgroup is being pinned by
> > > > a css_set?  (There's either a task or a cgroup_namespace pinning it
> > > > or we wouldn't get here)
> > > 
> > > Yeap, it can be flushed.  There's no ref coming out of cgroup to the
> > > vfs objects.
> > 
> > Ok, thanks.  Still seems to me to be more work to actually walk the
> > path ourselves, but I'll go that route and see what it looks like :)
> 
> I just dislike having two separate paths instantiating the same
> objects and would prefer doing it the same way userland would do if
> that isn't too complex but yeah it might turn out to be a lot more
> work.
> 
> Thanks a lot!

Here's a patch to make that change.  Seems to be working for me.  If it
looks ok I can fold it into the prevoius patches and resend the new set.

PATCH 1/1] kernfs_obtain_root: switch to walking the path [fold up]

Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 fs/kernfs/mount.c | 80 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 33 deletions(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index cc41fe1..027f4ca 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -14,6 +14,7 @@
 #include <linux/magic.h>
 #include <linux/slab.h>
 #include <linux/pagemap.h>
+#include <linux/namei.h>
 
 #include "kernfs-internal.h"
 
@@ -62,6 +63,27 @@ struct kernfs_root *kernfs_root_from_sb(struct super_block *sb)
 	return NULL;
 }
 
+/*
+ * find the next ancestor in the path down to @child, where @parent was the
+ * parent whose child we want to find.
+ *
+ * Say the path is /a/b/c/d.  @child is d, @parent is NULL.  We return the root
+ * node.  If @parent is b, then we return the node for c.
+ * Passing in d as @parent is not ok.
+ */
+static struct kernfs_node *
+find_kn_ancestor_below(struct kernfs_node *child, struct kernfs_node *parent)
+{
+	BUG_ON(child == parent);
+
+	while (child->parent != parent) {
+		BUG_ON(!child->parent);
+		child = child->parent;
+	}
+
+	return child;
+}
+
 /**
  * kernfs_obtain_root - get a dentry for the given kernfs_node
  * @sb: the kernfs super_block
@@ -74,42 +96,34 @@ struct dentry *kernfs_obtain_root(struct super_block *sb,
 				  struct kernfs_node *kn)
 {
 	struct dentry *dentry;
-	struct inode *inode;
+	struct kernfs_node *knparent = NULL;
 
 	BUG_ON(sb->s_op != &kernfs_sops);
 
-	/* inode for the given kernfs_node should already exist. */
-	inode = kernfs_get_inode(sb, kn);
-	if (!inode) {
-		pr_debug("kernfs: could not get inode for '");
-		pr_cont_kernfs_path(kn);
-		pr_cont("'.\n");
-		return ERR_PTR(-EINVAL);
-	}
-
-	/* instantiate and link root dentry */
-	dentry = d_obtain_root(inode);
-	if (!dentry) {
-		pr_debug("kernfs: could not get dentry for '");
-		pr_cont_kernfs_path(kn);
-		pr_cont("'.\n");
-		return ERR_PTR(-ENOMEM);
-	}
-
-	/*
-	 * If this is a new dentry, set it up. We need kernfs_mutex because
-	 * this may be called by callers other than kernfs_fill_super.
-	 */
-	mutex_lock(&kernfs_mutex);
-	if (!dentry->d_fsdata) {
-		kernfs_get(kn);
-		dentry->d_fsdata = kn;
-	} else {
-		WARN_ON(dentry->d_fsdata != kn);
-	}
-	mutex_unlock(&kernfs_mutex);
-
-	return dentry;
+	dentry = dget(sb->s_root);
+	if (!kn->parent) // this is the root
+		return dentry;
+
+	knparent = find_kn_ancestor_below(kn, NULL);
+	BUG_ON(!knparent);
+
+	do {
+		struct dentry *dtmp;
+		struct kernfs_node *kntmp;
+
+		if (kn == knparent)
+			return dentry;
+		kntmp = find_kn_ancestor_below(kn, knparent);
+		BUG_ON(!kntmp);
+		dtmp = lookup_one_len(kntmp->name, dentry, strlen(kntmp->name));
+		dput(dentry);
+		if (IS_ERR(dtmp))
+			return dtmp;
+		knparent = kntmp;
+		dentry = dtmp;
+	} while (1);
+
+	// notreached
 }
 
 static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                             ` <20151202170551.GE19878-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-03 22:47                               ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-03 22:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge, linux-kernel, adityakali, linux-api,
	containers, cgroups, lxc-devel, akpm, ebiederm

On Wed, Dec 02, 2015 at 12:05:51PM -0500, Tejun Heo wrote:
> On Wed, Dec 02, 2015 at 11:02:39AM -0600, Serge E. Hallyn wrote:
> > On Wed, Dec 02, 2015 at 11:58:39AM -0500, Tejun Heo wrote:
> > > On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> > > > Can it be flushed when we know that the cgroup is being pinned by
> > > > a css_set?  (There's either a task or a cgroup_namespace pinning it
> > > > or we wouldn't get here)
> > > 
> > > Yeap, it can be flushed.  There's no ref coming out of cgroup to the
> > > vfs objects.
> > 
> > Ok, thanks.  Still seems to me to be more work to actually walk the
> > path ourselves, but I'll go that route and see what it looks like :)
> 
> I just dislike having two separate paths instantiating the same
> objects and would prefer doing it the same way userland would do if
> that isn't too complex but yeah it might turn out to be a lot more
> work.
> 
> Thanks a lot!

Here's a patch to make that change.  Seems to be working for me.  If it
looks ok I can fold it into the prevoius patches and resend the new set.

PATCH 1/1] kernfs_obtain_root: switch to walking the path [fold up]

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
 fs/kernfs/mount.c | 80 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 33 deletions(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index cc41fe1..027f4ca 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -14,6 +14,7 @@
 #include <linux/magic.h>
 #include <linux/slab.h>
 #include <linux/pagemap.h>
+#include <linux/namei.h>
 
 #include "kernfs-internal.h"
 
@@ -62,6 +63,27 @@ struct kernfs_root *kernfs_root_from_sb(struct super_block *sb)
 	return NULL;
 }
 
+/*
+ * find the next ancestor in the path down to @child, where @parent was the
+ * parent whose child we want to find.
+ *
+ * Say the path is /a/b/c/d.  @child is d, @parent is NULL.  We return the root
+ * node.  If @parent is b, then we return the node for c.
+ * Passing in d as @parent is not ok.
+ */
+static struct kernfs_node *
+find_kn_ancestor_below(struct kernfs_node *child, struct kernfs_node *parent)
+{
+	BUG_ON(child == parent);
+
+	while (child->parent != parent) {
+		BUG_ON(!child->parent);
+		child = child->parent;
+	}
+
+	return child;
+}
+
 /**
  * kernfs_obtain_root - get a dentry for the given kernfs_node
  * @sb: the kernfs super_block
@@ -74,42 +96,34 @@ struct dentry *kernfs_obtain_root(struct super_block *sb,
 				  struct kernfs_node *kn)
 {
 	struct dentry *dentry;
-	struct inode *inode;
+	struct kernfs_node *knparent = NULL;
 
 	BUG_ON(sb->s_op != &kernfs_sops);
 
-	/* inode for the given kernfs_node should already exist. */
-	inode = kernfs_get_inode(sb, kn);
-	if (!inode) {
-		pr_debug("kernfs: could not get inode for '");
-		pr_cont_kernfs_path(kn);
-		pr_cont("'.\n");
-		return ERR_PTR(-EINVAL);
-	}
-
-	/* instantiate and link root dentry */
-	dentry = d_obtain_root(inode);
-	if (!dentry) {
-		pr_debug("kernfs: could not get dentry for '");
-		pr_cont_kernfs_path(kn);
-		pr_cont("'.\n");
-		return ERR_PTR(-ENOMEM);
-	}
-
-	/*
-	 * If this is a new dentry, set it up. We need kernfs_mutex because
-	 * this may be called by callers other than kernfs_fill_super.
-	 */
-	mutex_lock(&kernfs_mutex);
-	if (!dentry->d_fsdata) {
-		kernfs_get(kn);
-		dentry->d_fsdata = kn;
-	} else {
-		WARN_ON(dentry->d_fsdata != kn);
-	}
-	mutex_unlock(&kernfs_mutex);
-
-	return dentry;
+	dentry = dget(sb->s_root);
+	if (!kn->parent) // this is the root
+		return dentry;
+
+	knparent = find_kn_ancestor_below(kn, NULL);
+	BUG_ON(!knparent);
+
+	do {
+		struct dentry *dtmp;
+		struct kernfs_node *kntmp;
+
+		if (kn == knparent)
+			return dentry;
+		kntmp = find_kn_ancestor_below(kn, knparent);
+		BUG_ON(!kntmp);
+		dtmp = lookup_one_len(kntmp->name, dentry, strlen(kntmp->name));
+		dput(dentry);
+		if (IS_ERR(dtmp))
+			return dtmp;
+		knparent = kntmp;
+		dentry = dtmp;
+	} while (1);
+
+	// notreached
 }
 
 static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-03 22:47                               ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-03 22:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, serge-A9i7LUbDfNHQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w

On Wed, Dec 02, 2015 at 12:05:51PM -0500, Tejun Heo wrote:
> On Wed, Dec 02, 2015 at 11:02:39AM -0600, Serge E. Hallyn wrote:
> > On Wed, Dec 02, 2015 at 11:58:39AM -0500, Tejun Heo wrote:
> > > On Wed, Dec 02, 2015 at 10:56:37AM -0600, Serge E. Hallyn wrote:
> > > > Can it be flushed when we know that the cgroup is being pinned by
> > > > a css_set?  (There's either a task or a cgroup_namespace pinning it
> > > > or we wouldn't get here)
> > > 
> > > Yeap, it can be flushed.  There's no ref coming out of cgroup to the
> > > vfs objects.
> > 
> > Ok, thanks.  Still seems to me to be more work to actually walk the
> > path ourselves, but I'll go that route and see what it looks like :)
> 
> I just dislike having two separate paths instantiating the same
> objects and would prefer doing it the same way userland would do if
> that isn't too complex but yeah it might turn out to be a lot more
> work.
> 
> Thanks a lot!

Here's a patch to make that change.  Seems to be working for me.  If it
looks ok I can fold it into the prevoius patches and resend the new set.

PATCH 1/1] kernfs_obtain_root: switch to walking the path [fold up]

Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
---
 fs/kernfs/mount.c | 80 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 33 deletions(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index cc41fe1..027f4ca 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -14,6 +14,7 @@
 #include <linux/magic.h>
 #include <linux/slab.h>
 #include <linux/pagemap.h>
+#include <linux/namei.h>
 
 #include "kernfs-internal.h"
 
@@ -62,6 +63,27 @@ struct kernfs_root *kernfs_root_from_sb(struct super_block *sb)
 	return NULL;
 }
 
+/*
+ * find the next ancestor in the path down to @child, where @parent was the
+ * parent whose child we want to find.
+ *
+ * Say the path is /a/b/c/d.  @child is d, @parent is NULL.  We return the root
+ * node.  If @parent is b, then we return the node for c.
+ * Passing in d as @parent is not ok.
+ */
+static struct kernfs_node *
+find_kn_ancestor_below(struct kernfs_node *child, struct kernfs_node *parent)
+{
+	BUG_ON(child == parent);
+
+	while (child->parent != parent) {
+		BUG_ON(!child->parent);
+		child = child->parent;
+	}
+
+	return child;
+}
+
 /**
  * kernfs_obtain_root - get a dentry for the given kernfs_node
  * @sb: the kernfs super_block
@@ -74,42 +96,34 @@ struct dentry *kernfs_obtain_root(struct super_block *sb,
 				  struct kernfs_node *kn)
 {
 	struct dentry *dentry;
-	struct inode *inode;
+	struct kernfs_node *knparent = NULL;
 
 	BUG_ON(sb->s_op != &kernfs_sops);
 
-	/* inode for the given kernfs_node should already exist. */
-	inode = kernfs_get_inode(sb, kn);
-	if (!inode) {
-		pr_debug("kernfs: could not get inode for '");
-		pr_cont_kernfs_path(kn);
-		pr_cont("'.\n");
-		return ERR_PTR(-EINVAL);
-	}
-
-	/* instantiate and link root dentry */
-	dentry = d_obtain_root(inode);
-	if (!dentry) {
-		pr_debug("kernfs: could not get dentry for '");
-		pr_cont_kernfs_path(kn);
-		pr_cont("'.\n");
-		return ERR_PTR(-ENOMEM);
-	}
-
-	/*
-	 * If this is a new dentry, set it up. We need kernfs_mutex because
-	 * this may be called by callers other than kernfs_fill_super.
-	 */
-	mutex_lock(&kernfs_mutex);
-	if (!dentry->d_fsdata) {
-		kernfs_get(kn);
-		dentry->d_fsdata = kn;
-	} else {
-		WARN_ON(dentry->d_fsdata != kn);
-	}
-	mutex_unlock(&kernfs_mutex);
-
-	return dentry;
+	dentry = dget(sb->s_root);
+	if (!kn->parent) // this is the root
+		return dentry;
+
+	knparent = find_kn_ancestor_below(kn, NULL);
+	BUG_ON(!knparent);
+
+	do {
+		struct dentry *dtmp;
+		struct kernfs_node *kntmp;
+
+		if (kn == knparent)
+			return dentry;
+		kntmp = find_kn_ancestor_below(kn, knparent);
+		BUG_ON(!kntmp);
+		dtmp = lookup_one_len(kntmp->name, dentry, strlen(kntmp->name));
+		dput(dentry);
+		if (IS_ERR(dtmp))
+			return dtmp;
+		knparent = kntmp;
+		dentry = dtmp;
+	} while (1);
+
+	// notreached
 }
 
 static int kernfs_fill_super(struct super_block *sb, unsigned long magic)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                               ` <20151203224706.GA19971-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-12-07 15:39                                 ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-07 15:39 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello, Serge.

On Thu, Dec 03, 2015 at 04:47:06PM -0600, Serge E. Hallyn wrote:
...
> +	dentry = dget(sb->s_root);
> +	if (!kn->parent) // this is the root
> +		return dentry;
> +
> +	knparent = find_kn_ancestor_below(kn, NULL);
> +	BUG_ON(!knparent);

Doing WARN_ON() and returning failure is better, I think.  Failing ns
mount is an okay failure mode and a lot better than crashing the
system.  Also, how about find_next_ancestor() for the name of the
function?

> +	do {
> +		struct dentry *dtmp;
> +		struct kernfs_node *kntmp;
> +
> +		if (kn == knparent)
> +			return dentry;
> +		kntmp = find_kn_ancestor_below(kn, knparent);
> +		BUG_ON(!kntmp);
> +		dtmp = lookup_one_len(kntmp->name, dentry, strlen(kntmp->name));
> +		dput(dentry);
> +		if (IS_ERR(dtmp))
> +			return dtmp;
> +		knparent = kntmp;
> +		dentry = dtmp;
> +	} while (1);

Other than the nitpicks, looks good to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
       [not found]                               ` <20151203224706.GA19971-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2015-12-07 15:39                                 ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-07 15:39 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Hello, Serge.

On Thu, Dec 03, 2015 at 04:47:06PM -0600, Serge E. Hallyn wrote:
...
> +	dentry = dget(sb->s_root);
> +	if (!kn->parent) // this is the root
> +		return dentry;
> +
> +	knparent = find_kn_ancestor_below(kn, NULL);
> +	BUG_ON(!knparent);

Doing WARN_ON() and returning failure is better, I think.  Failing ns
mount is an okay failure mode and a lot better than crashing the
system.  Also, how about find_next_ancestor() for the name of the
function?

> +	do {
> +		struct dentry *dtmp;
> +		struct kernfs_node *kntmp;
> +
> +		if (kn == knparent)
> +			return dentry;
> +		kntmp = find_kn_ancestor_below(kn, knparent);
> +		BUG_ON(!kntmp);
> +		dtmp = lookup_one_len(kntmp->name, dentry, strlen(kntmp->name));
> +		dput(dentry);
> +		if (IS_ERR(dtmp))
> +			return dtmp;
> +		knparent = kntmp;
> +		dentry = dtmp;
> +	} while (1);

Other than the nitpicks, looks good to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-07 15:39                                 ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-07 15:39 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	serge-A9i7LUbDfNHQT0dZR+AlfA

Hello, Serge.

On Thu, Dec 03, 2015 at 04:47:06PM -0600, Serge E. Hallyn wrote:
...
> +	dentry = dget(sb->s_root);
> +	if (!kn->parent) // this is the root
> +		return dentry;
> +
> +	knparent = find_kn_ancestor_below(kn, NULL);
> +	BUG_ON(!knparent);

Doing WARN_ON() and returning failure is better, I think.  Failing ns
mount is an okay failure mode and a lot better than crashing the
system.  Also, how about find_next_ancestor() for the name of the
function?

> +	do {
> +		struct dentry *dtmp;
> +		struct kernfs_node *kntmp;
> +
> +		if (kn == knparent)
> +			return dentry;
> +		kntmp = find_kn_ancestor_below(kn, knparent);
> +		BUG_ON(!kntmp);
> +		dtmp = lookup_one_len(kntmp->name, dentry, strlen(kntmp->name));
> +		dput(dentry);
> +		if (IS_ERR(dtmp))
> +			return dtmp;
> +		knparent = kntmp;
> +		dentry = dtmp;
> +	} while (1);

Other than the nitpicks, looks good to me.

Thanks.

-- 
tejun
_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
  2015-12-07 15:39                                 ` Tejun Heo
@ 2015-12-07 15:53                                     ` Serge Hallyn
  -1 siblings, 0 replies; 180+ messages in thread
From: Serge Hallyn @ 2015-12-07 15:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org):
> Hello, Serge.
> 
> On Thu, Dec 03, 2015 at 04:47:06PM -0600, Serge E. Hallyn wrote:
> ...
> > +	dentry = dget(sb->s_root);
> > +	if (!kn->parent) // this is the root
> > +		return dentry;
> > +
> > +	knparent = find_kn_ancestor_below(kn, NULL);
> > +	BUG_ON(!knparent);
> 
> Doing WARN_ON() and returning failure is better, I think.  Failing ns
> mount is an okay failure mode and a lot better than crashing the
> system.

Ok - this shouldn't be user-triggerable, so if it happens it really
is a bug in our code, but I'll change it,

> Also, how about find_next_ancestor() for the name of the
> function?

Yeah it's static anyway :)

will change, squash, and resend the set.

> > +	do {
> > +		struct dentry *dtmp;
> > +		struct kernfs_node *kntmp;
> > +
> > +		if (kn == knparent)
> > +			return dentry;
> > +		kntmp = find_kn_ancestor_below(kn, knparent);
> > +		BUG_ON(!kntmp);
> > +		dtmp = lookup_one_len(kntmp->name, dentry, strlen(kntmp->name));
> > +		dput(dentry);
> > +		if (IS_ERR(dtmp))
> > +			return dtmp;
> > +		knparent = kntmp;
> > +		dentry = dtmp;
> > +	} while (1);
> 
> Other than the nitpicks, looks good to me.
> 
> Thanks.
> 
> -- 
> tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns
@ 2015-12-07 15:53                                     ` Serge Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge Hallyn @ 2015-12-07 15:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge, linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm

Quoting Tejun Heo (tj@kernel.org):
> Hello, Serge.
> 
> On Thu, Dec 03, 2015 at 04:47:06PM -0600, Serge E. Hallyn wrote:
> ...
> > +	dentry = dget(sb->s_root);
> > +	if (!kn->parent) // this is the root
> > +		return dentry;
> > +
> > +	knparent = find_kn_ancestor_below(kn, NULL);
> > +	BUG_ON(!knparent);
> 
> Doing WARN_ON() and returning failure is better, I think.  Failing ns
> mount is an okay failure mode and a lot better than crashing the
> system.

Ok - this shouldn't be user-triggerable, so if it happens it really
is a bug in our code, but I'll change it,

> Also, how about find_next_ancestor() for the name of the
> function?

Yeah it's static anyway :)

will change, squash, and resend the set.

> > +	do {
> > +		struct dentry *dtmp;
> > +		struct kernfs_node *kntmp;
> > +
> > +		if (kn == knparent)
> > +			return dentry;
> > +		kntmp = find_kn_ancestor_below(kn, knparent);
> > +		BUG_ON(!kntmp);
> > +		dtmp = lookup_one_len(kntmp->name, dentry, strlen(kntmp->name));
> > +		dput(dentry);
> > +		if (IS_ERR(dtmp))
> > +			return dtmp;
> > +		knparent = kntmp;
> > +		dentry = dtmp;
> > +	} while (1);
> 
> Other than the nitpicks, looks good to me.
> 
> Thanks.
> 
> -- 
> tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found] ` <1454057651-23959-1-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
@ 2016-01-29  8:54   ` serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
  0 siblings, 0 replies; 180+ messages in thread
From: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA @ 2016-01-29  8:54 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
---
Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment
Changelog 20151208:
  - kernfs_node_distance:
    * Remove BUG_ON(NULL)s
    * Rename kernfs_node_distance to kernfs_depth
  - kernfs_common-ancestor:
    * Remove useless checks for depth == 0
    * Add check to ensure nodes are from same root
  - kernfs_path_from_node_locked:
    * Remove needless __must_check
    * Put p;len on its own decl line.
    * Fix wrong WARN_ONCE usage
Changelog 20151209:
  - kernfs_path_from_node: change arguments to 'to' and 'from', and
    change their order.
Changelog 20151222:
  - kernfs_path_from_node{,_locked}: return the string length.
    kernfs_path is gpl-exported, so changing their return value seemed
    ill-advised, but if noone minds I can update it too.
Changelog 20151223:
  - don't allocate memory pr_cont_kernfs_path() under spinlock
---
 fs/kernfs/dir.c        |  192 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |    9 ++-
 2 files changed, 166 insertions(+), 35 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 996b774..38fa03a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,123 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+						  struct kernfs_node *b)
+{
+	size_t da, db;
+	struct kernfs_root *ra = kernfs_root(a), *rb = kernfs_root(b);
 
-	return p;
+	if (ra != rb)
+		return NULL;
+
+	da = kernfs_depth(ra->kn, a);
+	db = kernfs_depth(rb->kn, b);
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ *
+ * return value: length of the string.  If greater than buflen,
+ * then contents of buf are undefined.  On error, -1 is returned.
+ */
+static int
+kernfs_path_from_node_locked(struct kernfs_node *kn_to,
+			     struct kernfs_node *kn_from, char *buf,
+			     size_t buflen)
+{
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	size_t depth_from, depth_to, len = 0, nlen = 0;
+	char *p;
+	int i;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to)
+		return strlcpy(buf, "/", buflen);
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (WARN_ON(!common))
+		return -1;
+
+	depth_to = kernfs_depth(common, kn_to);
+	depth_from = kernfs_depth(common, kn_from);
+
+	if (buf)
+		buf[0] = '\0';
+
+	for (i = 0; i < depth_from; i++)
+		len += strlcpy(buf + len, parent_str,
+			       len < buflen ? buflen - len : 0);
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
+
+	if (len + nlen >= buflen)
+		return len + nlen;
+
+	p = buf + len + nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return len + nlen;
 }
 
 /**
@@ -115,6 +210,34 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
+ * kernfs_path_from_node - build path of node @to relative to @from.
+ * @from: parent kernfs_node relative to which we need to build the path
+ * @to: kernfs_node of interest
+ * @buf: buffer to copy @to's path into
+ * @buflen: size of @buf
+ *
+ * Builds @to's path relative to @from in @buf. @from and @to must
+ * be on the same kernfs-root. If @from is not parent of @to, then a relative
+ * path (which includes '..'s) as needed to reach from @from to @to is
+ * returned.
+ *
+ * If @buf isn't long enough, the return value will be greater than @buflen
+ * and @buf contents are undefined.
+ */
+int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
+			  char *buf, size_t buflen)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&kernfs_rename_lock, flags);
+	ret = kernfs_path_from_node_locked(to, from, buf, buflen);
+	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
  * kernfs_path - build full path of a given node
  * @kn: kernfs_node of interest
  * @buf: buffer to copy @kn's name into
@@ -127,13 +250,12 @@ size_t kernfs_path_len(struct kernfs_node *kn)
  */
 char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
 {
-	unsigned long flags;
-	char *p;
+	int ret;
 
-	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
-	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
-	return p;
+	ret = kernfs_path_from_node(kn, NULL, buf, buflen);
+	if (ret < 0 || ret >= buflen)
+		return NULL;
+	return buf;
 }
 EXPORT_SYMBOL_GPL(kernfs_path);
 
@@ -164,17 +286,25 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
 void pr_cont_kernfs_path(struct kernfs_node *kn)
 {
 	unsigned long flags;
-	char *p;
+	int sz;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
-	if (p)
-		pr_cont("%s", p);
-	else
-		pr_cont("<name too long>");
+	sz = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
+					  sizeof(kernfs_pr_cont_buf));
+	if (sz < 0) {
+		pr_cont("(error)");
+		goto out;
+	}
+
+	if (sz >= sizeof(kernfs_pr_cont_buf)) {
+		pr_cont("(name too long)");
+		goto out;
+	}
+
+	pr_cont("%s", kernfs_pr_cont_buf);
 
+out:
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 }
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index af51df3..716bfde 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,8 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
-char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-				size_t buflen);
+int kernfs_path_from_node(struct kernfs_node *root_kn, struct kernfs_node *kn,
+			  char *buf, size_t buflen);
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
 void pr_cont_kernfs_path(struct kernfs_node *kn);
 struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn);
@@ -338,8 +339,8 @@ static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 static inline size_t kernfs_path_len(struct kernfs_node *kn)
 { return 0; }
 
-static inline char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+static inline char *kernfs_path(struct kernfs_node *kn, char *buf,
+				size_t buflen)
 { return NULL; }
 
 static inline void pr_cont_kernfs_name(struct kernfs_node *kn) { }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found] ` <1454057651-23959-1-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
@ 2016-01-29  8:54   ` serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
  0 siblings, 0 replies; 180+ messages in thread
From: serge.hallyn @ 2016-01-29  8:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm, gregkh, lizefan, hannes, Serge E. Hallyn

From: Aditya Kali <adityakali@google.com>

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment
Changelog 20151208:
  - kernfs_node_distance:
    * Remove BUG_ON(NULL)s
    * Rename kernfs_node_distance to kernfs_depth
  - kernfs_common-ancestor:
    * Remove useless checks for depth == 0
    * Add check to ensure nodes are from same root
  - kernfs_path_from_node_locked:
    * Remove needless __must_check
    * Put p;len on its own decl line.
    * Fix wrong WARN_ONCE usage
Changelog 20151209:
  - kernfs_path_from_node: change arguments to 'to' and 'from', and
    change their order.
Changelog 20151222:
  - kernfs_path_from_node{,_locked}: return the string length.
    kernfs_path is gpl-exported, so changing their return value seemed
    ill-advised, but if noone minds I can update it too.
Changelog 20151223:
  - don't allocate memory pr_cont_kernfs_path() under spinlock
---
 fs/kernfs/dir.c        |  192 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |    9 ++-
 2 files changed, 166 insertions(+), 35 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 996b774..38fa03a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,123 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+						  struct kernfs_node *b)
+{
+	size_t da, db;
+	struct kernfs_root *ra = kernfs_root(a), *rb = kernfs_root(b);
 
-	return p;
+	if (ra != rb)
+		return NULL;
+
+	da = kernfs_depth(ra->kn, a);
+	db = kernfs_depth(rb->kn, b);
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ *
+ * return value: length of the string.  If greater than buflen,
+ * then contents of buf are undefined.  On error, -1 is returned.
+ */
+static int
+kernfs_path_from_node_locked(struct kernfs_node *kn_to,
+			     struct kernfs_node *kn_from, char *buf,
+			     size_t buflen)
+{
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	size_t depth_from, depth_to, len = 0, nlen = 0;
+	char *p;
+	int i;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to)
+		return strlcpy(buf, "/", buflen);
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (WARN_ON(!common))
+		return -1;
+
+	depth_to = kernfs_depth(common, kn_to);
+	depth_from = kernfs_depth(common, kn_from);
+
+	if (buf)
+		buf[0] = '\0';
+
+	for (i = 0; i < depth_from; i++)
+		len += strlcpy(buf + len, parent_str,
+			       len < buflen ? buflen - len : 0);
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
+
+	if (len + nlen >= buflen)
+		return len + nlen;
+
+	p = buf + len + nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return len + nlen;
 }
 
 /**
@@ -115,6 +210,34 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
+ * kernfs_path_from_node - build path of node @to relative to @from.
+ * @from: parent kernfs_node relative to which we need to build the path
+ * @to: kernfs_node of interest
+ * @buf: buffer to copy @to's path into
+ * @buflen: size of @buf
+ *
+ * Builds @to's path relative to @from in @buf. @from and @to must
+ * be on the same kernfs-root. If @from is not parent of @to, then a relative
+ * path (which includes '..'s) as needed to reach from @from to @to is
+ * returned.
+ *
+ * If @buf isn't long enough, the return value will be greater than @buflen
+ * and @buf contents are undefined.
+ */
+int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
+			  char *buf, size_t buflen)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&kernfs_rename_lock, flags);
+	ret = kernfs_path_from_node_locked(to, from, buf, buflen);
+	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
  * kernfs_path - build full path of a given node
  * @kn: kernfs_node of interest
  * @buf: buffer to copy @kn's name into
@@ -127,13 +250,12 @@ size_t kernfs_path_len(struct kernfs_node *kn)
  */
 char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
 {
-	unsigned long flags;
-	char *p;
+	int ret;
 
-	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
-	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
-	return p;
+	ret = kernfs_path_from_node(kn, NULL, buf, buflen);
+	if (ret < 0 || ret >= buflen)
+		return NULL;
+	return buf;
 }
 EXPORT_SYMBOL_GPL(kernfs_path);
 
@@ -164,17 +286,25 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
 void pr_cont_kernfs_path(struct kernfs_node *kn)
 {
 	unsigned long flags;
-	char *p;
+	int sz;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
-	if (p)
-		pr_cont("%s", p);
-	else
-		pr_cont("<name too long>");
+	sz = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
+					  sizeof(kernfs_pr_cont_buf));
+	if (sz < 0) {
+		pr_cont("(error)");
+		goto out;
+	}
+
+	if (sz >= sizeof(kernfs_pr_cont_buf)) {
+		pr_cont("(name too long)");
+		goto out;
+	}
+
+	pr_cont("%s", kernfs_pr_cont_buf);
 
+out:
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 }
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index af51df3..716bfde 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,8 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
-char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-				size_t buflen);
+int kernfs_path_from_node(struct kernfs_node *root_kn, struct kernfs_node *kn,
+			  char *buf, size_t buflen);
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
 void pr_cont_kernfs_path(struct kernfs_node *kn);
 struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn);
@@ -338,8 +339,8 @@ static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 static inline size_t kernfs_path_len(struct kernfs_node *kn)
 { return 0; }
 
-static inline char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+static inline char *kernfs_path(struct kernfs_node *kn, char *buf,
+				size_t buflen)
 { return NULL; }
 
 static inline void pr_cont_kernfs_name(struct kernfs_node *kn) { }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2016-01-29  8:54   ` serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
  0 siblings, 0 replies; 180+ messages in thread
From: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA @ 2016-01-29  8:54 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: adityakali-hpIqsD4AKlfQT0dZR+AlfA, Serge E. Hallyn,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali@google.com>

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment
Changelog 20151208:
  - kernfs_node_distance:
    * Remove BUG_ON(NULL)s
    * Rename kernfs_node_distance to kernfs_depth
  - kernfs_common-ancestor:
    * Remove useless checks for depth == 0
    * Add check to ensure nodes are from same root
  - kernfs_path_from_node_locked:
    * Remove needless __must_check
    * Put p;len on its own decl line.
    * Fix wrong WARN_ONCE usage
Changelog 20151209:
  - kernfs_path_from_node: change arguments to 'to' and 'from', and
    change their order.
Changelog 20151222:
  - kernfs_path_from_node{,_locked}: return the string length.
    kernfs_path is gpl-exported, so changing their return value seemed
    ill-advised, but if noone minds I can update it too.
Changelog 20151223:
  - don't allocate memory pr_cont_kernfs_path() under spinlock
---
 fs/kernfs/dir.c        |  192 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |    9 ++-
 2 files changed, 166 insertions(+), 35 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 996b774..38fa03a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,123 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+						  struct kernfs_node *b)
+{
+	size_t da, db;
+	struct kernfs_root *ra = kernfs_root(a), *rb = kernfs_root(b);
 
-	return p;
+	if (ra != rb)
+		return NULL;
+
+	da = kernfs_depth(ra->kn, a);
+	db = kernfs_depth(rb->kn, b);
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ *
+ * return value: length of the string.  If greater than buflen,
+ * then contents of buf are undefined.  On error, -1 is returned.
+ */
+static int
+kernfs_path_from_node_locked(struct kernfs_node *kn_to,
+			     struct kernfs_node *kn_from, char *buf,
+			     size_t buflen)
+{
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	size_t depth_from, depth_to, len = 0, nlen = 0;
+	char *p;
+	int i;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to)
+		return strlcpy(buf, "/", buflen);
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (WARN_ON(!common))
+		return -1;
+
+	depth_to = kernfs_depth(common, kn_to);
+	depth_from = kernfs_depth(common, kn_from);
+
+	if (buf)
+		buf[0] = '\0';
+
+	for (i = 0; i < depth_from; i++)
+		len += strlcpy(buf + len, parent_str,
+			       len < buflen ? buflen - len : 0);
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
+
+	if (len + nlen >= buflen)
+		return len + nlen;
+
+	p = buf + len + nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return len + nlen;
 }
 
 /**
@@ -115,6 +210,34 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
+ * kernfs_path_from_node - build path of node @to relative to @from.
+ * @from: parent kernfs_node relative to which we need to build the path
+ * @to: kernfs_node of interest
+ * @buf: buffer to copy @to's path into
+ * @buflen: size of @buf
+ *
+ * Builds @to's path relative to @from in @buf. @from and @to must
+ * be on the same kernfs-root. If @from is not parent of @to, then a relative
+ * path (which includes '..'s) as needed to reach from @from to @to is
+ * returned.
+ *
+ * If @buf isn't long enough, the return value will be greater than @buflen
+ * and @buf contents are undefined.
+ */
+int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
+			  char *buf, size_t buflen)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&kernfs_rename_lock, flags);
+	ret = kernfs_path_from_node_locked(to, from, buf, buflen);
+	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
  * kernfs_path - build full path of a given node
  * @kn: kernfs_node of interest
  * @buf: buffer to copy @kn's name into
@@ -127,13 +250,12 @@ size_t kernfs_path_len(struct kernfs_node *kn)
  */
 char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
 {
-	unsigned long flags;
-	char *p;
+	int ret;
 
-	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
-	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
-	return p;
+	ret = kernfs_path_from_node(kn, NULL, buf, buflen);
+	if (ret < 0 || ret >= buflen)
+		return NULL;
+	return buf;
 }
 EXPORT_SYMBOL_GPL(kernfs_path);
 
@@ -164,17 +286,25 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
 void pr_cont_kernfs_path(struct kernfs_node *kn)
 {
 	unsigned long flags;
-	char *p;
+	int sz;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
-	if (p)
-		pr_cont("%s", p);
-	else
-		pr_cont("<name too long>");
+	sz = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
+					  sizeof(kernfs_pr_cont_buf));
+	if (sz < 0) {
+		pr_cont("(error)");
+		goto out;
+	}
+
+	if (sz >= sizeof(kernfs_pr_cont_buf)) {
+		pr_cont("(name too long)");
+		goto out;
+	}
+
+	pr_cont("%s", kernfs_pr_cont_buf);
 
+out:
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 }
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index af51df3..716bfde 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,8 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
-char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-				size_t buflen);
+int kernfs_path_from_node(struct kernfs_node *root_kn, struct kernfs_node *kn,
+			  char *buf, size_t buflen);
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
 void pr_cont_kernfs_path(struct kernfs_node *kn);
 struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn);
@@ -338,8 +339,8 @@ static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 static inline size_t kernfs_path_len(struct kernfs_node *kn)
 { return 0; }
 
-static inline char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+static inline char *kernfs_path(struct kernfs_node *kn, char *buf,
+				size_t buflen)
 { return NULL; }
 
 static inline void pr_cont_kernfs_name(struct kernfs_node *kn) { }
-- 
1.7.9.5

_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
  2016-01-04 19:54 CGroup Namespaces (v9) serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
@ 2016-01-04 19:54     ` serge.hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA @ 2016-01-04 19:54 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Acked-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
---
Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment
Changelog 20151208:
  - kernfs_node_distance:
    * Remove BUG_ON(NULL)s
    * Rename kernfs_node_distance to kernfs_depth
  - kernfs_common-ancestor:
    * Remove useless checks for depth == 0
    * Add check to ensure nodes are from same root
  - kernfs_path_from_node_locked:
    * Remove needless __must_check
    * Put p;len on its own decl line.
    * Fix wrong WARN_ONCE usage
Changelog 20151209:
  - kernfs_path_from_node: change arguments to 'to' and 'from', and
    change their order.
Changelog 20151222:
  - kernfs_path_from_node{,_locked}: return the string length.
    kernfs_path is gpl-exported, so changing their return value seemed
    ill-advised, but if noone minds I can update it too.
Changelog 20151223:
  - don't allocate memory pr_cont_kernfs_path() under spinlock
---
 fs/kernfs/dir.c        |  192 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |    9 ++-
 2 files changed, 166 insertions(+), 35 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 742bf4a..f2b2187 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,123 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+						  struct kernfs_node *b)
+{
+	size_t da, db;
+	struct kernfs_root *ra = kernfs_root(a), *rb = kernfs_root(b);
+
+	if (ra != rb)
+		return NULL;
+
+	da = kernfs_depth(ra->kn, a);
+	db = kernfs_depth(rb->kn, b);
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ *
+ * return value: length of the string.  If greater than buflen,
+ * then contents of buf are undefined.  On error, -1 is returned.
+ */
+static int
+kernfs_path_from_node_locked(struct kernfs_node *kn_to,
+			     struct kernfs_node *kn_from, char *buf,
+			     size_t buflen)
+{
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	size_t depth_from, depth_to, len = 0, nlen = 0;
+	char *p;
+	int i;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to)
+		return strlcpy(buf, "/", buflen);
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (WARN_ON(!common))
+		return -1;
+
+	depth_to = kernfs_depth(common, kn_to);
+	depth_from = kernfs_depth(common, kn_from);
+
+	if (buf)
+		buf[0] = '\0';
 
-	return p;
+	for (i = 0; i < depth_from; i++)
+		len += strlcpy(buf + len, parent_str,
+			       len < buflen ? buflen - len : 0);
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
+
+	if (len + nlen >= buflen)
+		return len + nlen;
+
+	p = buf + len + nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return len + nlen;
 }
 
 /**
@@ -115,6 +210,34 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
+ * kernfs_path_from_node - build path of node @to relative to @from.
+ * @from: parent kernfs_node relative to which we need to build the path
+ * @to: kernfs_node of interest
+ * @buf: buffer to copy @to's path into
+ * @buflen: size of @buf
+ *
+ * Builds @to's path relative to @from in @buf. @from and @to must
+ * be on the same kernfs-root. If @from is not parent of @to, then a relative
+ * path (which includes '..'s) as needed to reach from @from to @to is
+ * returned.
+ * 
+ * If @buf isn't long enough, the return value will be greater than @buflen
+ * and @buf contents are undefined.
+ */
+int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
+			  char *buf, size_t buflen)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&kernfs_rename_lock, flags);
+	ret = kernfs_path_from_node_locked(to, from, buf, buflen);
+	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
  * kernfs_path - build full path of a given node
  * @kn: kernfs_node of interest
  * @buf: buffer to copy @kn's name into
@@ -127,13 +250,12 @@ size_t kernfs_path_len(struct kernfs_node *kn)
  */
 char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
 {
-	unsigned long flags;
-	char *p;
+	int ret;
 
-	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
-	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
-	return p;
+	ret = kernfs_path_from_node(kn, NULL, buf, buflen);
+	if (ret < 0 || ret >= buflen)
+		return NULL;
+	return buf;
 }
 EXPORT_SYMBOL_GPL(kernfs_path);
 
@@ -164,17 +286,25 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
 void pr_cont_kernfs_path(struct kernfs_node *kn)
 {
 	unsigned long flags;
-	char *p;
+	int sz;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
-	if (p)
-		pr_cont("%s", p);
-	else
-		pr_cont("<name too long>");
+	sz = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
+					  sizeof(kernfs_pr_cont_buf));
+	if (sz < 0) {
+		pr_cont("(error)");
+		goto out;
+	}
+
+	if (sz >= sizeof(kernfs_pr_cont_buf)) {
+		pr_cont("(name too long)");
+		goto out;
+	}
+
+	pr_cont("%s", kernfs_pr_cont_buf);
 
+out:
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 }
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index af51df3..716bfde 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,8 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
-char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-				size_t buflen);
+int kernfs_path_from_node(struct kernfs_node *root_kn, struct kernfs_node *kn,
+			  char *buf, size_t buflen);
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
 void pr_cont_kernfs_path(struct kernfs_node *kn);
 struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn);
@@ -338,8 +339,8 @@ static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 static inline size_t kernfs_path_len(struct kernfs_node *kn)
 { return 0; }
 
-static inline char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+static inline char *kernfs_path(struct kernfs_node *kn, char *buf,
+				size_t buflen)
 { return NULL; }
 
 static inline void pr_cont_kernfs_name(struct kernfs_node *kn) { }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2016-01-04 19:54     ` serge.hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: serge.hallyn @ 2016-01-04 19:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm, gregkh, lizefan, hannes, Serge E. Hallyn

From: Aditya Kali <adityakali@google.com>

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment
Changelog 20151208:
  - kernfs_node_distance:
    * Remove BUG_ON(NULL)s
    * Rename kernfs_node_distance to kernfs_depth
  - kernfs_common-ancestor:
    * Remove useless checks for depth == 0
    * Add check to ensure nodes are from same root
  - kernfs_path_from_node_locked:
    * Remove needless __must_check
    * Put p;len on its own decl line.
    * Fix wrong WARN_ONCE usage
Changelog 20151209:
  - kernfs_path_from_node: change arguments to 'to' and 'from', and
    change their order.
Changelog 20151222:
  - kernfs_path_from_node{,_locked}: return the string length.
    kernfs_path is gpl-exported, so changing their return value seemed
    ill-advised, but if noone minds I can update it too.
Changelog 20151223:
  - don't allocate memory pr_cont_kernfs_path() under spinlock
---
 fs/kernfs/dir.c        |  192 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |    9 ++-
 2 files changed, 166 insertions(+), 35 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 742bf4a..f2b2187 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,123 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+						  struct kernfs_node *b)
+{
+	size_t da, db;
+	struct kernfs_root *ra = kernfs_root(a), *rb = kernfs_root(b);
+
+	if (ra != rb)
+		return NULL;
+
+	da = kernfs_depth(ra->kn, a);
+	db = kernfs_depth(rb->kn, b);
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ *
+ * return value: length of the string.  If greater than buflen,
+ * then contents of buf are undefined.  On error, -1 is returned.
+ */
+static int
+kernfs_path_from_node_locked(struct kernfs_node *kn_to,
+			     struct kernfs_node *kn_from, char *buf,
+			     size_t buflen)
+{
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	size_t depth_from, depth_to, len = 0, nlen = 0;
+	char *p;
+	int i;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to)
+		return strlcpy(buf, "/", buflen);
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (WARN_ON(!common))
+		return -1;
+
+	depth_to = kernfs_depth(common, kn_to);
+	depth_from = kernfs_depth(common, kn_from);
+
+	if (buf)
+		buf[0] = '\0';
 
-	return p;
+	for (i = 0; i < depth_from; i++)
+		len += strlcpy(buf + len, parent_str,
+			       len < buflen ? buflen - len : 0);
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
+
+	if (len + nlen >= buflen)
+		return len + nlen;
+
+	p = buf + len + nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return len + nlen;
 }
 
 /**
@@ -115,6 +210,34 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
+ * kernfs_path_from_node - build path of node @to relative to @from.
+ * @from: parent kernfs_node relative to which we need to build the path
+ * @to: kernfs_node of interest
+ * @buf: buffer to copy @to's path into
+ * @buflen: size of @buf
+ *
+ * Builds @to's path relative to @from in @buf. @from and @to must
+ * be on the same kernfs-root. If @from is not parent of @to, then a relative
+ * path (which includes '..'s) as needed to reach from @from to @to is
+ * returned.
+ * 
+ * If @buf isn't long enough, the return value will be greater than @buflen
+ * and @buf contents are undefined.
+ */
+int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
+			  char *buf, size_t buflen)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&kernfs_rename_lock, flags);
+	ret = kernfs_path_from_node_locked(to, from, buf, buflen);
+	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
  * kernfs_path - build full path of a given node
  * @kn: kernfs_node of interest
  * @buf: buffer to copy @kn's name into
@@ -127,13 +250,12 @@ size_t kernfs_path_len(struct kernfs_node *kn)
  */
 char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
 {
-	unsigned long flags;
-	char *p;
+	int ret;
 
-	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
-	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
-	return p;
+	ret = kernfs_path_from_node(kn, NULL, buf, buflen);
+	if (ret < 0 || ret >= buflen)
+		return NULL;
+	return buf;
 }
 EXPORT_SYMBOL_GPL(kernfs_path);
 
@@ -164,17 +286,25 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
 void pr_cont_kernfs_path(struct kernfs_node *kn)
 {
 	unsigned long flags;
-	char *p;
+	int sz;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
-	if (p)
-		pr_cont("%s", p);
-	else
-		pr_cont("<name too long>");
+	sz = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
+					  sizeof(kernfs_pr_cont_buf));
+	if (sz < 0) {
+		pr_cont("(error)");
+		goto out;
+	}
+
+	if (sz >= sizeof(kernfs_pr_cont_buf)) {
+		pr_cont("(name too long)");
+		goto out;
+	}
+
+	pr_cont("%s", kernfs_pr_cont_buf);
 
+out:
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 }
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index af51df3..716bfde 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,8 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
-char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-				size_t buflen);
+int kernfs_path_from_node(struct kernfs_node *root_kn, struct kernfs_node *kn,
+			  char *buf, size_t buflen);
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
 void pr_cont_kernfs_path(struct kernfs_node *kn);
 struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn);
@@ -338,8 +339,8 @@ static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 static inline size_t kernfs_path_len(struct kernfs_node *kn)
 { return 0; }
 
-static inline char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+static inline char *kernfs_path(struct kernfs_node *kn, char *buf,
+				size_t buflen)
 { return NULL; }
 
 static inline void pr_cont_kernfs_name(struct kernfs_node *kn) { }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]       ` <20151223162433.GH5003-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-23 16:51         ` Greg KH
  0 siblings, 0 replies; 180+ messages in thread
From: Greg KH @ 2015-12-23 16:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Dec 23, 2015 at 11:24:33AM -0500, Tejun Heo wrote:
> On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> > From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > 
> > The new function kernfs_path_from_node() generates and returns kernfs
> > path of a given kernfs_node relative to a given parent kernfs_node.
> > 
> > Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> 
> Greg, can I route this together with other changes?

Yes, please do:

Acked-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]       ` <20151223162433.GH5003-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-23 16:51         ` Greg KH
  0 siblings, 0 replies; 180+ messages in thread
From: Greg KH @ 2015-12-23 16:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge.hallyn, linux-kernel, adityakali, linux-api, containers,
	cgroups, lxc-devel, akpm, ebiederm, lizefan, hannes,
	Serge E. Hallyn

On Wed, Dec 23, 2015 at 11:24:33AM -0500, Tejun Heo wrote:
> On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn@ubuntu.com wrote:
> > From: Aditya Kali <adityakali@google.com>
> > 
> > The new function kernfs_path_from_node() generates and returns kernfs
> > path of a given kernfs_node relative to a given parent kernfs_node.
> > 
> > Signed-off-by: Aditya Kali <adityakali@google.com>
> > Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> 
> Greg, can I route this together with other changes?

Yes, please do:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-12-23 16:51         ` Greg KH
  0 siblings, 0 replies; 180+ messages in thread
From: Greg KH @ 2015-12-23 16:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, lizefan-hv44wF8Li93QT0dZR+AlfA,
	hannes-druUgvl0LCNAfugRpC6u6w, Serge E. Hallyn

On Wed, Dec 23, 2015 at 11:24:33AM -0500, Tejun Heo wrote:
> On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> > From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > 
> > The new function kernfs_path_from_node() generates and returns kernfs
> > path of a given kernfs_node relative to a given parent kernfs_node.
> > 
> > Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> 
> Greg, can I route this together with other changes?

Yes, please do:

Acked-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]       ` <20151223160854.GF5003-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-23 16:36         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-23 16:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Dec 23, 2015 at 11:08:54AM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> > @@ -164,18 +286,39 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
> >  void pr_cont_kernfs_path(struct kernfs_node *kn)
> >  {
> >  	unsigned long flags;
> > -	char *p;
> > +	char *p = NULL;
> > +	int sz1, sz2;
> >  
> >  	spin_lock_irqsave(&kernfs_rename_lock, flags);
> >  
> > -	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
> > -			       sizeof(kernfs_pr_cont_buf));
> > -	if (p)
> > -		pr_cont("%s", p);
> > -	else
> > -		pr_cont("<name too long>");
> > +	sz1 = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
> > +					   sizeof(kernfs_pr_cont_buf));
> > +	if (sz1 < 0) {
> > +		pr_cont("(error)");
> > +		goto out;
> > +	}
> > +
> > +	if (sz1 < sizeof(kernfs_pr_cont_buf)) {
> > +		pr_cont("%s", kernfs_pr_cont_buf);
> > +		goto out;
> > +	}
> > +
> > +	p = kmalloc(sz1 + 1, GFP_NOFS);
> 
> We can't do GFP_NOFS allocation while holding a spinlock and we don't
> want to do atomic allocation here either.  I think it'd be best to
> keep using the static buffer.

D'oh, right.  Will update.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]       ` <20151223160854.GF5003-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-23 16:36         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-23 16:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge.hallyn, linux-api, containers, hannes, linux-kernel,
	ebiederm, lxc-devel, gregkh, cgroups, akpm

On Wed, Dec 23, 2015 at 11:08:54AM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn@ubuntu.com wrote:
> > @@ -164,18 +286,39 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
> >  void pr_cont_kernfs_path(struct kernfs_node *kn)
> >  {
> >  	unsigned long flags;
> > -	char *p;
> > +	char *p = NULL;
> > +	int sz1, sz2;
> >  
> >  	spin_lock_irqsave(&kernfs_rename_lock, flags);
> >  
> > -	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
> > -			       sizeof(kernfs_pr_cont_buf));
> > -	if (p)
> > -		pr_cont("%s", p);
> > -	else
> > -		pr_cont("<name too long>");
> > +	sz1 = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
> > +					   sizeof(kernfs_pr_cont_buf));
> > +	if (sz1 < 0) {
> > +		pr_cont("(error)");
> > +		goto out;
> > +	}
> > +
> > +	if (sz1 < sizeof(kernfs_pr_cont_buf)) {
> > +		pr_cont("%s", kernfs_pr_cont_buf);
> > +		goto out;
> > +	}
> > +
> > +	p = kmalloc(sz1 + 1, GFP_NOFS);
> 
> We can't do GFP_NOFS allocation while holding a spinlock and we don't
> want to do atomic allocation here either.  I think it'd be best to
> keep using the static buffer.

D'oh, right.  Will update.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-12-23 16:36         ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-23 16:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Dec 23, 2015 at 11:08:54AM -0500, Tejun Heo wrote:
> Hello, Serge.
> 
> On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> > @@ -164,18 +286,39 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
> >  void pr_cont_kernfs_path(struct kernfs_node *kn)
> >  {
> >  	unsigned long flags;
> > -	char *p;
> > +	char *p = NULL;
> > +	int sz1, sz2;
> >  
> >  	spin_lock_irqsave(&kernfs_rename_lock, flags);
> >  
> > -	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
> > -			       sizeof(kernfs_pr_cont_buf));
> > -	if (p)
> > -		pr_cont("%s", p);
> > -	else
> > -		pr_cont("<name too long>");
> > +	sz1 = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
> > +					   sizeof(kernfs_pr_cont_buf));
> > +	if (sz1 < 0) {
> > +		pr_cont("(error)");
> > +		goto out;
> > +	}
> > +
> > +	if (sz1 < sizeof(kernfs_pr_cont_buf)) {
> > +		pr_cont("%s", kernfs_pr_cont_buf);
> > +		goto out;
> > +	}
> > +
> > +	p = kmalloc(sz1 + 1, GFP_NOFS);
> 
> We can't do GFP_NOFS allocation while holding a spinlock and we don't
> want to do atomic allocation here either.  I think it'd be best to
> keep using the static buffer.

D'oh, right.  Will update.

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]     ` <1450844609-9194-2-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
  2015-12-23 16:08       ` Tejun Heo
@ 2015-12-23 16:24       ` Tejun Heo
  1 sibling, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-23 16:24 UTC (permalink / raw)
  To: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> 
> The new function kernfs_path_from_node() generates and returns kernfs
> path of a given kernfs_node relative to a given parent kernfs_node.
> 
> Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

Greg, can I route this together with other changes?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]     ` <1450844609-9194-2-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
@ 2015-12-23 16:24       ` Tejun Heo
  2015-12-23 16:24       ` Tejun Heo
  1 sibling, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-23 16:24 UTC (permalink / raw)
  To: serge.hallyn
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm, gregkh, lizefan, hannes,
	Serge E. Hallyn

On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn@ubuntu.com wrote:
> From: Aditya Kali <adityakali@google.com>
> 
> The new function kernfs_path_from_node() generates and returns kernfs
> path of a given kernfs_node relative to a given parent kernfs_node.
> 
> Signed-off-by: Aditya Kali <adityakali@google.com>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>

Greg, can I route this together with other changes?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-12-23 16:24       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-23 16:24 UTC (permalink / raw)
  To: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w,
	Serge E. Hallyn

On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> 
> The new function kernfs_path_from_node() generates and returns kernfs
> path of a given kernfs_node relative to a given parent kernfs_node.
> 
> Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

Greg, can I route this together with other changes?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]     ` <1450844609-9194-2-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
@ 2015-12-23 16:08       ` Tejun Heo
  2015-12-23 16:24       ` Tejun Heo
  1 sibling, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-23 16:08 UTC (permalink / raw)
  To: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello, Serge.

On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> @@ -164,18 +286,39 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
>  void pr_cont_kernfs_path(struct kernfs_node *kn)
>  {
>  	unsigned long flags;
> -	char *p;
> +	char *p = NULL;
> +	int sz1, sz2;
>  
>  	spin_lock_irqsave(&kernfs_rename_lock, flags);
>  
> -	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
> -			       sizeof(kernfs_pr_cont_buf));
> -	if (p)
> -		pr_cont("%s", p);
> -	else
> -		pr_cont("<name too long>");
> +	sz1 = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
> +					   sizeof(kernfs_pr_cont_buf));
> +	if (sz1 < 0) {
> +		pr_cont("(error)");
> +		goto out;
> +	}
> +
> +	if (sz1 < sizeof(kernfs_pr_cont_buf)) {
> +		pr_cont("%s", kernfs_pr_cont_buf);
> +		goto out;
> +	}
> +
> +	p = kmalloc(sz1 + 1, GFP_NOFS);

We can't do GFP_NOFS allocation while holding a spinlock and we don't
want to do atomic allocation here either.  I think it'd be best to
keep using the static buffer.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]     ` <1450844609-9194-2-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
@ 2015-12-23 16:08       ` Tejun Heo
  2015-12-23 16:24       ` Tejun Heo
  1 sibling, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-23 16:08 UTC (permalink / raw)
  To: serge.hallyn
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm, gregkh, lizefan, hannes,
	Serge E. Hallyn

Hello, Serge.

On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn@ubuntu.com wrote:
> @@ -164,18 +286,39 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
>  void pr_cont_kernfs_path(struct kernfs_node *kn)
>  {
>  	unsigned long flags;
> -	char *p;
> +	char *p = NULL;
> +	int sz1, sz2;
>  
>  	spin_lock_irqsave(&kernfs_rename_lock, flags);
>  
> -	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
> -			       sizeof(kernfs_pr_cont_buf));
> -	if (p)
> -		pr_cont("%s", p);
> -	else
> -		pr_cont("<name too long>");
> +	sz1 = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
> +					   sizeof(kernfs_pr_cont_buf));
> +	if (sz1 < 0) {
> +		pr_cont("(error)");
> +		goto out;
> +	}
> +
> +	if (sz1 < sizeof(kernfs_pr_cont_buf)) {
> +		pr_cont("%s", kernfs_pr_cont_buf);
> +		goto out;
> +	}
> +
> +	p = kmalloc(sz1 + 1, GFP_NOFS);

We can't do GFP_NOFS allocation while holding a spinlock and we don't
want to do atomic allocation here either.  I think it'd be best to
keep using the static buffer.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-12-23 16:08       ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-23 16:08 UTC (permalink / raw)
  To: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w,
	Serge E. Hallyn

Hello, Serge.

On Tue, Dec 22, 2015 at 10:23:22PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> @@ -164,18 +286,39 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
>  void pr_cont_kernfs_path(struct kernfs_node *kn)
>  {
>  	unsigned long flags;
> -	char *p;
> +	char *p = NULL;
> +	int sz1, sz2;
>  
>  	spin_lock_irqsave(&kernfs_rename_lock, flags);
>  
> -	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
> -			       sizeof(kernfs_pr_cont_buf));
> -	if (p)
> -		pr_cont("%s", p);
> -	else
> -		pr_cont("<name too long>");
> +	sz1 = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
> +					   sizeof(kernfs_pr_cont_buf));
> +	if (sz1 < 0) {
> +		pr_cont("(error)");
> +		goto out;
> +	}
> +
> +	if (sz1 < sizeof(kernfs_pr_cont_buf)) {
> +		pr_cont("%s", kernfs_pr_cont_buf);
> +		goto out;
> +	}
> +
> +	p = kmalloc(sz1 + 1, GFP_NOFS);

We can't do GFP_NOFS allocation while holding a spinlock and we don't
want to do atomic allocation here either.  I think it'd be best to
keep using the static buffer.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
  2015-12-23  4:23 CGroup Namespaces (v8) serge.hallyn
@ 2015-12-23  4:23     ` serge.hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA @ 2015-12-23  4:23 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment
Changelog 20151208:
  - kernfs_node_distance:
    * Remove BUG_ON(NULL)s
    * Rename kernfs_node_distance to kernfs_depth
  - kernfs_common-ancestor:
    * Remove useless checks for depth == 0
    * Add check to ensure nodes are from same root
  - kernfs_path_from_node_locked:
    * Remove needless __must_check
    * Put p;len on its own decl line.
    * Fix wrong WARN_ONCE usage
Changelog 20151209:
  - kernfs_path_from_node: change arguments to 'to' and 'from', and
    change their order.
Changelog 20151222:
  - kernfs_path_from_node{,_locked}: return the string length.
    kernfs_path is gpl-exported, so changing their return value seemed
    ill-advised, but if noone minds I can update it too.
---
 fs/kernfs/dir.c        |  205 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |    9 ++-
 2 files changed, 179 insertions(+), 35 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 742bf4a..e82b9a1 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,123 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+						  struct kernfs_node *b)
+{
+	size_t da, db;
+	struct kernfs_root *ra = kernfs_root(a), *rb = kernfs_root(b);
+
+	if (ra != rb)
+		return NULL;
+
+	da = kernfs_depth(ra->kn, a);
+	db = kernfs_depth(rb->kn, b);
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ *
+ * return value: length of the string.  If greater than buflen,
+ * then contents of buf are undefined.  On error, -1 is returned.
+ */
+static int
+kernfs_path_from_node_locked(struct kernfs_node *kn_to,
+			     struct kernfs_node *kn_from, char *buf,
+			     size_t buflen)
+{
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	size_t depth_from, depth_to, len = 0, nlen = 0;
+	char *p;
+	int i;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to)
+		return strlcpy(buf, "/", buflen);
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (WARN_ON(!common))
+		return -1;
+
+	depth_to = kernfs_depth(common, kn_to);
+	depth_from = kernfs_depth(common, kn_from);
+
+	if (buf)
+		buf[0] = '\0';
+
+	for (i = 0; i < depth_from; i++)
+		len += strlcpy(buf + len, parent_str,
+			       len < buflen ? buflen - len : 0);
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
 
-	return p;
+	if (len + nlen >= buflen)
+		return len + nlen;
+
+	p = buf + len + nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return len + nlen;
 }
 
 /**
@@ -115,6 +210,34 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
+ * kernfs_path_from_node - build path of node @to relative to @from.
+ * @from: parent kernfs_node relative to which we need to build the path
+ * @to: kernfs_node of interest
+ * @buf: buffer to copy @to's path into
+ * @buflen: size of @buf
+ *
+ * Builds @to's path relative to @from in @buf. @from and @to must
+ * be on the same kernfs-root. If @from is not parent of @to, then a relative
+ * path (which includes '..'s) as needed to reach from @from to @to is
+ * returned.
+ * 
+ * If @buf isn't long enough, the return value will be greater than @buflen
+ * and @buf contents are undefined.
+ */
+int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
+			  char *buf, size_t buflen)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&kernfs_rename_lock, flags);
+	ret = kernfs_path_from_node_locked(to, from, buf, buflen);
+	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
  * kernfs_path - build full path of a given node
  * @kn: kernfs_node of interest
  * @buf: buffer to copy @kn's name into
@@ -127,13 +250,12 @@ size_t kernfs_path_len(struct kernfs_node *kn)
  */
 char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
 {
-	unsigned long flags;
-	char *p;
+	int ret;
 
-	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
-	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
-	return p;
+	ret = kernfs_path_from_node(kn, NULL, buf, buflen);
+	if (ret < 0 || ret >= buflen)
+		return NULL;
+	return buf;
 }
 EXPORT_SYMBOL_GPL(kernfs_path);
 
@@ -164,18 +286,39 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
 void pr_cont_kernfs_path(struct kernfs_node *kn)
 {
 	unsigned long flags;
-	char *p;
+	char *p = NULL;
+	int sz1, sz2;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
-	if (p)
-		pr_cont("%s", p);
-	else
-		pr_cont("<name too long>");
+	sz1 = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
+					   sizeof(kernfs_pr_cont_buf));
+	if (sz1 < 0) {
+		pr_cont("(error)");
+		goto out;
+	}
+
+	if (sz1 < sizeof(kernfs_pr_cont_buf)) {
+		pr_cont("%s", kernfs_pr_cont_buf);
+		goto out;
+	}
+
+	p = kmalloc(sz1 + 1, GFP_NOFS);
+	if (!p) {
+		pr_cont("(out of memory)");
+		goto out;
+	}
+	sz2 = kernfs_path_from_node_locked(kn, NULL, p, sz1 + 1);
+	if (sz2 > sz1 || sz2 < 0) {
+		pr_cont("(error)");
+		goto out;
+	}
+
+	pr_cont("%s", p);
 
+out:
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	kfree(p);
 }
 
 /**
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index af51df3..716bfde 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,8 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
-char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-				size_t buflen);
+int kernfs_path_from_node(struct kernfs_node *root_kn, struct kernfs_node *kn,
+			  char *buf, size_t buflen);
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
 void pr_cont_kernfs_path(struct kernfs_node *kn);
 struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn);
@@ -338,8 +339,8 @@ static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 static inline size_t kernfs_path_len(struct kernfs_node *kn)
 { return 0; }
 
-static inline char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+static inline char *kernfs_path(struct kernfs_node *kn, char *buf,
+				size_t buflen)
 { return NULL; }
 
 static inline void pr_cont_kernfs_name(struct kernfs_node *kn) { }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-12-23  4:23     ` serge.hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: serge.hallyn @ 2015-12-23  4:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm, gregkh, lizefan, hannes, Serge E. Hallyn

From: Aditya Kali <adityakali@google.com>

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment
Changelog 20151208:
  - kernfs_node_distance:
    * Remove BUG_ON(NULL)s
    * Rename kernfs_node_distance to kernfs_depth
  - kernfs_common-ancestor:
    * Remove useless checks for depth == 0
    * Add check to ensure nodes are from same root
  - kernfs_path_from_node_locked:
    * Remove needless __must_check
    * Put p;len on its own decl line.
    * Fix wrong WARN_ONCE usage
Changelog 20151209:
  - kernfs_path_from_node: change arguments to 'to' and 'from', and
    change their order.
Changelog 20151222:
  - kernfs_path_from_node{,_locked}: return the string length.
    kernfs_path is gpl-exported, so changing their return value seemed
    ill-advised, but if noone minds I can update it too.
---
 fs/kernfs/dir.c        |  205 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |    9 ++-
 2 files changed, 179 insertions(+), 35 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 742bf4a..e82b9a1 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,123 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+						  struct kernfs_node *b)
+{
+	size_t da, db;
+	struct kernfs_root *ra = kernfs_root(a), *rb = kernfs_root(b);
+
+	if (ra != rb)
+		return NULL;
+
+	da = kernfs_depth(ra->kn, a);
+	db = kernfs_depth(rb->kn, b);
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ *
+ * return value: length of the string.  If greater than buflen,
+ * then contents of buf are undefined.  On error, -1 is returned.
+ */
+static int
+kernfs_path_from_node_locked(struct kernfs_node *kn_to,
+			     struct kernfs_node *kn_from, char *buf,
+			     size_t buflen)
+{
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	size_t depth_from, depth_to, len = 0, nlen = 0;
+	char *p;
+	int i;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to)
+		return strlcpy(buf, "/", buflen);
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (WARN_ON(!common))
+		return -1;
+
+	depth_to = kernfs_depth(common, kn_to);
+	depth_from = kernfs_depth(common, kn_from);
+
+	if (buf)
+		buf[0] = '\0';
+
+	for (i = 0; i < depth_from; i++)
+		len += strlcpy(buf + len, parent_str,
+			       len < buflen ? buflen - len : 0);
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
 
-	return p;
+	if (len + nlen >= buflen)
+		return len + nlen;
+
+	p = buf + len + nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return len + nlen;
 }
 
 /**
@@ -115,6 +210,34 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
+ * kernfs_path_from_node - build path of node @to relative to @from.
+ * @from: parent kernfs_node relative to which we need to build the path
+ * @to: kernfs_node of interest
+ * @buf: buffer to copy @to's path into
+ * @buflen: size of @buf
+ *
+ * Builds @to's path relative to @from in @buf. @from and @to must
+ * be on the same kernfs-root. If @from is not parent of @to, then a relative
+ * path (which includes '..'s) as needed to reach from @from to @to is
+ * returned.
+ * 
+ * If @buf isn't long enough, the return value will be greater than @buflen
+ * and @buf contents are undefined.
+ */
+int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
+			  char *buf, size_t buflen)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&kernfs_rename_lock, flags);
+	ret = kernfs_path_from_node_locked(to, from, buf, buflen);
+	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
  * kernfs_path - build full path of a given node
  * @kn: kernfs_node of interest
  * @buf: buffer to copy @kn's name into
@@ -127,13 +250,12 @@ size_t kernfs_path_len(struct kernfs_node *kn)
  */
 char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
 {
-	unsigned long flags;
-	char *p;
+	int ret;
 
-	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
-	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
-	return p;
+	ret = kernfs_path_from_node(kn, NULL, buf, buflen);
+	if (ret < 0 || ret >= buflen)
+		return NULL;
+	return buf;
 }
 EXPORT_SYMBOL_GPL(kernfs_path);
 
@@ -164,18 +286,39 @@ void pr_cont_kernfs_name(struct kernfs_node *kn)
 void pr_cont_kernfs_path(struct kernfs_node *kn)
 {
 	unsigned long flags;
-	char *p;
+	char *p = NULL;
+	int sz1, sz2;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
-	if (p)
-		pr_cont("%s", p);
-	else
-		pr_cont("<name too long>");
+	sz1 = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
+					   sizeof(kernfs_pr_cont_buf));
+	if (sz1 < 0) {
+		pr_cont("(error)");
+		goto out;
+	}
+
+	if (sz1 < sizeof(kernfs_pr_cont_buf)) {
+		pr_cont("%s", kernfs_pr_cont_buf);
+		goto out;
+	}
+
+	p = kmalloc(sz1 + 1, GFP_NOFS);
+	if (!p) {
+		pr_cont("(out of memory)");
+		goto out;
+	}
+	sz2 = kernfs_path_from_node_locked(kn, NULL, p, sz1 + 1);
+	if (sz2 > sz1 || sz2 < 0) {
+		pr_cont("(error)");
+		goto out;
+	}
+
+	pr_cont("%s", p);
 
+out:
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	kfree(p);
 }
 
 /**
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index af51df3..716bfde 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,8 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
-char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-				size_t buflen);
+int kernfs_path_from_node(struct kernfs_node *root_kn, struct kernfs_node *kn,
+			  char *buf, size_t buflen);
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
 void pr_cont_kernfs_path(struct kernfs_node *kn);
 struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn);
@@ -338,8 +339,8 @@ static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 static inline size_t kernfs_path_len(struct kernfs_node *kn)
 { return 0; }
 
-static inline char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+static inline char *kernfs_path(struct kernfs_node *kn, char *buf,
+				size_t buflen)
 { return NULL; }
 
 static inline void pr_cont_kernfs_name(struct kernfs_node *kn) { }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
  2015-12-09 22:36         ` Tejun Heo
@ 2015-12-10  1:28             ` Serge E. Hallyn
  -1 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-10  1:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Dec 09, 2015 at 05:36:51PM -0500, Tejun Heo wrote:
> Hey,
> 
> On Wed, Dec 09, 2015 at 10:13:27PM +0000, Serge Hallyn wrote:
> > we can rename kn_root to from here if you think that's clearer (and
> > change the order here as well).
> 
> I think it'd be better for them to be consistent and in the same order
> - the target and then the optional base.
> 
> > > Was converting the path functions to return
> > > length too much work?  If so, that's fine but please explain what
> > > decisions were made.
> > 
> > Yes, I had replied saying:
> > 
> >  |I can change that, but the callers right now don't re-try with
> >  |larger buffer anyway, so this would actually complicate them just
> >  |a smidgeon.  Would you want them changed to do that?  (pr_cont_kernfs_path
> >  |right now writes into a static char[] for instance)
> > 
> > I can still make that change if you like.
> 
> Oops, sorry I forgot about that.  The reason why kernfs_path() is
> written the current way was me being lazy.  While I think it'd be
> better to make the functions behave like normal string handling
> functions if we're extending it, I don't think it's that important.
> If it's easy, please go ahead.  If not, we can get back to it later
> when necessary.
> 
> > > I skimmed through the series and spotted several other review points
> > > which didn't get addressed.  Can you please go over the previous
> > > review cycle and address the review points?
> > 
> > I did go through every email twice, once while making changes (one
> > branch per response) and once while making changelog for each patch,
> > sorry about whatever I missed.  I'll go through each again.
> 
> The other chunk I noticed was inline conversions of internal functions
> which didn't seem to belong to the patch.  I asked whether those were
> stray chunks.  Maybe the comment was too buried to notice?  Anyways,
> that part actually causes conflicts when applying to cgroup/for-4.5.
> 
> There are a couple more things.
> 
> * Can you please put the ns related decls after the regular cgroup
>   stuff in cgroup.h?
> 
> * I think I might need to edit the documentation anyway but it'd be
>   great if you can make the namespace section more in line with the
>   rest of the documentation - e.g. s/CGroup/cgroup/ and more
>   structured sectioning.

Ok fwiw I've fixed up the arguments to kernfs_path_from_node, removed
the inlines, and moved the ns related decls after the others in cgroup.h
(i.e. done the easy stuff) in the 2015-12-09/cgroupns.3 branch of
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security.git

I'll address the rest either after next week or, hopefully, when I get
a chance earlier.

> At this point, it all generally looks good to me.  Let's get the
> nits out of the way and merge it.

If you wanted to take the branch as is, then I'll do the documentation
and pr_cont_kernfs_path() etc rewrite as separate patches, but I'll
assume you'd like to at least wait for doc rewrite.

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-12-10  1:28             ` Serge E. Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-10  1:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge Hallyn, linux-api, containers, hannes, linux-kernel,
	ebiederm, lxc-devel, gregkh, cgroups, akpm

On Wed, Dec 09, 2015 at 05:36:51PM -0500, Tejun Heo wrote:
> Hey,
> 
> On Wed, Dec 09, 2015 at 10:13:27PM +0000, Serge Hallyn wrote:
> > we can rename kn_root to from here if you think that's clearer (and
> > change the order here as well).
> 
> I think it'd be better for them to be consistent and in the same order
> - the target and then the optional base.
> 
> > > Was converting the path functions to return
> > > length too much work?  If so, that's fine but please explain what
> > > decisions were made.
> > 
> > Yes, I had replied saying:
> > 
> >  |I can change that, but the callers right now don't re-try with
> >  |larger buffer anyway, so this would actually complicate them just
> >  |a smidgeon.  Would you want them changed to do that?  (pr_cont_kernfs_path
> >  |right now writes into a static char[] for instance)
> > 
> > I can still make that change if you like.
> 
> Oops, sorry I forgot about that.  The reason why kernfs_path() is
> written the current way was me being lazy.  While I think it'd be
> better to make the functions behave like normal string handling
> functions if we're extending it, I don't think it's that important.
> If it's easy, please go ahead.  If not, we can get back to it later
> when necessary.
> 
> > > I skimmed through the series and spotted several other review points
> > > which didn't get addressed.  Can you please go over the previous
> > > review cycle and address the review points?
> > 
> > I did go through every email twice, once while making changes (one
> > branch per response) and once while making changelog for each patch,
> > sorry about whatever I missed.  I'll go through each again.
> 
> The other chunk I noticed was inline conversions of internal functions
> which didn't seem to belong to the patch.  I asked whether those were
> stray chunks.  Maybe the comment was too buried to notice?  Anyways,
> that part actually causes conflicts when applying to cgroup/for-4.5.
> 
> There are a couple more things.
> 
> * Can you please put the ns related decls after the regular cgroup
>   stuff in cgroup.h?
> 
> * I think I might need to edit the documentation anyway but it'd be
>   great if you can make the namespace section more in line with the
>   rest of the documentation - e.g. s/CGroup/cgroup/ and more
>   structured sectioning.

Ok fwiw I've fixed up the arguments to kernfs_path_from_node, removed
the inlines, and moved the ns related decls after the others in cgroup.h
(i.e. done the easy stuff) in the 2015-12-09/cgroupns.3 branch of
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security.git

I'll address the rest either after next week or, hopefully, when I get
a chance earlier.

> At this point, it all generally looks good to me.  Let's get the
> nits out of the way and merge it.

If you wanted to take the branch as is, then I'll do the documentation
and pr_cont_kernfs_path() etc rewrite as separate patches, but I'll
assume you'd like to at least wait for doc rewrite.

-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]         ` <20151209223651.GQ30240-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-09 22:51           ` Serge E. Hallyn
  2015-12-10  1:28             ` Serge E. Hallyn
  1 sibling, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-09 22:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	hannes-druUgvl0LCNAfugRpC6u6w, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wed, Dec 09, 2015 at 05:36:51PM -0500, Tejun Heo wrote:
> Hey,
> 
> On Wed, Dec 09, 2015 at 10:13:27PM +0000, Serge Hallyn wrote:
> > we can rename kn_root to from here if you think that's clearer (and
> > change the order here as well).
> 
> I think it'd be better for them to be consistent and in the same order
> - the target and then the optional base.
> 
> > > Was converting the path functions to return
> > > length too much work?  If so, that's fine but please explain what
> > > decisions were made.
> > 
> > Yes, I had replied saying:
> > 
> >  |I can change that, but the callers right now don't re-try with
> >  |larger buffer anyway, so this would actually complicate them just
> >  |a smidgeon.  Would you want them changed to do that?  (pr_cont_kernfs_path
> >  |right now writes into a static char[] for instance)
> > 
> > I can still make that change if you like.
> 
> Oops, sorry I forgot about that.  The reason why kernfs_path() is
> written the current way was me being lazy.  While I think it'd be
> better to make the functions behave like normal string handling
> functions if we're extending it, I don't think it's that important.
> If it's easy, please go ahead.  If not, we can get back to it later
> when necessary.

Ok - I'm now gone until Dec 21 (and laptopping won't be an option :( ).
I'll make the other changes then and do this as well.  So
pr_cont_kernfs_path() will dynamically allocate a longer buffer (only)
if needed.

> > > I skimmed through the series and spotted several other review points
> > > which didn't get addressed.  Can you please go over the previous
> > > review cycle and address the review points?
> > 
> > I did go through every email twice, once while making changes (one
> > branch per response) and once while making changelog for each patch,
> > sorry about whatever I missed.  I'll go through each again.
> 
> The other chunk I noticed was inline conversions of internal functions
> which didn't seem to belong to the patch.  I asked whether those were
> stray chunks.  Maybe the comment was too buried to notice?  Anyways,
> that part actually causes conflicts when applying to cgroup/for-4.5.

Gah.  I saw one and removed it.  Grep tells me I missed some, will
remove them all next time.

> There are a couple more things.
> 
> * Can you please put the ns related decls after the regular cgroup
>   stuff in cgroup.h?

ok

> * I think I might need to edit the documentation anyway but it'd be
>   great if you can make the namespace section more in line with the
>   rest of the documentation - e.g. s/CGroup/cgroup/ and more
>   structured sectioning.

I'll read through it and look for patterns to change.

> At this point, it all generally looks good to me.  Let's get the
> nits out of the way and merge it.
> 
> Thanks.

thanks,
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
  2015-12-09 22:36         ` Tejun Heo
  (?)
@ 2015-12-09 22:51         ` Serge E. Hallyn
  -1 siblings, 0 replies; 180+ messages in thread
From: Serge E. Hallyn @ 2015-12-09 22:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge Hallyn, linux-api, containers, hannes, linux-kernel,
	ebiederm, lxc-devel, gregkh, cgroups, akpm

On Wed, Dec 09, 2015 at 05:36:51PM -0500, Tejun Heo wrote:
> Hey,
> 
> On Wed, Dec 09, 2015 at 10:13:27PM +0000, Serge Hallyn wrote:
> > we can rename kn_root to from here if you think that's clearer (and
> > change the order here as well).
> 
> I think it'd be better for them to be consistent and in the same order
> - the target and then the optional base.
> 
> > > Was converting the path functions to return
> > > length too much work?  If so, that's fine but please explain what
> > > decisions were made.
> > 
> > Yes, I had replied saying:
> > 
> >  |I can change that, but the callers right now don't re-try with
> >  |larger buffer anyway, so this would actually complicate them just
> >  |a smidgeon.  Would you want them changed to do that?  (pr_cont_kernfs_path
> >  |right now writes into a static char[] for instance)
> > 
> > I can still make that change if you like.
> 
> Oops, sorry I forgot about that.  The reason why kernfs_path() is
> written the current way was me being lazy.  While I think it'd be
> better to make the functions behave like normal string handling
> functions if we're extending it, I don't think it's that important.
> If it's easy, please go ahead.  If not, we can get back to it later
> when necessary.

Ok - I'm now gone until Dec 21 (and laptopping won't be an option :( ).
I'll make the other changes then and do this as well.  So
pr_cont_kernfs_path() will dynamically allocate a longer buffer (only)
if needed.

> > > I skimmed through the series and spotted several other review points
> > > which didn't get addressed.  Can you please go over the previous
> > > review cycle and address the review points?
> > 
> > I did go through every email twice, once while making changes (one
> > branch per response) and once while making changelog for each patch,
> > sorry about whatever I missed.  I'll go through each again.
> 
> The other chunk I noticed was inline conversions of internal functions
> which didn't seem to belong to the patch.  I asked whether those were
> stray chunks.  Maybe the comment was too buried to notice?  Anyways,
> that part actually causes conflicts when applying to cgroup/for-4.5.

Gah.  I saw one and removed it.  Grep tells me I missed some, will
remove them all next time.

> There are a couple more things.
> 
> * Can you please put the ns related decls after the regular cgroup
>   stuff in cgroup.h?

ok

> * I think I might need to edit the documentation anyway but it'd be
>   great if you can make the namespace section more in line with the
>   rest of the documentation - e.g. s/CGroup/cgroup/ and more
>   structured sectioning.

I'll read through it and look for patterns to change.

> At this point, it all generally looks good to me.  Let's get the
> nits out of the way and merge it.
> 
> Thanks.

thanks,
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
  2015-12-09 22:13       ` Serge Hallyn
  (?)
  (?)
@ 2015-12-09 22:36       ` Tejun Heo
  -1 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-09 22:36 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hey,

On Wed, Dec 09, 2015 at 10:13:27PM +0000, Serge Hallyn wrote:
> we can rename kn_root to from here if you think that's clearer (and
> change the order here as well).

I think it'd be better for them to be consistent and in the same order
- the target and then the optional base.

> > Was converting the path functions to return
> > length too much work?  If so, that's fine but please explain what
> > decisions were made.
> 
> Yes, I had replied saying:
> 
>  |I can change that, but the callers right now don't re-try with
>  |larger buffer anyway, so this would actually complicate them just
>  |a smidgeon.  Would you want them changed to do that?  (pr_cont_kernfs_path
>  |right now writes into a static char[] for instance)
> 
> I can still make that change if you like.

Oops, sorry I forgot about that.  The reason why kernfs_path() is
written the current way was me being lazy.  While I think it'd be
better to make the functions behave like normal string handling
functions if we're extending it, I don't think it's that important.
If it's easy, please go ahead.  If not, we can get back to it later
when necessary.

> > I skimmed through the series and spotted several other review points
> > which didn't get addressed.  Can you please go over the previous
> > review cycle and address the review points?
> 
> I did go through every email twice, once while making changes (one
> branch per response) and once while making changelog for each patch,
> sorry about whatever I missed.  I'll go through each again.

The other chunk I noticed was inline conversions of internal functions
which didn't seem to belong to the patch.  I asked whether those were
stray chunks.  Maybe the comment was too buried to notice?  Anyways,
that part actually causes conflicts when applying to cgroup/for-4.5.

There are a couple more things.

* Can you please put the ns related decls after the regular cgroup
  stuff in cgroup.h?

* I think I might need to edit the documentation anyway but it'd be
  great if you can make the namespace section more in line with the
  rest of the documentation - e.g. s/CGroup/cgroup/ and more
  structured sectioning.

At this point, it all generally looks good to me.  Let's get the
nits out of the way and merge it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
  2015-12-09 22:13       ` Serge Hallyn
@ 2015-12-09 22:36         ` Tejun Heo
  -1 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-09 22:36 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm, gregkh, lizefan, hannes,
	Serge E. Hallyn

Hey,

On Wed, Dec 09, 2015 at 10:13:27PM +0000, Serge Hallyn wrote:
> we can rename kn_root to from here if you think that's clearer (and
> change the order here as well).

I think it'd be better for them to be consistent and in the same order
- the target and then the optional base.

> > Was converting the path functions to return
> > length too much work?  If so, that's fine but please explain what
> > decisions were made.
> 
> Yes, I had replied saying:
> 
>  |I can change that, but the callers right now don't re-try with
>  |larger buffer anyway, so this would actually complicate them just
>  |a smidgeon.  Would you want them changed to do that?  (pr_cont_kernfs_path
>  |right now writes into a static char[] for instance)
> 
> I can still make that change if you like.

Oops, sorry I forgot about that.  The reason why kernfs_path() is
written the current way was me being lazy.  While I think it'd be
better to make the functions behave like normal string handling
functions if we're extending it, I don't think it's that important.
If it's easy, please go ahead.  If not, we can get back to it later
when necessary.

> > I skimmed through the series and spotted several other review points
> > which didn't get addressed.  Can you please go over the previous
> > review cycle and address the review points?
> 
> I did go through every email twice, once while making changes (one
> branch per response) and once while making changelog for each patch,
> sorry about whatever I missed.  I'll go through each again.

The other chunk I noticed was inline conversions of internal functions
which didn't seem to belong to the patch.  I asked whether those were
stray chunks.  Maybe the comment was too buried to notice?  Anyways,
that part actually causes conflicts when applying to cgroup/for-4.5.

There are a couple more things.

* Can you please put the ns related decls after the regular cgroup
  stuff in cgroup.h?

* I think I might need to edit the documentation anyway but it'd be
  great if you can make the namespace section more in line with the
  rest of the documentation - e.g. s/CGroup/cgroup/ and more
  structured sectioning.

At this point, it all generally looks good to me.  Let's get the
nits out of the way and merge it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-12-09 22:36         ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-09 22:36 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w,
	Serge E. Hallyn

Hey,

On Wed, Dec 09, 2015 at 10:13:27PM +0000, Serge Hallyn wrote:
> we can rename kn_root to from here if you think that's clearer (and
> change the order here as well).

I think it'd be better for them to be consistent and in the same order
- the target and then the optional base.

> > Was converting the path functions to return
> > length too much work?  If so, that's fine but please explain what
> > decisions were made.
> 
> Yes, I had replied saying:
> 
>  |I can change that, but the callers right now don't re-try with
>  |larger buffer anyway, so this would actually complicate them just
>  |a smidgeon.  Would you want them changed to do that?  (pr_cont_kernfs_path
>  |right now writes into a static char[] for instance)
> 
> I can still make that change if you like.

Oops, sorry I forgot about that.  The reason why kernfs_path() is
written the current way was me being lazy.  While I think it'd be
better to make the functions behave like normal string handling
functions if we're extending it, I don't think it's that important.
If it's easy, please go ahead.  If not, we can get back to it later
when necessary.

> > I skimmed through the series and spotted several other review points
> > which didn't get addressed.  Can you please go over the previous
> > review cycle and address the review points?
> 
> I did go through every email twice, once while making changes (one
> branch per response) and once while making changelog for each patch,
> sorry about whatever I missed.  I'll go through each again.

The other chunk I noticed was inline conversions of internal functions
which didn't seem to belong to the patch.  I asked whether those were
stray chunks.  Maybe the comment was too buried to notice?  Anyways,
that part actually causes conflicts when applying to cgroup/for-4.5.

There are a couple more things.

* Can you please put the ns related decls after the regular cgroup
  stuff in cgroup.h?

* I think I might need to edit the documentation anyway but it'd be
  great if you can make the namespace section more in line with the
  rest of the documentation - e.g. s/CGroup/cgroup/ and more
  structured sectioning.

At this point, it all generally looks good to me.  Let's get the
nits out of the way and merge it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]     ` <20151209213806.GP30240-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-09 22:13       ` Serge Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge Hallyn @ 2015-12-09 22:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org):
> Hello, Serge.
> 
> On Wed, Dec 09, 2015 at 01:28:54PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> > +/* kernfs_node_depth - compute depth from @from to @to */
> > +static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
> ...
> > +char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
> > +{
> > +	return kernfs_path_from_node(NULL, kn, buf, buflen);
> > +}
> ...
> > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> > index 5d4e9c4..d025ebd 100644
> > --- a/include/linux/kernfs.h
> > +++ b/include/linux/kernfs.h
> > @@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
> >  
> >  int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
> >  size_t kernfs_path_len(struct kernfs_node *kn);
> > +char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
> > +					  struct kernfs_node *kn, char *buf,
> > +					  size_t buflen);
> 
> I think I commented on the same thing before, but I think it'd make
> more sense to put @from after @to

Oh.  You said that for kernfs_path_from_node_locked(), and those were
changed.  kernfs_path_form_node() is a different fn, but

> and the prototype is using @root_kn
> which is a bit confusing.

we can rename kn_root to from here if you think that's clearer (and
change the order here as well).

> Was converting the path functions to return
> length too much work?  If so, that's fine but please explain what
> decisions were made.

Yes, I had replied saying:

 |I can change that, but the callers right now don't re-try with
 |larger buffer anyway, so this would actually complicate them just
 |a smidgeon.  Would you want them changed to do that?  (pr_cont_kernfs_path
 |right now writes into a static char[] for instance)

I can still make that change if you like.

> I skimmed through the series and spotted several other review points
> which didn't get addressed.  Can you please go over the previous
> review cycle and address the review points?

I did go through every email twice, once while making changes (one
branch per response) and once while making changelog for each patch,
sorry about whatever I missed.  I'll go through each again.

I'm going to be out for awhile after today, so next version will
unfortunately take awhile.

thanks,
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]     ` <20151209213806.GP30240-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
@ 2015-12-09 22:13       ` Serge Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge Hallyn @ 2015-12-09 22:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm, gregkh, lizefan, hannes,
	Serge E. Hallyn

Quoting Tejun Heo (tj@kernel.org):
> Hello, Serge.
> 
> On Wed, Dec 09, 2015 at 01:28:54PM -0600, serge.hallyn@ubuntu.com wrote:
> > +/* kernfs_node_depth - compute depth from @from to @to */
> > +static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
> ...
> > +char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
> > +{
> > +	return kernfs_path_from_node(NULL, kn, buf, buflen);
> > +}
> ...
> > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> > index 5d4e9c4..d025ebd 100644
> > --- a/include/linux/kernfs.h
> > +++ b/include/linux/kernfs.h
> > @@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
> >  
> >  int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
> >  size_t kernfs_path_len(struct kernfs_node *kn);
> > +char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
> > +					  struct kernfs_node *kn, char *buf,
> > +					  size_t buflen);
> 
> I think I commented on the same thing before, but I think it'd make
> more sense to put @from after @to

Oh.  You said that for kernfs_path_from_node_locked(), and those were
changed.  kernfs_path_form_node() is a different fn, but

> and the prototype is using @root_kn
> which is a bit confusing.

we can rename kn_root to from here if you think that's clearer (and
change the order here as well).

> Was converting the path functions to return
> length too much work?  If so, that's fine but please explain what
> decisions were made.

Yes, I had replied saying:

 |I can change that, but the callers right now don't re-try with
 |larger buffer anyway, so this would actually complicate them just
 |a smidgeon.  Would you want them changed to do that?  (pr_cont_kernfs_path
 |right now writes into a static char[] for instance)

I can still make that change if you like.

> I skimmed through the series and spotted several other review points
> which didn't get addressed.  Can you please go over the previous
> review cycle and address the review points?

I did go through every email twice, once while making changes (one
branch per response) and once while making changelog for each patch,
sorry about whatever I missed.  I'll go through each again.

I'm going to be out for awhile after today, so next version will
unfortunately take awhile.

thanks,
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-12-09 22:13       ` Serge Hallyn
  0 siblings, 0 replies; 180+ messages in thread
From: Serge Hallyn @ 2015-12-09 22:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w,
	Serge E. Hallyn

Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org):
> Hello, Serge.
> 
> On Wed, Dec 09, 2015 at 01:28:54PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> > +/* kernfs_node_depth - compute depth from @from to @to */
> > +static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
> ...
> > +char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
> > +{
> > +	return kernfs_path_from_node(NULL, kn, buf, buflen);
> > +}
> ...
> > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> > index 5d4e9c4..d025ebd 100644
> > --- a/include/linux/kernfs.h
> > +++ b/include/linux/kernfs.h
> > @@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
> >  
> >  int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
> >  size_t kernfs_path_len(struct kernfs_node *kn);
> > +char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
> > +					  struct kernfs_node *kn, char *buf,
> > +					  size_t buflen);
> 
> I think I commented on the same thing before, but I think it'd make
> more sense to put @from after @to

Oh.  You said that for kernfs_path_from_node_locked(), and those were
changed.  kernfs_path_form_node() is a different fn, but

> and the prototype is using @root_kn
> which is a bit confusing.

we can rename kn_root to from here if you think that's clearer (and
change the order here as well).

> Was converting the path functions to return
> length too much work?  If so, that's fine but please explain what
> decisions were made.

Yes, I had replied saying:

 |I can change that, but the callers right now don't re-try with
 |larger buffer anyway, so this would actually complicate them just
 |a smidgeon.  Would you want them changed to do that?  (pr_cont_kernfs_path
 |right now writes into a static char[] for instance)

I can still make that change if you like.

> I skimmed through the series and spotted several other review points
> which didn't get addressed.  Can you please go over the previous
> review cycle and address the review points?

I did go through every email twice, once while making changes (one
branch per response) and once while making changelog for each patch,
sorry about whatever I missed.  I'll go through each again.

I'm going to be out for awhile after today, so next version will
unfortunately take awhile.

thanks,
-serge

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]   ` <1449689341-28742-2-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
@ 2015-12-09 21:38     ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-09 21:38 UTC (permalink / raw)
  To: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hello, Serge.

On Wed, Dec 09, 2015 at 01:28:54PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> +/* kernfs_node_depth - compute depth from @from to @to */
> +static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
...
> +char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
> +{
> +	return kernfs_path_from_node(NULL, kn, buf, buflen);
> +}
...
> diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> index 5d4e9c4..d025ebd 100644
> --- a/include/linux/kernfs.h
> +++ b/include/linux/kernfs.h
> @@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
>  
>  int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
>  size_t kernfs_path_len(struct kernfs_node *kn);
> +char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
> +					  struct kernfs_node *kn, char *buf,
> +					  size_t buflen);

I think I commented on the same thing before, but I think it'd make
more sense to put @from after @to and the prototype is using @root_kn
which is a bit confusing.  Was converting the path functions to return
length too much work?  If so, that's fine but please explain what
decisions were made.

I skimmed through the series and spotted several other review points
which didn't get addressed.  Can you please go over the previous
review cycle and address the review points?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found]   ` <1449689341-28742-2-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
@ 2015-12-09 21:38     ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-09 21:38 UTC (permalink / raw)
  To: serge.hallyn
  Cc: linux-kernel, adityakali, linux-api, containers, cgroups,
	lxc-devel, akpm, ebiederm, gregkh, lizefan, hannes,
	Serge E. Hallyn

Hello, Serge.

On Wed, Dec 09, 2015 at 01:28:54PM -0600, serge.hallyn@ubuntu.com wrote:
> +/* kernfs_node_depth - compute depth from @from to @to */
> +static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
...
> +char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
> +{
> +	return kernfs_path_from_node(NULL, kn, buf, buflen);
> +}
...
> diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> index 5d4e9c4..d025ebd 100644
> --- a/include/linux/kernfs.h
> +++ b/include/linux/kernfs.h
> @@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
>  
>  int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
>  size_t kernfs_path_len(struct kernfs_node *kn);
> +char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
> +					  struct kernfs_node *kn, char *buf,
> +					  size_t buflen);

I think I commented on the same thing before, but I think it'd make
more sense to put @from after @to and the prototype is using @root_kn
which is a bit confusing.  Was converting the path functions to return
length too much work?  If so, that's fine but please explain what
decisions were made.

I skimmed through the series and spotted several other review points
which didn't get addressed.  Can you please go over the previous
review cycle and address the review points?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH 1/8] kernfs: Add API to generate relative kernfs path
@ 2015-12-09 21:38     ` Tejun Heo
  0 siblings, 0 replies; 180+ messages in thread
From: Tejun Heo @ 2015-12-09 21:38 UTC (permalink / raw)
  To: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	adityakali-hpIqsD4AKlfQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	lizefan-hv44wF8Li93QT0dZR+AlfA, hannes-druUgvl0LCNAfugRpC6u6w,
	Serge E. Hallyn

Hello, Serge.

On Wed, Dec 09, 2015 at 01:28:54PM -0600, serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org wrote:
> +/* kernfs_node_depth - compute depth from @from to @to */
> +static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
...
> +char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
> +{
> +	return kernfs_path_from_node(NULL, kn, buf, buflen);
> +}
...
> diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> index 5d4e9c4..d025ebd 100644
> --- a/include/linux/kernfs.h
> +++ b/include/linux/kernfs.h
> @@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
>  
>  int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
>  size_t kernfs_path_len(struct kernfs_node *kn);
> +char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
> +					  struct kernfs_node *kn, char *buf,
> +					  size_t buflen);

I think I commented on the same thing before, but I think it'd make
more sense to put @from after @to and the prototype is using @root_kn
which is a bit confusing.  Was converting the path functions to return
length too much work?  If so, that's fine but please explain what
decisions were made.

I skimmed through the series and spotted several other review points
which didn't get addressed.  Can you please go over the previous
review cycle and address the review points?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
       [not found] ` <1449689341-28742-1-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
@ 2015-12-09 19:28   ` serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
  0 siblings, 0 replies; 180+ messages in thread
From: serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA @ 2015-12-09 19:28 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	hannes-druUgvl0LCNAfugRpC6u6w, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	tj-DgEjT+Ai2ygdnm+yROfE0A, cgroups-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment
Changelog 20151208:
  - kernfs_node_distance:
    * Remove BUG_ON(NULL)s
    * Rename kernfs_node_distance to kernfs_depth
  - kernfs_common-ancestor:
    * Remove useless checks for depth == 0
    * Add check to ensure nodes are from same root
  - kernfs_path_from_node_locked:
    * Remove needless __must_check
    * Put p;len on its own decl line.
    * Fix wrong WARN_ONCE usage
---
 fs/kernfs/dir.c        |  177 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |    3 +
 2 files changed, 153 insertions(+), 27 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 91e0045..d1a001a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,129 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+		struct kernfs_node *b)
+{
+	size_t da, db;
+	struct kernfs_root *ra = kernfs_root(a), *rb = kernfs_root(b);
 
-	return p;
+	if (ra != rb)
+		return NULL;
+
+	da = kernfs_depth(ra->kn, a);
+	db = kernfs_depth(rb->kn, b);
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ */
+static char *
+kernfs_path_from_node_locked(struct kernfs_node *kn_to,
+			     struct kernfs_node *kn_from, char *buf,
+			     size_t buflen)
+{
+	char *p = buf;
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	int i;
+	size_t depth_from, depth_to, len = 0, nlen = 0;
+	size_t plen = sizeof(parent_str) - 1;
+
+	/* We atleast need 2 bytes to write "/\0". */
+	if (buflen < 2)
+		return NULL;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to) {
+		*p = '/';
+		*(++p) = '\0';
+		return buf;
+	}
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (WARN_ON(!common))
+		return NULL;
+
+	depth_to = kernfs_depth(common, kn_to);
+	depth_from = kernfs_depth(common, kn_from);
+
+	for (i = 0; i < depth_from; i++) {
+		if (len + plen + 1 > buflen)
+			return NULL;
+		strcpy(p, parent_str);
+		p += plen;
+		len += plen;
+	}
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
+
+	if (len + nlen + 1 > buflen)
+		return NULL;
+
+	p += nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return buf;
 }
 
 /**
@@ -115,26 +216,48 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
- * kernfs_path - build full path of a given node
+ * kernfs_path_from_node - build path of node @kn relative to @kn_root.
+ * @kn_root: parent kernfs_node relative to which we need to build the path
  * @kn: kernfs_node of interest
- * @buf: buffer to copy @kn's name into
+ * @buf: buffer to copy @kn's path into
  * @buflen: size of @buf
  *
- * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
- * path is built from the end of @buf so the returned pointer usually
- * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * Builds and returns @kn's path relative to @kn_root. @kn_root and @kn must
+ * be on the same kernfs-root. If @kn_root is not parent of @kn, then a relative
+ * path (which includes '..'s) as needed to reach from @kn_root to @kn is
+ * returned.
+ * The path may be built from the end of @buf so the returned pointer may not
+ * match @buf.  If @buf isn't long enough, @buf is nul terminated
  * and %NULL is returned.
  */
-char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+char *kernfs_path_from_node(struct kernfs_node *kn_root, struct kernfs_node *kn,
+			    char *buf, size_t buflen)
 {
 	unsigned long flags;
 	char *p;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
+	p = kernfs_path_from_node_locked(kn, kn_root, buf, buflen);
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 	return p;
 }
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
+ * kernfs_path - build full path of a given node
+ * @kn: kernfs_node of interest
+ * @buf: buffer to copy @kn's name into
+ * @buflen: size of @buf
+ *
+ * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
+ * path is built from the end of @buf so the returned pointer usually
+ * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * and %NULL is returned.
+ */
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+{
+	return kernfs_path_from_node(NULL, kn, buf, buflen);
+}
 EXPORT_SYMBOL_GPL(kernfs_path);
 
 /**
@@ -168,8 +291,8 @@ void pr_cont_kernfs_path(struct kernfs_node *kn)
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
+	p = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
+					 sizeof(kernfs_pr_cont_buf));
 	if (p)
 		pr_cont("%s", p);
 	else
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5d4e9c4..d025ebd 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
+char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
+					  struct kernfs_node *kn, char *buf,
+					  size_t buflen);
 char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
 				size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH 1/8] kernfs: Add API to generate relative kernfs path
  2015-12-09 19:28 CGroup Namespaces (v7) serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
@ 2015-12-09 19:28 ` serge.hallyn
  2015-12-09 21:38     ` Tejun Heo
       [not found]   ` <1449689341-28742-2-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
       [not found] ` <1449689341-28742-1-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 2 replies; 180+ messages in thread
From: serge.hallyn @ 2015-12-09 19:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: adityakali, tj, linux-api, containers, cgroups, lxc-devel, akpm,
	ebiederm, gregkh, lizefan, hannes, Serge E. Hallyn

From: Aditya Kali <adityakali@google.com>

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
Changelog 20151125:
  - Fully-wing multilinecomments
  - Rework kernfs_path_from_node_locked() logic
  - Replace BUG_ONs with returning NULL
  - Use a const char* for /.. and precalculate its size
Changelog 20151130:
  - Update kernfs_path_from_node_locked comment
Changelog 20151208:
  - kernfs_node_distance:
    * Remove BUG_ON(NULL)s
    * Rename kernfs_node_distance to kernfs_depth
  - kernfs_common-ancestor:
    * Remove useless checks for depth == 0
    * Add check to ensure nodes are from same root
  - kernfs_path_from_node_locked:
    * Remove needless __must_check
    * Put p;len on its own decl line.
    * Fix wrong WARN_ONCE usage
---
 fs/kernfs/dir.c        |  177 ++++++++++++++++++++++++++++++++++++++++--------
 include/linux/kernfs.h |    3 +
 2 files changed, 153 insertions(+), 27 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 91e0045..d1a001a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -44,28 +44,129 @@ static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
 }
 
-static char * __must_check kernfs_path_locked(struct kernfs_node *kn, char *buf,
-					      size_t buflen)
+/* kernfs_node_depth - compute depth from @from to @to */
+static size_t kernfs_depth(struct kernfs_node *from, struct kernfs_node *to)
 {
-	char *p = buf + buflen;
-	int len;
+	size_t depth = 0;
 
-	*--p = '\0';
+	while (to->parent && to != from) {
+		depth++;
+		to = to->parent;
+	}
+	return depth;
+}
 
-	do {
-		len = strlen(kn->name);
-		if (p - buf < len + 1) {
-			buf[0] = '\0';
-			p = NULL;
-			break;
-		}
-		p -= len;
-		memcpy(p, kn->name, len);
-		*--p = '/';
-		kn = kn->parent;
-	} while (kn && kn->parent);
+static struct kernfs_node *kernfs_common_ancestor(struct kernfs_node *a,
+		struct kernfs_node *b)
+{
+	size_t da, db;
+	struct kernfs_root *ra = kernfs_root(a), *rb = kernfs_root(b);
 
-	return p;
+	if (ra != rb)
+		return NULL;
+
+	da = kernfs_depth(ra->kn, a);
+	db = kernfs_depth(rb->kn, b);
+
+	while (da > db) {
+		a = a->parent;
+		da--;
+	}
+	while (db > da) {
+		b = b->parent;
+		db--;
+	}
+
+	/* worst case b and a will be the same at root */
+	while (b != a) {
+		b = b->parent;
+		a = a->parent;
+	}
+
+	return a;
+}
+
+/**
+ * kernfs_path_from_node_locked - find a pseudo-absolute path to @kn_to,
+ * where kn_from is treated as root of the path.
+ * @kn_from: kernfs node which should be treated as root for the path
+ * @kn_to: kernfs node to which path is needed
+ * @buf: buffer to copy the path into
+ * @buflen: size of @buf
+ *
+ * We need to handle couple of scenarios here:
+ * [1] when @kn_from is an ancestor of @kn_to at some level
+ * kn_from: /n1/n2/n3
+ * kn_to:   /n1/n2/n3/n4/n5
+ * result:  /n4/n5
+ *
+ * [2] when @kn_from is on a different hierarchy and we need to find common
+ * ancestor between @kn_from and @kn_to.
+ * kn_from: /n1/n2/n3/n4
+ * kn_to:   /n1/n2/n5
+ * result:  /../../n5
+ * OR
+ * kn_from: /n1/n2/n3/n4/n5   [depth=5]
+ * kn_to:   /n1/n2/n3         [depth=3]
+ * result:  /../..
+ */
+static char *
+kernfs_path_from_node_locked(struct kernfs_node *kn_to,
+			     struct kernfs_node *kn_from, char *buf,
+			     size_t buflen)
+{
+	char *p = buf;
+	struct kernfs_node *kn, *common;
+	const char parent_str[] = "/..";
+	int i;
+	size_t depth_from, depth_to, len = 0, nlen = 0;
+	size_t plen = sizeof(parent_str) - 1;
+
+	/* We atleast need 2 bytes to write "/\0". */
+	if (buflen < 2)
+		return NULL;
+
+	if (!kn_from)
+		kn_from = kernfs_root(kn_to)->kn;
+
+	if (kn_from == kn_to) {
+		*p = '/';
+		*(++p) = '\0';
+		return buf;
+	}
+
+	common = kernfs_common_ancestor(kn_from, kn_to);
+	if (WARN_ON(!common))
+		return NULL;
+
+	depth_to = kernfs_depth(common, kn_to);
+	depth_from = kernfs_depth(common, kn_from);
+
+	for (i = 0; i < depth_from; i++) {
+		if (len + plen + 1 > buflen)
+			return NULL;
+		strcpy(p, parent_str);
+		p += plen;
+		len += plen;
+	}
+
+	/* Calculate how many bytes we need for the rest */
+	for (kn = kn_to; kn != common; kn = kn->parent)
+		nlen += strlen(kn->name) + 1;
+
+	if (len + nlen + 1 > buflen)
+		return NULL;
+
+	p += nlen;
+	*p = '\0';
+	for (kn = kn_to; kn != common; kn = kn->parent) {
+		nlen = strlen(kn->name);
+		p -= nlen;
+		memcpy(p, kn->name, nlen);
+		*(--p) = '/';
+	}
+
+	return buf;
 }
 
 /**
@@ -115,26 +216,48 @@ size_t kernfs_path_len(struct kernfs_node *kn)
 }
 
 /**
- * kernfs_path - build full path of a given node
+ * kernfs_path_from_node - build path of node @kn relative to @kn_root.
+ * @kn_root: parent kernfs_node relative to which we need to build the path
  * @kn: kernfs_node of interest
- * @buf: buffer to copy @kn's name into
+ * @buf: buffer to copy @kn's path into
  * @buflen: size of @buf
  *
- * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
- * path is built from the end of @buf so the returned pointer usually
- * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * Builds and returns @kn's path relative to @kn_root. @kn_root and @kn must
+ * be on the same kernfs-root. If @kn_root is not parent of @kn, then a relative
+ * path (which includes '..'s) as needed to reach from @kn_root to @kn is
+ * returned.
+ * The path may be built from the end of @buf so the returned pointer may not
+ * match @buf.  If @buf isn't long enough, @buf is nul terminated
  * and %NULL is returned.
  */
-char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+char *kernfs_path_from_node(struct kernfs_node *kn_root, struct kernfs_node *kn,
+			    char *buf, size_t buflen)
 {
 	unsigned long flags;
 	char *p;
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
-	p = kernfs_path_locked(kn, buf, buflen);
+	p = kernfs_path_from_node_locked(kn, kn_root, buf, buflen);
 	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
 	return p;
 }
+EXPORT_SYMBOL_GPL(kernfs_path_from_node);
+
+/**
+ * kernfs_path - build full path of a given node
+ * @kn: kernfs_node of interest
+ * @buf: buffer to copy @kn's name into
+ * @buflen: size of @buf
+ *
+ * Builds and returns the full path of @kn in @buf of @buflen bytes.  The
+ * path is built from the end of @buf so the returned pointer usually
+ * doesn't match @buf.  If @buf isn't long enough, @buf is nul terminated
+ * and %NULL is returned.
+ */
+char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
+{
+	return kernfs_path_from_node(NULL, kn, buf, buflen);
+}
 EXPORT_SYMBOL_GPL(kernfs_path);
 
 /**
@@ -168,8 +291,8 @@ void pr_cont_kernfs_path(struct kernfs_node *kn)
 
 	spin_lock_irqsave(&kernfs_rename_lock, flags);
 
-	p = kernfs_path_locked(kn, kernfs_pr_cont_buf,
-			       sizeof(kernfs_pr_cont_buf));
+	p = kernfs_path_from_node_locked(kn, NULL, kernfs_pr_cont_buf,
+					 sizeof(kernfs_pr_cont_buf));
 	if (p)
 		pr_cont("%s", p);
 	else
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5d4e9c4..d025ebd 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -267,6 +267,9 @@ static inline bool kernfs_ns_enabled(struct kernfs_node *kn)
 
 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
 size_t kernfs_path_len(struct kernfs_node *kn);
+char * __must_check kernfs_path_from_node(struct kernfs_node *root_kn,
+					  struct kernfs_node *kn, char *buf,
+					  size_t buflen);
 char * __must_check kernfs_path(struct kernfs_node *kn, char *buf,
 				size_t buflen);
 void pr_cont_kernfs_name(struct kernfs_node *kn);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 180+ messages in thread

end of thread, other threads:[~2016-01-29  8:56 UTC | newest]

Thread overview: 180+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-16 19:51 CGroup Namespaces (v4) serge-A9i7LUbDfNHQT0dZR+AlfA
2015-11-16 19:51 ` serge
2015-11-16 19:51 ` [PATCH 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace serge
2015-11-16 19:51   ` serge-A9i7LUbDfNHQT0dZR+AlfA
     [not found] ` <1447703505-29672-1-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2015-11-16 19:51   ` [PATCH 1/8] kernfs: Add API to generate relative kernfs path serge-A9i7LUbDfNHQT0dZR+AlfA
2015-11-16 19:51     ` serge
     [not found]     ` <1447703505-29672-2-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2015-11-24 16:16       ` Tejun Heo
2015-11-24 16:16     ` Tejun Heo
2015-11-24 16:16       ` Tejun Heo
2015-11-24 16:17       ` Tejun Heo
2015-11-24 16:17         ` Tejun Heo
2015-11-24 17:43         ` Serge E. Hallyn
2015-11-24 17:43           ` Serge E. Hallyn
     [not found]         ` <20151124161709.GM17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-11-24 17:43           ` Serge E. Hallyn
     [not found]       ` <20151124161630.GL17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-11-24 16:17         ` Tejun Heo
2015-11-27  5:25         ` Serge E. Hallyn
2015-11-27  5:25       ` Serge E. Hallyn
2015-11-27  5:25         ` Serge E. Hallyn
     [not found]         ` <20151127052511.GA25490-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-11-30 15:11           ` Tejun Heo
2015-11-30 15:11         ` Tejun Heo
2015-11-30 15:11           ` Tejun Heo
2015-11-30 18:37           ` Serge E. Hallyn
2015-11-30 18:37             ` Serge E. Hallyn
2015-11-30 22:53             ` Tejun Heo
     [not found]               ` <20151130225318.GD9039-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-01  2:08                 ` Serge E. Hallyn
2015-12-01  2:08               ` Serge E. Hallyn
2015-12-01  2:08                 ` Serge E. Hallyn
     [not found]             ` <20151130183758.GA25433-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-11-30 22:53               ` Tejun Heo
     [not found]           ` <20151130151147.GG3535-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-11-30 18:37             ` Serge E. Hallyn
2015-11-16 19:51   ` [PATCH 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace serge-A9i7LUbDfNHQT0dZR+AlfA
2015-11-16 19:51   ` [PATCH 3/8] cgroup: add function to get task's cgroup serge-A9i7LUbDfNHQT0dZR+AlfA
2015-11-16 19:51     ` serge
     [not found]     ` <1447703505-29672-4-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2015-11-24 16:27       ` Tejun Heo
2015-11-24 16:27         ` Tejun Heo
     [not found]         ` <20151124162728.GN17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-11-24 16:54           ` Tejun Heo
2015-11-24 16:54         ` Tejun Heo
2015-11-24 16:54           ` Tejun Heo
2015-11-16 19:51   ` [PATCH 4/8] cgroup: export cgroup_get() and cgroup_put() serge-A9i7LUbDfNHQT0dZR+AlfA
2015-11-16 19:51   ` [PATCH 5/8] cgroup: introduce cgroup namespaces serge-A9i7LUbDfNHQT0dZR+AlfA
2015-11-16 19:51     ` serge
2015-11-24 16:49     ` Tejun Heo
2015-11-24 16:49       ` Tejun Heo
     [not found]     ` <1447703505-29672-6-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2015-11-24 16:49       ` Tejun Heo
2015-11-16 19:51   ` [PATCH 6/8] cgroup: cgroup namespace setns support serge-A9i7LUbDfNHQT0dZR+AlfA
2015-11-16 19:51     ` serge
     [not found]     ` <1447703505-29672-7-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2015-11-24 16:52       ` Tejun Heo
2015-11-24 16:52     ` Tejun Heo
2015-11-24 16:52       ` Tejun Heo
2015-11-16 19:51   ` [PATCH 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns serge-A9i7LUbDfNHQT0dZR+AlfA
2015-11-16 19:51     ` serge
     [not found]     ` <1447703505-29672-8-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2015-11-24 17:16       ` Tejun Heo
2015-11-24 17:16     ` Tejun Heo
2015-11-24 17:16       ` Tejun Heo
2015-11-25  6:01       ` Serge E. Hallyn
     [not found]         ` <20151125060156.GA678-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-11-25 19:10           ` Tejun Heo
2015-11-25 19:10         ` Tejun Heo
2015-11-25 19:10           ` Tejun Heo
     [not found]           ` <20151125191041.GB14240-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2015-11-25 19:55             ` Serge Hallyn
2015-11-25 19:55           ` Serge Hallyn
2015-11-25 19:55             ` Serge Hallyn
2015-11-25 19:57             ` Tejun Heo
2015-11-25 19:57               ` Tejun Heo
2015-11-25 19:57             ` Tejun Heo
     [not found]       ` <20151124171610.GS17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-11-25  6:01         ` Serge E. Hallyn
2015-11-27  5:17         ` Serge E. Hallyn
2015-11-27  5:17       ` Serge E. Hallyn
2015-11-27  5:17         ` Serge E. Hallyn
     [not found]         ` <20151127051745.GA24521-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-11-30 15:09           ` Tejun Heo
2015-11-30 15:09         ` Tejun Heo
2015-11-30 15:09           ` Tejun Heo
     [not found]           ` <20151130150938.GF3535-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-01  4:07             ` Serge E. Hallyn
2015-12-01  4:07           ` Serge E. Hallyn
2015-12-01  4:07             ` Serge E. Hallyn
     [not found]             ` <20151201040704.GA31067-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-12-01 16:46               ` Tejun Heo
2015-12-01 16:46             ` Tejun Heo
2015-12-01 16:46               ` Tejun Heo
2015-12-01 21:58               ` Serge E. Hallyn
2015-12-01 21:58                 ` Serge E. Hallyn
2015-12-02 16:53                 ` Tejun Heo
2015-12-02 16:53                   ` Tejun Heo
     [not found]                   ` <20151202165312.GB19878-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-02 16:56                     ` Serge E. Hallyn
2015-12-02 16:56                   ` Serge E. Hallyn
2015-12-02 16:56                     ` Serge E. Hallyn
2015-12-02 16:58                     ` Tejun Heo
2015-12-02 16:58                       ` Tejun Heo
     [not found]                       ` <20151202165839.GD19878-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-02 17:02                         ` Serge E. Hallyn
2015-12-02 17:02                       ` Serge E. Hallyn
2015-12-02 17:02                         ` Serge E. Hallyn
     [not found]                         ` <20151202170239.GA21009-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-12-02 17:05                           ` Tejun Heo
2015-12-02 17:05                             ` Tejun Heo
2015-12-03 22:47                             ` Serge E. Hallyn
2015-12-03 22:47                               ` Serge E. Hallyn
2015-12-07 15:39                               ` Tejun Heo
2015-12-07 15:39                                 ` Tejun Heo
     [not found]                                 ` <20151207153911.GF9175-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-07 15:53                                   ` Serge Hallyn
2015-12-07 15:53                                     ` Serge Hallyn
     [not found]                               ` <20151203224706.GA19971-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-12-07 15:39                                 ` Tejun Heo
     [not found]                             ` <20151202170551.GE19878-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-03 22:47                               ` Serge E. Hallyn
     [not found]                     ` <20151202165637.GA20840-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-12-02 16:58                       ` Tejun Heo
     [not found]                 ` <20151201215853.GA9153-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-12-02 16:53                   ` Tejun Heo
     [not found]               ` <20151201164649.GD12922-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-01 21:58                 ` Serge E. Hallyn
2015-11-16 19:51   ` [PATCH 8/8] cgroup: Add documentation for cgroup namespaces serge-A9i7LUbDfNHQT0dZR+AlfA
2015-11-16 19:51     ` serge
2015-11-24 17:16     ` Tejun Heo
2015-11-24 17:16       ` Tejun Heo
     [not found]     ` <1447703505-29672-9-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2015-11-24 17:16       ` Tejun Heo
2015-11-16 20:41   ` CGroup Namespaces (v4) Richard Weinberger
2015-11-16 19:51 ` [PATCH 4/8] cgroup: export cgroup_get() and cgroup_put() serge
2015-11-16 19:51   ` serge-A9i7LUbDfNHQT0dZR+AlfA
     [not found]   ` <1447703505-29672-5-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2015-11-24 16:30     ` Tejun Heo
2015-11-24 16:30       ` Tejun Heo
2015-11-24 22:35       ` Serge E. Hallyn
2015-11-24 22:35         ` Serge E. Hallyn
     [not found]       ` <20151124163056.GO17033-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-11-24 22:35         ` Serge E. Hallyn
2015-11-16 20:41 ` CGroup Namespaces (v4) Richard Weinberger
2015-11-16 20:41   ` Richard Weinberger
2015-11-16 20:46   ` Serge E. Hallyn
2015-11-16 20:46     ` Serge E. Hallyn
2015-11-16 20:50     ` Richard Weinberger
2015-11-16 20:50       ` Richard Weinberger
2015-11-16 20:54       ` Serge E. Hallyn
2015-11-16 20:54         ` Serge E. Hallyn
2015-11-16 22:24         ` Eric W. Biederman
2015-11-16 22:24           ` Eric W. Biederman
2015-11-16 22:37           ` Tejun Heo
2015-11-16 22:37             ` Tejun Heo
     [not found]           ` <87y4dxh9b8.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2015-11-16 22:37             ` Tejun Heo
2015-11-17  1:13             ` Serge E. Hallyn
2015-11-17  1:13               ` Serge E. Hallyn
2015-11-17  1:13               ` Serge E. Hallyn
2015-11-17  1:40               ` Serge E. Hallyn
2015-11-17  1:40                 ` Serge E. Hallyn
2015-11-17  3:54                 ` Serge E. Hallyn
2015-11-17  3:54                   ` Serge E. Hallyn
     [not found]                 ` <20151117014026.GA2331-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-11-17  3:54                   ` Serge E. Hallyn
     [not found]               ` <20151117011349.GA1958-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-11-17  1:40                 ` Serge E. Hallyn
2015-11-18  2:30             ` Serge E. Hallyn
2015-11-18  2:30               ` Serge E. Hallyn
     [not found]               ` <20151118023022.GA17501-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-11-18  9:18                 ` Eric W. Biederman
2015-11-18  9:18                   ` Eric W. Biederman
     [not found]                   ` <87r3jnfyx7.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2015-11-18 15:43                     ` Serge E. Hallyn
2015-11-18 15:43                       ` Serge E. Hallyn
     [not found]         ` <20151116205452.GA30975-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-11-16 22:24           ` Eric W. Biederman
     [not found]       ` <564A41AF.4040208-/L3Ra7n9ekc@public.gmane.org>
2015-11-16 20:54         ` Serge E. Hallyn
     [not found]     ` <20151116204606.GA30681-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2015-11-16 20:50       ` Richard Weinberger
     [not found]   ` <CAFLxGvzVmbZHrpaTmXUAK03hsnVPwEs3SJGNFNXfthh3NL8EDg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-11-16 20:46     ` Serge E. Hallyn
2015-12-09 19:28 CGroup Namespaces (v7) serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
2015-12-09 19:28 ` [PATCH 1/8] kernfs: Add API to generate relative kernfs path serge.hallyn
2015-12-09 21:38   ` Tejun Heo
2015-12-09 21:38     ` Tejun Heo
2015-12-09 22:13     ` Serge Hallyn
2015-12-09 22:13       ` Serge Hallyn
2015-12-09 22:36       ` Tejun Heo
2015-12-09 22:36         ` Tejun Heo
2015-12-09 22:51         ` Serge E. Hallyn
     [not found]         ` <20151209223651.GQ30240-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-09 22:51           ` Serge E. Hallyn
2015-12-10  1:28           ` Serge E. Hallyn
2015-12-10  1:28             ` Serge E. Hallyn
2015-12-09 22:36       ` Tejun Heo
     [not found]     ` <20151209213806.GP30240-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-09 22:13       ` Serge Hallyn
     [not found]   ` <1449689341-28742-2-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
2015-12-09 21:38     ` Tejun Heo
     [not found] ` <1449689341-28742-1-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
2015-12-09 19:28   ` serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
2015-12-23  4:23 CGroup Namespaces (v8) serge.hallyn
     [not found] ` <1450844609-9194-1-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
2015-12-23  4:23   ` [PATCH 1/8] kernfs: Add API to generate relative kernfs path serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
2015-12-23  4:23     ` serge.hallyn
2015-12-23 16:08     ` Tejun Heo
2015-12-23 16:08       ` Tejun Heo
2015-12-23 16:36       ` Serge E. Hallyn
2015-12-23 16:36         ` Serge E. Hallyn
     [not found]       ` <20151223160854.GF5003-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-23 16:36         ` Serge E. Hallyn
2015-12-23 16:24     ` Tejun Heo
2015-12-23 16:24       ` Tejun Heo
2015-12-23 16:51       ` Greg KH
2015-12-23 16:51         ` Greg KH
     [not found]       ` <20151223162433.GH5003-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>
2015-12-23 16:51         ` Greg KH
     [not found]     ` <1450844609-9194-2-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
2015-12-23 16:08       ` Tejun Heo
2015-12-23 16:24       ` Tejun Heo
2016-01-04 19:54 CGroup Namespaces (v9) serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
     [not found] ` <1451937294-22589-1-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
2016-01-04 19:54   ` [PATCH 1/8] kernfs: Add API to generate relative kernfs path serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
2016-01-04 19:54     ` serge.hallyn
2016-01-29  8:54 CGroup Namespaces (v10) serge.hallyn
2016-01-29  8:54 ` [PATCH 1/8] kernfs: Add API to generate relative kernfs path serge.hallyn
2016-01-29  8:54   ` serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA
     [not found] ` <1454057651-23959-1-git-send-email-serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
2016-01-29  8:54   ` serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.