All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] cgroup: use negative bias on css->refcnt to block css_tryget()
@ 2012-03-30 22:34 ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-03-30 22:34 UTC (permalink / raw)
  To: Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, Vivek Goyal

When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.

This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.

This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.

This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.

css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().

This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
These two patches should be applied on top of the cftype patchset[1]
and is available in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cgroup-css-refs

Thanks.

[1] http://thread.gmane.org/gmane.linux.kernel.cgroups/1322

 include/linux/cgroup.h |   12 ++---
 kernel/cgroup.c        |  119 ++++++++++++++++++++++++++++-------------------
 2 files changed, 75 insertions(+), 56 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 028478c..be81faf 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -115,16 +115,12 @@ static inline bool css_is_removed(struct cgroup_subsys_state *css)
  * the css has been destroyed.
  */
 
+extern bool __css_tryget(struct cgroup_subsys_state *css);
 static inline bool css_tryget(struct cgroup_subsys_state *css)
 {
 	if (test_bit(CSS_ROOT, &css->flags))
 		return true;
-	while (!atomic_inc_not_zero(&css->refcnt)) {
-		if (test_bit(CSS_REMOVED, &css->flags))
-			return false;
-		cpu_relax();
-	}
-	return true;
+	return __css_tryget(css);
 }
 
 /*
@@ -132,11 +128,11 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
  * css_get() or css_tryget()
  */
 
-extern void __css_put(struct cgroup_subsys_state *css, int count);
+extern void __css_put(struct cgroup_subsys_state *css);
 static inline void css_put(struct cgroup_subsys_state *css)
 {
 	if (!test_bit(CSS_ROOT, &css->flags))
-		__css_put(css, 1);
+		__css_put(css);
 }
 
 /* bits in struct cgroup flags field */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 21bba77..2eade51 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -63,6 +63,9 @@
 
 #include <linux/atomic.h>
 
+/* css deactivation bias, makes css->refcnt negative to deny new trygets */
+#define CSS_DEACT_BIAS		INT_MIN
+
 /*
  * cgroup_mutex is the master lock.  Any modification to cgroup or its
  * hierarchy must be performed while holding it.
@@ -251,6 +254,14 @@ int cgroup_lock_is_held(void)
 
 EXPORT_SYMBOL_GPL(cgroup_lock_is_held);
 
+/* the current nr of refs, always >= 0 whether @css is deactivated or not */
+static int css_refcnt(struct cgroup_subsys_state *css)
+{
+	int v = atomic_read(&css->refcnt);
+
+	return v >= 0 ? v : v - CSS_DEACT_BIAS;
+}
+
 /* convenient tests for these bits */
 inline int cgroup_is_removed(const struct cgroup *cgrp)
 {
@@ -4006,18 +4017,19 @@ static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return cgroup_create(c_parent, dentry, mode | S_IFDIR);
 }
 
+/*
+ * Check the reference count on each subsystem. Since we already
+ * established that there are no tasks in the cgroup, if the css refcount
+ * is also 1, then there should be no outstanding references, so the
+ * subsystem is safe to destroy. We scan across all subsystems rather than
+ * using the per-hierarchy linked list of mounted subsystems since we can
+ * be called via check_for_release() with no synchronization other than
+ * RCU, and the subsystem linked list isn't RCU-safe.
+ */
 static int cgroup_has_css_refs(struct cgroup *cgrp)
 {
-	/* Check the reference count on each subsystem. Since we
-	 * already established that there are no tasks in the
-	 * cgroup, if the css refcount is also 1, then there should
-	 * be no outstanding references, so the subsystem is safe to
-	 * destroy. We scan across all subsystems rather than using
-	 * the per-hierarchy linked list of mounted subsystems since
-	 * we can be called via check_for_release() with no
-	 * synchronization other than RCU, and the subsystem linked
-	 * list isn't RCU-safe */
 	int i;
+
 	/*
 	 * We won't need to lock the subsys array, because the subsystems
 	 * we're concerned about aren't going anywhere since our cgroup root
@@ -4026,17 +4038,21 @@ static int cgroup_has_css_refs(struct cgroup *cgrp)
 	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		struct cgroup_subsys *ss = subsys[i];
 		struct cgroup_subsys_state *css;
+
 		/* Skip subsystems not present or not in this hierarchy */
 		if (ss == NULL || ss->root != cgrp->root)
 			continue;
+
 		css = cgrp->subsys[ss->subsys_id];
-		/* When called from check_for_release() it's possible
+		/*
+		 * When called from check_for_release() it's possible
 		 * that by this point the cgroup has been removed
 		 * and the css deleted. But a false-positive doesn't
 		 * matter, since it can only happen if the cgroup
 		 * has been deleted and hence no longer needs the
-		 * release agent to be called anyway. */
-		if (css && (atomic_read(&css->refcnt) > 1))
+		 * release agent to be called anyway.
+		 */
+		if (css && css_refcnt(css) > 1)
 			return 1;
 	}
 	return 0;
@@ -4053,44 +4069,37 @@ static int cgroup_clear_css_refs(struct cgroup *cgrp)
 	struct cgroup_subsys *ss;
 	unsigned long flags;
 	bool failed = false;
+
 	local_irq_save(flags);
+
+	/*
+	 * Block new css_tryget() by deactivating refcnt.  If all refcnts
+	 * were 1 at the moment of deactivation, we succeeded.
+	 */
 	for_each_subsys(cgrp->root, ss) {
 		struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
-		int refcnt;
-		while (1) {
-			/* We can only remove a CSS with a refcnt==1 */
-			refcnt = atomic_read(&css->refcnt);
-			if (refcnt > 1) {
-				failed = true;
-				goto done;
-			}
-			BUG_ON(!refcnt);
-			/*
-			 * Drop the refcnt to 0 while we check other
-			 * subsystems. This will cause any racing
-			 * css_tryget() to spin until we set the
-			 * CSS_REMOVED bits or abort
-			 */
-			if (atomic_cmpxchg(&css->refcnt, refcnt, 0) == refcnt)
-				break;
-			cpu_relax();
-		}
+
+		WARN_ON(atomic_read(&css->refcnt) < 0);
+		atomic_add(CSS_DEACT_BIAS, &css->refcnt);
+		failed |= css_refcnt(css) != 1;
 	}
- done:
+
+	/*
+	 * If succeeded, set REMOVED and put all the base refs; otherwise,
+	 * restore refcnts to positive values.  Either way, all in-progress
+	 * css_tryget() will be released.
+	 */
 	for_each_subsys(cgrp->root, ss) {
 		struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
-		if (failed) {
-			/*
-			 * Restore old refcnt if we previously managed
-			 * to clear it from 1 to 0
-			 */
-			if (!atomic_read(&css->refcnt))
-				atomic_set(&css->refcnt, 1);
-		} else {
-			/* Commit the fact that the CSS is removed */
+
+		if (!failed) {
 			set_bit(CSS_REMOVED, &css->flags);
+			css_put(css);
+		} else {
+			atomic_sub(CSS_DEACT_BIAS, &css->refcnt);
 		}
 	}
+
 	local_irq_restore(flags);
 	return !failed;
 }
@@ -4887,13 +4896,28 @@ static void check_for_release(struct cgroup *cgrp)
 }
 
 /* Caller must verify that the css is not for root cgroup */
-void __css_put(struct cgroup_subsys_state *css, int count)
+bool __css_tryget(struct cgroup_subsys_state *css)
+{
+	do {
+		int v = css_refcnt(css);
+
+		if (atomic_cmpxchg(&css->refcnt, v, v + 1) == v)
+			return true;
+		cpu_relax();
+	} while (!test_bit(CSS_REMOVED, &css->flags));
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(__css_tryget);
+
+/* Caller must verify that the css is not for root cgroup */
+void __css_put(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
-	int val;
+
 	rcu_read_lock();
-	val = atomic_sub_return(count, &css->refcnt);
-	if (val == 1) {
+	atomic_dec(&css->refcnt);
+	if (css_refcnt(css) == 1) {
 		if (notify_on_release(cgrp)) {
 			set_bit(CGRP_RELEASABLE, &cgrp->flags);
 			check_for_release(cgrp);
@@ -4901,7 +4925,6 @@ void __css_put(struct cgroup_subsys_state *css, int count)
 		cgroup_wakeup_rmdir_waiter(cgrp);
 	}
 	rcu_read_unlock();
-	WARN_ON_ONCE(val < 1);
 }
 EXPORT_SYMBOL_GPL(__css_put);
 
@@ -5020,7 +5043,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
 	 * on this or this is under rcu_read_lock(). Once css->id is allocated,
 	 * it's unchanged until freed.
 	 */
-	cssid = rcu_dereference_check(css->id, atomic_read(&css->refcnt));
+	cssid = rcu_dereference_check(css->id, css_refcnt(css));
 
 	if (cssid)
 		return cssid->id;
@@ -5032,7 +5055,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
 {
 	struct css_id *cssid;
 
-	cssid = rcu_dereference_check(css->id, atomic_read(&css->refcnt));
+	cssid = rcu_dereference_check(css->id, css_refcnt(css));
 
 	if (cssid)
 		return cssid->depth;
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 1/2] cgroup: use negative bias on css->refcnt to block css_tryget()
@ 2012-03-30 22:34 ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-03-30 22:34 UTC (permalink / raw)
  To: Li Zefan, containers, cgroups
  Cc: linux-kernel, fweisbec, rni, ctalbott, Vivek Goyal,
	Johannes Weiner, Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki

When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.

This is currently achieved by dropping the refcnt to zero iff it has
only the base ref.  If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css.  If not, the
base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.

This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.

This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative.  After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original postive values are restored.

css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages.  While
at it, relocate and reformat comments in cgroup_has_css_refs().

This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
These two patches should be applied on top of the cftype patchset[1]
and is available in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cgroup-css-refs

Thanks.

[1] http://thread.gmane.org/gmane.linux.kernel.cgroups/1322

 include/linux/cgroup.h |   12 ++---
 kernel/cgroup.c        |  119 ++++++++++++++++++++++++++++-------------------
 2 files changed, 75 insertions(+), 56 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 028478c..be81faf 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -115,16 +115,12 @@ static inline bool css_is_removed(struct cgroup_subsys_state *css)
  * the css has been destroyed.
  */
 
+extern bool __css_tryget(struct cgroup_subsys_state *css);
 static inline bool css_tryget(struct cgroup_subsys_state *css)
 {
 	if (test_bit(CSS_ROOT, &css->flags))
 		return true;
-	while (!atomic_inc_not_zero(&css->refcnt)) {
-		if (test_bit(CSS_REMOVED, &css->flags))
-			return false;
-		cpu_relax();
-	}
-	return true;
+	return __css_tryget(css);
 }
 
 /*
@@ -132,11 +128,11 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
  * css_get() or css_tryget()
  */
 
-extern void __css_put(struct cgroup_subsys_state *css, int count);
+extern void __css_put(struct cgroup_subsys_state *css);
 static inline void css_put(struct cgroup_subsys_state *css)
 {
 	if (!test_bit(CSS_ROOT, &css->flags))
-		__css_put(css, 1);
+		__css_put(css);
 }
 
 /* bits in struct cgroup flags field */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 21bba77..2eade51 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -63,6 +63,9 @@
 
 #include <linux/atomic.h>
 
+/* css deactivation bias, makes css->refcnt negative to deny new trygets */
+#define CSS_DEACT_BIAS		INT_MIN
+
 /*
  * cgroup_mutex is the master lock.  Any modification to cgroup or its
  * hierarchy must be performed while holding it.
@@ -251,6 +254,14 @@ int cgroup_lock_is_held(void)
 
 EXPORT_SYMBOL_GPL(cgroup_lock_is_held);
 
+/* the current nr of refs, always >= 0 whether @css is deactivated or not */
+static int css_refcnt(struct cgroup_subsys_state *css)
+{
+	int v = atomic_read(&css->refcnt);
+
+	return v >= 0 ? v : v - CSS_DEACT_BIAS;
+}
+
 /* convenient tests for these bits */
 inline int cgroup_is_removed(const struct cgroup *cgrp)
 {
@@ -4006,18 +4017,19 @@ static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return cgroup_create(c_parent, dentry, mode | S_IFDIR);
 }
 
+/*
+ * Check the reference count on each subsystem. Since we already
+ * established that there are no tasks in the cgroup, if the css refcount
+ * is also 1, then there should be no outstanding references, so the
+ * subsystem is safe to destroy. We scan across all subsystems rather than
+ * using the per-hierarchy linked list of mounted subsystems since we can
+ * be called via check_for_release() with no synchronization other than
+ * RCU, and the subsystem linked list isn't RCU-safe.
+ */
 static int cgroup_has_css_refs(struct cgroup *cgrp)
 {
-	/* Check the reference count on each subsystem. Since we
-	 * already established that there are no tasks in the
-	 * cgroup, if the css refcount is also 1, then there should
-	 * be no outstanding references, so the subsystem is safe to
-	 * destroy. We scan across all subsystems rather than using
-	 * the per-hierarchy linked list of mounted subsystems since
-	 * we can be called via check_for_release() with no
-	 * synchronization other than RCU, and the subsystem linked
-	 * list isn't RCU-safe */
 	int i;
+
 	/*
 	 * We won't need to lock the subsys array, because the subsystems
 	 * we're concerned about aren't going anywhere since our cgroup root
@@ -4026,17 +4038,21 @@ static int cgroup_has_css_refs(struct cgroup *cgrp)
 	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		struct cgroup_subsys *ss = subsys[i];
 		struct cgroup_subsys_state *css;
+
 		/* Skip subsystems not present or not in this hierarchy */
 		if (ss == NULL || ss->root != cgrp->root)
 			continue;
+
 		css = cgrp->subsys[ss->subsys_id];
-		/* When called from check_for_release() it's possible
+		/*
+		 * When called from check_for_release() it's possible
 		 * that by this point the cgroup has been removed
 		 * and the css deleted. But a false-positive doesn't
 		 * matter, since it can only happen if the cgroup
 		 * has been deleted and hence no longer needs the
-		 * release agent to be called anyway. */
-		if (css && (atomic_read(&css->refcnt) > 1))
+		 * release agent to be called anyway.
+		 */
+		if (css && css_refcnt(css) > 1)
 			return 1;
 	}
 	return 0;
@@ -4053,44 +4069,37 @@ static int cgroup_clear_css_refs(struct cgroup *cgrp)
 	struct cgroup_subsys *ss;
 	unsigned long flags;
 	bool failed = false;
+
 	local_irq_save(flags);
+
+	/*
+	 * Block new css_tryget() by deactivating refcnt.  If all refcnts
+	 * were 1 at the moment of deactivation, we succeeded.
+	 */
 	for_each_subsys(cgrp->root, ss) {
 		struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
-		int refcnt;
-		while (1) {
-			/* We can only remove a CSS with a refcnt==1 */
-			refcnt = atomic_read(&css->refcnt);
-			if (refcnt > 1) {
-				failed = true;
-				goto done;
-			}
-			BUG_ON(!refcnt);
-			/*
-			 * Drop the refcnt to 0 while we check other
-			 * subsystems. This will cause any racing
-			 * css_tryget() to spin until we set the
-			 * CSS_REMOVED bits or abort
-			 */
-			if (atomic_cmpxchg(&css->refcnt, refcnt, 0) == refcnt)
-				break;
-			cpu_relax();
-		}
+
+		WARN_ON(atomic_read(&css->refcnt) < 0);
+		atomic_add(CSS_DEACT_BIAS, &css->refcnt);
+		failed |= css_refcnt(css) != 1;
 	}
- done:
+
+	/*
+	 * If succeeded, set REMOVED and put all the base refs; otherwise,
+	 * restore refcnts to positive values.  Either way, all in-progress
+	 * css_tryget() will be released.
+	 */
 	for_each_subsys(cgrp->root, ss) {
 		struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
-		if (failed) {
-			/*
-			 * Restore old refcnt if we previously managed
-			 * to clear it from 1 to 0
-			 */
-			if (!atomic_read(&css->refcnt))
-				atomic_set(&css->refcnt, 1);
-		} else {
-			/* Commit the fact that the CSS is removed */
+
+		if (!failed) {
 			set_bit(CSS_REMOVED, &css->flags);
+			css_put(css);
+		} else {
+			atomic_sub(CSS_DEACT_BIAS, &css->refcnt);
 		}
 	}
+
 	local_irq_restore(flags);
 	return !failed;
 }
@@ -4887,13 +4896,28 @@ static void check_for_release(struct cgroup *cgrp)
 }
 
 /* Caller must verify that the css is not for root cgroup */
-void __css_put(struct cgroup_subsys_state *css, int count)
+bool __css_tryget(struct cgroup_subsys_state *css)
+{
+	do {
+		int v = css_refcnt(css);
+
+		if (atomic_cmpxchg(&css->refcnt, v, v + 1) == v)
+			return true;
+		cpu_relax();
+	} while (!test_bit(CSS_REMOVED, &css->flags));
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(__css_tryget);
+
+/* Caller must verify that the css is not for root cgroup */
+void __css_put(struct cgroup_subsys_state *css)
 {
 	struct cgroup *cgrp = css->cgroup;
-	int val;
+
 	rcu_read_lock();
-	val = atomic_sub_return(count, &css->refcnt);
-	if (val == 1) {
+	atomic_dec(&css->refcnt);
+	if (css_refcnt(css) == 1) {
 		if (notify_on_release(cgrp)) {
 			set_bit(CGRP_RELEASABLE, &cgrp->flags);
 			check_for_release(cgrp);
@@ -4901,7 +4925,6 @@ void __css_put(struct cgroup_subsys_state *css, int count)
 		cgroup_wakeup_rmdir_waiter(cgrp);
 	}
 	rcu_read_unlock();
-	WARN_ON_ONCE(val < 1);
 }
 EXPORT_SYMBOL_GPL(__css_put);
 
@@ -5020,7 +5043,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
 	 * on this or this is under rcu_read_lock(). Once css->id is allocated,
 	 * it's unchanged until freed.
 	 */
-	cssid = rcu_dereference_check(css->id, atomic_read(&css->refcnt));
+	cssid = rcu_dereference_check(css->id, css_refcnt(css));
 
 	if (cssid)
 		return cssid->id;
@@ -5032,7 +5055,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
 {
 	struct css_id *cssid;
 
-	cssid = rcu_dereference_check(css->id, atomic_read(&css->refcnt));
+	cssid = rcu_dereference_check(css->id, css_refcnt(css));
 
 	if (cssid)
 		return cssid->depth;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
  2012-03-30 22:34 ` Tejun Heo
@ 2012-03-30 22:36     ` Tejun Heo
  -1 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-03-30 22:36 UTC (permalink / raw)
  To: Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, Vivek Goyal

Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.

This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).

Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.

Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.

  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251

Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
---
So, memcg can keep the current behavior for now w/
__DEPRECATED_clear_css_refs set but I'll keep increasing the
pressure. ;P

Thanks.

 include/linux/cgroup.h |   17 +++++++++++
 kernel/cgroup.c        |   71 +++++++++++++++++++++++++++++++++++++++++------
 mm/memcontrol.c        |    1 +
 3 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index be81faf..565c803 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -16,6 +16,7 @@
 #include <linux/prio_heap.h>
 #include <linux/rwsem.h>
 #include <linux/idr.h>
+#include <linux/workqueue.h>
 
 #ifdef CONFIG_CGROUPS
 
@@ -76,12 +77,16 @@ struct cgroup_subsys_state {
 	unsigned long flags;
 	/* ID for this css, if possible */
 	struct css_id __rcu *id;
+
+	/* Used to put @cgroup->dentry on the last css_put() */
+	struct work_struct dput_work;
 };
 
 /* bits in struct cgroup_subsys_state flags field */
 enum {
 	CSS_ROOT, /* This CSS is the root of the subsystem */
 	CSS_REMOVED, /* This CSS is dead */
+	CSS_CLEAR_CSS_REFS,		/* @ss->__DEPRECATED_clear_css_refs */
 };
 
 /* Caller must verify that the css is not for root cgroup */
@@ -480,6 +485,18 @@ struct cgroup_subsys {
 	 * (not available in early_init time.)
 	 */
 	bool use_id;
+
+	/*
+	 * If %true, cgroup removal will try to clear css refs by retrying
+	 * ss->pre_destroy() until there's no css ref left.  This behavior
+	 * is strictly for backward compatibility and will be removed as
+	 * soon as the current user (memcg) is updated.
+	 *
+	 * If %false, ss->pre_destroy() can't fail and cgroup removal won't
+	 * wait for css refs to drop to zero before proceeding.
+	 */
+	bool __DEPRECATED_clear_css_refs;
+
 #define MAX_CGROUP_TYPE_NAMELEN 32
 	const char *name;
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2eade51..2905977 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -854,12 +854,17 @@ static int cgroup_call_pre_destroy(struct cgroup *cgrp)
 	struct cgroup_subsys *ss;
 	int ret = 0;
 
-	for_each_subsys(cgrp->root, ss)
-		if (ss->pre_destroy) {
-			ret = ss->pre_destroy(cgrp);
-			if (ret)
-				break;
+	for_each_subsys(cgrp->root, ss) {
+		if (!ss->pre_destroy)
+			continue;
+
+		ret = ss->pre_destroy(cgrp);
+		if (ret) {
+			/* ->pre_destroy() failure is being deprecated */
+			WARN_ON_ONCE(!ss->__DEPRECATED_clear_css_refs);
+			break;
 		}
+	}
 
 	return ret;
 }
@@ -3859,6 +3864,14 @@ static int cgroup_populate_dir(struct cgroup *cgrp)
 	return 0;
 }
 
+static void css_dput_fn(struct work_struct *work)
+{
+	struct cgroup_subsys_state *css =
+		container_of(work, struct cgroup_subsys_state, dput_work);
+
+	dput(css->cgroup->dentry);
+}
+
 static void init_cgroup_css(struct cgroup_subsys_state *css,
 			       struct cgroup_subsys *ss,
 			       struct cgroup *cgrp)
@@ -3871,6 +3884,16 @@ static void init_cgroup_css(struct cgroup_subsys_state *css,
 		set_bit(CSS_ROOT, &css->flags);
 	BUG_ON(cgrp->subsys[ss->subsys_id]);
 	cgrp->subsys[ss->subsys_id] = css;
+
+	/*
+	 * If !clear_css_refs, css holds an extra ref to @cgrp->dentry
+	 * which is put on the last css_put().  dput() requires process
+	 * context, which css_put() may be called without.  @css->dput_work
+	 * will be used to invoke dput() asynchronously from css_put().
+	 */
+	INIT_WORK(&css->dput_work, css_dput_fn);
+	if (ss->__DEPRECATED_clear_css_refs)
+		set_bit(CSS_CLEAR_CSS_REFS, &css->flags);
 }
 
 static void cgroup_lock_hierarchy(struct cgroupfs_root *root)
@@ -3973,6 +3996,11 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 	if (err < 0)
 		goto err_remove;
 
+	/* If !clear_css_refs, each css holds a ref to the cgroup's dentry */
+	for_each_subsys(root, ss)
+		if (!ss->__DEPRECATED_clear_css_refs)
+			dget(dentry);
+
 	/* The cgroup directory was pre-locked for us */
 	BUG_ON(!mutex_is_locked(&cgrp->dentry->d_inode->i_mutex));
 
@@ -4062,8 +4090,24 @@ static int cgroup_has_css_refs(struct cgroup *cgrp)
  * Atomically mark all (or else none) of the cgroup's CSS objects as
  * CSS_REMOVED. Return true on success, or false if the cgroup has
  * busy subsystems. Call with cgroup_mutex held
+ *
+ * Depending on whether a subsys has __DEPRECATED_clear_css_refs set or
+ * not, cgroup removal behaves differently.
+ *
+ * If clear is set, css refcnt for the subsystem should be zero before
+ * cgroup removal can be committed.  This is implemented by
+ * CGRP_WAIT_ON_RMDIR and retry logic around ->pre_destroy(), which may be
+ * called multiple times until all css refcnts reach zero and is allowed to
+ * veto removal on any invocation.  This behavior is deprecated and will be
+ * removed as soon as the existing user (memcg) is updated.
+ *
+ * If clear is not set, each css holds an extra reference to the cgroup's
+ * dentry and cgroup removal proceeds regardless of css refs.
+ * ->pre_destroy() will be called at least once and is not allowed to fail.
+ * On the last put of each css, whenever that may be, the extra dentry ref
+ * is put so that dentry destruction happens only after all css's are
+ * released.
  */
-
 static int cgroup_clear_css_refs(struct cgroup *cgrp)
 {
 	struct cgroup_subsys *ss;
@@ -4074,14 +4118,17 @@ static int cgroup_clear_css_refs(struct cgroup *cgrp)
 
 	/*
 	 * Block new css_tryget() by deactivating refcnt.  If all refcnts
-	 * were 1 at the moment of deactivation, we succeeded.
+	 * for subsystems w/ clear_css_refs set were 1 at the moment of
+	 * deactivation, we succeeded.
 	 */
 	for_each_subsys(cgrp->root, ss) {
 		struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
 
 		WARN_ON(atomic_read(&css->refcnt) < 0);
 		atomic_add(CSS_DEACT_BIAS, &css->refcnt);
-		failed |= css_refcnt(css) != 1;
+
+		if (ss->__DEPRECATED_clear_css_refs)
+			failed |= css_refcnt(css) != 1;
 	}
 
 	/*
@@ -4917,12 +4964,18 @@ void __css_put(struct cgroup_subsys_state *css)
 
 	rcu_read_lock();
 	atomic_dec(&css->refcnt);
-	if (css_refcnt(css) == 1) {
+	switch (css_refcnt(css)) {
+	case 1:
 		if (notify_on_release(cgrp)) {
 			set_bit(CGRP_RELEASABLE, &cgrp->flags);
 			check_for_release(cgrp);
 		}
 		cgroup_wakeup_rmdir_waiter(cgrp);
+		break;
+	case 0:
+		if (!test_bit(CSS_CLEAR_CSS_REFS, &css->flags))
+			schedule_work(&css->dput_work);
+		break;
 	}
 	rcu_read_unlock();
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bef1142..d28359c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5635,6 +5635,7 @@ struct cgroup_subsys mem_cgroup_subsys = {
 	.base_cftypes = mem_cgroup_files,
 	.early_init = 0,
 	.use_id = 1,
+	.__DEPRECATED_clear_css_refs = true,
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
@ 2012-03-30 22:36     ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-03-30 22:36 UTC (permalink / raw)
  To: Li Zefan, containers, cgroups
  Cc: linux-kernel, fweisbec, rni, ctalbott, Vivek Goyal,
	Johannes Weiner, Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki

Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_detroy() until either all refs drop to zero or removal is
cancelled.

This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).

Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immmediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.

Once, memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiples css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.

  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251

Note that, in not-too-distant future, cgroup core will start emitting
warning messages for subsys which require the old behavior, so please
get moving.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
So, memcg can keep the current behavior for now w/
__DEPRECATED_clear_css_refs set but I'll keep increasing the
pressure. ;P

Thanks.

 include/linux/cgroup.h |   17 +++++++++++
 kernel/cgroup.c        |   71 +++++++++++++++++++++++++++++++++++++++++------
 mm/memcontrol.c        |    1 +
 3 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index be81faf..565c803 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -16,6 +16,7 @@
 #include <linux/prio_heap.h>
 #include <linux/rwsem.h>
 #include <linux/idr.h>
+#include <linux/workqueue.h>
 
 #ifdef CONFIG_CGROUPS
 
@@ -76,12 +77,16 @@ struct cgroup_subsys_state {
 	unsigned long flags;
 	/* ID for this css, if possible */
 	struct css_id __rcu *id;
+
+	/* Used to put @cgroup->dentry on the last css_put() */
+	struct work_struct dput_work;
 };
 
 /* bits in struct cgroup_subsys_state flags field */
 enum {
 	CSS_ROOT, /* This CSS is the root of the subsystem */
 	CSS_REMOVED, /* This CSS is dead */
+	CSS_CLEAR_CSS_REFS,		/* @ss->__DEPRECATED_clear_css_refs */
 };
 
 /* Caller must verify that the css is not for root cgroup */
@@ -480,6 +485,18 @@ struct cgroup_subsys {
 	 * (not available in early_init time.)
 	 */
 	bool use_id;
+
+	/*
+	 * If %true, cgroup removal will try to clear css refs by retrying
+	 * ss->pre_destroy() until there's no css ref left.  This behavior
+	 * is strictly for backward compatibility and will be removed as
+	 * soon as the current user (memcg) is updated.
+	 *
+	 * If %false, ss->pre_destroy() can't fail and cgroup removal won't
+	 * wait for css refs to drop to zero before proceeding.
+	 */
+	bool __DEPRECATED_clear_css_refs;
+
 #define MAX_CGROUP_TYPE_NAMELEN 32
 	const char *name;
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2eade51..2905977 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -854,12 +854,17 @@ static int cgroup_call_pre_destroy(struct cgroup *cgrp)
 	struct cgroup_subsys *ss;
 	int ret = 0;
 
-	for_each_subsys(cgrp->root, ss)
-		if (ss->pre_destroy) {
-			ret = ss->pre_destroy(cgrp);
-			if (ret)
-				break;
+	for_each_subsys(cgrp->root, ss) {
+		if (!ss->pre_destroy)
+			continue;
+
+		ret = ss->pre_destroy(cgrp);
+		if (ret) {
+			/* ->pre_destroy() failure is being deprecated */
+			WARN_ON_ONCE(!ss->__DEPRECATED_clear_css_refs);
+			break;
 		}
+	}
 
 	return ret;
 }
@@ -3859,6 +3864,14 @@ static int cgroup_populate_dir(struct cgroup *cgrp)
 	return 0;
 }
 
+static void css_dput_fn(struct work_struct *work)
+{
+	struct cgroup_subsys_state *css =
+		container_of(work, struct cgroup_subsys_state, dput_work);
+
+	dput(css->cgroup->dentry);
+}
+
 static void init_cgroup_css(struct cgroup_subsys_state *css,
 			       struct cgroup_subsys *ss,
 			       struct cgroup *cgrp)
@@ -3871,6 +3884,16 @@ static void init_cgroup_css(struct cgroup_subsys_state *css,
 		set_bit(CSS_ROOT, &css->flags);
 	BUG_ON(cgrp->subsys[ss->subsys_id]);
 	cgrp->subsys[ss->subsys_id] = css;
+
+	/*
+	 * If !clear_css_refs, css holds an extra ref to @cgrp->dentry
+	 * which is put on the last css_put().  dput() requires process
+	 * context, which css_put() may be called without.  @css->dput_work
+	 * will be used to invoke dput() asynchronously from css_put().
+	 */
+	INIT_WORK(&css->dput_work, css_dput_fn);
+	if (ss->__DEPRECATED_clear_css_refs)
+		set_bit(CSS_CLEAR_CSS_REFS, &css->flags);
 }
 
 static void cgroup_lock_hierarchy(struct cgroupfs_root *root)
@@ -3973,6 +3996,11 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 	if (err < 0)
 		goto err_remove;
 
+	/* If !clear_css_refs, each css holds a ref to the cgroup's dentry */
+	for_each_subsys(root, ss)
+		if (!ss->__DEPRECATED_clear_css_refs)
+			dget(dentry);
+
 	/* The cgroup directory was pre-locked for us */
 	BUG_ON(!mutex_is_locked(&cgrp->dentry->d_inode->i_mutex));
 
@@ -4062,8 +4090,24 @@ static int cgroup_has_css_refs(struct cgroup *cgrp)
  * Atomically mark all (or else none) of the cgroup's CSS objects as
  * CSS_REMOVED. Return true on success, or false if the cgroup has
  * busy subsystems. Call with cgroup_mutex held
+ *
+ * Depending on whether a subsys has __DEPRECATED_clear_css_refs set or
+ * not, cgroup removal behaves differently.
+ *
+ * If clear is set, css refcnt for the subsystem should be zero before
+ * cgroup removal can be committed.  This is implemented by
+ * CGRP_WAIT_ON_RMDIR and retry logic around ->pre_destroy(), which may be
+ * called multiple times until all css refcnts reach zero and is allowed to
+ * veto removal on any invocation.  This behavior is deprecated and will be
+ * removed as soon as the existing user (memcg) is updated.
+ *
+ * If clear is not set, each css holds an extra reference to the cgroup's
+ * dentry and cgroup removal proceeds regardless of css refs.
+ * ->pre_destroy() will be called at least once and is not allowed to fail.
+ * On the last put of each css, whenever that may be, the extra dentry ref
+ * is put so that dentry destruction happens only after all css's are
+ * released.
  */
-
 static int cgroup_clear_css_refs(struct cgroup *cgrp)
 {
 	struct cgroup_subsys *ss;
@@ -4074,14 +4118,17 @@ static int cgroup_clear_css_refs(struct cgroup *cgrp)
 
 	/*
 	 * Block new css_tryget() by deactivating refcnt.  If all refcnts
-	 * were 1 at the moment of deactivation, we succeeded.
+	 * for subsystems w/ clear_css_refs set were 1 at the moment of
+	 * deactivation, we succeeded.
 	 */
 	for_each_subsys(cgrp->root, ss) {
 		struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
 
 		WARN_ON(atomic_read(&css->refcnt) < 0);
 		atomic_add(CSS_DEACT_BIAS, &css->refcnt);
-		failed |= css_refcnt(css) != 1;
+
+		if (ss->__DEPRECATED_clear_css_refs)
+			failed |= css_refcnt(css) != 1;
 	}
 
 	/*
@@ -4917,12 +4964,18 @@ void __css_put(struct cgroup_subsys_state *css)
 
 	rcu_read_lock();
 	atomic_dec(&css->refcnt);
-	if (css_refcnt(css) == 1) {
+	switch (css_refcnt(css)) {
+	case 1:
 		if (notify_on_release(cgrp)) {
 			set_bit(CGRP_RELEASABLE, &cgrp->flags);
 			check_for_release(cgrp);
 		}
 		cgroup_wakeup_rmdir_waiter(cgrp);
+		break;
+	case 0:
+		if (!test_bit(CSS_CLEAR_CSS_REFS, &css->flags))
+			schedule_work(&css->dput_work);
+		break;
 	}
 	rcu_read_unlock();
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bef1142..d28359c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5635,6 +5635,7 @@ struct cgroup_subsys mem_cgroup_subsys = {
 	.base_cftypes = mem_cgroup_files,
 	.early_init = 0,
 	.use_id = 1,
+	.__DEPRECATED_clear_css_refs = true,
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]     ` <20120330223606.GK28934-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-03-31 21:28       ` Hiroyuki Kamezawa
  2012-04-01  2:00       ` Li Zefan
  2012-04-01 19:48         ` Vivek Goyal
  2 siblings, 0 replies; 33+ messages in thread
From: Hiroyuki Kamezawa @ 2012-03-31 21:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

2012年3月31日7:36 Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>:
> Currently, cgroup removal tries to drain all css references.  If there
> are active css references, the removal logic waits and retries
> ->pre_detroy() until either all refs drop to zero or removal is
> cancelled.
>
> This semantics is unusual and adds non-trivial complexity to cgroup
> core and IMHO is fundamentally misguided in that it couples internal
> implementation details (references to internal data structure) with
> externally visible operation (rmdir).  To userland, this is a behavior
> peculiarity which is unnecessary and difficult to expect (css refs is
> otherwise invisible from userland), and, to policy implementations,
> this is an unnecessary restriction (e.g. blkcg wants to hold css refs
> for caching purposes but can't as that becomes visible as rmdir hang).
>
> Unfortunately, memcg currently depends on ->pre_destroy() retrials and
> cgroup removal vetoing and can't be immmediately switched to the new
> behavior.  This patch introduces the new behavior of not waiting for
> css refs to drain and maintains the old behavior for subsystems which
> have __DEPRECATED_clear_css_refs set.
>
> Once, memcg is updated, we can drop the code paths for the old
> behavior as proposed in the following patch.  Note that the following
> patch is incorrect in that dput work item is in cgroup and may lose
> some of dputs when multiples css's are released back-to-back, and
> __css_put() triggers check_for_release() when refcnt reaches 0 instead
> of 1; however, it shows what part can be removed.
>
>  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
>
> Note that, in not-too-distant future, cgroup core will start emitting
> warning messages for subsys which require the old behavior, so please
> get moving.
>
Ok, I'll update memcg ASAP (after LSF/MM).
I'm now thinking to move all resources to root if use_hierarchy=0.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]     ` <20120330223606.GK28934-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-03-31 21:28       ` Hiroyuki Kamezawa
  2012-04-01  2:00       ` Li Zefan
  2012-04-01 19:48         ` Vivek Goyal
  2 siblings, 0 replies; 33+ messages in thread
From: Hiroyuki Kamezawa @ 2012-03-31 21:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, linux-kernel, fweisbec, rni,
	ctalbott, Vivek Goyal, Johannes Weiner, Michal Hocko,
	Balbir Singh, KAMEZAWA Hiroyuki

2012年3月31日7:36 Tejun Heo <tj@kernel.org>:
> Currently, cgroup removal tries to drain all css references.  If there
> are active css references, the removal logic waits and retries
> ->pre_detroy() until either all refs drop to zero or removal is
> cancelled.
>
> This semantics is unusual and adds non-trivial complexity to cgroup
> core and IMHO is fundamentally misguided in that it couples internal
> implementation details (references to internal data structure) with
> externally visible operation (rmdir).  To userland, this is a behavior
> peculiarity which is unnecessary and difficult to expect (css refs is
> otherwise invisible from userland), and, to policy implementations,
> this is an unnecessary restriction (e.g. blkcg wants to hold css refs
> for caching purposes but can't as that becomes visible as rmdir hang).
>
> Unfortunately, memcg currently depends on ->pre_destroy() retrials and
> cgroup removal vetoing and can't be immmediately switched to the new
> behavior.  This patch introduces the new behavior of not waiting for
> css refs to drain and maintains the old behavior for subsystems which
> have __DEPRECATED_clear_css_refs set.
>
> Once, memcg is updated, we can drop the code paths for the old
> behavior as proposed in the following patch.  Note that the following
> patch is incorrect in that dput work item is in cgroup and may lose
> some of dputs when multiples css's are released back-to-back, and
> __css_put() triggers check_for_release() when refcnt reaches 0 instead
> of 1; however, it shows what part can be removed.
>
>  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
>
> Note that, in not-too-distant future, cgroup core will start emitting
> warning messages for subsys which require the old behavior, so please
> get moving.
>
Ok, I'll update memcg ASAP (after LSF/MM).
I'm now thinking to move all resources to root if use_hierarchy=0.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
@ 2012-03-31 21:28       ` Hiroyuki Kamezawa
  0 siblings, 0 replies; 33+ messages in thread
From: Hiroyuki Kamezawa @ 2012-03-31 21:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, rni-hpIqsD4AKlfQT0dZR+AlfA,
	ctalbott-hpIqsD4AKlfQT0dZR+AlfA, Vivek Goyal, Johannes Weiner,
	Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki

2012年3月31日7:36 Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>:
> Currently, cgroup removal tries to drain all css references.  If there
> are active css references, the removal logic waits and retries
> ->pre_detroy() until either all refs drop to zero or removal is
> cancelled.
>
> This semantics is unusual and adds non-trivial complexity to cgroup
> core and IMHO is fundamentally misguided in that it couples internal
> implementation details (references to internal data structure) with
> externally visible operation (rmdir).  To userland, this is a behavior
> peculiarity which is unnecessary and difficult to expect (css refs is
> otherwise invisible from userland), and, to policy implementations,
> this is an unnecessary restriction (e.g. blkcg wants to hold css refs
> for caching purposes but can't as that becomes visible as rmdir hang).
>
> Unfortunately, memcg currently depends on ->pre_destroy() retrials and
> cgroup removal vetoing and can't be immmediately switched to the new
> behavior.  This patch introduces the new behavior of not waiting for
> css refs to drain and maintains the old behavior for subsystems which
> have __DEPRECATED_clear_css_refs set.
>
> Once, memcg is updated, we can drop the code paths for the old
> behavior as proposed in the following patch.  Note that the following
> patch is incorrect in that dput work item is in cgroup and may lose
> some of dputs when multiples css's are released back-to-back, and
> __css_put() triggers check_for_release() when refcnt reaches 0 instead
> of 1; however, it shows what part can be removed.
>
>  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
>
> Note that, in not-too-distant future, cgroup core will start emitting
> warning messages for subsys which require the old behavior, so please
> get moving.
>
Ok, I'll update memcg ASAP (after LSF/MM).
I'm now thinking to move all resources to root if use_hierarchy=0.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]     ` <20120330223606.GK28934-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2012-03-31 21:28       ` Hiroyuki Kamezawa
@ 2012-04-01  2:00       ` Li Zefan
  2012-04-01 19:48         ` Vivek Goyal
  2 siblings, 0 replies; 33+ messages in thread
From: Li Zefan @ 2012-04-01  2:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

Tejun Heo wrote:

> Currently, cgroup removal tries to drain all css references.  If there
> are active css references, the removal logic waits and retries
> ->pre_detroy() until either all refs drop to zero or removal is
> cancelled.
> 


I don't like this either.

> This semantics is unusual and adds non-trivial complexity to cgroup
> core and IMHO is fundamentally misguided in that it couples internal
> implementation details (references to internal data structure) with
> externally visible operation (rmdir).  To userland, this is a behavior
> peculiarity which is unnecessary and difficult to expect (css refs is
> otherwise invisible from userland), and, to policy implementations,
> this is an unnecessary restriction (e.g. blkcg wants to hold css refs
> for caching purposes but can't as that becomes visible as rmdir hang).
> 
> Unfortunately, memcg currently depends on ->pre_destroy() retrials and
> cgroup removal vetoing and can't be immmediately switched to the new
> behavior.  This patch introduces the new behavior of not waiting for
> css refs to drain and maintains the old behavior for subsystems which
> have __DEPRECATED_clear_css_refs set.
> 
> Once, memcg is updated, we can drop the code paths for the old
> behavior as proposed in the following patch.  Note that the following
> patch is incorrect in that dput work item is in cgroup and may lose
> some of dputs when multiples css's are released back-to-back, and
> __css_put() triggers check_for_release() when refcnt reaches 0 instead
> of 1; however, it shows what part can be removed.
> 
>   http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
> 
> Note that, in not-too-distant future, cgroup core will start emitting
> warning messages for subsys which require the old behavior, so please
> get moving.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>


Both patches look good.

Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

and I'd like to see code shrink with memcg updates ASAP.

> Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]     ` <20120330223606.GK28934-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-04-01  2:00       ` Li Zefan
  2012-04-01  2:00       ` Li Zefan
  2012-04-01 19:48         ` Vivek Goyal
  2 siblings, 0 replies; 33+ messages in thread
From: Li Zefan @ 2012-04-01  2:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers, cgroups, linux-kernel, fweisbec, rni, ctalbott,
	Vivek Goyal, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki

Tejun Heo wrote:

> Currently, cgroup removal tries to drain all css references.  If there
> are active css references, the removal logic waits and retries
> ->pre_detroy() until either all refs drop to zero or removal is
> cancelled.
> 


I don't like this either.

> This semantics is unusual and adds non-trivial complexity to cgroup
> core and IMHO is fundamentally misguided in that it couples internal
> implementation details (references to internal data structure) with
> externally visible operation (rmdir).  To userland, this is a behavior
> peculiarity which is unnecessary and difficult to expect (css refs is
> otherwise invisible from userland), and, to policy implementations,
> this is an unnecessary restriction (e.g. blkcg wants to hold css refs
> for caching purposes but can't as that becomes visible as rmdir hang).
> 
> Unfortunately, memcg currently depends on ->pre_destroy() retrials and
> cgroup removal vetoing and can't be immmediately switched to the new
> behavior.  This patch introduces the new behavior of not waiting for
> css refs to drain and maintains the old behavior for subsystems which
> have __DEPRECATED_clear_css_refs set.
> 
> Once, memcg is updated, we can drop the code paths for the old
> behavior as proposed in the following patch.  Note that the following
> patch is incorrect in that dput work item is in cgroup and may lose
> some of dputs when multiples css's are released back-to-back, and
> __css_put() triggers check_for_release() when refcnt reaches 0 instead
> of 1; however, it shows what part can be removed.
> 
>   http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
> 
> Note that, in not-too-distant future, cgroup core will start emitting
> warning messages for subsys which require the old behavior, so please
> get moving.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>


Both patches look good.

Acked-by: Li Zefan <lizefan@huawei.com>

and I'd like to see code shrink with memcg updates ASAP.

> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
@ 2012-04-01  2:00       ` Li Zefan
  0 siblings, 0 replies; 33+ messages in thread
From: Li Zefan @ 2012-04-01  2:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, rni-hpIqsD4AKlfQT0dZR+AlfA,
	ctalbott-hpIqsD4AKlfQT0dZR+AlfA, Vivek Goyal, Johannes Weiner,
	Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki

Tejun Heo wrote:

> Currently, cgroup removal tries to drain all css references.  If there
> are active css references, the removal logic waits and retries
> ->pre_detroy() until either all refs drop to zero or removal is
> cancelled.
> 


I don't like this either.

> This semantics is unusual and adds non-trivial complexity to cgroup
> core and IMHO is fundamentally misguided in that it couples internal
> implementation details (references to internal data structure) with
> externally visible operation (rmdir).  To userland, this is a behavior
> peculiarity which is unnecessary and difficult to expect (css refs is
> otherwise invisible from userland), and, to policy implementations,
> this is an unnecessary restriction (e.g. blkcg wants to hold css refs
> for caching purposes but can't as that becomes visible as rmdir hang).
> 
> Unfortunately, memcg currently depends on ->pre_destroy() retrials and
> cgroup removal vetoing and can't be immmediately switched to the new
> behavior.  This patch introduces the new behavior of not waiting for
> css refs to drain and maintains the old behavior for subsystems which
> have __DEPRECATED_clear_css_refs set.
> 
> Once, memcg is updated, we can drop the code paths for the old
> behavior as proposed in the following patch.  Note that the following
> patch is incorrect in that dput work item is in cgroup and may lose
> some of dputs when multiples css's are released back-to-back, and
> __css_put() triggers check_for_release() when refcnt reaches 0 instead
> of 1; however, it shows what part can be removed.
> 
>   http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
> 
> Note that, in not-too-distant future, cgroup core will start emitting
> warning messages for subsys which require the old behavior, so please
> get moving.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>


Both patches look good.

Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

and I'd like to see code shrink with memcg updates ASAP.

> Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> Cc: Balbir Singh <bsingharora-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]       ` <4F77B6BA.4070207-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2012-04-01 18:54         ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-04-01 18:54 UTC (permalink / raw)
  To: Li Zefan
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

Hello, Li.

On Sun, Apr 01, 2012 at 10:00:26AM +0800, Li Zefan wrote:
> > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> 
> Both patches look good.
> 
> Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Alright, adding the patches to cgroup/for-3.5.

> and I'd like to see code shrink with memcg updates ASAP.

Amen. :)

-- 
tejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]       ` <4F77B6BA.4070207-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2012-04-01 18:54         ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-04-01 18:54 UTC (permalink / raw)
  To: Li Zefan
  Cc: containers, cgroups, linux-kernel, fweisbec, rni, ctalbott,
	Vivek Goyal, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki

Hello, Li.

On Sun, Apr 01, 2012 at 10:00:26AM +0800, Li Zefan wrote:
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> 
> Both patches look good.
> 
> Acked-by: Li Zefan <lizefan@huawei.com>

Alright, adding the patches to cgroup/for-3.5.

> and I'd like to see code shrink with memcg updates ASAP.

Amen. :)

-- 
tejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
@ 2012-04-01 18:54         ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-04-01 18:54 UTC (permalink / raw)
  To: Li Zefan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, rni-hpIqsD4AKlfQT0dZR+AlfA,
	ctalbott-hpIqsD4AKlfQT0dZR+AlfA, Vivek Goyal, Johannes Weiner,
	Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki

Hello, Li.

On Sun, Apr 01, 2012 at 10:00:26AM +0800, Li Zefan wrote:
> > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> 
> Both patches look good.
> 
> Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Alright, adding the patches to cgroup/for-3.5.

> and I'd like to see code shrink with memcg updates ASAP.

Amen. :)

-- 
tejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
  2012-03-30 22:36     ` Tejun Heo
@ 2012-04-01 19:48         ` Vivek Goyal
  -1 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2012-04-01 19:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA

On Fri, Mar 30, 2012 at 03:36:06PM -0700, Tejun Heo wrote:
> Currently, cgroup removal tries to drain all css references.  If there
> are active css references, the removal logic waits and retries
> ->pre_detroy() until either all refs drop to zero or removal is
> cancelled.
> 
> This semantics is unusual and adds non-trivial complexity to cgroup
> core and IMHO is fundamentally misguided in that it couples internal
> implementation details (references to internal data structure) with
> externally visible operation (rmdir).  To userland, this is a behavior
> peculiarity which is unnecessary and difficult to expect (css refs is
> otherwise invisible from userland), and, to policy implementations,
> this is an unnecessary restriction (e.g. blkcg wants to hold css refs
> for caching purposes but can't as that becomes visible as rmdir hang).

Good to see this patch. Now annyoying rmdir hang with blkcg should be
gone.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
@ 2012-04-01 19:48         ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2012-04-01 19:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, linux-kernel, fweisbec, rni,
	ctalbott, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki

On Fri, Mar 30, 2012 at 03:36:06PM -0700, Tejun Heo wrote:
> Currently, cgroup removal tries to drain all css references.  If there
> are active css references, the removal logic waits and retries
> ->pre_detroy() until either all refs drop to zero or removal is
> cancelled.
> 
> This semantics is unusual and adds non-trivial complexity to cgroup
> core and IMHO is fundamentally misguided in that it couples internal
> implementation details (references to internal data structure) with
> externally visible operation (rmdir).  To userland, this is a behavior
> peculiarity which is unnecessary and difficult to expect (css refs is
> otherwise invisible from userland), and, to policy implementations,
> this is an unnecessary restriction (e.g. blkcg wants to hold css refs
> for caching purposes but can't as that becomes visible as rmdir hang).

Good to see this patch. Now annyoying rmdir hang with blkcg should be
gone.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
  2012-04-01 18:54         ` Tejun Heo
@ 2012-05-16 22:33             ` Sasha Levin
  -1 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2012-05-16 22:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

Hi Tejun,

On Sun, Apr 1, 2012 at 8:54 PM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> Hello, Li.
>
> On Sun, Apr 01, 2012 at 10:00:26AM +0800, Li Zefan wrote:
>> > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>
>> Both patches look good.
>>
>> Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>
> Alright, adding the patches to cgroup/for-3.5.

This patch allows for race when removing a cgroup since one of the
css's may still have a dentry ref when the cgroup is removed, no? Is
there a plan to deal with that before this patch gets pushed into 3.5?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
@ 2012-05-16 22:33             ` Sasha Levin
  0 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2012-05-16 22:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, linux-kernel, fweisbec, rni,
	ctalbott, Vivek Goyal, Johannes Weiner, Michal Hocko,
	Balbir Singh, KAMEZAWA Hiroyuki

Hi Tejun,

On Sun, Apr 1, 2012 at 8:54 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Li.
>
> On Sun, Apr 01, 2012 at 10:00:26AM +0800, Li Zefan wrote:
>> > Signed-off-by: Tejun Heo <tj@kernel.org>
>>
>> Both patches look good.
>>
>> Acked-by: Li Zefan <lizefan@huawei.com>
>
> Alright, adding the patches to cgroup/for-3.5.

This patch allows for race when removing a cgroup since one of the
css's may still have a dentry ref when the cgroup is removed, no? Is
there a plan to deal with that before this patch gets pushed into 3.5?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
  2012-05-16 22:33             ` Sasha Levin
@ 2012-05-18 17:55                 ` Tejun Heo
  -1 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-05-18 17:55 UTC (permalink / raw)
  To: Sasha Levin
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

On Thu, May 17, 2012 at 12:33:05AM +0200, Sasha Levin wrote:
> Hi Tejun,
> 
> On Sun, Apr 1, 2012 at 8:54 PM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > Hello, Li.
> >
> > On Sun, Apr 01, 2012 at 10:00:26AM +0800, Li Zefan wrote:
> >> > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >>
> >> Both patches look good.
> >>
> >> Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> >
> > Alright, adding the patches to cgroup/for-3.5.
> 
> This patch allows for race when removing a cgroup since one of the
> css's may still have a dentry ref when the cgroup is removed, no? Is
> there a plan to deal with that before this patch gets pushed into 3.5?

Can you please elaborate?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
@ 2012-05-18 17:55                 ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-05-18 17:55 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Li Zefan, containers, cgroups, linux-kernel, fweisbec, rni,
	ctalbott, Vivek Goyal, Johannes Weiner, Michal Hocko,
	Balbir Singh, KAMEZAWA Hiroyuki

On Thu, May 17, 2012 at 12:33:05AM +0200, Sasha Levin wrote:
> Hi Tejun,
> 
> On Sun, Apr 1, 2012 at 8:54 PM, Tejun Heo <tj@kernel.org> wrote:
> > Hello, Li.
> >
> > On Sun, Apr 01, 2012 at 10:00:26AM +0800, Li Zefan wrote:
> >> > Signed-off-by: Tejun Heo <tj@kernel.org>
> >>
> >> Both patches look good.
> >>
> >> Acked-by: Li Zefan <lizefan@huawei.com>
> >
> > Alright, adding the patches to cgroup/for-3.5.
> 
> This patch allows for race when removing a cgroup since one of the
> css's may still have a dentry ref when the cgroup is removed, no? Is
> there a plan to deal with that before this patch gets pushed into 3.5?

Can you please elaborate?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]                 ` <20120518175548.GM19388-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-05-18 18:28                   ` Sasha Levin
  0 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2012-05-18 18:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

On Fri, May 18, 2012 at 7:55 PM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, May 17, 2012 at 12:33:05AM +0200, Sasha Levin wrote:
>> Hi Tejun,
>>
>> On Sun, Apr 1, 2012 at 8:54 PM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> > Hello, Li.
>> >
>> > On Sun, Apr 01, 2012 at 10:00:26AM +0800, Li Zefan wrote:
>> >> > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> >>
>> >> Both patches look good.
>> >>
>> >> Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>> >
>> > Alright, adding the patches to cgroup/for-3.5.
>>
>> This patch allows for race when removing a cgroup since one of the
>> css's may still have a dentry ref when the cgroup is removed, no? Is
>> there a plan to deal with that before this patch gets pushed into 3.5?
>
> Can you please elaborate?

From what I gathered, that patch means that you'll have refs to
dentries while you're trying to complete the unmount of the cgroup,
which would make something really angry.

I tested it by running two scripts to race with eachother:

First:

while [ true ];
do
mount -t cgroup dummy cgroup/
mkdir cgroup/0
rmdir cgroup/0
umount cgroup/
done

Second:

while [ true ];
do
mount -t cgroup dummy cgroup/
umount cgroup/
done

Now, if you give it a go, you'll see a BUG() and a system hang pretty fast.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]                 ` <20120518175548.GM19388-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2012-05-18 18:28                   ` Sasha Levin
  0 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2012-05-18 18:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, linux-kernel, fweisbec, rni,
	ctalbott, Vivek Goyal, Johannes Weiner, Michal Hocko,
	Balbir Singh, KAMEZAWA Hiroyuki

On Fri, May 18, 2012 at 7:55 PM, Tejun Heo <tj@kernel.org> wrote:
> On Thu, May 17, 2012 at 12:33:05AM +0200, Sasha Levin wrote:
>> Hi Tejun,
>>
>> On Sun, Apr 1, 2012 at 8:54 PM, Tejun Heo <tj@kernel.org> wrote:
>> > Hello, Li.
>> >
>> > On Sun, Apr 01, 2012 at 10:00:26AM +0800, Li Zefan wrote:
>> >> > Signed-off-by: Tejun Heo <tj@kernel.org>
>> >>
>> >> Both patches look good.
>> >>
>> >> Acked-by: Li Zefan <lizefan@huawei.com>
>> >
>> > Alright, adding the patches to cgroup/for-3.5.
>>
>> This patch allows for race when removing a cgroup since one of the
>> css's may still have a dentry ref when the cgroup is removed, no? Is
>> there a plan to deal with that before this patch gets pushed into 3.5?
>
> Can you please elaborate?

>From what I gathered, that patch means that you'll have refs to
dentries while you're trying to complete the unmount of the cgroup,
which would make something really angry.

I tested it by running two scripts to race with eachother:

First:

while [ true ];
do
mount -t cgroup dummy cgroup/
mkdir cgroup/0
rmdir cgroup/0
umount cgroup/
done

Second:

while [ true ];
do
mount -t cgroup dummy cgroup/
umount cgroup/
done

Now, if you give it a go, you'll see a BUG() and a system hang pretty fast.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
@ 2012-05-18 18:28                   ` Sasha Levin
  0 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2012-05-18 18:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, rni-hpIqsD4AKlfQT0dZR+AlfA,
	ctalbott-hpIqsD4AKlfQT0dZR+AlfA, Vivek Goyal, Johannes Weiner,
	Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki

On Fri, May 18, 2012 at 7:55 PM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, May 17, 2012 at 12:33:05AM +0200, Sasha Levin wrote:
>> Hi Tejun,
>>
>> On Sun, Apr 1, 2012 at 8:54 PM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> > Hello, Li.
>> >
>> > On Sun, Apr 01, 2012 at 10:00:26AM +0800, Li Zefan wrote:
>> >> > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> >>
>> >> Both patches look good.
>> >>
>> >> Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>> >
>> > Alright, adding the patches to cgroup/for-3.5.
>>
>> This patch allows for race when removing a cgroup since one of the
>> css's may still have a dentry ref when the cgroup is removed, no? Is
>> there a plan to deal with that before this patch gets pushed into 3.5?
>
> Can you please elaborate?

From what I gathered, that patch means that you'll have refs to
dentries while you're trying to complete the unmount of the cgroup,
which would make something really angry.

I tested it by running two scripts to race with eachother:

First:

while [ true ];
do
mount -t cgroup dummy cgroup/
mkdir cgroup/0
rmdir cgroup/0
umount cgroup/
done

Second:

while [ true ];
do
mount -t cgroup dummy cgroup/
umount cgroup/
done

Now, if you give it a go, you'll see a BUG() and a system hang pretty fast.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]                   ` <CA+1xoqfr5aaGbxoX3YKrJHXTjU9fWSbX_xbuEOU=4K7kMay6XQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-05-23 22:22                     ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-05-23 22:22 UTC (permalink / raw)
  To: Sasha Levin
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

Hello, Sasha.

Does the following patch fix the problem you're seeing?

Thanks.

---
 kernel/cgroup.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ad8eae5..7cf9669 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -896,10 +896,13 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		mutex_unlock(&cgroup_mutex);
 
 		/*
-		 * Drop the active superblock reference that we took when we
-		 * created the cgroup
+		 * We want to drop the active superblock reference that we
+		 * took when we created the cgroup after all dentry refs
+		 * are gone - kill_sb gets mighty unhappy otherwise.  Set
+		 * dentry->d_fsdata to cgroup_diput() to tell
+		 * cgroup_d_release() to call deactivate_super().
 		 */
-		deactivate_super(cgrp->root->sb);
+		dentry->d_fsdata = cgroup_diput;
 
 		/*
 		 * if we're getting rid of the cgroup, refcount should ensure
@@ -925,6 +928,13 @@ static int cgroup_delete(const struct dentry *d)
 	return 1;
 }
 
+static void cgroup_d_release(struct dentry *dentry)
+{
+	/* did cgroup_diput() tell me to deactivate super? */
+	if (dentry->d_fsdata == cgroup_diput)
+		deactivate_super(dentry->d_sb);
+}
+
 static void remove_dir(struct dentry *d)
 {
 	struct dentry *parent = dget(d->d_parent);
@@ -1532,6 +1542,7 @@ static int cgroup_get_rootdir(struct super_block *sb)
 	static const struct dentry_operations cgroup_dops = {
 		.d_iput = cgroup_diput,
 		.d_delete = cgroup_delete,
+		.d_release = cgroup_d_release,
 	};
 
 	struct inode *inode =

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]                   ` <CA+1xoqfr5aaGbxoX3YKrJHXTjU9fWSbX_xbuEOU=4K7kMay6XQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-05-23 22:22                     ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-05-23 22:22 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Li Zefan, containers, cgroups, linux-kernel, fweisbec, rni,
	ctalbott, Vivek Goyal, Johannes Weiner, Michal Hocko,
	Balbir Singh, KAMEZAWA Hiroyuki

Hello, Sasha.

Does the following patch fix the problem you're seeing?

Thanks.

---
 kernel/cgroup.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ad8eae5..7cf9669 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -896,10 +896,13 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		mutex_unlock(&cgroup_mutex);
 
 		/*
-		 * Drop the active superblock reference that we took when we
-		 * created the cgroup
+		 * We want to drop the active superblock reference that we
+		 * took when we created the cgroup after all dentry refs
+		 * are gone - kill_sb gets mighty unhappy otherwise.  Set
+		 * dentry->d_fsdata to cgroup_diput() to tell
+		 * cgroup_d_release() to call deactivate_super().
 		 */
-		deactivate_super(cgrp->root->sb);
+		dentry->d_fsdata = cgroup_diput;
 
 		/*
 		 * if we're getting rid of the cgroup, refcount should ensure
@@ -925,6 +928,13 @@ static int cgroup_delete(const struct dentry *d)
 	return 1;
 }
 
+static void cgroup_d_release(struct dentry *dentry)
+{
+	/* did cgroup_diput() tell me to deactivate super? */
+	if (dentry->d_fsdata == cgroup_diput)
+		deactivate_super(dentry->d_sb);
+}
+
 static void remove_dir(struct dentry *d)
 {
 	struct dentry *parent = dget(d->d_parent);
@@ -1532,6 +1542,7 @@ static int cgroup_get_rootdir(struct super_block *sb)
 	static const struct dentry_operations cgroup_dops = {
 		.d_iput = cgroup_diput,
 		.d_delete = cgroup_delete,
+		.d_release = cgroup_d_release,
 	};
 
 	struct inode *inode =

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
@ 2012-05-23 22:22                     ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-05-23 22:22 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, rni-hpIqsD4AKlfQT0dZR+AlfA,
	ctalbott-hpIqsD4AKlfQT0dZR+AlfA, Vivek Goyal, Johannes Weiner,
	Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki

Hello, Sasha.

Does the following patch fix the problem you're seeing?

Thanks.

---
 kernel/cgroup.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ad8eae5..7cf9669 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -896,10 +896,13 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		mutex_unlock(&cgroup_mutex);
 
 		/*
-		 * Drop the active superblock reference that we took when we
-		 * created the cgroup
+		 * We want to drop the active superblock reference that we
+		 * took when we created the cgroup after all dentry refs
+		 * are gone - kill_sb gets mighty unhappy otherwise.  Set
+		 * dentry->d_fsdata to cgroup_diput() to tell
+		 * cgroup_d_release() to call deactivate_super().
 		 */
-		deactivate_super(cgrp->root->sb);
+		dentry->d_fsdata = cgroup_diput;
 
 		/*
 		 * if we're getting rid of the cgroup, refcount should ensure
@@ -925,6 +928,13 @@ static int cgroup_delete(const struct dentry *d)
 	return 1;
 }
 
+static void cgroup_d_release(struct dentry *dentry)
+{
+	/* did cgroup_diput() tell me to deactivate super? */
+	if (dentry->d_fsdata == cgroup_diput)
+		deactivate_super(dentry->d_sb);
+}
+
 static void remove_dir(struct dentry *d)
 {
 	struct dentry *parent = dget(d->d_parent);
@@ -1532,6 +1542,7 @@ static int cgroup_get_rootdir(struct super_block *sb)
 	static const struct dentry_operations cgroup_dops = {
 		.d_iput = cgroup_diput,
 		.d_delete = cgroup_delete,
+		.d_release = cgroup_d_release,
 	};
 
 	struct inode *inode =

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]                     ` <20120523222242.GD3933-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2012-05-24 13:21                       ` Sasha Levin
  0 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2012-05-24 13:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

On Thu, May 24, 2012 at 12:22 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> Hello, Sasha.
>
> Does the following patch fix the problem you're seeing?
>
> Thanks.

Yup, that did the trick.

Thanks!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
       [not found]                     ` <20120523222242.GD3933-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2012-05-24 13:21                       ` Sasha Levin
  0 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2012-05-24 13:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, linux-kernel, fweisbec, rni,
	ctalbott, Vivek Goyal, Johannes Weiner, Michal Hocko,
	Balbir Singh, KAMEZAWA Hiroyuki

On Thu, May 24, 2012 at 12:22 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Sasha.
>
> Does the following patch fix the problem you're seeing?
>
> Thanks.

Yup, that did the trick.

Thanks!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional
@ 2012-05-24 13:21                       ` Sasha Levin
  0 siblings, 0 replies; 33+ messages in thread
From: Sasha Levin @ 2012-05-24 13:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, rni-hpIqsD4AKlfQT0dZR+AlfA,
	ctalbott-hpIqsD4AKlfQT0dZR+AlfA, Vivek Goyal, Johannes Weiner,
	Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki

On Thu, May 24, 2012 at 12:22 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> Hello, Sasha.
>
> Does the following patch fix the problem you're seeing?
>
> Thanks.

Yup, that did the trick.

Thanks!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] cgroup: superblock can't be released with active dentries
       [not found]                       ` <CA+1xoqdbtkC5Ue3s1qZHHJBLcj9a=djFDGny6AJJSV-pdn7mcA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-05-24 15:41                         ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-05-24 15:41 UTC (permalink / raw)
  To: Li Zefan, Sasha Levin
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

From 88787c483106c5830a46d18deaffdc1e70929af7 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Thu, 24 May 2012 08:24:39 -0700

48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed.  As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.

cgroup_create() does grab an active reference on the superblock to
prevent it from going away while there are !root cgroups; however, the
reference is put from cgroup_diput() which is invoked on cgroup
removal, so cgroup dentries which are removed but persisting due to
lingering csses already have released their superblock active refs
allowing superblock to be killed while those dentries are around.

Given the right condition, this makes cgroup_kill_sb() call
kill_litter_super() with dentries with non-zero d_count leading to
BUG() in shrink_dcache_for_umount_subtree().

Fix it by adding cgroup_dops->d_release() operation and moving
deactivate_super() to it.  cgroup_diput() now marks dentry->d_fsdata
with itself if superblock should be deactivated and cgroup_d_release()
deactivates the superblock on dentry release.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Reported-by: Sasha Levin <levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Tested-by: Sasha Levin <levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
---
Li, can you please ack this?

Thanks.

 kernel/cgroup.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ad8eae5..e887b55 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -896,10 +896,13 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		mutex_unlock(&cgroup_mutex);
 
 		/*
-		 * Drop the active superblock reference that we took when we
-		 * created the cgroup
+		 * We want to drop the active superblock reference from the
+		 * cgroup creation after all the dentry refs are gone -
+		 * kill_sb gets mighty unhappy otherwise.  Mark
+		 * dentry->d_fsdata with cgroup_diput() to tell
+		 * cgroup_d_release() to call deactivate_super().
 		 */
-		deactivate_super(cgrp->root->sb);
+		dentry->d_fsdata = cgroup_diput;
 
 		/*
 		 * if we're getting rid of the cgroup, refcount should ensure
@@ -925,6 +928,13 @@ static int cgroup_delete(const struct dentry *d)
 	return 1;
 }
 
+static void cgroup_d_release(struct dentry *dentry)
+{
+	/* did cgroup_diput() tell me to deactivate super? */
+	if (dentry->d_fsdata == cgroup_diput)
+		deactivate_super(dentry->d_sb);
+}
+
 static void remove_dir(struct dentry *d)
 {
 	struct dentry *parent = dget(d->d_parent);
@@ -1532,6 +1542,7 @@ static int cgroup_get_rootdir(struct super_block *sb)
 	static const struct dentry_operations cgroup_dops = {
 		.d_iput = cgroup_diput,
 		.d_delete = cgroup_delete,
+		.d_release = cgroup_d_release,
 	};
 
 	struct inode *inode =
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH] cgroup: superblock can't be released with active dentries
       [not found]                       ` <CA+1xoqdbtkC5Ue3s1qZHHJBLcj9a=djFDGny6AJJSV-pdn7mcA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-05-24 15:41                         ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-05-24 15:41 UTC (permalink / raw)
  To: Li Zefan, Sasha Levin
  Cc: containers, cgroups, linux-kernel, fweisbec, rni, ctalbott,
	Vivek Goyal, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki

>From 88787c483106c5830a46d18deaffdc1e70929af7 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 24 May 2012 08:24:39 -0700

48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed.  As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.

cgroup_create() does grab an active reference on the superblock to
prevent it from going away while there are !root cgroups; however, the
reference is put from cgroup_diput() which is invoked on cgroup
removal, so cgroup dentries which are removed but persisting due to
lingering csses already have released their superblock active refs
allowing superblock to be killed while those dentries are around.

Given the right condition, this makes cgroup_kill_sb() call
kill_litter_super() with dentries with non-zero d_count leading to
BUG() in shrink_dcache_for_umount_subtree().

Fix it by adding cgroup_dops->d_release() operation and moving
deactivate_super() to it.  cgroup_diput() now marks dentry->d_fsdata
with itself if superblock should be deactivated and cgroup_d_release()
deactivates the superblock on dentry release.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g@mail.gmail.com>
---
Li, can you please ack this?

Thanks.

 kernel/cgroup.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ad8eae5..e887b55 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -896,10 +896,13 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		mutex_unlock(&cgroup_mutex);
 
 		/*
-		 * Drop the active superblock reference that we took when we
-		 * created the cgroup
+		 * We want to drop the active superblock reference from the
+		 * cgroup creation after all the dentry refs are gone -
+		 * kill_sb gets mighty unhappy otherwise.  Mark
+		 * dentry->d_fsdata with cgroup_diput() to tell
+		 * cgroup_d_release() to call deactivate_super().
 		 */
-		deactivate_super(cgrp->root->sb);
+		dentry->d_fsdata = cgroup_diput;
 
 		/*
 		 * if we're getting rid of the cgroup, refcount should ensure
@@ -925,6 +928,13 @@ static int cgroup_delete(const struct dentry *d)
 	return 1;
 }
 
+static void cgroup_d_release(struct dentry *dentry)
+{
+	/* did cgroup_diput() tell me to deactivate super? */
+	if (dentry->d_fsdata == cgroup_diput)
+		deactivate_super(dentry->d_sb);
+}
+
 static void remove_dir(struct dentry *d)
 {
 	struct dentry *parent = dget(d->d_parent);
@@ -1532,6 +1542,7 @@ static int cgroup_get_rootdir(struct super_block *sb)
 	static const struct dentry_operations cgroup_dops = {
 		.d_iput = cgroup_diput,
 		.d_delete = cgroup_delete,
+		.d_release = cgroup_d_release,
 	};
 
 	struct inode *inode =
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH] cgroup: superblock can't be released with active dentries
@ 2012-05-24 15:41                         ` Tejun Heo
  0 siblings, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2012-05-24 15:41 UTC (permalink / raw)
  To: Li Zefan, Sasha Levin
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, rni-hpIqsD4AKlfQT0dZR+AlfA,
	ctalbott-hpIqsD4AKlfQT0dZR+AlfA, Vivek Goyal, Johannes Weiner,
	Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki

From 88787c483106c5830a46d18deaffdc1e70929af7 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Thu, 24 May 2012 08:24:39 -0700

48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed.  As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.

cgroup_create() does grab an active reference on the superblock to
prevent it from going away while there are !root cgroups; however, the
reference is put from cgroup_diput() which is invoked on cgroup
removal, so cgroup dentries which are removed but persisting due to
lingering csses already have released their superblock active refs
allowing superblock to be killed while those dentries are around.

Given the right condition, this makes cgroup_kill_sb() call
kill_litter_super() with dentries with non-zero d_count leading to
BUG() in shrink_dcache_for_umount_subtree().

Fix it by adding cgroup_dops->d_release() operation and moving
deactivate_super() to it.  cgroup_diput() now marks dentry->d_fsdata
with itself if superblock should be deactivated and cgroup_d_release()
deactivates the superblock on dentry release.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Reported-by: Sasha Levin <levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Tested-by: Sasha Levin <levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
---
Li, can you please ack this?

Thanks.

 kernel/cgroup.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ad8eae5..e887b55 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -896,10 +896,13 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		mutex_unlock(&cgroup_mutex);
 
 		/*
-		 * Drop the active superblock reference that we took when we
-		 * created the cgroup
+		 * We want to drop the active superblock reference from the
+		 * cgroup creation after all the dentry refs are gone -
+		 * kill_sb gets mighty unhappy otherwise.  Mark
+		 * dentry->d_fsdata with cgroup_diput() to tell
+		 * cgroup_d_release() to call deactivate_super().
 		 */
-		deactivate_super(cgrp->root->sb);
+		dentry->d_fsdata = cgroup_diput;
 
 		/*
 		 * if we're getting rid of the cgroup, refcount should ensure
@@ -925,6 +928,13 @@ static int cgroup_delete(const struct dentry *d)
 	return 1;
 }
 
+static void cgroup_d_release(struct dentry *dentry)
+{
+	/* did cgroup_diput() tell me to deactivate super? */
+	if (dentry->d_fsdata == cgroup_diput)
+		deactivate_super(dentry->d_sb);
+}
+
 static void remove_dir(struct dentry *d)
 {
 	struct dentry *parent = dget(d->d_parent);
@@ -1532,6 +1542,7 @@ static int cgroup_get_rootdir(struct super_block *sb)
 	static const struct dentry_operations cgroup_dops = {
 		.d_iput = cgroup_diput,
 		.d_delete = cgroup_delete,
+		.d_release = cgroup_d_release,
 	};
 
 	struct inode *inode =
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] cgroup: superblock can't be released with active dentries
  2012-05-24 15:41                         ` Tejun Heo
@ 2012-05-28  0:18                             ` Li Zefan
  -1 siblings, 0 replies; 33+ messages in thread
From: Li Zefan @ 2012-05-28  0:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA, rni-hpIqsD4AKlfQT0dZR+AlfA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Sasha Levin,
	Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA, Vivek Goyal

2012/5/24 23:41, Tejun Heo wrote:

>>From 88787c483106c5830a46d18deaffdc1e70929af7 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Date: Thu, 24 May 2012 08:24:39 -0700
> 
> 48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
> optional" allowed a css to linger after the associated cgroup is
> removed.  As a css holds a reference on the cgroup's dentry, it means
> that cgroup dentries may linger for a while.
> 
> cgroup_create() does grab an active reference on the superblock to
> prevent it from going away while there are !root cgroups; however, the
> reference is put from cgroup_diput() which is invoked on cgroup
> removal, so cgroup dentries which are removed but persisting due to
> lingering csses already have released their superblock active refs
> allowing superblock to be killed while those dentries are around.
> 
> Given the right condition, this makes cgroup_kill_sb() call
> kill_litter_super() with dentries with non-zero d_count leading to
> BUG() in shrink_dcache_for_umount_subtree().
> 
> Fix it by adding cgroup_dops->d_release() operation and moving
> deactivate_super() to it.  cgroup_diput() now marks dentry->d_fsdata
> with itself if superblock should be deactivated and cgroup_d_release()
> deactivates the superblock on dentry release.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Reported-by: Sasha Levin <levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Tested-by: Sasha Levin <levinsasha928-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] cgroup: superblock can't be released with active dentries
@ 2012-05-28  0:18                             ` Li Zefan
  0 siblings, 0 replies; 33+ messages in thread
From: Li Zefan @ 2012-05-28  0:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sasha Levin, containers, cgroups, linux-kernel, fweisbec, rni,
	ctalbott, Vivek Goyal, Johannes Weiner, Michal Hocko,
	Balbir Singh, KAMEZAWA Hiroyuki

2012/5/24 23:41, Tejun Heo wrote:

>>From 88787c483106c5830a46d18deaffdc1e70929af7 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Thu, 24 May 2012 08:24:39 -0700
> 
> 48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
> optional" allowed a css to linger after the associated cgroup is
> removed.  As a css holds a reference on the cgroup's dentry, it means
> that cgroup dentries may linger for a while.
> 
> cgroup_create() does grab an active reference on the superblock to
> prevent it from going away while there are !root cgroups; however, the
> reference is put from cgroup_diput() which is invoked on cgroup
> removal, so cgroup dentries which are removed but persisting due to
> lingering csses already have released their superblock active refs
> allowing superblock to be killed while those dentries are around.
> 
> Given the right condition, this makes cgroup_kill_sb() call
> kill_litter_super() with dentries with non-zero d_count leading to
> BUG() in shrink_dcache_for_umount_subtree().
> 
> Fix it by adding cgroup_dops->d_release() operation and moving
> deactivate_super() to it.  cgroup_diput() now marks dentry->d_fsdata
> with itself if superblock should be deactivated and cgroup_d_release()
> deactivates the superblock on dentry release.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Sasha Levin <levinsasha928@gmail.com>
> Tested-by: Sasha Levin <levinsasha928@gmail.com>
> LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g@mail.gmail.com>


Acked-by: Li Zefan <lizefan@huawei.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2012-05-28  0:18 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-30 22:34 [PATCH 1/2] cgroup: use negative bias on css->refcnt to block css_tryget() Tejun Heo
2012-03-30 22:34 ` Tejun Heo
     [not found] ` <20120330223423.GJ28934-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-03-30 22:36   ` [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional Tejun Heo
2012-03-30 22:36     ` Tejun Heo
     [not found]     ` <20120330223606.GK28934-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-03-31 21:28       ` Hiroyuki Kamezawa
2012-04-01  2:00       ` Li Zefan
2012-04-01 19:48       ` Vivek Goyal
2012-04-01 19:48         ` Vivek Goyal
2012-03-31 21:28     ` Hiroyuki Kamezawa
2012-03-31 21:28       ` Hiroyuki Kamezawa
2012-04-01  2:00     ` Li Zefan
2012-04-01  2:00       ` Li Zefan
2012-04-01 18:54       ` Tejun Heo
2012-04-01 18:54         ` Tejun Heo
     [not found]         ` <20120401185430.GA9230-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
2012-05-16 22:33           ` Sasha Levin
2012-05-16 22:33             ` Sasha Levin
     [not found]             ` <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-05-18 17:55               ` Tejun Heo
2012-05-18 17:55                 ` Tejun Heo
2012-05-18 18:28                 ` Sasha Levin
2012-05-18 18:28                   ` Sasha Levin
2012-05-23 22:22                   ` Tejun Heo
2012-05-23 22:22                     ` Tejun Heo
2012-05-24 13:21                     ` Sasha Levin
2012-05-24 13:21                       ` Sasha Levin
2012-05-24 15:41                       ` [PATCH] cgroup: superblock can't be released with active dentries Tejun Heo
2012-05-24 15:41                         ` Tejun Heo
     [not found]                         ` <20120524154139.GA27983-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-05-28  0:18                           ` Li Zefan
2012-05-28  0:18                             ` Li Zefan
     [not found]                       ` <CA+1xoqdbtkC5Ue3s1qZHHJBLcj9a=djFDGny6AJJSV-pdn7mcA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-05-24 15:41                         ` Tejun Heo
     [not found]                     ` <20120523222242.GD3933-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-05-24 13:21                       ` [PATCH 2/2] cgroup: make css->refcnt clearing on cgroup removal optional Sasha Levin
     [not found]                   ` <CA+1xoqfr5aaGbxoX3YKrJHXTjU9fWSbX_xbuEOU=4K7kMay6XQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-05-23 22:22                     ` Tejun Heo
     [not found]                 ` <20120518175548.GM19388-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-05-18 18:28                   ` Sasha Levin
     [not found]       ` <4F77B6BA.4070207-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2012-04-01 18:54         ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.