All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root()
@ 2014-05-05 22:35 Tejun Heo
       [not found] ` <20140505223557.GU11231-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2014-05-05 22:35 UTC (permalink / raw)
  To: Li Zefan; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

From fc4ef71935b1effe8612bfb39c7e3a41a2e2346c Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Mon, 5 May 2014 18:35:12 -0400

Determining the css of a task usually requires RCU read lock as that's
the only thing which keeps the returned css accessible till its
reference is acquired; however, testing whether a task belongs to the
root can be performed without dereferencing the returned css by
comparing the returned pointer against the root one in init_css_set[]
which never changes.

Implement task_css_is_root() which can be invoked in any context.
This will be used by the scheduled cgroup_freezer change.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup.h | 15 +++++++++++++++
 kernel/cgroup.c        |  3 ++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c251585..d60904b 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -473,6 +473,7 @@ struct cftype {
 };
 
 extern struct cgroup_root cgrp_dfl_root;
+extern struct css_set init_css_set;
 
 static inline bool cgroup_on_dfl(const struct cgroup *cgrp)
 {
@@ -700,6 +701,20 @@ static inline struct cgroup_subsys_state *task_css(struct task_struct *task,
 	return task_css_check(task, subsys_id, false);
 }
 
+/**
+ * task_css_is_root - test whether a task belongs to the root css
+ * @task: the target task
+ * @subsys_id: the target subsystem ID
+ *
+ * Test whether @task belongs to the root css on the specified subsystem.
+ * May be invoked in any context.
+ */
+static inline bool task_css_is_root(struct task_struct *task, int subsys_id)
+{
+	return task_css_check(task, subsys_id, true) ==
+		init_css_set.subsys[subsys_id];
+}
+
 static inline struct cgroup *task_cgroup(struct task_struct *task,
 					 int subsys_id)
 {
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 11a03d6..7030b37 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -348,7 +348,7 @@ struct cgrp_cset_link {
  * reference-counted, to improve performance when child cgroups
  * haven't been created.
  */
-static struct css_set init_css_set = {
+struct css_set init_css_set = {
 	.refcount		= ATOMIC_INIT(1),
 	.cgrp_links		= LIST_HEAD_INIT(init_css_set.cgrp_links),
 	.tasks			= LIST_HEAD_INIT(init_css_set.tasks),
@@ -356,6 +356,7 @@ static struct css_set init_css_set = {
 	.mg_preload_node	= LIST_HEAD_INIT(init_css_set.mg_preload_node),
 	.mg_node		= LIST_HEAD_INIT(init_css_set.mg_node),
 };
+EXPORT_SYMBOL_GPL(init_css_set);
 
 static int css_set_count	= 1;	/* 1 for init_css_set */
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH cgroup/for-3.15-fixes 2/2] cgroup_freezer: replace freezer->lock with freezer_mutex
       [not found] ` <20140505223557.GU11231-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2014-05-05 22:39   ` Tejun Heo
       [not found]     ` <20140505223941.GV11231-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  2014-05-07  2:50   ` [PATCH cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root() Li Zefan
  2014-05-07 13:06   ` [PATCH v2 " Tejun Heo
  2 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2014-05-05 22:39 UTC (permalink / raw)
  To: Li Zefan; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Rafael J. Wysocki

Hello,

So, I apparently missed this one while converting css_set_rwlock to
css_set_rwsem.  I thought about reverting that change but concluded
that it'd be better to update locking from cgroup_freezer side.  It
always bothered me that cgroup_freezer was walking all tasks in cgroup
sub-hierarchies under IRQ-safe spinlock.  In general, freezer has been
using way more intricate locking than required for it.

This patch converts cgroup_freezer to use a shared mutex instead of
per-cgroup spinlock for locking and fixes the rwsem-from-atomic issue
at the same time.

Thanks.
------ 8< ------
From 77f01328c3e6d1eae95426543e338674e597e06a Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Mon, 5 May 2014 18:35:12 -0400

After 96d365e0b86e ("cgroup: make css_set_lock a rwsem and rename it
to css_set_rwsem"), css task iterators requires sleepable context as
it may block on css_set_rwsem.  I missed that cgroup_freezer was
iterating tasks under IRQ-safe spinlock freezer->lock.  This leads to
errors like the following on freezer state reads and transitions.

  BUG: sleeping function called from invalid context at /work
 /os/work/kernel/locking/rwsem.c:20
  in_atomic(): 0, irqs_disabled(): 0, pid: 462, name: bash
  5 locks held by bash/462:
   #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff811f0843>] vfs_write+0x1a3/0x1c0
   #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8126d78b>] kernfs_fop_write+0xbb/0x170
   #2:  (s_active#70){.+.+.+}, at: [<ffffffff8126d793>] kernfs_fop_write+0xc3/0x170
   #3:  (freezer_mutex){+.+...}, at: [<ffffffff81135981>] freezer_write+0x61/0x1e0
   #4:  (rcu_read_lock){......}, at: [<ffffffff81135973>] freezer_write+0x53/0x1e0
  Preemption disabled at:[<ffffffff81104404>] console_unlock+0x1e4/0x460

  CPU: 3 PID: 462 Comm: bash Not tainted 3.15.0-rc1-work+ #10
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
   ffff88000916a6d0 ffff88000e0a3da0 ffffffff81cf8c96 0000000000000000
   ffff88000e0a3dc8 ffffffff810cf4f2 ffffffff82388040 ffff880013aaf740
   0000000000000002 ffff88000e0a3de8 ffffffff81d05974 0000000000000246
  Call Trace:
   [<ffffffff81cf8c96>] dump_stack+0x4e/0x7a
   [<ffffffff810cf4f2>] __might_sleep+0x162/0x260
   [<ffffffff81d05974>] down_read+0x24/0x60
   [<ffffffff81133e87>] css_task_iter_start+0x27/0x70
   [<ffffffff8113584d>] freezer_apply_state+0x5d/0x130
   [<ffffffff81135a16>] freezer_write+0xf6/0x1e0
   [<ffffffff8112eb88>] cgroup_file_write+0xd8/0x230
   [<ffffffff8126d7b7>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f0756>] vfs_write+0xb6/0x1c0
   [<ffffffff811f121d>] SyS_write+0x4d/0xc0
   [<ffffffff81d08292>] system_call_fastpath+0x16/0x1b

freezer->lock used to be used in hot paths but that time is long gone
and there's no reason for the lock to be IRQ-safe spinlock or even
per-cgroup.  In fact, given the fact that a cgroup may contain large
number of tasks, it's not a good idea to iterate over them while
holding IRQ-safe spinlock.

Let's simplify locking by replacing per-cgroup freezer->lock with
global freezer_mutex.  This also makes the comments explaining the
intricacies of policy inheritance and the locking around it as the
states are protected by a common mutex.

The conversion is mostly straight-forward.  The followings are worth
mentioning.

* freezer_css_online() no longer needs double locking.

* freezer_attach() now performs propagation simply while holding
  freezer_mutex.  update_if_frozen() race no longer exists and the
  comment is removed.

* freezer_fork() now tests whether the task is in root cgroup using
  the new task_css_is_root() without doing rcu_read_lock/unlock().  If
  not, it grabs freezer_mutex and performs the operation.

* freezer_read() and freezer_change_state() grab freezer_mutex across
  the whole operation and pin the css while iterating so that each
  descendant processing happens in sleepable context.

Fixes: 96d365e0b86e ("cgroup: make css_set_lock a rwsem and rename it to css_set_rwsem")
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 kernel/cgroup_freezer.c | 112 ++++++++++++++++++++----------------------------
 1 file changed, 46 insertions(+), 66 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 2bc4a22..12ead0b 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -21,6 +21,7 @@
 #include <linux/uaccess.h>
 #include <linux/freezer.h>
 #include <linux/seq_file.h>
+#include <linux/mutex.h>
 
 /*
  * A cgroup is freezing if any FREEZING flags are set.  FREEZING_SELF is
@@ -42,9 +43,10 @@ enum freezer_state_flags {
 struct freezer {
 	struct cgroup_subsys_state	css;
 	unsigned int			state;
-	spinlock_t			lock;
 };
 
+static DEFINE_MUTEX(freezer_mutex);
+
 static inline struct freezer *css_freezer(struct cgroup_subsys_state *css)
 {
 	return css ? container_of(css, struct freezer, css) : NULL;
@@ -93,7 +95,6 @@ freezer_css_alloc(struct cgroup_subsys_state *parent_css)
 	if (!freezer)
 		return ERR_PTR(-ENOMEM);
 
-	spin_lock_init(&freezer->lock);
 	return &freezer->css;
 }
 
@@ -110,14 +111,7 @@ static int freezer_css_online(struct cgroup_subsys_state *css)
 	struct freezer *freezer = css_freezer(css);
 	struct freezer *parent = parent_freezer(freezer);
 
-	/*
-	 * The following double locking and freezing state inheritance
-	 * guarantee that @cgroup can never escape ancestors' freezing
-	 * states.  See css_for_each_descendant_pre() for details.
-	 */
-	if (parent)
-		spin_lock_irq(&parent->lock);
-	spin_lock_nested(&freezer->lock, SINGLE_DEPTH_NESTING);
+	mutex_lock(&freezer_mutex);
 
 	freezer->state |= CGROUP_FREEZER_ONLINE;
 
@@ -126,10 +120,7 @@ static int freezer_css_online(struct cgroup_subsys_state *css)
 		atomic_inc(&system_freezing_cnt);
 	}
 
-	spin_unlock(&freezer->lock);
-	if (parent)
-		spin_unlock_irq(&parent->lock);
-
+	mutex_unlock(&freezer_mutex);
 	return 0;
 }
 
@@ -144,14 +135,14 @@ static void freezer_css_offline(struct cgroup_subsys_state *css)
 {
 	struct freezer *freezer = css_freezer(css);
 
-	spin_lock_irq(&freezer->lock);
+	mutex_lock(&freezer_mutex);
 
 	if (freezer->state & CGROUP_FREEZING)
 		atomic_dec(&system_freezing_cnt);
 
 	freezer->state = 0;
 
-	spin_unlock_irq(&freezer->lock);
+	mutex_unlock(&freezer_mutex);
 }
 
 static void freezer_css_free(struct cgroup_subsys_state *css)
@@ -175,7 +166,7 @@ static void freezer_attach(struct cgroup_subsys_state *new_css,
 	struct task_struct *task;
 	bool clear_frozen = false;
 
-	spin_lock_irq(&freezer->lock);
+	mutex_lock(&freezer_mutex);
 
 	/*
 	 * Make the new tasks conform to the current state of @new_css.
@@ -197,21 +188,13 @@ static void freezer_attach(struct cgroup_subsys_state *new_css,
 		}
 	}
 
-	spin_unlock_irq(&freezer->lock);
-
-	/*
-	 * Propagate FROZEN clearing upwards.  We may race with
-	 * update_if_frozen(), but as long as both work bottom-up, either
-	 * update_if_frozen() sees child's FROZEN cleared or we clear the
-	 * parent's FROZEN later.  No parent w/ !FROZEN children can be
-	 * left FROZEN.
-	 */
+	/* propagate FROZEN clearing upwards */
 	while (clear_frozen && (freezer = parent_freezer(freezer))) {
-		spin_lock_irq(&freezer->lock);
 		freezer->state &= ~CGROUP_FROZEN;
 		clear_frozen = freezer->state & CGROUP_FREEZING;
-		spin_unlock_irq(&freezer->lock);
 	}
+
+	mutex_unlock(&freezer_mutex);
 }
 
 /**
@@ -228,9 +211,6 @@ static void freezer_fork(struct task_struct *task)
 {
 	struct freezer *freezer;
 
-	rcu_read_lock();
-	freezer = task_freezer(task);
-
 	/*
 	 * The root cgroup is non-freezable, so we can skip locking the
 	 * freezer.  This is safe regardless of race with task migration.
@@ -238,24 +218,18 @@ static void freezer_fork(struct task_struct *task)
 	 * to do.  If we lost and root is the new cgroup, noop is still the
 	 * right thing to do.
 	 */
-	if (!parent_freezer(freezer))
-		goto out;
+	if (task_css_is_root(task, freezer_cgrp_id))
+		return;
 
-	/*
-	 * Grab @freezer->lock and freeze @task after verifying @task still
-	 * belongs to @freezer and it's freezing.  The former is for the
-	 * case where we have raced against task migration and lost and
-	 * @task is already in a different cgroup which may not be frozen.
-	 * This isn't strictly necessary as freeze_task() is allowed to be
-	 * called spuriously but let's do it anyway for, if nothing else,
-	 * documentation.
-	 */
-	spin_lock_irq(&freezer->lock);
-	if (freezer == task_freezer(task) && (freezer->state & CGROUP_FREEZING))
+	mutex_lock(&freezer_mutex);
+	rcu_read_lock();
+
+	freezer = task_freezer(task);
+	if (freezer->state & CGROUP_FREEZING)
 		freeze_task(task);
-	spin_unlock_irq(&freezer->lock);
-out:
+
 	rcu_read_unlock();
+	mutex_unlock(&freezer_mutex);
 }
 
 /**
@@ -281,22 +255,22 @@ static void update_if_frozen(struct cgroup_subsys_state *css)
 	struct css_task_iter it;
 	struct task_struct *task;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
-
-	spin_lock_irq(&freezer->lock);
+	lockdep_assert_held(&freezer_mutex);
 
 	if (!(freezer->state & CGROUP_FREEZING) ||
 	    (freezer->state & CGROUP_FROZEN))
-		goto out_unlock;
+		return;
 
 	/* are all (live) children frozen? */
+	rcu_read_lock();
 	css_for_each_child(pos, css) {
 		struct freezer *child = css_freezer(pos);
 
 		if ((child->state & CGROUP_FREEZER_ONLINE) &&
 		    !(child->state & CGROUP_FROZEN))
-			goto out_unlock;
+			return;
 	}
+	rcu_read_unlock();
 
 	/* are all tasks frozen? */
 	css_task_iter_start(css, &it);
@@ -317,21 +291,29 @@ static void update_if_frozen(struct cgroup_subsys_state *css)
 	freezer->state |= CGROUP_FROZEN;
 out_iter_end:
 	css_task_iter_end(&it);
-out_unlock:
-	spin_unlock_irq(&freezer->lock);
 }
 
 static int freezer_read(struct seq_file *m, void *v)
 {
 	struct cgroup_subsys_state *css = seq_css(m), *pos;
 
+	mutex_lock(&freezer_mutex);
 	rcu_read_lock();
 
 	/* update states bottom-up */
-	css_for_each_descendant_post(pos, css)
+	css_for_each_descendant_post(pos, css) {
+		if (!css_tryget(pos))
+			continue;
+		rcu_read_unlock();
+
 		update_if_frozen(pos);
 
+		rcu_read_lock();
+		css_put(pos);
+	}
+
 	rcu_read_unlock();
+	mutex_unlock(&freezer_mutex);
 
 	seq_puts(m, freezer_state_strs(css_freezer(css)->state));
 	seq_putc(m, '\n');
@@ -373,7 +355,7 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze,
 				unsigned int state)
 {
 	/* also synchronizes against task migration, see freezer_attach() */
-	lockdep_assert_held(&freezer->lock);
+	lockdep_assert_held(&freezer_mutex);
 
 	if (!(freezer->state & CGROUP_FREEZER_ONLINE))
 		return;
@@ -414,31 +396,29 @@ static void freezer_change_state(struct freezer *freezer, bool freeze)
 	 * descendant will try to inherit its parent's FREEZING state as
 	 * CGROUP_FREEZING_PARENT.
 	 */
+	mutex_lock(&freezer_mutex);
 	rcu_read_lock();
 	css_for_each_descendant_pre(pos, &freezer->css) {
 		struct freezer *pos_f = css_freezer(pos);
 		struct freezer *parent = parent_freezer(pos_f);
 
-		spin_lock_irq(&pos_f->lock);
+		if (!css_tryget(pos))
+			continue;
+		rcu_read_unlock();
 
-		if (pos_f == freezer) {
+		if (pos_f == freezer)
 			freezer_apply_state(pos_f, freeze,
 					    CGROUP_FREEZING_SELF);
-		} else {
-			/*
-			 * Our update to @parent->state is already visible
-			 * which is all we need.  No need to lock @parent.
-			 * For more info on synchronization, see
-			 * freezer_post_create().
-			 */
+		else
 			freezer_apply_state(pos_f,
 					    parent->state & CGROUP_FREEZING,
 					    CGROUP_FREEZING_PARENT);
-		}
 
-		spin_unlock_irq(&pos_f->lock);
+		rcu_read_lock();
+		css_put(pos);
 	}
 	rcu_read_unlock();
+	mutex_unlock(&freezer_mutex);
 }
 
 static int freezer_write(struct cgroup_subsys_state *css, struct cftype *cft,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root()
       [not found] ` <20140505223557.GU11231-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  2014-05-05 22:39   ` [PATCH cgroup/for-3.15-fixes 2/2] cgroup_freezer: replace freezer->lock with freezer_mutex Tejun Heo
@ 2014-05-07  2:50   ` Li Zefan
       [not found]     ` <53699F6E.3070107-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2014-05-07 13:06   ` [PATCH v2 " Tejun Heo
  2 siblings, 1 reply; 9+ messages in thread
From: Li Zefan @ 2014-05-07  2:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

Hi Tejun,

> -static struct css_set init_css_set = {
> +struct css_set init_css_set = {
>  	.refcount		= ATOMIC_INIT(1),
>  	.cgrp_links		= LIST_HEAD_INIT(init_css_set.cgrp_links),
>  	.tasks			= LIST_HEAD_INIT(init_css_set.tasks),
> @@ -356,6 +356,7 @@ static struct css_set init_css_set = {
>  	.mg_preload_node	= LIST_HEAD_INIT(init_css_set.mg_preload_node),
>  	.mg_node		= LIST_HEAD_INIT(init_css_set.mg_node),
>  };
> +EXPORT_SYMBOL_GPL(init_css_set);
>  

Why do we export this symbol?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root()
       [not found]     ` <53699F6E.3070107-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2014-05-07  3:29       ` Tejun Heo
  0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2014-05-07  3:29 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups

Hello,

On Tue, May 6, 2014 at 10:50 PM, Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
>> -static struct css_set init_css_set = {
>> +struct css_set init_css_set = {
>>       .refcount               = ATOMIC_INIT(1),
>>       .cgrp_links             = LIST_HEAD_INIT(init_css_set.cgrp_links),
>>       .tasks                  = LIST_HEAD_INIT(init_css_set.tasks),
>> @@ -356,6 +356,7 @@ static struct css_set init_css_set = {
>>       .mg_preload_node        = LIST_HEAD_INIT(init_css_set.mg_preload_node),
>>       .mg_node                = LIST_HEAD_INIT(init_css_set.mg_node),
>>  };
>> +EXPORT_SYMBOL_GPL(init_css_set);
>>
>
> Why do we export this symbol?

LOL because I forgot that we removed module support. I'll drop the EXPORT.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root()
       [not found] ` <20140505223557.GU11231-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  2014-05-05 22:39   ` [PATCH cgroup/for-3.15-fixes 2/2] cgroup_freezer: replace freezer->lock with freezer_mutex Tejun Heo
  2014-05-07  2:50   ` [PATCH cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root() Li Zefan
@ 2014-05-07 13:06   ` Tejun Heo
       [not found]     ` <20140507130627.GA16702-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  2 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2014-05-07 13:06 UTC (permalink / raw)
  To: Li Zefan; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

Determining the css of a task usually requires RCU read lock as that's
the only thing which keeps the returned css accessible till its
reference is acquired; however, testing whether a task belongs to the
root can be performed without dereferencing the returned css by
comparing the returned pointer against the root one in init_css_set[]
which never changes.

Implement task_css_is_root() which can be invoked in any context.
This will be used by the scheduled cgroup_freezer change.

v2: cgroup no longer supports modular controllers.  No need to export
    init_css_set.  Pointed out by Li.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 include/linux/cgroup.h |   15 +++++++++++++++
 kernel/cgroup.c        |    2 +-
 2 files changed, 16 insertions(+), 1 deletion(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -473,6 +473,7 @@ struct cftype {
 };
 
 extern struct cgroup_root cgrp_dfl_root;
+extern struct css_set init_css_set;
 
 static inline bool cgroup_on_dfl(const struct cgroup *cgrp)
 {
@@ -700,6 +701,20 @@ static inline struct cgroup_subsys_state
 	return task_css_check(task, subsys_id, false);
 }
 
+/**
+ * task_css_is_root - test whether a task belongs to the root css
+ * @task: the target task
+ * @subsys_id: the target subsystem ID
+ *
+ * Test whether @task belongs to the root css on the specified subsystem.
+ * May be invoked in any context.
+ */
+static inline bool task_css_is_root(struct task_struct *task, int subsys_id)
+{
+	return task_css_check(task, subsys_id, true) ==
+		init_css_set.subsys[subsys_id];
+}
+
 static inline struct cgroup *task_cgroup(struct task_struct *task,
 					 int subsys_id)
 {
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -348,7 +348,7 @@ struct cgrp_cset_link {
  * reference-counted, to improve performance when child cgroups
  * haven't been created.
  */
-static struct css_set init_css_set = {
+struct css_set init_css_set = {
 	.refcount		= ATOMIC_INIT(1),
 	.cgrp_links		= LIST_HEAD_INIT(init_css_set.cgrp_links),
 	.tasks			= LIST_HEAD_INIT(init_css_set.tasks),

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root()
       [not found]     ` <20140507130627.GA16702-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2014-05-08  1:22       ` Li Zefan
  2014-05-08  1:32       ` Tejun Heo
  1 sibling, 0 replies; 9+ messages in thread
From: Li Zefan @ 2014-05-08  1:22 UTC (permalink / raw)
  To: Tejun Heo; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

于 2014/5/7 21:06, Tejun Heo 写道:
> Determining the css of a task usually requires RCU read lock as that's
> the only thing which keeps the returned css accessible till its
> reference is acquired; however, testing whether a task belongs to the
> root can be performed without dereferencing the returned css by
> comparing the returned pointer against the root one in init_css_set[]
> which never changes.
> 
> Implement task_css_is_root() which can be invoked in any context.
> This will be used by the scheduled cgroup_freezer change.
> 
> v2: cgroup no longer supports modular controllers.  No need to export
>     init_css_set.  Pointed out by Li.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH cgroup/for-3.15-fixes 2/2] cgroup_freezer: replace freezer->lock with freezer_mutex
       [not found]     ` <20140505223941.GV11231-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2014-05-08  1:23       ` Li Zefan
  0 siblings, 0 replies; 9+ messages in thread
From: Li Zefan @ 2014-05-08  1:23 UTC (permalink / raw)
  To: Tejun Heo; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Rafael J. Wysocki

> After 96d365e0b86e ("cgroup: make css_set_lock a rwsem and rename it
> to css_set_rwsem"), css task iterators requires sleepable context as
> it may block on css_set_rwsem.  I missed that cgroup_freezer was
> iterating tasks under IRQ-safe spinlock freezer->lock.  This leads to
> errors like the following on freezer state reads and transitions.
> 
>   BUG: sleeping function called from invalid context at /work
>  /os/work/kernel/locking/rwsem.c:20
>   in_atomic(): 0, irqs_disabled(): 0, pid: 462, name: bash
>   5 locks held by bash/462:
>    #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff811f0843>] vfs_write+0x1a3/0x1c0
>    #1:  (&of->mutex){+.+.+.}, at: [<ffffffff8126d78b>] kernfs_fop_write+0xbb/0x170
>    #2:  (s_active#70){.+.+.+}, at: [<ffffffff8126d793>] kernfs_fop_write+0xc3/0x170
>    #3:  (freezer_mutex){+.+...}, at: [<ffffffff81135981>] freezer_write+0x61/0x1e0
>    #4:  (rcu_read_lock){......}, at: [<ffffffff81135973>] freezer_write+0x53/0x1e0
>   Preemption disabled at:[<ffffffff81104404>] console_unlock+0x1e4/0x460
> 
>   CPU: 3 PID: 462 Comm: bash Not tainted 3.15.0-rc1-work+ #10
>   Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>    ffff88000916a6d0 ffff88000e0a3da0 ffffffff81cf8c96 0000000000000000
>    ffff88000e0a3dc8 ffffffff810cf4f2 ffffffff82388040 ffff880013aaf740
>    0000000000000002 ffff88000e0a3de8 ffffffff81d05974 0000000000000246
>   Call Trace:
>    [<ffffffff81cf8c96>] dump_stack+0x4e/0x7a
>    [<ffffffff810cf4f2>] __might_sleep+0x162/0x260
>    [<ffffffff81d05974>] down_read+0x24/0x60
>    [<ffffffff81133e87>] css_task_iter_start+0x27/0x70
>    [<ffffffff8113584d>] freezer_apply_state+0x5d/0x130
>    [<ffffffff81135a16>] freezer_write+0xf6/0x1e0
>    [<ffffffff8112eb88>] cgroup_file_write+0xd8/0x230
>    [<ffffffff8126d7b7>] kernfs_fop_write+0xe7/0x170
>    [<ffffffff811f0756>] vfs_write+0xb6/0x1c0
>    [<ffffffff811f121d>] SyS_write+0x4d/0xc0
>    [<ffffffff81d08292>] system_call_fastpath+0x16/0x1b
> 
> freezer->lock used to be used in hot paths but that time is long gone
> and there's no reason for the lock to be IRQ-safe spinlock or even
> per-cgroup.  In fact, given the fact that a cgroup may contain large
> number of tasks, it's not a good idea to iterate over them while
> holding IRQ-safe spinlock.
> 
> Let's simplify locking by replacing per-cgroup freezer->lock with
> global freezer_mutex.  This also makes the comments explaining the
> intricacies of policy inheritance and the locking around it as the
> states are protected by a common mutex.
> 
> The conversion is mostly straight-forward.  The followings are worth
> mentioning.
> 
> * freezer_css_online() no longer needs double locking.
> 
> * freezer_attach() now performs propagation simply while holding
>   freezer_mutex.  update_if_frozen() race no longer exists and the
>   comment is removed.
> 
> * freezer_fork() now tests whether the task is in root cgroup using
>   the new task_css_is_root() without doing rcu_read_lock/unlock().  If
>   not, it grabs freezer_mutex and performs the operation.
> 
> * freezer_read() and freezer_change_state() grab freezer_mutex across
>   the whole operation and pin the css while iterating so that each
>   descendant processing happens in sleepable context.
> 
> Fixes: 96d365e0b86e ("cgroup: make css_set_lock a rwsem and rename it to css_set_rwsem")
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root()
       [not found]     ` <20140507130627.GA16702-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  2014-05-08  1:22       ` Li Zefan
@ 2014-05-08  1:32       ` Tejun Heo
       [not found]         ` <20140508013206.GA2797-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2014-05-08  1:32 UTC (permalink / raw)
  To: Li Zefan; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed, May 07, 2014 at 09:06:27AM -0400, Tejun Heo wrote:
> Determining the css of a task usually requires RCU read lock as that's
> the only thing which keeps the returned css accessible till its
> reference is acquired; however, testing whether a task belongs to the
> root can be performed without dereferencing the returned css by
> comparing the returned pointer against the root one in init_css_set[]
> which never changes.
> 
> Implement task_css_is_root() which can be invoked in any context.
> This will be used by the scheduled cgroup_freezer change.
> 
> v2: cgroup no longer supports modular controllers.  No need to export
>     init_css_set.  Pointed out by Li.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Applied 1-2 to cgroup/for-3.15-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root()
       [not found]         ` <20140508013206.GA2797-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2014-05-13 15:35           ` Tejun Heo
  0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2014-05-13 15:35 UTC (permalink / raw)
  To: Li Zefan; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed, May 07, 2014 at 09:32:06PM -0400, Tejun Heo wrote:
> On Wed, May 07, 2014 at 09:06:27AM -0400, Tejun Heo wrote:
> > Determining the css of a task usually requires RCU read lock as that's
> > the only thing which keeps the returned css accessible till its
> > reference is acquired; however, testing whether a task belongs to the
> > root can be performed without dereferencing the returned css by
> > comparing the returned pointer against the root one in init_css_set[]
> > which never changes.
> > 
> > Implement task_css_is_root() which can be invoked in any context.
> > This will be used by the scheduled cgroup_freezer change.
> > 
> > v2: cgroup no longer supports modular controllers.  No need to export
> >     init_css_set.  Pointed out by Li.
> > 
> > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> 
> Applied 1-2 to cgroup/for-3.15-fixes.

Oops, mistakenly applied these to for-3.16.  Reverted and moved them
to for-3.15-fixes.  for-3.16 is rebased but the relevant content stays
the same as it pulls in for-3.15-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-05-13 15:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-05 22:35 [PATCH cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root() Tejun Heo
     [not found] ` <20140505223557.GU11231-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-05-05 22:39   ` [PATCH cgroup/for-3.15-fixes 2/2] cgroup_freezer: replace freezer->lock with freezer_mutex Tejun Heo
     [not found]     ` <20140505223941.GV11231-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-05-08  1:23       ` Li Zefan
2014-05-07  2:50   ` [PATCH cgroup/for-3.15-fixes 1/2] cgroup: introduce task_css_is_root() Li Zefan
     [not found]     ` <53699F6E.3070107-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-05-07  3:29       ` Tejun Heo
2014-05-07 13:06   ` [PATCH v2 " Tejun Heo
     [not found]     ` <20140507130627.GA16702-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-05-08  1:22       ` Li Zefan
2014-05-08  1:32       ` Tejun Heo
     [not found]         ` <20140508013206.GA2797-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-05-13 15:35           ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.