linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc()
@ 2013-01-17  8:03 Li Zefan
  2013-01-17  8:03 ` [PATCH 2/6] sched: split out tg creation/destruction to css_online/css_offline Li Zefan
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Li Zefan @ 2013-01-17  8:03 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Zijlstra, Ingo Molnar, LKML, Cgroups

With this change, it's guaranteed that cgroup_path() won't see NULL
cgrp->dentry, and thus we can remove the NULL check in it.

(Well, it's not true, because dummptop.dentry is always NULL)

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index a893985..23a0ce1 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1764,7 +1764,7 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
 	rcu_lockdep_assert(rcu_read_lock_held() || cgroup_lock_is_held(),
 			   "cgroup_path() called without proper locking");
 
-	if (!dentry || cgrp == dummytop) {
+	if (cgrp == dummytop) {
 		/*
 		 * Inactive subsystems have no dentry for their root
 		 * cgroup
@@ -4150,6 +4150,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 
 	init_cgroup_housekeeping(cgrp);
 
+	dentry->d_fsdata = cgrp;
+	cgrp->dentry = dentry;
+
 	cgrp->parent = parent;
 	cgrp->root = parent->root;
 	cgrp->top_cgroup = parent->top_cgroup;
@@ -4187,8 +4190,6 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 	lockdep_assert_held(&dentry->d_inode->i_mutex);
 
 	/* allocation complete, commit to creation */
-	dentry->d_fsdata = cgrp;
-	cgrp->dentry = dentry;
 	list_add_tail(&cgrp->allcg_node, &root->allcg_list);
 	list_add_tail_rcu(&cgrp->sibling, &cgrp->parent->children);
 	root->number_of_cgroups++;
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/6] sched: split out tg creation/destruction to css_online/css_offline
  2013-01-17  8:03 [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Li Zefan
@ 2013-01-17  8:03 ` Li Zefan
  2013-01-17  8:03 ` [PATCH 3/6] sched: remove redundant NULL cgroup check in task_group_path() Li Zefan
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Li Zefan @ 2013-01-17  8:03 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Zijlstra, Ingo Molnar, LKML, Cgroups

With this change, it's safe to access tg->css.cgroup while iterating
the global task_group list.

This is a preparation for later patches.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 include/linux/sched.h     |  3 +++
 kernel/sched/auto_group.c |  3 +++
 kernel/sched/core.c       | 49 +++++++++++++++++++++++++++++++++++++----------
 3 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 206bb08..577eb97 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2750,7 +2750,10 @@ extern void normalize_rt_tasks(void);
 extern struct task_group root_task_group;
 
 extern struct task_group *sched_create_group(struct task_group *parent);
+extern void sched_online_group(struct task_group *tg,
+			       struct task_group *parent);
 extern void sched_destroy_group(struct task_group *tg);
+extern void sched_offline_group(struct task_group *tg);
 extern void sched_move_task(struct task_struct *tsk);
 #ifdef CONFIG_FAIR_GROUP_SCHED
 extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index 0984a21..64de5f8 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -35,6 +35,7 @@ static inline void autogroup_destroy(struct kref *kref)
 	ag->tg->rt_se = NULL;
 	ag->tg->rt_rq = NULL;
 #endif
+	sched_offline_group(ag->tg);
 	sched_destroy_group(ag->tg);
 }
 
@@ -76,6 +77,8 @@ static inline struct autogroup *autogroup_create(void)
 	if (IS_ERR(tg))
 		goto out_free;
 
+	sched_online_group(tg, &root_task_group);
+
 	kref_init(&ag->kref);
 	init_rwsem(&ag->lock);
 	ag->id = atomic_inc_return(&autogroup_seq_nr);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 257002c..1061672 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7159,7 +7159,6 @@ static void free_sched_group(struct task_group *tg)
 struct task_group *sched_create_group(struct task_group *parent)
 {
 	struct task_group *tg;
-	unsigned long flags;
 
 	tg = kzalloc(sizeof(*tg), GFP_KERNEL);
 	if (!tg)
@@ -7171,6 +7170,17 @@ struct task_group *sched_create_group(struct task_group *parent)
 	if (!alloc_rt_sched_group(tg, parent))
 		goto err;
 
+	return tg;
+
+err:
+	free_sched_group(tg);
+	return ERR_PTR(-ENOMEM);
+}
+
+void sched_online_group(struct task_group *tg, struct task_group *parent)
+{
+	unsigned long flags;
+
 	spin_lock_irqsave(&task_group_lock, flags);
 	list_add_rcu(&tg->list, &task_groups);
 
@@ -7180,12 +7190,6 @@ struct task_group *sched_create_group(struct task_group *parent)
 	INIT_LIST_HEAD(&tg->children);
 	list_add_rcu(&tg->siblings, &parent->children);
 	spin_unlock_irqrestore(&task_group_lock, flags);
-
-	return tg;
-
-err:
-	free_sched_group(tg);
-	return ERR_PTR(-ENOMEM);
 }
 
 /* rcu callback to free various structures associated with a task group */
@@ -7198,6 +7202,12 @@ static void free_sched_group_rcu(struct rcu_head *rhp)
 /* Destroy runqueue etc associated with a task group */
 void sched_destroy_group(struct task_group *tg)
 {
+	/* wait for possible concurrent references to cfs_rqs complete */
+	call_rcu(&tg->rcu, free_sched_group_rcu);
+}
+
+void sched_offline_group(struct task_group *tg)
+{
 	unsigned long flags;
 	int i;
 
@@ -7209,9 +7219,6 @@ void sched_destroy_group(struct task_group *tg)
 	list_del_rcu(&tg->list);
 	list_del_rcu(&tg->siblings);
 	spin_unlock_irqrestore(&task_group_lock, flags);
-
-	/* wait for possible concurrent references to cfs_rqs complete */
-	call_rcu(&tg->rcu, free_sched_group_rcu);
 }
 
 /* change task's runqueue when it moves between groups.
@@ -7563,6 +7570,19 @@ static struct cgroup_subsys_state *cpu_cgroup_css_alloc(struct cgroup *cgrp)
 	return &tg->css;
 }
 
+static int cpu_cgroup_css_online(struct cgroup *cgrp)
+{
+	struct task_group *tg = cgroup_tg(cgrp);
+	struct task_group *parent;
+
+	if (!cgrp->parent)
+		return 0;
+
+	parent = cgroup_tg(cgrp->parent);
+	sched_online_group(tg, parent);
+	return 0;
+}
+
 static void cpu_cgroup_css_free(struct cgroup *cgrp)
 {
 	struct task_group *tg = cgroup_tg(cgrp);
@@ -7570,6 +7590,13 @@ static void cpu_cgroup_css_free(struct cgroup *cgrp)
 	sched_destroy_group(tg);
 }
 
+static void cpu_cgroup_css_offline(struct cgroup *cgrp)
+{
+	struct task_group *tg = cgroup_tg(cgrp);
+
+	sched_offline_group(tg);
+}
+
 static int cpu_cgroup_can_attach(struct cgroup *cgrp,
 				 struct cgroup_taskset *tset)
 {
@@ -7925,6 +7952,8 @@ struct cgroup_subsys cpu_cgroup_subsys = {
 	.name		= "cpu",
 	.css_alloc	= cpu_cgroup_css_alloc,
 	.css_free	= cpu_cgroup_css_free,
+	.css_online	= cpu_cgroup_css_online,
+	.css_offline	= cpu_cgroup_css_offline,
 	.can_attach	= cpu_cgroup_can_attach,
 	.attach		= cpu_cgroup_attach,
 	.exit		= cpu_cgroup_exit,
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/6] sched: remove redundant NULL cgroup check in task_group_path()
  2013-01-17  8:03 [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Li Zefan
  2013-01-17  8:03 ` [PATCH 2/6] sched: split out tg creation/destruction to css_online/css_offline Li Zefan
@ 2013-01-17  8:03 ` Li Zefan
  2013-01-17  8:04 ` [PATCH 4/6] cgroup: remove duplicate RCU free on struct cgroup Li Zefan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Li Zefan @ 2013-01-17  8:03 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Zijlstra, Ingo Molnar, LKML, Cgroups

cpu_cgroup_css_online() is called after cgroup is fully created,
and it makes this check redundant.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/sched/debug.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 2cd3c1b..38df0db 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -110,13 +110,6 @@ static char *task_group_path(struct task_group *tg)
 	if (autogroup_path(tg, group_path, PATH_MAX))
 		return group_path;
 
-	/*
-	 * May be NULL if the underlying cgroup isn't fully-created yet
-	 */
-	if (!tg->css.cgroup) {
-		group_path[0] = '\0';
-		return group_path;
-	}
 	cgroup_path(tg->css.cgroup, group_path, PATH_MAX);
 	return group_path;
 }
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/6] cgroup: remove duplicate RCU free on struct cgroup
  2013-01-17  8:03 [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Li Zefan
  2013-01-17  8:03 ` [PATCH 2/6] sched: split out tg creation/destruction to css_online/css_offline Li Zefan
  2013-01-17  8:03 ` [PATCH 3/6] sched: remove redundant NULL cgroup check in task_group_path() Li Zefan
@ 2013-01-17  8:04 ` Li Zefan
  2013-01-17  8:04 ` [PATCH 5/6] cgroup: remove synchronize_rcu() from cgroup_diput() Li Zefan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Li Zefan @ 2013-01-17  8:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Zijlstra, Ingo Molnar, LKML, Cgroups

When destroying cgroup, though in cgroup_diput() we've called
synchronize_rcu(), we then still have to free cgroup via call_rcu().

The story is, long ago to fix a race between reading /proc/sched_debug
and freeing cgroup, the code was changed to utilize call_rcu().

Now we've splitted part of cpu cgroup css_alloc/free to css_online/offline,
which makes the race vanish, and this grants us to remove call_rcu().

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 23a0ce1..d7fefeb 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -892,7 +892,7 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		simple_xattrs_free(&cgrp->xattrs);
 
 		ida_simple_remove(&cgrp->root->cgroup_ida, cgrp->id);
-		kfree_rcu(cgrp, rcu_head);
+		kfree(cgrp);
 	} else {
 		struct cfent *cfe = __d_cfe(dentry);
 		struct cgroup *cgrp = dentry->d_parent->d_fsdata;
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/6] cgroup: remove synchronize_rcu() from cgroup_diput()
  2013-01-17  8:03 [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Li Zefan
                   ` (2 preceding siblings ...)
  2013-01-17  8:04 ` [PATCH 4/6] cgroup: remove duplicate RCU free on struct cgroup Li Zefan
@ 2013-01-17  8:04 ` Li Zefan
  2013-01-17  8:05 ` [PATCH 6/6] cgroup: remove bogus comments in cgroup_diput() Li Zefan
  2013-01-23  0:27 ` [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Tejun Heo
  5 siblings, 0 replies; 10+ messages in thread
From: Li Zefan @ 2013-01-17  8:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Zijlstra, Ingo Molnar, LKML, Cgroups

Free cgroup via call_rcu(). The actual work is done through workqueue.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 include/linux/cgroup.h |  1 +
 kernel/cgroup.c        | 73 ++++++++++++++++++++++++++++++--------------------
 2 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 8118a31..900af59 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -203,6 +203,7 @@ struct cgroup {
 
 	/* For RCU-protected deletion */
 	struct rcu_head rcu_head;
+	struct work_struct free_work;
 
 	/* List of events which userspace want to receive */
 	struct list_head event_list;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index d7fefeb..83ac78b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -852,12 +852,52 @@ static struct inode *cgroup_new_inode(umode_t mode, struct super_block *sb)
 	return inode;
 }
 
+static void cgroup_free_fn(struct work_struct *work)
+{
+	struct cgroup *cgrp = container_of(work, struct cgroup, free_work);
+	struct cgroup_subsys *ss;
+
+	mutex_lock(&cgroup_mutex);
+	/*
+	 * Release the subsystem state objects.
+	 */
+	for_each_subsys(cgrp->root, ss)
+		ss->css_free(cgrp);
+
+	cgrp->root->number_of_cgroups--;
+	mutex_unlock(&cgroup_mutex);
+
+	/*
+	 * Drop the active superblock reference that we took when we
+	 * created the cgroup
+	 */
+	deactivate_super(cgrp->root->sb);
+
+	/*
+	 * if we're getting rid of the cgroup, refcount should ensure
+	 * that there are no pidlists left.
+	 */
+	BUG_ON(!list_empty(&cgrp->pidlists));
+
+	simple_xattrs_free(&cgrp->xattrs);
+
+	ida_simple_remove(&cgrp->root->cgroup_ida, cgrp->id);
+	kfree(cgrp);
+}
+
+static void cgroup_free_rcu(struct rcu_head *head)
+{
+	struct cgroup *cgrp = container_of(head, struct cgroup, rcu_head);
+
+	schedule_work(&cgrp->free_work);
+}
+
 static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 {
 	/* is dentry a directory ? if so, kfree() associated cgroup */
 	if (S_ISDIR(inode->i_mode)) {
 		struct cgroup *cgrp = dentry->d_fsdata;
-		struct cgroup_subsys *ss;
+
 		BUG_ON(!(cgroup_is_removed(cgrp)));
 		/* It's possible for external users to be holding css
 		 * reference counts on a cgroup; css_put() needs to
@@ -865,34 +905,7 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		 * the reference count in order to know if it needs to
 		 * queue the cgroup to be handled by the release
 		 * agent */
-		synchronize_rcu();
-
-		mutex_lock(&cgroup_mutex);
-		/*
-		 * Release the subsystem state objects.
-		 */
-		for_each_subsys(cgrp->root, ss)
-			ss->css_free(cgrp);
-
-		cgrp->root->number_of_cgroups--;
-		mutex_unlock(&cgroup_mutex);
-
-		/*
-		 * Drop the active superblock reference that we took when we
-		 * created the cgroup
-		 */
-		deactivate_super(cgrp->root->sb);
-
-		/*
-		 * if we're getting rid of the cgroup, refcount should ensure
-		 * that there are no pidlists left.
-		 */
-		BUG_ON(!list_empty(&cgrp->pidlists));
-
-		simple_xattrs_free(&cgrp->xattrs);
-
-		ida_simple_remove(&cgrp->root->cgroup_ida, cgrp->id);
-		kfree(cgrp);
+		call_rcu(&cgrp->rcu_head, cgroup_free_rcu);
 	} else {
 		struct cfent *cfe = __d_cfe(dentry);
 		struct cgroup *cgrp = dentry->d_parent->d_fsdata;
@@ -4163,6 +4176,8 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 	if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &parent->flags))
 		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags);
 
+	INIT_WORK(&cgrp->free_work, cgroup_free_fn);
+
 	for_each_subsys(root, ss) {
 		struct cgroup_subsys_state *css;
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/6] cgroup: remove bogus comments in cgroup_diput()
  2013-01-17  8:03 [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Li Zefan
                   ` (3 preceding siblings ...)
  2013-01-17  8:04 ` [PATCH 5/6] cgroup: remove synchronize_rcu() from cgroup_diput() Li Zefan
@ 2013-01-17  8:05 ` Li Zefan
  2013-01-23  0:27 ` [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Tejun Heo
  5 siblings, 0 replies; 10+ messages in thread
From: Li Zefan @ 2013-01-17  8:05 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Zijlstra, Ingo Molnar, LKML, Cgroups

Since commit 48ddbe194623ae089cc0576e60363f2d2e85662a
("cgroup: make css->refcnt clearing on cgroup removal optional"),
each css holds a ref on cgroup's dentry, so cgroup_diput() won't be
called until all css' refs go down to 0, which invalids the comments.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 83ac78b..77ab520 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -899,12 +899,6 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
 		struct cgroup *cgrp = dentry->d_fsdata;
 
 		BUG_ON(!(cgroup_is_removed(cgrp)));
-		/* It's possible for external users to be holding css
-		 * reference counts on a cgroup; css_put() needs to
-		 * be able to access the cgroup after decrementing
-		 * the reference count in order to know if it needs to
-		 * queue the cgroup to be handled by the release
-		 * agent */
 		call_rcu(&cgrp->rcu_head, cgroup_free_rcu);
 	} else {
 		struct cfent *cfe = __d_cfe(dentry);
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc()
  2013-01-17  8:03 [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Li Zefan
                   ` (4 preceding siblings ...)
  2013-01-17  8:05 ` [PATCH 6/6] cgroup: remove bogus comments in cgroup_diput() Li Zefan
@ 2013-01-23  0:27 ` Tejun Heo
  2013-01-23  0:35   ` Andrew Morton
  2013-01-23  2:34   ` Li Zefan
  5 siblings, 2 replies; 10+ messages in thread
From: Tejun Heo @ 2013-01-23  0:27 UTC (permalink / raw)
  To: Li Zefan; +Cc: Peter Zijlstra, Ingo Molnar, LKML, Cgroups, Andrew Morton

(cc'ing Andrew as scheduler folks are difficult to get response from
 these days and I can't think of anyone else to bother :)

Hello, Li.

The cgroup part looks good to me but it would be great if the
descriptions are more detailed, especially, about why the change is
beneficial or what it's aiming at.  I take it that the shed changes
are necessary to facilitate the later cgroup changes?  Can you please
elaborate how?

The scheduler part of changes are mostly mechanical, so it would be
great if we can get ack from scheduler people and route these together.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc()
  2013-01-23  0:27 ` [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Tejun Heo
@ 2013-01-23  0:35   ` Andrew Morton
  2013-01-23  2:34   ` Li Zefan
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2013-01-23  0:35 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Li Zefan, Peter Zijlstra, Ingo Molnar, LKML, Cgroups

On Tue, 22 Jan 2013 16:27:54 -0800
Tejun Heo <tj@kernel.org> wrote:

> (cc'ing Andrew as scheduler folks are difficult to get response from
>  these days and I can't think of anyone else to bother :)
> 
> Hello, Li.
> 
> The cgroup part looks good to me but it would be great if the
> descriptions are more detailed, especially, about why the change is
> beneficial or what it's aiming at.  I take it that the shed changes
> are necessary to facilitate the later cgroup changes?  Can you please
> elaborate how?
> 
> The scheduler part of changes are mostly mechanical, so it would be
> great if we can get ack from scheduler people and route these together.
> 

The sched parts do look pretty innocuous and could be viewed as "cgroup
stuff which happens to reside in kernel/sched/".

If you like the code, I think the best you can do is to merge it up,
make sure the sched guys are fully cc'ed then send it in to Linus.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc()
  2013-01-23  0:27 ` [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Tejun Heo
  2013-01-23  0:35   ` Andrew Morton
@ 2013-01-23  2:34   ` Li Zefan
  2013-01-23 16:35     ` Tejun Heo
  1 sibling, 1 reply; 10+ messages in thread
From: Li Zefan @ 2013-01-23  2:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Zijlstra, Ingo Molnar, LKML, Cgroups, Andrew Morton

On 2013/1/23 8:27, Tejun Heo wrote:
> (cc'ing Andrew as scheduler folks are difficult to get response from
>  these days and I can't think of anyone else to bother :)
> 
> Hello, Li.
> 
> The cgroup part looks good to me but it would be great if the
> descriptions are more detailed, especially, about why the change is
> beneficial or what it's aiming at.  I take it that the shed changes
> are necessary to facilitate the later cgroup changes?  Can you please
> elaborate how?
> 

Scheduler changes made in patch #2 are preparation for #3 and #4.

- for #3:

After ss->css_alloc(), there's a small window that tg->css.cgroup is NULL.
With the change, tg won't be seen before ss->css_online(), so the scheduler
won't see NULL tg->css.cgroup.

- for #4:

If tg is unregistered and removed from global list in css_free(), and if
we kfree cgroup right after ss->css_free(), the scheduler can access
invalid tg->css.cgroup, because tg is also protected by RCU.

Without patch #2:

cgroup_rmdir()
  no ss->css_offline()
diput()
  syncronize_rcu()
  ss->css_free()    <-- remove tg from global list,
			and free tg via call_rcu()
  kfree_rcu(cgroup) <-- wait rcu read section

With the change:

cgroup_rmdir()
  ss->css_offline() <-- remove tg from global list
diput()
  synchronize_rcu() <-- wait rcu read section
  ss->css_free();   <-- free tg
  kfree(cgroup);

> The scheduler part of changes are mostly mechanical, so it would be
> great if we can get ack from scheduler people and route these together.
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc()
  2013-01-23  2:34   ` Li Zefan
@ 2013-01-23 16:35     ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2013-01-23 16:35 UTC (permalink / raw)
  To: Li Zefan; +Cc: Peter Zijlstra, Ingo Molnar, LKML, Cgroups, Andrew Morton

Hello, Li.

On Wed, Jan 23, 2013 at 10:34:53AM +0800, Li Zefan wrote:
> Scheduler changes made in patch #2 are preparation for #3 and #4.
> 
> - for #3:
> 
> After ss->css_alloc(), there's a small window that tg->css.cgroup is NULL.
> With the change, tg won't be seen before ss->css_online(), so the scheduler
> won't see NULL tg->css.cgroup.
> 
> - for #4:
> 
> If tg is unregistered and removed from global list in css_free(), and if
> we kfree cgroup right after ss->css_free(), the scheduler can access
> invalid tg->css.cgroup, because tg is also protected by RCU.

Can you please incorporate the above into the patch descriptions?

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-01-23 16:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-17  8:03 [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Li Zefan
2013-01-17  8:03 ` [PATCH 2/6] sched: split out tg creation/destruction to css_online/css_offline Li Zefan
2013-01-17  8:03 ` [PATCH 3/6] sched: remove redundant NULL cgroup check in task_group_path() Li Zefan
2013-01-17  8:04 ` [PATCH 4/6] cgroup: remove duplicate RCU free on struct cgroup Li Zefan
2013-01-17  8:04 ` [PATCH 5/6] cgroup: remove synchronize_rcu() from cgroup_diput() Li Zefan
2013-01-17  8:05 ` [PATCH 6/6] cgroup: remove bogus comments in cgroup_diput() Li Zefan
2013-01-23  0:27 ` [PATCH 1/6] cgroup: initialize cgrp->dentry before css_alloc() Tejun Heo
2013-01-23  0:35   ` Andrew Morton
2013-01-23  2:34   ` Li Zefan
2013-01-23 16:35     ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).