linux-kernel.vger.kernel.org archive mirror
* [PATCHSET cgroup/for-3.8] cpuset: drop cpuset->stack_list and ->parent
@ 2012-11-28 22:26 Tejun Heo
  2012-11-28 22:26 ` [PATCH 1/3] cpuset: implement cgroup_rightmost_descendant() Tejun Heo
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Tejun Heo @ 2012-11-28 22:26 UTC (permalink / raw)
  To: lizefan, paul, glommer; +Cc: containers, cgroups, peterz, mhocko, linux-kernel

Hello, guys.

cpuset implements its own descendant iteration using
cpuset->stack_list and has its own ->parent pointer.  There's nothing
cpuset-specific about descendant walking or finding the parent.  This
patchset makes cpuset use the generic cgroup API instead.

 0001-cpuset-implement-cgroup_rightmost_descendant.patch
 0002-cpuset-replace-cpuset-stack_list-with-cpuset_for_eac.patch
 0003-cpuset-remove-cpuset-parent.patch

0001 implements cgroup_rightmost_descendant() which can be used to
skip subtree during pre-order tree walk.  Michal, maybe memcg can use
it too?

0002 replaces cpuset->stack_list with the generic
cgroup_for_each_descendant_pre().

0003 replaces cpuset->parent with cgroup->parent.

This patchset is on top of

cgroup/for-3.8 fddfb02ad0 ("cgroup: move list add after list head initilization")
+ [1] "[PATCHSET cgroup/for-3.8] cpuset: decouple cpuset locking from cgroup core"

and available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cpuset-iter

 include/linux/cgroup.h |    1
 kernel/cgroup.c        |   26 ++++++++
 kernel/cpuset.c        |  151 +++++++++++++++++++++----------------------------
 3 files changed, 92 insertions(+), 86 deletions(-)

Thanks.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel.cgroups/5251

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] cpuset: implement cgroup_rightmost_descendant()
  2012-11-28 22:26 [PATCHSET cgroup/for-3.8] cpuset: drop cpuset->stack_list and ->parent Tejun Heo
@ 2012-11-28 22:26 ` Tejun Heo
  2012-12-03 16:03   ` Michal Hocko
  2012-11-28 22:27 ` [PATCH 2/3] cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre() Tejun Heo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2012-11-28 22:26 UTC (permalink / raw)
  To: lizefan, paul, glommer
  Cc: containers, cgroups, peterz, mhocko, linux-kernel, Tejun Heo

Implement cgroup_rightmost_descendant() which returns the right most
descendant of the specified cgroup.  This can be used to skip the
cgroup's subtree while iterating with
cgroup_for_each_descendant_pre().
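[Editorial sketch: the skip idiom this changelog describes, modeled in
userspace C with toy types -- `struct node`, `rightmost_descendant()`,
etc. are hypothetical stand-ins, not the kernel API.  The point is that
jumping the cursor to the rightmost descendant makes the next pre-order
step leave the subtree entirely.]

```c
#include <stddef.h>

/* Toy tree node standing in for struct cgroup (hypothetical model). */
struct node {
	struct node *parent;
	struct node *children[4];
	int nchildren;
	int skip;	/* models e.g. cpumask_empty(cp->cpus_allowed) */
};

/* Mirrors cgroup_rightmost_descendant(): keep descending into the
 * last child; returns @pos itself if it has no descendants. */
static struct node *rightmost_descendant(struct node *pos)
{
	while (pos->nchildren)
		pos = pos->children[pos->nchildren - 1];
	return pos;
}

/* Pre-order successor of @pos within the subtree rooted at @root. */
static struct node *next_descendant_pre(struct node *pos, struct node *root)
{
	if (pos->nchildren)
		return pos->children[0];
	while (pos != root) {
		struct node *p = pos->parent;
		int i;

		for (i = 0; i < p->nchildren; i++)
			if (p->children[i] == pos)
				break;
		if (i + 1 < p->nchildren)
			return p->children[i + 1];
		pos = p;
	}
	return NULL;
}

/* Visit descendants of @root, skipping whole subtrees of "skip" nodes. */
static int count_visited(struct node *root)
{
	struct node *pos;
	int n = 0;

	for (pos = next_descendant_pre(root, root); pos;
	     pos = next_descendant_pre(pos, root)) {
		if (pos->skip) {
			/* jump past the subtree, as in the patch */
			pos = rightmost_descendant(pos);
			continue;
		}
		n++;
	}
	return n;
}

/* root -> { a (skip) -> { a1 }, b }: only b should be visited. */
static int demo_skip(void)
{
	struct node root = {0}, a = {0}, a1 = {0}, b = {0};

	root.children[0] = &a;	root.children[1] = &b;	root.nchildren = 2;
	a.parent = &root;	b.parent = &root;
	a.children[0] = &a1;	a.nchildren = 1;
	a1.parent = &a;
	a.skip = 1;

	return count_visited(&root);
}
```

Here rightmost_descendant(a) returns a1, so the next pre-order step
lands on b, skipping a's whole subtree.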

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
---
 include/linux/cgroup.h |  1 +
 kernel/cgroup.c        | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 7d73905..860ca0f 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -558,6 +558,7 @@ static inline struct cgroup* task_cgroup(struct task_struct *task,
 
 struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
 					  struct cgroup *cgroup);
+struct cgroup *cgroup_rightmost_descendant(struct cgroup *pos);
 
 /**
  * cgroup_for_each_descendant_pre - pre-order walk of a cgroup's descendants
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c02b055..bbf7460 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3009,6 +3009,32 @@ struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
 }
 EXPORT_SYMBOL_GPL(cgroup_next_descendant_pre);
 
+/**
+ * cgroup_rightmost_descendant - return the rightmost descendant of a cgroup
+ * @cgrp: cgroup of interest
+ *
+ * Return the rightmost descendant of @cgrp.  If there's no descendant,
+ * @cgrp is returned.  This can be used during pre-order traversal to skip
+ * subtree of @cgrp.
+ */
+struct cgroup *cgroup_rightmost_descendant(struct cgroup *pos)
+{
+	struct cgroup *last, *tmp;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	do {
+		last = pos;
+		/* ->prev isn't RCU safe, walk ->next till the end */
+		pos = NULL;
+		list_for_each_entry_rcu(tmp, &last->children, sibling)
+			pos = tmp;
+	} while (pos);
+
+	return last;
+}
+EXPORT_SYMBOL_GPL(cgroup_rightmost_descendant);
+
 static struct cgroup *cgroup_leftmost_descendant(struct cgroup *pos)
 {
 	struct cgroup *last;
-- 
1.7.11.7



* [PATCH 2/3] cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()
  2012-11-28 22:26 [PATCHSET cgroup/for-3.8] cpuset: drop cpuset->stack_list and ->parent Tejun Heo
  2012-11-28 22:26 ` [PATCH 1/3] cpuset: implement cgroup_rightmost_descendant() Tejun Heo
@ 2012-11-28 22:27 ` Tejun Heo
  2012-12-03 16:18   ` Michal Hocko
  2012-11-28 22:27 ` [PATCH 3/3] cpuset: remove cpuset->parent Tejun Heo
  2012-12-03 15:55 ` [PATCHSET cgroup/for-3.8] cpuset: drop cpuset->stack_list and ->parent Michal Hocko
  3 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2012-11-28 22:27 UTC (permalink / raw)
  To: lizefan, paul, glommer
  Cc: containers, cgroups, peterz, mhocko, linux-kernel, Tejun Heo

Implement cpuset_for_each_descendant_pre() and replace the
cpuset-specific tree walking using cpuset->stack_list with it.
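[Editorial sketch: the new macro is the filtering-iterator pattern --
wrap the generic loop and let a trailing `if` gate the loop body.  A
minimal userspace model of that pattern, with hypothetical names and a
flat array instead of a tree for brevity:]

```c
/* Stand-in for the per-position subsystem state. */
struct item {
	int online;		/* models is_cpuset_online() */
	int val;
};

/*
 * Filtering-iterator macro: the for-loop yields every position and the
 * trailing if runs the caller's loop body only for entries passing the
 * predicate -- the same shape as cpuset_for_each_descendant_pre()
 * filtering cgroup_for_each_descendant_pre().
 */
#define for_each_online_item(pos, arr, n, idx)		\
	for ((idx) = 0; (idx) < (n); (idx)++)		\
		if (((pos) = &(arr)[(idx)])->online)

static int sum_online(struct item *arr, int n)
{
	struct item *pos;
	int i, sum = 0;

	for_each_online_item(pos, arr, n, i)
		sum += pos->val;
	return sum;
}

static int demo_filter(void)
{
	struct item a[] = { {1, 10}, {0, 99}, {1, 5} };

	return sum_online(a, 3);	/* offline entry is never seen */
}
```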

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cpuset.c | 123 ++++++++++++++++++++++----------------------------------
 1 file changed, 48 insertions(+), 75 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 2ee0e03..3a01730 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -103,9 +103,6 @@ struct cpuset {
 	/* for custom sched domain */
 	int relax_domain_level;
 
-	/* used for walking a cpuset hierarchy */
-	struct list_head stack_list;
-
 	struct work_struct hotplug_work;
 };
 
@@ -207,6 +204,20 @@ static struct cpuset top_cpuset = {
 	cgroup_for_each_child((pos_cgrp), (parent_cs)->css.cgroup)	\
 		if (is_cpuset_online(((child_cs) = cgroup_cs((pos_cgrp)))))
 
+/**
+ * cpuset_for_each_descendant_pre - pre-order walk of a cpuset's descendants
+ * @des_cs: loop cursor pointing to the current descendant
+ * @pos_cgrp: used for iteration
+ * @root_cs: target cpuset whose descendants to walk
+ *
+ * Walk @des_cs through the online descendants of @root_cs.  Must be used
+ * with RCU read locked.  The caller may modify @pos_cgrp by calling
+ * cgroup_rightmost_descendant() to skip subtree.
+ */
+#define cpuset_for_each_descendant_pre(des_cs, pos_cgrp, root_cs)	\
+	cgroup_for_each_descendant_pre((pos_cgrp), (root_cs)->css.cgroup) \
+		if (is_cpuset_online(((des_cs) = cgroup_cs((pos_cgrp)))))
+
 /*
  * There are two global mutexes guarding cpuset structures - cpuset_mutex
  * and callback_mutex.  The latter may nest inside the former.  We also
@@ -507,31 +518,24 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
 	return;
 }
 
-static void
-update_domain_attr_tree(struct sched_domain_attr *dattr, struct cpuset *c)
+static void update_domain_attr_tree(struct sched_domain_attr *dattr,
+				    struct cpuset *root_cs)
 {
-	LIST_HEAD(q);
-
-	list_add(&c->stack_list, &q);
-	while (!list_empty(&q)) {
-		struct cpuset *cp;
-		struct cgroup *cont;
-		struct cpuset *child;
-
-		cp = list_first_entry(&q, struct cpuset, stack_list);
-		list_del(q.next);
+	struct cpuset *cp;
+	struct cgroup *pos_cgrp;
 
-		if (cpumask_empty(cp->cpus_allowed))
+	rcu_read_lock();
+	cpuset_for_each_descendant_pre(cp, pos_cgrp, root_cs) {
+		/* skip the whole subtree if @cp doesn't have any CPU */
+		if (cpumask_empty(cp->cpus_allowed)) {
+			pos_cgrp = cgroup_rightmost_descendant(pos_cgrp);
 			continue;
+		}
 
 		if (is_sched_load_balance(cp))
 			update_domain_attr(dattr, cp);
-
-		rcu_read_lock();
-		cpuset_for_each_child(child, cont, cp)
-			list_add_tail(&child->stack_list, &q);
-		rcu_read_unlock();
 	}
+	rcu_read_unlock();
 }
 
 /*
@@ -591,7 +595,6 @@ update_domain_attr_tree(struct sched_domain_attr *dattr, struct cpuset *c)
 static int generate_sched_domains(cpumask_var_t **domains,
 			struct sched_domain_attr **attributes)
 {
-	LIST_HEAD(q);		/* queue of cpusets to be scanned */
 	struct cpuset *cp;	/* scans q */
 	struct cpuset **csa;	/* array of all cpuset ptrs */
 	int csn;		/* how many cpuset ptrs in csa so far */
@@ -600,6 +603,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	struct sched_domain_attr *dattr;  /* attributes for custom domains */
 	int ndoms = 0;		/* number of sched domains in result */
 	int nslot;		/* next empty doms[] struct cpumask slot */
+	struct cgroup *pos_cgrp;
 
 	doms = NULL;
 	dattr = NULL;
@@ -627,33 +631,27 @@ static int generate_sched_domains(cpumask_var_t **domains,
 		goto done;
 	csn = 0;
 
-	list_add(&top_cpuset.stack_list, &q);
-	while (!list_empty(&q)) {
-		struct cgroup *cont;
-		struct cpuset *child;   /* scans child cpusets of cp */
-
-		cp = list_first_entry(&q, struct cpuset, stack_list);
-		list_del(q.next);
-
-		if (cpumask_empty(cp->cpus_allowed))
-			continue;
-
+	rcu_read_lock();
+	cpuset_for_each_descendant_pre(cp, pos_cgrp, &top_cpuset) {
 		/*
-		 * All child cpusets contain a subset of the parent's cpus, so
-		 * just skip them, and then we call update_domain_attr_tree()
-		 * to calc relax_domain_level of the corresponding sched
-		 * domain.
+		 * Continue traversing beyond @cp iff @cp has some CPUs and
+		 * isn't load balancing.  The former is obvious.  The
+		 * latter: All child cpusets contain a subset of the
+		 * parent's cpus, so just skip them, and then we call
+		 * update_domain_attr_tree() to calc relax_domain_level of
+		 * the corresponding sched domain.
 		 */
-		if (is_sched_load_balance(cp)) {
-			csa[csn++] = cp;
+		if (!cpumask_empty(cp->cpus_allowed) &&
+		    !is_sched_load_balance(cp))
 			continue;
-		}
 
-		rcu_read_lock();
-		cpuset_for_each_child(child, cont, cp)
-			list_add_tail(&child->stack_list, &q);
-		rcu_read_unlock();
-  	}
+		if (is_sched_load_balance(cp))
+			csa[csn++] = cp;
+
+		/* skip @cp's subtree */
+		pos_cgrp = cgroup_rightmost_descendant(pos_cgrp);
+	}
+	rcu_read_unlock();
 
 	for (i = 0; i < csn; i++)
 		csa[i]->pn = i;
@@ -2059,31 +2057,6 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
 	move_member_tasks_to_cpuset(cs, parent);
 }
 
-/*
- * Helper function to traverse cpusets.
- * It can be used to walk the cpuset tree from top to bottom, completing
- * one layer before dropping down to the next (thus always processing a
- * node before any of its children).
- */
-static struct cpuset *cpuset_next(struct list_head *queue)
-{
-	struct cpuset *cp;
-	struct cpuset *child;	/* scans child cpusets of cp */
-	struct cgroup *cont;
-
-	if (list_empty(queue))
-		return NULL;
-
-	cp = list_first_entry(queue, struct cpuset, stack_list);
-	list_del(queue->next);
-	rcu_read_lock();
-	cpuset_for_each_child(child, cont, cp)
-		list_add_tail(&child->stack_list, queue);
-	rcu_read_unlock();
-
-	return cp;
-}
-
 /**
  * cpuset_propagate_hotplug_workfn - propagate CPU/memory hotplug to a cpuset
  * @cs: cpuset in interest
@@ -2220,12 +2193,12 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	/* if cpus or mems went down, we need to propagate to descendants */
 	if (cpus_offlined || mems_offlined) {
 		struct cpuset *cs;
-		LIST_HEAD(queue);
+		struct cgroup *pos_cgrp;
 
-		list_add_tail(&top_cpuset.stack_list, &queue);
-		while ((cs = cpuset_next(&queue)))
-			if (cs != &top_cpuset)
-				schedule_cpuset_propagate_hotplug(cs);
+		rcu_read_lock();
+		cpuset_for_each_descendant_pre(cs, pos_cgrp, &top_cpuset)
+			schedule_cpuset_propagate_hotplug(cs);
+		rcu_read_unlock();
 	}
 
 	mutex_unlock(&cpuset_mutex);
-- 
1.7.11.7



* [PATCH 3/3] cpuset: remove cpuset->parent
  2012-11-28 22:26 [PATCHSET cgroup/for-3.8] cpuset: drop cpuset->stack_list and ->parent Tejun Heo
  2012-11-28 22:26 ` [PATCH 1/3] cpuset: implement cgroup_rightmost_descendant() Tejun Heo
  2012-11-28 22:27 ` [PATCH 2/3] cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre() Tejun Heo
@ 2012-11-28 22:27 ` Tejun Heo
  2012-12-03 16:20   ` Michal Hocko
  2012-12-03 15:55 ` [PATCHSET cgroup/for-3.8] cpuset: drop cpuset->stack_list and ->parent Michal Hocko
  3 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2012-11-28 22:27 UTC (permalink / raw)
  To: lizefan, paul, glommer
  Cc: containers, cgroups, peterz, mhocko, linux-kernel, Tejun Heo

cgroup already tracks the hierarchy.  Follow cgroup->parent to find
the parent and drop cpuset->parent.
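[Editorial sketch: the pattern here is deriving the parent through the
containing hierarchy plus container_of() instead of caching a duplicate
pointer.  Type and field names below are simplified userspace stand-ins,
not the real kernel definitions.]

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct css { struct cgroup *cgroup; };		/* subsystem state */
struct cgroup { struct cgroup *parent; struct css *subsys; };
struct cpuset { struct css css; int id; };

/* cgroup -> its cpuset, via the embedded css (models cgroup_cs()) */
static struct cpuset *cgroup_cs(struct cgroup *cgrp)
{
	return container_of(cgrp->subsys, struct cpuset, css);
}

/* No cached ->parent needed: follow the cgroup hierarchy instead. */
static struct cpuset *parent_cs(const struct cpuset *cs)
{
	struct cgroup *pcgrp = cs->css.cgroup->parent;

	return pcgrp ? cgroup_cs(pcgrp) : NULL;
}

static int demo_parent(void)
{
	struct cgroup top_cg = {0}, child_cg = {0};
	struct cpuset top_cs = {{0}, 0}, child_cs = {{0}, 1};

	top_cs.css.cgroup = &top_cg;	 top_cg.subsys = &top_cs.css;
	child_cs.css.cgroup = &child_cg; child_cg.subsys = &child_cs.css;
	child_cg.parent = &top_cg;

	return parent_cs(&child_cs) == &top_cs &&
	       parent_cs(&top_cs) == NULL;
}
```

The cached pointer and the derived one can never disagree once only the
cgroup hierarchy is authoritative, which is the point of the patch.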

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/cpuset.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 3a01730..d804415 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -87,8 +87,6 @@ struct cpuset {
 	cpumask_var_t cpus_allowed;	/* CPUs allowed to tasks in cpuset */
 	nodemask_t mems_allowed;	/* Memory Nodes allowed to tasks */
 
-	struct cpuset *parent;		/* my parent */
-
 	struct fmeter fmeter;		/* memory_pressure filter */
 
 	/*
@@ -120,6 +118,15 @@ static inline struct cpuset *task_cs(struct task_struct *task)
 			    struct cpuset, css);
 }
 
+static inline struct cpuset *parent_cs(const struct cpuset *cs)
+{
+	struct cgroup *pcgrp = cs->css.cgroup->parent;
+
+	if (pcgrp)
+		return cgroup_cs(pcgrp);
+	return NULL;
+}
+
 #ifdef CONFIG_NUMA
 static inline bool task_has_mempolicy(struct task_struct *task)
 {
@@ -323,7 +330,7 @@ static void guarantee_online_cpus(const struct cpuset *cs,
 				  struct cpumask *pmask)
 {
 	while (cs && !cpumask_intersects(cs->cpus_allowed, cpu_online_mask))
-		cs = cs->parent;
+		cs = parent_cs(cs);
 	if (cs)
 		cpumask_and(pmask, cs->cpus_allowed, cpu_online_mask);
 	else
@@ -348,7 +355,7 @@ static void guarantee_online_mems(const struct cpuset *cs, nodemask_t *pmask)
 {
 	while (cs && !nodes_intersects(cs->mems_allowed,
 					node_states[N_HIGH_MEMORY]))
-		cs = cs->parent;
+		cs = parent_cs(cs);
 	if (cs)
 		nodes_and(*pmask, cs->mems_allowed,
 					node_states[N_HIGH_MEMORY]);
@@ -461,7 +468,7 @@ static int validate_change(const struct cpuset *cur, const struct cpuset *trial)
 	if (cur == &top_cpuset)
 		goto out;
 
-	par = cur->parent;
+	par = parent_cs(cur);
 
 	/* We must be a subset of our parent cpuset */
 	ret = -EACCES;
@@ -1860,7 +1867,6 @@ static struct cgroup_subsys_state *cpuset_css_alloc(struct cgroup *cont)
 	fmeter_init(&cs->fmeter);
 	INIT_WORK(&cs->hotplug_work, cpuset_propagate_hotplug_workfn);
 	cs->relax_domain_level = -1;
-	cs->parent = cgroup_cs(cont->parent);
 
 	return &cs->css;
 }
@@ -1868,7 +1874,7 @@ static struct cgroup_subsys_state *cpuset_css_alloc(struct cgroup *cont)
 static int cpuset_css_online(struct cgroup *cgrp)
 {
 	struct cpuset *cs = cgroup_cs(cgrp);
-	struct cpuset *parent = cs->parent;
+	struct cpuset *parent = parent_cs(cs);
 	struct cpuset *tmp_cs;
 	struct cgroup *pos_cg;
 
@@ -2049,10 +2055,10 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
 	 * Find its next-highest non-empty parent, (top cpuset
 	 * has online cpus, so can't be empty).
 	 */
-	parent = cs->parent;
+	parent = parent_cs(cs);
 	while (cpumask_empty(parent->cpus_allowed) ||
 			nodes_empty(parent->mems_allowed))
-		parent = parent->parent;
+		parent = parent_cs(parent);
 
 	move_member_tasks_to_cpuset(cs, parent);
 }
@@ -2353,8 +2359,8 @@ int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
  */
 static const struct cpuset *nearest_hardwall_ancestor(const struct cpuset *cs)
 {
-	while (!(is_mem_exclusive(cs) || is_mem_hardwall(cs)) && cs->parent)
-		cs = cs->parent;
+	while (!(is_mem_exclusive(cs) || is_mem_hardwall(cs)) && parent_cs(cs))
+		cs = parent_cs(cs);
 	return cs;
 }
 
-- 
1.7.11.7



* Re: [PATCHSET cgroup/for-3.8] cpuset: drop cpuset->stack_list and ->parent
  2012-11-28 22:26 [PATCHSET cgroup/for-3.8] cpuset: drop cpuset->stack_list and ->parent Tejun Heo
                   ` (2 preceding siblings ...)
  2012-11-28 22:27 ` [PATCH 3/3] cpuset: remove cpuset->parent Tejun Heo
@ 2012-12-03 15:55 ` Michal Hocko
  3 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2012-12-03 15:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, paul, glommer, containers, cgroups, peterz, linux-kernel

On Wed 28-11-12 14:26:58, Tejun Heo wrote:
[...]
> 0001 implements cgroup_rightmost_descendant() which can be used to
> skip subtree during pre-order tree walk.  Michal, maybe memcg can use
> it too?

I like this, but I do not see any immediate use for it because we do
not currently have any generic condition under which a whole subtree
can be skipped.
Maybe it will turn out to be useful for soft limit reclaim, but I
haven't thought about it further.
[...]
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/3] cpuset: implement cgroup_rightmost_descendant()
  2012-11-28 22:26 ` [PATCH 1/3] cpuset: implement cgroup_rightmost_descendant() Tejun Heo
@ 2012-12-03 16:03   ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2012-12-03 16:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, paul, glommer, containers, cgroups, peterz, linux-kernel

On Wed 28-11-12 14:26:59, Tejun Heo wrote:
> Implement cgroup_rightmost_descendant() which returns the right most
> descendant of the specified cgroup.  This can be used to skip the
> cgroup's subtree while iterating with
> cgroup_for_each_descendant_pre().
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Michal Hocko <mhocko@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.cz>

Just a nit below

[...]
> +/**
> + * cgroup_rightmost_descendant - return the rightmost descendant of a cgroup
> + * @cgrp: cgroup of interest
> + *
> + * Return the rightmost descendant of @cgrp.  If there's no descendant,
> + * @cgrp is returned.  This can be used during pre-order traversal to skip
> + * subtree of @cgrp.
> + */

s/cgrp/pos/

> +struct cgroup *cgroup_rightmost_descendant(struct cgroup *pos)
[...]

Thanks
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2/3] cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()
  2012-11-28 22:27 ` [PATCH 2/3] cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre() Tejun Heo
@ 2012-12-03 16:18   ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2012-12-03 16:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, paul, glommer, containers, cgroups, peterz, linux-kernel

On Wed 28-11-12 14:27:00, Tejun Heo wrote:
> Implement cpuset_for_each_descendant_pre() and replace the
> cpuset-specific tree walking using cpuset->stack_list with it.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

[...]

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 3/3] cpuset: remove cpuset->parent
  2012-11-28 22:27 ` [PATCH 3/3] cpuset: remove cpuset->parent Tejun Heo
@ 2012-12-03 16:20   ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2012-12-03 16:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, paul, glommer, containers, cgroups, peterz, linux-kernel

On Wed 28-11-12 14:27:01, Tejun Heo wrote:
> cgroup already tracks the hierarchy.  Follow cgroup->parent to find
> the parent and drop cpuset->parent.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Yes, makes total sense.
Reviewed-by: Michal Hocko <mhocko@suse.cz>

[...]

-- 
Michal Hocko
SUSE Labs

