All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] cpuset: separate configured masks and effective masks
@ 2013-08-21  9:58 ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

This patcheset introduces behavior changes, but only if you mount cgroupfs
with sane_behavior option:

- We introduce new interfaces cpuset.effective_cpus and cpuset.effective_mems,
  while cpuset.cpus and cpuset.mems will be configured masks.

- The configured masks can be changed by writing cpuset.cpus/mems only. They
  won't be changed when hotplug happens.

- Users can config cpus and mems without restrictions from the parent cpuset.
  effective masks will enforce the hierarchical behavior.

- Users can also config cpus and mems to have already offlined CPU/nodes.

- When a CPU/node is onlined, it will be brought back to the effective masks
  if it's in the configured masks.

- We build sched domains based on effective cpumask but not configured cpumask.

Li Zefan (11):
  cgroup: allow subsystems to create files for sane_behavior only
  cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed
  cpuset: update cpuset->real_{cpus,mems}_allowed at hotplug
  cpuset: update cs->real_{cpus,mems}_allowed when config changes
  cpuset: inherite ancestor's masks if real_{cpus,mems}_allowed become empty
  cpuset: apply cs->real_{cpus,mems}_allowed
  cpuset: use effective cpumask to build sched domains
  cpuset: separate configured masks and efffective masks
  cpuset: enable onlined cpu/node in effective masks
  cpuset: allow writing offlined masks to cpuset.cpus/mems
  cpuset: export effective masks to userspace

 include/linux/cgroup.h |   1 +
 kernel/cgroup.c        |   2 +
 kernel/cpuset.c        | 466 ++++++++++++++++++++++++++++---------------------
 3 files changed, 271 insertions(+), 198 deletions(-)

-- 
1.8.0.2


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 00/11] cpuset: separate configured masks and effective masks
@ 2013-08-21  9:58 ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

This patcheset introduces behavior changes, but only if you mount cgroupfs
with sane_behavior option:

- We introduce new interfaces cpuset.effective_cpus and cpuset.effective_mems,
  while cpuset.cpus and cpuset.mems will be configured masks.

- The configured masks can be changed by writing cpuset.cpus/mems only. They
  won't be changed when hotplug happens.

- Users can config cpus and mems without restrictions from the parent cpuset.
  effective masks will enforce the hierarchical behavior.

- Users can also config cpus and mems to have already offlined CPU/nodes.

- When a CPU/node is onlined, it will be brought back to the effective masks
  if it's in the configured masks.

- We build sched domains based on effective cpumask but not configured cpumask.

Li Zefan (11):
  cgroup: allow subsystems to create files for sane_behavior only
  cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed
  cpuset: update cpuset->real_{cpus,mems}_allowed at hotplug
  cpuset: update cs->real_{cpus,mems}_allowed when config changes
  cpuset: inherite ancestor's masks if real_{cpus,mems}_allowed become empty
  cpuset: apply cs->real_{cpus,mems}_allowed
  cpuset: use effective cpumask to build sched domains
  cpuset: separate configured masks and efffective masks
  cpuset: enable onlined cpu/node in effective masks
  cpuset: allow writing offlined masks to cpuset.cpus/mems
  cpuset: export effective masks to userspace

 include/linux/cgroup.h |   1 +
 kernel/cgroup.c        |   2 +
 kernel/cpuset.c        | 466 ++++++++++++++++++++++++++++---------------------
 3 files changed, 271 insertions(+), 198 deletions(-)

-- 
1.8.0.2

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 01/11] cgroup: allow subsystems to create files for sane_behavior only
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21  9:58   ` Li Zefan
  2013-08-21  9:59   ` [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed Li Zefan
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

So we can add new cgroupfs interfaces for sane_behavior only.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 include/linux/cgroup.h | 1 +
 kernel/cgroup.c        | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 3aac34d..7ba7764 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -411,6 +411,7 @@ enum {
 	CFTYPE_ONLY_ON_ROOT	= (1 << 0),	/* only create on root cgrp */
 	CFTYPE_NOT_ON_ROOT	= (1 << 1),	/* don't create on root cgrp */
 	CFTYPE_INSANE		= (1 << 2),	/* don't create if sane_behavior */
+	CFTYPE_SANE		= (1 << 3),	/* only create if sane_behavior */
 };
 
 #define MAX_CFTYPE_NAME		64
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 737d752..6c770ee 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2809,6 +2809,8 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
 		/* does cft->flags tell us to skip this file on @cgrp? */
 		if ((cft->flags & CFTYPE_INSANE) && cgroup_sane_behavior(cgrp))
 			continue;
+		if ((cft->flags & CFTYPE_SANE) && !cgroup_sane_behavior(cgrp))
+			continue;
 		if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
 			continue;
 		if ((cft->flags & CFTYPE_ONLY_ON_ROOT) && cgrp->parent)
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 01/11] cgroup: allow subsystems to create files for sane_behavior only
  2013-08-21  9:58 ` Li Zefan
  (?)
@ 2013-08-21  9:58 ` Li Zefan
  -1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

So we can add new cgroupfs interfaces for sane_behavior only.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 include/linux/cgroup.h | 1 +
 kernel/cgroup.c        | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 3aac34d..7ba7764 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -411,6 +411,7 @@ enum {
 	CFTYPE_ONLY_ON_ROOT	= (1 << 0),	/* only create on root cgrp */
 	CFTYPE_NOT_ON_ROOT	= (1 << 1),	/* don't create on root cgrp */
 	CFTYPE_INSANE		= (1 << 2),	/* don't create if sane_behavior */
+	CFTYPE_SANE		= (1 << 3),	/* only create if sane_behavior */
 };
 
 #define MAX_CFTYPE_NAME		64
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 737d752..6c770ee 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2809,6 +2809,8 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
 		/* does cft->flags tell us to skip this file on @cgrp? */
 		if ((cft->flags & CFTYPE_INSANE) && cgroup_sane_behavior(cgrp))
 			continue;
+		if ((cft->flags & CFTYPE_SANE) && !cgroup_sane_behavior(cgrp))
+			continue;
 		if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
 			continue;
 		if ((cft->flags & CFTYPE_ONLY_ON_ROOT) && cgrp->parent)
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2013-08-21  9:58   ` [PATCH 01/11] cgroup: allow subsystems to create files for sane_behavior only Li Zefan
@ 2013-08-21  9:59   ` Li Zefan
  2013-08-21  9:59   ` [PATCH 03/11] cpuset: update cpuset->real_{cpus,mems}_allowed at hotplug Li Zefan
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

We're going to have separate user-configured masks and effective ones.

At last configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction.

This patch adds and initializes the effective masks. The effective
masks of the top cpuset is the same with configured masks, and a child
cpuset inherites its parent's effective masks.

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 11 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 70ab3fd..404fea5 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -81,8 +81,14 @@ struct cpuset {
 	struct cgroup_subsys_state css;
 
 	unsigned long flags;		/* "unsigned long" so bitops work */
-	cpumask_var_t cpus_allowed;	/* CPUs allowed to tasks in cpuset */
-	nodemask_t mems_allowed;	/* Memory Nodes allowed to tasks */
+
+	/* user-configured CPUs and Memory Nodes allow to tasks */
+	cpumask_var_t cpus_allowed;
+	nodemask_t mems_allowed;
+
+	/* effective CPUs and Memory Nodes allow to tasks */
+	cpumask_var_t real_cpus_allowed;
+	nodemask_t real_mems_allowed;
 
 	/*
 	 * This is old Memory Nodes tasks took on.
@@ -381,13 +387,20 @@ static struct cpuset *alloc_trial_cpuset(struct cpuset *cs)
 	if (!trial)
 		return NULL;
 
-	if (!alloc_cpumask_var(&trial->cpus_allowed, GFP_KERNEL)) {
-		kfree(trial);
-		return NULL;
-	}
-	cpumask_copy(trial->cpus_allowed, cs->cpus_allowed);
+	if (!alloc_cpumask_var(&trial->cpus_allowed, GFP_KERNEL))
+		goto free_cs;
+	if (!alloc_cpumask_var(&trial->real_cpus_allowed, GFP_KERNEL))
+		goto free_cpus;
 
+	cpumask_copy(trial->cpus_allowed, cs->cpus_allowed);
+	cpumask_copy(trial->real_cpus_allowed, cs->real_cpus_allowed);
 	return trial;
+
+free_cpus:
+	free_cpumask_var(trial->cpus_allowed);
+free_cs:
+	kfree(trial);
+	return NULL;
 }
 
 /**
@@ -396,6 +409,7 @@ static struct cpuset *alloc_trial_cpuset(struct cpuset *cs)
  */
 static void free_trial_cpuset(struct cpuset *trial)
 {
+	free_cpumask_var(trial->real_cpus_allowed);
 	free_cpumask_var(trial->cpus_allowed);
 	kfree(trial);
 }
@@ -1949,18 +1963,26 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
 	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
 	if (!cs)
 		return ERR_PTR(-ENOMEM);
-	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL)) {
-		kfree(cs);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL))
+		goto free_cs;
+	if (!alloc_cpumask_var(&cs->real_cpus_allowed, GFP_KERNEL))
+		goto free_cpus;
 
 	set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
 	cpumask_clear(cs->cpus_allowed);
 	nodes_clear(cs->mems_allowed);
+	cpumask_clear(cs->real_cpus_allowed);
+	nodes_clear(cs->real_mems_allowed);
 	fmeter_init(&cs->fmeter);
 	cs->relax_domain_level = -1;
 
 	return &cs->css;
+
+free_cpus:
+	free_cpumask_var(cs->cpus_allowed);
+free_cs:
+	kfree(cs);
+	return ERR_PTR(-ENOMEM);
 }
 
 static int cpuset_css_online(struct cgroup_subsys_state *css)
@@ -1983,6 +2005,11 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 
 	number_of_cpusets++;
 
+	mutex_lock(&callback_mutex);
+	cpumask_copy(cs->real_cpus_allowed, parent->real_cpus_allowed);
+	cs->real_mems_allowed = parent->real_mems_allowed;
+	mutex_unlock(&callback_mutex);
+
 	if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags))
 		goto out_unlock;
 
@@ -2042,6 +2069,7 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
 {
 	struct cpuset *cs = css_cs(css);
 
+	free_cpumask_var(cs->real_cpus_allowed);
 	free_cpumask_var(cs->cpus_allowed);
 	kfree(cs);
 }
@@ -2072,9 +2100,13 @@ int __init cpuset_init(void)
 
 	if (!alloc_cpumask_var(&top_cpuset.cpus_allowed, GFP_KERNEL))
 		BUG();
+	if (!alloc_cpumask_var(&top_cpuset.real_cpus_allowed, GFP_KERNEL))
+		BUG();
 
 	cpumask_setall(top_cpuset.cpus_allowed);
 	nodes_setall(top_cpuset.mems_allowed);
+	cpumask_setall(top_cpuset.real_cpus_allowed);
+	nodes_setall(top_cpuset.real_mems_allowed);
 
 	fmeter_init(&top_cpuset.fmeter);
 	set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags);
@@ -2312,6 +2344,9 @@ void __init cpuset_init_smp(void)
 	top_cpuset.mems_allowed = node_states[N_MEMORY];
 	top_cpuset.old_mems_allowed = top_cpuset.mems_allowed;
 
+	cpumask_copy(top_cpuset.real_cpus_allowed, cpu_active_mask);
+	top_cpuset.real_mems_allowed = node_states[N_MEMORY];
+
 	register_hotmemory_notifier(&cpuset_track_online_nodes_nb);
 }
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21  9:59   ` Li Zefan
  2013-08-21  9:59   ` [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed Li Zefan
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

We're going to have separate user-configured masks and effective ones.

At last configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction.

This patch adds and initializes the effective masks. The effective
masks of the top cpuset is the same with configured masks, and a child
cpuset inherites its parent's effective masks.

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cpuset.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 11 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 70ab3fd..404fea5 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -81,8 +81,14 @@ struct cpuset {
 	struct cgroup_subsys_state css;
 
 	unsigned long flags;		/* "unsigned long" so bitops work */
-	cpumask_var_t cpus_allowed;	/* CPUs allowed to tasks in cpuset */
-	nodemask_t mems_allowed;	/* Memory Nodes allowed to tasks */
+
+	/* user-configured CPUs and Memory Nodes allow to tasks */
+	cpumask_var_t cpus_allowed;
+	nodemask_t mems_allowed;
+
+	/* effective CPUs and Memory Nodes allow to tasks */
+	cpumask_var_t real_cpus_allowed;
+	nodemask_t real_mems_allowed;
 
 	/*
 	 * This is old Memory Nodes tasks took on.
@@ -381,13 +387,20 @@ static struct cpuset *alloc_trial_cpuset(struct cpuset *cs)
 	if (!trial)
 		return NULL;
 
-	if (!alloc_cpumask_var(&trial->cpus_allowed, GFP_KERNEL)) {
-		kfree(trial);
-		return NULL;
-	}
-	cpumask_copy(trial->cpus_allowed, cs->cpus_allowed);
+	if (!alloc_cpumask_var(&trial->cpus_allowed, GFP_KERNEL))
+		goto free_cs;
+	if (!alloc_cpumask_var(&trial->real_cpus_allowed, GFP_KERNEL))
+		goto free_cpus;
 
+	cpumask_copy(trial->cpus_allowed, cs->cpus_allowed);
+	cpumask_copy(trial->real_cpus_allowed, cs->real_cpus_allowed);
 	return trial;
+
+free_cpus:
+	free_cpumask_var(trial->cpus_allowed);
+free_cs:
+	kfree(trial);
+	return NULL;
 }
 
 /**
@@ -396,6 +409,7 @@ static struct cpuset *alloc_trial_cpuset(struct cpuset *cs)
  */
 static void free_trial_cpuset(struct cpuset *trial)
 {
+	free_cpumask_var(trial->real_cpus_allowed);
 	free_cpumask_var(trial->cpus_allowed);
 	kfree(trial);
 }
@@ -1949,18 +1963,26 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
 	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
 	if (!cs)
 		return ERR_PTR(-ENOMEM);
-	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL)) {
-		kfree(cs);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL))
+		goto free_cs;
+	if (!alloc_cpumask_var(&cs->real_cpus_allowed, GFP_KERNEL))
+		goto free_cpus;
 
 	set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
 	cpumask_clear(cs->cpus_allowed);
 	nodes_clear(cs->mems_allowed);
+	cpumask_clear(cs->real_cpus_allowed);
+	nodes_clear(cs->real_mems_allowed);
 	fmeter_init(&cs->fmeter);
 	cs->relax_domain_level = -1;
 
 	return &cs->css;
+
+free_cpus:
+	free_cpumask_var(cs->cpus_allowed);
+free_cs:
+	kfree(cs);
+	return ERR_PTR(-ENOMEM);
 }
 
 static int cpuset_css_online(struct cgroup_subsys_state *css)
@@ -1983,6 +2005,11 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 
 	number_of_cpusets++;
 
+	mutex_lock(&callback_mutex);
+	cpumask_copy(cs->real_cpus_allowed, parent->real_cpus_allowed);
+	cs->real_mems_allowed = parent->real_mems_allowed;
+	mutex_unlock(&callback_mutex);
+
 	if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags))
 		goto out_unlock;
 
@@ -2042,6 +2069,7 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
 {
 	struct cpuset *cs = css_cs(css);
 
+	free_cpumask_var(cs->real_cpus_allowed);
 	free_cpumask_var(cs->cpus_allowed);
 	kfree(cs);
 }
@@ -2072,9 +2100,13 @@ int __init cpuset_init(void)
 
 	if (!alloc_cpumask_var(&top_cpuset.cpus_allowed, GFP_KERNEL))
 		BUG();
+	if (!alloc_cpumask_var(&top_cpuset.real_cpus_allowed, GFP_KERNEL))
+		BUG();
 
 	cpumask_setall(top_cpuset.cpus_allowed);
 	nodes_setall(top_cpuset.mems_allowed);
+	cpumask_setall(top_cpuset.real_cpus_allowed);
+	nodes_setall(top_cpuset.real_mems_allowed);
 
 	fmeter_init(&top_cpuset.fmeter);
 	set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags);
@@ -2312,6 +2344,9 @@ void __init cpuset_init_smp(void)
 	top_cpuset.mems_allowed = node_states[N_MEMORY];
 	top_cpuset.old_mems_allowed = top_cpuset.mems_allowed;
 
+	cpumask_copy(top_cpuset.real_cpus_allowed, cpu_active_mask);
+	top_cpuset.real_mems_allowed = node_states[N_MEMORY];
+
 	register_hotmemory_notifier(&cpuset_track_online_nodes_nb);
 }
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed
@ 2013-08-21  9:59   ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

We're going to have separate user-configured masks and effective ones.

At last configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction.

This patch adds and initializes the effective masks. The effective
masks of the top cpuset is the same with configured masks, and a child
cpuset inherites its parent's effective masks.

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 11 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 70ab3fd..404fea5 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -81,8 +81,14 @@ struct cpuset {
 	struct cgroup_subsys_state css;
 
 	unsigned long flags;		/* "unsigned long" so bitops work */
-	cpumask_var_t cpus_allowed;	/* CPUs allowed to tasks in cpuset */
-	nodemask_t mems_allowed;	/* Memory Nodes allowed to tasks */
+
+	/* user-configured CPUs and Memory Nodes allow to tasks */
+	cpumask_var_t cpus_allowed;
+	nodemask_t mems_allowed;
+
+	/* effective CPUs and Memory Nodes allow to tasks */
+	cpumask_var_t real_cpus_allowed;
+	nodemask_t real_mems_allowed;
 
 	/*
 	 * This is old Memory Nodes tasks took on.
@@ -381,13 +387,20 @@ static struct cpuset *alloc_trial_cpuset(struct cpuset *cs)
 	if (!trial)
 		return NULL;
 
-	if (!alloc_cpumask_var(&trial->cpus_allowed, GFP_KERNEL)) {
-		kfree(trial);
-		return NULL;
-	}
-	cpumask_copy(trial->cpus_allowed, cs->cpus_allowed);
+	if (!alloc_cpumask_var(&trial->cpus_allowed, GFP_KERNEL))
+		goto free_cs;
+	if (!alloc_cpumask_var(&trial->real_cpus_allowed, GFP_KERNEL))
+		goto free_cpus;
 
+	cpumask_copy(trial->cpus_allowed, cs->cpus_allowed);
+	cpumask_copy(trial->real_cpus_allowed, cs->real_cpus_allowed);
 	return trial;
+
+free_cpus:
+	free_cpumask_var(trial->cpus_allowed);
+free_cs:
+	kfree(trial);
+	return NULL;
 }
 
 /**
@@ -396,6 +409,7 @@ static struct cpuset *alloc_trial_cpuset(struct cpuset *cs)
  */
 static void free_trial_cpuset(struct cpuset *trial)
 {
+	free_cpumask_var(trial->real_cpus_allowed);
 	free_cpumask_var(trial->cpus_allowed);
 	kfree(trial);
 }
@@ -1949,18 +1963,26 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
 	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
 	if (!cs)
 		return ERR_PTR(-ENOMEM);
-	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL)) {
-		kfree(cs);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL))
+		goto free_cs;
+	if (!alloc_cpumask_var(&cs->real_cpus_allowed, GFP_KERNEL))
+		goto free_cpus;
 
 	set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
 	cpumask_clear(cs->cpus_allowed);
 	nodes_clear(cs->mems_allowed);
+	cpumask_clear(cs->real_cpus_allowed);
+	nodes_clear(cs->real_mems_allowed);
 	fmeter_init(&cs->fmeter);
 	cs->relax_domain_level = -1;
 
 	return &cs->css;
+
+free_cpus:
+	free_cpumask_var(cs->cpus_allowed);
+free_cs:
+	kfree(cs);
+	return ERR_PTR(-ENOMEM);
 }
 
 static int cpuset_css_online(struct cgroup_subsys_state *css)
@@ -1983,6 +2005,11 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 
 	number_of_cpusets++;
 
+	mutex_lock(&callback_mutex);
+	cpumask_copy(cs->real_cpus_allowed, parent->real_cpus_allowed);
+	cs->real_mems_allowed = parent->real_mems_allowed;
+	mutex_unlock(&callback_mutex);
+
 	if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags))
 		goto out_unlock;
 
@@ -2042,6 +2069,7 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
 {
 	struct cpuset *cs = css_cs(css);
 
+	free_cpumask_var(cs->real_cpus_allowed);
 	free_cpumask_var(cs->cpus_allowed);
 	kfree(cs);
 }
@@ -2072,9 +2100,13 @@ int __init cpuset_init(void)
 
 	if (!alloc_cpumask_var(&top_cpuset.cpus_allowed, GFP_KERNEL))
 		BUG();
+	if (!alloc_cpumask_var(&top_cpuset.real_cpus_allowed, GFP_KERNEL))
+		BUG();
 
 	cpumask_setall(top_cpuset.cpus_allowed);
 	nodes_setall(top_cpuset.mems_allowed);
+	cpumask_setall(top_cpuset.real_cpus_allowed);
+	nodes_setall(top_cpuset.real_mems_allowed);
 
 	fmeter_init(&top_cpuset.fmeter);
 	set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags);
@@ -2312,6 +2344,9 @@ void __init cpuset_init_smp(void)
 	top_cpuset.mems_allowed = node_states[N_MEMORY];
 	top_cpuset.old_mems_allowed = top_cpuset.mems_allowed;
 
+	cpumask_copy(top_cpuset.real_cpus_allowed, cpu_active_mask);
+	top_cpuset.real_mems_allowed = node_states[N_MEMORY];
+
 	register_hotmemory_notifier(&cpuset_track_online_nodes_nb);
 }
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 03/11] cpuset: update cpuset->real_{cpus,mems}_allowed at hotplug
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2013-08-21  9:58   ` [PATCH 01/11] cgroup: allow subsystems to create files for sane_behavior only Li Zefan
  2013-08-21  9:59   ` [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed Li Zefan
@ 2013-08-21  9:59   ` Li Zefan
  2013-08-21  9:59   ` [PATCH 04/11] cpuset: update cs->real_{cpus, mems}_allowed when config changes Li Zefan
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

We're going to have separate user-configured masks and effective ones.

At last configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction.

This is a preparation to make real_{cpus,mems}_allowed to be effective
masks of the cpuset:

- change the effective masks at hotplug: done
- change the effective masks at config change: todo
- take on ancestor's mask when the effective mask is empty: todo

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 404fea5..ab89c1e 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2185,6 +2185,8 @@ retry:
 
 	mutex_lock(&callback_mutex);
 	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
+	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed,
+		       &off_cpus);
 	mutex_unlock(&callback_mutex);
 
 	/*
@@ -2199,6 +2201,7 @@ retry:
 
 	mutex_lock(&callback_mutex);
 	nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
+	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, off_mems);
 	mutex_unlock(&callback_mutex);
 
 	/*
@@ -2262,6 +2265,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	if (cpus_updated) {
 		mutex_lock(&callback_mutex);
 		cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
+		cpumask_copy(top_cpuset.real_cpus_allowed, &new_cpus);
 		mutex_unlock(&callback_mutex);
 		/* we don't mess with cpumasks of tasks in top_cpuset */
 	}
@@ -2270,6 +2274,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	if (mems_updated) {
 		mutex_lock(&callback_mutex);
 		top_cpuset.mems_allowed = new_mems;
+		top_cpuset.real_mems_allowed = new_mems;
 		mutex_unlock(&callback_mutex);
 		update_tasks_nodemask(&top_cpuset, NULL);
 	}
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 03/11] cpuset: update cpuset->real_{cpus,mems}_allowed at hotplug
  2013-08-21  9:58 ` Li Zefan
                   ` (2 preceding siblings ...)
  (?)
@ 2013-08-21  9:59 ` Li Zefan
  -1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

We're going to have separate user-configured masks and effective ones.

At last configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction.

This is a preparation to make real_{cpus,mems}_allowed to be effective
masks of the cpuset:

- change the effective masks at hotplug: done
- change the effective masks at config change: todo
- take on ancestor's mask when the effective mask is empty: todo

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cpuset.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 404fea5..ab89c1e 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2185,6 +2185,8 @@ retry:
 
 	mutex_lock(&callback_mutex);
 	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
+	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed,
+		       &off_cpus);
 	mutex_unlock(&callback_mutex);
 
 	/*
@@ -2199,6 +2201,7 @@ retry:
 
 	mutex_lock(&callback_mutex);
 	nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
+	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, off_mems);
 	mutex_unlock(&callback_mutex);
 
 	/*
@@ -2262,6 +2265,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	if (cpus_updated) {
 		mutex_lock(&callback_mutex);
 		cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
+		cpumask_copy(top_cpuset.real_cpus_allowed, &new_cpus);
 		mutex_unlock(&callback_mutex);
 		/* we don't mess with cpumasks of tasks in top_cpuset */
 	}
@@ -2270,6 +2274,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	if (mems_updated) {
 		mutex_lock(&callback_mutex);
 		top_cpuset.mems_allowed = new_mems;
+		top_cpuset.real_mems_allowed = new_mems;
 		mutex_unlock(&callback_mutex);
 		update_tasks_nodemask(&top_cpuset, NULL);
 	}
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 04/11] cpuset: update cs->real_{cpus, mems}_allowed when config changes
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2013-08-21  9:59   ` [PATCH 03/11] cpuset: update cpuset->real_{cpus,mems}_allowed at hotplug Li Zefan
@ 2013-08-21  9:59   ` Li Zefan
  2013-08-21  9:59     ` [PATCH 05/11] cpuset: inherite ancestor's masks if real_{cpus,mems}_allowed " Li Zefan
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

We're going to have separate user-configured masks and effective ones.

At last configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction.

This is a preparation to make real_{cpus,mems}_allowed to be effective
masks of the cpuset:

- change the effective masks at hotplug: done
- change the effective masks at config change: done
- take on ancestor's mask when the effective mask is empty: todo

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 115 ++++++++++++++++++++++++++++++++------------------------
 1 file changed, 66 insertions(+), 49 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index ab89c1e..72afef4 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -880,39 +880,49 @@ static void update_tasks_cpumask(struct cpuset *cs, struct ptr_heap *heap)
 	css_scan_tasks(&cs->css, NULL, cpuset_change_cpumask, cs, heap);
 }
 
-/*
- * update_tasks_cpumask_hier - Update the cpumasks of tasks in the hierarchy.
- * @root_cs: the root cpuset of the hierarchy
- * @update_root: update root cpuset or not?
+/**
+ * update_cpumasks_hier - Update effective cpumasks and tasks in the subtree
+ * @cs: the cpuset to consider
+ * @trialcs: the trial cpuset
  * @heap: the heap used by css_scan_tasks()
  *
- * This will update cpumasks of tasks in @root_cs and all other empty cpusets
- * which take on cpumask of @root_cs.
- *
- * Called with cpuset_mutex held
+ * When configured cpumask is changed, the effective cpumasks of this cpuset
+ * and all its descendants need to be updated.
  */
-static void update_tasks_cpumask_hier(struct cpuset *root_cs,
-				      bool update_root, struct ptr_heap *heap)
+static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
+				 struct ptr_heap *heap)
 {
-	struct cpuset *cp;
 	struct cgroup_subsys_state *pos_css;
+	struct cpuset *cp;
 
 	rcu_read_lock();
-	cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
-		if (cp == root_cs) {
-			if (!update_root)
-				continue;
-		} else {
-			/* skip the whole subtree if @cp have some CPU */
-			if (!cpumask_empty(cp->cpus_allowed)) {
-				pos_css = css_rightmost_descendant(pos_css);
-				continue;
-			}
+	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
+		struct cpuset *parent = parent_cs(cs);
+		struct cpumask *new_cpus = trialcs->real_cpus_allowed;
+
+		cpumask_and(new_cpus, cp->cpus_allowed,
+			    parent->real_cpus_allowed);
+
+		/*
+		 * Skip the whole subtree if the cpumask is not changed, unless
+		 * it's empty, and in this case we need to update tasks to take
+		 * on an ancestor's cpumask.
+		 */
+		if (cpumask_equal(new_cpus, cp->real_cpus_allowed) &&
+		    ((cp == cs) || !cpumask_empty(new_cpus))) {
+			pos_css = css_rightmost_descendant(pos_css);
+			continue;
 		}
+
 		if (!css_tryget(&cp->css))
 			continue;
+
 		rcu_read_unlock();
 
+		mutex_lock(&callback_mutex);
+		cpumask_copy(cp->real_cpus_allowed, new_cpus);
+		mutex_unlock(&callback_mutex);
+
 		update_tasks_cpumask(cp, heap);
 
 		rcu_read_lock();
@@ -931,7 +941,6 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 {
 	struct ptr_heap heap;
 	int retval;
-	int is_load_balanced;
 
 	/* top_cpuset.cpus_allowed tracks cpu_online_mask; it's read-only */
 	if (cs == &top_cpuset)
@@ -966,17 +975,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (retval)
 		return retval;
 
-	is_load_balanced = is_sched_load_balance(trialcs);
-
 	mutex_lock(&callback_mutex);
 	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
 	mutex_unlock(&callback_mutex);
 
-	update_tasks_cpumask_hier(cs, true, &heap);
+	update_cpumasks_hier(cs, trialcs, &heap);
 
 	heap_free(&heap);
 
-	if (is_load_balanced)
+	if (is_sched_load_balance(cs))
 		rebuild_sched_domains_locked();
 	return 0;
 }
@@ -1137,40 +1144,50 @@ static void update_tasks_nodemask(struct cpuset *cs, struct ptr_heap *heap)
 	cpuset_being_rebound = NULL;
 }
 
-/*
- * update_tasks_nodemask_hier - Update the nodemasks of tasks in the hierarchy.
- * @cs: the root cpuset of the hierarchy
- * @update_root: update the root cpuset or not?
+/**
+ * update_nodesmasks_hier - Update effective nodemasks and tasks in the subtree
+ * @cs: the cpuset to consider
+ * @trialcs: the trial cpuset
  * @heap: the heap used by css_scan_tasks()
  *
- * This will update nodemasks of tasks in @root_cs and all other empty cpusets
- * which take on nodemask of @root_cs.
- *
- * Called with cpuset_mutex held
+ * When configured nodemask is changed, the effective nodemasks of this cpuset
+ * and all its descendants need to be updated.
  */
-static void update_tasks_nodemask_hier(struct cpuset *root_cs,
-				       bool update_root, struct ptr_heap *heap)
+static void update_nodemasks_hier(struct cpuset *cs, struct cpuset *trialcs,
+				 struct ptr_heap *heap)
 {
-	struct cpuset *cp;
 	struct cgroup_subsys_state *pos_css;
+	struct cpuset *cp;
 
 	rcu_read_lock();
-	cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
-		if (cp == root_cs) {
-			if (!update_root)
-				continue;
-		} else {
-			/* skip the whole subtree if @cp have some CPU */
-			if (!nodes_empty(cp->mems_allowed)) {
-				pos_css = css_rightmost_descendant(pos_css);
-				continue;
-			}
+	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
+		struct cpuset *parent = parent_cs(cs);
+		nodemask_t *new_mems = &trialcs->real_mems_allowed;
+
+		nodes_and(*new_mems, cp->mems_allowed,
+			  parent->real_mems_allowed);
+
+		/*
+		 * Skip the whole subtree if the nodemask is not changed, unless
+		 * it's empty, and in this case we need to update tasks to take
+		 * on an ancestor's nodemask.
+		 */
+		if (nodes_equal(*new_mems, cp->real_mems_allowed) &&
+		    ((cp == cs) || !nodes_empty(*new_mems))) {
+			pos_css = css_rightmost_descendant(pos_css);
+			continue;
 		}
+
 		if (!css_tryget(&cp->css))
 			continue;
+
 		rcu_read_unlock();
 
-		update_tasks_nodemask(cp, heap);
+		mutex_lock(&callback_mutex);
+		cp->real_mems_allowed = *new_mems;
+		mutex_unlock(&callback_mutex);
+
+		update_tasks_cpumask(cs, heap);
 
 		rcu_read_lock();
 		css_put(&cp->css);
@@ -1242,7 +1259,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 	cs->mems_allowed = trialcs->mems_allowed;
 	mutex_unlock(&callback_mutex);
 
-	update_tasks_nodemask_hier(cs, true, &heap);
+	update_nodemasks_hier(cs, trialcs, &heap);
 
 	heap_free(&heap);
 done:
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 04/11] cpuset: update cs->real_{cpus,mems}_allowed when config changes
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21  9:59   ` Li Zefan
  2013-08-21  9:59   ` [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed Li Zefan
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

We're going to have separate user-configured masks and effective ones.

At last configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction.

This is a preparation to make real_{cpus,mems}_allowed to be effective
masks of the cpuset:

- change the effective masks at hotplug: done
- change the effective masks at config change: done
- take on ancestor's mask when the effective mask is empty: todo

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cpuset.c | 115 ++++++++++++++++++++++++++++++++------------------------
 1 file changed, 66 insertions(+), 49 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index ab89c1e..72afef4 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -880,39 +880,49 @@ static void update_tasks_cpumask(struct cpuset *cs, struct ptr_heap *heap)
 	css_scan_tasks(&cs->css, NULL, cpuset_change_cpumask, cs, heap);
 }
 
-/*
- * update_tasks_cpumask_hier - Update the cpumasks of tasks in the hierarchy.
- * @root_cs: the root cpuset of the hierarchy
- * @update_root: update root cpuset or not?
+/**
+ * update_cpumasks_hier - Update effective cpumasks and tasks in the subtree
+ * @cs: the cpuset to consider
+ * @trialcs: the trial cpuset
  * @heap: the heap used by css_scan_tasks()
  *
- * This will update cpumasks of tasks in @root_cs and all other empty cpusets
- * which take on cpumask of @root_cs.
- *
- * Called with cpuset_mutex held
+ * When configured cpumask is changed, the effective cpumasks of this cpuset
+ * and all its descendants need to be updated.
  */
-static void update_tasks_cpumask_hier(struct cpuset *root_cs,
-				      bool update_root, struct ptr_heap *heap)
+static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
+				 struct ptr_heap *heap)
 {
-	struct cpuset *cp;
 	struct cgroup_subsys_state *pos_css;
+	struct cpuset *cp;
 
 	rcu_read_lock();
-	cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
-		if (cp == root_cs) {
-			if (!update_root)
-				continue;
-		} else {
-			/* skip the whole subtree if @cp have some CPU */
-			if (!cpumask_empty(cp->cpus_allowed)) {
-				pos_css = css_rightmost_descendant(pos_css);
-				continue;
-			}
+	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
+		struct cpuset *parent = parent_cs(cs);
+		struct cpumask *new_cpus = trialcs->real_cpus_allowed;
+
+		cpumask_and(new_cpus, cp->cpus_allowed,
+			    parent->real_cpus_allowed);
+
+		/*
+		 * Skip the whole subtree if the cpumask is not changed, unless
+		 * it's empty, and in this case we need to update tasks to take
+		 * on an ancestor's cpumask.
+		 */
+		if (cpumask_equal(new_cpus, cp->real_cpus_allowed) &&
+		    ((cp == cs) || !cpumask_empty(new_cpus))) {
+			pos_css = css_rightmost_descendant(pos_css);
+			continue;
 		}
+
 		if (!css_tryget(&cp->css))
 			continue;
+
 		rcu_read_unlock();
 
+		mutex_lock(&callback_mutex);
+		cpumask_copy(cp->real_cpus_allowed, new_cpus);
+		mutex_unlock(&callback_mutex);
+
 		update_tasks_cpumask(cp, heap);
 
 		rcu_read_lock();
@@ -931,7 +941,6 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 {
 	struct ptr_heap heap;
 	int retval;
-	int is_load_balanced;
 
 	/* top_cpuset.cpus_allowed tracks cpu_online_mask; it's read-only */
 	if (cs == &top_cpuset)
@@ -966,17 +975,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (retval)
 		return retval;
 
-	is_load_balanced = is_sched_load_balance(trialcs);
-
 	mutex_lock(&callback_mutex);
 	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
 	mutex_unlock(&callback_mutex);
 
-	update_tasks_cpumask_hier(cs, true, &heap);
+	update_cpumasks_hier(cs, trialcs, &heap);
 
 	heap_free(&heap);
 
-	if (is_load_balanced)
+	if (is_sched_load_balance(cs))
 		rebuild_sched_domains_locked();
 	return 0;
 }
@@ -1137,40 +1144,50 @@ static void update_tasks_nodemask(struct cpuset *cs, struct ptr_heap *heap)
 	cpuset_being_rebound = NULL;
 }
 
-/*
- * update_tasks_nodemask_hier - Update the nodemasks of tasks in the hierarchy.
- * @cs: the root cpuset of the hierarchy
- * @update_root: update the root cpuset or not?
+/**
+ * update_nodesmasks_hier - Update effective nodemasks and tasks in the subtree
+ * @cs: the cpuset to consider
+ * @trialcs: the trial cpuset
  * @heap: the heap used by css_scan_tasks()
  *
- * This will update nodemasks of tasks in @root_cs and all other empty cpusets
- * which take on nodemask of @root_cs.
- *
- * Called with cpuset_mutex held
+ * When configured nodemask is changed, the effective nodemasks of this cpuset
+ * and all its descendants need to be updated.
  */
-static void update_tasks_nodemask_hier(struct cpuset *root_cs,
-				       bool update_root, struct ptr_heap *heap)
+static void update_nodemasks_hier(struct cpuset *cs, struct cpuset *trialcs,
+				 struct ptr_heap *heap)
 {
-	struct cpuset *cp;
 	struct cgroup_subsys_state *pos_css;
+	struct cpuset *cp;
 
 	rcu_read_lock();
-	cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
-		if (cp == root_cs) {
-			if (!update_root)
-				continue;
-		} else {
-			/* skip the whole subtree if @cp have some CPU */
-			if (!nodes_empty(cp->mems_allowed)) {
-				pos_css = css_rightmost_descendant(pos_css);
-				continue;
-			}
+	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
+		struct cpuset *parent = parent_cs(cs);
+		nodemask_t *new_mems = &trialcs->real_mems_allowed;
+
+		nodes_and(*new_mems, cp->mems_allowed,
+			  parent->real_mems_allowed);
+
+		/*
+		 * Skip the whole subtree if the nodemask is not changed, unless
+		 * it's empty, and in this case we need to update tasks to take
+		 * on an ancestor's nodemask.
+		 */
+		if (nodes_equal(*new_mems, cp->real_mems_allowed) &&
+		    ((cp == cs) || !nodes_empty(*new_mems))) {
+			pos_css = css_rightmost_descendant(pos_css);
+			continue;
 		}
+
 		if (!css_tryget(&cp->css))
 			continue;
+
 		rcu_read_unlock();
 
-		update_tasks_nodemask(cp, heap);
+		mutex_lock(&callback_mutex);
+		cp->real_mems_allowed = *new_mems;
+		mutex_unlock(&callback_mutex);
+
+		update_tasks_cpumask(cs, heap);
 
 		rcu_read_lock();
 		css_put(&cp->css);
@@ -1242,7 +1259,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 	cs->mems_allowed = trialcs->mems_allowed;
 	mutex_unlock(&callback_mutex);
 
-	update_tasks_nodemask_hier(cs, true, &heap);
+	update_nodemasks_hier(cs, trialcs, &heap);
 
 	heap_free(&heap);
 done:
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 04/11] cpuset: update cs->real_{cpus,mems}_allowed when config changes
@ 2013-08-21  9:59   ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

We're going to have separate user-configured masks and effective ones.

At last configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction.

This is a preparation to make real_{cpus,mems}_allowed to be effective
masks of the cpuset:

- change the effective masks at hotplug: done
- change the effective masks at config change: done
- take on ancestor's mask when the effective mask is empty: todo

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 115 ++++++++++++++++++++++++++++++++------------------------
 1 file changed, 66 insertions(+), 49 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index ab89c1e..72afef4 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -880,39 +880,49 @@ static void update_tasks_cpumask(struct cpuset *cs, struct ptr_heap *heap)
 	css_scan_tasks(&cs->css, NULL, cpuset_change_cpumask, cs, heap);
 }
 
-/*
- * update_tasks_cpumask_hier - Update the cpumasks of tasks in the hierarchy.
- * @root_cs: the root cpuset of the hierarchy
- * @update_root: update root cpuset or not?
+/**
+ * update_cpumasks_hier - Update effective cpumasks and tasks in the subtree
+ * @cs: the cpuset to consider
+ * @trialcs: the trial cpuset
  * @heap: the heap used by css_scan_tasks()
  *
- * This will update cpumasks of tasks in @root_cs and all other empty cpusets
- * which take on cpumask of @root_cs.
- *
- * Called with cpuset_mutex held
+ * When configured cpumask is changed, the effective cpumasks of this cpuset
+ * and all its descendants need to be updated.
  */
-static void update_tasks_cpumask_hier(struct cpuset *root_cs,
-				      bool update_root, struct ptr_heap *heap)
+static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
+				 struct ptr_heap *heap)
 {
-	struct cpuset *cp;
 	struct cgroup_subsys_state *pos_css;
+	struct cpuset *cp;
 
 	rcu_read_lock();
-	cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
-		if (cp == root_cs) {
-			if (!update_root)
-				continue;
-		} else {
-			/* skip the whole subtree if @cp have some CPU */
-			if (!cpumask_empty(cp->cpus_allowed)) {
-				pos_css = css_rightmost_descendant(pos_css);
-				continue;
-			}
+	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
+		struct cpuset *parent = parent_cs(cs);
+		struct cpumask *new_cpus = trialcs->real_cpus_allowed;
+
+		cpumask_and(new_cpus, cp->cpus_allowed,
+			    parent->real_cpus_allowed);
+
+		/*
+		 * Skip the whole subtree if the cpumask is not changed, unless
+		 * it's empty, and in this case we need to update tasks to take
+		 * on an ancestor's cpumask.
+		 */
+		if (cpumask_equal(new_cpus, cp->real_cpus_allowed) &&
+		    ((cp == cs) || !cpumask_empty(new_cpus))) {
+			pos_css = css_rightmost_descendant(pos_css);
+			continue;
 		}
+
 		if (!css_tryget(&cp->css))
 			continue;
+
 		rcu_read_unlock();
 
+		mutex_lock(&callback_mutex);
+		cpumask_copy(cp->real_cpus_allowed, new_cpus);
+		mutex_unlock(&callback_mutex);
+
 		update_tasks_cpumask(cp, heap);
 
 		rcu_read_lock();
@@ -931,7 +941,6 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 {
 	struct ptr_heap heap;
 	int retval;
-	int is_load_balanced;
 
 	/* top_cpuset.cpus_allowed tracks cpu_online_mask; it's read-only */
 	if (cs == &top_cpuset)
@@ -966,17 +975,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (retval)
 		return retval;
 
-	is_load_balanced = is_sched_load_balance(trialcs);
-
 	mutex_lock(&callback_mutex);
 	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
 	mutex_unlock(&callback_mutex);
 
-	update_tasks_cpumask_hier(cs, true, &heap);
+	update_cpumasks_hier(cs, trialcs, &heap);
 
 	heap_free(&heap);
 
-	if (is_load_balanced)
+	if (is_sched_load_balance(cs))
 		rebuild_sched_domains_locked();
 	return 0;
 }
@@ -1137,40 +1144,50 @@ static void update_tasks_nodemask(struct cpuset *cs, struct ptr_heap *heap)
 	cpuset_being_rebound = NULL;
 }
 
-/*
- * update_tasks_nodemask_hier - Update the nodemasks of tasks in the hierarchy.
- * @cs: the root cpuset of the hierarchy
- * @update_root: update the root cpuset or not?
+/**
+ * update_nodesmasks_hier - Update effective nodemasks and tasks in the subtree
+ * @cs: the cpuset to consider
+ * @trialcs: the trial cpuset
  * @heap: the heap used by css_scan_tasks()
  *
- * This will update nodemasks of tasks in @root_cs and all other empty cpusets
- * which take on nodemask of @root_cs.
- *
- * Called with cpuset_mutex held
+ * When configured nodemask is changed, the effective nodemasks of this cpuset
+ * and all its descendants need to be updated.
  */
-static void update_tasks_nodemask_hier(struct cpuset *root_cs,
-				       bool update_root, struct ptr_heap *heap)
+static void update_nodemasks_hier(struct cpuset *cs, struct cpuset *trialcs,
+				 struct ptr_heap *heap)
 {
-	struct cpuset *cp;
 	struct cgroup_subsys_state *pos_css;
+	struct cpuset *cp;
 
 	rcu_read_lock();
-	cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
-		if (cp == root_cs) {
-			if (!update_root)
-				continue;
-		} else {
-			/* skip the whole subtree if @cp have some CPU */
-			if (!nodes_empty(cp->mems_allowed)) {
-				pos_css = css_rightmost_descendant(pos_css);
-				continue;
-			}
+	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
+		struct cpuset *parent = parent_cs(cs);
+		nodemask_t *new_mems = &trialcs->real_mems_allowed;
+
+		nodes_and(*new_mems, cp->mems_allowed,
+			  parent->real_mems_allowed);
+
+		/*
+		 * Skip the whole subtree if the nodemask is not changed, unless
+		 * it's empty, and in this case we need to update tasks to take
+		 * on an ancestor's nodemask.
+		 */
+		if (nodes_equal(*new_mems, cp->real_mems_allowed) &&
+		    ((cp == cs) || !nodes_empty(*new_mems))) {
+			pos_css = css_rightmost_descendant(pos_css);
+			continue;
 		}
+
 		if (!css_tryget(&cp->css))
 			continue;
+
 		rcu_read_unlock();
 
-		update_tasks_nodemask(cp, heap);
+		mutex_lock(&callback_mutex);
+		cp->real_mems_allowed = *new_mems;
+		mutex_unlock(&callback_mutex);
+
+		update_tasks_cpumask(cs, heap);
 
 		rcu_read_lock();
 		css_put(&cp->css);
@@ -1242,7 +1259,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 	cs->mems_allowed = trialcs->mems_allowed;
 	mutex_unlock(&callback_mutex);
 
-	update_tasks_nodemask_hier(cs, true, &heap);
+	update_nodemasks_hier(cs, trialcs, &heap);
 
 	heap_free(&heap);
 done:
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 05/11] cpuset: inherite ancestor's masks if real_{cpus, mems}_allowed become empty
  2013-08-21  9:58 ` Li Zefan
@ 2013-08-21  9:59     ` Li Zefan
  -1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

We're going to have separate user-configured masks and effective ones.

At last configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction.

This is a preparation to make real_{cpus,mems}_allowed to be effective
masks of the cpuset:

- change the effective masks at hotplug: done
- change the effective masks at config change: done
- take on ancestor's mask when the effective mask is empty: done

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 72afef4..b7b63dd 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -904,12 +904,14 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
 			    parent->real_cpus_allowed);
 
 		/*
-		 * Skip the whole subtree if the cpumask is not changed, unless
-		 * it's empty, and in this case we need to update tasks to take
-		 * on an ancestor's cpumask.
+		 * If it becomes empty, inherite the effective mask of the
+		 * parent, which is guarantted to have some CPUs.
 		 */
-		if (cpumask_equal(new_cpus, cp->real_cpus_allowed) &&
-		    ((cp == cs) || !cpumask_empty(new_cpus))) {
+		if (cpumask_empty(new_cpus))
+			cpumask_copy(new_cpus, parent->real_cpus_allowed);
+
+		/* Skip the whole subtree if the cpumask is not changed. */
+		if (cpumask_equal(new_cpus, cp->real_cpus_allowed)) {
 			pos_css = css_rightmost_descendant(pos_css);
 			continue;
 		}
@@ -1168,12 +1170,14 @@ static void update_nodemasks_hier(struct cpuset *cs, struct cpuset *trialcs,
 			  parent->real_mems_allowed);
 
 		/*
-		 * Skip the whole subtree if the nodemask is not changed, unless
-		 * it's empty, and in this case we need to update tasks to take
-		 * on an ancestor's nodemask.
+		 * If it becoms empty, inherite the effective mask of the
+		 * parent, which is guaranteed to have some MEMs.
 		 */
-		if (nodes_equal(*new_mems, cp->real_mems_allowed) &&
-		    ((cp == cs) || !nodes_empty(*new_mems))) {
+		if (nodes_empty(*new_mems))
+			*new_mems = parent->real_mems_allowed;
+
+		/* Skip the whole subtree if the nodemask is not changed. */
+		if (nodes_equal(*new_mems, cp->real_mems_allowed)) {
 			pos_css = css_rightmost_descendant(pos_css);
 			continue;
 		}
@@ -2202,8 +2206,13 @@ retry:
 
 	mutex_lock(&callback_mutex);
 	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
+
 	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed,
 		       &off_cpus);
+	/* Inherite the effective mask of the parent, if it becomes empty */
+	if (cpumask_empty(cs->real_cpus_allowed))
+		cpumask_copy(cs->real_cpus_allowed,
+			     parent_cs(cs)->real_cpus_allowed);
 	mutex_unlock(&callback_mutex);
 
 	/*
@@ -2218,7 +2227,11 @@ retry:
 
 	mutex_lock(&callback_mutex);
 	nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
+
 	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, off_mems);
+	/* Inherite the effective mask of the parent, if it becomes empty */
+	if (nodes_empty(cs->real_mems_allowed))
+		cs->real_mems_allowed = parent_cs(cs)->real_mems_allowed;
 	mutex_unlock(&callback_mutex);
 
 	/*
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 05/11] cpuset: inherite ancestor's masks if real_{cpus,mems}_allowed become empty
@ 2013-08-21  9:59     ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

We're going to have separate user-configured masks and effective ones.

At last configured masks can only be changed by writing cpuset.cpus
and cpuset.mems, and they won't be restricted by parent cpuset. While
effective masks reflect cpu/memory hotplug and hierachical restriction.

This is a preparation to make real_{cpus,mems}_allowed to be effective
masks of the cpuset:

- change the effective masks at hotplug: done
- change the effective masks at config change: done
- take on ancestor's mask when the effective mask is empty: done

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cpuset.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 72afef4..b7b63dd 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -904,12 +904,14 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
 			    parent->real_cpus_allowed);
 
 		/*
-		 * Skip the whole subtree if the cpumask is not changed, unless
-		 * it's empty, and in this case we need to update tasks to take
-		 * on an ancestor's cpumask.
+		 * If it becomes empty, inherite the effective mask of the
+		 * parent, which is guarantted to have some CPUs.
 		 */
-		if (cpumask_equal(new_cpus, cp->real_cpus_allowed) &&
-		    ((cp == cs) || !cpumask_empty(new_cpus))) {
+		if (cpumask_empty(new_cpus))
+			cpumask_copy(new_cpus, parent->real_cpus_allowed);
+
+		/* Skip the whole subtree if the cpumask is not changed. */
+		if (cpumask_equal(new_cpus, cp->real_cpus_allowed)) {
 			pos_css = css_rightmost_descendant(pos_css);
 			continue;
 		}
@@ -1168,12 +1170,14 @@ static void update_nodemasks_hier(struct cpuset *cs, struct cpuset *trialcs,
 			  parent->real_mems_allowed);
 
 		/*
-		 * Skip the whole subtree if the nodemask is not changed, unless
-		 * it's empty, and in this case we need to update tasks to take
-		 * on an ancestor's nodemask.
+		 * If it becoms empty, inherite the effective mask of the
+		 * parent, which is guaranteed to have some MEMs.
 		 */
-		if (nodes_equal(*new_mems, cp->real_mems_allowed) &&
-		    ((cp == cs) || !nodes_empty(*new_mems))) {
+		if (nodes_empty(*new_mems))
+			*new_mems = parent->real_mems_allowed;
+
+		/* Skip the whole subtree if the nodemask is not changed. */
+		if (nodes_equal(*new_mems, cp->real_mems_allowed)) {
 			pos_css = css_rightmost_descendant(pos_css);
 			continue;
 		}
@@ -2202,8 +2206,13 @@ retry:
 
 	mutex_lock(&callback_mutex);
 	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
+
 	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed,
 		       &off_cpus);
+	/* Inherite the effective mask of the parent, if it becomes empty */
+	if (cpumask_empty(cs->real_cpus_allowed))
+		cpumask_copy(cs->real_cpus_allowed,
+			     parent_cs(cs)->real_cpus_allowed);
 	mutex_unlock(&callback_mutex);
 
 	/*
@@ -2218,7 +2227,11 @@ retry:
 
 	mutex_lock(&callback_mutex);
 	nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
+
 	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, off_mems);
+	/* Inherite the effective mask of the parent, if it becomes empty */
+	if (nodes_empty(cs->real_mems_allowed))
+		cs->real_mems_allowed = parent_cs(cs)->real_mems_allowed;
 	mutex_unlock(&callback_mutex);
 
 	/*
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 06/11] cpuset: apply cs->real_{cpus,mems}_allowed
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2013-08-21  9:59     ` [PATCH 05/11] cpuset: inherite ancestor's masks if real_{cpus,mems}_allowed " Li Zefan
@ 2013-08-21  9:59   ` Li Zefan
  2013-08-21 10:00     ` Li Zefan
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

Now we can use cs->real{cpus,mems}_allowed as effective masks. It's
used whenever:

- we update tasks' cpus_allowed/mems_allowed,
- we want to retrieve tasks_cs(tsk)'s cpus_allowed/mems_allowed.

They actually replace effective_{cpu,node}mask_cpuset().

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 83 ++++++++++-----------------------------------------------
 1 file changed, 14 insertions(+), 69 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index b7b63dd..0de15eb 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -318,9 +318,9 @@ static struct file_system_type cpuset_fs_type = {
  */
 static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
 {
-	while (!cpumask_intersects(cs->cpus_allowed, cpu_online_mask))
+	while (!cpumask_intersects(cs->real_cpus_allowed, cpu_online_mask))
 		cs = parent_cs(cs);
-	cpumask_and(pmask, cs->cpus_allowed, cpu_online_mask);
+	cpumask_and(pmask, cs->real_cpus_allowed, cpu_online_mask);
 }
 
 /*
@@ -336,9 +336,9 @@ static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
  */
 static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
 {
-	while (!nodes_intersects(cs->mems_allowed, node_states[N_MEMORY]))
+	while (!nodes_intersects(cs->real_mems_allowed, node_states[N_MEMORY]))
 		cs = parent_cs(cs);
-	nodes_and(*pmask, cs->mems_allowed, node_states[N_MEMORY]);
+	nodes_and(*pmask, cs->real_mems_allowed, node_states[N_MEMORY]);
 }
 
 /*
@@ -804,45 +804,6 @@ void rebuild_sched_domains(void)
 	mutex_unlock(&cpuset_mutex);
 }
 
-/*
- * effective_cpumask_cpuset - return nearest ancestor with non-empty cpus
- * @cs: the cpuset in interest
- *
- * A cpuset's effective cpumask is the cpumask of the nearest ancestor
- * with non-empty cpus. We use effective cpumask whenever:
- * - we update tasks' cpus_allowed. (they take on the ancestor's cpumask
- *   if the cpuset they reside in has no cpus)
- * - we want to retrieve task_cs(tsk)'s cpus_allowed.
- *
- * Called with cpuset_mutex held. cpuset_cpus_allowed_fallback() is an
- * exception. See comments there.
- */
-static struct cpuset *effective_cpumask_cpuset(struct cpuset *cs)
-{
-	while (cpumask_empty(cs->cpus_allowed))
-		cs = parent_cs(cs);
-	return cs;
-}
-
-/*
- * effective_nodemask_cpuset - return nearest ancestor with non-empty mems
- * @cs: the cpuset in interest
- *
- * A cpuset's effective nodemask is the nodemask of the nearest ancestor
- * with non-empty memss. We use effective nodemask whenever:
- * - we update tasks' mems_allowed. (they take on the ancestor's nodemask
- *   if the cpuset they reside in has no mems)
- * - we want to retrieve task_cs(tsk)'s mems_allowed.
- *
- * Called with cpuset_mutex held.
- */
-static struct cpuset *effective_nodemask_cpuset(struct cpuset *cs)
-{
-	while (nodes_empty(cs->mems_allowed))
-		cs = parent_cs(cs);
-	return cs;
-}
-
 /**
  * cpuset_change_cpumask - make a task's cpus_allowed the same as its cpuset's
  * @tsk: task to test
@@ -857,9 +818,8 @@ static struct cpuset *effective_nodemask_cpuset(struct cpuset *cs)
 static void cpuset_change_cpumask(struct task_struct *tsk, void *data)
 {
 	struct cpuset *cs = data;
-	struct cpuset *cpus_cs = effective_cpumask_cpuset(cs);
 
-	set_cpus_allowed_ptr(tsk, cpus_cs->cpus_allowed);
+	set_cpus_allowed_ptr(tsk, cs->real_cpus_allowed);
 }
 
 /**
@@ -1014,14 +974,12 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
 							const nodemask_t *to)
 {
 	struct task_struct *tsk = current;
-	struct cpuset *mems_cs;
 
 	tsk->mems_allowed = *to;
 
 	do_migrate_pages(mm, from, to, MPOL_MF_MOVE_ALL);
 
-	mems_cs = effective_nodemask_cpuset(task_cs(tsk));
-	guarantee_online_mems(mems_cs, &tsk->mems_allowed);
+	guarantee_online_mems(task_cs(tsk), &tsk->mems_allowed);
 }
 
 /*
@@ -1116,13 +1074,12 @@ static void *cpuset_being_rebound;
 static void update_tasks_nodemask(struct cpuset *cs, struct ptr_heap *heap)
 {
 	static nodemask_t newmems;	/* protected by cpuset_mutex */
-	struct cpuset *mems_cs = effective_nodemask_cpuset(cs);
 	struct cpuset_change_nodemask_arg arg = { .cs = cs,
 						  .newmems = &newmems };
 
 	cpuset_being_rebound = cs;		/* causes mpol_dup() rebind */
 
-	guarantee_online_mems(mems_cs, &newmems);
+	guarantee_online_mems(cs, &newmems);
 
 	/*
 	 * The mpol_rebind_mm() call takes mmap_sem, which we couldn't
@@ -1556,8 +1513,6 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
 							cpuset_subsys_id);
 	struct cpuset *cs = css_cs(css);
 	struct cpuset *oldcs = css_cs(oldcss);
-	struct cpuset *cpus_cs = effective_cpumask_cpuset(cs);
-	struct cpuset *mems_cs = effective_nodemask_cpuset(cs);
 
 	mutex_lock(&cpuset_mutex);
 
@@ -1565,9 +1520,9 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
 	if (cs == &top_cpuset)
 		cpumask_copy(cpus_attach, cpu_possible_mask);
 	else
-		guarantee_online_cpus(cpus_cs, cpus_attach);
+		guarantee_online_cpus(cs, cpus_attach);
 
-	guarantee_online_mems(mems_cs, &cpuset_attach_nodemask_to);
+	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
 
 	cgroup_taskset_for_each(task, css, tset) {
 		/*
@@ -1584,11 +1539,9 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
 	 * Change mm, possibly for multiple threads in a threadgroup. This is
 	 * expensive and may sleep.
 	 */
-	cpuset_attach_nodemask_to = cs->mems_allowed;
+	cpuset_attach_nodemask_to = cs->real_mems_allowed;
 	mm = get_task_mm(leader);
 	if (mm) {
-		struct cpuset *mems_oldcs = effective_nodemask_cpuset(oldcs);
-
 		mpol_rebind_mm(mm, &cpuset_attach_nodemask_to);
 
 		/*
@@ -1599,7 +1552,7 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
 		 * mm from.
 		 */
 		if (is_memory_migrate(cs)) {
-			cpuset_migrate_mm(mm, &mems_oldcs->old_mems_allowed,
+			cpuset_migrate_mm(mm, &oldcs->old_mems_allowed,
 					  &cpuset_attach_nodemask_to);
 		}
 		mmput(mm);
@@ -2398,23 +2351,17 @@ void __init cpuset_init_smp(void)
 
 void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 {
-	struct cpuset *cpus_cs;
-
 	mutex_lock(&callback_mutex);
 	task_lock(tsk);
-	cpus_cs = effective_cpumask_cpuset(task_cs(tsk));
-	guarantee_online_cpus(cpus_cs, pmask);
+	guarantee_online_cpus(task_cs(tsk), pmask);
 	task_unlock(tsk);
 	mutex_unlock(&callback_mutex);
 }
 
 void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
-	struct cpuset *cpus_cs;
-
 	rcu_read_lock();
-	cpus_cs = effective_cpumask_cpuset(task_cs(tsk));
-	do_set_cpus_allowed(tsk, cpus_cs->cpus_allowed);
+	do_set_cpus_allowed(tsk, task_cs(tsk)->real_cpus_allowed);
 	rcu_read_unlock();
 
 	/*
@@ -2453,13 +2400,11 @@ void cpuset_init_current_mems_allowed(void)
 
 nodemask_t cpuset_mems_allowed(struct task_struct *tsk)
 {
-	struct cpuset *mems_cs;
 	nodemask_t mask;
 
 	mutex_lock(&callback_mutex);
 	task_lock(tsk);
-	mems_cs = effective_nodemask_cpuset(task_cs(tsk));
-	guarantee_online_mems(mems_cs, &mask);
+	guarantee_online_mems(task_cs(tsk), &mask);
 	task_unlock(tsk);
 	mutex_unlock(&callback_mutex);
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 06/11] cpuset: apply cs->real_{cpus,mems}_allowed
  2013-08-21  9:58 ` Li Zefan
                   ` (5 preceding siblings ...)
  (?)
@ 2013-08-21  9:59 ` Li Zefan
  2013-08-21 14:01     ` Tejun Heo
       [not found]   ` <52148F9C.2080600-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  -1 siblings, 2 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

Now we can use cs->real{cpus,mems}_allowed as effective masks. It's
used whenever:

- we update tasks' cpus_allowed/mems_allowed,
- we want to retrieve tasks_cs(tsk)'s cpus_allowed/mems_allowed.

They actually replace effective_{cpu,node}mask_cpuset().

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cpuset.c | 83 ++++++++++-----------------------------------------------
 1 file changed, 14 insertions(+), 69 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index b7b63dd..0de15eb 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -318,9 +318,9 @@ static struct file_system_type cpuset_fs_type = {
  */
 static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
 {
-	while (!cpumask_intersects(cs->cpus_allowed, cpu_online_mask))
+	while (!cpumask_intersects(cs->real_cpus_allowed, cpu_online_mask))
 		cs = parent_cs(cs);
-	cpumask_and(pmask, cs->cpus_allowed, cpu_online_mask);
+	cpumask_and(pmask, cs->real_cpus_allowed, cpu_online_mask);
 }
 
 /*
@@ -336,9 +336,9 @@ static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask)
  */
 static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
 {
-	while (!nodes_intersects(cs->mems_allowed, node_states[N_MEMORY]))
+	while (!nodes_intersects(cs->real_mems_allowed, node_states[N_MEMORY]))
 		cs = parent_cs(cs);
-	nodes_and(*pmask, cs->mems_allowed, node_states[N_MEMORY]);
+	nodes_and(*pmask, cs->real_mems_allowed, node_states[N_MEMORY]);
 }
 
 /*
@@ -804,45 +804,6 @@ void rebuild_sched_domains(void)
 	mutex_unlock(&cpuset_mutex);
 }
 
-/*
- * effective_cpumask_cpuset - return nearest ancestor with non-empty cpus
- * @cs: the cpuset in interest
- *
- * A cpuset's effective cpumask is the cpumask of the nearest ancestor
- * with non-empty cpus. We use effective cpumask whenever:
- * - we update tasks' cpus_allowed. (they take on the ancestor's cpumask
- *   if the cpuset they reside in has no cpus)
- * - we want to retrieve task_cs(tsk)'s cpus_allowed.
- *
- * Called with cpuset_mutex held. cpuset_cpus_allowed_fallback() is an
- * exception. See comments there.
- */
-static struct cpuset *effective_cpumask_cpuset(struct cpuset *cs)
-{
-	while (cpumask_empty(cs->cpus_allowed))
-		cs = parent_cs(cs);
-	return cs;
-}
-
-/*
- * effective_nodemask_cpuset - return nearest ancestor with non-empty mems
- * @cs: the cpuset in interest
- *
- * A cpuset's effective nodemask is the nodemask of the nearest ancestor
- * with non-empty memss. We use effective nodemask whenever:
- * - we update tasks' mems_allowed. (they take on the ancestor's nodemask
- *   if the cpuset they reside in has no mems)
- * - we want to retrieve task_cs(tsk)'s mems_allowed.
- *
- * Called with cpuset_mutex held.
- */
-static struct cpuset *effective_nodemask_cpuset(struct cpuset *cs)
-{
-	while (nodes_empty(cs->mems_allowed))
-		cs = parent_cs(cs);
-	return cs;
-}
-
 /**
  * cpuset_change_cpumask - make a task's cpus_allowed the same as its cpuset's
  * @tsk: task to test
@@ -857,9 +818,8 @@ static struct cpuset *effective_nodemask_cpuset(struct cpuset *cs)
 static void cpuset_change_cpumask(struct task_struct *tsk, void *data)
 {
 	struct cpuset *cs = data;
-	struct cpuset *cpus_cs = effective_cpumask_cpuset(cs);
 
-	set_cpus_allowed_ptr(tsk, cpus_cs->cpus_allowed);
+	set_cpus_allowed_ptr(tsk, cs->real_cpus_allowed);
 }
 
 /**
@@ -1014,14 +974,12 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
 							const nodemask_t *to)
 {
 	struct task_struct *tsk = current;
-	struct cpuset *mems_cs;
 
 	tsk->mems_allowed = *to;
 
 	do_migrate_pages(mm, from, to, MPOL_MF_MOVE_ALL);
 
-	mems_cs = effective_nodemask_cpuset(task_cs(tsk));
-	guarantee_online_mems(mems_cs, &tsk->mems_allowed);
+	guarantee_online_mems(task_cs(tsk), &tsk->mems_allowed);
 }
 
 /*
@@ -1116,13 +1074,12 @@ static void *cpuset_being_rebound;
 static void update_tasks_nodemask(struct cpuset *cs, struct ptr_heap *heap)
 {
 	static nodemask_t newmems;	/* protected by cpuset_mutex */
-	struct cpuset *mems_cs = effective_nodemask_cpuset(cs);
 	struct cpuset_change_nodemask_arg arg = { .cs = cs,
 						  .newmems = &newmems };
 
 	cpuset_being_rebound = cs;		/* causes mpol_dup() rebind */
 
-	guarantee_online_mems(mems_cs, &newmems);
+	guarantee_online_mems(cs, &newmems);
 
 	/*
 	 * The mpol_rebind_mm() call takes mmap_sem, which we couldn't
@@ -1556,8 +1513,6 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
 							cpuset_subsys_id);
 	struct cpuset *cs = css_cs(css);
 	struct cpuset *oldcs = css_cs(oldcss);
-	struct cpuset *cpus_cs = effective_cpumask_cpuset(cs);
-	struct cpuset *mems_cs = effective_nodemask_cpuset(cs);
 
 	mutex_lock(&cpuset_mutex);
 
@@ -1565,9 +1520,9 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
 	if (cs == &top_cpuset)
 		cpumask_copy(cpus_attach, cpu_possible_mask);
 	else
-		guarantee_online_cpus(cpus_cs, cpus_attach);
+		guarantee_online_cpus(cs, cpus_attach);
 
-	guarantee_online_mems(mems_cs, &cpuset_attach_nodemask_to);
+	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
 
 	cgroup_taskset_for_each(task, css, tset) {
 		/*
@@ -1584,11 +1539,9 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
 	 * Change mm, possibly for multiple threads in a threadgroup. This is
 	 * expensive and may sleep.
 	 */
-	cpuset_attach_nodemask_to = cs->mems_allowed;
+	cpuset_attach_nodemask_to = cs->real_mems_allowed;
 	mm = get_task_mm(leader);
 	if (mm) {
-		struct cpuset *mems_oldcs = effective_nodemask_cpuset(oldcs);
-
 		mpol_rebind_mm(mm, &cpuset_attach_nodemask_to);
 
 		/*
@@ -1599,7 +1552,7 @@ static void cpuset_attach(struct cgroup_subsys_state *css,
 		 * mm from.
 		 */
 		if (is_memory_migrate(cs)) {
-			cpuset_migrate_mm(mm, &mems_oldcs->old_mems_allowed,
+			cpuset_migrate_mm(mm, &oldcs->old_mems_allowed,
 					  &cpuset_attach_nodemask_to);
 		}
 		mmput(mm);
@@ -2398,23 +2351,17 @@ void __init cpuset_init_smp(void)
 
 void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 {
-	struct cpuset *cpus_cs;
-
 	mutex_lock(&callback_mutex);
 	task_lock(tsk);
-	cpus_cs = effective_cpumask_cpuset(task_cs(tsk));
-	guarantee_online_cpus(cpus_cs, pmask);
+	guarantee_online_cpus(task_cs(tsk), pmask);
 	task_unlock(tsk);
 	mutex_unlock(&callback_mutex);
 }
 
 void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
-	struct cpuset *cpus_cs;
-
 	rcu_read_lock();
-	cpus_cs = effective_cpumask_cpuset(task_cs(tsk));
-	do_set_cpus_allowed(tsk, cpus_cs->cpus_allowed);
+	do_set_cpus_allowed(tsk, task_cs(tsk)->real_cpus_allowed);
 	rcu_read_unlock();
 
 	/*
@@ -2453,13 +2400,11 @@ void cpuset_init_current_mems_allowed(void)
 
 nodemask_t cpuset_mems_allowed(struct task_struct *tsk)
 {
-	struct cpuset *mems_cs;
 	nodemask_t mask;
 
 	mutex_lock(&callback_mutex);
 	task_lock(tsk);
-	mems_cs = effective_nodemask_cpuset(task_cs(tsk));
-	guarantee_online_mems(mems_cs, &mask);
+	guarantee_online_mems(task_cs(tsk), &mask);
 	task_unlock(tsk);
 	mutex_unlock(&callback_mutex);
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 07/11] cpuset: use effective cpumask to build sched domains
  2013-08-21  9:58 ` Li Zefan
@ 2013-08-21 10:00     ` Li Zefan
  -1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

We're going to have separate user-configured masks and effective ones,
and configured masks won't be restricted by the parent, so we should
use effective masks to build sched domains.

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 0de15eb..e7ad4a7 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -500,11 +500,11 @@ out:
 #ifdef CONFIG_SMP
 /*
  * Helper routine for generate_sched_domains().
- * Do cpusets a, b have overlapping cpus_allowed masks?
+ * Do cpusets a, b have overlapping effective cpus_allowed masks?
  */
 static int cpusets_overlap(struct cpuset *a, struct cpuset *b)
 {
-	return cpumask_intersects(a->cpus_allowed, b->cpus_allowed);
+	return cpumask_intersects(a->real_cpus_allowed, b->real_cpus_allowed);
 }
 
 static void
@@ -621,7 +621,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
 			*dattr = SD_ATTR_INIT;
 			update_domain_attr_tree(dattr, &top_cpuset);
 		}
-		cpumask_copy(doms[0], top_cpuset.cpus_allowed);
+		cpumask_copy(doms[0], top_cpuset.real_cpus_allowed);
 
 		goto done;
 	}
@@ -728,7 +728,7 @@ restart:
 			struct cpuset *b = csa[j];
 
 			if (apn == b->pn) {
-				cpumask_or(dp, dp, b->cpus_allowed);
+				cpumask_or(dp, dp, b->real_cpus_allowed);
 				if (dattr)
 					update_domain_attr_tree(dattr + nslot, b);
 
@@ -854,6 +854,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
 {
 	struct cgroup_subsys_state *pos_css;
 	struct cpuset *cp;
+	bool need_rebuild_sched_domains = false;
 
 	rcu_read_lock();
 	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
@@ -887,10 +888,17 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
 
 		update_tasks_cpumask(cp, heap);
 
+		if (!cpumask_empty(cp->cpus_allowed) &&
+		    is_sched_load_balance(cp))
+			need_rebuild_sched_domains = true;
+
 		rcu_read_lock();
 		css_put(&cp->css);
 	}
 	rcu_read_unlock();
+
+	if (need_rebuild_sched_domains)
+		rebuild_sched_domains_locked();
 }
 
 /**
@@ -944,9 +952,6 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	update_cpumasks_hier(cs, trialcs, &heap);
 
 	heap_free(&heap);
-
-	if (is_sched_load_balance(cs))
-		rebuild_sched_domains_locked();
 	return 0;
 }
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 07/11] cpuset: use effective cpumask to build sched domains
@ 2013-08-21 10:00     ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

We're going to have separate user-configured masks and effective ones,
and configured masks won't be restricted by the parent, so we should
use effective masks to build sched domains.

This won't introduce behavior change.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cpuset.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 0de15eb..e7ad4a7 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -500,11 +500,11 @@ out:
 #ifdef CONFIG_SMP
 /*
  * Helper routine for generate_sched_domains().
- * Do cpusets a, b have overlapping cpus_allowed masks?
+ * Do cpusets a, b have overlapping effective cpus_allowed masks?
  */
 static int cpusets_overlap(struct cpuset *a, struct cpuset *b)
 {
-	return cpumask_intersects(a->cpus_allowed, b->cpus_allowed);
+	return cpumask_intersects(a->real_cpus_allowed, b->real_cpus_allowed);
 }
 
 static void
@@ -621,7 +621,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
 			*dattr = SD_ATTR_INIT;
 			update_domain_attr_tree(dattr, &top_cpuset);
 		}
-		cpumask_copy(doms[0], top_cpuset.cpus_allowed);
+		cpumask_copy(doms[0], top_cpuset.real_cpus_allowed);
 
 		goto done;
 	}
@@ -728,7 +728,7 @@ restart:
 			struct cpuset *b = csa[j];
 
 			if (apn == b->pn) {
-				cpumask_or(dp, dp, b->cpus_allowed);
+				cpumask_or(dp, dp, b->real_cpus_allowed);
 				if (dattr)
 					update_domain_attr_tree(dattr + nslot, b);
 
@@ -854,6 +854,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
 {
 	struct cgroup_subsys_state *pos_css;
 	struct cpuset *cp;
+	bool need_rebuild_sched_domains = false;
 
 	rcu_read_lock();
 	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
@@ -887,10 +888,17 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
 
 		update_tasks_cpumask(cp, heap);
 
+		if (!cpumask_empty(cp->cpus_allowed) &&
+		    is_sched_load_balance(cp))
+			need_rebuild_sched_domains = true;
+
 		rcu_read_lock();
 		css_put(&cp->css);
 	}
 	rcu_read_unlock();
+
+	if (need_rebuild_sched_domains)
+		rebuild_sched_domains_locked();
 }
 
 /**
@@ -944,9 +952,6 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	update_cpumasks_hier(cs, trialcs, &heap);
 
 	heap_free(&heap);
-
-	if (is_sched_load_balance(cs))
-		rebuild_sched_domains_locked();
 	return 0;
 }
 
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 08/11] cpuset: separate configured masks and efffective masks
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
                     ` (6 preceding siblings ...)
  2013-08-21 10:00     ` Li Zefan
@ 2013-08-21 10:00   ` Li Zefan
  2013-08-21 10:01     ` Li Zefan
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

Now we've used effective cpumasks to enforce hierarchical manner,
we can use cs->{cpus,mems}_allowed as configured masks.

Configured masks can be changed by writing cpuset.cpus and cpuset.mems
only. The new behaviors are:

- They won't be changed by hotplug anymore.
- They won't be limited by its parent's masks.

This behavior change won't take effect unless mount with sane_behavior.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index e7ad4a7..c3a02a9 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -457,9 +457,13 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 
 	par = parent_cs(cur);
 
-	/* We must be a subset of our parent cpuset */
+	/*
+	 * We must be a subset of our parent cpuset, unless sane_behavior
+	 * flag is set.
+	 */
 	ret = -EACCES;
-	if (!is_cpuset_subset(trial, par))
+	if (!cgroup_sane_behavior(cur->css.cgroup) &&
+	    !is_cpuset_subset(trial, par))
 		goto out;
 
 	/*
@@ -780,7 +784,7 @@ static void rebuild_sched_domains_locked(void)
 	 * passing doms with offlined cpu to partition_sched_domains().
 	 * Anyways, hotplug work item will rebuild sched domains.
 	 */
-	if (!cpumask_equal(top_cpuset.cpus_allowed, cpu_active_mask))
+	if (!cpumask_equal(top_cpuset.real_cpus_allowed, cpu_active_mask))
 		goto out;
 
 	/* Generate domain masks and attrs */
@@ -2159,11 +2163,14 @@ retry:
 		goto retry;
 	}
 
-	cpumask_andnot(&off_cpus, cs->cpus_allowed, top_cpuset.cpus_allowed);
-	nodes_andnot(off_mems, cs->mems_allowed, top_cpuset.mems_allowed);
+	cpumask_andnot(&off_cpus, cs->real_cpus_allowed,
+		       top_cpuset.real_cpus_allowed);
+	nodes_andnot(off_mems, cs->real_mems_allowed,
+		     top_cpuset.real_mems_allowed);
 
 	mutex_lock(&callback_mutex);
-	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
+	if (!sane)
+		cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
 
 	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed,
 		       &off_cpus);
@@ -2184,7 +2191,8 @@ retry:
 		update_tasks_cpumask(cs, NULL);
 
 	mutex_lock(&callback_mutex);
-	nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
+	if (!sane)
+		nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
 
 	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, off_mems);
 	/* Inherite the effective mask of the parent, if it becomes empty */
@@ -2239,6 +2247,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	static cpumask_t new_cpus;
 	static nodemask_t new_mems;
 	bool cpus_updated, mems_updated;
+	bool sane = cgroup_sane_behavior(top_cpuset.css.cgroup);
 
 	mutex_lock(&cpuset_mutex);
 
@@ -2246,13 +2255,14 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	cpumask_copy(&new_cpus, cpu_active_mask);
 	new_mems = node_states[N_MEMORY];
 
-	cpus_updated = !cpumask_equal(top_cpuset.cpus_allowed, &new_cpus);
-	mems_updated = !nodes_equal(top_cpuset.mems_allowed, new_mems);
+	cpus_updated = !cpumask_equal(top_cpuset.real_cpus_allowed, &new_cpus);
+	mems_updated = !nodes_equal(top_cpuset.real_mems_allowed, new_mems);
 
 	/* synchronize cpus_allowed to cpu_active_mask */
 	if (cpus_updated) {
 		mutex_lock(&callback_mutex);
-		cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
+		if (!sane)
+			cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
 		cpumask_copy(top_cpuset.real_cpus_allowed, &new_cpus);
 		mutex_unlock(&callback_mutex);
 		/* we don't mess with cpumasks of tasks in top_cpuset */
@@ -2261,7 +2271,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	/* synchronize mems_allowed to N_MEMORY */
 	if (mems_updated) {
 		mutex_lock(&callback_mutex);
-		top_cpuset.mems_allowed = new_mems;
+		if (!sane)
+			top_cpuset.mems_allowed = new_mems;
 		top_cpuset.real_mems_allowed = new_mems;
 		mutex_unlock(&callback_mutex);
 		update_tasks_nodemask(&top_cpuset, NULL);
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 08/11] cpuset: separate configured masks and efffective masks
  2013-08-21  9:58 ` Li Zefan
                   ` (6 preceding siblings ...)
  (?)
@ 2013-08-21 10:00 ` Li Zefan
       [not found]   ` <52148FCA.8010704-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  -1 siblings, 1 reply; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

Now we've used effective cpumasks to enforce hierarchical manner,
we can use cs->{cpus,mems}_allowed as configured masks.

Configured masks can be changed by writing cpuset.cpus and cpuset.mems
only. The new behaviors are:

- They won't be changed by hotplug anymore.
- They won't be limited by its parent's masks.

This behavior change won't take effect unless mount with sane_behavior.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cpuset.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index e7ad4a7..c3a02a9 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -457,9 +457,13 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 
 	par = parent_cs(cur);
 
-	/* We must be a subset of our parent cpuset */
+	/*
+	 * We must be a subset of our parent cpuset, unless sane_behavior
+	 * flag is set.
+	 */
 	ret = -EACCES;
-	if (!is_cpuset_subset(trial, par))
+	if (!cgroup_sane_behavior(cur->css.cgroup) &&
+	    !is_cpuset_subset(trial, par))
 		goto out;
 
 	/*
@@ -780,7 +784,7 @@ static void rebuild_sched_domains_locked(void)
 	 * passing doms with offlined cpu to partition_sched_domains().
 	 * Anyways, hotplug work item will rebuild sched domains.
 	 */
-	if (!cpumask_equal(top_cpuset.cpus_allowed, cpu_active_mask))
+	if (!cpumask_equal(top_cpuset.real_cpus_allowed, cpu_active_mask))
 		goto out;
 
 	/* Generate domain masks and attrs */
@@ -2159,11 +2163,14 @@ retry:
 		goto retry;
 	}
 
-	cpumask_andnot(&off_cpus, cs->cpus_allowed, top_cpuset.cpus_allowed);
-	nodes_andnot(off_mems, cs->mems_allowed, top_cpuset.mems_allowed);
+	cpumask_andnot(&off_cpus, cs->real_cpus_allowed,
+		       top_cpuset.real_cpus_allowed);
+	nodes_andnot(off_mems, cs->real_mems_allowed,
+		     top_cpuset.real_mems_allowed);
 
 	mutex_lock(&callback_mutex);
-	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
+	if (!sane)
+		cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
 
 	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed,
 		       &off_cpus);
@@ -2184,7 +2191,8 @@ retry:
 		update_tasks_cpumask(cs, NULL);
 
 	mutex_lock(&callback_mutex);
-	nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
+	if (!sane)
+		nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
 
 	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, off_mems);
 	/* Inherite the effective mask of the parent, if it becomes empty */
@@ -2239,6 +2247,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	static cpumask_t new_cpus;
 	static nodemask_t new_mems;
 	bool cpus_updated, mems_updated;
+	bool sane = cgroup_sane_behavior(top_cpuset.css.cgroup);
 
 	mutex_lock(&cpuset_mutex);
 
@@ -2246,13 +2255,14 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	cpumask_copy(&new_cpus, cpu_active_mask);
 	new_mems = node_states[N_MEMORY];
 
-	cpus_updated = !cpumask_equal(top_cpuset.cpus_allowed, &new_cpus);
-	mems_updated = !nodes_equal(top_cpuset.mems_allowed, new_mems);
+	cpus_updated = !cpumask_equal(top_cpuset.real_cpus_allowed, &new_cpus);
+	mems_updated = !nodes_equal(top_cpuset.real_mems_allowed, new_mems);
 
 	/* synchronize cpus_allowed to cpu_active_mask */
 	if (cpus_updated) {
 		mutex_lock(&callback_mutex);
-		cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
+		if (!sane)
+			cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
 		cpumask_copy(top_cpuset.real_cpus_allowed, &new_cpus);
 		mutex_unlock(&callback_mutex);
 		/* we don't mess with cpumasks of tasks in top_cpuset */
@@ -2261,7 +2271,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	/* synchronize mems_allowed to N_MEMORY */
 	if (mems_updated) {
 		mutex_lock(&callback_mutex);
-		top_cpuset.mems_allowed = new_mems;
+		if (!sane)
+			top_cpuset.mems_allowed = new_mems;
 		top_cpuset.real_mems_allowed = new_mems;
 		mutex_unlock(&callback_mutex);
 		update_tasks_nodemask(&top_cpuset, NULL);
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 09/11] cpuset: enable onlined cpu/node in effective masks
  2013-08-21  9:58 ` Li Zefan
@ 2013-08-21 10:01     ` Li Zefan
  -1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

Firstly offline cpu1:

  # echo 0-1 > cpuset.cpus
  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # cat cpuset.cpus
  0-1
  # cat cpuset.effective_cpus
  0

Then online it:

  # echo 1 > /sys/devices/system/cpu/cpu1/online
  # cat cpuset.cpus
  0-1
  # cat cpuset.effective_cpus
  0-1

And cpuset will bring it back to the effective mask.

This is a behavior change for sane_behavior.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 140 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 77 insertions(+), 63 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index c3a02a9..20fc109 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2134,6 +2134,77 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
 	}
 }
 
+static void hotplug_update_tasks_insane(struct cpuset *cs,
+					struct cpumask *off_cpus,
+					nodemask_t *off_mems)
+{
+	bool is_empty;
+
+	cpumask_andnot(off_cpus, cs->real_cpus_allowed,
+		       top_cpuset.real_cpus_allowed);
+	nodes_andnot(*off_mems, cs->real_mems_allowed,
+		     top_cpuset.real_mems_allowed);
+
+	mutex_lock(&callback_mutex);
+	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, off_cpus);
+	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed, off_cpus);
+	nodes_andnot(cs->mems_allowed, cs->mems_allowed, *off_mems);
+	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, *off_mems);
+	mutex_unlock(&callback_mutex);
+
+	/*
+	 * Don't call update_tasks_cpumask() if the cpuset becomes empty,
+	 * as the tasks will be migrated to an ancestor.
+	 */
+	if (!cpumask_empty(off_cpus) && !cpumask_empty(cs->cpus_allowed))
+		update_tasks_cpumask(cs, NULL);
+
+	if (!nodes_empty(*off_mems) && !cpumask_empty(cs->cpus_allowed))
+		update_tasks_nodemask(cs, NULL);
+
+	is_empty = cpumask_empty(cs->cpus_allowed) ||
+		   nodes_empty(cs->mems_allowed);
+
+	mutex_unlock(&cpuset_mutex);
+	/*
+	 * Move tasks to the nearest ancestor with execution resources,
+	 * This is full cgroup operation which will also call back into
+	 * cpuset. Should be don outside any lock.
+	 */
+	if (is_empty)
+		remove_tasks_in_empty_cpuset(cs);
+	mutex_lock(&cpuset_mutex);
+}
+
+static void hotplug_update_tasks_sane(struct cpuset *cs,
+				      struct cpumask *new_cpus,
+				      nodemask_t *new_mems)
+{
+	struct cpuset *parent = parent_cs(cs);
+	bool update_cpus, update_mems;
+
+	cpumask_and(new_cpus, cs->cpus_allowed, parent->real_cpus_allowed);
+	if (cpumask_empty(new_cpus))
+		cpumask_copy(new_cpus, parent->real_cpus_allowed);
+
+	nodes_and(*new_mems, cs->mems_allowed, parent->real_mems_allowed);
+	if (nodes_empty(*new_mems))
+		*new_mems = parent->real_mems_allowed;
+
+	update_cpus = !cpumask_equal(cs->real_cpus_allowed, new_cpus);
+	update_mems = !nodes_equal(cs->real_mems_allowed, new_mems);
+
+	mutex_lock(&callback_mutex);
+	cpumask_copy(cs->real_cpus_allowed, new_cpus);
+	cs->real_mems_allowed = *new_mems;
+	mutex_unlock(&callback_mutex);
+
+	if (update_cpus)
+		update_tasks_cpumask(cs, NULL);
+	if (update_mems)
+		update_tasks_nodemask(cs, NULL);
+}
+
 /**
  * cpuset_hotplug_update_tasks - update tasks in a cpuset for hotunplug
  * @cs: cpuset in interest
@@ -2144,9 +2215,8 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
  */
 static void cpuset_hotplug_update_tasks(struct cpuset *cs)
 {
-	static cpumask_t off_cpus;
-	static nodemask_t off_mems;
-	bool is_empty;
+	static cpumask_t tmp_cpus;
+	static nodemask_t tmp_mems;
 	bool sane = cgroup_sane_behavior(cs->css.cgroup);
 
 retry:
@@ -2163,67 +2233,11 @@ retry:
 		goto retry;
 	}
 
-	cpumask_andnot(&off_cpus, cs->real_cpus_allowed,
-		       top_cpuset.real_cpus_allowed);
-	nodes_andnot(off_mems, cs->real_mems_allowed,
-		     top_cpuset.real_mems_allowed);
-
-	mutex_lock(&callback_mutex);
-	if (!sane)
-		cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
-
-	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed,
-		       &off_cpus);
-	/* Inherite the effective mask of the parent, if it becomes empty */
-	if (cpumask_empty(cs->real_cpus_allowed))
-		cpumask_copy(cs->real_cpus_allowed,
-			     parent_cs(cs)->real_cpus_allowed);
-	mutex_unlock(&callback_mutex);
-
-	/*
-	 * If sane_behavior flag is set, we need to update tasks' cpumask
-	 * for empty cpuset to take on ancestor's cpumask. Otherwise, don't
-	 * call update_tasks_cpumask() if the cpuset becomes empty, as
-	 * the tasks in it will be migrated to an ancestor.
-	 */
-	if ((sane && cpumask_empty(cs->cpus_allowed)) ||
-	    (!cpumask_empty(&off_cpus) && !cpumask_empty(cs->cpus_allowed)))
-		update_tasks_cpumask(cs, NULL);
-
-	mutex_lock(&callback_mutex);
-	if (!sane)
-		nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
-
-	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, off_mems);
-	/* Inherite the effective mask of the parent, if it becomes empty */
-	if (nodes_empty(cs->real_mems_allowed))
-		cs->real_mems_allowed = parent_cs(cs)->real_mems_allowed;
-	mutex_unlock(&callback_mutex);
-
-	/*
-	 * If sane_behavior flag is set, we need to update tasks' nodemask
-	 * for empty cpuset to take on ancestor's nodemask. Otherwise, don't
-	 * call update_tasks_nodemask() if the cpuset becomes empty, as
-	 * the tasks in it will be migratd to an ancestor.
-	 */
-	if ((sane && nodes_empty(cs->mems_allowed)) ||
-	    (!nodes_empty(off_mems) && !nodes_empty(cs->mems_allowed)))
-		update_tasks_nodemask(cs, NULL);
-
-	is_empty = cpumask_empty(cs->cpus_allowed) ||
-		nodes_empty(cs->mems_allowed);
-
+	if (sane)
+		hotplug_update_tasks_sane(cs, &tmp_cpus, &tmp_mems);
+	else
+		hotplug_update_tasks_insane(cs, &tmp_cpus, &tmp_mems);
 	mutex_unlock(&cpuset_mutex);
-
-	/*
-	 * If sane_behavior flag is set, we'll keep tasks in empty cpusets.
-	 *
-	 * Otherwise move tasks to the nearest ancestor with execution
-	 * resources.  This is full cgroup operation which will
-	 * also call back into cpuset.  Should be done outside any lock.
-	 */
-	if (!sane && is_empty)
-		remove_tasks_in_empty_cpuset(cs);
 }
 
 /**
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 09/11] cpuset: enable onlined cpu/node in effective masks
@ 2013-08-21 10:01     ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

Firstly offline cpu1:

  # echo 0-1 > cpuset.cpus
  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # cat cpuset.cpus
  0-1
  # cat cpuset.effective_cpus
  0

Then online it:

  # echo 1 > /sys/devices/system/cpu/cpu1/online
  # cat cpuset.cpus
  0-1
  # cat cpuset.effective_cpus
  0-1

And cpuset will bring it back to the effective mask.

This is a behavior change for sane_behavior.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cpuset.c | 140 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 77 insertions(+), 63 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index c3a02a9..20fc109 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2134,6 +2134,77 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
 	}
 }
 
+static void hotplug_update_tasks_insane(struct cpuset *cs,
+					struct cpumask *off_cpus,
+					nodemask_t *off_mems)
+{
+	bool is_empty;
+
+	cpumask_andnot(off_cpus, cs->real_cpus_allowed,
+		       top_cpuset.real_cpus_allowed);
+	nodes_andnot(*off_mems, cs->real_mems_allowed,
+		     top_cpuset.real_mems_allowed);
+
+	mutex_lock(&callback_mutex);
+	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, off_cpus);
+	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed, off_cpus);
+	nodes_andnot(cs->mems_allowed, cs->mems_allowed, *off_mems);
+	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, *off_mems);
+	mutex_unlock(&callback_mutex);
+
+	/*
+	 * Don't call update_tasks_cpumask() if the cpuset becomes empty,
+	 * as the tasks will be migrated to an ancestor.
+	 */
+	if (!cpumask_empty(off_cpus) && !cpumask_empty(cs->cpus_allowed))
+		update_tasks_cpumask(cs, NULL);
+
+	if (!nodes_empty(*off_mems) && !cpumask_empty(cs->cpus_allowed))
+		update_tasks_nodemask(cs, NULL);
+
+	is_empty = cpumask_empty(cs->cpus_allowed) ||
+		   nodes_empty(cs->mems_allowed);
+
+	mutex_unlock(&cpuset_mutex);
+	/*
+	 * Move tasks to the nearest ancestor with execution resources,
+	 * This is full cgroup operation which will also call back into
+	 * cpuset. Should be don outside any lock.
+	 */
+	if (is_empty)
+		remove_tasks_in_empty_cpuset(cs);
+	mutex_lock(&cpuset_mutex);
+}
+
+static void hotplug_update_tasks_sane(struct cpuset *cs,
+				      struct cpumask *new_cpus,
+				      nodemask_t *new_mems)
+{
+	struct cpuset *parent = parent_cs(cs);
+	bool update_cpus, update_mems;
+
+	cpumask_and(new_cpus, cs->cpus_allowed, parent->real_cpus_allowed);
+	if (cpumask_empty(new_cpus))
+		cpumask_copy(new_cpus, parent->real_cpus_allowed);
+
+	nodes_and(*new_mems, cs->mems_allowed, parent->real_mems_allowed);
+	if (nodes_empty(*new_mems))
+		*new_mems = parent->real_mems_allowed;
+
+	update_cpus = !cpumask_equal(cs->real_cpus_allowed, new_cpus);
+	update_mems = !nodes_equal(cs->real_mems_allowed, new_mems);
+
+	mutex_lock(&callback_mutex);
+	cpumask_copy(cs->real_cpus_allowed, new_cpus);
+	cs->real_mems_allowed = *new_mems;
+	mutex_unlock(&callback_mutex);
+
+	if (update_cpus)
+		update_tasks_cpumask(cs, NULL);
+	if (update_mems)
+		update_tasks_nodemask(cs, NULL);
+}
+
 /**
  * cpuset_hotplug_update_tasks - update tasks in a cpuset for hotunplug
  * @cs: cpuset in interest
@@ -2144,9 +2215,8 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
  */
 static void cpuset_hotplug_update_tasks(struct cpuset *cs)
 {
-	static cpumask_t off_cpus;
-	static nodemask_t off_mems;
-	bool is_empty;
+	static cpumask_t tmp_cpus;
+	static nodemask_t tmp_mems;
 	bool sane = cgroup_sane_behavior(cs->css.cgroup);
 
 retry:
@@ -2163,67 +2233,11 @@ retry:
 		goto retry;
 	}
 
-	cpumask_andnot(&off_cpus, cs->real_cpus_allowed,
-		       top_cpuset.real_cpus_allowed);
-	nodes_andnot(off_mems, cs->real_mems_allowed,
-		     top_cpuset.real_mems_allowed);
-
-	mutex_lock(&callback_mutex);
-	if (!sane)
-		cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, &off_cpus);
-
-	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed,
-		       &off_cpus);
-	/* Inherite the effective mask of the parent, if it becomes empty */
-	if (cpumask_empty(cs->real_cpus_allowed))
-		cpumask_copy(cs->real_cpus_allowed,
-			     parent_cs(cs)->real_cpus_allowed);
-	mutex_unlock(&callback_mutex);
-
-	/*
-	 * If sane_behavior flag is set, we need to update tasks' cpumask
-	 * for empty cpuset to take on ancestor's cpumask. Otherwise, don't
-	 * call update_tasks_cpumask() if the cpuset becomes empty, as
-	 * the tasks in it will be migrated to an ancestor.
-	 */
-	if ((sane && cpumask_empty(cs->cpus_allowed)) ||
-	    (!cpumask_empty(&off_cpus) && !cpumask_empty(cs->cpus_allowed)))
-		update_tasks_cpumask(cs, NULL);
-
-	mutex_lock(&callback_mutex);
-	if (!sane)
-		nodes_andnot(cs->mems_allowed, cs->mems_allowed, off_mems);
-
-	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, off_mems);
-	/* Inherite the effective mask of the parent, if it becomes empty */
-	if (nodes_empty(cs->real_mems_allowed))
-		cs->real_mems_allowed = parent_cs(cs)->real_mems_allowed;
-	mutex_unlock(&callback_mutex);
-
-	/*
-	 * If sane_behavior flag is set, we need to update tasks' nodemask
-	 * for empty cpuset to take on ancestor's nodemask. Otherwise, don't
-	 * call update_tasks_nodemask() if the cpuset becomes empty, as
-	 * the tasks in it will be migratd to an ancestor.
-	 */
-	if ((sane && nodes_empty(cs->mems_allowed)) ||
-	    (!nodes_empty(off_mems) && !nodes_empty(cs->mems_allowed)))
-		update_tasks_nodemask(cs, NULL);
-
-	is_empty = cpumask_empty(cs->cpus_allowed) ||
-		nodes_empty(cs->mems_allowed);
-
+	if (sane)
+		hotplug_update_tasks_sane(cs, &tmp_cpus, &tmp_mems);
+	else
+		hotplug_update_tasks_insane(cs, &tmp_cpus, &tmp_mems);
 	mutex_unlock(&cpuset_mutex);
-
-	/*
-	 * If sane_behavior flag is set, we'll keep tasks in empty cpusets.
-	 *
-	 * Otherwise move tasks to the nearest ancestor with execution
-	 * resources.  This is full cgroup operation which will
-	 * also call back into cpuset.  Should be done outside any lock.
-	 */
-	if (!sane && is_empty)
-		remove_tasks_in_empty_cpuset(cs);
 }
 
 /**
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 10/11] cpuset: allow writing offlined masks to cpuset.cpus/mems
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
                     ` (8 preceding siblings ...)
  2013-08-21 10:01     ` Li Zefan
@ 2013-08-21 10:01   ` Li Zefan
  2013-08-21 10:01     ` Li Zefan
  2013-08-21 14:21   ` [PATCH 00/11] cpuset: separate configured masks and effective masks Tejun Heo
  11 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

As the configured masks won't be limited by its parent, and the top
cpuset's masks won't change when hotplug happens, it's natural to
allow writing offlined masks to the configured masks.

Signed-off-by; Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 20fc109..c302979 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -933,7 +933,8 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 		if (retval < 0)
 			return retval;
 
-		if (!cpumask_subset(trialcs->cpus_allowed, cpu_active_mask))
+		if (!cpumask_subset(trialcs->cpus_allowed,
+				    top_cpuset.cpus_allowed))
 			return -EINVAL;
 	}
 
@@ -1207,8 +1208,8 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 			goto done;
 
 		if (!nodes_subset(trialcs->mems_allowed,
-				node_states[N_MEMORY])) {
-			retval =  -EINVAL;
+				  top_cpuset.mems_allowed)) {
+			retval = -EINVAL;
 			goto done;
 		}
 	}
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 10/11] cpuset: allow writing offlined masks to cpuset.cpus/mems
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 10:01   ` Li Zefan
  2013-08-21  9:59   ` [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed Li Zefan
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

As the configured masks won't be limited by its parent, and the top
cpuset's masks won't change when hotplug happens, it's natural to
allow writing offlined masks to the configured masks.

Signed-off-by; Li Zefan <lizefan@huawei.com>
---
 kernel/cpuset.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 20fc109..c302979 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -933,7 +933,8 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 		if (retval < 0)
 			return retval;
 
-		if (!cpumask_subset(trialcs->cpus_allowed, cpu_active_mask))
+		if (!cpumask_subset(trialcs->cpus_allowed,
+				    top_cpuset.cpus_allowed))
 			return -EINVAL;
 	}
 
@@ -1207,8 +1208,8 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 			goto done;
 
 		if (!nodes_subset(trialcs->mems_allowed,
-				node_states[N_MEMORY])) {
-			retval =  -EINVAL;
+				  top_cpuset.mems_allowed)) {
+			retval = -EINVAL;
 			goto done;
 		}
 	}
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 10/11] cpuset: allow writing offlined masks to cpuset.cpus/mems
@ 2013-08-21 10:01   ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

As the configured masks won't be limited by its parent, and the top
cpuset's masks won't change when hotplug happens, it's natural to
allow writing offlined masks to the configured masks.

Signed-off-by; Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 20fc109..c302979 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -933,7 +933,8 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 		if (retval < 0)
 			return retval;
 
-		if (!cpumask_subset(trialcs->cpus_allowed, cpu_active_mask))
+		if (!cpumask_subset(trialcs->cpus_allowed,
+				    top_cpuset.cpus_allowed))
 			return -EINVAL;
 	}
 
@@ -1207,8 +1208,8 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
 			goto done;
 
 		if (!nodes_subset(trialcs->mems_allowed,
-				node_states[N_MEMORY])) {
-			retval =  -EINVAL;
+				  top_cpuset.mems_allowed)) {
+			retval = -EINVAL;
 			goto done;
 		}
 	}
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 11/11] cpuset: export effective masks to userspace
  2013-08-21  9:58 ` Li Zefan
@ 2013-08-21 10:01     ` Li Zefan
  -1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

cpuset.cpus and cpuset.mems are the configured masks, and we need
to export effective masks to userspace, so users know the real
cpus_allowed and mems_allowed that apply to the tasks in a cpuset.

cpuset.effective_cpus and cpuset.effective_mems will be created for
sane_behavior only.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cpuset.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index c302979..a4f3483 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1583,6 +1583,8 @@ typedef enum {
 	FILE_MEMORY_MIGRATE,
 	FILE_CPULIST,
 	FILE_MEMLIST,
+	FILE_EFFECTIVE_CPULIST,
+	FILE_EFFECTIVE_MEMLIST,
 	FILE_CPU_EXCLUSIVE,
 	FILE_MEM_EXCLUSIVE,
 	FILE_MEM_HARDWALL,
@@ -1731,23 +1733,23 @@ out_unlock:
  * across a page fault.
  */
 
-static size_t cpuset_sprintf_cpulist(char *page, struct cpuset *cs)
+static size_t cpuset_sprintf_cpulist(char *page, struct cpumask *pmask)
 {
 	size_t count;
 
 	mutex_lock(&callback_mutex);
-	count = cpulist_scnprintf(page, PAGE_SIZE, cs->cpus_allowed);
+	count = cpulist_scnprintf(page, PAGE_SIZE, pmask);
 	mutex_unlock(&callback_mutex);
 
 	return count;
 }
 
-static size_t cpuset_sprintf_memlist(char *page, struct cpuset *cs)
+static size_t cpuset_sprintf_memlist(char *page, nodemask_t mask)
 {
 	size_t count;
 
 	mutex_lock(&callback_mutex);
-	count = nodelist_scnprintf(page, PAGE_SIZE, cs->mems_allowed);
+	count = nodelist_scnprintf(page, PAGE_SIZE, mask);
 	mutex_unlock(&callback_mutex);
 
 	return count;
@@ -1771,10 +1773,16 @@ static ssize_t cpuset_common_file_read(struct cgroup_subsys_state *css,
 
 	switch (type) {
 	case FILE_CPULIST:
-		s += cpuset_sprintf_cpulist(s, cs);
+		s += cpuset_sprintf_cpulist(s, cs->cpus_allowed);
 		break;
 	case FILE_MEMLIST:
-		s += cpuset_sprintf_memlist(s, cs);
+		s += cpuset_sprintf_memlist(s, cs->mems_allowed);
+		break;
+	case FILE_EFFECTIVE_CPULIST:
+		s += cpuset_sprintf_cpulist(s, cs->real_cpus_allowed);
+		break;
+	case FILE_EFFECTIVE_MEMLIST:
+		s += cpuset_sprintf_memlist(s, cs->real_mems_allowed);
 		break;
 	default:
 		retval = -EINVAL;
@@ -1849,6 +1857,14 @@ static struct cftype files[] = {
 	},
 
 	{
+		.name = "effective_cpus",
+		.flags = CFTYPE_SANE,
+		.read = cpuset_common_file_read,
+		.max_write_len = (100U + 6 * NR_CPUS),
+		.private = FILE_EFFECTIVE_CPULIST,
+	},
+
+	{
 		.name = "mems",
 		.read = cpuset_common_file_read,
 		.write_string = cpuset_write_resmask,
@@ -1857,6 +1873,14 @@ static struct cftype files[] = {
 	},
 
 	{
+		.name = "effective_mems",
+		.flags = CFTYPE_SANE,
+		.read = cpuset_common_file_read,
+		.max_write_len = (100U + 6 * MAX_NUMNODES),
+		.private = FILE_EFFECTIVE_MEMLIST,
+	},
+
+	{
 		.name = "cpu_exclusive",
 		.read_u64 = cpuset_read_u64,
 		.write_u64 = cpuset_write_u64,
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 11/11] cpuset: export effective masks to userspace
@ 2013-08-21 10:01     ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21 10:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

cpuset.cpus and cpuset.mems are the configured masks, and we need
to export effective masks to userspace, so users know the real
cpus_allowed and mems_allowed that apply to the tasks in a cpuset.

cpuset.effective_cpus and cpuset.effective_mems will be created for
sane_behavior only.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cpuset.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index c302979..a4f3483 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1583,6 +1583,8 @@ typedef enum {
 	FILE_MEMORY_MIGRATE,
 	FILE_CPULIST,
 	FILE_MEMLIST,
+	FILE_EFFECTIVE_CPULIST,
+	FILE_EFFECTIVE_MEMLIST,
 	FILE_CPU_EXCLUSIVE,
 	FILE_MEM_EXCLUSIVE,
 	FILE_MEM_HARDWALL,
@@ -1731,23 +1733,23 @@ out_unlock:
  * across a page fault.
  */
 
-static size_t cpuset_sprintf_cpulist(char *page, struct cpuset *cs)
+static size_t cpuset_sprintf_cpulist(char *page, struct cpumask *pmask)
 {
 	size_t count;
 
 	mutex_lock(&callback_mutex);
-	count = cpulist_scnprintf(page, PAGE_SIZE, cs->cpus_allowed);
+	count = cpulist_scnprintf(page, PAGE_SIZE, pmask);
 	mutex_unlock(&callback_mutex);
 
 	return count;
 }
 
-static size_t cpuset_sprintf_memlist(char *page, struct cpuset *cs)
+static size_t cpuset_sprintf_memlist(char *page, nodemask_t mask)
 {
 	size_t count;
 
 	mutex_lock(&callback_mutex);
-	count = nodelist_scnprintf(page, PAGE_SIZE, cs->mems_allowed);
+	count = nodelist_scnprintf(page, PAGE_SIZE, mask);
 	mutex_unlock(&callback_mutex);
 
 	return count;
@@ -1771,10 +1773,16 @@ static ssize_t cpuset_common_file_read(struct cgroup_subsys_state *css,
 
 	switch (type) {
 	case FILE_CPULIST:
-		s += cpuset_sprintf_cpulist(s, cs);
+		s += cpuset_sprintf_cpulist(s, cs->cpus_allowed);
 		break;
 	case FILE_MEMLIST:
-		s += cpuset_sprintf_memlist(s, cs);
+		s += cpuset_sprintf_memlist(s, cs->mems_allowed);
+		break;
+	case FILE_EFFECTIVE_CPULIST:
+		s += cpuset_sprintf_cpulist(s, cs->real_cpus_allowed);
+		break;
+	case FILE_EFFECTIVE_MEMLIST:
+		s += cpuset_sprintf_memlist(s, cs->real_mems_allowed);
 		break;
 	default:
 		retval = -EINVAL;
@@ -1849,6 +1857,14 @@ static struct cftype files[] = {
 	},
 
 	{
+		.name = "effective_cpus",
+		.flags = CFTYPE_SANE,
+		.read = cpuset_common_file_read,
+		.max_write_len = (100U + 6 * NR_CPUS),
+		.private = FILE_EFFECTIVE_CPULIST,
+	},
+
+	{
 		.name = "mems",
 		.read = cpuset_common_file_read,
 		.write_string = cpuset_write_resmask,
@@ -1857,6 +1873,14 @@ static struct cftype files[] = {
 	},
 
 	{
+		.name = "effective_mems",
+		.flags = CFTYPE_SANE,
+		.read = cpuset_common_file_read,
+		.max_write_len = (100U + 6 * MAX_NUMNODES),
+		.private = FILE_EFFECTIVE_MEMLIST,
+	},
+
+	{
 		.name = "cpu_exclusive",
 		.read_u64 = cpuset_read_u64,
 		.write_u64 = cpuset_write_u64,
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed
  2013-08-21  9:59   ` Li Zefan
@ 2013-08-21 13:22       ` Tejun Heo
  -1 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 13:22 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

On Wed, Aug 21, 2013 at 05:59:11PM +0800, Li Zefan wrote:
> We're going to have separate user-configured masks and effective ones.
> 
> At last configured masks can only be changed by writing cpuset.cpus

I suppose you mean "eventually" by "at last"?

> and cpuset.mems, and they won't be restricted by parent cpuset. While
> effective masks reflect cpu/memory hotplug and hierachical restriction.
> 
> This patch adds and initializes the effective masks. The effective
> masks of the top cpuset is the same with configured masks, and a child
> cpuset inherites its parent's effective masks.
> 
> This won't introduce behavior change.
> 
> Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> ---
>  kernel/cpuset.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 46 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 70ab3fd..404fea5 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -81,8 +81,14 @@ struct cpuset {
>  	struct cgroup_subsys_state css;
>  
>  	unsigned long flags;		/* "unsigned long" so bitops work */
> -	cpumask_var_t cpus_allowed;	/* CPUs allowed to tasks in cpuset */
> -	nodemask_t mems_allowed;	/* Memory Nodes allowed to tasks */
> +
> +	/* user-configured CPUs and Memory Nodes allow to tasks */
> +	cpumask_var_t cpus_allowed;
> +	nodemask_t mems_allowed;
> +
> +	/* effective CPUs and Memory Nodes allow to tasks */
> +	cpumask_var_t real_cpus_allowed;
> +	nodemask_t real_mems_allowed;

Can we stick to the term "effective"?  If it's too long, we can drop
the "allowed" postfix, which is pretty superflous.  effective_cpus and
effective_mems should work, right?  For local vars, ecpus and emems
should do.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed
@ 2013-08-21 13:22       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 13:22 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

On Wed, Aug 21, 2013 at 05:59:11PM +0800, Li Zefan wrote:
> We're going to have separate user-configured masks and effective ones.
> 
> At last configured masks can only be changed by writing cpuset.cpus

I suppose you mean "eventually" by "at last"?

> and cpuset.mems, and they won't be restricted by parent cpuset. While
> effective masks reflect cpu/memory hotplug and hierachical restriction.
> 
> This patch adds and initializes the effective masks. The effective
> masks of the top cpuset is the same with configured masks, and a child
> cpuset inherites its parent's effective masks.
> 
> This won't introduce behavior change.
> 
> Signed-off-by: Li Zefan <lizefan@huawei.com>
> ---
>  kernel/cpuset.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 46 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 70ab3fd..404fea5 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -81,8 +81,14 @@ struct cpuset {
>  	struct cgroup_subsys_state css;
>  
>  	unsigned long flags;		/* "unsigned long" so bitops work */
> -	cpumask_var_t cpus_allowed;	/* CPUs allowed to tasks in cpuset */
> -	nodemask_t mems_allowed;	/* Memory Nodes allowed to tasks */
> +
> +	/* user-configured CPUs and Memory Nodes allow to tasks */
> +	cpumask_var_t cpus_allowed;
> +	nodemask_t mems_allowed;
> +
> +	/* effective CPUs and Memory Nodes allow to tasks */
> +	cpumask_var_t real_cpus_allowed;
> +	nodemask_t real_mems_allowed;

Can we stick to the term "effective"?  If it's too long, we can drop
the "allowed" postfix, which is pretty superflous.  effective_cpus and
effective_mems should work, right?  For local vars, ecpus and emems
should do.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 04/11] cpuset: update cs->real_{cpus,mems}_allowed when config changes
  2013-08-21  9:59   ` Li Zefan
@ 2013-08-21 13:39       ` Tejun Heo
  -1 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 13:39 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

On Wed, Aug 21, 2013 at 05:59:32PM +0800, Li Zefan wrote:
...
> +	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
> +		struct cpuset *parent = parent_cs(cs);
> +		struct cpumask *new_cpus = trialcs->real_cpus_allowed;
> +
> +		cpumask_and(new_cpus, cp->cpus_allowed,
> +			    parent->real_cpus_allowed);
> +
> +		/*
> +		 * Skip the whole subtree if the cpumask is not changed, unless
> +		 * it's empty, and in this case we need to update tasks to take
> +		 * on an ancestor's cpumask.

Something like the following would be clearer?

"Skip the whole subtree if the cpumask remains the same and isn't
empty.  If empty, we need..."

> @@ -931,7 +941,6 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>  {
>  	struct ptr_heap heap;
>  	int retval;
> -	int is_load_balanced;
>  
>  	/* top_cpuset.cpus_allowed tracks cpu_online_mask; it's read-only */
>  	if (cs == &top_cpuset)
> @@ -966,17 +975,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>  	if (retval)
>  		return retval;
>  
> -	is_load_balanced = is_sched_load_balance(trialcs);
> -
>  	mutex_lock(&callback_mutex);
>  	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
>  	mutex_unlock(&callback_mutex);
>  
> -	update_tasks_cpumask_hier(cs, true, &heap);
> +	update_cpumasks_hier(cs, trialcs, &heap);
>  
>  	heap_free(&heap);
>  
> -	if (is_load_balanced)
> +	if (is_sched_load_balance(cs))
>  		rebuild_sched_domains_locked();

Hmmm... Maybe the above change needs some explanation in the patch
description?

Ooh and @update_root params are gone.  Maybe nice to note that in the
description too?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 04/11] cpuset: update cs->real_{cpus,mems}_allowed when config changes
@ 2013-08-21 13:39       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 13:39 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

On Wed, Aug 21, 2013 at 05:59:32PM +0800, Li Zefan wrote:
...
> +	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
> +		struct cpuset *parent = parent_cs(cs);
> +		struct cpumask *new_cpus = trialcs->real_cpus_allowed;
> +
> +		cpumask_and(new_cpus, cp->cpus_allowed,
> +			    parent->real_cpus_allowed);
> +
> +		/*
> +		 * Skip the whole subtree if the cpumask is not changed, unless
> +		 * it's empty, and in this case we need to update tasks to take
> +		 * on an ancestor's cpumask.

Something like the following would be clearer?

"Skip the whole subtree if the cpumask remains the same and isn't
empty.  If empty, we need..."

> @@ -931,7 +941,6 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>  {
>  	struct ptr_heap heap;
>  	int retval;
> -	int is_load_balanced;
>  
>  	/* top_cpuset.cpus_allowed tracks cpu_online_mask; it's read-only */
>  	if (cs == &top_cpuset)
> @@ -966,17 +975,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>  	if (retval)
>  		return retval;
>  
> -	is_load_balanced = is_sched_load_balance(trialcs);
> -
>  	mutex_lock(&callback_mutex);
>  	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
>  	mutex_unlock(&callback_mutex);
>  
> -	update_tasks_cpumask_hier(cs, true, &heap);
> +	update_cpumasks_hier(cs, trialcs, &heap);
>  
>  	heap_free(&heap);
>  
> -	if (is_load_balanced)
> +	if (is_sched_load_balance(cs))
>  		rebuild_sched_domains_locked();

Hmmm... Maybe the above change needs some explanation in the patch
description?

Ooh and @update_root params are gone.  Maybe nice to note that in the
description too?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/11] cpuset: inherite ancestor's masks if real_{cpus,mems}_allowed become empty
       [not found]     ` <52148F90.7070809-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 13:44       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 13:44 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

Hello,

s/inherite/inherit/ in the subject.

On Wed, Aug 21, 2013 at 05:59:44PM +0800, Li Zefan wrote:
> We're going to have separate user-configured masks and effective ones.
> 
> At last configured masks can only be changed by writing cpuset.cpus
> and cpuset.mems, and they won't be restricted by parent cpuset. While
> effective masks reflect cpu/memory hotplug and hierachical restriction.
> 
> This is a preparation to make real_{cpus,mems}_allowed to be effective
> masks of the cpuset:
> 
> - change the effective masks at hotplug: done
> - change the effective masks at config change: done
> - take on ancestor's mask when the effective mask is empty: done

The above description doesn't really work well.  It looks like this
patch does all three changes.  Can you please update the patch
descriptions so that it's clear what each patch does?

>  		/*
> +		 * If it becomes empty, inherite the effective mask of the
                                        ^
					inherit

> +		 * parent, which is guarantted to have some CPUs.
>  		 */
> -		if (cpumask_equal(new_cpus, cp->real_cpus_allowed) &&
> -		    ((cp == cs) || !cpumask_empty(new_cpus))) {
> +		if (cpumask_empty(new_cpus))
> +			cpumask_copy(new_cpus, parent->real_cpus_allowed);
> +
> +		/* Skip the whole subtree if the cpumask is not changed. */
> +		if (cpumask_equal(new_cpus, cp->real_cpus_allowed)) {
>  			pos_css = css_rightmost_descendant(pos_css);
>  			continue;
>  		}

Ooh, I like how this looks now.  Makes a lot more sense than the logic
before.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/11] cpuset: inherite ancestor's masks if real_{cpus,mems}_allowed become empty
       [not found]     ` <52148F90.7070809-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 13:44       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 13:44 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello,

s/inherite/inherit/ in the subject.

On Wed, Aug 21, 2013 at 05:59:44PM +0800, Li Zefan wrote:
> We're going to have separate user-configured masks and effective ones.
> 
> At last configured masks can only be changed by writing cpuset.cpus
> and cpuset.mems, and they won't be restricted by parent cpuset. While
> effective masks reflect cpu/memory hotplug and hierachical restriction.
> 
> This is a preparation to make real_{cpus,mems}_allowed to be effective
> masks of the cpuset:
> 
> - change the effective masks at hotplug: done
> - change the effective masks at config change: done
> - take on ancestor's mask when the effective mask is empty: done

The above description doesn't really work well.  It looks like this
patch does all three changes.  Can you please update the patch
descriptions so that it's clear what each patch does?

>  		/*
> +		 * If it becomes empty, inherite the effective mask of the
                                        ^
					inherit

> +		 * parent, which is guarantted to have some CPUs.
>  		 */
> -		if (cpumask_equal(new_cpus, cp->real_cpus_allowed) &&
> -		    ((cp == cs) || !cpumask_empty(new_cpus))) {
> +		if (cpumask_empty(new_cpus))
> +			cpumask_copy(new_cpus, parent->real_cpus_allowed);
> +
> +		/* Skip the whole subtree if the cpumask is not changed. */
> +		if (cpumask_equal(new_cpus, cp->real_cpus_allowed)) {
>  			pos_css = css_rightmost_descendant(pos_css);
>  			continue;
>  		}

Ooh, I like how this looks now.  Makes a lot more sense than the logic
before.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/11] cpuset: inherite ancestor's masks if real_{cpus,mems}_allowed become empty
@ 2013-08-21 13:44       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 13:44 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello,

s/inherite/inherit/ in the subject.

On Wed, Aug 21, 2013 at 05:59:44PM +0800, Li Zefan wrote:
> We're going to have separate user-configured masks and effective ones.
> 
> At last configured masks can only be changed by writing cpuset.cpus
> and cpuset.mems, and they won't be restricted by parent cpuset. While
> effective masks reflect cpu/memory hotplug and hierachical restriction.
> 
> This is a preparation to make real_{cpus,mems}_allowed to be effective
> masks of the cpuset:
> 
> - change the effective masks at hotplug: done
> - change the effective masks at config change: done
> - take on ancestor's mask when the effective mask is empty: done

The above description doesn't really work well.  It looks like this
patch does all three changes.  Can you please update the patch
descriptions so that it's clear what each patch does?

>  		/*
> +		 * If it becomes empty, inherite the effective mask of the
                                        ^
					inherit

> +		 * parent, which is guarantted to have some CPUs.
>  		 */
> -		if (cpumask_equal(new_cpus, cp->real_cpus_allowed) &&
> -		    ((cp == cs) || !cpumask_empty(new_cpus))) {
> +		if (cpumask_empty(new_cpus))
> +			cpumask_copy(new_cpus, parent->real_cpus_allowed);
> +
> +		/* Skip the whole subtree if the cpumask is not changed. */
> +		if (cpumask_equal(new_cpus, cp->real_cpus_allowed)) {
>  			pos_css = css_rightmost_descendant(pos_css);
>  			continue;
>  		}

Ooh, I like how this looks now.  Makes a lot more sense than the logic
before.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 06/11] cpuset: apply cs->real_{cpus,mems}_allowed
       [not found]   ` <52148F9C.2080600-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 14:01     ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:01 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

On Wed, Aug 21, 2013 at 05:59:56PM +0800, Li Zefan wrote:
> Now we can use cs->real{cpus,mems}_allowed as effective masks. It's
> used whenever:
> 
> - we update tasks' cpus_allowed/mems_allowed,
> - we want to retrieve tasks_cs(tsk)'s cpus_allowed/mems_allowed.
> 
> They actually replace effective_{cpu,node}mask_cpuset().

I think it'd be great if this and the previous patch descriptions
explain that the effective masks equal configured sans the offline
cpus / mems except when the result is empty, in which case it takes on
the nearest ancestor whose intersection isn't empty, and that the
result equals the computed masks from
effective_{cpu,node}mask_cpuset().

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 06/11] cpuset: apply cs->real_{cpus,mems}_allowed
       [not found]   ` <52148F9C.2080600-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 14:01     ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:01 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

On Wed, Aug 21, 2013 at 05:59:56PM +0800, Li Zefan wrote:
> Now we can use cs->real{cpus,mems}_allowed as effective masks. It's
> used whenever:
> 
> - we update tasks' cpus_allowed/mems_allowed,
> - we want to retrieve tasks_cs(tsk)'s cpus_allowed/mems_allowed.
> 
> They actually replace effective_{cpu,node}mask_cpuset().

I think it'd be great if this and the previous patch descriptions
explain that the effective masks equal configured sans the offline
cpus / mems except when the result is empty, in which case it takes on
the nearest ancestor whose intersection isn't empty, and that the
result equals the computed masks from
effective_{cpu,node}mask_cpuset().

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 06/11] cpuset: apply cs->real_{cpus,mems}_allowed
@ 2013-08-21 14:01     ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:01 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

On Wed, Aug 21, 2013 at 05:59:56PM +0800, Li Zefan wrote:
> Now we can use cs->real{cpus,mems}_allowed as effective masks. It's
> used whenever:
> 
> - we update tasks' cpus_allowed/mems_allowed,
> - we want to retrieve tasks_cs(tsk)'s cpus_allowed/mems_allowed.
> 
> They actually replace effective_{cpu,node}mask_cpuset().

I think it'd be great if this and the previous patch descriptions
explain that the effective masks equal configured sans the offline
cpus / mems except when the result is empty, in which case it takes on
the nearest ancestor whose intersection isn't empty, and that the
result equals the computed masks from
effective_{cpu,node}mask_cpuset().

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 07/11] cpuset: use effective cpumask to build sched domains
  2013-08-21 10:00     ` Li Zefan
@ 2013-08-21 14:04         ` Tejun Heo
  -1 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:04 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

On Wed, Aug 21, 2013 at 06:00:09PM +0800, Li Zefan wrote:
> @@ -887,10 +888,17 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
>  
>  		update_tasks_cpumask(cp, heap);
>  
> +		if (!cpumask_empty(cp->cpus_allowed) &&
> +		    is_sched_load_balance(cp))
> +			need_rebuild_sched_domains = true;
> +
>  		rcu_read_lock();
>  		css_put(&cp->css);
>  	}
>  	rcu_read_unlock();
> +
> +	if (need_rebuild_sched_domains)
> +		rebuild_sched_domains_locked();
>  }
>  
>  /**
> @@ -944,9 +952,6 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>  	update_cpumasks_hier(cs, trialcs, &heap);
>  
>  	heap_free(&heap);
> -
> -	if (is_sched_load_balance(cs))
> -		rebuild_sched_domains_locked();

Hmmm... can we please document what's going on with the above call and
add some comment explaining what it's doing and why it's where it's
at?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 07/11] cpuset: use effective cpumask to build sched domains
@ 2013-08-21 14:04         ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:04 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

On Wed, Aug 21, 2013 at 06:00:09PM +0800, Li Zefan wrote:
> @@ -887,10 +888,17 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpuset *trialcs,
>  
>  		update_tasks_cpumask(cp, heap);
>  
> +		if (!cpumask_empty(cp->cpus_allowed) &&
> +		    is_sched_load_balance(cp))
> +			need_rebuild_sched_domains = true;
> +
>  		rcu_read_lock();
>  		css_put(&cp->css);
>  	}
>  	rcu_read_unlock();
> +
> +	if (need_rebuild_sched_domains)
> +		rebuild_sched_domains_locked();
>  }
>  
>  /**
> @@ -944,9 +952,6 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>  	update_cpumasks_hier(cs, trialcs, &heap);
>  
>  	heap_free(&heap);
> -
> -	if (is_sched_load_balance(cs))
> -		rebuild_sched_domains_locked();

Hmmm... can we please document what's going on with the above call and
add some comment explaining what it's doing and why it's where it's
at?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/11] cpuset: separate configured masks and efffective masks
  2013-08-21 10:00 ` [PATCH 08/11] cpuset: separate configured masks and efffective masks Li Zefan
@ 2013-08-21 14:08       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:08 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

On Wed, Aug 21, 2013 at 06:00:42PM +0800, Li Zefan wrote:
> @@ -2261,7 +2271,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
>  	/* synchronize mems_allowed to N_MEMORY */
>  	if (mems_updated) {
>  		mutex_lock(&callback_mutex);
> -		top_cpuset.mems_allowed = new_mems;
> +		if (!sane)
> +			top_cpuset.mems_allowed = new_mems;

Can you please further explain how the top cgroup behaves?

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/11] cpuset: separate configured masks and efffective masks
@ 2013-08-21 14:08       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:08 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

On Wed, Aug 21, 2013 at 06:00:42PM +0800, Li Zefan wrote:
> @@ -2261,7 +2271,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
>  	/* synchronize mems_allowed to N_MEMORY */
>  	if (mems_updated) {
>  		mutex_lock(&callback_mutex);
> -		top_cpuset.mems_allowed = new_mems;
> +		if (!sane)
> +			top_cpuset.mems_allowed = new_mems;

Can you please further explain how the top cgroup behaves?

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 09/11] cpuset: enable onlined cpu/node in effective masks
       [not found]     ` <52148FE1.3080806-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 14:11       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:11 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

Hello, Li.

On Wed, Aug 21, 2013 at 06:01:05PM +0800, Li Zefan wrote:
> Firstly offline cpu1:
> 
>   # echo 0-1 > cpuset.cpus
>   # echo 0 > /sys/devices/system/cpu/cpu1/online
>   # cat cpuset.cpus
>   0-1
>   # cat cpuset.effective_cpus
>   0
> 
> Then online it:
> 
>   # echo 1 > /sys/devices/system/cpu/cpu1/online
>   # cat cpuset.cpus
>   0-1
>   # cat cpuset.effective_cpus
>   0-1
> 
> And cpuset will bring it back to the effective mask.
> 
> This is a behavior change for sane_behavior.

It'd be great if the patch description also explains "how".

> +static void hotplug_update_tasks_insane(struct cpuset *cs,
> +					struct cpumask *off_cpus,
> +					nodemask_t *off_mems)
> +{
> +	bool is_empty;
> +
> +	cpumask_andnot(off_cpus, cs->real_cpus_allowed,
> +		       top_cpuset.real_cpus_allowed);
> +	nodes_andnot(*off_mems, cs->real_mems_allowed,
> +		     top_cpuset.real_mems_allowed);
> +
> +	mutex_lock(&callback_mutex);
> +	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, off_cpus);
> +	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed, off_cpus);
> +	nodes_andnot(cs->mems_allowed, cs->mems_allowed, *off_mems);
> +	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, *off_mems);
> +	mutex_unlock(&callback_mutex);
> +
> +	/*
> +	 * Don't call update_tasks_cpumask() if the cpuset becomes empty,
> +	 * as the tasks will be migrated to an ancestor.
> +	 */
> +	if (!cpumask_empty(off_cpus) && !cpumask_empty(cs->cpus_allowed))
> +		update_tasks_cpumask(cs, NULL);
> +
> +	if (!nodes_empty(*off_mems) && !cpumask_empty(cs->cpus_allowed))
> +		update_tasks_nodemask(cs, NULL);
> +
> +	is_empty = cpumask_empty(cs->cpus_allowed) ||
> +		   nodes_empty(cs->mems_allowed);
> +
> +	mutex_unlock(&cpuset_mutex);
> +	/*
> +	 * Move tasks to the nearest ancestor with execution resources,
> +	 * This is full cgroup operation which will also call back into
> +	 * cpuset. Should be don outside any lock.
> +	 */
> +	if (is_empty)
> +		remove_tasks_in_empty_cpuset(cs);
> +	mutex_lock(&cpuset_mutex);
> +}

Maybe the patch would be easier to follow if factoring out the above
was in a separate patch?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 09/11] cpuset: enable onlined cpu/node in effective masks
       [not found]     ` <52148FE1.3080806-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 14:11       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:11 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello, Li.

On Wed, Aug 21, 2013 at 06:01:05PM +0800, Li Zefan wrote:
> Firstly offline cpu1:
> 
>   # echo 0-1 > cpuset.cpus
>   # echo 0 > /sys/devices/system/cpu/cpu1/online
>   # cat cpuset.cpus
>   0-1
>   # cat cpuset.effective_cpus
>   0
> 
> Then online it:
> 
>   # echo 1 > /sys/devices/system/cpu/cpu1/online
>   # cat cpuset.cpus
>   0-1
>   # cat cpuset.effective_cpus
>   0-1
> 
> And cpuset will bring it back to the effective mask.
> 
> This is a behavior change for sane_behavior.

It'd be great if the patch description also explains "how".

> +static void hotplug_update_tasks_insane(struct cpuset *cs,
> +					struct cpumask *off_cpus,
> +					nodemask_t *off_mems)
> +{
> +	bool is_empty;
> +
> +	cpumask_andnot(off_cpus, cs->real_cpus_allowed,
> +		       top_cpuset.real_cpus_allowed);
> +	nodes_andnot(*off_mems, cs->real_mems_allowed,
> +		     top_cpuset.real_mems_allowed);
> +
> +	mutex_lock(&callback_mutex);
> +	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, off_cpus);
> +	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed, off_cpus);
> +	nodes_andnot(cs->mems_allowed, cs->mems_allowed, *off_mems);
> +	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, *off_mems);
> +	mutex_unlock(&callback_mutex);
> +
> +	/*
> +	 * Don't call update_tasks_cpumask() if the cpuset becomes empty,
> +	 * as the tasks will be migrated to an ancestor.
> +	 */
> +	if (!cpumask_empty(off_cpus) && !cpumask_empty(cs->cpus_allowed))
> +		update_tasks_cpumask(cs, NULL);
> +
> +	if (!nodes_empty(*off_mems) && !cpumask_empty(cs->cpus_allowed))
> +		update_tasks_nodemask(cs, NULL);
> +
> +	is_empty = cpumask_empty(cs->cpus_allowed) ||
> +		   nodes_empty(cs->mems_allowed);
> +
> +	mutex_unlock(&cpuset_mutex);
> +	/*
> +	 * Move tasks to the nearest ancestor with execution resources,
> +	 * This is full cgroup operation which will also call back into
> +	 * cpuset. Should be don outside any lock.
> +	 */
> +	if (is_empty)
> +		remove_tasks_in_empty_cpuset(cs);
> +	mutex_lock(&cpuset_mutex);
> +}

Maybe the patch would be easier to follow if factoring out the above
was in a separate patch?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 09/11] cpuset: enable onlined cpu/node in effective masks
@ 2013-08-21 14:11       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:11 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello, Li.

On Wed, Aug 21, 2013 at 06:01:05PM +0800, Li Zefan wrote:
> Firstly offline cpu1:
> 
>   # echo 0-1 > cpuset.cpus
>   # echo 0 > /sys/devices/system/cpu/cpu1/online
>   # cat cpuset.cpus
>   0-1
>   # cat cpuset.effective_cpus
>   0
> 
> Then online it:
> 
>   # echo 1 > /sys/devices/system/cpu/cpu1/online
>   # cat cpuset.cpus
>   0-1
>   # cat cpuset.effective_cpus
>   0-1
> 
> And cpuset will bring it back to the effective mask.
> 
> This is a behavior change for sane_behavior.

It'd be great if the patch description also explains "how".

> +static void hotplug_update_tasks_insane(struct cpuset *cs,
> +					struct cpumask *off_cpus,
> +					nodemask_t *off_mems)
> +{
> +	bool is_empty;
> +
> +	cpumask_andnot(off_cpus, cs->real_cpus_allowed,
> +		       top_cpuset.real_cpus_allowed);
> +	nodes_andnot(*off_mems, cs->real_mems_allowed,
> +		     top_cpuset.real_mems_allowed);
> +
> +	mutex_lock(&callback_mutex);
> +	cpumask_andnot(cs->cpus_allowed, cs->cpus_allowed, off_cpus);
> +	cpumask_andnot(cs->real_cpus_allowed, cs->real_cpus_allowed, off_cpus);
> +	nodes_andnot(cs->mems_allowed, cs->mems_allowed, *off_mems);
> +	nodes_andnot(cs->real_mems_allowed, cs->real_mems_allowed, *off_mems);
> +	mutex_unlock(&callback_mutex);
> +
> +	/*
> +	 * Don't call update_tasks_cpumask() if the cpuset becomes empty,
> +	 * as the tasks will be migrated to an ancestor.
> +	 */
> +	if (!cpumask_empty(off_cpus) && !cpumask_empty(cs->cpus_allowed))
> +		update_tasks_cpumask(cs, NULL);
> +
> +	if (!nodes_empty(*off_mems) && !cpumask_empty(cs->cpus_allowed))
> +		update_tasks_nodemask(cs, NULL);
> +
> +	is_empty = cpumask_empty(cs->cpus_allowed) ||
> +		   nodes_empty(cs->mems_allowed);
> +
> +	mutex_unlock(&cpuset_mutex);
> +	/*
> +	 * Move tasks to the nearest ancestor with execution resources,
> +	 * This is full cgroup operation which will also call back into
> +	 * cpuset. Should be don outside any lock.
> +	 */
> +	if (is_empty)
> +		remove_tasks_in_empty_cpuset(cs);
> +	mutex_lock(&cpuset_mutex);
> +}

Maybe the patch would be easier to follow if factoring out the above
was in a separate patch?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/11] cpuset: allow writing offlined masks to cpuset.cpus/mems
       [not found]   ` <52148FF1.5060503-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 14:18     ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:18 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

Hello,

On Wed, Aug 21, 2013 at 06:01:21PM +0800, Li Zefan wrote:
> -		if (!cpumask_subset(trialcs->cpus_allowed, cpu_active_mask))
> +		if (!cpumask_subset(trialcs->cpus_allowed,
> +				    top_cpuset.cpus_allowed))

Hmmm... top_cpuset.cpus_allowed is filled using cpumask_setall(),
which may include more bits than cpu_possible_mask, which is kinda
weird.  We probably wanna initialize it with cpu_possible_mask and
also maybe using cpu_possible_mask in the above would be clearer?

Also, shouldn't this be dependent upon sane_behavior?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/11] cpuset: allow writing offlined masks to cpuset.cpus/mems
       [not found]   ` <52148FF1.5060503-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 14:18     ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:18 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello,

On Wed, Aug 21, 2013 at 06:01:21PM +0800, Li Zefan wrote:
> -		if (!cpumask_subset(trialcs->cpus_allowed, cpu_active_mask))
> +		if (!cpumask_subset(trialcs->cpus_allowed,
> +				    top_cpuset.cpus_allowed))

Hmmm... top_cpuset.cpus_allowed is filled using cpumask_setall(),
which may include more bits than cpu_possible_mask, which is kinda
weird.  We probably wanna initialize it with cpu_possible_mask and
also maybe using cpu_possible_mask in the above would be clearer?

Also, shouldn't this be dependent upon sane_behavior?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/11] cpuset: allow writing offlined masks to cpuset.cpus/mems
@ 2013-08-21 14:18     ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:18 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello,

On Wed, Aug 21, 2013 at 06:01:21PM +0800, Li Zefan wrote:
> -		if (!cpumask_subset(trialcs->cpus_allowed, cpu_active_mask))
> +		if (!cpumask_subset(trialcs->cpus_allowed,
> +				    top_cpuset.cpus_allowed))

Hmmm... top_cpuset.cpus_allowed is filled using cpumask_setall(),
which may include more bits than cpu_possible_mask, which is kinda
weird.  We probably wanna initialize it with cpu_possible_mask and
also maybe using cpu_possible_mask in the above would be clearer?

Also, shouldn't this be dependent upon sane_behavior?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/11] cpuset: export effective masks to userspace
       [not found]     ` <52148FFC.4080701-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 14:20       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:20 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

On Wed, Aug 21, 2013 at 06:01:32PM +0800, Li Zefan wrote:
>  	{
> +		.name = "effective_cpus",
> +		.flags = CFTYPE_SANE,
> +		.read = cpuset_common_file_read,
> +		.max_write_len = (100U + 6 * NR_CPUS),
> +		.private = FILE_EFFECTIVE_CPULIST,

I don't think we need CFTYPE_SANE.  We can just expose these
unconditionally, right?  It still means the same thing when !sane.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/11] cpuset: export effective masks to userspace
       [not found]     ` <52148FFC.4080701-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 14:20       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:20 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

On Wed, Aug 21, 2013 at 06:01:32PM +0800, Li Zefan wrote:
>  	{
> +		.name = "effective_cpus",
> +		.flags = CFTYPE_SANE,
> +		.read = cpuset_common_file_read,
> +		.max_write_len = (100U + 6 * NR_CPUS),
> +		.private = FILE_EFFECTIVE_CPULIST,

I don't think we need CFTYPE_SANE.  We can just expose these
unconditionally, right?  It still means the same thing when !sane.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/11] cpuset: export effective masks to userspace
@ 2013-08-21 14:20       ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:20 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

On Wed, Aug 21, 2013 at 06:01:32PM +0800, Li Zefan wrote:
>  	{
> +		.name = "effective_cpus",
> +		.flags = CFTYPE_SANE,
> +		.read = cpuset_common_file_read,
> +		.max_write_len = (100U + 6 * NR_CPUS),
> +		.private = FILE_EFFECTIVE_CPULIST,

I don't think we need CFTYPE_SANE.  We can just expose these
unconditionally, right?  It still means the same thing when !sane.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/11] cpuset: separate configured masks and effective masks
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
                     ` (10 preceding siblings ...)
  2013-08-21 10:01     ` Li Zefan
@ 2013-08-21 14:21   ` Tejun Heo
  11 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:21 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

Hello, Li.

On Wed, Aug 21, 2013 at 05:58:42PM +0800, Li Zefan wrote:
> This patcheset introduces behavior changes, but only if you mount cgroupfs
> with sane_behavior option:
> 
> - We introduce new interfaces cpuset.effective_cpus and cpuset.effective_mems,
>   while cpuset.cpus and cpuset.mems will be configured masks.
> 
> - The configured masks can be changed by writing cpuset.cpus/mems only. They
>   won't be changed when hotplug happens.
> 
> - Users can config cpus and mems without restrictions from the parent cpuset.
>   effective masks will enforce the hierarchical behavior.
> 
> - Users can also config cpus and mems to have already offlined CPU/nodes.
> 
> - When a CPU/node is onlined, it will be brought back to the effective masks
>   if it's in the configured masks.
> 
> - We build sched domains based on effective cpumask but not configured cpumask.

Overall, it looks great.  Thank you so much for doing this.  Most of
my reviews are concerned with patch descriptions and documentation.  I
wish they showed clearer what are changed how and why that doesn't
affect !sane behaviors.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/11] cpuset: separate configured masks and effective masks
       [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-21 14:21   ` Tejun Heo
  2013-08-21  9:59   ` [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed Li Zefan
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:21 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello, Li.

On Wed, Aug 21, 2013 at 05:58:42PM +0800, Li Zefan wrote:
> This patcheset introduces behavior changes, but only if you mount cgroupfs
> with sane_behavior option:
> 
> - We introduce new interfaces cpuset.effective_cpus and cpuset.effective_mems,
>   while cpuset.cpus and cpuset.mems will be configured masks.
> 
> - The configured masks can be changed by writing cpuset.cpus/mems only. They
>   won't be changed when hotplug happens.
> 
> - Users can config cpus and mems without restrictions from the parent cpuset.
>   effective masks will enforce the hierarchical behavior.
> 
> - Users can also config cpus and mems to have already offlined CPU/nodes.
> 
> - When a CPU/node is onlined, it will be brought back to the effective masks
>   if it's in the configured masks.
> 
> - We build sched domains based on effective cpumask but not configured cpumask.

Overall, it looks great.  Thank you so much for doing this.  Most of
my reviews are concerned with patch descriptions and documentation.  I
wish they showed clearer what are changed how and why that doesn't
affect !sane behaviors.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/11] cpuset: separate configured masks and effective masks
@ 2013-08-21 14:21   ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-21 14:21 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello, Li.

On Wed, Aug 21, 2013 at 05:58:42PM +0800, Li Zefan wrote:
> This patcheset introduces behavior changes, but only if you mount cgroupfs
> with sane_behavior option:
> 
> - We introduce new interfaces cpuset.effective_cpus and cpuset.effective_mems,
>   while cpuset.cpus and cpuset.mems will be configured masks.
> 
> - The configured masks can be changed by writing cpuset.cpus/mems only. They
>   won't be changed when hotplug happens.
> 
> - Users can config cpus and mems without restrictions from the parent cpuset.
>   effective masks will enforce the hierarchical behavior.
> 
> - Users can also config cpus and mems to have already offlined CPU/nodes.
> 
> - When a CPU/node is onlined, it will be brought back to the effective masks
>   if it's in the configured masks.
> 
> - We build sched domains based on effective cpumask but not configured cpumask.

Overall, it looks great.  Thank you so much for doing this.  Most of
my reviews are concerned with patch descriptions and documentation.  I
wish they showed clearer what are changed how and why that doesn't
affect !sane behaviors.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/11] cpuset: allow writing offlined masks to cpuset.cpus/mems
  2013-08-21 14:18     ` Tejun Heo
@ 2013-08-23  7:37         ` Li Zefan
  -1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-23  7:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

On 2013/8/21 22:18, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 21, 2013 at 06:01:21PM +0800, Li Zefan wrote:
>> -		if (!cpumask_subset(trialcs->cpus_allowed, cpu_active_mask))
>> +		if (!cpumask_subset(trialcs->cpus_allowed,
>> +				    top_cpuset.cpus_allowed))
> 
> Hmmm... top_cpuset.cpus_allowed is filled using cpumask_setall(),
> which may include more bits than cpu_possible_mask, which is kinda
> weird.  We probably wanna initialize it with cpu_possible_mask and
> also maybe using cpu_possible_mask in the above would be clearer?

In cpuset_init(), all the bits in cpus_allowed are set. Then in
cpuset_init_smp(), it's set to cpu_active_mask.

so we should set top_cpuset.cpus_allowed to possible_mask if mount
with sane_behavior, otherwise set it to active_mask.

> 
> Also, shouldn't this be dependent upon sane_behavior?
> 

We already treat top_cpus.cpus_allowed differently depending on
save_behavior, so the if statement works correctly.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/11] cpuset: allow writing offlined masks to cpuset.cpus/mems
@ 2013-08-23  7:37         ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-23  7:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

On 2013/8/21 22:18, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 21, 2013 at 06:01:21PM +0800, Li Zefan wrote:
>> -		if (!cpumask_subset(trialcs->cpus_allowed, cpu_active_mask))
>> +		if (!cpumask_subset(trialcs->cpus_allowed,
>> +				    top_cpuset.cpus_allowed))
> 
> Hmmm... top_cpuset.cpus_allowed is filled using cpumask_setall(),
> which may include more bits than cpu_possible_mask, which is kinda
> weird.  We probably wanna initialize it with cpu_possible_mask and
> also maybe using cpu_possible_mask in the above would be clearer?

In cpuset_init(), all the bits in cpus_allowed are set. Then in
cpuset_init_smp(), it's set to cpu_active_mask.

so we should set top_cpuset.cpus_allowed to possible_mask if mount
with sane_behavior, otherwise set it to active_mask.

> 
> Also, shouldn't this be dependent upon sane_behavior?
> 

We already treat top_cpus.cpus_allowed differently depending on
save_behavior, so the if statement works correctly.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/11] cpuset: separate configured masks and efffective masks
       [not found]       ` <20130821140846.GH19286-9pTldWuhBndy/B6EtB590w@public.gmane.org>
@ 2013-08-23  7:46         ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-23  7:46 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

On 2013/8/21 22:08, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 06:00:42PM +0800, Li Zefan wrote:
>> @@ -2261,7 +2271,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
>>  	/* synchronize mems_allowed to N_MEMORY */
>>  	if (mems_updated) {
>>  		mutex_lock(&callback_mutex);
>> -		top_cpuset.mems_allowed = new_mems;
>> +		if (!sane)
>> +			top_cpuset.mems_allowed = new_mems;
> 
> Can you please further explain how the top cgroup behaves?
> 

top_cpuset.cpus_allowed will always be cpu_active_mask if sane_behavior
is not set, otherwise it will always be cpu_possible_mask. While
top_cpuset.effective_cpus will always be cpu_active_mask in either
case.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/11] cpuset: separate configured masks and efffective masks
       [not found]       ` <20130821140846.GH19286-9pTldWuhBndy/B6EtB590w@public.gmane.org>
@ 2013-08-23  7:46         ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-23  7:46 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

On 2013/8/21 22:08, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 06:00:42PM +0800, Li Zefan wrote:
>> @@ -2261,7 +2271,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
>>  	/* synchronize mems_allowed to N_MEMORY */
>>  	if (mems_updated) {
>>  		mutex_lock(&callback_mutex);
>> -		top_cpuset.mems_allowed = new_mems;
>> +		if (!sane)
>> +			top_cpuset.mems_allowed = new_mems;
> 
> Can you please further explain how the top cgroup behaves?
> 

top_cpuset.cpus_allowed will always be cpu_active_mask if sane_behavior
is not set, otherwise it will always be cpu_possible_mask. While
top_cpuset.effective_cpus will always be cpu_active_mask in either
case.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/11] cpuset: separate configured masks and efffective masks
@ 2013-08-23  7:46         ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-23  7:46 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

On 2013/8/21 22:08, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 06:00:42PM +0800, Li Zefan wrote:
>> @@ -2261,7 +2271,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
>>  	/* synchronize mems_allowed to N_MEMORY */
>>  	if (mems_updated) {
>>  		mutex_lock(&callback_mutex);
>> -		top_cpuset.mems_allowed = new_mems;
>> +		if (!sane)
>> +			top_cpuset.mems_allowed = new_mems;
> 
> Can you please further explain how the top cgroup behaves?
> 

top_cpuset.cpus_allowed will always be cpu_active_mask if sane_behavior
is not set, otherwise it will always be cpu_possible_mask. While
top_cpuset.effective_cpus will always be cpu_active_mask in either
case.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/11] cpuset: export effective masks to userspace
  2013-08-21 14:20       ` Tejun Heo
@ 2013-08-23  7:53           ` Li Zefan
  -1 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-23  7:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

On 2013/8/21 22:20, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 06:01:32PM +0800, Li Zefan wrote:
>>  	{
>> +		.name = "effective_cpus",
>> +		.flags = CFTYPE_SANE,
>> +		.read = cpuset_common_file_read,
>> +		.max_write_len = (100U + 6 * NR_CPUS),
>> +		.private = FILE_EFFECTIVE_CPULIST,
> 
> I don't think we need CFTYPE_SANE.  We can just expose these
> unconditionally, right?  It still means the same thing when !sane.
> 

It seems confusing if there're two interaces but they actually mean
the same thing.

Another reason I didn't do this is, they're not always the same. When
!sane, If cpus_allowed is empty, effective_cpus is not empty, and you
are not able to put tasks into this cpuset. So if we want to  expose
it unconditionally, I'll make sure cpus_allowed == effective_cpus
always stand when !sane.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/11] cpuset: export effective masks to userspace
@ 2013-08-23  7:53           ` Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-23  7:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Containers

On 2013/8/21 22:20, Tejun Heo wrote:
> On Wed, Aug 21, 2013 at 06:01:32PM +0800, Li Zefan wrote:
>>  	{
>> +		.name = "effective_cpus",
>> +		.flags = CFTYPE_SANE,
>> +		.read = cpuset_common_file_read,
>> +		.max_write_len = (100U + 6 * NR_CPUS),
>> +		.private = FILE_EFFECTIVE_CPULIST,
> 
> I don't think we need CFTYPE_SANE.  We can just expose these
> unconditionally, right?  It still means the same thing when !sane.
> 

It seems confusing if there're two interaces but they actually mean
the same thing.

Another reason I didn't do this is, they're not always the same. When
!sane, If cpus_allowed is empty, effective_cpus is not empty, and you
are not able to put tasks into this cpuset. So if we want to  expose
it unconditionally, I'll make sure cpus_allowed == effective_cpus
always stand when !sane.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/11] cpuset: export effective masks to userspace
  2013-08-23  7:53           ` Li Zefan
@ 2013-08-23 12:34               ` Tejun Heo
  -1 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-23 12:34 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

Hello,

On Fri, Aug 23, 2013 at 03:53:37PM +0800, Li Zefan wrote:
> It seems confusing if there're two interaces but they actually mean
> the same thing.
>
> Another reason I didn't do this is, they're not always the same. When
> !sane, If cpus_allowed is empty, effective_cpus is not empty, and you
> are not able to put tasks into this cpuset. So if we want to  expose
> it unconditionally, I'll make sure cpus_allowed == effective_cpus
> always stand when !sane.

Do you think that'll be convoluted?  Otherwise, sounds like a nice
thing to do anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/11] cpuset: export effective masks to userspace
@ 2013-08-23 12:34               ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-23 12:34 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello,

On Fri, Aug 23, 2013 at 03:53:37PM +0800, Li Zefan wrote:
> It seems confusing if there're two interaces but they actually mean
> the same thing.
>
> Another reason I didn't do this is, they're not always the same. When
> !sane, If cpus_allowed is empty, effective_cpus is not empty, and you
> are not able to put tasks into this cpuset. So if we want to  expose
> it unconditionally, I'll make sure cpus_allowed == effective_cpus
> always stand when !sane.

Do you think that'll be convoluted?  Otherwise, sounds like a nice
thing to do anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/11] cpuset: separate configured masks and efffective masks
       [not found]         ` <52171367.90005-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-23 15:33           ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-23 15:33 UTC (permalink / raw)
  To: Li Zefan; +Cc: Cgroups, Containers, LKML

Hello, Li.

On Fri, Aug 23, 2013 at 03:46:47PM +0800, Li Zefan wrote:
> On 2013/8/21 22:08, Tejun Heo wrote:
> > On Wed, Aug 21, 2013 at 06:00:42PM +0800, Li Zefan wrote:
> >> @@ -2261,7 +2271,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
> >>  	/* synchronize mems_allowed to N_MEMORY */
> >>  	if (mems_updated) {
> >>  		mutex_lock(&callback_mutex);
> >> -		top_cpuset.mems_allowed = new_mems;
> >> +		if (!sane)
> >> +			top_cpuset.mems_allowed = new_mems;
> > 
> > Can you please further explain how the top cgroup behaves?
> > 
> 
> top_cpuset.cpus_allowed will always be cpu_active_mask if sane_behavior
> is not set, otherwise it will always be cpu_possible_mask. While
> top_cpuset.effective_cpus will always be cpu_active_mask in either
> case.

Just in case it wasn't clear, it'd be great if you can also explain
what's going on w.r.t. sane_behavior in the comments and patch
description.  Having dual modes of operation can always be quite
confusing so I think some documentation could be very beneficial.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/11] cpuset: separate configured masks and efffective masks
       [not found]         ` <52171367.90005-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-08-23 15:33           ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-23 15:33 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello, Li.

On Fri, Aug 23, 2013 at 03:46:47PM +0800, Li Zefan wrote:
> On 2013/8/21 22:08, Tejun Heo wrote:
> > On Wed, Aug 21, 2013 at 06:00:42PM +0800, Li Zefan wrote:
> >> @@ -2261,7 +2271,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
> >>  	/* synchronize mems_allowed to N_MEMORY */
> >>  	if (mems_updated) {
> >>  		mutex_lock(&callback_mutex);
> >> -		top_cpuset.mems_allowed = new_mems;
> >> +		if (!sane)
> >> +			top_cpuset.mems_allowed = new_mems;
> > 
> > Can you please further explain how the top cgroup behaves?
> > 
> 
> top_cpuset.cpus_allowed will always be cpu_active_mask if sane_behavior
> is not set, otherwise it will always be cpu_possible_mask. While
> top_cpuset.effective_cpus will always be cpu_active_mask in either
> case.

Just in case it wasn't clear, it'd be great if you can also explain
what's going on w.r.t. sane_behavior in the comments and patch
description.  Having dual modes of operation can always be quite
confusing so I think some documentation could be very beneficial.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/11] cpuset: separate configured masks and efffective masks
@ 2013-08-23 15:33           ` Tejun Heo
  0 siblings, 0 replies; 66+ messages in thread
From: Tejun Heo @ 2013-08-23 15:33 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Containers

Hello, Li.

On Fri, Aug 23, 2013 at 03:46:47PM +0800, Li Zefan wrote:
> On 2013/8/21 22:08, Tejun Heo wrote:
> > On Wed, Aug 21, 2013 at 06:00:42PM +0800, Li Zefan wrote:
> >> @@ -2261,7 +2271,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
> >>  	/* synchronize mems_allowed to N_MEMORY */
> >>  	if (mems_updated) {
> >>  		mutex_lock(&callback_mutex);
> >> -		top_cpuset.mems_allowed = new_mems;
> >> +		if (!sane)
> >> +			top_cpuset.mems_allowed = new_mems;
> > 
> > Can you please further explain how the top cgroup behaves?
> > 
> 
> top_cpuset.cpus_allowed will always be cpu_active_mask if sane_behavior
> is not set, otherwise it will always be cpu_possible_mask. While
> top_cpuset.effective_cpus will always be cpu_active_mask in either
> case.

Just in case it wasn't clear, it'd be great if you can also explain
what's going on w.r.t. sane_behavior in the comments and patch
description.  Having dual modes of operation can always be quite
confusing so I think some documentation could be very beneficial.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 00/11] cpuset: separate configured masks and effective masks
@ 2013-08-21  9:58 Li Zefan
  0 siblings, 0 replies; 66+ messages in thread
From: Li Zefan @ 2013-08-21  9:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Cgroups, Containers, LKML

This patcheset introduces behavior changes, but only if you mount cgroupfs
with sane_behavior option:

- We introduce new interfaces cpuset.effective_cpus and cpuset.effective_mems,
  while cpuset.cpus and cpuset.mems will be configured masks.

- The configured masks can be changed by writing cpuset.cpus/mems only. They
  won't be changed when hotplug happens.

- Users can config cpus and mems without restrictions from the parent cpuset.
  effective masks will enforce the hierarchical behavior.

- Users can also config cpus and mems to have already offlined CPU/nodes.

- When a CPU/node is onlined, it will be brought back to the effective masks
  if it's in the configured masks.

- We build sched domains based on effective cpumask but not configured cpumask.

Li Zefan (11):
  cgroup: allow subsystems to create files for sane_behavior only
  cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed
  cpuset: update cpuset->real_{cpus,mems}_allowed at hotplug
  cpuset: update cs->real_{cpus,mems}_allowed when config changes
  cpuset: inherite ancestor's masks if real_{cpus,mems}_allowed become empty
  cpuset: apply cs->real_{cpus,mems}_allowed
  cpuset: use effective cpumask to build sched domains
  cpuset: separate configured masks and efffective masks
  cpuset: enable onlined cpu/node in effective masks
  cpuset: allow writing offlined masks to cpuset.cpus/mems
  cpuset: export effective masks to userspace

 include/linux/cgroup.h |   1 +
 kernel/cgroup.c        |   2 +
 kernel/cpuset.c        | 466 ++++++++++++++++++++++++++++---------------------
 3 files changed, 271 insertions(+), 198 deletions(-)

-- 
1.8.0.2

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2013-08-23 15:33 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-21  9:58 [PATCH 00/11] cpuset: separate configured masks and effective masks Li Zefan
2013-08-21  9:58 ` Li Zefan
2013-08-21  9:58 ` [PATCH 01/11] cgroup: allow subsystems to create files for sane_behavior only Li Zefan
2013-08-21  9:59 ` [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed Li Zefan
2013-08-21  9:59   ` Li Zefan
     [not found]   ` <52148F6F.4070507-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-21 13:22     ` Tejun Heo
2013-08-21 13:22       ` Tejun Heo
2013-08-21  9:59 ` [PATCH 03/11] cpuset: update cpuset->real_{cpus,mems}_allowed at hotplug Li Zefan
2013-08-21  9:59 ` [PATCH 04/11] cpuset: update cs->real_{cpus,mems}_allowed when config changes Li Zefan
2013-08-21  9:59   ` Li Zefan
     [not found]   ` <52148F84.9050309-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-21 13:39     ` Tejun Heo
2013-08-21 13:39       ` Tejun Heo
     [not found] ` <52148F52.0-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-21  9:58   ` [PATCH 01/11] cgroup: allow subsystems to create files for sane_behavior only Li Zefan
2013-08-21  9:59   ` [PATCH 02/11] cpuset: add cs->real_cpus_allowed and cs->real_mems_allowed Li Zefan
2013-08-21  9:59   ` [PATCH 03/11] cpuset: update cpuset->real_{cpus,mems}_allowed at hotplug Li Zefan
2013-08-21  9:59   ` [PATCH 04/11] cpuset: update cs->real_{cpus, mems}_allowed when config changes Li Zefan
2013-08-21  9:59   ` [PATCH 05/11] cpuset: inherite ancestor's masks if real_{cpus, mems}_allowed become empty Li Zefan
2013-08-21  9:59     ` [PATCH 05/11] cpuset: inherite ancestor's masks if real_{cpus,mems}_allowed " Li Zefan
2013-08-21 13:44     ` Tejun Heo
2013-08-21 13:44       ` Tejun Heo
     [not found]     ` <52148F90.7070809-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-21 13:44       ` Tejun Heo
2013-08-21  9:59   ` [PATCH 06/11] cpuset: apply cs->real_{cpus,mems}_allowed Li Zefan
2013-08-21 10:00   ` [PATCH 07/11] cpuset: use effective cpumask to build sched domains Li Zefan
2013-08-21 10:00     ` Li Zefan
     [not found]     ` <52148FA9.806-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-21 14:04       ` Tejun Heo
2013-08-21 14:04         ` Tejun Heo
2013-08-21 10:00   ` [PATCH 08/11] cpuset: separate configured masks and efffective masks Li Zefan
2013-08-21 10:01   ` [PATCH 09/11] cpuset: enable onlined cpu/node in effective masks Li Zefan
2013-08-21 10:01     ` Li Zefan
2013-08-21 14:11     ` Tejun Heo
2013-08-21 14:11       ` Tejun Heo
     [not found]     ` <52148FE1.3080806-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-21 14:11       ` Tejun Heo
2013-08-21 10:01   ` [PATCH 10/11] cpuset: allow writing offlined masks to cpuset.cpus/mems Li Zefan
2013-08-21 10:01   ` [PATCH 11/11] cpuset: export effective masks to userspace Li Zefan
2013-08-21 10:01     ` Li Zefan
2013-08-21 14:20     ` Tejun Heo
2013-08-21 14:20       ` Tejun Heo
     [not found]       ` <20130821142001.GK19286-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-08-23  7:53         ` Li Zefan
2013-08-23  7:53           ` Li Zefan
     [not found]           ` <52171501.8050401-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-23 12:34             ` Tejun Heo
2013-08-23 12:34               ` Tejun Heo
     [not found]     ` <52148FFC.4080701-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-21 14:20       ` Tejun Heo
2013-08-21 14:21   ` [PATCH 00/11] cpuset: separate configured masks and effective masks Tejun Heo
2013-08-21  9:59 ` [PATCH 06/11] cpuset: apply cs->real_{cpus,mems}_allowed Li Zefan
2013-08-21 14:01   ` Tejun Heo
2013-08-21 14:01     ` Tejun Heo
     [not found]   ` <52148F9C.2080600-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-21 14:01     ` Tejun Heo
2013-08-21 10:00 ` [PATCH 08/11] cpuset: separate configured masks and efffective masks Li Zefan
     [not found]   ` <52148FCA.8010704-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-21 14:08     ` Tejun Heo
2013-08-21 14:08       ` Tejun Heo
     [not found]       ` <20130821140846.GH19286-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-08-23  7:46         ` Li Zefan
2013-08-23  7:46       ` Li Zefan
2013-08-23  7:46         ` Li Zefan
2013-08-23 15:33         ` Tejun Heo
2013-08-23 15:33           ` Tejun Heo
     [not found]         ` <52171367.90005-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-23 15:33           ` Tejun Heo
2013-08-21 10:01 ` [PATCH 10/11] cpuset: allow writing offlined masks to cpuset.cpus/mems Li Zefan
2013-08-21 10:01   ` Li Zefan
2013-08-21 14:18   ` Tejun Heo
2013-08-21 14:18     ` Tejun Heo
     [not found]     ` <20130821141851.GJ19286-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-08-23  7:37       ` Li Zefan
2013-08-23  7:37         ` Li Zefan
     [not found]   ` <52148FF1.5060503-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-08-21 14:18     ` Tejun Heo
2013-08-21 14:21 ` [PATCH 00/11] cpuset: separate configured masks and effective masks Tejun Heo
2013-08-21 14:21   ` Tejun Heo
2013-08-21  9:58 Li Zefan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.