* [PATCH v7 0/5] cpuset: Enable cpuset controller in default hierarchy
From: Waiman Long @ 2018-04-19 13:46 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Waiman Long

v7:
 - Add a root-only cpuset.cpus.isolated control file for CPU isolation.
 - Enforce that load balancing can only be turned off on cpusets with
   CPUs from the isolated list.
 - Update sched domain generation to allow cpusets with CPUs only
   from the isolated CPU list to be in separate root domains.

v6:
 - Hide cpuset control knobs in root cgroup.
 - Rename effective_cpus and effective_mems to cpus.effective and
   mems.effective respectively.
 - Remove cpuset.flags and add cpuset.sched_load_balance instead
   as the behavior of sched_load_balance has changed and so is
   not a simple flag.
 - Update cgroup-v2.txt accordingly.

v5:
 - Add patch 2 to provide the cpuset.flags control knob for the
   sched_load_balance flag, which should be the only feature that is
   essential as a replacement for the "isolcpus" kernel boot parameter.

v4:
 - Further minimize the feature set by removing the flags control knob.

v3:
 - Further trim the additional features down to just memory_migrate.
 - Update Documentation/cgroup-v2.txt.

v6 patch: https://lkml.org/lkml/2018/3/21/530

The purpose of this patchset is to provide a basic set of cpuset
features for cgroup v2. This basic set includes the non-root "cpus",
"mems", "cpus.effective", "mems.effective" and "sched_load_balance"
control files, as well as a root-only "cpus.isolated" file.

The root-only "cpus.isolated" file is added to support use cases similar
to the "isolcpus" kernel parameter. CPUs from the isolated list can be
put into child cpusets where "sched_load_balance" can be disabled to
allow finer control of task-cpu mappings of those isolated CPUs.

On the other hand, enabling "sched_load_balance" on a cpuset with
only CPUs from the isolated list will allow those CPUs to use a
separate root domain from that of the root cpuset.

This patchset does not exclude the possibility of adding more features
in the future after careful consideration.

Patch 1 enables cpuset in cgroup v2 with cpus, mems and their
effective counterparts.

Patch 2 adds sched_load_balance, whose v2 behavior is hierarchical
and implies !cpu_exclusive.

Patch 3 adds a new root-only "cpuset.cpus.isolated" control file
for CPU isolation purposes.

Patch 4 adds the limitation that "sched_load_balance" can only be turned
off in a cpuset if all the CPUs in the cpuset are already in the root's
"cpuset.cpus.isolated".

Patch 5 modifies the sched domain generation code to generate
separate root sched domains if all the CPUs in a cpuset come from
"cpuset.cpus.isolated".

In other words, all the CPUs that need to be isolated or placed in
separate root domains have to be put into "cpuset.cpus.isolated"
first. Child cpusets can then be created to partition those isolated
CPUs into either separate root domains with "sched_load_balance" on
or truly isolated CPUs with "sched_load_balance" off. Load balancing
cannot be turned off at the root.
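
As an illustration (not part of any patch), the flow above might look
like the following on a system with a cgroup v2 hierarchy mounted at
/sys/fs/cgroup. The child cgroup names are made up for the example:

  # Enable the cpuset controller for child cgroups.
  echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control

  # Withdraw CPUs 4-7 from load balancing in the root cpuset. This
  # is done before any child cpuset claims them.
  echo 4-7 > /sys/fs/cgroup/cpuset.cpus.isolated

  # CPUs 4-5 get their own root domain: a child cpuset holding only
  # isolated CPUs with load balancing left on.
  mkdir /sys/fs/cgroup/rt-part
  echo 4-5 > /sys/fs/cgroup/rt-part/cpuset.cpus

  # CPUs 6-7 stay truly isolated: load balancing is turned off, so
  # tasks remain on whatever CPU they are placed on.
  mkdir /sys/fs/cgroup/iso-part
  echo 6-7 > /sys/fs/cgroup/iso-part/cpuset.cpus
  echo 0 > /sys/fs/cgroup/iso-part/cpuset.sched_load_balance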

Waiman Long (5):
  cpuset: Enable cpuset controller in default hierarchy
  cpuset: Add cpuset.sched_load_balance to v2
  cpuset: Add a root-only cpus.isolated v2 control file
  cpuset: Restrict load balancing off cpus to subset of cpus.isolated
  cpuset: Make generate_sched_domains() recognize isolated_cpus

 Documentation/cgroup-v2.txt | 138 ++++++++++++++++++++-
 kernel/cgroup/cpuset.c      | 287 +++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 404 insertions(+), 21 deletions(-)

-- 
1.8.3.1

* [PATCH v7 1/5] cpuset: Enable cpuset controller in default hierarchy
From: Waiman Long @ 2018-04-19 13:47 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Waiman Long

Given that thread mode has been merged into 4.14, it is now time to
enable cpuset to be used in the default hierarchy (cgroup v2), as it
is clearly a threaded controller.

The cpuset controller has experienced feature creep since its
introduction more than a decade ago. Besides the core cpus and mems
control files that limit CPUs and memory nodes, there are a number
of additional features that can be controlled from userspace. Some
of these features are of doubtful usefulness and may not be actively
used.

This patch enables the cpuset controller in the default hierarchy
with a minimal set of features, namely just cpus and mems and their
effective_* counterparts.  More features can certainly be added to
the default hierarchy in the future if there is a real user need
for them.

Alternatively, with the unified hierarchy, it may make more sense
to move some of those additional cpuset features, if desired, to the
memory controller or perhaps the cpu controller instead of keeping
them in cpuset.
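
For illustration, a minimal session exercising the new v2 files,
assuming a cgroup v2 mount at /sys/fs/cgroup (the child name "test"
is arbitrary):

  # Enable the cpuset controller for child cgroups.
  echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control

  # Constrain a child cgroup to CPUs 0-1 and memory node 0.
  mkdir /sys/fs/cgroup/test
  echo 0-1 > /sys/fs/cgroup/test/cpuset.cpus
  echo 0 > /sys/fs/cgroup/test/cpuset.mems

  # The effective files show what is actually usable after ancestor
  # constraints and hotplug events are applied.
  cat /sys/fs/cgroup/test/cpuset.cpus.effective
  cat /sys/fs/cgroup/test/cpuset.mems.effective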

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt | 90 ++++++++++++++++++++++++++++++++++++++++++---
 kernel/cgroup/cpuset.c      | 48 ++++++++++++++++++++++--
 2 files changed, 130 insertions(+), 8 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 74cdeae..ed8ec66 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -53,11 +53,13 @@ v1 is available under Documentation/cgroup-v1/.
        5-3-2. Writeback
      5-4. PID
        5-4-1. PID Interface Files
-     5-5. Device
-     5-6. RDMA
-       5-6-1. RDMA Interface Files
-     5-7. Misc
-       5-7-1. perf_event
+     5-5. Cpuset
+       5.5-1. Cpuset Interface Files
+     5-6. Device
+     5-7. RDMA
+       5-7-1. RDMA Interface Files
+     5-8. Misc
+       5-8-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -1435,6 +1437,84 @@ through fork() or clone(). These will return -EAGAIN if the creation
 of a new process would cause a cgroup policy to be violated.
 
 
+Cpuset
+------
+
+The "cpuset" controller provides a mechanism for constraining
+the CPU and memory node placement of tasks to only the resources
+specified in the cpuset interface files in a task's current cgroup.
+This is especially valuable on large NUMA systems where placing jobs
+on properly sized subsets of the systems with careful processor and
+memory placement to reduce cross-node memory access and contention
+can improve overall system performance.
+
+The "cpuset" controller is hierarchical.  That means the controller
+cannot use CPUs or memory nodes not allowed in its parent.
+
+
+Cpuset Interface Files
+~~~~~~~~~~~~~~~~~~~~~~
+
+  cpuset.cpus
+	A read-write multiple values file which exists on non-root
+	cgroups.
+
+	It lists the CPUs allowed to be used by tasks within this
+	cgroup.  The CPU numbers are comma-separated numbers or
+	ranges.  For example:
+
+	  # cat cpuset.cpus
+	  0-4,6,8-10
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.cpus" or all the available CPUs if none is found.
+
+	The value of "cpuset.cpus" stays constant until the next update
+	and won't be affected by any CPU hotplug events.
+
+  cpuset.cpus.effective
+	A read-only multiple values file which exists on non-root
+	cgroups.
+
+	It lists the onlined CPUs that are actually allowed to be
+	used by tasks within the current cgroup.  If "cpuset.cpus"
+	is empty, it shows all the CPUs from the parent cgroup that
+	will be available to be used by this cgroup.  Otherwise, it is
+	a subset of "cpuset.cpus".  Its value will be affected by CPU
+	hotplug events.
+
+  cpuset.mems
+	A read-write multiple values file which exists on non-root
+	cgroups.
+
+	It lists the memory nodes allowed to be used by tasks within
+	this cgroup.  The memory node numbers are comma-separated
+	numbers or ranges.  For example:
+
+	  # cat cpuset.mems
+	  0-1,3
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.mems" or all the available memory nodes if none
+	is found.
+
+	The value of "cpuset.mems" stays constant until the next update
+	and won't be affected by any memory nodes hotplug events.
+
+  cpuset.mems.effective
+	A read-only multiple values file which exists on non-root
+	cgroups.
+
+	It lists the onlined memory nodes that are actually allowed to
+	be used by tasks within the current cgroup.  If "cpuset.mems"
+	is empty, it shows all the memory nodes from the parent cgroup
+	that will be available to be used by this cgroup.  Otherwise,
+	it is a subset of "cpuset.mems".  Its value will be affected
+	by memory nodes hotplug events.
+
+
 Device controller
 -----------------
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b42037e..419b758 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1823,12 +1823,11 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 	return 0;
 }
 
-
 /*
  * for the common functions, 'private' gives the type of file
  */
 
-static struct cftype files[] = {
+static struct cftype legacy_files[] = {
 	{
 		.name = "cpus",
 		.seq_show = cpuset_common_seq_show,
@@ -1931,6 +1930,47 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 };
 
 /*
+ * This is currently a minimal set for the default hierarchy. It can be
+ * expanded later on by migrating more features and control files from v1.
+ */
+static struct cftype dfl_files[] = {
+	{
+		.name = "cpus",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * NR_CPUS),
+		.private = FILE_CPULIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{
+		.name = "mems",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * MAX_NUMNODES),
+		.private = FILE_MEMLIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{
+		.name = "cpus.effective",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_EFFECTIVE_CPULIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{
+		.name = "mems.effective",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_EFFECTIVE_MEMLIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{ }	/* terminate */
+};
+
+
+/*
  *	cpuset_css_alloc - allocate a cpuset css
  *	cgrp:	control group that the new cpuset will be part of
  */
@@ -2104,8 +2144,10 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
 	.post_attach	= cpuset_post_attach,
 	.bind		= cpuset_bind,
 	.fork		= cpuset_fork,
-	.legacy_cftypes	= files,
+	.legacy_cftypes	= legacy_files,
+	.dfl_cftypes	= dfl_files,
 	.early_init	= true,
+	.threaded	= true,
 };
 
 /**
-- 
1.8.3.1

* [PATCH v7 2/5] cpuset: Add cpuset.sched_load_balance to v2
From: Waiman Long @ 2018-04-19 13:47 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Waiman Long

The sched_load_balance flag is needed to enable CPU isolation similar
to what can be done with the "isolcpus" kernel boot parameter.

The sched_load_balance flag implies !cpu_exclusive, as it doesn't
make sense to have an isolated CPU being load-balanced in another
cpuset.

For v2, this flag is hierarchical and is inherited by child cpusets.
It is not allowed to have this flag turned off in a parent cpuset
but turned on in a child cpuset.

This flag is set by the parent and is not delegatable.
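
A sketch of the intended inheritance semantics, assuming the cpuset
controller is enabled at each level (cgroup names are hypothetical):

  # Turn off load balancing in a first-level cpuset.
  echo 0 > /sys/fs/cgroup/parent/cpuset.sched_load_balance

  # A new child inherits the off state...
  mkdir /sys/fs/cgroup/parent/child
  cat /sys/fs/cgroup/parent/child/cpuset.sched_load_balance   # 0

  # ...and cannot turn it back on while the parent is still off.
  echo 1 > /sys/fs/cgroup/parent/child/cpuset.sched_load_balance  # rejected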

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt | 22 ++++++++++++++++++
 kernel/cgroup/cpuset.c      | 56 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index ed8ec66..c970bd7 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1514,6 +1514,28 @@ Cpuset Interface Files
 	it is a subset of "cpuset.mems".  Its value will be affected
 	by memory nodes hotplug events.
 
+  cpuset.sched_load_balance
+	A read-write single value file which exists on non-root cgroups.
+	The default is "1" (on), and the other possible value is "0"
+	(off).
+
+	When it is on, tasks within this cpuset will be load-balanced
+	by the kernel scheduler.  Tasks will be moved from CPUs with
+	high load to other CPUs within the same cpuset with less load
+	periodically.
+
+	When it is off, there will be no load balancing among CPUs on
+	this cgroup.  Tasks will stay in the CPUs they are running on
+	and will not be moved to other CPUs.
+
+	This flag is hierarchical and is inherited by child cpusets. It
+	can be turned off only when the CPUs in this cpuset aren't
+	listed in the cpuset.cpus of other sibling cgroups, and all
+	the child cpusets, if present, have this flag turned off.
+
+	Once it is off, it cannot be turned back on as long as the
+	parent cgroup still has this flag in the off state.
+
 
 Device controller
 -----------------
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 419b758..50c9254 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -407,15 +407,22 @@ static void cpuset_update_task_spread_flag(struct cpuset *cs,
  *
  * One cpuset is a subset of another if all its allowed CPUs and
  * Memory Nodes are a subset of the other, and its exclusive flags
- * are only set if the other's are set.  Call holding cpuset_mutex.
+ * are only set if the other's are set (on legacy hierarchy) or
+ * its sched_load_balance flag is only set if the other is set
+ * (on default hierarchy).  Caller holding cpuset_mutex.
  */
 
 static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
 {
-	return	cpumask_subset(p->cpus_allowed, q->cpus_allowed) &&
-		nodes_subset(p->mems_allowed, q->mems_allowed) &&
-		is_cpu_exclusive(p) <= is_cpu_exclusive(q) &&
-		is_mem_exclusive(p) <= is_mem_exclusive(q);
+	if (!cpumask_subset(p->cpus_allowed, q->cpus_allowed) ||
+	    !nodes_subset(p->mems_allowed, q->mems_allowed))
+		return false;
+
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys))
+		return is_sched_load_balance(p) <= is_sched_load_balance(q);
+	else
+		return is_cpu_exclusive(p) <= is_cpu_exclusive(q) &&
+		       is_mem_exclusive(p) <= is_mem_exclusive(q);
 }
 
 /**
@@ -498,7 +505,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 
 	par = parent_cs(cur);
 
-	/* On legacy hiearchy, we must be a subset of our parent cpuset. */
+	/* On legacy hierarchy, we must be a subset of our parent cpuset. */
 	ret = -EACCES;
 	if (!is_in_v2_mode() && !is_cpuset_subset(trial, par))
 		goto out;
@@ -1327,6 +1334,19 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	else
 		clear_bit(bit, &trialcs->flags);
 
+	/*
+	 * On default hierarchy, turning off sched_load_balance flag implies
+	 * an implicit cpu_exclusive. Turning on sched_load_balance will
+	 * clear the cpu_exclusive flag.
+	 */
+	if ((bit == CS_SCHED_LOAD_BALANCE) &&
+	    cgroup_subsys_on_dfl(cpuset_cgrp_subsys)) {
+		if (turning_on)
+			clear_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
+		else
+			set_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
+	}
+
 	err = validate_change(cs, trialcs);
 	if (err < 0)
 		goto out;
@@ -1966,6 +1986,14 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 		.flags = CFTYPE_NOT_ON_ROOT,
 	},
 
+	{
+		.name = "sched_load_balance",
+		.read_u64 = cpuset_read_u64,
+		.write_u64 = cpuset_write_u64,
+		.private = FILE_SCHED_LOAD_BALANCE,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
 	{ }	/* terminate */
 };
 
@@ -1991,7 +2019,21 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 	if (!alloc_cpumask_var(&cs->effective_cpus, GFP_KERNEL))
 		goto free_cpus;
 
-	set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+	/*
+	 * On default hierarchy, inherit parent's CS_SCHED_LOAD_BALANCE and
+	 * CS_CPU_EXCLUSIVE flag.
+	 */
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys)) {
+		struct cpuset *parent = css_cs(parent_css);
+
+		if (test_bit(CS_SCHED_LOAD_BALANCE, &parent->flags))
+			set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+		else
+			set_bit(CS_CPU_EXCLUSIVE, &cs->flags);
+	} else {
+		set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+	}
+
 	cpumask_clear(cs->cpus_allowed);
 	nodes_clear(cs->mems_allowed);
 	cpumask_clear(cs->effective_cpus);
-- 
1.8.3.1

* [PATCH v7 3/5] cpuset: Add a root-only cpus.isolated v2 control file
From: Waiman Long @ 2018-04-19 13:47 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Waiman Long

In order to better support CPU isolation as well as multiple root
domains for deadline scheduling, the ability to carve out a set of CPUs
specifically for isolation or for another root domain will be useful.

A new root-only "cpuset.cpus.isolated" control file is added to
hold the list of CPUs that will not participate in load balancing
within the root cpuset. The root's effective cpu list will not
contain any CPUs that are in the "cpuset.cpus.isolated" file.  These
isolated CPUs, however, can still be put into child cpusets and load
balanced within them if necessary.

For CPU isolation, putting the CPUs into this new control file and not
having them in any of the child cpusets should be enough. Those isolated
CPUs can also be put into a child cpuset with load balancing disabled
for finer-grained control.

For creating additional root domains for scheduling, a child cpuset
should only select an exclusive set of CPUs within the isolated set.

The "cpuset.cpus.isolated" control file should be set up before
any child cpusets are created. If child cpusets are present, changes
to this control file will not be allowed if any CPUs that will change
state are in any of the child cpusets.
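
For illustration, a possible sequence on an 8-CPU system, assuming
the cpuset controller is already enabled in the root's
"cgroup.subtree_control" (the child cgroup name is arbitrary):

  # Withdraw CPUs 6-7 from load balancing in the root cpuset,
  # before any child cpuset claims them.
  echo 6-7 > /sys/fs/cgroup/cpuset.cpus.isolated

  # The isolated CPUs can still be handed to a child cpuset and be
  # load balanced among themselves there.
  mkdir /sys/fs/cgroup/child
  echo 6-7 > /sys/fs/cgroup/child/cpuset.cpus

  # Changing the isolated list now fails because the affected CPUs
  # sit in a child cpuset.
  echo 6 > /sys/fs/cgroup/cpuset.cpus.isolated   # rejected (-EBUSY)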

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt |  25 ++++++++++
 kernel/cgroup/cpuset.c      | 119 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index c970bd7..8d89dc2 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1484,6 +1484,31 @@ Cpuset Interface Files
 	a subset of "cpuset.cpus".  Its value will be affected by CPU
 	hotplug events.
 
+  cpuset.cpus.isolated
+	A read-write multiple values file which exists on root cgroup
+	only.
+
+	It lists the CPUs that have been withdrawn from the root cgroup
+	for load balancing.  These CPUs can still be allocated to child
+	cpusets with load balancing enabled, if necessary.
+
+	If a child cpuset contains only an exclusive set of CPUs that are
+	a subset of the isolated CPUs and with load balancing enabled,
+	these CPUs will be load balanced on a separate root domain from
+	the one in the root cgroup.
+
+	Just putting the CPUs into "cpuset.cpus.isolated" will be
+	enough to disable load balancing on those CPUs as long as they
+	do not appear in a child cpuset with load balancing enabled.
+	Fine-grained control of cpu isolation can also be done by
+	putting these isolated CPUs into child cpusets with load
+	balancing disabled.
+
+	The "cpuset.cpus.isolated" should be set up before child
+	cpusets are created.  Once child cpusets are present, changes
+	to "cpuset.cpus.isolated" will not be allowed if the CPUs that
+	change their states are in any of the child cpusets.
+
   cpuset.mems
 	A read-write multiple values file which exists on non-root
 	cgroups.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 50c9254..c746b18 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -109,6 +109,9 @@ struct cpuset {
 	cpumask_var_t effective_cpus;
 	nodemask_t effective_mems;
 
+	/* Isolated CPUs - root cpuset only */
+	cpumask_var_t isolated_cpus;
+
 	/*
 	 * This is old Memory Nodes tasks took on.
 	 *
@@ -134,6 +137,9 @@ struct cpuset {
 
 	/* for custom sched domain */
 	int relax_domain_level;
+
+	/* for isolated_cpus */
+	int isolation_count;
 };
 
 static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
@@ -909,7 +915,19 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
 		struct cpuset *parent = parent_cs(cp);
 
-		cpumask_and(new_cpus, cp->cpus_allowed, parent->effective_cpus);
+		/*
+		 * If parent has isolated CPUs, include them in the list
+		 * of allowable CPUs.
+		 */
+		if (parent->isolation_count) {
+			cpumask_or(new_cpus, parent->effective_cpus,
+				   parent->isolated_cpus);
+			cpumask_and(new_cpus, new_cpus, cpu_online_mask);
+			cpumask_and(new_cpus, new_cpus, cp->cpus_allowed);
+		} else {
+			cpumask_and(new_cpus, cp->cpus_allowed,
+				    parent->effective_cpus);
+		}
 
 		/*
 		 * If it becomes empty, inherit the effective mask of the
@@ -1004,6 +1022,85 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	return 0;
 }
 
+/**
+ * update_isolated_cpumask - update the isolated_cpus mask of the top cpuset
+ * @buf: buffer of cpu numbers written to this cpuset
+ *
+ * Changes to the isolated CPUs are not allowed if any of CPUs changing
+ * state are in any of the child cpusets. Called with cpuset_mutex held.
+ */
+static int update_isolated_cpumask(const char *buf)
+{
+	int retval;
+	int adding, deleting;
+	cpumask_var_t addmask, delmask;
+	struct cpuset *child;
+	struct cgroup_subsys_state *pos_css;
+
+	if (!alloc_cpumask_var(&addmask, GFP_KERNEL))
+		return -ENOMEM;
+	if (!alloc_cpumask_var(&delmask, GFP_KERNEL)) {
+		free_cpumask_var(addmask);
+		return -ENOMEM;
+	}
+	retval = cpulist_parse(buf, addmask);
+	if (retval)
+		goto out;
+
+	retval = -EINVAL;
+	if (!cpumask_subset(addmask, top_cpuset.cpus_allowed))
+		goto out;
+
+	retval = -EBUSY;
+	deleting = cpumask_andnot(delmask, top_cpuset.isolated_cpus, addmask);
+	adding   = cpumask_andnot(addmask, addmask, top_cpuset.isolated_cpus);
+
+	if (!adding && !deleting)
+		goto out_ok;
+
+	/*
+	 * Check if any CPUs in addmask or delmask are in a child cpuset.
+	 * An empty child cpus_allowed means it is the same as parent's
+	 * effective_cpus.
+	 */
+	cpuset_for_each_child(child, pos_css, &top_cpuset) {
+		if (cpumask_empty(child->cpus_allowed))
+			goto out;
+		if (adding && cpumask_intersects(child->cpus_allowed, addmask))
+			goto out;
+		if (deleting &&
+		    cpumask_intersects(child->cpus_allowed, delmask))
+			goto out;
+	}
+
+	/*
+	 * Change the isolated CPU list.
+	 * Newly added isolated CPUs will be removed from effective_cpus
+	 * and newly deleted ones will be added back if they are online.
+	 */
+	spin_lock_irq(&callback_lock);
+	if (adding)
+		cpumask_or(top_cpuset.isolated_cpus,
+			   top_cpuset.isolated_cpus, addmask);
+
+	if (deleting)
+		cpumask_andnot(top_cpuset.isolated_cpus,
+			       top_cpuset.isolated_cpus, delmask);
+
+	cpumask_andnot(top_cpuset.effective_cpus, cpu_online_mask,
+		       top_cpuset.isolated_cpus);
+
+	top_cpuset.isolation_count = cpumask_weight(top_cpuset.isolated_cpus);
+	spin_unlock_irq(&callback_lock);
+
+out_ok:
+	retval = 0;
+out:
+	free_cpumask_var(addmask);
+	free_cpumask_var(delmask);
+	return retval;
+}
+
 /*
  * Migrate memory region from one set of nodes to another.  This is
  * performed asynchronously as it can be called from process migration path
@@ -1612,6 +1709,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	FILE_MEMLIST,
 	FILE_EFFECTIVE_CPULIST,
 	FILE_EFFECTIVE_MEMLIST,
+	FILE_ISOLATED_CPULIST,
 	FILE_CPU_EXCLUSIVE,
 	FILE_MEM_EXCLUSIVE,
 	FILE_MEM_HARDWALL,
@@ -1733,6 +1831,12 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	if (!is_cpuset_online(cs))
 		goto out_unlock;
 
+	if (of_cft(of)->private == FILE_ISOLATED_CPULIST) {
+		WARN_ON_ONCE(cs != &top_cpuset);
+		retval = update_isolated_cpumask(buf);
+		goto out_unlock;
+	}
+
 	trialcs = alloc_trial_cpuset(cs);
 	if (!trialcs) {
 		retval = -ENOMEM;
@@ -1789,6 +1893,9 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
 	case FILE_EFFECTIVE_MEMLIST:
 		seq_printf(sf, "%*pbl\n", nodemask_pr_args(&cs->effective_mems));
 		break;
+	case FILE_ISOLATED_CPULIST:	/* Root only */
+		seq_printf(sf, "%*pbl\n", cpumask_pr_args(cs->isolated_cpus));
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -1994,6 +2101,15 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 		.flags = CFTYPE_NOT_ON_ROOT,
 	},
 
+	{
+		.name = "cpus.isolated",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * NR_CPUS),
+		.private = FILE_ISOLATED_CPULIST,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+	},
+
 	{ }	/* terminate */
 };
 
@@ -2204,6 +2320,7 @@ int __init cpuset_init(void)
 
 	BUG_ON(!alloc_cpumask_var(&top_cpuset.cpus_allowed, GFP_KERNEL));
 	BUG_ON(!alloc_cpumask_var(&top_cpuset.effective_cpus, GFP_KERNEL));
+	BUG_ON(!zalloc_cpumask_var(&top_cpuset.isolated_cpus, GFP_KERNEL));
 
 	cpumask_setall(top_cpuset.cpus_allowed);
 	nodes_setall(top_cpuset.mems_allowed);
-- 
1.8.3.1

* [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated
From: Waiman Long @ 2018-04-19 13:47 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Waiman Long

With the addition of "cpuset.cpus.isolated", it makes sense to add
the restriction that load balancing can only be turned off in a
cpuset if its CPUs are a subset of "cpuset.cpus.isolated".
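
A sketch of the resulting rule, assuming CPUs 2-3 were put into the
root's "cpuset.cpus.isolated" before the children below were created
(cgroup names are hypothetical):

  # A child holding only isolated CPUs may turn load balancing off.
  echo 2-3 > /sys/fs/cgroup/iso/cpuset.cpus
  echo 0 > /sys/fs/cgroup/iso/cpuset.sched_load_balance    # succeeds

  # A sibling holding CPUs outside the isolated list may not.
  echo 0-1 > /sys/fs/cgroup/other/cpuset.cpus
  echo 0 > /sys/fs/cgroup/other/cpuset.sched_load_balance  # rejected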

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt |  7 ++++---
 kernel/cgroup/cpuset.c      | 29 ++++++++++++++++++++++++++---
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 8d89dc2..c4227ee 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1554,9 +1554,10 @@ Cpuset Interface Files
 	and will not be moved to other CPUs.
 
 	This flag is hierarchical and is inherited by child cpusets. It
-	can be turned off only when the CPUs in this cpuset aren't
-	listed in the cpuset.cpus of other sibling cgroups, and all
-	the child cpusets, if present, have this flag turned off.
+	can be explicitly turned off only when it is a direct child of
+	the root cgroup and the CPUs in this cpuset are subset of the
+	root's "cpuset.cpus.isolated".	Moreover, the CPUs cannot be
+	listed in the "cpuset.cpus" of other sibling cgroups.
 
 	Once it is off, it cannot be turned back on as long as the
 	parent cgroup still has this flag in the off state.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c746b18..d05c4c8 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -511,6 +511,16 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 
 	par = parent_cs(cur);
 
+	/*
+	 * On default hierarchy with sched_load_balance flag off, the cpu
+	 * list must be a subset of the parent's isolated CPU list, if
+	 * defined (root).
+	 */
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) &&
+	    !is_sched_load_balance(trial) && par->isolation_count &&
+	    !cpumask_subset(trial->cpus_allowed, par->isolated_cpus))
+		goto out;
+
 	/* On legacy hierarchy, we must be a subset of our parent cpuset. */
 	ret = -EACCES;
 	if (!is_in_v2_mode() && !is_cpuset_subset(trial, par))
@@ -1431,10 +1441,16 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	else
 		clear_bit(bit, &trialcs->flags);
 
+	balance_flag_changed = (is_sched_load_balance(cs) !=
+				is_sched_load_balance(trialcs));
+
 	/*
 	 * On default hierarchy, turning off sched_load_balance flag implies
 	 * an implicit cpu_exclusive. Turning on sched_load_balance will
 	 * clear the cpu_exclusive flag.
+	 *
+	 * sched_load_balance can only be turned off if all the CPUs are
+	 * in the parent's isolated CPU list.
 	 */
 	if ((bit == CS_SCHED_LOAD_BALANCE) &&
 	    cgroup_subsys_on_dfl(cpuset_cgrp_subsys)) {
@@ -1442,15 +1458,22 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 			clear_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
 		else
 			set_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
+
+		if (balance_flag_changed && !turning_on) {
+			struct cpuset *parent = parent_cs(cs);
+
+			err = -EBUSY;
+			if (!parent->isolation_count ||
+			    !cpumask_subset(trialcs->cpus_allowed,
+					    parent->cpus_allowed))
+				goto out;
+		}
 	}
 
 	err = validate_change(cs, trialcs);
 	if (err < 0)
 		goto out;
 
-	balance_flag_changed = (is_sched_load_balance(cs) !=
-				is_sched_load_balance(trialcs));
-
 	spread_flag_changed = ((is_spread_slab(cs) != is_spread_slab(trialcs))
 			|| (is_spread_page(cs) != is_spread_page(trialcs)));
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread
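
A minimal sketch of the workflow this patch enforces (assuming cgroup v2
is mounted at /sys/fs/cgroup with the cpuset controller enabled; the
child name "isolated1" is illustrative):

	# Reserve CPUs 2-3 in the root's isolated list first.
	echo 2-3 > /sys/fs/cgroup/cpuset.cpus.isolated

	# Create a first-level child whose CPUs are a subset of that list.
	mkdir /sys/fs/cgroup/isolated1
	echo 2-3 > /sys/fs/cgroup/isolated1/cpuset.cpus

	# Only now can load balancing be turned off; otherwise the write
	# fails (-EBUSY in update_flag() above).
	echo 0 > /sys/fs/cgroup/isolated1/cpuset.sched_load_balance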

* [PATCH v7 5/5] cpuset: Make generate_sched_domains() recognize isolated_cpus
  2018-04-19 13:46 ` Waiman Long
@ 2018-04-19 13:47   ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2018-04-19 13:47 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Waiman Long

The generate_sched_domains() function and the hotplug code are modified
to make them use the newly introduced isolated_cpus mask for sched
domain generation.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index d05c4c8..a67c77a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -683,13 +683,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	int ndoms = 0;		/* number of sched domains in result */
 	int nslot;		/* next empty doms[] struct cpumask slot */
 	struct cgroup_subsys_state *pos_css;
+	bool root_load_balance = is_sched_load_balance(&top_cpuset);
 
 	doms = NULL;
 	dattr = NULL;
 	csa = NULL;
 
 	/* Special case for the 99% of systems with one, full, sched domain */
-	if (is_sched_load_balance(&top_cpuset)) {
+	if (root_load_balance && !top_cpuset.isolation_count) {
 		ndoms = 1;
 		doms = alloc_sched_domains(ndoms);
 		if (!doms)
@@ -712,6 +713,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	csn = 0;
 
 	rcu_read_lock();
+	if (root_load_balance)
+		csa[csn++] = &top_cpuset;
 	cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
 		if (cp == &top_cpuset)
 			continue;
@@ -722,6 +725,9 @@ static int generate_sched_domains(cpumask_var_t **domains,
 		 * parent's cpus, so just skip them, and then we call
 		 * update_domain_attr_tree() to calc relax_domain_level of
 		 * the corresponding sched domain.
+		 *
+		 * If root is load-balancing, we can skip @cp if it
+		 * is a subset of the root's effective_cpus.
 		 */
 		if (!cpumask_empty(cp->cpus_allowed) &&
 		    !(is_sched_load_balance(cp) &&
@@ -729,6 +735,10 @@ static int generate_sched_domains(cpumask_var_t **domains,
 					 housekeeping_cpumask(HK_FLAG_DOMAIN))))
 			continue;
 
+		if (root_load_balance &&
+		    cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
+			continue;
+
 		if (is_sched_load_balance(cp))
 			csa[csn++] = cp;
 
@@ -820,6 +830,12 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	}
 	BUG_ON(nslot != ndoms);
 
+#ifdef CONFIG_DEBUG_KERNEL
+	for (i = 0; i < ndoms; i++)
+		pr_info("rebuild_sched_domains dom %d: %*pbl\n", i,
+			cpumask_pr_args(doms[i]));
+#endif
+
 done:
 	kfree(csa);
 
@@ -860,7 +876,12 @@ static void rebuild_sched_domains_locked(void)
 	 * passing doms with offlined cpu to partition_sched_domains().
 	 * Anyways, hotplug work item will rebuild sched domains.
 	 */
-	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
+	if (!top_cpuset.isolation_count &&
+	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
+		goto out;
+
+	if (top_cpuset.isolation_count &&
+	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
 		goto out;
 
 	/* Generate domain masks and attrs */
@@ -1102,6 +1123,7 @@ static int update_isolated_cpumask(const char *buf)
 
 	top_cpuset.isolation_count = cpumask_weight(top_cpuset.isolated_cpus);
 	spin_unlock_irq(&callback_lock);
+	rebuild_sched_domains_locked();
 
 out_ok:
 	retval = 0;
@@ -2530,6 +2552,11 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	cpumask_copy(&new_cpus, cpu_active_mask);
 	new_mems = node_states[N_MEMORY];
 
+	/*
+	 * If isolated_cpus is populated, it is likely that the check below
+	 * will produce a false positive on cpus_updated when the cpu list
+	 * isn't changed. It is extra work, but it is better to be safe.
+	 */
 	cpus_updated = !cpumask_equal(top_cpuset.effective_cpus, &new_cpus);
 	mems_updated = !nodes_equal(top_cpuset.effective_mems, new_mems);
 
@@ -2538,6 +2565,10 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 		spin_lock_irq(&callback_lock);
 		if (!on_dfl)
 			cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
+
+		if (top_cpuset.isolation_count)
+			cpumask_andnot(&new_cpus, &new_cpus,
+					top_cpuset.isolated_cpus);
 		cpumask_copy(top_cpuset.effective_cpus, &new_cpus);
 		spin_unlock_irq(&callback_lock);
 		/* we don't mess with cpumasks of tasks in top_cpuset */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread
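
A sketch of the resulting behavior: a child holding only isolated CPUs,
with the default "cpuset.sched_load_balance" of 1, should get its own
root domain (the cgroup layout is illustrative, and the dmesg lines
merely follow the pr_info() format added above; actual numbering may
differ):

	echo 2-3 > /sys/fs/cgroup/cpuset.cpus.isolated
	mkdir /sys/fs/cgroup/rt
	echo 2-3 > /sys/fs/cgroup/rt/cpuset.cpus

	# With CONFIG_DEBUG_KERNEL enabled:
	dmesg | grep rebuild_sched_domains
	#   rebuild_sched_domains dom 0: 0-1
	#   rebuild_sched_domains dom 1: 2-3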

* Re: [PATCH v7 0/5] cpuset: Enable cpuset controller in default hierarchy
  2018-04-19 13:46 ` Waiman Long
@ 2018-04-20  8:23   ` Mike Galbraith
  -1 siblings, 0 replies; 48+ messages in thread
From: Mike Galbraith @ 2018-04-20  8:23 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Li Zefan, Johannes Weiner,
	Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin, Juri Lelli

On Thu, 2018-04-19 at 09:46 -0400, Waiman Long wrote:
> v7:
>  - Add a root-only cpuset.cpus.isolated control file for CPU isolation.
>  - Enforce that load_balancing can only be turned off on cpusets with
>    CPUs from the isolated list.
>  - Update sched domain generation to allow cpusets with CPUs only
>    from the isolated CPU list to be in separate root domains.

I haven't done much, but was able to do a q/d manual build, populate
and teardown of system/critical sets on my desktop box, and it looked
ok.  Thanks for getting this aboard the v2 boat.

	-Mike

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 0/5] cpuset: Enable cpuset controller in default hierarchy
  2018-04-19 13:46 ` Waiman Long
@ 2018-04-23 13:07   ` Juri Lelli
  -1 siblings, 0 replies; 48+ messages in thread
From: Juri Lelli @ 2018-04-23 13:07 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, Mike Galbraith, torvalds, Roman Gushchin

Hi Waiman,

On 19/04/18 09:46, Waiman Long wrote:
> v7:
>  - Add a root-only cpuset.cpus.isolated control file for CPU isolation.
>  - Enforce that load_balancing can only be turned off on cpusets with
>    CPUs from the isolated list.
>  - Update sched domain generation to allow cpusets with CPUs only
>    from the isolated CPU list to be in separate root domains.

Just got this while

# echo 2-3 > /sys/fs/cgroup/cpuset.cpus.isolated

[ 6679.177826] =============================
[ 6679.178385] WARNING: suspicious RCU usage
[ 6679.178910] 4.16.0-rc6+ #151 Not tainted
[ 6679.179459] -----------------------------
[ 6679.180082] /home/juri/work/kernel/linux/kernel/cgroup/cgroup.c:3826 cgroup_mutex or RCU read lock required!
[ 6679.181402]
[ 6679.181402] other info that might help us debug this:
[ 6679.181402]
[ 6679.182407]
[ 6679.182407] rcu_scheduler_active = 2, debug_locks = 1
[ 6679.183278] 3 locks held by bash/2205:
[ 6679.183785]  #0:  (sb_writers#10){.+.+}, at: [<000000004e577fb9>] vfs_write+0x18a/0x1b0
[ 6679.184871]  #1:  (&of->mutex){+.+.}, at: [<000000005944c83f>] kernfs_fop_write+0xe2/0x1a0
[ 6679.185987]  #2:  (cpuset_mutex){+.+.}, at: [<00000000879bfba0>] cpuset_write_resmask+0x72/0x1560
[ 6679.187112]
[ 6679.187112] stack backtrace:
[ 6679.187612] CPU: 3 PID: 2205 Comm: bash Not tainted 4.16.0-rc6+ #151
[ 6679.188318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 6679.189291] Call Trace:
[ 6679.189581]  dump_stack+0x85/0xc5
[ 6679.189963]  css_next_child+0x90/0xd0
[ 6679.190385]  cpuset_write_resmask+0x46f/0x1560
[ 6679.190885]  ? lock_acquire+0x9f/0x210
[ 6679.191315]  cgroup_file_write+0x94/0x230
[ 6679.191768]  kernfs_fop_write+0x113/0x1a0
[ 6679.192223]  __vfs_write+0x36/0x180
[ 6679.192617]  ? rcu_read_lock_sched_held+0x6b/0x80
[ 6679.193139]  ? rcu_sync_lockdep_assert+0x2e/0x60
[ 6679.193654]  ? __sb_start_write+0x154/0x1f0
[ 6679.194118]  ? __sb_start_write+0x16a/0x1f0
[ 6679.194607]  vfs_write+0xc1/0x1b0
[ 6679.194984]  SyS_write+0x55/0xc0
[ 6679.195365]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 6679.195839]  do_syscall_64+0x79/0x220
[ 6679.196212]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 6679.196729] RIP: 0033:0x7f03183ff780
[ 6679.197138] RSP: 002b:00007ffeae336ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 6679.197866] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f03183ff780
[ 6679.198550] RDX: 0000000000000004 RSI: 0000000000eaf408 RDI: 0000000000000001
[ 6679.199235] RBP: 0000000000eaf408 R08: 000000000000000a R09: 00007f0318cff700
[ 6679.199928] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f03186b57a0
[ 6679.200615] R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000000
[ 6679.201369] rebuild_sched_domains dom 0: 0-1
[ 6679.202196] span: 0-1 (max cpu_capacity = 1024)

Guess we should grab either lock from the writing path.

Best,

- Juri

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 0/5] cpuset: Enable cpuset controller in default hierarchy
  2018-04-23 13:07   ` Juri Lelli
@ 2018-04-23 13:57     ` Juri Lelli
  -1 siblings, 0 replies; 48+ messages in thread
From: Juri Lelli @ 2018-04-23 13:57 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, Mike Galbraith, torvalds, Roman Gushchin

On 23/04/18 15:07, Juri Lelli wrote:
> Hi Waiman,
> 
> On 19/04/18 09:46, Waiman Long wrote:
> > v7:
> >  - Add a root-only cpuset.cpus.isolated control file for CPU isolation.
> >  - Enforce that load_balancing can only be turned off on cpusets with
> >    CPUs from the isolated list.
> >  - Update sched domain generation to allow cpusets with CPUs only
> >    from the isolated CPU list to be in separate root domains.
> 

Guess I'll be adding comments as soon as I stumble on something unclear
(to me :), hope that's OK (shout if I should do it differently).

The below looked unexpected to me:

root@debian-kvm:/sys/fs/cgroup# cat g1/cpuset.cpus
2-3
root@debian-kvm:/sys/fs/cgroup# cat g1/cpuset.mems

root@debian-kvm:~# echo $$ > /sys/fs/cgroup/g1/cgroup.threads
root@debian-kvm:/sys/fs/cgroup# cat g1/cgroup.threads
2312

So I can add tasks to groups with no mems? Or is this only true in my
case with a single mem node? Or maybe it's inherited from the root group
(slightly confusing IMHO if that's the case).

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 0/5] cpuset: Enable cpuset controller in default hierarchy
  2018-04-23 13:57     ` Juri Lelli
@ 2018-04-23 14:10       ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2018-04-23 14:10 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, Mike Galbraith, torvalds, Roman Gushchin

On 04/23/2018 09:57 AM, Juri Lelli wrote:
> On 23/04/18 15:07, Juri Lelli wrote:
>> Hi Waiman,
>>
>> On 19/04/18 09:46, Waiman Long wrote:
>>> v7:
>>>  - Add a root-only cpuset.cpus.isolated control file for CPU isolation.
>>>  - Enforce that load_balancing can only be turned off on cpusets with
>>>    CPUs from the isolated list.
>>>  - Update sched domain generation to allow cpusets with CPUs only
>>>    from the isolated CPU list to be in separate root domains.
> Guess I'll be adding comments as soon as I stumble on something unclear
> (to me :), hope that's OK (shout if I should do it differently).
>
> The below looked unexpected to me:
>
> root@debian-kvm:/sys/fs/cgroup# cat g1/cpuset.cpus
> 2-3
> root@debian-kvm:/sys/fs/cgroup# cat g1/cpuset.mems
>
> root@debian-kvm:~# echo $$ > /sys/fs/cgroup/g1/cgroup.threads
> root@debian-kvm:/sys/fs/cgroup# cat g1/cgroup.threads
> 2312
>
> So I can add tasks to groups with no mems? Or is it this only true in my
> case with a single mem node? Or maybe it's inherited from root group
> (slightly confusing IMHO if that's the case).

No mems means looking up the parents until we find one with a non-empty
mems. The mems.effective file will show you the actual memory nodes used.

-Longman

^ permalink raw reply	[flat|nested] 48+ messages in thread
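
A quick way to see that fallback on a single-node system (sketch; the
output values are illustrative):

	cat /sys/fs/cgroup/g1/cpuset.mems             # empty
	cat /sys/fs/cgroup/g1/cpuset.mems.effective   # inherited: 0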

* Re: [PATCH v7 3/5] cpuset: Add a root-only cpus.isolated v2 control file
  2018-04-19 13:47   ` Waiman Long
@ 2018-04-23 15:56     ` Juri Lelli
  -1 siblings, 0 replies; 48+ messages in thread
From: Juri Lelli @ 2018-04-23 15:56 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, Mike Galbraith, torvalds, Roman Gushchin

On 19/04/18 09:47, Waiman Long wrote:

[...]

> +  cpuset.cpus.isolated
> +	A read-write multiple values file which exists on root cgroup
> +	only.
> +
> +	It lists the CPUs that have been withdrawn from the root cgroup
> +	for load balancing.  These CPUs can still be allocated to child
> +	cpusets with load balancing enabled, if necessary.
> +
> +	If a child cpuset contains only an exclusive set of CPUs that are
> +	a subset of the isolated CPUs and has load balancing enabled,
> +	these CPUs will be load balanced on a separate root domain from
> +	the one in the root cgroup.
> +
> +	Just putting the CPUs into "cpuset.cpus.isolated" will be
> +	enough to disable load balancing on those CPUs as long as they
> +	do not appear in a child cpuset with load balancing enabled.

Tasks that were on those CPUs when they got isolated will stay there
(unless forcibly moved somewhere else). They will also "automatically"
belong to the default root domain (or potentially to a new root domain
created for a group using those CPUs). Both things are maybe unavoidable
(as discussed in previous versions, some tasks cannot be migrated at
all), but such "side effects" should probably be documented. What do you
think?

^ permalink raw reply	[flat|nested] 48+ messages in thread
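
One way to observe the first of those side effects (a sketch using the
v7 interface; the CPU numbers are illustrative):

	# Start a task bound to CPU 3, then isolate CPUs 2-3.
	taskset -c 3 sleep 600 &
	echo 2-3 > /sys/fs/cgroup/cpuset.cpus.isolated

	# The sleeper is still running on CPU 3; nothing migrates it off.
	ps -o pid,psr,comm -p $!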

* Re: [PATCH v7 0/5] cpuset: Enable cpuset controller in default hierarchy
  2018-04-20  8:23   ` Mike Galbraith
@ 2018-04-23 16:32     ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2018-04-23 16:32 UTC (permalink / raw)
  To: Mike Galbraith, Tejun Heo, Li Zefan, Johannes Weiner,
	Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin, Juri Lelli

On 04/20/2018 04:23 AM, Mike Galbraith wrote:
> On Thu, 2018-04-19 at 09:46 -0400, Waiman Long wrote:
>> v7:
>>  - Add a root-only cpuset.cpus.isolated control file for CPU isolation.
>>  - Enforce that load_balancing can only be turned off on cpusets with
>>    CPUs from the isolated list.
>>  - Update sched domain generation to allow cpusets with CPUs only
>>    from the isolated CPU list to be in separate root domains.
> I haven't done much, but was able to do a q/d manual build, populate
> and teardown of system/critical sets on my desktop box, and it looked
> ok.  Thanks for getting this aboard the v2 boat.
>
> 	-Mike

Thanks for the testing.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated
  2018-04-19 13:47   ` Waiman Long
@ 2018-05-01 19:51     ` Tejun Heo
  -1 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2018-05-01 19:51 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

Hello, Waiman.

Sorry about the delay.

On Thu, Apr 19, 2018 at 09:47:03AM -0400, Waiman Long wrote:
> With the addition of "cpuset.cpus.isolated", it makes sense to add the
> restriction that load balancing can only be turned off if the CPUs in
> the isolated cpuset are a subset of "cpuset.cpus.isolated".
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  Documentation/cgroup-v2.txt |  7 ++++---
>  kernel/cgroup/cpuset.c      | 29 ++++++++++++++++++++++++++---
>  2 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> index 8d89dc2..c4227ee 100644
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -1554,9 +1554,10 @@ Cpuset Interface Files
>  	and will not be moved to other CPUs.
>  
>  	This flag is hierarchical and is inherited by child cpusets. It
> -	can be turned off only when the CPUs in this cpuset aren't
> -	listed in the cpuset.cpus of other sibling cgroups, and all
> -	the child cpusets, if present, have this flag turned off.
> +	can be explicitly turned off only when it is a direct child of
> +	the root cgroup and the CPUs in this cpuset are a subset of the
> +	root's "cpuset.cpus.isolated".	Moreover, the CPUs cannot be
> +	listed in the "cpuset.cpus" of other sibling cgroups.

It is a little bit convoluted that the isolation requires coordination
among the root's isolated file, the first-level children's cpus file
and the flag.  Maybe I'm missing something, but can't we do something
like the following?

* Add isolated flag file, which can only be modified on empty (in
  terms of cpus) first level children.

* Once isolated flag is set, CPUs can only be added to the cpus file
  iff they aren't being used by anyone else and automatically become
  isolated.

The first level cpus file is owned by the root cgroup anyway, so
there's no danger regarding delegation or whatever and the interface
would be a lot simpler.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread
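
A sketch of the proposed flow (hypothetical interface: the flag file
name "cpuset.isolated" is assumed here and is not part of this
patchset):

	# Mark an empty first-level child as isolated first.
	mkdir /sys/fs/cgroup/iso
	echo 1 > /sys/fs/cgroup/iso/cpuset.isolated

	# CPUs written afterwards must not be in use elsewhere and
	# become isolated automatically.
	echo 2-3 > /sys/fs/cgroup/iso/cpuset.cpus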

* Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated
  2018-05-01 19:51     ` Tejun Heo
@ 2018-05-01 20:33       ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2018-05-01 20:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

On 05/01/2018 03:51 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> Sorry about the delay.
>
> On Thu, Apr 19, 2018 at 09:47:03AM -0400, Waiman Long wrote:
>> With the addition of "cpuset.cpus.isolated", it makes sense to add the
>> restriction that load balancing can only be turned off if the CPUs in
>> the isolated cpuset are a subset of "cpuset.cpus.isolated".
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  Documentation/cgroup-v2.txt |  7 ++++---
>>  kernel/cgroup/cpuset.c      | 29 ++++++++++++++++++++++++++---
>>  2 files changed, 30 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
>> index 8d89dc2..c4227ee 100644
>> --- a/Documentation/cgroup-v2.txt
>> +++ b/Documentation/cgroup-v2.txt
>> @@ -1554,9 +1554,10 @@ Cpuset Interface Files
>>  	and will not be moved to other CPUs.
>>  
>>  	This flag is hierarchical and is inherited by child cpusets. It
>> -	can be turned off only when the CPUs in this cpuset aren't
>> -	listed in the cpuset.cpus of other sibling cgroups, and all
>> -	the child cpusets, if present, have this flag turned off.
>> +	can be explicitly turned off only when it is a direct child of
>> +	the root cgroup and the CPUs in this cpuset are a subset of the
>> +	root's "cpuset.cpus.isolated".	Moreover, the CPUs cannot be
>> +	listed in the "cpuset.cpus" of other sibling cgroups.
> It is a little bit convoluted that the isolation requires coordination
> among the root's isolated file, the first-level children's cpus file
> and the flag.  Maybe I'm missing something, but can't we do something
> like the following?
>
> * Add isolated flag file, which can only be modified on empty (in
>   terms of cpus) first level children.
>
> * Once isolated flag is set, CPUs can only be added to the cpus file
>   iff they aren't being used by anyone else and automatically become
>   isolated.
>
> The first level cpus file is owned by the root cgroup anyway, so
> there's no danger regarding delegation or whatever and the interface
> would be a lot simpler.

I think that will work too. We currently don't have a flag to make a
file visible on first-level children only, but it shouldn't be hard to
make one.

Putting CPUs into an isolated child cpuset means removing them from the
root's effective CPUs. So I would probably like to expose the read-only
cpus.effective in the root cgroup so that we can check changes in the
effective cpu list.

I will renew the patchset with your suggestion.

Thanks,
Longman

^ permalink raw reply	[flat|nested] 48+ messages in thread
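
Under that scheme the root's effective list would shrink as CPUs move
into an isolated child, roughly as below (a sketch assuming the
read-only root "cpuset.cpus.effective" file proposed here and the
hypothetical "iso" child from the previous message):

	cat /sys/fs/cgroup/cpuset.cpus.effective    # 0-3
	echo 2-3 > /sys/fs/cgroup/iso/cpuset.cpus
	cat /sys/fs/cgroup/cpuset.cpus.effective    # 0-1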

* Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated
  2018-05-01 20:33       ` Waiman Long
@ 2018-05-01 20:58         ` Tejun Heo
  -1 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2018-05-01 20:58 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

Hello,

On Tue, May 01, 2018 at 04:33:45PM -0400, Waiman Long wrote:
> I think that will work too. We currently don't have a flag to make a
> file visible on first-level children only, but it shouldn't be hard to
> make one.

I think it'd be fine to make the flag file exist on all !root cgroups
but only writable on the first level children.

> Putting CPUs into an isolated child cpuset means removing them from the
> root's effective CPUs. So I would probably like to expose the read-only
> cpus.effective in the root cgroup so that we can check changes in the
> effective cpu list.

Ah, yeah, that makes sense.

> I will renew the patchset with your suggestion.

Thank you very much.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated
  2018-05-01 20:58         ` Tejun Heo
@ 2018-05-01 21:31           ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2018-05-01 21:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

On 05/01/2018 04:58 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, May 01, 2018 at 04:33:45PM -0400, Waiman Long wrote:
>> I think that will work too. We currently don't have a flag to make a
>> file visible on first-level children only, but it shouldn't be hard to
>> make one.
> I think it'd be fine to make the flag file exist on all !root cgroups
> but only writable on the first level children.

Right. This flag will be inherited by child cgroups like the
sched_load_balance flag.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 2/5] cpuset: Add cpuset.sched_load_balance to v2
  2018-04-19 13:47   ` Waiman Long
@ 2018-05-02 10:24     ` Peter Zijlstra
  -1 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2018-05-02 10:24 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

On Thu, Apr 19, 2018 at 09:47:01AM -0400, Waiman Long wrote:
> +  cpuset.sched_load_balance
> +	A read-write single value file which exists on non-root cgroups.

Uhhm.. it should very much exist in the root group too. Otherwise you
cannot disable it there, which is required to allow smaller groups to
load-balance between themselves.

> +	The default is "1" (on), and the other possible value is "0"
> +	(off).
> +
> +	When it is on, tasks within this cpuset will be load-balanced
> +	by the kernel scheduler.  Tasks will be moved from CPUs with
> +	high load to other CPUs within the same cpuset with less load
> +	periodically.
> +
> +	When it is off, there will be no load balancing among CPUs on
> +	this cgroup.  Tasks will stay in the CPUs they are running on
> +	and will not be moved to other CPUs.
> +
> +	This flag is hierarchical and is inherited by child cpusets. It
> +	can be turned off only when the CPUs in this cpuset aren't
> +	listed in the cpuset.cpus of other sibling cgroups, and all
> +	the child cpusets, if present, have this flag turned off.
> +
> +	Once it is off, it cannot be turned back on as long as the
> +	parent cgroup still has this flag in the off state.

That too is wrong and broken. You explicitly want to turn it on for
children.

So the idea is that you can have:

		R
	      /   \
            A       B

With:

	R cpus=0-3, load_balance=0
	A cpus=0-1, load_balance=1
	B cpus=2-3, load_balance=1

Which will allow all tasks in A and B (and their children) to
load-balance across 0-1 or 2-3 respectively.

If you don't allow the root group to disable load_balance, it will
always be the largest group and load-balancing will always happen
system-wide.

^ permalink raw reply	[flat|nested] 48+ messages in thread
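
The topology Peter describes would be set up roughly like this (a
sketch of the flow being argued for; v7 as posted does not allow
turning off load balancing at the root):

	echo 0 > /sys/fs/cgroup/cpuset.sched_load_balance   # R: 0-3, off
	mkdir /sys/fs/cgroup/A /sys/fs/cgroup/B
	echo 0-1 > /sys/fs/cgroup/A/cpuset.cpus   # A keeps balancing on (default)
	echo 2-3 > /sys/fs/cgroup/B/cpuset.cpus   # B keeps balancing on (default)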

* Re: [PATCH v7 2/5] cpuset: Add cpuset.sched_load_balance to v2
  2018-05-02 10:24     ` Peter Zijlstra
@ 2018-05-02 13:29       ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2018-05-02 13:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

On 05/02/2018 06:24 AM, Peter Zijlstra wrote:
> On Thu, Apr 19, 2018 at 09:47:01AM -0400, Waiman Long wrote:
>> +  cpuset.sched_load_balance
>> +	A read-write single value file which exists on non-root cgroups.
> Uhhm.. it should very much exist in the root group too. Otherwise you
> cannot disable it there, which is required to allow smaller groups to
> load-balance between themselves.
>
>> +	The default is "1" (on), and the other possible value is "0"
>> +	(off).
>> +
>> +	When it is on, tasks within this cpuset will be load-balanced
>> +	by the kernel scheduler.  Tasks will be moved from CPUs with
>> +	high load to other CPUs within the same cpuset with less load
>> +	periodically.
>> +
>> +	When it is off, there will be no load balancing among CPUs on
>> +	this cgroup.  Tasks will stay in the CPUs they are running on
>> +	and will not be moved to other CPUs.
>> +
>> +	This flag is hierarchical and is inherited by child cpusets. It
>> +	can be turned off only when the CPUs in this cpuset aren't
>> +	listed in the cpuset.cpus of other sibling cgroups, and all
>> +	the child cpusets, if present, have this flag turned off.
>> +
>> +	Once it is off, it cannot be turned back on as long as the
>> +	parent cgroup still has this flag in the off state.
> That too is wrong and broken. You explicitly want to turn it on for
> children.
>
> So the idea is that you can have:
>
> 		R
> 	      /   \
>             A       B
>
> With:
>
> 	R cpus=0-3, load_balance=0
> 	A cpus=0-1, load_balance=1
> 	B cpus=2-3, load_balance=1
>
> Which will allow all tasks in A and B (and their children) to load-balance
> across 0-1 or 2-3 respectively.
>
> If you don't allow the root group to disable load_balance, it will
> always be the largest group and load-balancing will always happen system
> wide.

If you look at the remaining patches in the series, I was proposing a
different way to support isolcpus and separate sched domains without
turning off load balancing in the root cgroup.

For me, it doesn't feel right to have load balancing disabled in the
root cgroup, as we probably cannot move all the tasks away from the root
cgroup anyway. I am going to update the current patchset to incorporate
the suggestions from Tejun. It will probably be ready sometime next week.

Cheers,
Longman

* Re: [PATCH v7 2/5] cpuset: Add cpuset.sched_load_balance to v2
  2018-05-02 13:29       ` Waiman Long
@ 2018-05-02 13:42         ` Peter Zijlstra
  -1 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2018-05-02 13:42 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

On Wed, May 02, 2018 at 09:29:54AM -0400, Waiman Long wrote:
> On 05/02/2018 06:24 AM, Peter Zijlstra wrote:
> > On Thu, Apr 19, 2018 at 09:47:01AM -0400, Waiman Long wrote:
> >> +  cpuset.sched_load_balance
> >> +	A read-write single value file which exists on non-root cgroups.
> > Uhhm.. it should very much exist in the root group too. Otherwise you
> > cannot disable it there, which is required to allow smaller groups to
> > load-balance between themselves.
> >
> >> +	The default is "1" (on), and the other possible value is "0"
> >> +	(off).
> >> +
> >> +	When it is on, tasks within this cpuset will be load-balanced
> >> +	by the kernel scheduler.  Tasks will be moved from CPUs with
> >> +	high load to other CPUs within the same cpuset with less load
> >> +	periodically.
> >> +
> >> +	When it is off, there will be no load balancing among CPUs on
> >> +	this cgroup.  Tasks will stay in the CPUs they are running on
> >> +	and will not be moved to other CPUs.
> >> +
> >> +	This flag is hierarchical and is inherited by child cpusets. It
> >> +	can be turned off only when the CPUs in this cpuset aren't
> >> +	listed in the cpuset.cpus of other sibling cgroups, and all
> >> +	the child cpusets, if present, have this flag turned off.
> >> +
> >> +	Once it is off, it cannot be turned back on as long as the
> >> +	parent cgroup still has this flag in the off state.
> > That too is wrong and broken. You explicitly want to turn it on for
> > children.
> >
> > So the idea is that you can have:
> >
> > 		R
> > 	      /   \
> >             A       B
> >
> > With:
> >
> > 	R cpus=0-3, load_balance=0
> > 	A cpus=0-1, load_balance=1
> > 	B cpus=2-3, load_balance=1
> >
> > Which will allow all tasks in A and B (and their children) to load-balance
> > across 0-1 or 2-3 respectively.
> >
> > If you don't allow the root group to disable load_balance, it will
> > always be the largest group and load-balancing will always happen system
> > wide.
> 
> If you look at the remaining patches in the series, I was proposing a
> different way to support isolcpus and separate sched domains without
> turning off load balancing in the root cgroup.
> 
> For me, it doesn't feel right to have load balancing disabled in the
> root cgroup, as we probably cannot move all the tasks away from the root
> cgroup anyway. I am going to update the current patchset to incorporate
> the suggestions from Tejun. It will probably be ready sometime next week.
> 

I've read half of the next patch that adds the isolation thing. And
while that kludges around the whole "root cgroup is magic" thing, it
doesn't help if you move the above scenario one level down:


	R
     /    \
   A        B
          /   \
        C       D


R: cpus=0-7, load_balance=0
A: cpus=0-1, load_balance=1
B: cpus=2-7, load_balance=0
C: cpus=2-3, load_balance=1
D: cpus=4-7, load_balance=1
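
As a concrete, hypothetical sketch (same assumptions as before: the v7
file names, and a sched_load_balance knob writable at every level,
neither of which is mainline), that hierarchy would be built as:

	# sketch only: v7-proposed cpuset interface, not mainline
	cd /sys/fs/cgroup
	echo 0 > cpuset.sched_load_balance        # R: cpus 0-7, no balancing
	mkdir A B
	echo 0-1 > A/cpuset.cpus
	echo 1 > A/cpuset.sched_load_balance      # A: balanced partition, CPUs 0-1
	echo 2-7 > B/cpuset.cpus
	echo 0 > B/cpuset.sched_load_balance      # B: unbalanced holder of CPUs 2-7
	echo "+cpuset" > B/cgroup.subtree_control
	mkdir B/C B/D
	echo 2-3 > B/C/cpuset.cpus
	echo 1 > B/C/cpuset.sched_load_balance    # C: balanced sub-partition, 2-3
	echo 4-7 > B/D/cpuset.cpus
	echo 1 > B/D/cpuset.sched_load_balance    # D: balanced sub-partition, 4-7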


Also, I feel we should strive to have a minimal amount of tasks that
cannot be moved out of the root group; the current set is far too large.

* Re: [PATCH v7 2/5] cpuset: Add cpuset.sched_load_balance to v2
  2018-05-02 13:42         ` Peter Zijlstra
@ 2018-05-02 13:47           ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2018-05-02 13:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

On 05/02/2018 09:42 AM, Peter Zijlstra wrote:
> On Wed, May 02, 2018 at 09:29:54AM -0400, Waiman Long wrote:
>> On 05/02/2018 06:24 AM, Peter Zijlstra wrote:
>>> On Thu, Apr 19, 2018 at 09:47:01AM -0400, Waiman Long wrote:
>>>> +  cpuset.sched_load_balance
>>>> +	A read-write single value file which exists on non-root cgroups.
>>> Uhhm.. it should very much exist in the root group too. Otherwise you
>>> cannot disable it there, which is required to allow smaller groups to
>>> load-balance between themselves.
>>>
>>>> +	The default is "1" (on), and the other possible value is "0"
>>>> +	(off).
>>>> +
>>>> +	When it is on, tasks within this cpuset will be load-balanced
>>>> +	by the kernel scheduler.  Tasks will be moved from CPUs with
>>>> +	high load to other CPUs within the same cpuset with less load
>>>> +	periodically.
>>>> +
>>>> +	When it is off, there will be no load balancing among CPUs on
>>>> +	this cgroup.  Tasks will stay in the CPUs they are running on
>>>> +	and will not be moved to other CPUs.
>>>> +
>>>> +	This flag is hierarchical and is inherited by child cpusets. It
>>>> +	can be turned off only when the CPUs in this cpuset aren't
>>>> +	listed in the cpuset.cpus of other sibling cgroups, and all
>>>> +	the child cpusets, if present, have this flag turned off.
>>>> +
>>>> +	Once it is off, it cannot be turned back on as long as the
>>>> +	parent cgroup still has this flag in the off state.
>>> That too is wrong and broken. You explicitly want to turn it on for
>>> children.
>>>
>>> So the idea is that you can have:
>>>
>>> 		R
>>> 	      /   \
>>>             A       B
>>>
>>> With:
>>>
>>> 	R cpus=0-3, load_balance=0
>>> 	A cpus=0-1, load_balance=1
>>> 	B cpus=2-3, load_balance=1
>>>
>>> Which will allow all tasks in A and B (and their children) to load-balance
>>> across 0-1 or 2-3 respectively.
>>>
>>> If you don't allow the root group to disable load_balance, it will
>>> always be the largest group and load-balancing will always happen system
>>> wide.
>> If you look at the remaining patches in the series, I was proposing a
>> different way to support isolcpus and separate sched domains without
>> turning off load balancing in the root cgroup.
>>
>> For me, it doesn't feel right to have load balancing disabled in the
>> root cgroup, as we probably cannot move all the tasks away from the root
>> cgroup anyway. I am going to update the current patchset to incorporate
>> the suggestions from Tejun. It will probably be ready sometime next week.
>>
> I've read half of the next patch that adds the isolation thing. And
> while that kludges around the whole "root cgroup is magic" thing, it
> doesn't help if you move the above scenario one level down:
>
>
> 	R
>      /    \
>    A        B
>           /   \
>         C       D
>
>
> R: cpus=0-7, load_balance=0
> A: cpus=0-1, load_balance=1
> B: cpus=2-7, load_balance=0
> C: cpus=2-3, load_balance=1
> D: cpus=4-7, load_balance=1
>
>
> Also, I feel we should strive to have a minimal amount of tasks that
> cannot be moved out of the root group; the current set is far too large.

What exactly is the use case you have in mind with load balancing
disabled in B, but enabled in C and D? We would like to support some
sensible use cases, but not every possible combination.

Cheers,
Longman

* Re: [PATCH v7 2/5] cpuset: Add cpuset.sched_load_balance to v2
  2018-05-02 13:47           ` Waiman Long
@ 2018-05-02 14:02             ` Peter Zijlstra
  -1 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2018-05-02 14:02 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

On Wed, May 02, 2018 at 09:47:00AM -0400, Waiman Long wrote:

> > I've read half of the next patch that adds the isolation thing. And
> > while that kludges around the whole "root cgroup is magic" thing, it
> > doesn't help if you move the above scenario one level down:
> >
> >
> > 	R
> >      /    \
> >    A        B
> >           /   \
> >         C       D
> >
> >
> > R: cpus=0-7, load_balance=0
> > A: cpus=0-1, load_balance=1
> > B: cpus=2-7, load_balance=0
> > C: cpus=2-3, load_balance=1
> > D: cpus=4-7, load_balance=1
> >
> >
> > Also, I feel we should strive to have a minimal amount of tasks that
> > cannot be moved out of the root group; the current set is far too large.
> 
> What exactly is the use case you have in mind with load balancing
> disabled in B, but enabled in C and D? We would like to support some
> sensible use cases, but not every possible combination.

Suppose A is your system group, and C and D are individual RT workloads
or something.

Or suppose B has siblings and each group at that level is delegated to
a particular user/container. And the user/container in B happens to need
2 partitioned VMs or whatever.

The idea is the same in all the examples: you want to allow
sub-partitions.

* Re: [PATCH v7 3/5] cpuset: Add a root-only cpus.isolated v2 control file
  2018-04-19 13:47   ` Waiman Long
@ 2018-05-02 14:08     ` Peter Zijlstra
  -1 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2018-05-02 14:08 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

On Thu, Apr 19, 2018 at 09:47:02AM -0400, Waiman Long wrote:
> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> index c970bd7..8d89dc2 100644
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -1484,6 +1484,31 @@ Cpuset Interface Files
>  	a subset of "cpuset.cpus".  Its value will be affected by CPU
>  	hotplug events.
>  
> +  cpuset.cpus.isolated
> +	A read-write multiple values file which exists on root cgroup
> +	only.
> +
> +	It lists the CPUs that have been withdrawn from the root cgroup
> +	for load balancing.  These CPUs can still be allocated to child
> +	cpusets with load balancing enabled, if necessary.
> +
> +	If a child cpuset contains only an exclusive set of CPUs that are
> +	a subset of the isolated CPUs and with load balancing enabled,
> +	these CPUs will be load balanced on a separate root domain from
> +	the one in the root cgroup.
> +
> +	Just putting the CPUs into "cpuset.cpus.isolated" will be
> +	enough to disable load balancing on those CPUs as long as they
> +	do not appear in a child cpuset with load balancing enabled.
> +	Fine-grained control of cpu isolation can also be done by
> +	putting these isolated CPUs into child cpusets with load
> +	balancing disabled.
> +
> +	The "cpuset.cpus.isolated" should be set up before child
> +	cpusets are created.  Once child cpusets are present, changes
> +	to "cpuset.cpus.isolated" will not be allowed if the CPUs that
> +	change their states are in any of the child cpusets.
> +

So I see why you did this, but it is _really_ ugly and breaks the
container invariant.

Ideally we'd make the root group less special, not more special.
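
For reference, the workflow documented above amounts to something like
the sketch below. The cpuset.cpus.isolated file exists only in this
series, so every file name here is an assumption taken from the v7
posting, not a mainline interface:

	# sketch only: v7-proposed cpuset.cpus.isolated, not mainline
	cd /sys/fs/cgroup
	echo 2-7 > cpuset.cpus.isolated           # withdraw 2-7 from root balancing
	echo "+cpuset" > cgroup.subtree_control
	mkdir rt
	echo 2-3 > rt/cpuset.cpus                 # exclusive subset of the isolated
	echo 1 > rt/cpuset.sched_load_balance     # pool; gets its own root domain
	mkdir iso
	echo 4-7 > iso/cpuset.cpus
	echo 0 > iso/cpuset.sched_load_balance    # CPUs 4-7 stay fully isolated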

* Re: [PATCH v7 2/5] cpuset: Add cpuset.sched_load_balance to v2
  2018-05-02 14:02             ` Peter Zijlstra
@ 2018-05-02 14:35               ` Mike Galbraith
  -1 siblings, 0 replies; 48+ messages in thread
From: Mike Galbraith @ 2018-05-02 14:35 UTC (permalink / raw)
  To: Peter Zijlstra, Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, torvalds,
	Roman Gushchin, Juri Lelli

On Wed, 2018-05-02 at 16:02 +0200, Peter Zijlstra wrote:
> On Wed, May 02, 2018 at 09:47:00AM -0400, Waiman Long wrote:
> 
> > > I've read half of the next patch that adds the isolation thing. And
> > > while that kludges around the whole "root cgroup is magic" thing, it
> > > doesn't help if you move the above scenario one level down:
> > >
> > >
> > > 	R
> > >      /    \
> > >    A        B
> > >           /   \
> > >         C       D
> > >
> > >
> > > R: cpus=0-7, load_balance=0
> > > A: cpus=0-1, load_balance=1
> > > B: cpus=2-7, load_balance=0
> > > C: cpus=2-3, load_balance=1
> > > D: cpus=4-7, load_balance=1
> > >
> > >
> > > Also, I feel we should strive to have a minimal amount of tasks that
> > > cannot be moved out of the root group; the current set is far too large.
> > 
> > What exactly is the use case you have in mind with load balancing
> > disabled in B, but enabled in C and D? We would like to support some
> > sensible use cases, but not every possible combination.
> 
> Suppose A is your system group, and C and D are individual RT workloads
> or something.

Yeah, it does have a distinct "640K ought to be enough for anybody"
flavor to it.

	-Mike

* Re: [PATCH v7 3/5] cpuset: Add a root-only cpus.isolated v2 control file
  2018-05-02 14:08     ` Peter Zijlstra
@ 2018-05-08  0:30       ` Waiman Long
  -1 siblings, 0 replies; 48+ messages in thread
From: Waiman Long @ 2018-05-08  0:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli

On 05/02/2018 10:08 AM, Peter Zijlstra wrote:
> On Thu, Apr 19, 2018 at 09:47:02AM -0400, Waiman Long wrote:
>> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
>> index c970bd7..8d89dc2 100644
>> --- a/Documentation/cgroup-v2.txt
>> +++ b/Documentation/cgroup-v2.txt
>> @@ -1484,6 +1484,31 @@ Cpuset Interface Files
>>  	a subset of "cpuset.cpus".  Its value will be affected by CPU
>>  	hotplug events.
>>  
>> +  cpuset.cpus.isolated
>> +	A read-write multiple values file which exists on root cgroup
>> +	only.
>> +
>> +	It lists the CPUs that have been withdrawn from the root cgroup
>> +	for load balancing.  These CPUs can still be allocated to child
>> +	cpusets with load balancing enabled, if necessary.
>> +
>> +	If a child cpuset contains only an exclusive set of CPUs that are
>> +	a subset of the isolated CPUs and with load balancing enabled,
>> +	these CPUs will be load balanced on a separate root domain from
>> +	the one in the root cgroup.
>> +
>> +	Just putting the CPUs into "cpuset.cpus.isolated" will be
>> +	enough to disable load balancing on those CPUs as long as they
>> +	do not appear in a child cpuset with load balancing enabled.
>> +	Fine-grained control of cpu isolation can also be done by
>> +	putting these isolated CPUs into child cpusets with load
>> +	balancing disabled.
>> +
>> +	The "cpuset.cpus.isolated" should be set up before child
>> +	cpusets are created.  Once child cpusets are present, changes
>> +	to "cpuset.cpus.isolated" will not be allowed if the CPUs that
>> +	change their states are in any of the child cpusets.
>> +
> So I see why you did this, but it is _really_ ugly and breaks the
> container invariant.
>
> Ideally we'd make the root group less special, not more special.

Yes, I am planning to make the root cgroup less special by putting a new
isolation flag into all the non-root cgroups.

The container invariant, however, is a bit hard to preserve. Do we
really need a container root to behave exactly like the real root? I
guess we can make that happen if we really want to, but it will
certainly make the code more complex. So it is a trade-off between what
is worth doing and what is not.

Cheers,
Longman

end of thread, other threads:[~2018-05-08  0:30 UTC | newest]

Thread overview: 24 messages

2018-04-19 13:46 [PATCH v7 0/5] cpuset: Enable cpuset controller in default hierarchy Waiman Long
2018-04-19 13:47 ` [PATCH v7 1/5] " Waiman Long
2018-04-19 13:47 ` [PATCH v7 2/5] cpuset: Add cpuset.sched_load_balance to v2 Waiman Long
2018-05-02 10:24   ` Peter Zijlstra
2018-05-02 13:29     ` Waiman Long
2018-05-02 13:42       ` Peter Zijlstra
2018-05-02 13:47         ` Waiman Long
2018-05-02 14:02           ` Peter Zijlstra
2018-05-02 14:35             ` Mike Galbraith
2018-04-19 13:47 ` [PATCH v7 3/5] cpuset: Add a root-only cpus.isolated v2 control file Waiman Long
2018-04-23 15:56   ` Juri Lelli
2018-05-02 14:08   ` Peter Zijlstra
2018-05-08  0:30     ` Waiman Long
2018-04-19 13:47 ` [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated Waiman Long
2018-05-01 19:51   ` Tejun Heo
2018-05-01 20:33     ` Waiman Long
2018-05-01 20:58       ` Tejun Heo
2018-05-01 21:31         ` Waiman Long
2018-04-19 13:47 ` [PATCH v7 5/5] cpuset: Make generate_sched_domains() recognize isolated_cpus Waiman Long
2018-04-20  8:23 ` [PATCH v7 0/5] cpuset: Enable cpuset controller in default hierarchy Mike Galbraith
2018-04-23 16:32   ` Waiman Long
2018-04-23 13:07 ` Juri Lelli
2018-04-23 13:57   ` Juri Lelli
2018-04-23 14:10     ` Waiman Long
