* [PATCH v10 0/9] cpuset: Enable cpuset controller in default hierarchy
@ 2018-06-18  4:13 ` Waiman Long
  0 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:13 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

v10:
 - Remove the cpuset.sched.load_balance patch for now as it may not
   be that useful.
 - Break the large patch 2 into smaller patches to make them a bit
   easier to review.
 - Test and fix issues related to changing "cpuset.cpus" and cpu
   online/offline in a domain root.
 - Rename isolated_cpus to reserved_cpus as this cpumask holds CPUs
   reserved for child sched domains.
 - Rework the scheduling domain debug printing code in the last patch.
 - Update the documentation in the newly moved
   Documentation/admin-guide/cgroup-v2.rst.

v9:
 - Rename cpuset.sched.domain to cpuset.sched.domain_root to better
   identify its purpose as the root of a new scheduling domain or
   partition.
 - Clarify in the documentation the purpose of domain_root and
   load_balance. Using domain_root is the only way to create a new
   partition.
 - Fix a lockdep warning in update_isolated_cpumask() function.
 - Add a new patch to eliminate calls to generate_sched_domains() for
   v2 when a change in the cpu list does not touch a domain_root.

v8:
 - Remove cpuset.cpus.isolated and add a new cpuset.sched.domain flag
   and rework the code accordingly.

v7:
 - Add a root-only cpuset.cpus.isolated control file for CPU isolation.
 - Enforce that load_balancing can only be turned off on cpusets with
   CPUs from the isolated list.
 - Update sched domain generation to allow cpusets with CPUs only
   from the isolated CPU list to be in separate root domains.

v6:
 - Hide cpuset control knobs in root cgroup.
 - Rename effective_cpus and effective_mems to cpus.effective and
   mems.effective respectively.
 - Remove cpuset.flags and add cpuset.sched_load_balance instead,
   as the behavior of sched_load_balance has changed and it is no
   longer a simple flag.
 - Update cgroup-v2.txt accordingly.

v5:
 - Add patch 2 to provide the cpuset.flags control knob for the
   sched_load_balance flag, which should be the only feature that is
   essential as a replacement for the "isolcpus" kernel boot parameter.

v4:
 - Further minimize the feature set by removing the flags control knob.

v3:
 - Further trim the additional features down to just memory_migrate.
 - Update Documentation/cgroup-v2.txt.

v7 patch: https://lkml.org/lkml/2018/4/19/448
v8 patch: https://lkml.org/lkml/2018/5/17/939
v9 patch: https://lkml.org/lkml/2018/5/29/507

The purpose of this patchset is to provide a basic set of cpuset
control files for cgroup v2. This basic set includes the non-root
"cpus", "mems" and "sched.domain_root" files. The "cpus.effective"
and "mems.effective" files will appear in all cpuset-enabled cgroups.

The new control file that is unique to v2 is "sched.domain_root". It
is a boolean flag file that designates whether a cgroup is the root
of a new scheduling domain or partition with its own unique set of
CPUs that is, from the scheduling perspective, disjoint from other
partitions. The root cgroup is always a scheduling domain root.
Multiple levels of scheduling domains are supported with some
limitations, so a container's scheduling domain root can behave like
a real root.

When a scheduling domain root cgroup is removed, its list of exclusive
CPUs will be returned to the parent's cpus.effective automatically.
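
As an illustration, a partition could be set up as follows (a minimal
sketch only; the /sys/fs/cgroup mount point, the cgroup name and the
CPU numbers are assumptions, not part of this patchset):

    # Enable the cpuset controller for child cgroups of the root
    # (assumes cgroup v2 is mounted at /sys/fs/cgroup).
    echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control

    # Create a child cgroup, give it an exclusive set of CPUs and
    # make it the root of a new scheduling domain (partition).
    mkdir /sys/fs/cgroup/part0
    echo "2-3" > /sys/fs/cgroup/part0/cpuset.cpus
    echo 1 > /sys/fs/cgroup/part0/cpuset.sched.domain_root

Removing /sys/fs/cgroup/part0 later hands CPUs 2-3 back to the parent,
as described above.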

This patchset does not exclude the possibility of adding more features
in the future after careful consideration.

Patch 1 enables cpuset in cgroup v2 with cpus, mems and their effective
counterparts.

Patch 2 adds a new "sched.domain_root" control file for setting up
multiple scheduling domains or partitions. A scheduling domain root
implies cpu_exclusive.

Patch 3 handles the proper deletion of a domain root cgroup by turning
off the domain_root flag automatically before deletion.

Patch 4 allows "cpuset.cpus" of a domain root cgroup to be changed
subject to certain constraints.

Patch 5 makes the hotplug code deal with domain roots properly.

Patch 6 updates the scheduling domain generation code to work with
the new domain root feature.

Patch 7 exposes cpus.effective and mems.effective in the root cgroup,
as enabling child scheduling domains takes CPUs away from the root
cgroup and it is useful to see which CPUs are left there.

Patch 8 eliminates the need to rebuild sched domains for v2 if cpu
list changes occur only in non-domain-root cpusets.

Patch 9 enables printing of debug information about scheduling domain
generation.

Waiman Long (9):
  cpuset: Enable cpuset controller in default hierarchy
  cpuset: Add new v2 cpuset.sched.domain_root flag
  cpuset: Simulate auto-off of sched.domain_root at cgroup removal
  cpuset: Allow changes to cpus in a domain root
  cpuset: Make sure that domain roots work properly with CPU hotplug
  cpuset: Make generate_sched_domains() recognize reserved_cpus
  cpuset: Expose cpus.effective and mems.effective on cgroup v2 root
  cpuset: Don't rebuild sched domains if cpu changes in non-domain root
  cpuset: Allow reporting of sched domain generation info

 Documentation/admin-guide/cgroup-v2.rst | 158 ++++++++++++-
 kernel/cgroup/cpuset.c                  | 406 ++++++++++++++++++++++++++++++--
 2 files changed, 543 insertions(+), 21 deletions(-)

-- 
1.8.3.1


* [PATCH v10 1/9] cpuset: Enable cpuset controller in default hierarchy
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18  4:14   ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:14 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

Given that thread mode has been merged in 4.14, it is now time to
enable cpuset to be used in the default hierarchy (cgroup v2), as it
is clearly threaded.

The cpuset controller has experienced feature creep since its
introduction more than a decade ago. Besides the core cpus and mems
control files that limit CPUs and memory nodes, there are a number
of additional features that can be controlled from userspace. Some
of these features are of doubtful usefulness and may not be actively
used.

This patch enables the cpuset controller in the default hierarchy
with a minimal set of features, namely just cpus and mems and their
effective_* counterparts.  More features can certainly be added to
the default hierarchy in the future if there is a real user need for
them.

Alternatively, with the unified hierarchy, it may make more sense
to move some of those additional cpuset features, if desired, to the
memory controller or perhaps to the cpu controller instead of keeping
them in cpuset.
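
For illustration, the resulting minimal interface might be exercised
as below (a sketch only; the mount point, the cgroup name and the
numbers used are assumptions):

    # Enable the cpuset controller in the default hierarchy for child
    # cgroups (assumes cgroup v2 is mounted at /sys/fs/cgroup).
    echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control

    # Only the minimal set of files is exposed in a child cgroup:
    # cpuset.cpus, cpuset.mems and their read-only *.effective
    # counterparts.
    mkdir /sys/fs/cgroup/test
    echo "0-1" > /sys/fs/cgroup/test/cpuset.cpus
    echo "0"   > /sys/fs/cgroup/test/cpuset.mems
    cat /sys/fs/cgroup/test/cpuset.cpus.effective
    cat /sys/fs/cgroup/test/cpuset.mems.effective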

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 109 ++++++++++++++++++++++++++++++--
 kernel/cgroup/cpuset.c                  |  48 +++++++++++++-
 2 files changed, 149 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8a2c52d..fbc30b6 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -53,11 +53,13 @@ v1 is available under Documentation/cgroup-v1/.
        5-3-2. Writeback
      5-4. PID
        5-4-1. PID Interface Files
-     5-5. Device
-     5-6. RDMA
-       5-6-1. RDMA Interface Files
-     5-7. Misc
-       5-7-1. perf_event
+     5-5. Cpuset
+       5.5-1. Cpuset Interface Files
+     5-6. Device
+     5-7. RDMA
+       5-7-1. RDMA Interface Files
+     5-8. Misc
+       5-8-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -1486,6 +1488,103 @@ through fork() or clone(). These will return -EAGAIN if the creation
 of a new process would cause a cgroup policy to be violated.
 
 
+Cpuset
+------
+
+The "cpuset" controller provides a mechanism for constraining
+the CPU and memory node placement of tasks to only the resources
+specified in the cpuset interface files in a task's current cgroup.
+This is especially valuable on large NUMA systems where placing jobs
+on properly sized subsets of the systems with careful processor and
+memory placement to reduce cross-node memory access and contention
+can improve overall system performance.
+
+The "cpuset" controller is hierarchical.  That means the controller
+cannot use CPUs or memory nodes not allowed in its parent.
+
+
+Cpuset Interface Files
+~~~~~~~~~~~~~~~~~~~~~~
+
+  cpuset.cpus
+	A read-write multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists the requested CPUs to be used by tasks within this
+	cgroup.  The actual list of CPUs to be granted, however, is
+	subjected to constraints imposed by its parent and can differ
+	from the requested CPUs.
+
+	The CPU numbers are comma-separated numbers or ranges.
+	For example:
+
+	  # cat cpuset.cpus
+	  0-4,6,8-10
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.cpus" or all the available CPUs if none is found.
+
+	The value of "cpuset.cpus" stays constant until the next update
+	and won't be affected by any CPU hotplug events.
+
+  cpuset.cpus.effective
+	A read-only multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists the onlined CPUs that are actually granted to this
+	cgroup by its parent.  These CPUs are allowed to be used by
+	tasks within the current cgroup.
+
+	If "cpuset.cpus" is empty, the "cpuset.cpus.effective" file shows
+	all the CPUs from the parent cgroup that can be available to
+	be used by this cgroup.  Otherwise, it should be a subset of
+	"cpuset.cpus" unless none of the CPUs listed in "cpuset.cpus"
+	can be granted.  In this case, it will be treated just like an
+	empty "cpuset.cpus".
+
+	Its value will be affected by CPU hotplug events.
+
+  cpuset.mems
+	A read-write multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists the requested memory nodes to be used by tasks within
+	this cgroup.  The actual list of memory nodes granted, however,
+	is subjected to constraints imposed by its parent and can differ
+	from the requested memory nodes.
+
+	The memory node numbers are comma-separated numbers or ranges.
+	For example:
+
+	  # cat cpuset.mems
+	  0-1,3
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.mems" or all the available memory nodes if none
+	is found.
+
+	The value of "cpuset.mems" stays constant until the next update
+	and won't be affected by any memory nodes hotplug events.
+
+  cpuset.mems.effective
+	A read-only multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists the onlined memory nodes that are actually granted to
+	this cgroup by its parent. These memory nodes are allowed to
+	be used by tasks within the current cgroup.
+
+	If "cpuset.mems" is empty, it shows all the memory nodes from the
+	parent cgroup that will be available to be used by this cgroup.
+	Otherwise, it should be a subset of "cpuset.mems" unless none of
+	the memory nodes listed in "cpuset.mems" can be granted.  In this
+	case, it will be treated just like an empty "cpuset.mems".
+
+	Its value will be affected by memory nodes hotplug events.
+
+
 Device controller
 -----------------
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 266f10c..2b5c447 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1824,12 +1824,11 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 	return 0;
 }
 
-
 /*
  * for the common functions, 'private' gives the type of file
  */
 
-static struct cftype files[] = {
+static struct cftype legacy_files[] = {
 	{
 		.name = "cpus",
 		.seq_show = cpuset_common_seq_show,
@@ -1932,6 +1931,47 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 };
 
 /*
+ * This is currently a minimal set for the default hierarchy. It can be
+ * expanded later on by migrating more features and control files from v1.
+ */
+static struct cftype dfl_files[] = {
+	{
+		.name = "cpus",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * NR_CPUS),
+		.private = FILE_CPULIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{
+		.name = "mems",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * MAX_NUMNODES),
+		.private = FILE_MEMLIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{
+		.name = "cpus.effective",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_EFFECTIVE_CPULIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{
+		.name = "mems.effective",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_EFFECTIVE_MEMLIST,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
+	{ }	/* terminate */
+};
+
+
+/*
  *	cpuset_css_alloc - allocate a cpuset css
  *	cgrp:	control group that the new cpuset will be part of
  */
@@ -2105,8 +2145,10 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
 	.post_attach	= cpuset_post_attach,
 	.bind		= cpuset_bind,
 	.fork		= cpuset_fork,
-	.legacy_cftypes	= files,
+	.legacy_cftypes	= legacy_files,
+	.dfl_cftypes	= dfl_files,
 	.early_init	= true,
+	.threaded	= true,
 };
 
 /**
-- 
1.8.3.1


* [PATCH v10 2/9] cpuset: Add new v2 cpuset.sched.domain_root flag
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18  4:14   ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:14 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

A new cpuset.sched.domain_root boolean flag is added to cpuset
v2. This new flag, if set, indicates that the cgroup is the root of
a new scheduling domain or partition that includes itself and all its
descendants except those that are scheduling domain roots themselves
and their descendants.

With this new flag, one can directly create as many partitions as
necessary without ever using the v1 trick of turning off load balancing
in specific cpusets to create partitions as a side effect.

This new flag is owned by the parent and will cause the CPUs in the
cpuset to be removed from the effective CPUs of its parent.

This is implemented internally by adding a new reserved_cpus mask that
holds the CPUs belonging to child scheduling domain cpusets so that:

	reserved_cpus | effective_cpus = cpus_allowed
	reserved_cpus & effective_cpus = 0

This new flag can only be turned on in a cpuset if its parent is a
scheduling domain root itself. The state of this flag cannot be changed
if the cpuset has children.
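
A minimal sketch of the user-visible effect (the cgroup paths and CPU
numbers below are assumptions, and the child cgroup is assumed to
already exist with cpuset enabled): when a child is turned into a
scheduling domain root, its CPUs move from the parent's effective_cpus
into the parent's reserved_cpus, so the parent's
"cpuset.cpus.effective" shrinks accordingly.

    # "parent" is assumed to be a scheduling domain root that
    # currently owns CPUs 0-7.
    cat /sys/fs/cgroup/parent/cpuset.cpus.effective        # 0-7

    # Give the child CPUs 4-7 (a proper subset of the parent's
    # effective CPUs) and make it a scheduling domain root.
    echo "4-7" > /sys/fs/cgroup/parent/child/cpuset.cpus
    echo 1 > /sys/fs/cgroup/parent/child/cpuset.sched.domain_root

    # CPUs 4-7 are now reserved for the child partition and no longer
    # show up in the parent's effective CPUs.
    cat /sys/fs/cgroup/parent/cpuset.cpus.effective        # 0-3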

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  33 +++++
 kernel/cgroup/cpuset.c                  | 209 +++++++++++++++++++++++++++++++-
 2 files changed, 239 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index fbc30b6..d5e25a0 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1584,6 +1584,39 @@ Cpuset Interface Files
 
 	Its value will be affected by memory nodes hotplug events.
 
+  cpuset.sched.domain_root
+	A read-write single value file which exists on non-root
+	cpuset-enabled cgroups.  It is a binary value flag that accepts
+	either "0" (off) or "1" (on).  This flag is set by the parent
+	and is not delegatable.
+
+	If set, it indicates that the current cgroup is the root of a
+	new scheduling domain or partition that comprises itself and
+	all its descendants except those that are scheduling domain
+	roots themselves and their descendants.  The root cgroup is
+	always a scheduling domain root.
+
+	There are constraints on where this flag can be set.  It can
+	only be set in a cgroup if all the following conditions are true.
+
+	1) The "cpuset.cpus" is not empty and the list of CPUs are
+	   exclusive, i.e. they are not shared by any of its siblings.
+	2) The "cpuset.cpus" is also a proper subset of the parent's
+	   "cpuset.cpus.effective".
+	3) The parent cgroup is a scheduling domain root.
+	4) There is no child cgroups with cpuset enabled.  This is
+	   for eliminating corner cases that have to be handled if such
+	   a condition is allowed.
+
+	Setting this flag will take the CPUs away from the effective
+	CPUs of the parent cgroup.  Once it is set, this flag cannot be
+	cleared if there are any child cgroups with cpuset enabled.
+
+	A parent scheduling domain root cgroup cannot distribute
+	all its CPUs to its child scheduling domain root cgroups.
+	There must be at least one cpu left in the parent scheduling
+	domain root cgroup.
+
 
 Device controller
 -----------------
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2b5c447..68a9c25 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -109,6 +109,9 @@ struct cpuset {
 	cpumask_var_t effective_cpus;
 	nodemask_t effective_mems;
 
+	/* CPUs reserved for child scheduling domains */
+	cpumask_var_t reserved_cpus;
+
 	/*
 	 * This is old Memory Nodes tasks took on.
 	 *
@@ -134,6 +137,9 @@ struct cpuset {
 
 	/* for custom sched domain */
 	int relax_domain_level;
+
+	/* number of CPUs in reserved_cpus */
+	int nr_reserved;
 };
 
 static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
@@ -175,6 +181,7 @@ static inline bool task_has_mempolicy(struct task_struct *task)
 	CS_SCHED_LOAD_BALANCE,
 	CS_SPREAD_PAGE,
 	CS_SPREAD_SLAB,
+	CS_SCHED_DOMAIN_ROOT,
 } cpuset_flagbits_t;
 
 /* convenient tests for these bits */
@@ -203,6 +210,11 @@ static inline int is_sched_load_balance(const struct cpuset *cs)
 	return test_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
 }
 
+static inline int is_sched_domain_root(const struct cpuset *cs)
+{
+	return test_bit(CS_SCHED_DOMAIN_ROOT, &cs->flags);
+}
+
 static inline int is_memory_migrate(const struct cpuset *cs)
 {
 	return test_bit(CS_MEMORY_MIGRATE, &cs->flags);
@@ -220,7 +232,7 @@ static inline int is_spread_slab(const struct cpuset *cs)
 
 static struct cpuset top_cpuset = {
 	.flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) |
-		  (1 << CS_MEM_EXCLUSIVE)),
+		  (1 << CS_MEM_EXCLUSIVE) | (1 << CS_SCHED_DOMAIN_ROOT)),
 };
 
 /**
@@ -881,6 +893,27 @@ static void update_tasks_cpumask(struct cpuset *cs)
 	css_task_iter_end(&it);
 }
 
+/**
+ * compute_effective_cpumask - Compute the effective cpumask of the cpuset
+ * @new_cpus: the temp variable for the new effective_cpus mask
+ * @cs: the cpuset the need to recompute the new effective_cpus mask
+ * @parent: the parent cpuset
+ *
+ * If the parent has reserved CPUs, include them in the list of allowable
+ * CPUs in computing the new effective_cpus mask.
+ */
+static void compute_effective_cpumask(struct cpumask *new_cpus,
+				      struct cpuset *cs, struct cpuset *parent)
+{
+	if (parent->nr_reserved) {
+		cpumask_or(new_cpus, parent->effective_cpus,
+			   parent->reserved_cpus);
+		cpumask_and(new_cpus, new_cpus, cs->cpus_allowed);
+	} else {
+		cpumask_and(new_cpus, cs->cpus_allowed, parent->effective_cpus);
+	}
+}
+
 /*
  * update_cpumasks_hier - Update effective cpumasks and tasks in the subtree
  * @cs: the cpuset to consider
@@ -903,7 +936,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
 		struct cpuset *parent = parent_cs(cp);
 
-		cpumask_and(new_cpus, cp->cpus_allowed, parent->effective_cpus);
+		compute_effective_cpumask(new_cpus, cp, parent);
 
 		/*
 		 * If it becomes empty, inherit the effective mask of the
@@ -949,6 +982,130 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 }
 
 /**
+ * update_reserved_cpumask - update the reserved_cpus mask of parent cpuset
+ * @cpuset:  The cpuset that requests CPU reservation
+ * @delmask: The old reserved cpumask to be removed from the parent
+ * @addmask: The new reserved cpumask to be added to the parent
+ * Return: 0 if successful, an error code otherwise
+ *
+ * Changes to the reserved CPUs are not allowed if any of CPUs changing
+ * state are in any of the child cpusets of the parent except the requesting
+ * child.
+ *
+ * If the sched_domain_root flag changes, either the delmask (0=>1) or the
+ * addmask (1=>0) will be NULL.
+ *
+ * Called with cpuset_mutex held.
+ */
+static int update_reserved_cpumask(struct cpuset *cpuset,
+	struct cpumask *delmask, struct cpumask *addmask)
+{
+	int retval;
+	struct cpuset *parent = parent_cs(cpuset);
+	struct cpuset *sibling;
+	struct cgroup_subsys_state *pos_css;
+	int old_count = parent->nr_reserved;
+
+	/*
+	 * The parent must be a scheduling domain root.
+	 * The new cpumask, if present, must not be empty.
+	 */
+	if (!is_sched_domain_root(parent) ||
+	   (addmask && cpumask_empty(addmask)))
+		return -EINVAL;
+
+	/*
+	 * The delmask, if present, must be a subset of parent's reserved
+	 * CPUs.
+	 */
+	if (delmask && !cpumask_empty(delmask) && (!parent->nr_reserved ||
+		       !cpumask_subset(delmask, parent->reserved_cpus))) {
+		WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
+	/*
+	 * A sched_domain_root state change is not allowed if there are
+	 * online children.
+	 */
+	if (css_has_online_children(&cpuset->css))
+		return -EBUSY;
+
+	if (!old_count) {
+		if (!zalloc_cpumask_var(&parent->reserved_cpus, GFP_KERNEL)) {
+			retval = -ENOMEM;
+			goto out;
+		}
+		old_count = 1;
+	}
+
+	retval = -EBUSY;
+
+	/*
+	 * The cpus to be added must be a proper subset of the parent's
+	 * effective_cpus mask but not in the reserved_cpus mask.
+	 */
+	if (addmask) {
+		if (!cpumask_subset(addmask, parent->effective_cpus) ||
+		     cpumask_equal(addmask, parent->effective_cpus))
+			goto out;
+		if (parent->nr_reserved &&
+		    cpumask_intersects(parent->reserved_cpus, addmask))
+			goto out;
+	}
+
+	/*
+	 * Check if any CPUs in addmask or delmask are in the effective_cpus
+	 * of a sibling cpuset. The implied cpu_exclusive of a scheduling
+	 * domain root will ensure there are no overlap in cpus_allowed.
+	 */
+	rcu_read_lock();
+	cpuset_for_each_child(sibling, pos_css, parent) {
+		if ((sibling == cpuset) || !(sibling->css.flags & CSS_ONLINE))
+			continue;
+		if (addmask &&
+		    cpumask_intersects(sibling->effective_cpus, addmask))
+			goto out_unlock;
+		if (delmask &&
+		    cpumask_intersects(sibling->effective_cpus, delmask))
+			goto out_unlock;
+	}
+	rcu_read_unlock();
+
+	/*
+	 * Change the reserved CPU list.
+	 * Newly added reserved CPUs will be removed from effective_cpus
+	 * and newly deleted ones will be added back if they are online.
+	 */
+	spin_lock_irq(&callback_lock);
+	if (addmask) {
+		cpumask_or(parent->reserved_cpus,
+			   parent->reserved_cpus, addmask);
+		cpumask_andnot(parent->effective_cpus,
+			       parent->effective_cpus, addmask);
+	}
+	if (delmask) {
+		cpumask_andnot(parent->reserved_cpus,
+			       parent->reserved_cpus, delmask);
+		cpumask_or(parent->effective_cpus,
+			   parent->effective_cpus, delmask);
+	}
+
+	parent->nr_reserved = cpumask_weight(parent->reserved_cpus);
+	spin_unlock_irq(&callback_lock);
+	retval = 0;
+out:
+	if (old_count && !parent->nr_reserved)
+		free_cpumask_var(parent->reserved_cpus);
+
+	return retval;
+
+out_unlock:
+	rcu_read_unlock();
+	goto out;
+}
+
+/**
  * update_cpumask - update the cpus_allowed mask of a cpuset and all tasks in it
  * @cs: the cpuset to consider
  * @trialcs: trial cpuset
@@ -989,6 +1146,9 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (retval < 0)
 		return retval;
 
+	if (is_sched_domain_root(cs))
+		return -EBUSY;
+
 	spin_lock_irq(&callback_lock);
 	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
 	spin_unlock_irq(&callback_lock);
@@ -1317,6 +1477,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	struct cpuset *trialcs;
 	int balance_flag_changed;
 	int spread_flag_changed;
+	int domain_flag_changed;
 	int err;
 
 	trialcs = alloc_trial_cpuset(cs);
@@ -1328,6 +1489,18 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	else
 		clear_bit(bit, &trialcs->flags);
 
+	/*
+	 *  Turning on sched.domain flag (default hierarchy only) implies
+	 *  an implicit cpu_exclusive. Turning off sched.domain will clear
+	 *  the cpu_exclusive flag.
+	 */
+	if (bit == CS_SCHED_DOMAIN_ROOT) {
+		if (turning_on)
+			set_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
+		else
+			clear_bit(CS_CPU_EXCLUSIVE, &trialcs->flags);
+	}
+
 	err = validate_change(cs, trialcs);
 	if (err < 0)
 		goto out;
@@ -1338,11 +1511,27 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	spread_flag_changed = ((is_spread_slab(cs) != is_spread_slab(trialcs))
 			|| (is_spread_page(cs) != is_spread_page(trialcs)));
 
+	domain_flag_changed = (is_sched_domain_root(cs) !=
+			       is_sched_domain_root(trialcs));
+
+	if (domain_flag_changed) {
+		err = turning_on
+		    ? update_reserved_cpumask(cs, NULL, cs->cpus_allowed)
+		    : update_reserved_cpumask(cs, cs->cpus_allowed, NULL);
+		if (err < 0)
+			goto out;
+		/*
+		 * At this point, the state has been changed.
+		 * So we can't back out with error anymore.
+		 */
+	}
+
 	spin_lock_irq(&callback_lock);
 	cs->flags = trialcs->flags;
 	spin_unlock_irq(&callback_lock);
 
-	if (!cpumask_empty(trialcs->cpus_allowed) && balance_flag_changed)
+	if (!cpumask_empty(trialcs->cpus_allowed) &&
+	   (balance_flag_changed || domain_flag_changed))
 		rebuild_sched_domains_locked();
 
 	if (spread_flag_changed)
@@ -1597,6 +1786,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	FILE_MEM_EXCLUSIVE,
 	FILE_MEM_HARDWALL,
 	FILE_SCHED_LOAD_BALANCE,
+	FILE_SCHED_DOMAIN_ROOT,
 	FILE_SCHED_RELAX_DOMAIN_LEVEL,
 	FILE_MEMORY_PRESSURE_ENABLED,
 	FILE_MEMORY_PRESSURE,
@@ -1630,6 +1820,9 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	case FILE_SCHED_LOAD_BALANCE:
 		retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
 		break;
+	case FILE_SCHED_DOMAIN_ROOT:
+		retval = update_flag(CS_SCHED_DOMAIN_ROOT, cs, val);
+		break;
 	case FILE_MEMORY_MIGRATE:
 		retval = update_flag(CS_MEMORY_MIGRATE, cs, val);
 		break;
@@ -1791,6 +1984,8 @@ static u64 cpuset_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
 		return is_mem_hardwall(cs);
 	case FILE_SCHED_LOAD_BALANCE:
 		return is_sched_load_balance(cs);
+	case FILE_SCHED_DOMAIN_ROOT:
+		return is_sched_domain_root(cs);
 	case FILE_MEMORY_MIGRATE:
 		return is_memory_migrate(cs);
 	case FILE_MEMORY_PRESSURE_ENABLED:
@@ -1967,6 +2162,14 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 		.flags = CFTYPE_NOT_ON_ROOT,
 	},
 
+	{
+		.name = "sched.domain_root",
+		.read_u64 = cpuset_read_u64,
+		.write_u64 = cpuset_write_u64,
+		.private = FILE_SCHED_DOMAIN_ROOT,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
 	{ }	/* terminate */
 };
 
-- 
1.8.3.1


* [PATCH v10 3/9] cpuset: Simulate auto-off of sched.domain_root at cgroup removal
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18  4:14   ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:14 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

Making a cgroup a domain root reserves CPU resources at its parent,
so when a domain root cgroup is destroyed, the CPUs reserved at its
parent need to be freed. This is now done by an automatic turn-off
of the sched.domain_root flag in the offlining phase when a domain
root cgroup is being removed.
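
As a sketch of the resulting behavior (the path below is an
assumption), a domain root cgroup can simply be removed without first
clearing the flag by hand:

    # "part0" is a scheduling domain root that reserved some CPUs from
    # its parent.  Removing it auto-clears sched.domain_root in the
    # offline path, so the reserved CPUs are handed back to the
    # parent's cpuset.cpus.effective automatically.
    rmdir /sys/fs/cgroup/part0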

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 68a9c25..a1d5ccd 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -995,7 +995,8 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
  * If the sched_domain_root flag changes, either the delmask (0=>1) or the
  * addmask (1=>0) will be NULL.
  *
- * Called with cpuset_mutex held.
+ * Called with cpuset_mutex held. Some of the checks are skipped if the
+ * cpuset is being offlined (dying).
  */
 static int update_reserved_cpumask(struct cpuset *cpuset,
 	struct cpumask *delmask, struct cpumask *addmask)
@@ -1005,6 +1006,7 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 	struct cpuset *sibling;
 	struct cgroup_subsys_state *pos_css;
 	int old_count = parent->nr_reserved;
+	bool dying = cpuset->css.flags & CSS_DYING;
 
 	/*
 	 * The parent must be a scheduling domain root.
@@ -1026,9 +1028,9 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 
 	/*
 	 * A sched_domain_root state change is not allowed if there are
-	 * online children.
+	 * online children and the cpuset is not dying.
 	 */
-	if (css_has_online_children(&cpuset->css))
+	if (!dying && css_has_online_children(&cpuset->css))
 		return -EBUSY;
 
 	if (!old_count) {
@@ -1058,7 +1060,12 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 	 * Check if any CPUs in addmask or delmask are in the effective_cpus
 	 * of a sibling cpuset. The implied cpu_exclusive of a scheduling
 	 * domain root will ensure there are no overlap in cpus_allowed.
+	 *
+	 * This check is skipped if the cpuset is dying.
 	 */
+	if (dying)
+		goto updated_reserved_cpus;
+
 	rcu_read_lock();
 	cpuset_for_each_child(sibling, pos_css, parent) {
 		if ((sibling == cpuset) || !(sibling->css.flags & CSS_ONLINE))
@@ -1077,6 +1084,7 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 	 * Newly added reserved CPUs will be removed from effective_cpus
 	 * and newly deleted ones will be added back if they are online.
 	 */
+updated_reserved_cpus:
 	spin_lock_irq(&callback_lock);
 	if (addmask) {
 		cpumask_or(parent->reserved_cpus,
@@ -2278,7 +2286,12 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 /*
  * If the cpuset being removed has its flag 'sched_load_balance'
  * enabled, then simulate turning sched_load_balance off, which
- * will call rebuild_sched_domains_locked().
+ * will call rebuild_sched_domains_locked(). That is not needed
+ * in the default hierarchy where only changes in domain_root
+ * will cause repartitioning.
+ *
+ * If the cpuset has the 'sched.domain_root' flag enabled, simulate
+ * turning 'sched.domain_root' off.
  */
 
 static void cpuset_css_offline(struct cgroup_subsys_state *css)
@@ -2287,7 +2300,18 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
 
 	mutex_lock(&cpuset_mutex);
 
-	if (is_sched_load_balance(cs))
+	/*
+	 * Use a WARN_ON_ONCE() check after calling update_flag() to make
+	 * sure that the operation succeeds.
+	 */
+	if (is_sched_domain_root(cs)) {
+		int ret = update_flag(CS_SCHED_DOMAIN_ROOT, cs, 0);
+
+		WARN_ON_ONCE(ret);
+	}
+
+	if (!cgroup_subsys_on_dfl(cpuset_cgrp_subsys) &&
+	    is_sched_load_balance(cs))
 		update_flag(CS_SCHED_LOAD_BALANCE, cs, 0);
 
 	cpuset_dec();
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v10 4/9] cpuset: Allow changes to cpus in a domain root
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18  4:14   ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:14 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

The previous patch introduces a new domain_root flag, but doesn't allow
changes to "cpuset.cpus" once the flag is on. That may be too
restrictive in some use cases. So this restriction is now relaxed to
allow changes to the "cpuset.cpus" file under the following
constraints:

 1) The new set of cpus must still be exclusive.
 2) Newly added cpus must be a proper subset of the parent's
    effective_cpus.
 3) None of the deleted cpus can be among those allocated to child
    domain roots, if present.
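
A toy model of the resulting mask arithmetic is sketched below, with
bit N standing for CPU N. The concrete values are made up for
illustration and only mirror the checks added to
update_reserved_cpumask(); this is not kernel code.

  #include <stdio.h>

  int main(void)
  {
  	unsigned long oldmask = 0x0cUL;	/* domain root currently owns CPUs 2-3 */
  	unsigned long newmask = 0x1cUL;	/* requested new set is CPUs 2-4       */
  	unsigned long parent_effective = 0xf3UL; /* parent's effective CPUs     */
  	unsigned long child_reserved = 0x00UL;	/* CPUs given to child domain roots */

  	unsigned long addmask = newmask & ~oldmask;	/* CPUs being added   */
  	unsigned long delmask = oldmask & ~newmask;	/* CPUs being removed */

  	/* Added CPUs must be a proper subset of the parent's effective CPUs. */
  	int add_ok = (addmask & ~parent_effective) == 0 &&
  		     addmask != parent_effective;

  	/* Deleted CPUs must not already be distributed to child domain roots. */
  	int del_ok = (delmask & child_reserved) == 0;

  	printf("addmask=%#lx delmask=%#lx add_ok=%d del_ok=%d\n",
  	       addmask, delmask, add_ok, del_ok);
  	return 0;
  }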

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  9 ++++
 kernel/cgroup/cpuset.c                  | 81 ++++++++++++++++++++++++++-------
 2 files changed, 73 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index d5e25a0..5ee5e77 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1617,6 +1617,15 @@ Cpuset Interface Files
 	There must be at least one cpu left in the parent scheduling
 	domain root cgroup.
 
+	In a scheduling domain root, changes to "cpuset.cpus" are allowed
+	as long as the first condition above and the following two
+	additional conditions are true.
+
+	1) Any added CPUs must be a proper subset of the parent's
+	   "cpuset.cpus.effective".
+	2) No CPU that has been distributed to child scheduling domain
+	   roots is deleted.
+
 
 Device controller
 -----------------
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a1d5ccd..b1abe3d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -957,6 +957,9 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 
 		spin_lock_irq(&callback_lock);
 		cpumask_copy(cp->effective_cpus, new_cpus);
+		if (cp->nr_reserved)
+			cpumask_andnot(cp->effective_cpus, cp->effective_cpus,
+				       cp->reserved_cpus);
 		spin_unlock_irq(&callback_lock);
 
 		WARN_ON(!is_in_v2_mode() &&
@@ -984,24 +987,26 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 /**
  * update_reserved_cpumask - update the reserved_cpus mask of parent cpuset
  * @cpuset:  The cpuset that requests CPU reservation
- * @delmask: The old reserved cpumask to be removed from the parent
- * @addmask: The new reserved cpumask to be added to the parent
+ * @oldmask: The old reserved cpumask to be removed from the parent
+ * @newmask: The new reserved cpumask to be added to the parent
  * Return: 0 if successful, an error code otherwise
  *
  * Changes to the reserved CPUs are not allowed if any of CPUs changing
  * state are in any of the child cpusets of the parent except the requesting
  * child.
  *
- * If the sched_domain_root flag changes, either the delmask (0=>1) or the
- * addmask (1=>0) will be NULL.
+ * If the sched_domain_root flag changes, either the oldmask (0=>1) or the
+ * newmask (1=>0) will be NULL.
  *
  * Called with cpuset_mutex held. Some of the checks are skipped if the
  * cpuset is being offlined (dying).
  */
 static int update_reserved_cpumask(struct cpuset *cpuset,
-	struct cpumask *delmask, struct cpumask *addmask)
+	struct cpumask *oldmask, struct cpumask *newmask)
 {
 	int retval;
+	int adding, deleting;
+	cpumask_var_t addmask, delmask;
 	struct cpuset *parent = parent_cs(cpuset);
 	struct cpuset *sibling;
 	struct cgroup_subsys_state *pos_css;
@@ -1013,15 +1018,15 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 	 * The new cpumask, if present, must not be empty.
 	 */
 	if (!is_sched_domain_root(parent) ||
-	   (addmask && cpumask_empty(addmask)))
+	   (newmask && cpumask_empty(newmask)))
 		return -EINVAL;
 
 	/*
-	 * The delmask, if present, must be a subset of parent's reserved
+	 * The oldmask, if present, must be a subset of parent's reserved
 	 * CPUs.
 	 */
-	if (delmask && !cpumask_empty(delmask) && (!parent->nr_reserved ||
-		       !cpumask_subset(delmask, parent->reserved_cpus))) {
+	if (oldmask && !cpumask_empty(oldmask) && (!parent->nr_reserved ||
+		       !cpumask_subset(oldmask, parent->reserved_cpus))) {
 		WARN_ON_ONCE(1);
 		return -EINVAL;
 	}
@@ -1030,9 +1035,17 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 	 * A sched_domain_root state change is not allowed if there are
 	 * online children and the cpuset is not dying.
 	 */
-	if (!dying && css_has_online_children(&cpuset->css))
+	if (!dying && (!oldmask || !newmask) &&
+	    css_has_online_children(&cpuset->css))
 		return -EBUSY;
 
+	if (!zalloc_cpumask_var(&addmask, GFP_KERNEL))
+		return -ENOMEM;
+	if (!zalloc_cpumask_var(&delmask, GFP_KERNEL)) {
+		free_cpumask_var(addmask);
+		return -ENOMEM;
+	}
+
 	if (!old_count) {
 		if (!zalloc_cpumask_var(&parent->reserved_cpus, GFP_KERNEL)) {
 			retval = -ENOMEM;
@@ -1042,12 +1055,29 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 	}
 
 	retval = -EBUSY;
+	adding = deleting = false;
+	/*
+	 * addmask = newmask & ~oldmask
+	 * delmask = oldmask & ~newmask
+	 */
+	if (oldmask && newmask) {
+		adding   = cpumask_andnot(addmask, newmask, oldmask);
+		deleting = cpumask_andnot(delmask, oldmask, newmask);
+		if (!adding && !deleting)
+			goto out_ok;
+	} else if (newmask) {
+		adding = true;
+		cpumask_copy(addmask, newmask);
+	} else if (oldmask) {
+		deleting = true;
+		cpumask_copy(delmask, oldmask);
+	}
 
 	/*
 	 * The cpus to be added must be a proper subset of the parent's
 	 * effective_cpus mask but not in the reserved_cpus mask.
 	 */
-	if (addmask) {
+	if (adding) {
 		if (!cpumask_subset(addmask, parent->effective_cpus) ||
 		     cpumask_equal(addmask, parent->effective_cpus))
 			goto out;
@@ -1057,6 +1087,15 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 	}
 
 	/*
+	 * For cpu changes in a domain root, cpu deletion isn't allowed
+	 * if any of the deleted CPUs is in reserved_cpus (distributed
+	 * to child domain roots).
+	 */
+	if (oldmask && newmask && cpuset->nr_reserved && deleting &&
+	    cpumask_intersects(delmask, cpuset->reserved_cpus))
+		goto out;
+
+	/*
 	 * Check if any CPUs in addmask or delmask are in the effective_cpus
 	 * of a sibling cpuset. The implied cpu_exclusive of a scheduling
 	 * domain root will ensure there are no overlap in cpus_allowed.
@@ -1070,10 +1109,10 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 	cpuset_for_each_child(sibling, pos_css, parent) {
 		if ((sibling == cpuset) || !(sibling->css.flags & CSS_ONLINE))
 			continue;
-		if (addmask &&
+		if (adding &&
 		    cpumask_intersects(sibling->effective_cpus, addmask))
 			goto out_unlock;
-		if (delmask &&
+		if (deleting &&
 		    cpumask_intersects(sibling->effective_cpus, delmask))
 			goto out_unlock;
 	}
@@ -1086,13 +1125,13 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 	 */
 updated_reserved_cpus:
 	spin_lock_irq(&callback_lock);
-	if (addmask) {
+	if (adding) {
 		cpumask_or(parent->reserved_cpus,
 			   parent->reserved_cpus, addmask);
 		cpumask_andnot(parent->effective_cpus,
 			       parent->effective_cpus, addmask);
 	}
-	if (delmask) {
+	if (deleting) {
 		cpumask_andnot(parent->reserved_cpus,
 			       parent->reserved_cpus, delmask);
 		cpumask_or(parent->effective_cpus,
@@ -1101,8 +1140,12 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
 
 	parent->nr_reserved = cpumask_weight(parent->reserved_cpus);
 	spin_unlock_irq(&callback_lock);
+
+out_ok:
 	retval = 0;
 out:
+	free_cpumask_var(addmask);
+	free_cpumask_var(delmask);
 	if (old_count && !parent->nr_reserved)
 		free_cpumask_var(parent->reserved_cpus);
 
@@ -1154,8 +1197,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (retval < 0)
 		return retval;
 
-	if (is_sched_domain_root(cs))
-		return -EBUSY;
+	if (is_sched_domain_root(cs)) {
+		retval = update_reserved_cpumask(cs, cs->cpus_allowed,
+						 trialcs->cpus_allowed);
+		if (retval < 0)
+			return retval;
+	}
 
 	spin_lock_irq(&callback_lock);
 	cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v10 5/9] cpuset: Make sure that domain roots work properly with CPU hotplug
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18  4:14   ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:14 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

When there is a cpu hotplug event (CPU online or offline), the
scheduling domains need to be reconfigured and regenerated. So code is
added to the hotplug functions to make them work with the new
reserved_cpus mask and compute the right effective_cpus for each of
the affected cpusets.
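
The sketch below models the effective-CPU computation with plain
bitmasks (bit N is CPU N). The helper name and the sample values are
assumptions made for illustration only; they mirror
compute_effective_cpumask() plus the reserved-CPU masking done in the
hotplug path, not the actual kernel implementation.

  #include <stdio.h>

  static unsigned long effective_cpus(unsigned long cpus_allowed,
  				    unsigned long parent_effective,
  				    unsigned long parent_reserved,
  				    unsigned long own_reserved,
  				    unsigned long cpu_active)
  {
  	unsigned long eff;

  	/* Reserved CPUs are added back, then offlined CPUs masked out ... */
  	eff = (parent_effective | parent_reserved) & cpus_allowed & cpu_active;
  	/* ... and CPUs handed to child domain roots are excluded again. */
  	return eff & ~own_reserved;
  }

  int main(void)
  {
  	/* Domain root owns CPUs 2-5, CPUs 4-5 went to a child domain root,
  	 * and CPU 3 is currently offline. */
  	unsigned long eff = effective_cpus(0x3cUL, 0x03UL, 0x3cUL,
  					   0x30UL, 0xf7UL);

  	printf("effective=%#lx\n", eff);	/* expect 0x4, i.e. CPU 2 */
  	return 0;
  }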

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  7 +++++++
 kernel/cgroup/cpuset.c                  | 26 ++++++++++++++++++++++++--
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 5ee5e77..6ef3516 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1626,6 +1626,13 @@ Cpuset Interface Files
 	2) No CPU that has been distributed to child scheduling domain
 	   roots is deleted.
 
+	When all the CPUs allocated to a scheduling domain are offlined,
+	that scheduling domain will be temporarily gone and all the
+	tasks in that scheduling domain will migrate to another one that
+	belongs to the parent of the scheduling domain root.  When any
+	of those offlined CPUs is onlined again, a new scheduling domain
+	will be re-created and the tasks will be migrated back.
+
 
 Device controller
 -----------------
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b1abe3d..26ac083 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -900,7 +900,8 @@ static void update_tasks_cpumask(struct cpuset *cs)
  * @parent: the parent cpuset
  *
  * If the parent has reserved CPUs, include them in the list of allowable
- * CPUs in computing the new effective_cpus mask.
+ * CPUs in computing the new effective_cpus mask. The cpu_active_mask is
+ * used to mask off cpus that are to be offlined.
  */
 static void compute_effective_cpumask(struct cpumask *new_cpus,
 				      struct cpuset *cs, struct cpuset *parent)
@@ -909,6 +910,7 @@ static void compute_effective_cpumask(struct cpumask *new_cpus,
 		cpumask_or(new_cpus, parent->effective_cpus,
 			   parent->reserved_cpus);
 		cpumask_and(new_cpus, new_cpus, cs->cpus_allowed);
+		cpumask_and(new_cpus, new_cpus, cpu_active_mask);
 	} else {
 		cpumask_and(new_cpus, cs->cpus_allowed, parent->effective_cpus);
 	}
@@ -2571,9 +2573,17 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs)
 		goto retry;
 	}
 
-	cpumask_and(&new_cpus, cs->cpus_allowed, parent_cs(cs)->effective_cpus);
+	compute_effective_cpumask(&new_cpus, cs, parent_cs(cs));
 	nodes_and(new_mems, cs->mems_allowed, parent_cs(cs)->effective_mems);
 
+	if (cs->nr_reserved) {
+		/*
+		 * Some of the CPUs may have been distributed to child
+		 * domain roots. So we need to skip those when computing the
+		 * real effective cpus.
+		 */
+		cpumask_andnot(&new_cpus, &new_cpus, cs->reserved_cpus);
+	}
 	cpus_updated = !cpumask_equal(&new_cpus, cs->effective_cpus);
 	mems_updated = !nodes_equal(new_mems, cs->effective_mems);
 
@@ -2623,6 +2633,11 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	cpumask_copy(&new_cpus, cpu_active_mask);
 	new_mems = node_states[N_MEMORY];
 
+	/*
+	 * If reserved_cpus is populated, it is likely that the check below
+	 * will produce a false positive on cpus_updated when the cpu list
+	 * isn't changed. It is extra work, but it is better to be safe.
+	 */
 	cpus_updated = !cpumask_equal(top_cpuset.effective_cpus, &new_cpus);
 	mems_updated = !nodes_equal(top_cpuset.effective_mems, new_mems);
 
@@ -2631,6 +2646,13 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 		spin_lock_irq(&callback_lock);
 		if (!on_dfl)
 			cpumask_copy(top_cpuset.cpus_allowed, &new_cpus);
+		/*
+		 * Make sure that the reserved cpus aren't in the
+		 * effective cpus.
+		 */
+		if (top_cpuset.nr_reserved)
+			cpumask_andnot(&new_cpus, &new_cpus,
+					top_cpuset.reserved_cpus);
 		cpumask_copy(top_cpuset.effective_cpus, &new_cpus);
 		spin_unlock_irq(&callback_lock);
 		/* we don't mess with cpumasks of tasks in top_cpuset */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v10 6/9] cpuset: Make generate_sched_domains() recognize isolated_cpus
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18  4:14   ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:14 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

The generate_sched_domains() function and the hotplug code are modified
to make them use the newly introduced isolated_cpus mask for sched
domain generation.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index cfc9b7b..5ee4239 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -672,13 +672,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	int ndoms = 0;		/* number of sched domains in result */
 	int nslot;		/* next empty doms[] struct cpumask slot */
 	struct cgroup_subsys_state *pos_css;
+	bool root_load_balance = is_sched_load_balance(&top_cpuset);
 
 	doms = NULL;
 	dattr = NULL;
 	csa = NULL;
 
 	/* Special case for the 99% of systems with one, full, sched domain */
-	if (is_sched_load_balance(&top_cpuset)) {
+	if (root_load_balance && !top_cpuset.isolation_count) {
 		ndoms = 1;
 		doms = alloc_sched_domains(ndoms);
 		if (!doms)
@@ -701,6 +702,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	csn = 0;
 
 	rcu_read_lock();
+	if (root_load_balance)
+		csa[csn++] = &top_cpuset;
 	cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
 		if (cp == &top_cpuset)
 			continue;
@@ -711,6 +714,9 @@ static int generate_sched_domains(cpumask_var_t **domains,
 		 * parent's cpus, so just skip them, and then we call
 		 * update_domain_attr_tree() to calc relax_domain_level of
 		 * the corresponding sched domain.
+		 *
+		 * If root is load-balancing, we can skip @cp if it
+		 * is a subset of the root's effective_cpus.
 		 */
 		if (!cpumask_empty(cp->cpus_allowed) &&
 		    !(is_sched_load_balance(cp) &&
@@ -718,11 +724,16 @@ static int generate_sched_domains(cpumask_var_t **domains,
 					 housekeeping_cpumask(HK_FLAG_DOMAIN))))
 			continue;
 
+		if (root_load_balance &&
+		    cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
+			continue;
+
 		if (is_sched_load_balance(cp))
 			csa[csn++] = cp;
 
-		/* skip @cp's subtree */
-		pos_css = css_rightmost_descendant(pos_css);
+		/* skip @cp's subtree if not a scheduling domain root */
+		if (!is_sched_domain_root(cp))
+			pos_css = css_rightmost_descendant(pos_css);
 	}
 	rcu_read_unlock();
 
@@ -849,7 +860,12 @@ static void rebuild_sched_domains_locked(void)
 	 * passing doms with offlined cpu to partition_sched_domains().
 	 * Anyways, hotplug work item will rebuild sched domains.
 	 */
-	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
+	if (!top_cpuset.isolation_count &&
+	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
+		goto out;
+
+	if (top_cpuset.isolation_count &&
+	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
 		goto out;
 
 	/* Generate domain masks and attrs */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v10 6/9] cpuset: Make generate_sched_domains() recognize reserved_cpus
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18  4:14   ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:14 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

The generate_sched_domains() function is modified to make it work
correctly with the newly introduced reserved_cpus mask for sched
domain generation.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 26ac083..ea640ce 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -672,13 +672,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	int ndoms = 0;		/* number of sched domains in result */
 	int nslot;		/* next empty doms[] struct cpumask slot */
 	struct cgroup_subsys_state *pos_css;
+	bool root_load_balance = is_sched_load_balance(&top_cpuset);
 
 	doms = NULL;
 	dattr = NULL;
 	csa = NULL;
 
 	/* Special case for the 99% of systems with one, full, sched domain */
-	if (is_sched_load_balance(&top_cpuset)) {
+	if (root_load_balance && !top_cpuset.nr_reserved) {
 		ndoms = 1;
 		doms = alloc_sched_domains(ndoms);
 		if (!doms)
@@ -701,6 +702,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	csn = 0;
 
 	rcu_read_lock();
+	if (root_load_balance)
+		csa[csn++] = &top_cpuset;
 	cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
 		if (cp == &top_cpuset)
 			continue;
@@ -711,6 +714,9 @@ static int generate_sched_domains(cpumask_var_t **domains,
 		 * parent's cpus, so just skip them, and then we call
 		 * update_domain_attr_tree() to calc relax_domain_level of
 		 * the corresponding sched domain.
+		 *
+		 * If root is load-balancing, we can skip @cp if it
+		 * is a subset of the root's effective_cpus.
 		 */
 		if (!cpumask_empty(cp->cpus_allowed) &&
 		    !(is_sched_load_balance(cp) &&
@@ -718,11 +724,16 @@ static int generate_sched_domains(cpumask_var_t **domains,
 					 housekeeping_cpumask(HK_FLAG_DOMAIN))))
 			continue;
 
+		if (root_load_balance &&
+		    cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
+			continue;
+
 		if (is_sched_load_balance(cp))
 			csa[csn++] = cp;
 
-		/* skip @cp's subtree */
-		pos_css = css_rightmost_descendant(pos_css);
+		/* skip @cp's subtree if not a scheduling domain root */
+		if (!is_sched_domain_root(cp))
+			pos_css = css_rightmost_descendant(pos_css);
 	}
 	rcu_read_unlock();
 
@@ -850,7 +861,12 @@ static void rebuild_sched_domains_locked(void)
 	 * passing doms with offlined cpu to partition_sched_domains().
 	 * Anyways, hotplug work item will rebuild sched domains.
 	 */
-	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
+	if (!top_cpuset.nr_reserved &&
+	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
+		goto out;
+
+	if (top_cpuset.nr_reserved &&
+	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
 		goto out;
 
 	/* Generate domain masks and attrs */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v10 7/9] cpuset: Expose cpus.effective and mems.effective on cgroup v2 root
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18  4:14   ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:14 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

Because setting "cpuset.sched.domain_root" in a direct child of root
can remove CPUs from the root's effective CPU list, it makes sense to
know what CPUs are left in the root cgroup for scheduling purposes. So
the "cpuset.cpus.effective" control file is now exposed in the v2
cgroup root.

For consistency, the "cpuset.mems.effective" control file is exposed
as well.
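
For example, a small reader like the one below can now be pointed at
the cgroup v2 root directory as well; the mount point used here is an
assumption of this sketch.

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
  	char buf[256];
  	int fd = open("/sys/fs/cgroup/cpuset.cpus.effective", O_RDONLY);
  	ssize_t n;

  	if (fd < 0)
  		return 1;
  	n = read(fd, buf, sizeof(buf) - 1);
  	if (n > 0) {
  		buf[n] = '\0';
  		/* E.g. "0-1,6-7" after CPUs 2-5 went to child domain roots. */
  		printf("root effective cpus: %s", buf);
  	}
  	close(fd);
  	return 0;
  }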

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 4 ++--
 kernel/cgroup/cpuset.c                  | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6ef3516..c2b61e1 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1529,7 +1529,7 @@ Cpuset Interface Files
 	and won't be affected by any CPU hotplug events.
 
   cpuset.cpus.effective
-	A read-only multiple values file which exists on non-root
+	A read-only multiple values file which exists on all
 	cpuset-enabled cgroups.
 
 	It lists the onlined CPUs that are actually granted to this
@@ -1569,7 +1569,7 @@ Cpuset Interface Files
 	and won't be affected by any memory nodes hotplug events.
 
   cpuset.mems.effective
-	A read-only multiple values file which exists on non-root
+	A read-only multiple values file which exists on all
 	cpuset-enabled cgroups.
 
 	It lists the onlined memory nodes that are actually granted to
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ea640ce..b609795 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2225,14 +2225,12 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 		.name = "cpus.effective",
 		.seq_show = cpuset_common_seq_show,
 		.private = FILE_EFFECTIVE_CPULIST,
-		.flags = CFTYPE_NOT_ON_ROOT,
 	},
 
 	{
 		.name = "mems.effective",
 		.seq_show = cpuset_common_seq_show,
 		.private = FILE_EFFECTIVE_MEMLIST,
-		.flags = CFTYPE_NOT_ON_ROOT,
 	},
 
 	{
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v10 8/9] cpuset: Don't rebuild sched domains if cpu changes in non-domain root
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18  4:14   ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:14 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

With cpuset v1, any change to the list of allowable CPUs in a cpuset
may cause changes in the sched domain configuration, depending on the
load balancing states and the cpu lists of its parent and its children.

With cpuset v2 (on the default hierarchy), there are more restrictions
on how the load balancing state of a cpuset can change. As a result,
only changes made in a sched domain root can cause changes to the
corresponding sched domain. CPU changes to any of the non-domain root
cpusets will not change the sched domain configuration, so we don't
need to call rebuild_sched_domains_locked() for changes in a
non-domain root cpuset, saving precious cpu cycles.
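
The new condition can be summarized by the toy predicate below; the
helper name and the boolean parameters are assumptions made purely to
illustrate the logic added to update_cpumasks_hier().

  #include <stdbool.h>
  #include <stdio.h>

  static bool need_rebuild(bool cpus_empty, bool load_balance,
  			 bool on_default_hierarchy, bool is_domain_root)
  {
  	return !cpus_empty && load_balance &&
  	       (!on_default_hierarchy || is_domain_root);
  }

  int main(void)
  {
  	/* v2 child that is not a domain root: cpu list changes are cheap. */
  	printf("%d\n", need_rebuild(false, true, true, false));	/* 0 */
  	/* v2 domain root: changing its cpus repartitions the system. */
  	printf("%d\n", need_rebuild(false, true, true, true));	/* 1 */
  	return 0;
  }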

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b609795..ed80663 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -986,11 +986,15 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 		update_tasks_cpumask(cp);
 
 		/*
-		 * If the effective cpumask of any non-empty cpuset is changed,
-		 * we need to rebuild sched domains.
+		 * On legacy hierarchy, if the effective cpumask of any non-
+		 * empty cpuset is changed, we need to rebuild sched domains.
+		 * On default hierarchy, the cpuset needs to be a sched
+		 * domain root as well.
 		 */
 		if (!cpumask_empty(cp->cpus_allowed) &&
-		    is_sched_load_balance(cp))
+		    is_sched_load_balance(cp) &&
+		   (!cgroup_subsys_on_dfl(cpuset_cgrp_subsys) ||
+		    is_sched_domain_root(cp)))
 			need_rebuild_sched_domains = true;
 
 		rcu_read_lock();
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v10 9/9] cpuset: Allow reporting of sched domain generation info
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18  4:14   ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18  4:14 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi, Waiman Long

This patch enables us to report sched domain generation information.

If DYNAMIC_DEBUG is enabled, issuing the following command

  echo "file cpuset.c +p" > /sys/kernel/debug/dynamic_debug/control

and setting the console loglevel to 8 will let the kernel show which
scheduling domain changes are being made.
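
For example, one way to turn this on and inspect the output (the sysctl
write below is just one common way to raise the console loglevel; the
grep pattern matches the pr_debug() message added by this patch):

  # echo "file cpuset.c +p" > /sys/kernel/debug/dynamic_debug/control
  # echo 8 > /proc/sys/kernel/printk
  # dmesg | grep "Generated"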

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ed80663..2943d7c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -836,6 +836,23 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	return ndoms;
 }
 
+#ifdef CONFIG_DEBUG_KERNEL
+static inline void debug_print_domains(cpumask_var_t *doms, int ndoms)
+{
+	int i;
+	char buf[200];
+	char *ptr, *end = buf + sizeof(buf) - 1;
+
+	for (i = 0, ptr = buf, *end = '\0'; i < ndoms; i++)
+		ptr += snprintf(ptr, end - ptr, "dom%d=%*pbl ", i,
+				cpumask_pr_args(doms[i]));
+
+	pr_debug("Generated %d domains: %s\n", ndoms, buf);
+}
+#else
+static inline void debug_print_domains(cpumask_var_t *doms, int ndoms) { }
+#endif
+
 /*
  * Rebuild scheduler domains.
  *
@@ -871,6 +888,7 @@ static void rebuild_sched_domains_locked(void)
 
 	/* Generate domain masks and attrs */
 	ndoms = generate_sched_domains(&doms, &attr);
+	debug_print_domains(doms, ndoms);
 
 	/* Have scheduler rebuild the domains */
 	partition_sched_domains(ndoms, doms, attr);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 0/9] cpuset: Enable cpuset controller in default hierarchy
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-18 14:20   ` Juri Lelli
  -1 siblings, 0 replies; 62+ messages in thread
From: Juri Lelli @ 2018-06-18 14:20 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, Mike Galbraith, torvalds, Roman Gushchin, Patrick Bellasi

Hi,

On 18/06/18 12:13, Waiman Long wrote:
> v10:
>  - Remove the cpuset.sched.load_balance patch for now as it may not
>    be that useful.
>  - Break the large patch 2 into smaller patches to make them a bit
>    easier to review.
>  - Test and fix issues related to changing "cpuset.cpus" and cpu
>    online/offline in a domain root.
>  - Rename isolated_cpus to reserved_cpus as this cpumask holds CPUs
>    reserved for child sched domains.
>  - Rework the scheduling domain debug printing code in the last patch.
>  - Document update to the newly moved
>    Documentation/admin-guide/cgroup-v2.rst.

There seem to be two (similar but different) 6/9 patches in the set. Did
something go wrong?

Also, I can't seem to create a subgroup with an isolated domain root.
I think that, when doing the following

 # mount -t cgroup2 none /sys/fs/cgroup
 # echo "+cpuset" >/sys/fs/cgroup/cgroup.subtree_control 
 # mkdir /sys/fs/cgroup/g1
 # echo 0-1 >/sys/fs/cgroup/g1/cpuset.cpus
 # echo 1 >/sys/fs/cgroup/g1/cpuset.sched.domain_root

rebuild_sched_domains_locked exits early, since
top_cpuset.effective_cpus != cpu_active_mask. (effective_cpus being 2-3
at this point since I'm testing this on a 0-3 system)

In your v9 this [1] was adding a special condition to make rebuilding of
domains happen. Was the change intentional?

Thanks,

- Juri

[1] https://marc.info/?l=linux-kernel&m=152760142531222&w=2

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 6/9] cpuset: Make generate_sched_domains() recognize isolated_cpus
  2018-06-18  4:14   ` Waiman Long
@ 2018-06-18 14:44     ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18 14:44 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli,
	Patrick Bellasi

On 06/18/2018 12:14 PM, Waiman Long wrote:
> The generate_sched_domains() function and the hotplug code are modified
> to make them use the newly introduced isolated_cpus mask for schedule
> domains generation.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/cgroup/cpuset.c | 24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index cfc9b7b..5ee4239 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -672,13 +672,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
>  	int ndoms = 0;		/* number of sched domains in result */
>  	int nslot;		/* next empty doms[] struct cpumask slot */
>  	struct cgroup_subsys_state *pos_css;
> +	bool root_load_balance = is_sched_load_balance(&top_cpuset);
>  
>  	doms = NULL;
>  	dattr = NULL;
>  	csa = NULL;
>  
>  	/* Special case for the 99% of systems with one, full, sched domain */
> -	if (is_sched_load_balance(&top_cpuset)) {
> +	if (root_load_balance && !top_cpuset.isolation_count) {
>  		ndoms = 1;
>  		doms = alloc_sched_domains(ndoms);
>  		if (!doms)
> @@ -701,6 +702,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
>  	csn = 0;
>  
>  	rcu_read_lock();
> +	if (root_load_balance)
> +		csa[csn++] = &top_cpuset;
>  	cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
>  		if (cp == &top_cpuset)
>  			continue;
> @@ -711,6 +714,9 @@ static int generate_sched_domains(cpumask_var_t **domains,
>  		 * parent's cpus, so just skip them, and then we call
>  		 * update_domain_attr_tree() to calc relax_domain_level of
>  		 * the corresponding sched domain.
> +		 *
> +		 * If root is load-balancing, we can skip @cp if it
> +		 * is a subset of the root's effective_cpus.
>  		 */
>  		if (!cpumask_empty(cp->cpus_allowed) &&
>  		    !(is_sched_load_balance(cp) &&
> @@ -718,11 +724,16 @@ static int generate_sched_domains(cpumask_var_t **domains,
>  					 housekeeping_cpumask(HK_FLAG_DOMAIN))))
>  			continue;
>  
> +		if (root_load_balance &&
> +		    cpumask_subset(cp->cpus_allowed, top_cpuset.effective_cpus))
> +			continue;
> +
>  		if (is_sched_load_balance(cp))
>  			csa[csn++] = cp;
>  
> -		/* skip @cp's subtree */
> -		pos_css = css_rightmost_descendant(pos_css);
> +		/* skip @cp's subtree if not a scheduling domain root */
> +		if (!is_sched_domain_root(cp))
> +			pos_css = css_rightmost_descendant(pos_css);
>  	}
>  	rcu_read_unlock();
>  
> @@ -849,7 +860,12 @@ static void rebuild_sched_domains_locked(void)
>  	 * passing doms with offlined cpu to partition_sched_domains().
>  	 * Anyways, hotplug work item will rebuild sched domains.
>  	 */
> -	if (!cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
> +	if (!top_cpuset.isolation_count &&
> +	    !cpumask_equal(top_cpuset.effective_cpus, cpu_active_mask))
> +		goto out;
> +
> +	if (top_cpuset.isolation_count &&
> +	   !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
>  		goto out;
>  
>  	/* Generate domain masks and attrs */

Sorry, that one is bogus. Please ignore that.

NAK


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 6/9] cpuset: Make generate_sched_domains() recognize isolated_cpus
  2018-06-18 14:44     ` Waiman Long
@ 2018-06-18 14:58       ` Juri Lelli
  -1 siblings, 0 replies; 62+ messages in thread
From: Juri Lelli @ 2018-06-18 14:58 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, Mike Galbraith, torvalds, Roman Gushchin, Patrick Bellasi

On 18/06/18 22:44, Waiman Long wrote:

[...]

> 
> Sorry, that one is bogus. Please ignore that.
> 

OK, ignoring this should actually "fix" the odd behaviour I was talking
about.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 0/9] cpuset: Enable cpuset controller in default hierarchy
  2018-06-18 14:20   ` Juri Lelli
@ 2018-06-18 15:07     ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-18 15:07 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, Mike Galbraith, torvalds, Roman Gushchin, Patrick Bellasi

On 06/18/2018 10:20 PM, Juri Lelli wrote:
> Hi,
>
> On 18/06/18 12:13, Waiman Long wrote:
>> v10:
>>  - Remove the cpuset.sched.load_balance patch for now as it may not
>>    be that useful.
>>  - Break the large patch 2 into smaller patches to make them a bit
>>    easier to review.
>>  - Test and fix issues related to changing "cpuset.cpus" and cpu
>>    online/offline in a domain root.
>>  - Rename isolated_cpus to reserved_cpus as this cpumask holds CPUs
>>    reserved for child sched domains.
>>  - Rework the scheduling domain debug printing code in the last patch.
>>  - Document update to the newly moved
>>    Documentation/admin-guide/cgroup-v2.rst.
> There seem to be two (similar but different) 6/9 in the set. Something
> went wrong?

The isolated_cpus patch is an old one; I forgot to remove it before
sending out the series.


> Also I can't seem to be able to create a subgroup with an isolated
> domain root. I think that, when doing the following
>
>  # mount -t cgroup2 none /sys/fs/cgroup
>  # echo "+cpuset" >/sys/fs/cgroup/cgroup.subtree_control 
>  # mkdir /sys/fs/cgroup/g1
>  # echo 0-1 >/sys/fs/cgroup/g1/cpuset.cpus
>  # echo 1 >/sys/fs/cgroup/g1/cpuset.sched.domain_root
>
> rebuild_sched_domains_locked exits early, since
> top_cpuset.effective_cpus != cpu_active_mask. (effective_cpus being 2-3
> at this point since I'm testing this on a 0-3 system)
>
> In your v9 this [1] was adding a special condition to make rebuilding of
> domains happen. Was the change intentional?

Can you reply to the relevant patch to pinpoint which condition you are
talking about? I do try to eliminate domain rebuilds as much as possible,
but I am just not sure which condition you have a question about.

Cheers,
Longman




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 0/9] cpuset: Enable cpuset controller in default hierarchy
  2018-06-18  4:13 ` Waiman Long
@ 2018-06-19  9:52   ` Juri Lelli
  -1 siblings, 0 replies; 62+ messages in thread
From: Juri Lelli @ 2018-06-19  9:52 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, Mike Galbraith, torvalds, Roman Gushchin, Patrick Bellasi

Hi,

On 18/06/18 12:13, Waiman Long wrote:
> v10:
>  - Remove the cpuset.sched.load_balance patch for now as it may not
>    be that useful.
>  - Break the large patch 2 into smaller patches to make them a bit
>    easier to review.
>  - Test and fix issues related to changing "cpuset.cpus" and cpu
>    online/offline in a domain root.
>  - Rename isolated_cpus to reserved_cpus as this cpumask holds CPUs
>    reserved for child sched domains.
>  - Rework the scheduling domain debug printing code in the last patch.
>  - Document update to the newly moved
>    Documentation/admin-guide/cgroup-v2.rst.

After removing the redundant 6/9 as advised, I could verify that this is
working as expected (at least on my setup :): https://git.io/fURIC

I played a bit with creating and modifying domain root(s), both with
legacy and default hierarchies. Looks good to me, you can add

Tested-by: Juri Lelli <juri.lelli@redhat.com>

to the series.

Best,

- Juri

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 3/9] cpuset: Simulate auto-off of sched.domain_root at cgroup removal
  2018-06-18  4:14   ` Waiman Long
@ 2018-06-20 14:11     ` Peter Zijlstra
  -1 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2018-06-20 14:11 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On Mon, Jun 18, 2018 at 12:14:02PM +0800, Waiman Long wrote:
> @@ -1058,7 +1060,12 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
>  	 * Check if any CPUs in addmask or delmask are in the effective_cpus
>  	 * of a sibling cpuset. The implied cpu_exclusive of a scheduling
>  	 * domain root will ensure there are no overlap in cpus_allowed.
> +	 *
> +	 * This check is skipped if the cpuset is dying.

Comments that state what the code does are mostly useless; please
explain _why_ if anything.

>  	 */
> +	if (dying)
> +		goto updated_reserved_cpus;
> +
>  	rcu_read_lock();
>  	cpuset_for_each_child(sibling, pos_css, parent) {
>  		if ((sibling == cpuset) || !(sibling->css.flags & CSS_ONLINE))

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 5/9] cpuset: Make sure that domain roots work properly with CPU hotplug
  2018-06-18  4:14   ` Waiman Long
@ 2018-06-20 14:15     ` Peter Zijlstra
  -1 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2018-06-20 14:15 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On Mon, Jun 18, 2018 at 12:14:04PM +0800, Waiman Long wrote:
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 5ee5e77..6ef3516 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1626,6 +1626,13 @@ Cpuset Interface Files
>  	2) No CPU that has been distributed to child scheduling domain
>  	   roots is deleted.
>  
> +	When all the CPUs allocated to a scheduling domain are offlined,
> +	that scheduling domain will be temporaily gone and all the
> +	tasks in that scheduling domain will migrate to another one that
> +	belongs to the parent of the scheduling domain root.  When any
> +	of those offlined CPUs is onlined again, a new scheduling domain
> +	will be re-created and the tasks will be migrated back.
> +

You should mention that this is a destructive operation. If any of the
tasks had an affinity smaller than the original cgroup, that will be
gone.
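
For instance (a purely hypothetical sequence with assumed CPU numbers),
with a domain root g1 on CPUs 0-1 and a task in g1 pinned down to CPU 0:

  # taskset -pc 0 $PID
  # echo 0 > /sys/devices/system/cpu/cpu0/online
  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # echo 1 > /sys/devices/system/cpu/cpu0/online

after the last online the task migrates back into g1's domain, but its
earlier CPU-0-only affinity is not restored.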

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 6/9] cpuset: Make generate_sched_domains() recognize reserved_cpus
  2018-06-18  4:14   ` Waiman Long
@ 2018-06-20 14:17     ` Peter Zijlstra
  -1 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2018-06-20 14:17 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On Mon, Jun 18, 2018 at 12:14:06PM +0800, Waiman Long wrote:
> The generate_sched_domains() function is modified to make it work
> correctly with the newly introduced reserved_cpus mask for schedule
> domains generation.

Why isn't this (and the previous) patch part of the patch that
introduces reserved_cpus? It seems weird to have this broken
intermediate state.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 9/9] cpuset: Allow reporting of sched domain generation info
  2018-06-18  4:14   ` Waiman Long
@ 2018-06-20 14:20     ` Peter Zijlstra
  -1 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2018-06-20 14:20 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On Mon, Jun 18, 2018 at 12:14:09PM +0800, Waiman Long wrote:
> +#ifdef CONFIG_DEBUG_KERNEL
> +static inline void debug_print_domains(cpumask_var_t *doms, int ndoms)
> +{
> +	int i;
> +	char buf[200];
> +	char *ptr, *end = buf + sizeof(buf) - 1;
> +
> +	for (i = 0, ptr = buf, *end = '\0'; i < ndoms; i++)
> +		ptr += snprintf(ptr, end - ptr, "dom%d=%*pbl ", i,
> +				cpumask_pr_args(doms[i]));
> +
> +	pr_debug("Generated %d domains: %s\n", ndoms, buf);
> +}

Why not use pr_cont() and do away with that static buffer?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 2/9] cpuset: Add new v2 cpuset.sched.domain_root flag
  2018-06-18  4:14   ` Waiman Long
@ 2018-06-20 14:27     ` Peter Zijlstra
  -1 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2018-06-20 14:27 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On Mon, Jun 18, 2018 at 12:14:01PM +0800, Waiman Long wrote:
> +  cpuset.sched.domain_root

Why are we calling this a domain_root and not a partition?

> +	A read-write single value file which exists on non-root
> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
> +	either "0" (off) or "1" (on).  This flag is set by the parent
> +	and is not delegatable.

You still haven't answered:

  https://lkml.kernel.org/r/20180531094943.GG12180@hirez.programming.kicks-ass.net

the question stands.

> +	If set, it indicates that the current cgroup is the root of a
> +	new scheduling domain or partition that comprises itself and
> +	all its descendants except those that are scheduling domain
> +	roots themselves and their descendants.  The root cgroup is
> +	always a scheduling domain root.
> +
> +	There are constraints on where this flag can be set.  It can
> +	only be set in a cgroup if all the following conditions are true.
> +
> +	1) The "cpuset.cpus" is not empty and the list of CPUs are
> +	   exclusive, i.e. they are not shared by any of its siblings.
> +	2) The "cpuset.cpus" is also a proper subset of the parent's
> +	   "cpuset.cpus.effective".
> +	3) The parent cgroup is a scheduling domain root.
> +	4) There is no child cgroups with cpuset enabled.  This is
> +	   for eliminating corner cases that have to be handled if such
> +	   a condition is allowed.
> +
> +	Setting this flag will take the CPUs away from the effective
> +	CPUs of the parent cgroup.  Once it is set, this flag cannot be
> +	cleared if there are any child cgroups with cpuset enabled.
> +
> +	A parent scheduling domain root cgroup cannot distribute
> +	all its CPUs to its child scheduling domain root cgroups.
> +	There must be at least one cpu left in the parent scheduling
> +	domain root cgroup.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 5/9] cpuset: Make sure that domain roots work properly with CPU hotplug
  2018-06-20 14:15     ` Peter Zijlstra
@ 2018-06-21  3:09       ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-21  3:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On 06/20/2018 10:15 PM, Peter Zijlstra wrote:
> On Mon, Jun 18, 2018 at 12:14:04PM +0800, Waiman Long wrote:
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index 5ee5e77..6ef3516 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -1626,6 +1626,13 @@ Cpuset Interface Files
>>  	2) No CPU that has been distributed to child scheduling domain
>>  	   roots is deleted.
>>  
>> +	When all the CPUs allocated to a scheduling domain are offlined,
>> +	that scheduling domain will be temporaily gone and all the
>> +	tasks in that scheduling domain will migrate to another one that
>> +	belongs to the parent of the scheduling domain root.  When any
>> +	of those offlined CPUs is onlined again, a new scheduling domain
>> +	will be re-created and the tasks will be migrated back.
>> +
> You should mention that this is a destructive operation. If any of the
> tasks had an affinity smaller than the original cgroup, that will be
> gone.

Thanks for the information. Will update the documentation to mention that.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 2/9] cpuset: Add new v2 cpuset.sched.domain_root flag
  2018-06-20 14:27     ` Peter Zijlstra
@ 2018-06-21  7:58       ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-21  7:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On 06/20/2018 10:27 PM, Peter Zijlstra wrote:
> On Mon, Jun 18, 2018 at 12:14:01PM +0800, Waiman Long wrote:
>> +  cpuset.sched.domain_root
> Why are we calling this a domain_root and not a partition?

A partition can consist of several cgroups in a tree structure. That
flag should only be set at the root of a partition. I will change the
name to partition_root if you think this name is acceptable.
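
For illustration (paths assumed, with cpuset already enabled in the root's
subtree_control), a partition spanning several cgroups could look like:

  # mkdir /sys/fs/cgroup/p1
  # echo 0-1 >/sys/fs/cgroup/p1/cpuset.cpus
  # echo 1 >/sys/fs/cgroup/p1/cpuset.sched.domain_root
  # echo "+cpuset" >/sys/fs/cgroup/p1/cgroup.subtree_control
  # mkdir /sys/fs/cgroup/p1/a /sys/fs/cgroup/p1/b

Here p1, a and b together form one partition; only p1, its root, carries
the flag.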

>
>> +	A read-write single value file which exists on non-root
>> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
>> +	either "0" (off) or "1" (on).  This flag is set by the parent
>> +	and is not delegatable.
> You still haven't answered:
> , 
>   https://lkml.kernel.org/r/20180531094943.GG12180@hirez.programming.kicks-ass.net
>
> the question stands.

I am sorry I missed your question. Turning on domain_root will affect
the cpu mapping in the parent. That is why it cannot be set by the child,
as a child is not supposed to be able to affect the parent.
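
For example (hypothetical paths, with a child cgroup g1 already created
on a 0-3 system like the one Juri used earlier in the thread):

  # cat /sys/fs/cgroup/cpuset.cpus.effective
  0-3
  # echo 0-1 >/sys/fs/cgroup/g1/cpuset.cpus
  # echo 1 >/sys/fs/cgroup/g1/cpuset.sched.domain_root
  # cat /sys/fs/cgroup/cpuset.cpus.effective
  2-3

The parent (here the root) loses CPUs 0-1 from its effective set, which
is why the flag belongs to the parent rather than the child.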

As for the inconsistency between the real root and the container root,
this is true for almost all the controllers, so it is a generic problem.
One possible solution is to create a kind of pseudo root cgroup for the
container that looks and feels like a real root. But is there really a
need to do that?

Cheers,
Longman


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 2/9] cpuset: Add new v2 cpuset.sched.domain_root flag
  2018-06-21  7:58       ` Waiman Long
@ 2018-06-21  8:05         ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-21  8:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On 06/21/2018 03:58 PM, Waiman Long wrote:
> On 06/20/2018 10:27 PM, Peter Zijlstra wrote:
>> On Mon, Jun 18, 2018 at 12:14:01PM +0800, Waiman Long wrote:
>>> +  cpuset.sched.domain_root
>> Why are we calling this a domain_root and not a partition?
> A partition can consist of several cgroups in a tree structure. That
> flag should only be set at the root of a partition. I will change the
> name to partition_root if you think this name is acceptable.
>
>>> +	A read-write single value file which exists on non-root
>>> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
>>> +	either "0" (off) or "1" (on).  This flag is set by the parent
>>> +	and is not delegatable.
>> You still haven't answered:
>> ,  
>>   https://lkml.kernel.org/r/20180531094943.GG12180@hirez.programming.kicks-ass.net
>>
>> the question stands.
> I am sorry to miss your question. Turning on domain_root will affects
> the cpu mapping in the parent. That is why it cannot be set by the child
> as a child is not supposed to be able to affect the parent.

After thinking a bit more about it, you are right that I should not use
the term "not delegatable" here. I will rephrase in the next version.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 6/9] cpuset: Make generate_sched_domains() recognize reserved_cpus
  2018-06-20 14:17     ` Peter Zijlstra
@ 2018-06-21  8:14       ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-21  8:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On 06/20/2018 10:17 PM, Peter Zijlstra wrote:
> On Mon, Jun 18, 2018 at 12:14:06PM +0800, Waiman Long wrote:
>> The generate_sched_domains() function is modified to make it work
>> correctly with the newly introduced reserved_cpus mask for schedule
>> domains generation.
> Why isn't this (and the previous) patch part of the patch that
> introduces reserved_cpus? It seems weird to have this broken
> intermediate state.

I was trying to break the reserved_cpus patch into smaller, meaningful
pieces that are easier to review. The individual patches compile cleanly
one by one, though I didn't check whether they are functional when only
some of them are applied.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 3/9] cpuset: Simulate auto-off of sched.domain_root at cgroup removal
  2018-06-20 14:11     ` Peter Zijlstra
@ 2018-06-21  8:22       ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-21  8:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On 06/20/2018 10:11 PM, Peter Zijlstra wrote:
> On Mon, Jun 18, 2018 at 12:14:02PM +0800, Waiman Long wrote:
>> @@ -1058,7 +1060,12 @@ static int update_reserved_cpumask(struct cpuset *cpuset,
>>  	 * Check if any CPUs in addmask or delmask are in the effective_cpus
>>  	 * of a sibling cpuset. The implied cpu_exclusive of a scheduling
>>  	 * domain root will ensure there are no overlap in cpus_allowed.
>> +	 *
>> +	 * This check is skipped if the cpuset is dying.
> Comments that state what the code does are mostly useless; please
> explain _why_ if anything.

I am adding more restrictions on where domain_root can be turned on to
make sure that there will be no surprises.

I have a script to test the new cpuset v2 functionality and found that
cgroup deletion may sometimes fail if there is not enough time for the
previous operation to complete. That is the reason why I relaxed the
check for dying cgroups, to make my test script pass.
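
A back-to-back sequence of that sort might look roughly like this (paths
are illustrative only, a guess at the kind of operations involved rather
than what the actual script does):

  # mkdir /sys/fs/cgroup/g1
  # echo 0-1 >/sys/fs/cgroup/g1/cpuset.cpus
  # echo 1 >/sys/fs/cgroup/g1/cpuset.sched.domain_root
  # rmdir /sys/fs/cgroup/g1

Without the relaxed check for a dying cpuset, the final rmdir could
occasionally fail when issued before the preceding operations have fully
settled.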

Cheers,
Longman


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 2/9] cpuset: Add new v2 cpuset.sched.domain_root flag
  2018-06-21  7:58       ` Waiman Long
@ 2018-06-21  9:20         ` Peter Zijlstra
  -1 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2018-06-21  9:20 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On Thu, Jun 21, 2018 at 03:58:06PM +0800, Waiman Long wrote:

> As for the inconsistency between the real root and the container root,
> this is true for almost all the controllers. So it is a generic problem.
> One possible solution is to create a kind of pseudo root cgroup for the
> container that looks and feels like a real root. But is there really a
> need to do that?

I don't really know. I thought the idea was to make containers
indistinguishable from a real system. Now I know we're really rather far
away from that in reality, and I really have no clue how important all
that is.

It all depends on how exactly this works; is it like I assumed, that
this file is owned by the parent instead of the current directory? And
that if you namespace this, you have an effective read-only file?

Then fixing the inconsistency is trivial; simply provide a read-only
file for the actual root cgroup too.

And if the solution is trivial, I don't see a good reason not to do it.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 2/9] cpuset: Add new v2 cpuset.sched.domain_root flag
  2018-06-21  7:58       ` Waiman Long
@ 2018-06-21  9:27         ` Peter Zijlstra
  -1 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2018-06-21  9:27 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On Thu, Jun 21, 2018 at 03:58:06PM +0800, Waiman Long wrote:
> On 06/20/2018 10:27 PM, Peter Zijlstra wrote:
> > On Mon, Jun 18, 2018 at 12:14:01PM +0800, Waiman Long wrote:
> >> +  cpuset.sched.domain_root
> > Why are we calling this a domain_root and not a partition?
> 
> A partition can consist of several cgroups in a tree structure. That
> flag should only be set at the root of a partition. I will change the
> name to partition_root if you think this name is acceptable.

The flag indicates that the 'effective_cpus' of the current group is a
partition. The fact that it can have sub-partitions doesn't really
matter, does it?

Just call it 'partition' and leave out the whole root stuff; all of cgroup
is hierarchical and you can have sub-groups, but we don't go around calling
everything a root.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 2/9] cpuset: Add new v2 cpuset.sched.domain_root flag
  2018-06-21  9:27         ` Peter Zijlstra
@ 2018-06-22  2:48           ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-22  2:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On 06/21/2018 05:27 PM, Peter Zijlstra wrote:
> On Thu, Jun 21, 2018 at 03:58:06PM +0800, Waiman Long wrote:
>> On 06/20/2018 10:27 PM, Peter Zijlstra wrote:
>>> On Mon, Jun 18, 2018 at 12:14:01PM +0800, Waiman Long wrote:
>>>> +  cpuset.sched.domain_root
>>> Why are we calling this a domain_root and not a partition?
>> A partition can consist of several cgroups in a tree structure. That
>> flag should only be set at the root of a partition. I will change the
>> name to partition_root if you think this name is acceptable.
> The flag indicates that the 'effective_cpus' of the current group is a
> partition. The fact that it can have sub-partitions doesn't really
> matter, does it?
>
> Just call it 'partition' and leave out the whole root stuff; all of cgroup
> is hierarchical and you can have sub-groups, but we don't go around calling
> everything a root.

OK, will name it "cpuset.sched.partition" in the next version of the
patchset.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 2/9] cpuset: Add new v2 cpuset.sched.domain_root flag
  2018-06-21  9:20         ` Peter Zijlstra
@ 2018-06-22  3:00           ` Waiman Long
  -1 siblings, 0 replies; 62+ messages in thread
From: Waiman Long @ 2018-06-22  3:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

On 06/21/2018 05:20 PM, Peter Zijlstra wrote:
> On Thu, Jun 21, 2018 at 03:58:06PM +0800, Waiman Long wrote:
>
>> As for the inconsistency between the real root and the container root,
>> this is true for almost all the controllers. So it is a generic problem.
>> One possible solution is to create a kind of pseudo root cgroup for the
>> container that looks and feels like a real root. But is there really a
>> need to do that?
> I don't really know. I thought the idea was to make containers
> indistinguishable from a real system. Now I know we're really rather far
> away from that in reality, and I really have no clue how important all
> that is.

That will certainly be the ideal.

> It all depends on how exactly this works; is it like I assumed, that
> this file is owned by the parent instead of the current directory? And
> that if you namespace this, you have an effective read-only file?

Yes, that is right.

> Then fixing the inconsistency is trivial; simply provide a read-only
> file for the actual root cgroup too.
>
> And if the solution is trivial, I don't see a good reason not to do it.

Do you mean providing a flag like READONLY_AT_ROOT so that it will be
read-only at the real root? That is a cgroup architectural decision that
needs input from Tejun. Anyway, this issue is not specific to this
patchset, and I would like to break it out into a separate discussion.
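
To make the question concrete, here is what such a flag might look like
on a cftype definition (an illustration only, not proposed code; the
array and handler names are placeholders, and only CFTYPE_NOT_ON_ROOT
exists today as a way to hide a file on the root cgroup):

#include <linux/cgroup.h>

/* placeholder handlers, declared only so the sketch hangs together */
static int sched_domain_root_show(struct seq_file *sf, void *v);
static ssize_t sched_domain_root_write(struct kernfs_open_file *of,
				       char *buf, size_t nbytes, loff_t off);

static struct cftype cpuset_v2_files[] = {
	{
		.name		= "sched.domain_root",
		.seq_show	= sched_domain_root_show,
		.write		= sched_domain_root_write,
		/*
		 * Today CFTYPE_NOT_ON_ROOT hides the file on the real
		 * root; a hypothetical READONLY_AT_ROOT-style flag would
		 * instead expose it there read-only, matching what a
		 * namespaced container root already sees.
		 */
		.flags		= CFTYPE_NOT_ON_ROOT,
	},
	{ }	/* terminate */
};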

Cheers,
Longman



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v10 2/9] cpuset: Add new v2 cpuset.sched.domain_root flag
  2018-06-22  3:00           ` Waiman Long
@ 2018-07-02 16:32             ` Tejun Heo
  -1 siblings, 0 replies; 62+ messages in thread
From: Tejun Heo @ 2018-07-02 16:32 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, Mike Galbraith,
	torvalds, Roman Gushchin, Juri Lelli, Patrick Bellasi

Hello,

On Fri, Jun 22, 2018 at 11:00:03AM +0800, Waiman Long wrote:
> On 06/21/2018 05:20 PM, Peter Zijlstra wrote:
> >> As for the inconsistency between the real root and the container root,
> >> this is true for almost all the controllers. So it is a generic problem.
> >> One possible solution is to create a kind of pseudo root cgroup for the
> >> container that looks and feels like a real root. But is there really a
> >> need to do that?
> > I don't really know. I thought the idea was to make containers
> > indistinguishable from a real system. Now I know we're really rather far
> > away from that in reality, and I really have no clue how important all
> > that is.
> 
> That will certainly be the ideal.

Sure, ideal in a theoretical sense; however, the practical cost-benefit
ratio of trying to make containers indistinguishable from the system
doesn't seem enough to justify the effort.  Not yet, anyway.  It'd be
nice not to paint ourselves into a corner where we can't get the
equivalence without major interface changes later, but I think that's
about the extent we should go for now.

> > It all depends on how exactly this works; is it like I assumed, that
> > this file is owned by the parent instead of the current directory? And
> > that if you namespace this, you have an effective read-only file?
> 
> Yes, that is right.
> 
> > Then fixing the inconsistency is trivial; simply provide a read-only
> > file for the actual root cgroup too.
> >
> > And if the solution is trivial, I don't see a good reason not to do it.
> 
> Do you mean providing a flag like READONLY_AT_ROOT so that it will be
> read-only at the real root? That is a cgroup architectural decision that
> needs input from Tejun. Anyway, this issue is not specific to this
> patchset, and I would like to break it out into a separate discussion.

Yeah, it's a larger problem than cpuset and different controllers
trying it in different ways isn't a good idea anyway.  Let's shelve
this topic for now.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2018-07-02 16:32 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-18  4:13 [PATCH v10 0/9] cpuset: Enable cpuset controller in default hierarchy Waiman Long
2018-06-18  4:13 ` Waiman Long
2018-06-18  4:14 ` [PATCH v10 1/9] " Waiman Long
2018-06-18  4:14   ` Waiman Long
2018-06-18  4:14 ` [PATCH v10 2/9] cpuset: Add new v2 cpuset.sched.domain_root flag Waiman Long
2018-06-18  4:14   ` Waiman Long
2018-06-20 14:27   ` Peter Zijlstra
2018-06-20 14:27     ` Peter Zijlstra
2018-06-21  7:58     ` Waiman Long
2018-06-21  7:58       ` Waiman Long
2018-06-21  8:05       ` Waiman Long
2018-06-21  8:05         ` Waiman Long
2018-06-21  9:20       ` Peter Zijlstra
2018-06-21  9:20         ` Peter Zijlstra
2018-06-22  3:00         ` Waiman Long
2018-06-22  3:00           ` Waiman Long
2018-07-02 16:32           ` Tejun Heo
2018-07-02 16:32             ` Tejun Heo
2018-06-21  9:27       ` Peter Zijlstra
2018-06-21  9:27         ` Peter Zijlstra
2018-06-22  2:48         ` Waiman Long
2018-06-22  2:48           ` Waiman Long
2018-06-18  4:14 ` [PATCH v10 3/9] cpuset: Simulate auto-off of sched.domain_root at cgroup removal Waiman Long
2018-06-18  4:14   ` Waiman Long
2018-06-20 14:11   ` Peter Zijlstra
2018-06-20 14:11     ` Peter Zijlstra
2018-06-21  8:22     ` Waiman Long
2018-06-21  8:22       ` Waiman Long
2018-06-18  4:14 ` [PATCH v10 4/9] cpuset: Allow changes to cpus in a domain root Waiman Long
2018-06-18  4:14   ` Waiman Long
2018-06-18  4:14 ` [PATCH v10 5/9] cpuset: Make sure that domain roots work properly with CPU hotplug Waiman Long
2018-06-18  4:14   ` Waiman Long
2018-06-20 14:15   ` Peter Zijlstra
2018-06-20 14:15     ` Peter Zijlstra
2018-06-21  3:09     ` Waiman Long
2018-06-21  3:09       ` Waiman Long
2018-06-18  4:14 ` [PATCH v10 6/9] cpuset: Make generate_sched_domains() recognize isolated_cpus Waiman Long
2018-06-18  4:14   ` Waiman Long
2018-06-18 14:44   ` Waiman Long
2018-06-18 14:44     ` Waiman Long
2018-06-18 14:58     ` Juri Lelli
2018-06-18 14:58       ` Juri Lelli
2018-06-18  4:14 ` [PATCH v10 6/9] cpuset: Make generate_sched_domains() recognize reserved_cpus Waiman Long
2018-06-18  4:14   ` Waiman Long
2018-06-20 14:17   ` Peter Zijlstra
2018-06-20 14:17     ` Peter Zijlstra
2018-06-21  8:14     ` Waiman Long
2018-06-21  8:14       ` Waiman Long
2018-06-18  4:14 ` [PATCH v10 7/9] cpuset: Expose cpus.effective and mems.effective on cgroup v2 root Waiman Long
2018-06-18  4:14   ` Waiman Long
2018-06-18  4:14 ` [PATCH v10 8/9] cpuset: Don't rebuild sched domains if cpu changes in non-domain root Waiman Long
2018-06-18  4:14   ` Waiman Long
2018-06-18  4:14 ` [PATCH v10 9/9] cpuset: Allow reporting of sched domain generation info Waiman Long
2018-06-18  4:14   ` Waiman Long
2018-06-20 14:20   ` Peter Zijlstra
2018-06-20 14:20     ` Peter Zijlstra
2018-06-18 14:20 ` [PATCH v10 0/9] cpuset: Enable cpuset controller in default hierarchy Juri Lelli
2018-06-18 14:20   ` Juri Lelli
2018-06-18 15:07   ` Waiman Long
2018-06-18 15:07     ` Waiman Long
2018-06-19  9:52 ` Juri Lelli
2018-06-19  9:52   ` Juri Lelli
