* [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
@ 2018-03-09 15:35 Waiman Long
  2018-03-09 16:34 ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2018-03-09 15:35 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin, Waiman Long

Given that thread mode has been merged in 4.14, it is now time to
enable cpuset to be used in the default hierarchy (cgroup v2), as it
is clearly threaded.

The cpuset controller has experienced feature creep since its
introduction more than a decade ago. Besides the core cpus and mems
control files that limit CPUs and memory nodes, there are a number of
additional features that can be controlled from userspace. Some of
these features are of doubtful usefulness and may not be actively used.

This patch enables the cpuset controller in the default hierarchy with
a minimal set of features, namely just cpus and mems and their
effective_* counterparts.  More features can certainly be added to the
default hierarchy in the future if there is a real user need for them.

Alternatively, with the unified hierarchy, it may make more sense to
move some of those additional cpuset features, if desired, to the
memory controller or perhaps the cpu controller instead of keeping
them in cpuset.
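
For illustration, a minimal sketch of how the resulting v2 interface
could be exercised; the mount point, cgroup name and CPU/node numbers
below are assumptions for the example, not something this patch
prescribes:

    # Run as root.  Assumes cgroup2 is mounted at /sys/fs/cgroup and
    # that "cpuset" now shows up in the root cgroup.controllers.
    cd /sys/fs/cgroup
    cat cgroup.controllers                    # "cpuset" should be listed
    echo "+cpuset" > cgroup.subtree_control   # enable it for child cgroups
    mkdir example                             # hypothetical child cgroup

    # Constrain the child to CPUs 0-3 and memory node 0.
    echo 0-3 > example/cpuset.cpus
    echo 0   > example/cpuset.mems

    # Read back what is actually usable (tracks CPU/node hotplug).
    cat example/cpuset.effective_cpus
    cat example/cpuset.effective_mems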

v4:
 - Further minimize the feature set by removing the flags control knob.

v3:
 - Further trim the additional features down to just memory_migrate.
 - Update Documentation/cgroup-v2.txt.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt | 96 ++++++++++++++++++++++++++++++++++++++++-----
 kernel/cgroup/cpuset.c      | 44 +++++++++++++++++++--
 2 files changed, 127 insertions(+), 13 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 74cdeae..8d7300f 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -48,16 +48,18 @@ v1 is available under Documentation/cgroup-v1/.
        5-2-1. Memory Interface Files
        5-2-2. Usage Guidelines
        5-2-3. Memory Ownership
-     5-3. IO
-       5-3-1. IO Interface Files
-       5-3-2. Writeback
-     5-4. PID
-       5-4-1. PID Interface Files
-     5-5. Device
-     5-6. RDMA
-       5-6-1. RDMA Interface Files
-     5-7. Misc
-       5-7-1. perf_event
+     5-3. Cpuset
+       5-3-1. Cpuset Interface Files
+     5-4. IO
+       5-4-1. IO Interface Files
+       5-4-2. Writeback
+     5-5. PID
+       5-5-1. PID Interface Files
+     5-6. Device
+     5-7. RDMA
+       5-7-1. RDMA Interface Files
+     5-8. Misc
+       5-8-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -1243,6 +1245,80 @@ POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
 belonging to the affected files to ensure correct memory ownership.
 
 
+Cpuset
+------
+
+The "cpuset" controller provides a mechanism for constraining
+the CPU and memory node placement of tasks to only the resources
+specified in the cpuset interface files in a task's current cgroup.
+This is especially valuable on large NUMA systems where placing jobs
+on properly sized subsets of the systems with careful processor and
+memory placement to reduce cross-node memory access and contention
+can improve overall system performance.
+
+The "cpuset" controller is hierarchical.  That means the controller
+cannot use CPUs or memory nodes not allowed in its parent.
+
+
+Cpuset Interface Files
+~~~~~~~~~~~~~~~~~~~~~~
+
+  cpuset.cpus
+	A read-write multiple values file which exists on non-root
+	cgroups.
+
+	It lists the CPUs allowed to be used by tasks within this
+	cgroup.  The CPU numbers are comma-separated numbers or
+	ranges.  For example:
+
+	  # cat cpuset.cpus
+	  0-4,6,8-10
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.cpus" or all the available CPUs if none is found.
+
+	The value of "cpuset.cpus" stays constant until the next update
+	and won't be affected by any CPU hotplug events.
+
+  cpuset.effective_cpus
+	A read-only multiple values file which exists on non-root
+	cgroups.
+
+	It lists the onlined CPUs that are actually allowed to be
+	used by tasks within the current cgroup. It is a subset of
+	"cpuset.cpus". Its value will be affected by CPU hotplug
+	events.
+
+  cpuset.mems
+	A read-write multiple values file which exists on non-root
+	cgroups.
+
+	It lists the memory nodes allowed to be used by tasks within
+	this cgroup.  The memory node numbers are comma-separated
+	numbers or ranges.  For example:
+
+	  # cat cpuset.mems
+	  0-1,3
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.mems" or all the available memory nodes if none
+	is found.
+
+	The value of "cpuset.mems" stays constant until the next update
+	and won't be affected by any memory nodes hotplug events.
+
+  cpuset.effective_mems
+	A read-only multiple values file which exists on non-root
+	cgroups.
+
+	It lists the onlined memory nodes that are actually allowed
+	to be used by tasks within the current cgroup. It is a subset
+	of "cpuset.mems". Its value will be affected by memory nodes
+	hotplug events.
+
+
 IO
 --
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b42037e..7837d1f 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1823,12 +1823,11 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 	return 0;
 }
 
-
 /*
  * for the common functions, 'private' gives the type of file
  */
 
-static struct cftype files[] = {
+static struct cftype legacy_files[] = {
 	{
 		.name = "cpus",
 		.seq_show = cpuset_common_seq_show,
@@ -1931,6 +1930,43 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 };
 
 /*
+ * This is currently a minimal set for the default hierarchy. It can be
+ * expanded later on by migrating more features and control files from v1.
+ */
+static struct cftype dfl_files[] = {
+	{
+		.name = "cpus",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * NR_CPUS),
+		.private = FILE_CPULIST,
+	},
+
+	{
+		.name = "mems",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * MAX_NUMNODES),
+		.private = FILE_MEMLIST,
+	},
+
+	{
+		.name = "effective_cpus",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_EFFECTIVE_CPULIST,
+	},
+
+	{
+		.name = "effective_mems",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_EFFECTIVE_MEMLIST,
+	},
+
+	{ }	/* terminate */
+};
+
+
+/*
  *	cpuset_css_alloc - allocate a cpuset css
  *	cgrp:	control group that the new cpuset will be part of
  */
@@ -2104,8 +2140,10 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
 	.post_attach	= cpuset_post_attach,
 	.bind		= cpuset_bind,
 	.fork		= cpuset_fork,
-	.legacy_cftypes	= files,
+	.legacy_cftypes	= legacy_files,
+	.dfl_cftypes	= dfl_files,
 	.early_init	= true,
+	.threaded	= true,
 };
 
 /**
-- 
1.8.3.1


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 15:35 [PATCH v4] cpuset: Enable cpuset controller in default hierarchy Waiman Long
@ 2018-03-09 16:34 ` Mike Galbraith
  2018-03-09 17:23   ` Mike Galbraith
  2018-03-09 17:45   ` Waiman Long
  0 siblings, 2 replies; 20+ messages in thread
From: Mike Galbraith @ 2018-03-09 16:34 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Li Zefan, Johannes Weiner,
	Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin

On Fri, 2018-03-09 at 10:35 -0500, Waiman Long wrote:
> Given the fact that thread mode had been merged into 4.14, it is now
> time to enable cpuset to be used in the default hierarchy (cgroup v2)
> as it is clearly threaded.
> 
> The cpuset controller had experienced feature creep since its
> introduction more than a decade ago. Besides the core cpus and mems
> control files to limit cpus and memory nodes, there are a bunch of
> additional features that can be controlled from the userspace. Some of
> the features are of doubtful usefulness and may not be actively used.

One rather important feature is the ability to dynamically partition a
box and isolate critical loads.  How does one do that with v2?

In v1, you create two or more exclusive sets, one for generic
housekeeping, and one or more for critical load(s), RT in my case,
turning off load balancing in the critical set(s) for obvious reasons.
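
For readers who have not set this up before, a rough v1 sketch of that
kind of partitioning (run as root; the mount point, set names and CPU
numbers are assumptions for the example):

    # Legacy (v1) cpuset hierarchy, assumed to be mounted here.
    cd /sys/fs/cgroup/cpuset
    mkdir housekeeping rt

    # Carve up the machine: CPUs 0-3 for housekeeping, CPUs 4-7 for the
    # critical/RT load, everything on memory node 0.
    echo 0-3 > housekeeping/cpuset.cpus
    echo 0   > housekeeping/cpuset.mems
    echo 4-7 > rt/cpuset.cpus
    echo 0   > rt/cpuset.mems

    # Make the sets exclusive so no sibling can claim the same CPUs.
    echo 1 > housekeeping/cpuset.cpu_exclusive
    echo 1 > rt/cpuset.cpu_exclusive

    # Let the children define the sched domains and leave the critical
    # CPUs completely unbalanced.
    echo 0 > cpuset.sched_load_balance        # at the root
    echo 0 > rt/cpuset.sched_load_balance

    # Place the critical workload ($RT_PID is a placeholder).
    echo "$RT_PID" > rt/tasks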

> [...]

* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 16:34 ` Mike Galbraith
@ 2018-03-09 17:23   ` Mike Galbraith
  2018-03-09 17:45   ` Waiman Long
  1 sibling, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2018-03-09 17:23 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Li Zefan, Johannes Weiner,
	Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin

On Fri, 2018-03-09 at 17:34 +0100, Mike Galbraith wrote:
> On Fri, 2018-03-09 at 10:35 -0500, Waiman Long wrote:
> > Given the fact that thread mode had been merged into 4.14, it is now
> > time to enable cpuset to be used in the default hierarchy (cgroup v2)
> > as it is clearly threaded.
> > 
> > The cpuset controller had experienced feature creep since its
> > introduction more than a decade ago. Besides the core cpus and mems
> > control files to limit cpus and memory nodes, there are a bunch of
> > additional features that can be controlled from the userspace. Some of
> > the features are of doubtful usefulness and may not be actively used.
> 
> One rather important features is the ability to dynamically partition a
> box and isolate critical loads.  How does one do that with v2?

This is still very much in-use functionality that started with the
commit below, not to mention the nohz_full work that Frederic Weisbecker
is working on integrating so it can blossom into a proper dynamic set
property.

Author: Dinakar Guniguntala <dino@in.ibm.com>  2005-06-25 23:57:33
Committer: Linus Torvalds <torvalds@ppc970.osdl.org>  2005-06-26 01:24:45
Parent: 37e4ab3f0cba13adf3535d373fd98e5ee47b5410 ([PATCH] Changing RT priority without CAP_SYS_NICE)
Child:  85d7b94981e2e919697bc235aad7367b33c3864b ([PATCH] Dynamic sched domains: cpuset changes)
Branches: master, remotes/origin/master and many more (82)
Follows: v2.6.12
Precedes: v2.6.13-rc1

    [PATCH] Dynamic sched domains: sched changes
    
    The following patches add dynamic sched domains functionality that was
    extensively discussed on lkml and lse-tech.  I would like to see this added to
    -mm
    
    o The main advantage with this feature is that it ensures that the scheduler
      load balacing code only balances against the cpus that are in the sched
      domain as defined by an exclusive cpuset and not all of the cpus in the
      system. This removes any overhead due to load balancing code trying to
      pull tasks outside of the cpu exclusive cpuset only to be prevented by
      the tasks' cpus_allowed mask.
    o cpu exclusive cpusets are useful for servers running orthogonal
      workloads such as RT applications requiring low latency and HPC
      applications that are throughput sensitive
    
    o It provides a new API partition_sched_domains in sched.c
      that makes dynamic sched domains possible.
    o cpu_exclusive cpusets sets are now associated with a sched domain.
      Which means that the users can dynamically modify the sched domains
      through the cpuset file system interface
    o ia64 sched domain code has been updated to support this feature as well
    o Currently, this does not support hotplug. (However some of my tests
      indicate hotplug+preempt is currently broken)
    o I have tested it extensively on x86.
    o This should have very minimal impact on performance as none of
      the fast paths are affected
    
    Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
    Acked-by: Paul Jackson <pj@sgi.com>
    Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
    Acked-by: Matthew Dobson <colpatch@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 16:34 ` Mike Galbraith
  2018-03-09 17:23   ` Mike Galbraith
@ 2018-03-09 17:45   ` Waiman Long
  2018-03-09 18:17     ` Mike Galbraith
  1 sibling, 1 reply; 20+ messages in thread
From: Waiman Long @ 2018-03-09 17:45 UTC (permalink / raw)
  To: Mike Galbraith, Tejun Heo, Li Zefan, Johannes Weiner,
	Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin

On 03/09/2018 11:34 AM, Mike Galbraith wrote:
> On Fri, 2018-03-09 at 10:35 -0500, Waiman Long wrote:
>> Given the fact that thread mode had been merged into 4.14, it is now
>> time to enable cpuset to be used in the default hierarchy (cgroup v2)
>> as it is clearly threaded.
>>
>> The cpuset controller had experienced feature creep since its
>> introduction more than a decade ago. Besides the core cpus and mems
>> control files to limit cpus and memory nodes, there are a bunch of
>> additional features that can be controlled from the userspace. Some of
>> the features are of doubtful usefulness and may not be actively used.
> One rather important features is the ability to dynamically partition a
> box and isolate critical loads.  How does one do that with v2?
>
> In v1, you create two or more exclusive sets, one for generic
> housekeeping, and one or more for critical load(s), RT in my case,
> turning off load balancing in the critical set(s) for obvious reasons.

This patch just serves as a foundation for cpuset support in v2. I am
not ruling out that more v1 features will be added in future patches.
We want to start with a clean slate and build on it after careful
consideration. There are some v1 cpuset features that are not used, or
only rarely used, and we certainly want to get rid of them if possible.

Now for the exclusive cpuset, it is certainly something that can be done
in userspace. If there is a valid use case that requires exclusive
cpuset support in the kernel, we can certainly consider putting it into
v2 as well.

Cheers,
Longman


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 17:45   ` Waiman Long
@ 2018-03-09 18:17     ` Mike Galbraith
  2018-03-09 18:20       ` Waiman Long
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2018-03-09 18:17 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Li Zefan, Johannes Weiner,
	Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin

On Fri, 2018-03-09 at 12:45 -0500, Waiman Long wrote:
> On 03/09/2018 11:34 AM, Mike Galbraith wrote:
> > On Fri, 2018-03-09 at 10:35 -0500, Waiman Long wrote:
> >> Given the fact that thread mode had been merged into 4.14, it is now
> >> time to enable cpuset to be used in the default hierarchy (cgroup v2)
> >> as it is clearly threaded.
> >>
> >> The cpuset controller had experienced feature creep since its
> >> introduction more than a decade ago. Besides the core cpus and mems
> >> control files to limit cpus and memory nodes, there are a bunch of
> >> additional features that can be controlled from the userspace. Some of
> >> the features are of doubtful usefulness and may not be actively used.
> > One rather important features is the ability to dynamically partition a
> > box and isolate critical loads.  How does one do that with v2?
> >
> > In v1, you create two or more exclusive sets, one for generic
> > housekeeping, and one or more for critical load(s), RT in my case,
> > turning off load balancing in the critical set(s) for obvious reasons.
> 
> This patch just serves as a foundation for cpuset support in v2. I am
> not excluding the fact that more v1 features will be added in future
> patches. We want to start with a clean slate and add on it after careful
> consideration. There are some v1 cpuset features that are not used or
> rarely used. We certainly want to get rid of them, if possible.

If v2 is to ever supersede v1, as is the normal way of things, core
functionality really should be on the v2 boat when it sails.  What you
left standing on the dock is critical core cpuset functionality.

	-Mike


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 18:17     ` Mike Galbraith
@ 2018-03-09 18:20       ` Waiman Long
  2018-03-09 19:40         ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2018-03-09 18:20 UTC (permalink / raw)
  To: Mike Galbraith, Tejun Heo, Li Zefan, Johannes Weiner,
	Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin

On 03/09/2018 01:17 PM, Mike Galbraith wrote:
> On Fri, 2018-03-09 at 12:45 -0500, Waiman Long wrote:
>> On 03/09/2018 11:34 AM, Mike Galbraith wrote:
>>> On Fri, 2018-03-09 at 10:35 -0500, Waiman Long wrote:
>>>> Given the fact that thread mode had been merged into 4.14, it is now
>>>> time to enable cpuset to be used in the default hierarchy (cgroup v2)
>>>> as it is clearly threaded.
>>>>
>>>> The cpuset controller had experienced feature creep since its
>>>> introduction more than a decade ago. Besides the core cpus and mems
>>>> control files to limit cpus and memory nodes, there are a bunch of
>>>> additional features that can be controlled from the userspace. Some of
>>>> the features are of doubtful usefulness and may not be actively used.
>>> One rather important features is the ability to dynamically partition a
>>> box and isolate critical loads.  How does one do that with v2?
>>>
>>> In v1, you create two or more exclusive sets, one for generic
>>> housekeeping, and one or more for critical load(s), RT in my case,
>>> turning off load balancing in the critical set(s) for obvious reasons.
>> This patch just serves as a foundation for cpuset support in v2. I am
>> not excluding the fact that more v1 features will be added in future
>> patches. We want to start with a clean slate and add on it after careful
>> consideration. There are some v1 cpuset features that are not used or
>> rarely used. We certainly want to get rid of them, if possible.
> If v2 is to ever supersede v1, as is the normal way of things, core
> functionality really should be on the v2 boat when it sails.  What you
> left standing on the dock is critical core cpuset functionality.
>
> 	-Mike

From your perspective, what core functionality should be included in
cpuset v2 other than the ability to restrict cpus and memory nodes?

Cheers,
Longman


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 18:20       ` Waiman Long
@ 2018-03-09 19:40         ` Mike Galbraith
  2018-03-09 20:43           ` Waiman Long
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2018-03-09 19:40 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo, Li Zefan, Johannes Weiner,
	Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin

On Fri, 2018-03-09 at 13:20 -0500, Waiman Long wrote:
> On 03/09/2018 01:17 PM, Mike Galbraith wrote:
> > On Fri, 2018-03-09 at 12:45 -0500, Waiman Long wrote:
> >> On 03/09/2018 11:34 AM, Mike Galbraith wrote:
> >>> On Fri, 2018-03-09 at 10:35 -0500, Waiman Long wrote:
> >>>> Given the fact that thread mode had been merged into 4.14, it is now
> >>>> time to enable cpuset to be used in the default hierarchy (cgroup v2)
> >>>> as it is clearly threaded.
> >>>>
> >>>> The cpuset controller had experienced feature creep since its
> >>>> introduction more than a decade ago. Besides the core cpus and mems
> >>>> control files to limit cpus and memory nodes, there are a bunch of
> >>>> additional features that can be controlled from the userspace. Some of
> >>>> the features are of doubtful usefulness and may not be actively used.
> >>> One rather important features is the ability to dynamically partition a
> >>> box and isolate critical loads.  How does one do that with v2?
> >>>
> >>> In v1, you create two or more exclusive sets, one for generic
> >>> housekeeping, and one or more for critical load(s), RT in my case,
> >>> turning off load balancing in the critical set(s) for obvious reasons.
> >> This patch just serves as a foundation for cpuset support in v2. I am
> >> not excluding the fact that more v1 features will be added in future
> >> patches. We want to start with a clean slate and add on it after careful
> >> consideration. There are some v1 cpuset features that are not used or
> >> rarely used. We certainly want to get rid of them, if possible.
> > If v2 is to ever supersede v1, as is the normal way of things, core
> > functionality really should be on the v2 boat when it sails.  What you
> > left standing on the dock is critical core cpuset functionality.
> >
> > 	-Mike
> 
> From your perspective, what are core functionality that should be
> included in cpuset v2 other than the ability to restrict cpus and memory
> nodes.

Exclusive sets are essential, no?  How else can you manage set wide
properties such as topology (and hopefully soonish nohz).  You clearly
can't have overlapping sets, one having scheduler topology, the other
having none.  Whatever the form, something as core as the capability to
dynamically partition and isolate should IMO be firmly aboard the v2
boat before it sails.

	-Mike


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 19:40         ` Mike Galbraith
@ 2018-03-09 20:43           ` Waiman Long
  2018-03-09 22:17             ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2018-03-09 20:43 UTC (permalink / raw)
  To: Mike Galbraith, Tejun Heo, Li Zefan, Johannes Weiner,
	Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	torvalds, Roman Gushchin

On 03/09/2018 02:40 PM, Mike Galbraith wrote:
>>>
>>> If v2 is to ever supersede v1, as is the normal way of things, core
>>> functionality really should be on the v2 boat when it sails.  What you
>>> left standing on the dock is critical core cpuset functionality.
>>>
>>> 	-Mike
>> From your perspective, what are core functionality that should be
>> included in cpuset v2 other than the ability to restrict cpus and memory
>> nodes.
> Exclusive sets are essential, no?  How else can you manage set wide
> properties such as topology (and hopefully soonish nohz).  You clearly
> can't have overlapping sets, one having scheduler topology, the other
> having none.  Whatever the form, something as core as the capability to
> dynamically partition and isolate should IMO be firmly aboard the v2
> boat before it sails.
>
> 	-Mike

The isolcpus= parameter just reduces the cpus available to the rest of
the system. The cpuset controller does look at that value and makes
adjustments accordingly, but it has no dependence on the exclusive
cpu/mem features of cpuset.
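
For context, the boot-parameter approach referred to above looks
roughly like this (the CPU numbers are an example):

    # Kernel command line: keep CPUs 4-7 out of the scheduler's
    # load-balancing domains from boot onward.
    #     ... isolcpus=4-7 ...

    # Nothing lands on those CPUs unless explicitly affined there:
    cat /sys/devices/system/cpu/isolated    # shows 4-7, where exposed
    taskset -c 4-7 ./rt_app                 # rt_app is a placeholder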

-Longman


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 20:43           ` Waiman Long
@ 2018-03-09 22:17             ` Peter Zijlstra
  2018-03-09 23:06               ` Waiman Long
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2018-03-09 22:17 UTC (permalink / raw)
  To: Waiman Long
  Cc: Mike Galbraith, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, torvalds, Roman Gushchin

On Fri, Mar 09, 2018 at 03:43:34PM -0500, Waiman Long wrote:
> The isolcpus= parameter just reduce the cpus available to the rests of
> the system. The cpuset controller does look at that value and make
> adjustment accordingly, but it has no dependence on exclusive cpu/mem
> features of cpuset.

The isolcpus= boot param is donkey shit and needs to die. cpuset _used_
to be able to fully replace it, but with the advent of cgroup 'feature'
this got lost.

And instead of fixing it, you're making it _far_ worse. You completely
removed all the bits that allow repartitioning the scheduler domains.

Mike is completely right, full NAK on any such approach.


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 22:17             ` Peter Zijlstra
@ 2018-03-09 23:06               ` Waiman Long
  2018-03-10  3:47                 ` Mike Galbraith
  2018-03-10 13:16                 ` Peter Zijlstra
  0 siblings, 2 replies; 20+ messages in thread
From: Waiman Long @ 2018-03-09 23:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, torvalds, Roman Gushchin

On 03/09/2018 05:17 PM, Peter Zijlstra wrote:
> On Fri, Mar 09, 2018 at 03:43:34PM -0500, Waiman Long wrote:
>> The isolcpus= parameter just reduce the cpus available to the rests of
>> the system. The cpuset controller does look at that value and make
>> adjustment accordingly, but it has no dependence on exclusive cpu/mem
>> features of cpuset.
> The isolcpus= boot param is donkey shit and needs to die. cpuset _used_
> to be able to fully replace it, but with the advent of cgroup 'feature'
> this got lost.
>
> And instead of fixing it, you're making it _far_ worse. You completely
> removed all the bits that allow repartitioning the scheduler domains.
>
> Mike is completely right, full NAK on any such approach.

So you are talking about sched_relax_domain_level and
sched_load_balance. I have not removed any bits. I just haven't exposed
them yet. It does seem like these 2 control knobs are useful from the
scheduling perspective. Do we also need cpu_exclusive, or are the two
sched control knobs enough?

Cheers,
Longman


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 23:06               ` Waiman Long
@ 2018-03-10  3:47                 ` Mike Galbraith
  2018-03-14 19:57                   ` Tejun Heo
  2018-03-10 13:16                 ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2018-03-10  3:47 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, torvalds,
	Roman Gushchin

On Fri, 2018-03-09 at 18:06 -0500, Waiman Long wrote:
> On 03/09/2018 05:17 PM, Peter Zijlstra wrote:
> > On Fri, Mar 09, 2018 at 03:43:34PM -0500, Waiman Long wrote:
> >> The isolcpus= parameter just reduce the cpus available to the rests of
> >> the system. The cpuset controller does look at that value and make
> >> adjustment accordingly, but it has no dependence on exclusive cpu/mem
> >> features of cpuset.
> > The isolcpus= boot param is donkey shit and needs to die. cpuset _used_
> > to be able to fully replace it, but with the advent of cgroup 'feature'
> > this got lost.
> >
> > And instead of fixing it, you're making it _far_ worse. You completely
> > removed all the bits that allow repartitioning the scheduler domains.
> >
> > Mike is completely right, full NAK on any such approach.
> 
> So you are talking about sched_relax_domain_level and
> sched_load_balance. I have not removed any bits. I just haven't exposed
> them yet. It does seem like these 2 control knobs are useful from the
> scheduling perspective. Do we also need cpu_exclusive or just the two
> sched control knobs are enough?

Some form of cpu_exclusive (preferably exactly that, but something else
could replace it) is needed to define sets that must not overlap any
other set at creation time or any time thereafter.  A set with property
'exclusive' is the enabler for fundamentally exclusive (but dynamic!)
set properties such as 'isolated' (etc etc).
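
A small v1 illustration of that property, i.e. the kernel refusing any
overlap with an exclusive sibling (paths and CPU numbers assumed):

    cd /sys/fs/cgroup/cpuset
    mkdir rt other
    echo 4-7 > rt/cpuset.cpus
    echo 0   > rt/cpuset.mems
    echo 1   > rt/cpuset.cpu_exclusive   # rt now owns CPUs 4-7 exclusively

    echo 2-5 > other/cpuset.cpus         # overlaps rt: rejected with EINVAL
    echo 0-3 > other/cpuset.cpus         # disjoint range is accepted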

	-Mike


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-09 23:06               ` Waiman Long
  2018-03-10  3:47                 ` Mike Galbraith
@ 2018-03-10 13:16                 ` Peter Zijlstra
  2018-03-12 14:20                   ` Waiman Long
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2018-03-10 13:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Mike Galbraith, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, torvalds, Roman Gushchin

On Fri, Mar 09, 2018 at 06:06:29PM -0500, Waiman Long wrote:
> So you are talking about sched_relax_domain_level and

That one I wouldn't be sad to see the back of.

> sched_load_balance.

This one, that's critical. And this is the perfect time to try and fix
the whole isolcpus issue.

The primary issue is that to make equivalent functionality available
through cpuset, we need to basically start all tasks outside the root
group.

The equivalent of isolcpus=xxx is a cgroup setup like:

        root
	/  \
  system    other

Where other has the @xxx cpus and system the remainder and
root.sched_load_balance = 0.
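
In v1 terms, a compact sketch of that layout, including the migration
of tasks out of the root that makes it awkward today (paths and CPU
numbers assumed):

    cd /sys/fs/cgroup/cpuset
    mkdir system other
    echo 0-3 > system/cpuset.cpus;  echo 0 > system/cpuset.mems
    echo 4-7 > other/cpuset.cpus;   echo 0 > other/cpuset.mems

    # root.sched_load_balance = 0: stop balancing across the whole box,
    # so the two children become separate partitions.
    echo 0 > cpuset.sched_load_balance

    # Everything movable has to end up in "system" rather than the root.
    for t in $(cat tasks); do echo "$t" > system/tasks 2>/dev/null; done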

Back before cgroups (and the new workqueue stuff), we could've started
everything in the !root group, no worry. But now that doesn't work,
because a bunch of controllers can't deal with that and everything
cgroup expects the cgroupfs to be empty on boot.

It's one of my biggest regrets that I didn't 'fix' this before cgroups
came along.

> I have not removed any bits. I just haven't exposed
> them yet. It does seem like these 2 control knobs are useful from the
> scheduling perspective. Do we also need cpu_exclusive or just the two
> sched control knobs are enough?

I always forget if we need exclusive for load_balance to work; I'll
peruse the document/code.


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-10 13:16                 ` Peter Zijlstra
@ 2018-03-12 14:20                   ` Waiman Long
  2018-03-12 15:21                     ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2018-03-12 14:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Tejun Heo, Li Zefan, Johannes Weiner,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, torvalds, Roman Gushchin

On 03/10/2018 08:16 AM, Peter Zijlstra wrote:
> On Fri, Mar 09, 2018 at 06:06:29PM -0500, Waiman Long wrote:
>> So you are talking about sched_relax_domain_level and
> That one I wouldn't be sad to see the back of.
>
>> sched_load_balance.
> This one, that's critical. And this is the perfect time to try and fix
> the whole isolcpus issue.
>
> The primary issue is that to make equivalent functionality available
> through cpuset, we need to basically start all tasks outside the root
> group.
>
> The equivalent of isolcpus=xxx is a cgroup setup like:
>
>         root
> 	/  \
>   system    other
>
> Where other has the @xxx cpus and system the remainder and
> root.sched_load_balance = 0.

I saw in the kernel-parameters.txt file that the isolcpus option was
deprecated - use cpusets instead. However, there doesn't seem to be any
documentation on the right way to do it. Of course, we can achieve similar
results with what you have outlined above, but the process is more
complex than just adding another boot command line argument with
isolcpus. So I doubt isolcpus will die anytime soon unless we can make
the alternative as easy to use.

> Back before cgroups (and the new workqueue stuff), we could've started
> everything in the !root group, no worry. But now that doesn't work,
> because a bunch of controllers can't deal with that and everything
> cgroup expects the cgroupfs to be empty on boot.

AFAIK, all the processes belong to the root cgroup on boot. And the root
cgroup is usually special in that the controller may not exert any
control over processes in the root cgroup. Many controllers become
active only for processes in the child cgroups. Would you mind
elaborating on what doesn't quite work currently?

 
> It's one of my biggest regrets that I didn't 'fix' this before cgroups
> came along.
>
>> I have not removed any bits. I just haven't exposed
>> them yet. It does seem like these 2 control knobs are useful from the
>> scheduling perspective. Do we also need cpu_exclusive or just the two
>> sched control knobs are enough?
> I always forget if we need exclusive for load_balance to work; I'll
> peruse the document/code.

I think the cpu_exclusive feature can be useful to enforce that CPUs
allocated to the "other" isolated cgroup cannot be used by the processes
under the "system" parent.

I know that there is special code to handle the isolcpus option. How
about changing it to create an exclusive cpuset automatically instead?
Applications that need to run on those isolated CPUs can then use the
standard cgroup mechanism to be moved into the isolated cgroup. For example,

isolcpus=<cpuset-name>,<cpu-id-list>

or

isolcpuset=<cpuset-name>[,cpu:<cpu-id-list>][,mem:<memory-node-list>]

We can then retire the old usage and encourage users to use the cgroup
API to manage it.
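
Under such a (hypothetical) scheme, the rest would just be the usual
cgroup moves, for example:

    # Assumes the boot parameter created an exclusive cpuset named "rt"
    # under the v1 mount point; the name and path here are hypothetical.
    echo $$ > /sys/fs/cgroup/cpuset/rt/tasks   # move this shell into it
    exec ./rt_app                              # start the workload there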

Cheers,
Longman


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-12 14:20                   ` Waiman Long
@ 2018-03-12 15:21                     ` Mike Galbraith
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2018-03-12 15:21 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, torvalds,
	Roman Gushchin

On Mon, 2018-03-12 at 10:20 -0400, Waiman Long wrote:
> On 03/10/2018 08:16 AM, Peter Zijlstra wrote:
> 
> > The equivalent of isolcpus=xxx is a cgroup setup like:
> >
> >         root
> > 	/  \
> >   system    other
> >
> > Where other has the @xxx cpus and system the remainder and
> > root.sched_load_balance = 0.
> 
> I saw in the kernel-parameters.txt file that the isolcpus option was
> deprecated - use cpusets instead. However, there doesn't seem to have
> document on the right way to do it.

I use cset shield (cpuset package) in a script to create a set and
migrate everything that's permitted into the system set.

setup:
cset shield --userset=rtcpus --cpu=4-63 --kthread=on
<poke this/that>

teardown:
cset shield --userset=rtcpus --reset
<un-poke this/that>

Non-sexy, but works for simple stuff.

	-Mike


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-10  3:47                 ` Mike Galbraith
@ 2018-03-14 19:57                   ` Tejun Heo
  2018-03-15  2:49                     ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2018-03-14 19:57 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Waiman Long, Peter Zijlstra, Li Zefan, Johannes Weiner,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, torvalds, Roman Gushchin

Hello,

On Sat, Mar 10, 2018 at 04:47:28AM +0100, Mike Galbraith wrote:
> Some form of cpu_exclusive (preferably exactly that, but something else
> could replace it) is needed to define sets that must not overlap any
> other set at creation time or any time thereafter.  A set with property
> 'exclusive' is the enabler for fundamentally exclusive (but dynamic!)
> set properties such as 'isolated' (etc etc).

I'm not sure cpu_exclusive makes sense.  A controller knob can either
belong to the parent or the cgroup itself and cpu_exclusive doesn't
make sense in either case.

1. cpu_exclusive is owned by the parent as other usual resource
   control knobs.  IOW, it's not delegatable.

   This is weird because it's asking the kernel to protect against its
   own misconfiguration and there's nothing preventing cpu_exclusive
   itself from being cleared by the same entity.

2. cpu_exclusive is owned by the cgroup itself like memory.oom_group.
   IOW, it's delegatable.

   This allows a cgroup to affect what its siblings can or cannot do,
   which is broken.  Semantically, it doesn't make much sense either.

I don't think it's a good idea to add a kernel mechanism to prevent
misconfiguration from a single entity.

Thanks.

-- 
tejun


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-14 19:57                   ` Tejun Heo
@ 2018-03-15  2:49                     ` Mike Galbraith
  2018-03-19 15:34                       ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2018-03-15  2:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Waiman Long, Peter Zijlstra, Li Zefan, Johannes Weiner,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, torvalds, Roman Gushchin

On Wed, 2018-03-14 at 12:57 -0700, Tejun Heo wrote:
> Hello,
> 
> On Sat, Mar 10, 2018 at 04:47:28AM +0100, Mike Galbraith wrote:
> > Some form of cpu_exclusive (preferably exactly that, but something else
> > could replace it) is needed to define sets that must not overlap any
> > other set at creation time or any time thereafter.  A set with property
> > 'exclusive' is the enabler for fundamentally exclusive (but dynamic!)
> > set properties such as 'isolated' (etc etc).
> 
> I'm not sure cpu_exclusive makes sense.  A controller knob can either
> belong to the parent or the cgroup itself and cpu_exclusive doesn't
> make sense in either case.
> 
> 1. cpu_exclusive is owned by the parent as other usual resource
>    control knobs.  IOW, it's not delegatable.
> 
>    This is weird because it's asking the kernel to protect against its
>    own misconfiguration and there's nothing preventing cpu_exclusive
>    itself being cleared by the same entitya.
> 
> 2. cpu_exclusive is owned by the cgroup itself like memory.oom_group.
>    IOW, it's delegatable.
> 
>    This allows a cgroup to affect what its siblings can or cannot do,
>    which is broken.  Semantically, it doesn't make much sense either.
> 
> I don't think it's a good idea to add a kernel mechanism to prevent
> misconfiguration from a single entity.

Under the hood v2 details are entirely up to you.  My input ends at
please don't leave dynamic partitioning standing at the dock when v2
sails.

	-Mike


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-15  2:49                     ` Mike Galbraith
@ 2018-03-19 15:34                       ` Tejun Heo
  2018-03-19 20:49                         ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2018-03-19 15:34 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Waiman Long, Peter Zijlstra, Li Zefan, Johannes Weiner,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, torvalds, Roman Gushchin

Hello, Mike.

On Thu, Mar 15, 2018 at 03:49:01AM +0100, Mike Galbraith wrote:
> Under the hood v2 details are entirely up to you.  My input ends at
> please don't leave dynamic partitioning standing at the dock when v2
> sails.

So, this isn't about implementation details but about what the
interface achieves - ie, what's the actual function?  The only thing I
can see is blocking the entity which is configuring the hierarchy from
making certain configs.  While that might be useful in some specific
use cases, it seems to miss the bar for becoming its own kernel
feature.  After all, nothing prevents the same entity from clearing
the exclusive bit and making said changes.

Thanks.

-- 
tejun


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-19 15:34                       ` Tejun Heo
@ 2018-03-19 20:49                         ` Mike Galbraith
  2018-03-19 21:41                           ` Waiman Long
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Galbraith @ 2018-03-19 20:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Waiman Long, Peter Zijlstra, Li Zefan, Johannes Weiner,
	Ingo Molnar, cgroups, linux-kernel, linux-doc, kernel-team, pjt,
	luto, torvalds, Roman Gushchin

On Mon, 2018-03-19 at 08:34 -0700, Tejun Heo wrote:
> Hello, Mike.
> 
> On Thu, Mar 15, 2018 at 03:49:01AM +0100, Mike Galbraith wrote:
> > Under the hood v2 details are entirely up to you.  My input ends at
> > please don't leave dynamic partitioning standing at the dock when v2
> > sails.
> 
> So, this isn't about implementation details but about what the
> interface achieves - ie, what's the actual function?  The only thing I
> can see is blocking the entity which is configuring the hierarchy from
> making certain configs.  While that might be useful in some specific
> use cases, it seems to miss the bar for becoming its own kernel
> feature.  After all, nothing prevents the same entity from clearing
> the exlusive bit and making the said changes.

Yes, privileged contexts can maliciously or stupidly step all over one
another no matter what you do (finite resource), but oxymoron creation
(CPUs simultaneously balanced and isolated) should be handled.  If one
context can allocate a set overlapping a set another context intends to
or already has detached from scheduler domains, both are screwed.

	-Mike


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-19 20:49                         ` Mike Galbraith
@ 2018-03-19 21:41                           ` Waiman Long
  2018-03-20  4:25                             ` Mike Galbraith
  0 siblings, 1 reply; 20+ messages in thread
From: Waiman Long @ 2018-03-19 21:41 UTC (permalink / raw)
  To: Mike Galbraith, Tejun Heo
  Cc: Peter Zijlstra, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, torvalds,
	Roman Gushchin

On 03/19/2018 04:49 PM, Mike Galbraith wrote:
> On Mon, 2018-03-19 at 08:34 -0700, Tejun Heo wrote:
>> Hello, Mike.
>>
>> On Thu, Mar 15, 2018 at 03:49:01AM +0100, Mike Galbraith wrote:
>>> Under the hood v2 details are entirely up to you.  My input ends at
>>> please don't leave dynamic partitioning standing at the dock when v2
>>> sails.
>> So, this isn't about implementation details but about what the
>> interface achieves - ie, what's the actual function?  The only thing I
>> can see is blocking the entity which is configuring the hierarchy from
>> making certain configs.  While that might be useful in some specific
>> use cases, it seems to miss the bar for becoming its own kernel
>> feature.  After all, nothing prevents the same entity from clearing
>> the exlusive bit and making the said changes.
> Yes, privileged contexts can maliciously or stupidly step all over one
> other no matter what you do (finite resource), but oxymoron creation
> (CPUs simultaneously balanced and isolated) should be handled.  If one
> context can allocate a set overlapping a set another context intends to
> or already has detached from scheduler domains, both are screwed.
>
> 	-Mike

The allocations of CPUs to child cgroups should be controlled by the
parent cgroup. It is the parent's fault if some CPUs are in both
balanced and isolated cgroups.

How about we don't allow turning off scheduling if the CPUs aren't
exclusive from the parent's perspective? So you can't create an isolated
cgroup if the CPUs aren't exclusive. Will this be a good enough compromise?

Cheers,
Longman


* Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy
  2018-03-19 21:41                           ` Waiman Long
@ 2018-03-20  4:25                             ` Mike Galbraith
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2018-03-20  4:25 UTC (permalink / raw)
  To: Waiman Long, Tejun Heo
  Cc: Peter Zijlstra, Li Zefan, Johannes Weiner, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, torvalds,
	Roman Gushchin

On Mon, 2018-03-19 at 17:41 -0400, Waiman Long wrote:
> On 03/19/2018 04:49 PM, Mike Galbraith wrote:
> > On Mon, 2018-03-19 at 08:34 -0700, Tejun Heo wrote:
> >> Hello, Mike.
> >>
> >> On Thu, Mar 15, 2018 at 03:49:01AM +0100, Mike Galbraith wrote:
> >>> Under the hood v2 details are entirely up to you.  My input ends at
> >>> please don't leave dynamic partitioning standing at the dock when v2
> >>> sails.
> >> So, this isn't about implementation details but about what the
> >> interface achieves - ie, what's the actual function?  The only thing I
> >> can see is blocking the entity which is configuring the hierarchy from
> >> making certain configs.  While that might be useful in some specific
> >> use cases, it seems to miss the bar for becoming its own kernel
> >> feature.  After all, nothing prevents the same entity from clearing
> >> the exlusive bit and making the said changes.
> > Yes, privileged contexts can maliciously or stupidly step all over one
> > other no matter what you do (finite resource), but oxymoron creation
> > (CPUs simultaneously balanced and isolated) should be handled.  If one
> > context can allocate a set overlapping a set another context intends to
> > or already has detached from scheduler domains, both are screwed.
> >
> > 	-Mike
> 
> The allocations of CPUs to child cgroups should be controlled by the
> parent cgroup. It is the parent's fault if some CPUs are in both
> balanced and isolated cgroups.
> 
> How about we don't allow turning off scheduling if the CPUs aren't
> exclusive from the parent's perspective? So you can't create an isolated
> cgroup if the CPUs aren't exclusive. Will this be a good enough compromise?

Sure.  The kernel need only ensure its own sanity.  Userspace conflicts
are more or less a non-issue.  In practice, all players but one will
have been constrained or eliminated prior to any partitioning.

	-Mike


Thread overview: 20+ messages
2018-03-09 15:35 [PATCH v4] cpuset: Enable cpuset controller in default hierarchy Waiman Long
2018-03-09 16:34 ` Mike Galbraith
2018-03-09 17:23   ` Mike Galbraith
2018-03-09 17:45   ` Waiman Long
2018-03-09 18:17     ` Mike Galbraith
2018-03-09 18:20       ` Waiman Long
2018-03-09 19:40         ` Mike Galbraith
2018-03-09 20:43           ` Waiman Long
2018-03-09 22:17             ` Peter Zijlstra
2018-03-09 23:06               ` Waiman Long
2018-03-10  3:47                 ` Mike Galbraith
2018-03-14 19:57                   ` Tejun Heo
2018-03-15  2:49                     ` Mike Galbraith
2018-03-19 15:34                       ` Tejun Heo
2018-03-19 20:49                         ` Mike Galbraith
2018-03-19 21:41                           ` Waiman Long
2018-03-20  4:25                             ` Mike Galbraith
2018-03-10 13:16                 ` Peter Zijlstra
2018-03-12 14:20                   ` Waiman Long
2018-03-12 15:21                     ` Mike Galbraith
