linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/2] cpuset: Enable cpuset controller in default hierarchy
@ 2018-03-15 21:20 Waiman Long
  2018-03-15 21:20 ` [PATCH v5 1/2] " Waiman Long
  2018-03-15 21:20 ` [PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2 Waiman Long
  0 siblings, 2 replies; 14+ messages in thread
From: Waiman Long @ 2018-03-15 21:20 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin, Waiman Long

v5:
 - Add patch 2 to provide the cpuset.flags control knob for the
   sched_load_balance flag which should be the only feature that is
   essential as a replacement of the "isolcpus" kernel boot parameter.

v4:
 - Further minimize the feature set by removing the flags control knob.

v3:
 - Further trim the additional features down to just memory_migrate.
 - Update Documentation/cgroup-v2.txt.

The purpose of this patchset is to provide a minimal set of cpuset
features for cgroup v2. That minimal set includes the cpus, mems and
their effective_* counterparts as well as a new flags control knob
that currently supports only the sched_load_balance flag.

This patchset does not exclude the possibility of adding more flags
and features in the future after careful consideration.

Patch 1 enables cpuset in cgroup v2 with cpus, mems and their
effective_* counterparts.

Patch 2 adds flags with support for the sched_load_balance only.

Waiman Long (2):
  cpuset: Enable cpuset controller in default hierarchy
  cpuset: Add cpuset.flags control knob to v2

 Documentation/cgroup-v2.txt | 128 ++++++++++++++++++++++++++++++++++++----
 kernel/cgroup/cpuset.c      | 140 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 256 insertions(+), 12 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v5 1/2] cpuset: Enable cpuset controller in default hierarchy
  2018-03-15 21:20 [PATCH v5 0/2] cpuset: Enable cpuset controller in default hierarchy Waiman Long
@ 2018-03-15 21:20 ` Waiman Long
  2018-03-19 15:59   ` Tejun Heo
  2018-03-15 21:20 ` [PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2 Waiman Long
  1 sibling, 1 reply; 14+ messages in thread
From: Waiman Long @ 2018-03-15 21:20 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin, Waiman Long

Given the fact that thread mode had been merged into 4.14, it is now
time to enable cpuset to be used in the default hierarchy (cgroup v2)
as it is clearly threaded.

The cpuset controller had experienced feature creep since its
introduction more than a decade ago. Besides the core cpus and mems
control files to limit cpus and memory nodes, there are a bunch of
additional features that can be controlled from the userspace. Some of
the features are of doubtful usefulness and may not be actively used.

This patch enables cpuset controller in the default hierarchy with
a minimal set of features, namely just the cpus and mems and their
effective_* counterparts.  We can certainly add more features to the
default hierarchy in the future if there is a real user need for them
later on.

Alternatively, with the unified hiearachy, it may make more sense
to move some of those additional cpuset features, if desired, to
memory controller or may be to the cpu controller instead of staying
with cpuset.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt | 96 ++++++++++++++++++++++++++++++++++++++++-----
 kernel/cgroup/cpuset.c      | 44 +++++++++++++++++++--
 2 files changed, 127 insertions(+), 13 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 74cdeae..b91fd5d 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -48,16 +48,18 @@ v1 is available under Documentation/cgroup-v1/.
        5-2-1. Memory Interface Files
        5-2-2. Usage Guidelines
        5-2-3. Memory Ownership
-     5-3. IO
-       5-3-1. IO Interface Files
-       5-3-2. Writeback
-     5-4. PID
-       5-4-1. PID Interface Files
-     5-5. Device
-     5-6. RDMA
-       5-6-1. RDMA Interface Files
-     5-7. Misc
-       5-7-1. perf_event
+     5-3. Cpuset
+       5.3-1. Cpuset Interface Files
+     5-4. IO
+       5-4-1. IO Interface Files
+       5-4-2. Writeback
+     5-5. PID
+       5-5-1. PID Interface Files
+     5-6. Device
+     5-7. RDMA
+       5-7-1. RDMA Interface Files
+     5-8. Misc
+       5-8-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -1243,6 +1245,80 @@ POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
 belonging to the affected files to ensure correct memory ownership.
 
 
+Cpuset
+------
+
+The "cpuset" controller provides a mechanism for constraining
+the CPU and memory node placement of tasks to only the resources
+specified in the cpuset interface files in a task's current cgroup.
+This is especially valuable on large NUMA systems where placing jobs
+on properly sized subsets of the systems with careful processor and
+memory placement to reduce cross-node memory access and contention
+can improve overall system performance.
+
+The "cpuset" controller is hierarchical.  That means the controller
+cannot use CPUs or memory nodes not allowed in its parent.
+
+
+Cpuset Interface Files
+~~~~~~~~~~~~~~~~~~~~~~
+
+  cpuset.cpus
+	A read-write multiple values file which exists on non-root
+	cgroups.
+
+	It lists the CPUs allowed to be used by tasks within this
+	cgroup.  The CPU numbers are comma-separated numbers or
+	ranges.  For example:
+
+	  # cat cpuset.cpus
+	  0-4,6,8-10
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.cpus" or all the available CPUs if none is found.
+
+	The value of "cpuset.cpus" stays constant until the next update
+	and won't be affected by any CPU hotplug events.
+
+  cpuset.effective_cpus
+	A read-only multiple values file which exists on non-root
+	cgroups.
+
+	It lists the onlined CPUs that are actually allowed to be
+	used by tasks within the current cgroup. It is a subset of
+	"cpuset.cpus".  Its value will be affected by CPU hotplug
+	events.
+
+  cpuset.mems
+	A read-write multiple values file which exists on non-root
+	cgroups.
+
+	It lists the memory nodes allowed to be used by tasks within
+	this cgroup.  The memory node numbers are comma-separated
+	numbers or ranges.  For example:
+
+	  # cat cpuset.mems
+	  0-1,3
+
+	An empty value indicates that the cgroup is using the same
+	setting as the nearest cgroup ancestor with a non-empty
+	"cpuset.mems" or all the available memory nodes if none
+	is found.
+
+	The value of "cpuset.mems" stays constant until the next update
+	and won't be affected by any memory nodes hotplug events.
+
+  cpuset.effective_mems
+	A read-only multiple values file which exists on non-root
+	cgroups.
+
+	It lists the onlined memory nodes that are actually allowed
+	to be used by tasks within the current cgroup.  It is a subset
+	of "cpuset.mems".  Its value will be affected by memory nodes
+	hotplug events.
+
+
 IO
 --
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b42037e..7837d1f 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1823,12 +1823,11 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 	return 0;
 }
 
-
 /*
  * for the common functions, 'private' gives the type of file
  */
 
-static struct cftype files[] = {
+static struct cftype legacy_files[] = {
 	{
 		.name = "cpus",
 		.seq_show = cpuset_common_seq_show,
@@ -1931,6 +1930,43 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 };
 
 /*
+ * This is currently a minimal set for the default hierarchy. It can be
+ * expanded later on by migrating more features and control files from v1.
+ */
+static struct cftype dfl_files[] = {
+	{
+		.name = "cpus",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * NR_CPUS),
+		.private = FILE_CPULIST,
+	},
+
+	{
+		.name = "mems",
+		.seq_show = cpuset_common_seq_show,
+		.write = cpuset_write_resmask,
+		.max_write_len = (100U + 6 * MAX_NUMNODES),
+		.private = FILE_MEMLIST,
+	},
+
+	{
+		.name = "effective_cpus",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_EFFECTIVE_CPULIST,
+	},
+
+	{
+		.name = "effective_mems",
+		.seq_show = cpuset_common_seq_show,
+		.private = FILE_EFFECTIVE_MEMLIST,
+	},
+
+	{ }	/* terminate */
+};
+
+
+/*
  *	cpuset_css_alloc - allocate a cpuset css
  *	cgrp:	control group that the new cpuset will be part of
  */
@@ -2104,8 +2140,10 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
 	.post_attach	= cpuset_post_attach,
 	.bind		= cpuset_bind,
 	.fork		= cpuset_fork,
-	.legacy_cftypes	= files,
+	.legacy_cftypes	= legacy_files,
+	.dfl_cftypes	= dfl_files,
 	.early_init	= true,
+	.threaded	= true,
 };
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2
  2018-03-15 21:20 [PATCH v5 0/2] cpuset: Enable cpuset controller in default hierarchy Waiman Long
  2018-03-15 21:20 ` [PATCH v5 1/2] " Waiman Long
@ 2018-03-15 21:20 ` Waiman Long
  2018-03-19 16:26   ` Tejun Heo
  1 sibling, 1 reply; 14+ messages in thread
From: Waiman Long @ 2018-03-15 21:20 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
  Cc: cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin, Waiman Long

In cgroup v1, there are a bunch of cpuset control knobs that are just
boolean flags. To reduce the proliteration of cpuset control knobs
in v2 where multiple controllers will be active in a single cgroup,
all the boolean flags are now consolidated into a single cpuset.flags
control knob.

Currently, the cpuset.flags control knob just has one flag -
sched_load_balance. This flag is needed to enable CPU isolation
similar to what can be done with the "isolcpus" kernel boot parameter.
More flags can be added in the future if desired.

When the "cpuset.flags" file is read, it shows what flags have been
currently enabled. Turning on or off specific flags are done in a
way similar to what controllers can be enabled or disabled in the
cgroup.subtree_control control knob.

To enable the sched_load_balance flag, use

  # echo +sched_load_balance > cpuset.flags

To disable it, use

  # echo -sched_load_balance > cpuset.flags

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/cgroup-v2.txt | 32 +++++++++++++++
 kernel/cgroup/cpuset.c      | 98 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 130 insertions(+)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index b91fd5d..362026a 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1318,6 +1318,38 @@ Cpuset Interface Files
 	of "cpuset.mems".  Its value will be affected by memory nodes
 	hotplug events.
 
+  cpuset.flags
+	A read-write multiple values file which exists on non-root
+	cgroups.
+
+	On read, it lists the flags that are currently enabled.  On
+	write, the '+' or '-' prefix with the flag name is used to
+	enable and disable the flag respectively.
+
+	The currently supported flag is:
+
+	  sched_load_balance
+		When it is not set, there will be no load balancing
+		among CPUs on this cpuset.  Tasks will stay in the
+		CPUs they are running on and will not be moved to
+		other CPUs.
+
+		When it is set, tasks within this cpuset will be
+		load-balanced by the kernel scheduler.  Tasks will be
+		moved from CPUs with high load to other CPUs within
+		the same cpuset with less load periodically.
+
+		This flag is on by default and will have to be
+		explicitly turned off.
+
+	To set a flag, use the '+' prefix:
+
+	  # echo +sched_load_balance > cpuset.flags
+
+	To clear a flag, use the '-' prefix:
+
+	  # echo -sched_load_balance > cpuset.flags
+
 
 IO
 --
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7837d1f..3145dc0 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1601,6 +1601,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	FILE_MEMORY_PRESSURE,
 	FILE_SPREAD_PAGE,
 	FILE_SPREAD_SLAB,
+	FILE_FLAGS,
 } cpuset_filetype_t;
 
 static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
@@ -1823,6 +1824,97 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 	return 0;
 }
 
+static const struct {
+	char *name;
+	int  cs_bit;
+} cpuset_flags[] = {
+	{ "sched_load_balance", CS_SCHED_LOAD_BALANCE },
+};
+
+static int cpuset_read_flags(struct seq_file *sf, void *v)
+{
+	struct cpuset *cs = css_cs(seq_css(sf));
+	unsigned long enabled = READ_ONCE(cs->flags);
+	int i, cnt;
+
+	for (i = cnt = 0; i < ARRAY_SIZE(cpuset_flags); i++) {
+		if (!test_bit(cpuset_flags[i].cs_bit, &enabled))
+			continue;
+
+		if (cnt++)
+			seq_putc(sf, ' ');
+
+		seq_printf(sf, "%s", cpuset_flags[i].name);
+	}
+	seq_putc(sf, '\n');
+	return 0;
+}
+
+static ssize_t cpuset_write_flags(struct kernfs_open_file *of, char *buf,
+				  size_t nbytes, loff_t off)
+{
+	struct cpuset *cs = css_cs(of_css(of));
+	unsigned long enable = 0, disable = 0;
+	char *tok;
+	int i;
+
+	mutex_lock(&cpuset_mutex);
+	if (!is_cpuset_online(cs)) {
+		nbytes = -ENODEV;
+		goto out_unlock;
+	}
+
+	/*
+	 * Parse input - space seperated list of feature names prefixed
+	 * with either + or -.
+	 */
+	buf = strstrip(buf);
+	while ((tok = strsep(&buf, " "))) {
+		if (tok[0] == '\0')
+			continue;
+		for (i = 0; i < ARRAY_SIZE(cpuset_flags); i++)
+			if (!strcmp(tok + 1, cpuset_flags[i].name))
+				break;
+		if (i == ARRAY_SIZE(cpuset_flags)) {
+			nbytes = -EINVAL;
+			goto out_unlock;
+		}
+		if (*tok == '+') {
+			set_bit(cpuset_flags[i].cs_bit, &cs->flags);
+			enable  |= 1UL << cpuset_flags[i].cs_bit;
+			disable &= 1UL << cpuset_flags[i].cs_bit;
+		} else if (*tok == '-') {
+			disable |= 1UL << cpuset_flags[i].cs_bit;
+			enable  &= 1UL << cpuset_flags[i].cs_bit;
+		} else {
+			nbytes = -EINVAL;
+			goto out_unlock;
+		}
+	}
+
+	/*
+	 * Set/clear the flags individually. It is possible an error may
+	 * happen before all the flags are processed, but it should be rare.
+	 */
+	for  (i = 0; i < ARRAY_SIZE(cpuset_flags); i++) {
+		unsigned long flag = 1UL << cpuset_flags[i].cs_bit;
+		int retval;
+
+		if (flag & (enable|disable)) {
+			retval = update_flag(cpuset_flags[i].cs_bit, cs,
+					     enable & flag);
+			if (retval) {
+				nbytes = retval;
+				goto out_unlock;
+			}
+		}
+	}
+
+out_unlock:
+	mutex_unlock(&cpuset_mutex);
+	return nbytes;
+}
+
 /*
  * for the common functions, 'private' gives the type of file
  */
@@ -1962,6 +2054,12 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 		.private = FILE_EFFECTIVE_MEMLIST,
 	},
 
+	{
+		.name = "flags",
+		.seq_show = cpuset_read_flags,
+		.write = cpuset_write_flags,
+		.private = FILE_FLAGS,
+	},
 	{ }	/* terminate */
 };
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 1/2] cpuset: Enable cpuset controller in default hierarchy
  2018-03-15 21:20 ` [PATCH v5 1/2] " Waiman Long
@ 2018-03-19 15:59   ` Tejun Heo
  2018-03-20 13:51     ` Waiman Long
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2018-03-19 15:59 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

Hello, Waiman.

This looks great.  A couple nitpicks below.

> +     5-3. Cpuset
> +       5.3-1. Cpuset Interface Files

Can we put cpuset below pid?  It feels weird to break up cpu, memory
and io as they represent the three major resources and are in a
similar fashion.

> +  cpuset.effective_cpus
> +	A read-only multiple values file which exists on non-root
> +	cgroups.
> +
> +	It lists the onlined CPUs that are actually allowed to be
> +	used by tasks within the current cgroup. It is a subset of
> +	"cpuset.cpus".  Its value will be affected by CPU hotplug
> +	events.

Can we do cpuset.cpus.availble which lists the cpus available to the
cgroup instead of the eventual computed mask for the cgroup?  That'd
be more useful as it doesn't lose the information by and'ing what's
available with the cgroup's mask and it's trivial to determine the
effective from the two masks.

> +  cpuset.effective_mems
> +	A read-only multiple values file which exists on non-root
> +	cgroups.
> +
> +	It lists the onlined memory nodes that are actually allowed
> +	to be used by tasks within the current cgroup.  It is a subset
> +	of "cpuset.mems".  Its value will be affected by memory nodes
> +	hotplug events.

Ditto.

> +static struct cftype dfl_files[] = {
> +	{
> +		.name = "cpus",
> +		.seq_show = cpuset_common_seq_show,
> +		.write = cpuset_write_resmask,
> +		.max_write_len = (100U + 6 * NR_CPUS),
> +		.private = FILE_CPULIST,
> +	},

Is it missing CFTYPE_NOT_ON_ROOT?  Other files too.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2
  2018-03-15 21:20 ` [PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2 Waiman Long
@ 2018-03-19 16:26   ` Tejun Heo
  2018-03-19 16:33     ` Waiman Long
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2018-03-19 16:26 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

Hello, Waiman.

On Thu, Mar 15, 2018 at 05:20:42PM -0400, Waiman Long wrote:
> +	The currently supported flag is:
> +
> +	  sched_load_balance
> +		When it is not set, there will be no load balancing
> +		among CPUs on this cpuset.  Tasks will stay in the
> +		CPUs they are running on and will not be moved to
> +		other CPUs.
> +
> +		When it is set, tasks within this cpuset will be
> +		load-balanced by the kernel scheduler.  Tasks will be
> +		moved from CPUs with high load to other CPUs within
> +		the same cpuset with less load periodically.

Hmm... looks like this is something which can be decided by the cgroup
itself and should be made delegatable.  Given that different flags
might need different delegation settings and the precedence of
memory.oom_group, I think it'd be better to make the flags separate
bool files - ie. cpuset.sched_load_balance which contains 0/1 and
marked delegatable.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2
  2018-03-19 16:26   ` Tejun Heo
@ 2018-03-19 16:33     ` Waiman Long
  2018-03-20 20:12       ` Waiman Long
  0 siblings, 1 reply; 14+ messages in thread
From: Waiman Long @ 2018-03-19 16:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

On 03/19/2018 12:26 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Thu, Mar 15, 2018 at 05:20:42PM -0400, Waiman Long wrote:
>> +	The currently supported flag is:
>> +
>> +	  sched_load_balance
>> +		When it is not set, there will be no load balancing
>> +		among CPUs on this cpuset.  Tasks will stay in the
>> +		CPUs they are running on and will not be moved to
>> +		other CPUs.
>> +
>> +		When it is set, tasks within this cpuset will be
>> +		load-balanced by the kernel scheduler.  Tasks will be
>> +		moved from CPUs with high load to other CPUs within
>> +		the same cpuset with less load periodically.
> Hmm... looks like this is something which can be decided by the cgroup
> itself and should be made delegatable.  Given that different flags
> might need different delegation settings and the precedence of
> memory.oom_group, I think it'd be better to make the flags separate
> bool files - ie. cpuset.sched_load_balance which contains 0/1 and
> marked delegatable.
>
> Thanks.
>
Sure. Will do that.

-Longman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 1/2] cpuset: Enable cpuset controller in default hierarchy
  2018-03-19 15:59   ` Tejun Heo
@ 2018-03-20 13:51     ` Waiman Long
  2018-03-20 20:10       ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Waiman Long @ 2018-03-20 13:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

On 03/19/2018 11:59 AM, Tejun Heo wrote:
> Hello, Waiman.
>
> This looks great.  A couple nitpicks below.
>
>> +     5-3. Cpuset
>> +       5.3-1. Cpuset Interface Files
> Can we put cpuset below pid?  It feels weird to break up cpu, memory
> and io as they represent the three major resources and are in a
> similar fashion.
Sure. I will move it down below pid.

>> +  cpuset.effective_cpus
>> +	A read-only multiple values file which exists on non-root
>> +	cgroups.
>> +
>> +	It lists the onlined CPUs that are actually allowed to be
>> +	used by tasks within the current cgroup. It is a subset of
>> +	"cpuset.cpus".  Its value will be affected by CPU hotplug
>> +	events.
> Can we do cpuset.cpus.availble which lists the cpus available to the
> cgroup instead of the eventual computed mask for the cgroup?  That'd
> be more useful as it doesn't lose the information by and'ing what's
> available with the cgroup's mask and it's trivial to determine the
> effective from the two masks.

I don't get what you want here. cpus is the cpuset's cpus_allowed mask.
effective_cpus is the effective_cpus mask. When you say cpus available
to the cgroup, do you mean the cpu_online_mask or the list of cpus from
the parent? Or do you just want to change the name to cpus.available
instead of effective_cpus?

>> +  cpuset.effective_mems
>> +	A read-only multiple values file which exists on non-root
>> +	cgroups.
>> +
>> +	It lists the onlined memory nodes that are actually allowed
>> +	to be used by tasks within the current cgroup.  It is a subset
>> +	of "cpuset.mems".  Its value will be affected by memory nodes
>> +	hotplug events.
> Ditto.
>
>> +static struct cftype dfl_files[] = {
>> +	{
>> +		.name = "cpus",
>> +		.seq_show = cpuset_common_seq_show,
>> +		.write = cpuset_write_resmask,
>> +		.max_write_len = (100U + 6 * NR_CPUS),
>> +		.private = FILE_CPULIST,
>> +	},
> Is it missing CFTYPE_NOT_ON_ROOT?  Other files too.

Right, I will set CFTYPE_NOT_ON_ROOT to "cpus" and "mems" as we are not
supposed to change them in the root. The effective_cpus and
effective_mems will be there in the root to show what are available.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 1/2] cpuset: Enable cpuset controller in default hierarchy
  2018-03-20 13:51     ` Waiman Long
@ 2018-03-20 20:10       ` Tejun Heo
  2018-03-20 20:53         ` Waiman Long
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2018-03-20 20:10 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

Hello, Waiman.

On Tue, Mar 20, 2018 at 09:51:20AM -0400, Waiman Long wrote:
> >> +	It lists the onlined CPUs that are actually allowed to be
> >> +	used by tasks within the current cgroup. It is a subset of
> >> +	"cpuset.cpus".  Its value will be affected by CPU hotplug
> >> +	events.
> > Can we do cpuset.cpus.availble which lists the cpus available to the
> > cgroup instead of the eventual computed mask for the cgroup?  That'd
> > be more useful as it doesn't lose the information by and'ing what's
> > available with the cgroup's mask and it's trivial to determine the
> > effective from the two masks.
> 
> I don't get what you want here. cpus is the cpuset's cpus_allowed mask.
> effective_cpus is the effective_cpus mask. When you say cpus available
> to the cgroup, do you mean the cpu_online_mask or the list of cpus from
> the parent? Or do you just want to change the name to cpus.available
> instead of effective_cpus?

The available cpus from the parent, where the effective is AND between
cpuset.available and cpuset.cpus of the cgroup, so that the user can
see what's available for the cgroup unfiltered by cpuset.cpus.

> Right, I will set CFTYPE_NOT_ON_ROOT to "cpus" and "mems" as we are not
> supposed to change them in the root. The effective_cpus and
> effective_mems will be there in the root to show what are available.

So, we can do that in the future but let's not do that for now.  It's
the same problem we have for basically everything else and we've
stayed away from replicating the information in the root cgroup.  This
might change in the future but if we do that let's do that
consistently.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2
  2018-03-19 16:33     ` Waiman Long
@ 2018-03-20 20:12       ` Waiman Long
  2018-03-20 20:22         ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Waiman Long @ 2018-03-20 20:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

On 03/19/2018 12:33 PM, Waiman Long wrote:
> On 03/19/2018 12:26 PM, Tejun Heo wrote:
>> Hello, Waiman.
>>
>> On Thu, Mar 15, 2018 at 05:20:42PM -0400, Waiman Long wrote:
>>> +	The currently supported flag is:
>>> +
>>> +	  sched_load_balance
>>> +		When it is not set, there will be no load balancing
>>> +		among CPUs on this cpuset.  Tasks will stay in the
>>> +		CPUs they are running on and will not be moved to
>>> +		other CPUs.
>>> +
>>> +		When it is set, tasks within this cpuset will be
>>> +		load-balanced by the kernel scheduler.  Tasks will be
>>> +		moved from CPUs with high load to other CPUs within
>>> +		the same cpuset with less load periodically.
>> Hmm... looks like this is something which can be decided by the cgroup
>> itself and should be made delegatable.  Given that different flags
>> might need different delegation settings and the precedence of
>> memory.oom_group, I think it'd be better to make the flags separate
>> bool files - ie. cpuset.sched_load_balance which contains 0/1 and
>> marked delegatable.
>>
>> Thanks.
>>
> Sure. Will do that.

After some thought, I am planning to impose the following additional
constraints on how sched_load_balance works in v2.

1) sched_load_balance will be made hierarchical, the child will inherit
the flag from its parent.
2) cpu_exclusive will be implicitly associated with sched_load_balance.
IOW, sched_load_balance => !cpu_exclusive, and !sched_load_balance =>
cpu_exclusive.
3) sched_load_balance cannot be 1 on a child if it is 0 on the parent.

With these changes, sched_load_balance will have to be set by the parent
and so will not be delegatable. Please let me know your thought on that.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2
  2018-03-20 20:12       ` Waiman Long
@ 2018-03-20 20:22         ` Tejun Heo
  2018-03-20 20:43           ` Waiman Long
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2018-03-20 20:22 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

Hello, Waiman.

On Tue, Mar 20, 2018 at 04:12:25PM -0400, Waiman Long wrote:
> After some thought, I am planning to impose the following additional
> constraints on how sched_load_balance works in v2.
> 
> 1) sched_load_balance will be made hierarchical, the child will inherit
> the flag from its parent.
> 2) cpu_exclusive will be implicitly associated with sched_load_balance.
> IOW, sched_load_balance => !cpu_exclusive, and !sched_load_balance =>
> cpu_exclusive.
> 3) sched_load_balance cannot be 1 on a child if it is 0 on the parent.
> 
> With these changes, sched_load_balance will have to be set by the parent
> and so will not be delegatable. Please let me know your thought on that.

So, for configurations, we usually don't let them interact across
hierarchy because that can lead to configurations surprise-changing
and delegated children locking the parent into the current config.

This case could be different and as long as we always guarantee that
an ancestor isn't limited by its descendants in what it can configure,
it should be okay (e.g. an ancestor should always be able to turn on
sched_load_balance regardless of how the descendants are configured).

Hmmm... can you explain why sched_load_balance needs to behave this
way?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2
  2018-03-20 20:22         ` Tejun Heo
@ 2018-03-20 20:43           ` Waiman Long
  0 siblings, 0 replies; 14+ messages in thread
From: Waiman Long @ 2018-03-20 20:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

On 03/20/2018 04:22 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Tue, Mar 20, 2018 at 04:12:25PM -0400, Waiman Long wrote:
>> After some thought, I am planning to impose the following additional
>> constraints on how sched_load_balance works in v2.
>>
>> 1) sched_load_balance will be made hierarchical, the child will inherit
>> the flag from its parent.
>> 2) cpu_exclusive will be implicitly associated with sched_load_balance.
>> IOW, sched_load_balance => !cpu_exclusive, and !sched_load_balance =>
>> cpu_exclusive.
>> 3) sched_load_balance cannot be 1 on a child if it is 0 on the parent.
>>
>> With these changes, sched_load_balance will have to be set by the parent
>> and so will not be delegatable. Please let me know your thought on that.
> So, for configurations, we usually don't let them interact across
> hierarchy because that can lead to configurations surprise-changing
> and delegated children locking the parent into the current config.
>
> This case could be different and as long as we always guarantee that
> an ancestor isn't limited by its descendants in what it can configure,
> it should be okay (e.g. an ancestor should always be able to turn on
> sched_load_balance regardless of how the descendants are configured).

Yes, I will do some testing to make sure that a descendant won't be able
to affect how the ancestors can behave.

> Hmmm... can you explain why sched_load_balance needs to behave this
> way?

It boils down to the fact that it doesn't make sense to have a CPU in an
isolated cpuset to participate in load balancing in another cpuset as
Mike has said before. It is especially true in a parent-child
relationship where a delegatee can escape CPU isolation by re-enabling
sched_load_balance in a child cpuset.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 1/2] cpuset: Enable cpuset controller in default hierarchy
  2018-03-20 20:10       ` Tejun Heo
@ 2018-03-20 20:53         ` Waiman Long
  2018-03-20 21:14           ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Waiman Long @ 2018-03-20 20:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

On 03/20/2018 04:10 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Tue, Mar 20, 2018 at 09:51:20AM -0400, Waiman Long wrote:
>>>> +	It lists the onlined CPUs that are actually allowed to be
>>>> +	used by tasks within the current cgroup. It is a subset of
>>>> +	"cpuset.cpus".  Its value will be affected by CPU hotplug
>>>> +	events.
>>> Can we do cpuset.cpus.availble which lists the cpus available to the
>>> cgroup instead of the eventual computed mask for the cgroup?  That'd
>>> be more useful as it doesn't lose the information by and'ing what's
>>> available with the cgroup's mask and it's trivial to determine the
>>> effective from the two masks.
>> I don't get what you want here. cpus is the cpuset's cpus_allowed mask.
>> effective_cpus is the effective_cpus mask. When you say cpus available
>> to the cgroup, do you mean the cpu_online_mask or the list of cpus from
>> the parent? Or do you just want to change the name to cpus.available
>> instead of effective_cpus?
> The available cpus from the parent, where the effective is AND between
> cpuset.available and cpuset.cpus of the cgroup, so that the user can
> see what's available for the cgroup unfiltered by cpuset.cpus.

ASAIK for v2, when cpuset.cpus is empty, cpuset.effective_cpus will show
all the cpus available from the parent. It is a different behavior from
v1. So do we still need a cpuset.cpus_available?

>> Right, I will set CFTYPE_NOT_ON_ROOT to "cpus" and "mems" as we are not
>> supposed to change them in the root. The effective_cpus and
>> effective_mems will be there in the root to show what are available.
> So, we can do that in the future but let's not do that for now.  It's
> the same problem we have for basically everything else and we've
> stayed away from replicating the information in the root cgroup.  This
> might change in the future but if we do that let's do that
> consistently.
That is fine. I will make them all disappears in the root cgroup.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 1/2] cpuset: Enable cpuset controller in default hierarchy
  2018-03-20 20:53         ` Waiman Long
@ 2018-03-20 21:14           ` Tejun Heo
  2018-03-20 22:01             ` Waiman Long
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2018-03-20 21:14 UTC (permalink / raw)
  To: Waiman Long
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

Hello,

On Tue, Mar 20, 2018 at 04:53:37PM -0400, Waiman Long wrote:
> ASAIK for v2, when cpuset.cpus is empty, cpuset.effective_cpus will show
> all the cpus available from the parent. It is a different behavior from
> v1. So do we still need a cpuset.cpus_available?

Heh, you're right.  Let's forget about available and do
cpuset.cpus.effective.  The primary reason for suggesting that was
because of the similarity with cgroup.controllers and
cgroup.subtree_control; however, they're that way because
subtree_control is delegatable.  For a normal resource knob like
cpuset.cpus, the knob is owned by the parent and what's interesting to
the parent is its effective set that it's distributing from.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 1/2] cpuset: Enable cpuset controller in default hierarchy
  2018-03-20 21:14           ` Tejun Heo
@ 2018-03-20 22:01             ` Waiman Long
  0 siblings, 0 replies; 14+ messages in thread
From: Waiman Long @ 2018-03-20 22:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar, cgroups,
	linux-kernel, linux-doc, kernel-team, pjt, luto, efault,
	torvalds, Roman Gushchin

On 03/20/2018 05:14 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Mar 20, 2018 at 04:53:37PM -0400, Waiman Long wrote:
>> ASAIK for v2, when cpuset.cpus is empty, cpuset.effective_cpus will show
>> all the cpus available from the parent. It is a different behavior from
>> v1. So do we still need a cpuset.cpus_available?
> Heh, you're right.  Let's forget about available and do
> cpuset.cpus.effective.  The primary reason for suggesting that was
> because of the similarity with cgroup.controllers and
> cgroup.subtree_control; however, they're that way because
> subtree_control is delegatable.  For a normal resource knob like
> cpuset.cpus, the knob is owned by the parent and what's interesting to
> the parent is its effective set that it's distributing from.

OK, will change the names as suggested.

-Longman

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-03-20 22:01 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-15 21:20 [PATCH v5 0/2] cpuset: Enable cpuset controller in default hierarchy Waiman Long
2018-03-15 21:20 ` [PATCH v5 1/2] " Waiman Long
2018-03-19 15:59   ` Tejun Heo
2018-03-20 13:51     ` Waiman Long
2018-03-20 20:10       ` Tejun Heo
2018-03-20 20:53         ` Waiman Long
2018-03-20 21:14           ` Tejun Heo
2018-03-20 22:01             ` Waiman Long
2018-03-15 21:20 ` [PATCH v5 2/2] cpuset: Add cpuset.flags control knob to v2 Waiman Long
2018-03-19 16:26   ` Tejun Heo
2018-03-19 16:33     ` Waiman Long
2018-03-20 20:12       ` Waiman Long
2018-03-20 20:22         ` Tejun Heo
2018-03-20 20:43           ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).