From: Waiman Long <longman@redhat.com>
To: Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Roman Gushchin" <guro@fb.com>, "Phil Auld" <pauld@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Frederic Weisbecker" <frederic@kernel.org>,
	"Marcelo Tosatti" <mtosatti@redhat.com>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Waiman Long" <longman@redhat.com>
Subject: [PATCH v6 4/6] cgroup/cpuset: Allow non-top parent partition to distribute out all CPUs
Date: Sat, 14 Aug 2021 16:57:41 -0400
Message-ID: <20210814205743.3039-5-longman@redhat.com>
In-Reply-To: <20210814205743.3039-1-longman@redhat.com>

Currently, a parent partition cannot distribute all of its CPUs to child
partitions; at least one CPU must remain in the parent. However, in some
use cases a management application may want to create a parent partition
purely as a management unit, with no task associated with it, and have
all of its CPUs distributed to various child partitions dynamically
according to their needs. Leaving a CPU in the parent partition in such
a case is a waste.

To accommodate such use cases, a parent partition can now have all of
its CPUs distributed to its child partitions, leaving it with 0
effective CPUs, as long as it is not the top cpuset and it has no task
at the time the child partition is being created. A terminal partition
with no child partition underneath it, however, cannot have 0 effective
CPUs; that makes the partition invalid. Once an empty parent partition
is formed, no new task can be moved into it.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 96 ++++++++++++++++++++++++++++++------------
 1 file changed, 69 insertions(+), 27 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0d4a2ed6fb24..02115e5c818a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -305,6 +305,21 @@ static inline void notify_partition_change(struct cpuset *cs,
 		WRITE_ONCE(cs->prs_err, PERR_NONE);
 }
 
+static inline int cpuset_has_tasks(const struct cpuset *cs)
+{
+	return cs->css.cgroup->nr_populated_csets;
+}
+
+/*
+ * A empty partition (one with no effective cpu) is valid if it has no
+ * associated task and all its cpus have been distributed out to child
+ * partitions.
+ */
+static inline bool valid_empty_partition(const struct cpuset *cs)
+{
+	return !cpuset_has_tasks(cs) && cs->nr_subparts_cpus;
+}
+
 static struct cpuset top_cpuset = {
 	.flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) |
 		  (1 << CS_MEM_EXCLUSIVE)),
@@ -1211,22 +1226,32 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 	if ((cmd == partcmd_enable) && css_has_online_children(&cpuset->css))
 		return -EBUSY;
 
-	/*
-	 * Enabling partition root is not allowed if not all the CPUs
-	 * can be granted from parent's effective_cpus or at least one
-	 * CPU will be left after that.
-	 */
-	if ((cmd == partcmd_enable) &&
-	   (!cpumask_subset(cpuset->cpus_allowed, parent->effective_cpus) ||
-	     cpumask_equal(cpuset->cpus_allowed, parent->effective_cpus)))
-		return -EINVAL;
-
 	/*
 	 * A cpumask update cannot make parent's effective_cpus become empty.
 	 */
 	adding = deleting = false;
 	old_prs = new_prs = cpuset->partition_root_state;
 	if (cmd == partcmd_enable) {
+		bool parent_is_top_cpuset = !parent_cs(parent);
+		bool no_cpu_in_parent = cpumask_equal(cpuset->cpus_allowed,
+						      parent->effective_cpus);
+		/*
+		 * Enabling partition root is not allowed if not all the CPUs
+		 * can be granted from parent's effective_cpus. If the parent
+		 * is the top cpuset, at least one CPU must be left after that.
+		 */
+		if (!cpumask_subset(cpuset->cpus_allowed, parent->effective_cpus) ||
+		    (parent_is_top_cpuset && no_cpu_in_parent))
+			return -EINVAL;
+
+		/*
+		 * A non-top parent can be left with no CPU as long as there
+		 * is no task directly associated with the parent. For such
+		 * a parent, no new task can be moved into it.
+		 */
+		if (no_cpu_in_parent && cpuset_has_tasks(parent))
+			return -EINVAL;
+
 		cpumask_copy(tmp->addmask, cpuset->cpus_allowed);
 		adding = true;
 	} else if (cmd == partcmd_disable) {
@@ -1257,9 +1282,9 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 				       parent->subparts_cpus);
 		/*
 		 * Make partition invalid if parent's effective_cpus could
-		 * become empty.
+		 * become empty and there are tasks in the parent.
 		 */
-		if (adding &&
+		if (adding && cpuset_has_tasks(parent) &&
 		    cpumask_equal(parent->effective_cpus, tmp->addmask)) {
 			if (!deleting)
 				part_error = true;
@@ -1294,7 +1319,9 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 				     parent->effective_cpus);
 		part_error = (is_partition_root(cpuset) &&
 			      !parent->nr_subparts_cpus) ||
-			     cpumask_equal(tmp->addmask, parent->effective_cpus);
+			     (cpumask_equal(tmp->addmask, parent->effective_cpus) &&
+			      cpuset_has_tasks(parent));
+
 		if (is_partition_root(cpuset) && part_error)
 			WRITE_ONCE(cpuset->prs_err, PERR_NOCPUS);
 	}
@@ -1397,9 +1424,15 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 
 		/*
 		 * If it becomes empty, inherit the effective mask of the
-		 * parent, which is guaranteed to have some CPUs.
+		 * parent, which is guaranteed to have some CPUs unless
+		 * it is a partition root that has explicitly distributed
+		 * out all its CPUs.
 		 */
 		if (is_in_v2_mode() && cpumask_empty(tmp->new_cpus)) {
+			if (is_partition_root(cp) &&
+			    cpumask_equal(cp->cpus_allowed, cp->subparts_cpus))
+				goto update_parent_subparts;
+
 			cpumask_copy(tmp->new_cpus, parent->effective_cpus);
 			if (!cp->use_parent_ecpus) {
 				cp->use_parent_ecpus = true;
@@ -1421,6 +1454,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 			continue;
 		}
 
+update_parent_subparts:
 		/*
 		 * update_parent_subparts_cpumask() should have been called
 		 * for cs already in update_cpumask(). We should also call
@@ -1497,12 +1531,9 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 			 */
 			cpumask_andnot(cp->effective_cpus, cp->effective_cpus,
 				       cp->subparts_cpus);
-			WARN_ON_ONCE(cpumask_empty(cp->effective_cpus));
 		}
 
-		if (new_prs != old_prs)
-			cp->partition_root_state = new_prs;
-
+		cp->partition_root_state = new_prs;
 		spin_unlock_irq(&callback_lock);
 		notify_partition_change(cp, old_prs, new_prs);
 
@@ -2244,6 +2275,13 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 	    (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
 		goto out_unlock;
 
+	/*
+	 * On default hierarchy, task cannot be moved to a cpuset with empty
+	 * effective cpus.
+	 */
+	if (is_in_v2_mode() && cpumask_empty(cs->effective_cpus))
+		goto out_unlock;
+
 	cgroup_taskset_for_each(task, css, tset) {
 		ret = task_can_attach(task, cs->cpus_allowed);
 		if (ret)
@@ -3120,7 +3158,8 @@ hotplug_update_tasks(struct cpuset *cs,
 		     struct cpumask *new_cpus, nodemask_t *new_mems,
 		     bool cpus_updated, bool mems_updated)
 {
-	if (cpumask_empty(new_cpus))
+	/* A partition root is allowed to have empty effective cpus */
+	if (cpumask_empty(new_cpus) && !is_partition_root(cs))
 		cpumask_copy(new_cpus, parent_cs(cs)->effective_cpus);
 	if (nodes_empty(*new_mems))
 		*new_mems = parent_cs(cs)->effective_mems;
@@ -3189,22 +3228,25 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 
 	/*
 	 * In the unlikely event that a partition root has empty
-	 * effective_cpus, we will have to force any child partitions,
-	 * if present, to become invalid by setting nr_subparts_cpus to 0
-	 * without causing itself to become invalid.
+	 * effective_cpus with tasks, we will have to force any child
+	 * partitions, if present, to become invalid by setting
+	 * nr_subparts_cpus to 0 without causing itself to become invalid.
 	 */
-	if (is_partition_root(cs) && cs->nr_subparts_cpus &&
-	    cpumask_empty(&new_cpus)) {
+	if (is_partition_root(cs) && cpumask_empty(&new_cpus) &&
+	    !valid_empty_partition(cs)) {
 		cs->nr_subparts_cpus = 0;
 		cpumask_clear(cs->subparts_cpus);
 		compute_effective_cpumask(&new_cpus, cs, parent);
 	}
 
 	/*
-	 * If empty effective_cpus or zero nr_subparts_cpus or its parent
-	 * becomes erroneous, we have to transition it to the erroneous state.
+	 * Force the partition to become invalid if either one of
+	 * the following conditions hold:
+	 * 1) empty effective cpus but not valid empty partition.
+	 * 2) parent is invalid or doesn't grant any cpus to child partitions.
 	 */
-	if (is_partition_root(cs) && (cpumask_empty(&new_cpus) ||
+	if (is_partition_root(cs) &&
+	   ((cpumask_empty(&new_cpus) && !valid_empty_partition(cs)) ||
 	    (parent->partition_root_state == PRS_ERROR) ||
 	    !parent->nr_subparts_cpus)) {
 		int old_prs;
-- 
2.18.1
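
For readers who want to see the rule from the commit message in isolation, the following is a minimal userspace sketch in C, not kernel code. It assumes CPU masks fit in a plain 64-bit bitmap and that the task count is a simple integer. The names `struct model_cpuset`, `model_valid_empty_partition()`, `model_can_enable_child()`, `model_can_attach()` and `model_self_check()` are hypothetical and exist only for illustration. The three checks loosely mirror `valid_empty_partition()`, the `partcmd_enable` branch of `update_parent_subparts_cpumask()`, and the new test in `cpuset_can_attach()`; locking, hotplug and error reporting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct cpuset; not the kernel type. */
struct model_cpuset {
	uint64_t effective_cpus;	/* CPUs still usable by this cpuset */
	uint64_t subparts_cpus;		/* CPUs handed out to child partitions */
	unsigned int nr_tasks;		/* tasks attached directly to this cpuset */
	bool is_top;			/* true only for the top cpuset */
};

/* An empty partition is valid only with no task and some CPUs given out. */
static bool model_valid_empty_partition(const struct model_cpuset *cs)
{
	return cs->nr_tasks == 0 && cs->subparts_cpus != 0;
}

/* May a child asking for child_cpus become a partition root under parent? */
static bool model_can_enable_child(const struct model_cpuset *parent,
				   uint64_t child_cpus)
{
	bool no_cpu_in_parent = (child_cpus == parent->effective_cpus);

	/* Every requested CPU must come from the parent's effective CPUs. */
	if (child_cpus & ~parent->effective_cpus)
		return false;
	/* The top cpuset must always keep at least one CPU. */
	if (parent->is_top && no_cpu_in_parent)
		return false;
	/* A non-top parent may be emptied only if it holds no task. */
	if (no_cpu_in_parent && parent->nr_tasks)
		return false;
	return true;
}

/* On the default hierarchy, no task may enter a cpuset without CPUs. */
static bool model_can_attach(const struct model_cpuset *cs)
{
	return cs->effective_cpus != 0;
}

/* Small self-check exercising the three rules above. */
void model_self_check(void)
{
	struct model_cpuset mgmt = { .effective_cpus = 0xF };

	assert(model_can_enable_child(&mgmt, 0xF));	/* task-free, non-top */
	mgmt.nr_tasks = 1;
	assert(!model_can_enable_child(&mgmt, 0xF));	/* would strand a task */

	/* State after all four CPUs have been handed to children. */
	struct model_cpuset empty = { .subparts_cpus = 0xF };

	assert(model_valid_empty_partition(&empty));
	assert(!model_can_attach(&empty));
}
```

The sketch keeps the top-cpuset test and the task test as separate conditions, as the patch does, so that each rejection reason stays visible on its own.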
Thread overview:
2021-08-14 20:57 [PATCH-cgroup v6 0/6] cgroup/cpuset: Add new cpuset partition type & empty effecitve cpus Waiman Long
2021-08-14 20:57 ` [PATCH v6 1/6] cgroup/cpuset: Properly transition to invalid partition Waiman Long
2021-08-14 20:57 ` [PATCH v6 2/6] cgroup/cpuset: Show invalid partition reason string Waiman Long
2021-08-14 20:57 ` [PATCH v6 3/6] cgroup/cpuset: Add a new isolated cpus.partition type Waiman Long
2021-08-14 20:57 ` [PATCH v6 4/6] cgroup/cpuset: Allow non-top parent partition to distribute out all CPUs Waiman Long [this message]
2021-08-14 20:57 ` [PATCH v6 5/6] cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst Waiman Long
2021-08-16 17:08 ` Tejun Heo
2021-08-24  5:35 ` Waiman Long
2021-08-24 19:04 ` Tejun Heo
2021-08-25 19:21 ` Waiman Long
2021-08-25 19:24 ` Tejun Heo
2021-08-14 20:57 ` [PATCH v6 6/6] kselftest/cgroup: Add cpuset v2 partition root state test Waiman Long