From: Tejun Heo <tj@kernel.org> To: Waiman Long <longman@redhat.com> Cc: "Michal Koutný" <mkoutny@suse.com>, "Zefan Li" <lizefan.x@bytedance.com>, "Johannes Weiner" <hannes@cmpxchg.org>, "Jonathan Corbet" <corbet@lwn.net>, "Shuah Khan" <shuah@kernel.org>, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, "Andrew Morton" <akpm@linux-foundation.org>, "Roman Gushchin" <guro@fb.com>, "Phil Auld" <pauld@redhat.com>, "Peter Zijlstra" <peterz@infradead.org>, "Juri Lelli" <juri.lelli@redhat.com>, "Frederic Weisbecker" <frederic@kernel.org>, "Marcelo Tosatti" <mtosatti@redhat.com> Subject: Re: [PATCH v8 5/6] cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst Date: Tue, 30 Nov 2021 07:11:56 -1000 [thread overview] Message-ID: <YaZbXArNIMNvwJD/@slm.duckdns.org> (raw) In-Reply-To: <293d7abf-aff6-fcd8-c999-b1dbda1cffb8@redhat.com> Hello, Waiman. On Tue, Nov 30, 2021 at 10:35:19AM -0500, Waiman Long wrote: > On read, the "cpuset.cpus.partition" file can show the following > values. > > ====================== ============================== > "member" Non-root member of a partition > "root" Partition root > "isolated" Partition root without load balancing > "root invalid (<reason>)" Invalid partition root > ====================== ============================== What happens if an isolated domain becomes invalid and then valid again due to cpu hotplug? Does it go "root invalid" and then back to "isolated"? ... > Before the "member" to partition root transition can happen, > the following conditions must be met or the transition will > not be allowed. > > 1) The "cpuset.cpus" is non-empty and exclusive, i.e. they are > not shared by any of its siblings. > 2) The parent cgroup is a valid partition root. > 3) The "cpuset.cpus" is a subset of parent's "cpuset.cpus". > 4) There is no child cgroups with cpuset enabled. This avoids > cpu migrations of multiple cgroups simultaneously which can > be problematic. So, I still have a hard time justifying the above restrictions. 1) can be broken through hotplug anyway. 2) can be broken by the parent switching to member. 3) would mean that we'd need to restrict parent's config changes depending on what children are doing. 4) is more understandable but it's an implementation detail that we can address in the future. > Once becoming a partition root, the following two rules restrict > what changes can be made to "cpuset.cpus". > > 1) The value must be exclusive. > 2) If child cpusets exist, the value must be a superset of what > are defined in the child cpusets. > > The second rule applies even for "member". Other changes to > "cpuset.cpus" that do not violate the above rules are always > allowed. While it isn't necessarily tied to this series, it's a big no-no to restrict what a parent can do depending on what its descendants are doing. A cgroup higher up in the hierarchy should be able to change configuration however it sees fit as deligation breaks down otherwise. Maybe you can argue that cpuset is special and shouldn't be subject to such convention but I can't see strong enough justifications especially given that most of these restrictions can be broken by hotplug operations anyway and thus need code to handle those situations. > Changing a partition root (valid or invalid) to "member" is > always allowed. If there are child partition roots underneath > it, however, they will be forced to be switched back to "member" > too and lose their partitions. So care must be taken to double > check for this condition before disabling a partition root. Wouldn't it make more sense for them to retain their configuration and turn invalid? Why is this special? > A valid parent partition may distribute out all its CPUs to > its child partitions as long as it is not the root cgroup and > there is no task associated with it. A valid parent partition which isn't root never has tasks in them to begin with. > An invalid partition root can be reverted back to a valid one > if none of the validity constraints of a valid partition root > are violated. > > Poll and inotify events are triggered whenever the state of > "cpuset.cpus.partition" changes. That includes changes caused by > write to "cpuset.cpus.partition", cpu hotplug and other changes > that make the partition invalid. This will allow user space > agents to monitor unexpected changes to "cpuset.cpus.partition" > without the need to do continuous polling. Unfortunately, my sense is still that both the restrictions and behaviors are pretty arbitrary. I can somewhat see how the restrictions may make sense in a specific frame of mind but am having a hard time finding strong enough justifications for them. There are many really specific rules and it isn't clear why they are the way they are. Thanks. -- tejun
WARNING: multiple messages have this Message-ID (diff)
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> To: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Cc: "Michal Koutný" <mkoutny-IBi9RG/b67k@public.gmane.org>, "Zefan Li" <lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>, "Johannes Weiner" <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>, "Jonathan Corbet" <corbet-T1hC0tSOHrs@public.gmane.org>, "Shuah Khan" <shuah-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-doc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kselftest-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, "Andrew Morton" <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, "Roman Gushchin" <guro-b10kYP2dOMg@public.gmane.org>, "Phil Auld" <pauld-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Peter Zijlstra" <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>, "Juri Lelli" <juri.lelli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "Frederic Weisbecker" <frederic-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>, "Marcelo Tosatti" <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Subject: Re: [PATCH v8 5/6] cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst Date: Tue, 30 Nov 2021 07:11:56 -1000 [thread overview] Message-ID: <YaZbXArNIMNvwJD/@slm.duckdns.org> (raw) In-Reply-To: <293d7abf-aff6-fcd8-c999-b1dbda1cffb8-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Hello, Waiman. On Tue, Nov 30, 2021 at 10:35:19AM -0500, Waiman Long wrote: > On read, the "cpuset.cpus.partition" file can show the following > values. > > ====================== ============================== > "member" Non-root member of a partition > "root" Partition root > "isolated" Partition root without load balancing > "root invalid (<reason>)" Invalid partition root > ====================== ============================== What happens if an isolated domain becomes invalid and then valid again due to cpu hotplug? Does it go "root invalid" and then back to "isolated"? ... > Before the "member" to partition root transition can happen, > the following conditions must be met or the transition will > not be allowed. > > 1) The "cpuset.cpus" is non-empty and exclusive, i.e. they are > not shared by any of its siblings. > 2) The parent cgroup is a valid partition root. > 3) The "cpuset.cpus" is a subset of parent's "cpuset.cpus". > 4) There is no child cgroups with cpuset enabled. This avoids > cpu migrations of multiple cgroups simultaneously which can > be problematic. So, I still have a hard time justifying the above restrictions. 1) can be broken through hotplug anyway. 2) can be broken by the parent switching to member. 3) would mean that we'd need to restrict parent's config changes depending on what children are doing. 4) is more understandable but it's an implementation detail that we can address in the future. > Once becoming a partition root, the following two rules restrict > what changes can be made to "cpuset.cpus". > > 1) The value must be exclusive. > 2) If child cpusets exist, the value must be a superset of what > are defined in the child cpusets. > > The second rule applies even for "member". Other changes to > "cpuset.cpus" that do not violate the above rules are always > allowed. While it isn't necessarily tied to this series, it's a big no-no to restrict what a parent can do depending on what its descendants are doing. A cgroup higher up in the hierarchy should be able to change configuration however it sees fit as deligation breaks down otherwise. Maybe you can argue that cpuset is special and shouldn't be subject to such convention but I can't see strong enough justifications especially given that most of these restrictions can be broken by hotplug operations anyway and thus need code to handle those situations. > Changing a partition root (valid or invalid) to "member" is > always allowed. If there are child partition roots underneath > it, however, they will be forced to be switched back to "member" > too and lose their partitions. So care must be taken to double > check for this condition before disabling a partition root. Wouldn't it make more sense for them to retain their configuration and turn invalid? Why is this special? > A valid parent partition may distribute out all its CPUs to > its child partitions as long as it is not the root cgroup and > there is no task associated with it. A valid parent partition which isn't root never has tasks in them to begin with. > An invalid partition root can be reverted back to a valid one > if none of the validity constraints of a valid partition root > are violated. > > Poll and inotify events are triggered whenever the state of > "cpuset.cpus.partition" changes. That includes changes caused by > write to "cpuset.cpus.partition", cpu hotplug and other changes > that make the partition invalid. This will allow user space > agents to monitor unexpected changes to "cpuset.cpus.partition" > without the need to do continuous polling. Unfortunately, my sense is still that both the restrictions and behaviors are pretty arbitrary. I can somewhat see how the restrictions may make sense in a specific frame of mind but am having a hard time finding strong enough justifications for them. There are many really specific rules and it isn't clear why they are the way they are. Thanks. -- tejun
next prev parent reply other threads:[~2021-11-30 17:12 UTC|newest] Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-10-18 14:36 [PATCH v8 0/6] cgroup/cpuset: Add new cpuset partition type & empty effecitve cpus Waiman Long 2021-10-18 14:36 ` Waiman Long 2021-10-18 14:36 ` [PATCH v8 1/6] cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective Waiman Long 2021-10-18 14:36 ` Waiman Long 2021-10-18 14:36 ` [PATCH v8 2/6] cgroup/cpuset: Refining features and constraints of a partition Waiman Long 2021-10-18 14:36 ` Waiman Long 2021-10-18 14:36 ` [PATCH v8 3/6] cgroup/cpuset: Add a new isolated cpus.partition type Waiman Long 2021-10-18 14:36 ` [PATCH v8 4/6] cgroup/cpuset: Show invalid partition reason string Waiman Long 2021-10-18 14:36 ` [PATCH v8 5/6] cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst Waiman Long 2021-11-15 19:31 ` Michal Koutný 2021-11-15 19:31 ` Michal Koutný 2021-11-15 20:11 ` Tejun Heo 2021-11-15 20:11 ` Tejun Heo 2021-11-15 21:27 ` Waiman Long 2021-11-15 21:27 ` Waiman Long 2021-11-15 21:10 ` Waiman Long 2021-11-16 17:54 ` Michal Koutný 2021-11-30 15:35 ` Waiman Long 2021-11-30 17:11 ` Tejun Heo [this message] 2021-11-30 17:11 ` Tejun Heo 2021-12-01 3:56 ` Waiman Long 2021-12-01 14:13 ` Michal Koutný 2021-12-01 14:13 ` Michal Koutný 2021-12-01 14:56 ` Waiman Long 2021-12-01 14:56 ` Waiman Long 2021-12-01 16:39 ` Tejun Heo 2021-12-01 17:49 ` Waiman Long 2021-12-01 17:49 ` Waiman Long 2021-12-01 14:26 ` Waiman Long 2021-12-01 14:26 ` Waiman Long 2021-12-01 16:46 ` Tejun Heo 2021-12-01 16:46 ` Tejun Heo 2021-12-01 18:05 ` Waiman Long 2021-12-02 1:28 ` Waiman Long 2021-12-03 18:25 ` Michal Koutný 2021-12-03 18:25 ` Michal Koutný 2021-12-03 19:27 ` Waiman Long 2021-10-18 14:36 ` [PATCH v8 6/6] kselftest/cgroup: Add cpuset v2 partition root state test Waiman Long 2021-10-18 14:36 ` Waiman Long 2021-10-27 23:05 ` [PATCH v8 0/6] cgroup/cpuset: Add new cpuset partition type & empty effecitve cpus Waiman Long 2021-11-10 11:13 ` Felix Moessbauer 2021-11-10 13:21 ` Marcelo Tosatti 2021-11-10 13:56 ` Michal Koutný 2021-11-10 15:21 ` Moessbauer, Felix 2021-11-10 15:21 ` Moessbauer, Felix 2021-11-10 16:10 ` Marcelo Tosatti 2021-11-10 16:10 ` Marcelo Tosatti 2021-11-10 16:14 ` Marcelo Tosatti 2021-11-10 16:14 ` Marcelo Tosatti 2021-11-10 16:15 ` Jan Kiszka 2021-11-10 16:15 ` Jan Kiszka 2021-11-10 17:29 ` Marcelo Tosatti 2021-11-10 17:29 ` Marcelo Tosatti 2021-11-10 18:30 ` Waiman Long 2021-11-10 18:30 ` Waiman Long 2021-11-10 17:52 ` Michal Koutný 2021-11-10 17:52 ` Michal Koutný 2021-11-10 18:04 ` Jan Kiszka 2021-11-10 18:04 ` Jan Kiszka 2021-11-10 18:15 ` Michal Koutný 2021-11-10 18:15 ` Michal Koutný 2021-11-10 15:20 ` Waiman Long
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=YaZbXArNIMNvwJD/@slm.duckdns.org \ --to=tj@kernel.org \ --cc=akpm@linux-foundation.org \ --cc=cgroups@vger.kernel.org \ --cc=corbet@lwn.net \ --cc=frederic@kernel.org \ --cc=guro@fb.com \ --cc=hannes@cmpxchg.org \ --cc=juri.lelli@redhat.com \ --cc=linux-doc@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-kselftest@vger.kernel.org \ --cc=lizefan.x@bytedance.com \ --cc=longman@redhat.com \ --cc=mkoutny@suse.com \ --cc=mtosatti@redhat.com \ --cc=pauld@redhat.com \ --cc=peterz@infradead.org \ --cc=shuah@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.