linux-kernel.vger.kernel.org archive mirror
From: Waiman Long <longman@redhat.com>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>, Phil Auld <pauld@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [PATCH v8 5/6] cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
Date: Tue, 30 Nov 2021 10:35:19 -0500
Message-ID: <293d7abf-aff6-fcd8-c999-b1dbda1cffb8@redhat.com>
In-Reply-To: <20211116175411.GA50019@blackbody.suse.cz>

On 11/16/21 12:54, Michal Koutný wrote:
> On Mon, Nov 15, 2021 at 04:10:29PM -0500, Waiman Long <longman@redhat.com> wrote:
>>> On Mon, Oct 18, 2021 at 10:36:18AM -0400, Waiman Long <longman@redhat.com> wrote:
>>>> +	scheduler.  Tasks in such a partition must be explicitly bound
>>>> +	to each individual CPU.
>>>> [...]
>>>>
>>>> It can be a problem when one is trying to move from one cgroup to another
>>>> cgroup with non-overlapping cpus laterally. However, if a task is initially
>>>> from a parent cgroup with affinity mask that include cpus in the isolated
>>>> child cgroup, I believe it should be able to move to the isolated child
>>>> cgroup without problem. Otherwise, it is a bug that needs to be fixed.
> app_root	cpuset.cpus=0-3
> `- non_rt	cpuset.cpus=0-1	cpuset.cpus.partition=member
> `- rt		cpuset.cpus=2-3	cpuset.cpus.partition=isolated
>
> The app_root would have cpuset.cpus.effective=0-1 so even the task in
> app_root can't sched_setaffinity() to cpus 2-3.
> But AFAICS, the migration calls set_cpus_allowed_ptr() anyway, so the
> task in the isolated partition needn't to bind explicitly with
> sched_setaffinity(). (It'd have two cpus available, so one more
> sched_setaffinity() or migration into a single-cpu list is desirable.)
>
> All in all, I think the behavior is OK and the explicit binding of tasks
> in an isolated cpuset is optional (not a must as worded currently).
>
>
>> I think the wording may be confusing. What I meant is none of the requested
>> cpu can be granted. So if there is at least one granted, the effective cpus
>> won't be empty.
> Ack.
>
>> You currently cannot make change to cpuset.cpus that violates the cpu
>> exclusivity rule. The above constraints will not disallow you to make the
>> change. They just affect the validity of the partition root.
> Sibling exclusivity should be a validity condition regardless of whether
> transition is allowed or not. (At least it looks simpler to me.)
>
>
>>>> +        Changing a partition root to "member" is always allowed.
>>>> +        If there are child partition roots underneath it, however,
>>>> +        they will be forced to be switched back to "member" too and
>>>> +        lose their partitions. So care must be taken to double check
>>>> +        for this condition before disabling a partition root.
>>> (Or is this how delegation is intended?) However, AFAICS, parent still
>>> can't remove cpuset.cpus even when the child is a "member". Otherwise,
>>> I agree with the back-switch.
>> There are only 2 possibilities here. Either we force the child partitions to
>> be become members or invalid partition root.
> My point here was mostly about preempting the cpus (as a v2 specific
> feature). (I'm rather indifferent whether children turn into invalid
> roots or members.)

Below is my latest iteration of the cpuset.cpus.partition
documentation. If there are no objections or other suggestions for
improvement, I am going to send out another iteration of the patch
series with the updated documentation.

Cheers,
Longman

--------------------------------------------------------------

   cpuset.cpus.partition
     A read-write single value file which exists on non-root
     cpuset-enabled cgroups.  This flag is owned by the parent cgroup
     and is not delegatable.

     It accepts only the following input values when written to.

       ==========    =====================================
       "member"      Non-root member of a partition
       "root"        Partition root
       "isolated"    Partition root without load balancing
       ==========    =====================================

     The root cgroup is always a partition root and its state
     cannot be changed.  All other non-root cgroups start out as
     "member".

     When set to "root", the current cgroup is the root of a new
     partition or scheduling domain that comprises itself and
     all its descendants except those that are separate partition
     roots themselves and their descendants.

     The value shown in "cpuset.cpus.effective" of a partition root is
     the set of CPUs that the parent partition root can dedicate to the
     new partition root.  These CPUs are subtracted from the parent's
     "cpuset.cpus.effective" and may differ from "cpuset.cpus".
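As a rough illustration of the subtraction described above (a model only,
not kernel code; `parse_cpu_list` and `parent_effective` are hypothetical
helper names), the app_root/rt layout quoted earlier in this thread can be
worked through with plain sets:

```python
def parse_cpu_list(text):
    """Parse a cpuset-style list such as "0-3,6" into a set of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parent_effective(parent_cpus, child_partition_cpus):
    """CPUs left to the parent after each child partition root takes its own."""
    remaining = set(parent_cpus)
    for cpus in child_partition_cpus:
        remaining -= cpus
    return remaining


app_root = parse_cpu_list("0-3")
rt = parse_cpu_list("2-3")  # child with cpuset.cpus.partition=isolated
print(sorted(parent_effective(app_root, [rt])))  # prints [0, 1]
```

This matches the quoted example, where app_root ends up with
cpuset.cpus.effective=0-1 once rt becomes an isolated partition.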

     When set to "isolated", the CPUs in that partition root will
     be in an isolated state without any load balancing from the
     scheduler.  Tasks placed in such a partition with multiple
     CPUs should be carefully distributed and bound to each of the
     individual CPUs for optimal performance.

     A partition root ("root" or "isolated") can be in one of two
     possible states: valid or invalid.  An invalid partition root
     is in a degraded state where some state information is
     retained, but it behaves more like a "member".

     On read, the "cpuset.cpus.partition" file can show the following
     values.

       =========================    =====================================
       "member"                     Non-root member of a partition
       "root"                       Partition root
       "isolated"                   Partition root without load balancing
       "root invalid (<reason>)"    Invalid partition root
       =========================    =====================================

     In the case of an invalid partition root, a descriptive string
     explaining why the partition is invalid is included within
     parentheses.
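A user-space agent reading this file has to separate the state from the
optional reason. The following is a minimal sketch that assumes the read
values are exactly those in the table above; `parse_partition_state` is a
hypothetical name, and the reason text is treated as opaque:

```python
import re

# Matches only the invalid form listed in the table above.
_INVALID_RE = re.compile(r"^root invalid \((.*)\)$")


def parse_partition_state(text):
    """Return (state, invalid_reason); invalid_reason is None when not invalid."""
    value = text.strip()
    if value in ("member", "root", "isolated"):
        return value, None
    match = _INVALID_RE.match(value)
    if match:
        return "root", match.group(1)
    raise ValueError(f"unrecognized partition state: {value!r}")


print(parse_partition_state("isolated\n"))
print(parse_partition_state("root invalid (<reason>)\n"))
```

Anything outside the table raises ValueError rather than being guessed at,
so a format change in a later revision shows up immediately.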

     Almost all possible state transitions among "member", valid
     and invalid partition roots are allowed except from "member"
     to invalid partition root.

     Before the "member" to partition root transition can happen,
     the following conditions must be met or the transition will
     not be allowed.

     1) "cpuset.cpus" is non-empty and exclusive, i.e. its CPUs are
        not shared by any of its siblings.
     2) The parent cgroup is a valid partition root.
     3) "cpuset.cpus" is a subset of the parent's "cpuset.cpus".
     4) There are no child cgroups with cpuset enabled.  This avoids
        cpu migrations of multiple cgroups simultaneously which can
        be problematic.
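The four conditions above can be modelled as a simple check. This is only a
sketch of the documented rules, not the kernel's implementation; the function
name, its parameters and the message strings are all made up for
illustration:

```python
def member_to_root_problems(cpus, sibling_cpus, parent_cpus,
                            parent_is_valid_root, has_cpuset_children):
    """Return the list of violated conditions; an empty list means allowed."""
    problems = []
    # 1) non-empty and exclusive with respect to every sibling
    if not cpus:
        problems.append("cpuset.cpus is empty")
    if any(cpus & other for other in sibling_cpus):
        problems.append("cpuset.cpus overlaps a sibling")
    # 2) parent must be a valid partition root
    if not parent_is_valid_root:
        problems.append("parent is not a valid partition root")
    # 3) subset of the parent's cpuset.cpus
    if not cpus <= parent_cpus:
        problems.append("cpuset.cpus is not a subset of the parent's")
    # 4) no child cgroups with cpuset enabled
    if has_cpuset_children:
        problems.append("child cgroups with cpuset enabled exist")
    return problems


# rt (2-3) next to non_rt (0-1) under app_root (0-3), as in the quoted example
print(member_to_root_problems({2, 3}, [{0, 1}], {0, 1, 2, 3}, True, False))
```

Returning every violated condition, instead of stopping at the first, makes
it easier to see why a write to "cpuset.cpus.partition" would be rejected.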

     Once a cgroup becomes a partition root, the following two rules
     restrict what changes can be made to "cpuset.cpus".

     1) The value must be exclusive.
     2) If child cpusets exist, the value must be a superset of what
        is defined in the child cpusets.

     The second rule applies even for "member". Other changes to
     "cpuset.cpus" that do not violate the above rules are always
     allowed.
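A small model of these two rules, under the assumption that "superset of
what is defined in the child cpusets" means a superset of the union of the
children's "cpuset.cpus" (the function and parameter names are
hypothetical):

```python
def cpus_change_allowed(new_cpus, is_partition_root, sibling_cpus, child_cpus):
    """Model of the two rules restricting writes to cpuset.cpus."""
    # Rule 1 applies to partition roots only: no overlap with any sibling.
    if is_partition_root and any(new_cpus & other for other in sibling_cpus):
        return False
    # Rule 2 applies to "member" as well: must cover every child cpuset.
    needed = set().union(*child_cpus)
    return needed <= new_cpus


print(cpus_change_allowed({2, 3}, True, [{0, 1}], [{2}]))   # True
print(cpus_change_allowed({1, 2}, True, [{0, 1}], []))      # False
print(cpus_change_allowed({1, 2}, False, [{0, 1}], [{3}]))  # False
```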

     External events like hotplug or inappropriate changes to
     "cpuset.cpus" can cause a valid partition root to become invalid.
     Besides the constraints on changing "cpuset.cpus" listed above,
     the other conditions required to maintain the validity of a
     partition root are as follows:

     1) The parent cgroup is a valid partition root.
     2) If "cpuset.cpus.effective" is empty, the partition must have
        no task associated with it. Otherwise, the partition becomes
        invalid and "cpuset.cpus.effective" will fall back to that
        of the nearest non-empty ancestor.

     A corollary of a valid partition root is that
     "cpuset.cpus.effective" is always a subset of "cpuset.cpus".
     Note that a task cannot be moved to a cgroup with empty
     "cpuset.cpus.effective".

     Changing a partition root (valid or invalid) to "member" is
     always allowed.  If there are child partition roots underneath
     it, however, they will be forced to be switched back to "member"
     too and lose their partitions. So care must be taken to double
     check for this condition before disabling a partition root.
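One way to do that double check from a management tool is to list the
partition roots below a cgroup before writing "member" to it. The sketch
below works on an in-memory description of the hierarchy; the dict layout
and the function name are assumptions made for this example, not an
existing interface:

```python
def descendant_partition_roots(tree, name):
    """List descendants of `name` whose partition setting is not "member"."""
    found = []
    stack = list(tree[name]["children"])
    while stack:
        child = stack.pop()
        if tree[child]["partition"] != "member":
            found.append(child)
        stack.extend(tree[child]["children"])
    return sorted(found)


# Hypothetical hierarchy: app_root is a partition root with two children.
tree = {
    "app_root": {"partition": "root", "children": ["non_rt", "rt"]},
    "non_rt": {"partition": "member", "children": []},
    "rt": {"partition": "isolated", "children": []},
}
print(descendant_partition_roots(tree, "app_root"))  # prints ['rt']
```

A non-empty result means those cgroups would be switched back to "member"
along with the cgroup being disabled.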

     A valid parent partition may distribute all of its CPUs to
     its child partitions as long as it is not the root cgroup and
     there is no task associated with it.

     An invalid partition root can be reverted back to a valid one
     if none of the validity constraints of a valid partition root
     are violated.

     Poll and inotify events are triggered whenever the state of
     "cpuset.cpus.partition" changes.  That includes changes caused by
     writes to "cpuset.cpus.partition", cpu hotplug and other changes
     that make the partition invalid.  This allows user space agents
     to monitor unexpected changes to "cpuset.cpus.partition" without
     the need to do continuous polling.
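A monitoring agent could wait on the file roughly as follows. This is a
sketch that assumes the file signals changes the same way other notified
cgroup v2 files do, i.e. poll() reports POLLPRI after a change; the function
names and the path in the comment are hypothetical:

```python
import select


def read_partition_state(path):
    """Return the current contents of a cpuset.cpus.partition file."""
    with open(path) as f:
        return f.read().strip()


def wait_for_partition_event(path, timeout_ms=None):
    """Wait for an event on the file; return the new state or None on timeout."""
    with open(path) as f:
        f.read()  # consume the current contents before waiting
        poller = select.poll()
        poller.register(f.fileno(), select.POLLPRI)
        if not poller.poll(timeout_ms):
            return None
        f.seek(0)  # re-read from the start to pick up the new state
        return f.read().strip()


# Example use (path is an assumption about where cgroup v2 is mounted):
#   state = wait_for_partition_event(
#       "/sys/fs/cgroup/rt/cpuset.cpus.partition", timeout_ms=5000)
```

Reading the file before calling poll() matters: the wait is for changes
relative to the last read, so skipping it may return immediately.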



