linux-kernel.vger.kernel.org archive mirror
From: Paul Jackson <pj@sgi.com>
To: KUROSAWA Takahiro <kurosawa@valinux.co.jp>
Cc: taka@valinux.co.jp, magnus.damm@gmail.com, dino@in.ibm.com,
	linux-kernel@vger.kernel.org, ckrm-tech@lists.sourceforge.net
Subject: Re: [ckrm-tech] Re: [PATCH 1/3] CPUMETER: add cpumeter framework to the CPUSETS
Date: Tue, 27 Sep 2005 08:49:05 -0700
Message-ID: <20050927084905.7d77bdde.pj@sgi.com>
In-Reply-To: <20050927113902.C78A570046@sv1.valinux.co.jp>

Takahiro-san asks a perceptive question:
> If it is prohibited to set meter_cpu=0 for the immediate children 
> of C, cpuset_create() needs a check whether the siblings are
> metered or not. 

Very good question.  It exposes an impossibility in my proposal, as
stated.  The rule I had in mind was that either all the children of C
had meter_cpu set, or none of them.  But since one can only mark one
cpuset at a time, this is impossible to set up if there is more than
one child already.

Allow me to try to fix my proposal.

Instead of doing the impossible and trying to mark all the children
of C as meter_cpu all at the same instant in time, I should just mark the
parent cpuset C, one time.  Then if C is so marked, its -children- now
have the meter_cpu_* files, which can default at that instant in time to
providing (1/N) of the cpu to each of the N children of C.  I would
say that marking C as meter_cpu is subject to these rules:
 * C is already marked cpu_exclusive.
 * All of its children have the same 'cpus' setting as C.
 * Any new child of C created after C is marked meter_cpu will
   automatically start with the same 'cpus' setting as C, and
   with the meter_cpu_* files.
 * It is prohibited to change the 'cpus' of any cpuset whose
   parent is marked meter_cpu.
 * Changing the 'cpus' of a cpuset such as C that is itself
   marked meter_cpu will instantly change the 'cpus' of each
   of its children.
 * It is prohibited to turn on the cpu_exclusive flag of a cpuset
   whose parent is marked meter_cpu, or to turn off the cpu_exclusive
   flag on a cpuset that is itself marked meter_cpu.
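
As a rough illustration of the rules above, here is a minimal sketch in
C.  It is not taken from any posted patch: struct cs_model and the
functions can_mark_meter_cpu(), mark_meter_cpu() and set_cpus() are
hypothetical names, the share is stored in hundredths of a percent
purely for the example, and the cpu_exclusive rules are left out for
brevity.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical, simplified model of a cpuset (not the kernel's struct cpuset). */
struct cs_model {
	unsigned long cpus;            /* bitmask of allowed CPUs */
	bool cpu_exclusive;
	bool meter_cpu;
	unsigned int share_pct_x100;   /* metered share, hundredths of a percent */
	struct cs_model *parent;
	struct cs_model *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Preconditions: C is cpu_exclusive and every child has the same 'cpus' as C. */
bool can_mark_meter_cpu(const struct cs_model *c)
{
	if (!c->cpu_exclusive)
		return false;
	for (size_t i = 0; i < c->nr_children; i++)
		if (c->children[i]->cpus != c->cpus)
			return false;
	return true;
}

/* Mark C once; each of its N children then defaults to a 1/N share. */
int mark_meter_cpu(struct cs_model *c)
{
	if (!can_mark_meter_cpu(c))
		return -1;
	c->meter_cpu = true;
	for (size_t i = 0; i < c->nr_children; i++)
		c->children[i]->share_pct_x100 =
			(unsigned int)(10000 / c->nr_children);
	return 0;
}

/*
 * Changing 'cpus' is refused for a child of a metered cpuset, and is
 * propagated to every child when the cpuset itself is metered.
 */
int set_cpus(struct cs_model *cs, unsigned long cpus)
{
	if (cs->parent && cs->parent->meter_cpu)
		return -1;
	cs->cpus = cpus;
	if (cs->meter_cpu)
		for (size_t i = 0; i < cs->nr_children; i++)
			cs->children[i]->cpus = cpus;
	return 0;
}
```

The point of the sketch is that only the parent is ever marked, so the
"all children or none" state changes in a single step.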

Similar rules would apply for mem_exclusive and meter_mem.

The metered children of C may have their own children in turn, whose
cpus may be any subset of the cpus in C, but which cannot be marked
cpu_exclusive or meter_cpu.
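
The constraint on cpusets nested below a metered child could be checked
along these lines.  This is only a sketch under assumptions: struct
cs_node and the three functions are hypothetical names, and the check
covers just the nesting rule stated above, nothing else.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cpuset node for this example only. */
struct cs_node {
	unsigned long cpus;            /* bitmask of allowed CPUs */
	bool cpu_exclusive;
	bool meter_cpu;
	const struct cs_node *parent;
};

/* A metered child is a cpuset whose parent is marked meter_cpu. */
bool is_metered_child(const struct cs_node *cs)
{
	return cs->parent != NULL && cs->parent->meter_cpu;
}

/* True if some ancestor of cs is a metered child. */
bool below_metered_child(const struct cs_node *cs)
{
	for (const struct cs_node *p = cs->parent; p; p = p->parent)
		if (is_metered_child(p))
			return true;
	return false;
}

/*
 * Check a proposed configuration for cs against the nesting rule only:
 * below a metered child, neither flag may be set and 'cpus' must stay
 * within the parent's cpus.  Cpusets elsewhere are not constrained here.
 */
bool nested_config_ok(const struct cs_node *cs, unsigned long cpus,
		      bool cpu_exclusive, bool meter_cpu)
{
	if (!below_metered_child(cs))
		return true;
	if (cpu_exclusive || meter_cpu)
		return false;
	return (cpus & ~cs->parent->cpus) == 0;
}
```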

Borrowing your fine art work, and modifying it slightly, this looks
like:

      +-----------------------------------+
      |                                   |
   CPUSET 0                            CPUSET 1 (aka 'C')
   sched domain A                      sched domain B
   cpus: 0, 1                          cpus: 2, 3
   cpu_exclusive=1                     cpu_exclusive=1
   meter_cpu=0                         meter_cpu=1
                                          |
                         +----------------+----------------+
                         |                |                |
                      CPUSET 1a        CPUSET 1b        CPUSET 1c
                      cpus: 2, 3       cpus: 2, 3       cpus: 2, 3
                      cpu_exclusive=0  cpu_exclusive=0  cpu_exclusive=0
                      meter_cpu=0      meter_cpu=0      meter_cpu=0
                      meter_cpu_*      meter_cpu_*      meter_cpu_*
                         |
            +------------+------------+
            |                         |
         CPUSET 2a                CPUSET 2b
         cpus: 2                  cpus: 3
         meter_cpu=0              meter_cpu=0
         cpu_exclusive=0          cpu_exclusive=0

Note here that marking C (CPUSET 1) as meter_cpu exposes the meter_cpu_*
files in the children of C.

> Is it prohibited for any descendant of C's children to set meter_cpu=1 ?

Yes, I presume so, and made up my new rules above assuming that.  It is
definitely worth an effort in my opinion to allow creating nested
ordinary (not metered) cpusets below the children of C, but I am
guessing it would be too hard to try to allow nesting of metered
cpusets below metered cpusets.  If you have a mind to try that however,
I am more than willing to listen to your proposal.

The above proposal makes it more obvious than ever that I am starting
to overload the meaning of cpu_exclusive and mem_exclusive perhaps a
bit too much.

One or the other of the two *_exclusive flags should be a required
precondition for some of these special properties (sched domains,
GFP_KERNEL memory allocation confinement, oom killer confinement, cpu
metering and memory metering), but perhaps actually enabling any of
these special properties should be an additional and distinct choice.

Therefore I propose some new cpuset flags:
 * 'sched_domain' to mark sched domains (now done by the cpu_exclusive
   flag),
 * 'kernel_memory' to mark the constraints on GFP_KERNEL allocations,
 * 'oom_killer' to mark the constraints on oom killing,
 * your 'meter_cpu' flag to mark a set of metered cpus, and
 * your 'meter_mem' flag to mark a set of metered mems.

Each of these new flags would require the appropriate cpu_exclusive or
mem_exclusive flag on the same cpuset to already be set, but just
setting the *_exclusive flags by themselves would not be enough to get
you the special behaviour.  You would also have to set the appropriate
one of these new flags.
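
To make the dependency concrete, here is a small sketch in C of how
such flags might be gated.  Everything in it is hypothetical (the
CS_* names, required_flag(), set_flag(), has_behaviour()), and the
pairing of 'kernel_memory' and 'oom_killer' with mem_exclusive is an
assumption about which exclusive flag is the "appropriate" one.

```c
#include <stdbool.h>

/* Hypothetical flag bits, one per cpuset flag discussed above. */
enum cs_flag {
	CS_CPU_EXCLUSIVE = 1 << 0,
	CS_MEM_EXCLUSIVE = 1 << 1,
	CS_SCHED_DOMAIN  = 1 << 2,
	CS_KERNEL_MEMORY = 1 << 3,
	CS_OOM_KILLER    = 1 << 4,
	CS_METER_CPU     = 1 << 5,
	CS_METER_MEM     = 1 << 6,
};

/* Exclusive flag that must already be set before f may be set (0 if none). */
unsigned int required_flag(enum cs_flag f)
{
	switch (f) {
	case CS_SCHED_DOMAIN:
	case CS_METER_CPU:
		return CS_CPU_EXCLUSIVE;
	case CS_KERNEL_MEMORY:   /* assumed to pair with mem_exclusive */
	case CS_OOM_KILLER:      /* assumed to pair with mem_exclusive */
	case CS_METER_MEM:
		return CS_MEM_EXCLUSIVE;
	default:
		return 0;
	}
}

/* Set f in *flags; fails with -1 if its exclusive prerequisite is missing. */
int set_flag(unsigned int *flags, enum cs_flag f)
{
	unsigned int need = required_flag(f);

	if (need && !(*flags & need))
		return -1;
	*flags |= (unsigned int)f;
	return 0;
}

/* A special behaviour is on only if its own flag is set. */
bool has_behaviour(unsigned int flags, enum cs_flag f)
{
	return (flags & (unsigned int)f) != 0;
}
```

In this model, setting cpu_exclusive alone leaves has_behaviour() false
for CS_SCHED_DOMAIN until that flag is set as well.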

So, for example, the condition to define a sched domain would change,
from just being the lowest level cpuset marked cpu_exclusive (or the
left over CPUs not marked exclusive), to being both that -and- having
its "sched_domain" flag marked True (or being the left over CPUs,
again).
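
The change in the sched domain condition can be sketched as two
predicates.  This is an illustration only: struct cs_tree and the
function names are hypothetical, "lowest level" is approximated by
looking at direct children, and the left over CPUs case is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_KIDS 4

/* Hypothetical, simplified cpuset node for this example only. */
struct cs_tree {
	bool cpu_exclusive;
	bool sched_domain;             /* the proposed new flag */
	const struct cs_tree *kids[MAX_KIDS];
	size_t nr_kids;
};

/*
 * Lowest-level cpu_exclusive cpuset: exclusive itself, with no
 * exclusive child (direct children only, a simplification).
 */
bool lowest_exclusive(const struct cs_tree *cs)
{
	if (!cs->cpu_exclusive)
		return false;
	for (size_t i = 0; i < cs->nr_kids; i++)
		if (cs->kids[i]->cpu_exclusive)
			return false;
	return true;
}

/* Current condition: cpu_exclusive placement alone decides. */
bool defines_sched_domain_old(const struct cs_tree *cs)
{
	return lowest_exclusive(cs);
}

/* Proposed condition: the same placement, and the explicit flag as well. */
bool defines_sched_domain_new(const struct cs_tree *cs)
{
	return lowest_exclusive(cs) && cs->sched_domain;
}
```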

At first writing, I like the sound of this.  But then I often
think my suggestions are good, when I first write them <grin>.

Without these new flags, the interface has an odd asymmetry to it.
Just setting cpu_exclusive could get you a sched domain for instance,
but you had to have both cpu_exclusive and meter_cpu to get the
cpu metering code.  The only reason for this was that Dinakar got his
sched domain patch in before you got your cpu meter patch in, which is
a poor reason if I do say so.

These extra flags have an additional benefit.  They make explicit
to the user level what additional semantics are switched on, rather
than hiding them as implicit side effects of the cpu_exclusive
configuration.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401
