linux-kernel.vger.kernel.org archive mirror
From: Paul Jackson <pj@sgi.com>
To: dino@in.ibm.com
Cc: Simon.Derr@bull.net, nickpiggin@yahoo.com.au,
	linux-kernel@vger.kernel.org, lse-tech@lists.sourceforge.net,
	akpm@osdl.org, dipankar@in.ibm.com, colpatch@us.ibm.com
Subject: Re: [Lse-tech] Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets
Date: Fri, 22 Apr 2005 14:26:18 -0700
Message-ID: <20050422142618.08d74ede.pj@sgi.com>
In-Reply-To: <20050421162738.GA4200@in.ibm.com>

Dinakar wrote:
> Ok, Let me begin at the beginning and attempt to define what I am 
> doing here

The statement of requirements and approach helps.  Thank you.

And the comments in the code patch are much easier for me
to understand.  Thanks.

Let me step back and consider where we are here.

I've not been entirely happy with the cpu_exclusive (and mem_exclusive)
properties.  They were easy to code, and they require only looking at
one's siblings and parent, but they don't provide all that people usually
want, which is system-wide exclusivity, because they don't exclude tasks
in one's parent (or more remote ancestor) cpusets from stealing resources.
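As a rough illustration of why the exclusive check is cheap, here is a
minimal sketch in C.  The type cpu_mask and the function exclusive_ok()
are hypothetical names for this example, not the kernel's code, and the
sketch assumes at most 64 CPUs, one bit per CPU.  It looks only at the
parent and the siblings, and nothing in it examines the tasks attached
to the parent, which is the gap described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cpumask_t: one bit per CPU. */
typedef uint64_t cpu_mask;

/*
 * Sketch of an exclusivity check for a cpuset that wants the CPUs in
 * `candidate`.  It consults only the parent and the siblings:
 *   - the candidate must be a subset of the parent's CPUs;
 *   - the candidate must not overlap any sibling's CPUs.
 * Tasks attached to the parent are never consulted.
 */
static bool exclusive_ok(cpu_mask parent, cpu_mask candidate,
                         const cpu_mask *siblings, size_t nsiblings)
{
    assert(siblings != NULL || nsiblings == 0);

    if ((candidate & ~parent) != 0)
        return false;               /* asks for CPUs the parent lacks */

    for (size_t i = 0; i < nsiblings; i++) {
        if ((candidate & siblings[i]) != 0)
            return false;           /* overlaps a sibling */
    }
    return true;
}
```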

I take your isolated cpusets as a reasonable attempt to provide what's
really wanted.  I had avoided simple, system-wide exclusivity because
I really wanted cpusets to be hierarchical.  One should be able to
subdivide and manage one subtree of the cpuset hierarchy, oblivious
to what someone else is doing with a disjoint subtree.  Your work shows
how to provide a stronger form of isolation (exclusivity) without
abandoning the hierarchical structure.

There are three directions we could go from here.  I am not yet decided
between them:

 1) Remove cpu and mem exclusive flags - they are of limited use.

 2) Leave code as is.

 3) Extend the exclusive capability to include isolation from parents,
    along the lines of your patch.

If I were redoing cpusets from scratch, I might not include the exclusive
feature at all - not sure.  But it's cheap, at least in terms of code,
and of some use to some users.  So I would choose (2) over (1), given
where we are now.  The main cost at present of the exclusive flags is
the cost in understanding - they tend to confuse people at first glance,
due to their somewhat unusual approach.

If we go with (3), then I'd like to consider the overall design of this
a bit more.  Your patch, as is common for patches, attempts to work within
the current framework, minimizing change.  Better to take a step back and
consider what would have been the best design as if the past didn't matter,
then with that clearly in mind, ask how best to get there from here.

I don't think we would have both isolated and exclusive flags, in the
'ideal design.'  The exclusive flags are essentially half (or a third)
of what's needed, and the isolated flags and masks the rest of it.

Essentially, your patch replaces the single set of CPUs in a cpuset
with three, related sets:
 A] the set of all CPUs managed by that cpuset
 B] the set of CPUs allowed to tasks attached to that cpuset
 C] the set of CPUs isolated for the dedicated use of some descendant

Sets [B] and [C] form a partition of [A] -- their intersection is empty,
and their union is [A].
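
To make the partition concrete, here is a minimal sketch in C.  The
names cpu_mask, struct cpuset_sketch, sets_form_partition() and
isolate_cpus() are invented for this illustration and do not come from
the patch; the sketch also assumes at most 64 CPUs, one bit per CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cpumask_t: one bit per CPU. */
typedef uint64_t cpu_mask;

struct cpuset_sketch {
    cpu_mask managed;   /* [A] all CPUs managed by this cpuset */
    cpu_mask allowed;   /* [B] CPUs allowed to tasks attached here */
    cpu_mask isolated;  /* [C] CPUs isolated for some descendant */
};

/* True when [B] and [C] are disjoint and their union is [A]. */
static bool sets_form_partition(const struct cpuset_sketch *cs)
{
    assert(cs != NULL);
    bool disjoint = (cs->allowed & cs->isolated) == 0;
    bool covering = (cs->allowed | cs->isolated) == cs->managed;
    return disjoint && covering;
}

/*
 * Move `cpus` from [B] to [C].  Refuses, changing nothing, if any
 * requested CPU is not currently in [B].
 */
static bool isolate_cpus(struct cpuset_sketch *cs, cpu_mask cpus)
{
    assert(cs != NULL);
    if ((cpus & ~cs->allowed) != 0)
        return false;
    cs->allowed &= ~cpus;
    cs->isolated |= cpus;
    return true;
}
```

Since isolate_cpus() only moves bits from one mask to the other, a
cpuset that satisfies the partition property before the call still
satisfies it afterwards.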

Your current presentation of these sets of CPUs shows set [B] in the
cpus file, followed by set [C] in brackets, if I am recalling correctly.
This format changes the format of the current cpus_allowed file, and it
violates the preference for a single value or vector per file.  I would
like to consider alternatives.

Your code automatically updates [C] if the child cpuset adds or removes
CPUs from those it manages in isolation (though I am not sure that your
code manages this change all the way back up the hierarchy to the top
cpuset, and I am wondering if perhaps your code should be doing this, as
noted in my detailed comments on your patch earlier today).
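
For illustration only, propagating such a change all the way to the top
cpuset might look like the following sketch in C.  It is not taken from
the patch: struct cpuset_node and propagate_isolation() are hypothetical
names, the masks assume at most 64 CPUs, and locking is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cpumask_t: one bit per CPU. */
typedef uint64_t cpu_mask;

struct cpuset_node {
    struct cpuset_node *parent; /* NULL for the top cpuset */
    cpu_mask allowed;           /* [B] CPUs allowed to attached tasks */
    cpu_mask isolated;          /* [C] CPUs isolated for a descendant */
};

/*
 * After `child` takes `cpus` for isolated use, walk every ancestor up
 * to the top cpuset and move those CPUs from its [B] to its [C].
 * Only CPUs the ancestor actually manages are touched.
 */
static void propagate_isolation(struct cpuset_node *child, cpu_mask cpus)
{
    assert(child != NULL);
    for (struct cpuset_node *cs = child->parent; cs != NULL; cs = cs->parent) {
        cpu_mask mine = cpus & (cs->allowed | cs->isolated);
        cs->allowed &= ~mine;
        cs->isolated |= mine;
    }
}
```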

I'd be tempted, if taking this approach (3), to consider a couple of
alternatives.

As I spelled out a few days ago, one could mark some cpusets that form a
partition of the system's CPUs, for the purposes of establishing isolated
scheduler domains, without requiring the above three related sets per
cpuset instead of one.  I am still unsure how much of your motivation is
the need to make the scheduler more efficient by establishing useful
isolated sched domains, and how much is the need to keep the usage of
CPUs by various jobs isolated, even from tasks attached to parent cpusets.

One can obtain the job isolation just in user code - if you don't want a
task to use a parent cpuset's access to your isolated cpuset, then simply
don't attach a task to the parent cpusets.  I do not understand yet how
strong your requirement is to have the _kernel_ enforce that there are
not tasks in a parent cpuset which could intrude on the non-isolated
resources of a child.  I provide (non open source) user level tools to
my users which enable them to conveniently ensure that there are no such
unwanted tasks, so they don't have a problem with a parent cpuset's CPUs
overlapping a cpuset that they are using for an isolated job.  Perhaps I
could persuade my employer that it would be appropriate to open source
these tools.
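
The kind of user-level check meant here can be approximated in a few
lines of C.  This is only a sketch and not a description of the tools
mentioned above: it relies on each cpuset directory having a "tasks"
file listing one pid per line, and the function names count_tasks() and
cpuset_is_empty() are invented for the example.  The caller supplies the
path; something like /dev/cpuset/tasks assumes the cpuset filesystem is
mounted at /dev/cpuset.

```c
#include <assert.h>
#include <stdio.h>

/* Count the pids listed in an already opened cpuset "tasks" file. */
static long count_tasks(FILE *tasks)
{
    assert(tasks != NULL);
    long pid;
    long count = 0;
    while (fscanf(tasks, "%ld", &pid) == 1)
        count++;
    return count;
}

/*
 * Returns 1 if no task is attached to the cpuset whose tasks file is
 * at `tasks_path`, 0 if at least one task is attached, and -1 if the
 * file cannot be opened.
 */
static int cpuset_is_empty(const char *tasks_path)
{
    FILE *f = fopen(tasks_path, "r");
    if (f == NULL)
        return -1;
    long n = count_tasks(f);
    fclose(f);
    return n == 0;
}
```

A job launcher could call cpuset_is_empty() on each ancestor of an
isolated cpuset before starting the job, and refuse to start if any
ancestor still has tasks attached.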

In any case, going with (3) would result in _one_ attribute, not two (both
exclusive and isolated, with overlapping semantics, which is confusing).

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
