* Re: [Lse-tech] CPUSET Proposal
From: Stephen Hemminger @ 2003-09-24 16:30 UTC (permalink / raw)
  To: linux-ia64

Looks good, but you aren't likely to get much acceptance or testing if
it only works on ia64.  You need to make a version for i386 as well.


Also, don't send your patch as a base64-encoded attachment; it makes working
with text tools harder.

* Re: [Lse-tech] CPUSET Proposal
From: David Mosberger @ 2003-09-24 17:02 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 24 Sep 2003 09:30:44 -0700, Stephen Hemminger <shemminger@osdl.org> said:

  Stephen> Looks good, but you aren't likely to get much acceptance or
  Stephen> testing if it only works on ia64.  You need to make a
  Stephen> version for i386 as well.

Is this true for >8-way machines?

	--david

* Re: [Lse-tech] CPUSET Proposal
From: Gerrit Huizenga @ 2003-09-24 19:32 UTC (permalink / raw)
  To: linux-ia64

This might be worth comparing notes on with the CKRM folks (cc:'d above).

gerrit

On Wed, 24 Sep 2003 17:59:01 +0200, Simon Derr wrote:
> 
> Hi,
> 
> We have developed a new feature in the Linux kernel for controlling CPU
> placement, which is useful on large SMP machines, especially NUMA ones.
> We call it CPUSETS, and we would very much like to hear from anyone
> who is interested in such a feature. It was somewhat inspired
> by the pset or cpumemset patches existing for Linux 2.4.
> 
> CPUSETs are lightweight objects in the Linux kernel that enable users to
> partition their multiprocessor machine by creating execution areas. A
> virtualization layer makes it possible to split a machine in terms of
> CPUs.
> 
> Furthermore, HPC applications often need to bind their processes to a
> specific CPU, and can achieve this by calling sched_setaffinity() in
> recent Linux kernels. But when several HPC applications run on a large
> system, each picks its CPU numbers without knowing about the others, so
> several processes can end up bound to the same processor.
> This problem is addressed by the CPUSET mechanism.
> 
> 
> CPUSETS allow users to:
> -----------------------
> 1/ create sets of CPUs on the system, and bind applications to them
> 
> 2/ translate the CPU masks given to sched_setaffinity() so they stay
>    inside the set of CPUs. With this mechanism, processors are virtualized
>    as seen through sched_setaffinity() and /proc. Thus any existing
>    application that uses this syscall to bind processes to processors
>    will work with virtual CPUs without any change.
> 
> 3/ provide a way to create sets of cpus *inside* a set of cpus: hence a
>    system administrator can partition a system among users, and users can
>    partition their partition among their applications.
> 
> 4/ change the execution area of a whole set of processes on the fly (to
>    give more resources to a critical application, for example).
> 
> ...
> 5/ in the future, probably associate a memory allocation policy (such as
>    local node or round robin) with a set of cpus.
> 
> 
> These features have been implemented as a kernel patch for Linux 2.6 and a
> suite of userland tools.
> 
> You can find the associated manpages and a slightly more detailed
> explanation here: http://www.bullopensource.org/cpuset/
> 
> Any feedback, comment or opinion is welcome:
> 	Simon.Derr@Bull.net,
> 	Sylvain.Jaugey@bull.net
> 
> Thanks,
> 
> 	Simon and Sylvain.
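A minimal sketch of points 2/ and 3/ above, not taken from the Bull patch (whose code is not shown here); the struct, the function names and the 64-CPU limit are all assumptions. A cpuset is modeled as an ordered list of physical CPUs, so virtual CPU v is the v-th CPU of the set, and a mask passed to sched_setaffinity() in virtual numbering is translated into a physical mask. Bits beyond the set's size are dropped, so a task cannot escape its set.

```c
#include <stdint.h>

/* Hypothetical cpuset: virtual CPU v is physical CPU phys_cpus[v].
 * Physical CPU numbers are assumed to be in 0..63 (one 64-bit mask). */
struct cpuset_sketch {
    int ncpus;
    int phys_cpus[64];
};

/* Translate a mask in virtual numbering into the physical mask the
 * scheduler would enforce. Virtual bits >= ncpus are silently dropped. */
static uint64_t virt_to_phys_mask(const struct cpuset_sketch *cs, uint64_t virt)
{
    uint64_t phys = 0;
    for (int v = 0; v < cs->ncpus; v++)
        if (virt & (UINT64_C(1) << v))
            phys |= UINT64_C(1) << cs->phys_cpus[v];
    return phys;
}

/* Physical mask of every CPU the set owns. */
static uint64_t cpuset_mask(const struct cpuset_sketch *cs)
{
    uint64_t m = 0;
    for (int v = 0; v < cs->ncpus; v++)
        m |= UINT64_C(1) << cs->phys_cpus[v];
    return m;
}

/* A set created inside another set may only use the parent's CPUs
 * (here both sets are described in physical CPU numbers). */
static int nested_ok(const struct cpuset_sketch *child,
                     const struct cpuset_sketch *parent)
{
    return (cpuset_mask(child) & ~cpuset_mask(parent)) == 0;
}
```

For example, with a job set owning physical CPUs 4-7, a request for virtual CPUs 0 and 2 (mask 0x5) becomes physical CPUs 4 and 6, so two applications that both ask for "CPU 0" land on different processors if they live in different sets.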

* Re: [Lse-tech] CPUSET Proposal
From: Paul Jackson @ 2003-09-24 21:42 UTC (permalink / raw)
  To: linux-ia64

Interesting ...

I'm still digesting it.  However, one of the documents, at:

  http://www.bullopensource.org/cpuset/cpuset.html

was painful to read in a web browser, because it was just one big
<pre>...</pre> block of text, with rather long lines (over 400
characters in one line) requiring much horizontal scrolling.

So I have reformatted it, using more common html markup.
You are welcome to steal my reformatting - it's visible at:

  http://www.speakeasy.org/~pj99/cpuset_formatted.html

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

* Re: [Lse-tech] CPUSET Proposal
From: Paul Jackson @ 2003-09-25  5:40 UTC (permalink / raw)
  To: linux-ia64

Where's the user-level cpuset.h file?  So far I haven't been able to
find it -- just a few links to it in the man pages, pointing into some
local file system I don't have access to.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

* Re: [Lse-tech] CPUSET Proposal
From: Paul Jackson @ 2003-09-25  5:44 UTC (permalink / raw)
  To: linux-ia64

On the pchange man page, an example starts with:

        # pcreate -np 4 --strict
        new area created with id 2

How does the 'pcreate' invoker know that the new area
had an id of "2" ?

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

* Re: [Lse-tech] CPUSET Proposal
From: William Lee Irwin III @ 2003-09-25  6:02 UTC (permalink / raw)
  To: linux-ia64

On Wed, 24 Sep 2003 09:30:44 -0700, Stephen Hemminger <shemminger@osdl.org> said:
Stephen> Looks good, but you aren't likely to get much acceptance or
Stephen> testing if it only works on ia64.  You need to make a
Stephen> version for i386 as well.

On Wed, Sep 24, 2003 at 10:02:35AM -0700, David Mosberger wrote:
> Is this true for >8-way machines?

x86's architectural limitations are 64x for serial APIC-based machines
(e.g. NUMA-Q) and 255x for xAPIC-based machines (no known extant > 32x
machines, apparently some kind of non-architectural regression), where
the non-power-of-two number of cpus is due to the broadcast ID reserved
from an 8-bit interrupt controller ID space. A likely explanation for
the current xAPIC limitations is the recommended (publicly documented)
physical APIC ID enumeration scheme breaking down for > 32x.
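The 255x figure is plain arithmetic on the ID space. A minimal sketch, assuming the all-ones ID is the one reserved for broadcast; the function names are hypothetical:

```c
/* Distinct CPU IDs addressable with id_bits-wide physical IDs when the
 * all-ones ID is reserved for broadcast: 2^id_bits - 1. */
static unsigned usable_apic_ids(unsigned id_bits)
{
    return (1u << id_bits) - 1u;   /* 8 bits: 256 - 1 = 255 */
}

/* True if id is the broadcast destination rather than a real CPU. */
static int is_broadcast_id(unsigned id, unsigned id_bits)
{
    return id == (1u << id_bits) - 1u;
}
```

This is why the ceiling is 255 rather than a power of two: one of the 256 IDs can never name a CPU.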

Custom interrupt controllers may exceed these limits, but I don't know
of any that have actually been used to do so. Though it sucks,
and very, very badly, x86 is not limited to anything like 8x.


-- wli

* Re: [Lse-tech] CPUSET Proposal
From: David Mosberger @ 2003-09-25  6:57 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 24 Sep 2003 23:02:34 -0700, William Lee Irwin III <wli@holomorphy.com> said:

  Bill> On Wed, 24 Sep 2003 09:30:44 -0700, Stephen Hemminger
  Bill> <shemminger@osdl.org> said:

  Stephen> Looks good, but you aren't likely to get much acceptance or
  Stephen> testing if it only works on ia64.  You need to make a
  Stephen> version for i386 as well.

  Bill> On Wed, Sep 24, 2003 at 10:02:35AM -0700, David Mosberger wrote:

  >> Is this true for >8-way machines?

  Bill> x86's architectural limitations are 64x for serial APIC-based machines
  Bill> (e.g. NUMA-Q) and 255x for xAPIC-based machines (no known extant > 32x
  Bill> machines, apparently some kind of non-architectural regression), where
  Bill> the non-power-of-two number of cpus is due to the broadcast ID reserved
  Bill> from an 8-bit interrupt controller ID space. A likely explanation for
  Bill> the current xAPIC limitations is the recommended (publicly documented)
  Bill> physical APIC ID enumeration scheme breaking down for > 32x.

  Bill> Custom interrupt controllers may exceed these limits, but I don't know
  Bill> of any that have actually been used to do so. Though it sucks,
  Bill> and very, very badly, x86 is not limited to anything like 8x.

I wasn't suggesting that x86 is limited to 8-way, I was wondering how
many > 8-way x86 Linux machines are actually out there.  I wasn't even
being facetious---just curious.

	--david

* Re: [Lse-tech] CPUSET Proposal
From: David Mosberger @ 2003-09-25  7:07 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 24 Sep 2003 23:57:10 -0700, David Mosberger <davidm@linux.hpl.hp.com> said:
  Bill> x86's architectural limitations are 64x for serial APIC-based
  Bill> machines (e.g. NUMA-Q) and 255x for xAPIC-based machines (no
  Bill> known extant > 32x machines, apparently some kind of
  Bill> non-architectural regression), where the non-power-of-two
  Bill> number of cpus is due to the broadcast ID reserved from an
  Bill> 8-bit interrupt controller ID space. A likely explanation for
  Bill> the current xAPIC limitations is the recommended (publicly
  Bill> documented) physical APIC ID enumeration scheme breaking down
  Bill> for > 32x.

  Bill> Custom interrupt controllers may exceed these limits, but I
  Bill> don't know of any that have actually been used to do
  Bill> so. Though it sucks, and very, very badly, x86 is not limited
  Bill> to anything like 8x.

  David> I wasn't suggesting that x86 is limited to 8-way, I was
  David> wondering how many > 8-way x86 Linux machines are actually
  David> out there.  I wasn't even being facetious---just curious.

Incidentally, the first "big" SMP machine I had access to was some
sort of Sequent (S81/10?), with ~12 80386 CPUs (yes, that was a long
time ago... ;-).

	--david

* Re: [Lse-tech] CPUSET Proposal
From: William Lee Irwin III @ 2003-09-25  7:08 UTC (permalink / raw)
  To: linux-ia64

On Wed, 24 Sep 2003 23:02:34 -0700, William Lee Irwin III <wli@holomorphy.com> said:
Bill> Custom interrupt controllers may exceed these limits, but I don't know
Bill> of any that have actually been used to do so. Though it sucks,
Bill> and very, very badly, x86 is not limited to anything like 8x.

On Wed, Sep 24, 2003 at 11:57:10PM -0700, David Mosberger wrote:
> I wasn't suggesting that x86 is limited to 8-way, I was wondering how
> many > 8-way x86 Linux machines are actually out there.  I wasn't even
> being facetious---just curious.

I am not able to get any kind of useful numerical estimate of how many
machines of that kind are manufactured or sold. I suspect the numbers
are meaningful to, and kept secret by, someone, e.g. marketing people.
Not that I've tried very hard.


-- wli

* Re: [Lse-tech] CPUSET Proposal
From: William Lee Irwin III @ 2003-09-25  7:14 UTC (permalink / raw)
  To: linux-ia64

On Wed, 24 Sep 2003 23:57:10 -0700, David Mosberger <davidm@linux.hpl.hp.com> said:
David> I wasn't suggesting that x86 is limited to 8-way, I was
David> wondering how many > 8-way x86 Linux machines are actually
David> out there.  I wasn't even being facetious---just curious.

On Thu, Sep 25, 2003 at 12:07:03AM -0700, David Mosberger wrote:
> Incidentally, the first "big" SMP machine I had access to was some
> sort of Sequent (S81/10?), with ~12 80386 CPUs (yes, that was a long
> time ago... ;-).

Aha, I've been on S-81s myself. Those definitely predated APICs.
I think they used the SLIC, or whatever the precursor to the CSLIC was,
for an interrupt controller, and I'm sure there's a Sequent historian
around somewhere to correct me if my memory (which wouldn't have been
of kernel hacking back then) has failed me. =)


-- wli

* Re: [Lse-tech] CPUSET Proposal
From: Dave Hansen @ 2003-09-25  9:04 UTC (permalink / raw)
  To: linux-ia64

On Wed, 2003-09-24 at 23:57, David Mosberger wrote:
> I wasn't suggesting that x86 is limited to 8-way, I was wondering how
> many > 8-way x86 Linux machines are actually out there.  I wasn't even
> being facetious---just curious.

Well, besides the NUMA-Q, which went up to 60x and is dead now, there
are at least the IBM Summit chipset machines.  They're sold as 32-ways
today on the x445 (that's physical, without hyperthreading).  I've
personally booted Linux on a 16-way, but I know others have booted on
the 32-way configuration.  Patches for this were posted in the last week
by James Cleverdon.  

There's also the bigsmp code in the kernel for other P4-based systems
that are >8x.  I haven't seen any of them yet, but I wouldn't imagine
that people would put support in the kernel for hardware that wasn't at
least *close* to production. 

-- 
Dave Hansen
haveblue@us.ibm.com


* Re: [Lse-tech] CPUSET Proposal
From: Shailabh Nagar @ 2003-09-25 18:07 UTC (permalink / raw)
  To: linux-ia64

Gerrit Huizenga wrote:

>This might be worth comparing notes on with the CKRM folks (cc:'d above).
>  
>
I went through the cpusets proposal trying to see the commonalities and differences
with what CKRM is trying to achieve.

Speaking of CPUs alone:

The way I see it (corrections welcome), CPUSETS has two objectives:
- performance isolation for arbitrary groups of processes, by constraining
  them to run on arbitrary sets of CPUs
- performance gains for a CPUSET by specifying *which* CPUs it can use. This
  could help in the NUMA and hyperthreaded cases, as well as for regular
  CPUs, simply by increasing the chance of cache warmth, etc.; i.e. the same
  benefits as seen with sched_setaffinity(), but generalized to groups of
  processes.

CKRM shares some aspects of the first objective, i.e. it also seeks
performance isolation for arbitrary groups of processes (classes), but it
differs in that it
- uses scheduler modifications to achieve isolation
- has quantitative measures of how much of a resource is used
CKRM does not share the second objective, i.e. it does not try to control
which CPUs are allocated by the kernel schedulers. If such constraints are
placed on allocation, whether by sched_setaffinity(), NUMA scheduler changes
or CPUSETs, CKRM will cooperate and operate within those constraints.

For this second objective, CKRM is orthogonal to CPUSETs. For the first one, it is another way of
achieving isolation, one that broadly sacrifices stricter guarantees (which are possible by physically
isolating CPUs as in CPUSETS) for better load balancing and utilization.

Coming to memory, it wasn't very clear what CPUSETs' future plans/objectives
are. Does it intend to control which address ranges a CPUSET can allocate
from (for NUMA reasons) AND limit how much memory it can consume? The latter
intersects with CKRM objectives, the former doesn't.

Finally, CKRM is dealing with I/O and planning to incorporate inbound and
possibly outbound networking into its framework, which would appear to be
outside the scope of something like CPUSETs.


The user interfaces used by CPUSETs, like pexec, are quite good. We'd been
wanting to develop something similar for CKRM as well: allow a user to
specify, on a single command line, a job/program and the constraints under
which it should operate. The command would then create a class with the
specified cpu/mem/io/net constraints, run the job/program within that class
and automatically clean it up when done.


-- Shailabh





On Wed, 24 Sep 2003 17:59:01 +0200, Simon Derr wrote:

>>Hi,
>>
>>We have developed a new feature in the Linux kernel for controlling CPU
>>placement, which is useful on large SMP machines, especially NUMA ones.
>>We call it CPUSETS, and we would very much like to hear from anyone
>>who is interested in such a feature. It was somewhat inspired
>>by the pset or cpumemset patches existing for Linux 2.4.........
>>    
>>
<snip>

>>You can find the associated manpages and a slightly more detailed
>>explanation here: http://www.bullopensource.org/cpuset/
>>
>>Any feedback, comment or opinion is welcome:
>>	Simon.Derr@Bull.net,
>>	Sylvain.Jaugey@bull.net
>>
>>Thanks,
>>
>>	Simon and Sylvain.
>>    
>>


* Re: [Lse-tech] CPUSET Proposal
From: William Lee Irwin III @ 2003-09-25 18:08 UTC (permalink / raw)
  To: linux-ia64

On Wed, 2003-09-24 at 23:57, David Mosberger wrote:
>> I wasn't suggesting that x86 is limited to 8-way, I was wondering how
>> many > 8-way x86 Linux machines are actually out there.  I wasn't even
>> being facetious---just curious.

On Thu, Sep 25, 2003 at 02:04:07AM -0700, Dave Hansen wrote:
> Well, besides the NUMA-Q, which went up to 60x and is dead now, there
> are at least the IBM Summit chipset machines.  They're sold as 32-ways
> today on the x445 (that's physical, without hyperthreading).  I've
> personally booted Linux on a 16-way, but I know others have booted on
> the 32-way configuration.  Patches for this were posted in the last week
> by James Cleverdon.  
> There's also the bigsmp code in the kernel for other P4-based systems
> that are >8x.  I haven't seen any of them yet, but I wouldn't imagine
> that people would put support in the kernel for hardware that wasn't at
> least *close* to production. 

I figured the ES7000 and x440/x445 were recent enough that they'd be fresh
in people's minds. Small correction to the NUMA-Q: it was 64x.

http://www-3.ibm.com/software/data/db2/benchmarks/050300.html

-- wli

* Re: [Lse-tech] CPUSET Proposal
From: Shailabh Nagar @ 2003-09-25 20:50 UTC (permalink / raw)
  To: linux-ia64

Shailabh Nagar wrote:

> 
> CKRM shares some aspects of the first objective i.e. it also seeks 
> performance isolation for arbitrary groups of processes (classes) too 
> but differs in that it
> - uses scheduler modifications to achieve isolation
> - has quantitative measures of how much resource is used

Let me amend that, since CPUSETs also provides a coarse-grained (number of
CPUs) quantitative measure.

CKRM allows % of the total resource (cpu ticks, mem pages etc.) to be 
specified and is consequently at a finer granularity.
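A worked example of the difference in granularity; the 16-CPU machine and the 10% target are illustrative assumptions, not figures from the thread. A CPUSET can only grant whole CPUs, i.e. multiples of 1/16 = 6.25% of a 16-way machine, so a 10% entitlement must be rounded up to 2 CPUs (12.5%), whereas a percentage-based share can be exactly 10%. The helper names are hypothetical:

```c
/* Smallest number of whole CPUs covering a requested share (in percent)
 * of an ncpus machine: ceil(percent * ncpus / 100) in integer arithmetic. */
static int cpus_needed(int percent, int ncpus)
{
    return (percent * ncpus + 99) / 100;
}

/* Share actually granted by that whole-CPU set, in hundredths of a
 * percent (basis points), e.g. 1250 means 12.50%. */
static int granted_basis_points(int percent, int ncpus)
{
    return cpus_needed(percent, ncpus) * 10000 / ncpus;
}
```

On the assumed 16-way machine, cpus_needed(10, 16) is 2 and the granted share is 1250 basis points, i.e. 2.5 percentage points more than requested.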





* Re: [Lse-tech] CPUSET Proposal
From: Simon Derr @ 2003-09-26  7:36 UTC (permalink / raw)
  To: linux-ia64

On Thu, 25 Sep 2003, Shailabh Nagar wrote:

> Coming to memory, it wasn't very clear what CPUSETs future
> plans/objectives are. Does it intend to control which address ranges a
> CPUSET can allocate from (for NUMA reasons) AND limit how much memory it
> can consume ? The latter intersects with CKRM objectives, the former
> doesn't.
The future plans are still to be decided. The control over memory provided
by future versions of CPUSET might be to:
- associate with each cpuset one node on which to allocate memory
- associate with each cpuset one allocation policy (local node first, round
  robin)
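A minimal sketch of the two policies mentioned above, applied per cpuset. Everything here is hypothetical: the struct, the policy names, the 64-node mask, and the choice to fall back to round robin when the local node is outside the cpuset. It is not the CPUSET patch's code.

```c
#include <stdint.h>

enum alloc_policy { POLICY_LOCAL_FIRST, POLICY_ROUND_ROBIN };

struct mem_cpuset {
    uint64_t allowed_nodes;   /* bit n set: node n (0..63) may supply pages */
    enum alloc_policy policy;
    int rr_next;              /* round-robin cursor */
};

/* Return the node to take the next page from, or -1 if no node is allowed. */
static int pick_node(struct mem_cpuset *cs, int local_node)
{
    if (cs->allowed_nodes == 0)
        return -1;
    if (cs->policy == POLICY_LOCAL_FIRST &&
        (cs->allowed_nodes & (UINT64_C(1) << local_node)))
        return local_node;
    /* Round robin, also used as the fallback when the local node is not
     * allowed: scan forward from the cursor for the next allowed node. */
    for (int i = 0; i < 64; i++) {
        int n = (cs->rr_next + i) % 64;
        if (cs->allowed_nodes & (UINT64_C(1) << n)) {
            cs->rr_next = n + 1;
            return n;
        }
    }
    return -1;  /* not reached: allowed_nodes is non-zero */
}
```

With allowed nodes 1 and 2, a local-first cpuset running on node 1 always allocates on node 1, while a round-robin cpuset alternates 1, 2, 1, ...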

At first glance, limiting the amount of memory used will probably not be
one of our objectives. The initial goal of CPUSET was to obtain good
performance on NUMA architectures. But this is an open question.

> Finally, CKRM is dealing with I/O and planning to incorporate inbound
> and possibly outbound networking into its framework which would appear
> to be outside the scope of something like CPUSETs.
That's right.


	Simon.


* Re: [Lse-tech] CPUSET Proposal
From: Paul Jackson @ 2003-09-26  9:58 UTC (permalink / raw)
  To: linux-ia64

Simon wrote:
> The control on memory provided by future versions of CPUSET might be ...


I anticipate that we will end up with a kernel "mems_allowed" attribute
of a task (or perhaps of a vma) that is quite similar to cpus_allowed.

The mems_allowed field is a multiword (well, one word, until someone
ventures above 64 memory nodes) bit vector of node numbers, and
indicates on which one _or_more_ nodes memory may be allocated.  It
controls allocation in mm/page_alloc.c:__alloc_pages() and similar
places.  Allocation by the kernel for user address space is
distinguished from allocation for the kernel's own needs.

Actually, the mems_allowed field should not be directly in the task (or
vma?) struct, but in the shared cpuset struct, referenced from the task
or vma.   Simon and Sylvain's cpuset proposal already does this for the
cpus_allowed bit vector, moving it from the task struct to the shared
cpuset struct referenced from the task struct.
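A minimal sketch of the layout described above, with hypothetical names and one 64-bit word per mask. Both masks live in a cpuset shared by reference, so changing the cpuset retargets every attached task at once (compare point 4/ of the original proposal). node_allowed() only stands in for the check an allocator such as __alloc_pages() would make when allocating for user address space; it is not kernel code.

```c
#include <stdint.h>

/* Shared per-job state: one word per mask, enough until someone
 * exceeds 64 CPUs or 64 memory nodes. */
struct cpuset {
    uint64_t cpus_allowed;
    uint64_t mems_allowed;
    int refcount;             /* number of attached tasks */
};

/* Each task references the shared cpuset instead of holding its own copy. */
struct task {
    struct cpuset *cs;
};

static void attach(struct task *t, struct cpuset *cs)
{
    t->cs = cs;
    cs->refcount++;
}

/* Would a user-space page for task t be allowed on this node (0..63)? */
static int node_allowed(const struct task *t, int node)
{
    return (int)((t->cs->mems_allowed >> node) & 1);
}

/* Updating the shared cpuset affects all attached tasks immediately. */
static void set_mems(struct cpuset *cs, uint64_t mask)
{
    cs->mems_allowed = mask;
}
```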

On the larger scale, mems_allowed is set administratively (via convenient
tools such as batch and job managers) for an entire job or related set
of jobs.  On the smaller scale, it is set for various tasks (or vmas) of
a job, relative to the cpuset that job is executing on.  It is usually
set from user level code, sometimes by admin utilities, sometimes by
system services, sometimes by a NUMA-aware application itself.

There may be a need for kernel or low level library assistance in
setting it, for situations such as when a multithreaded scientific
program that is tuned for running on an SMP system knows enough to get
the number of threads right, but doesn't know enough to place them
across nodes optimally.  In such cases, one sometimes has to intercede
at a relatively low level, behind the application's back, to get the
proper distribution of tasks and vmas across cpus and memory nodes for
optimal performance.

Additional work is required to handle cases where one runs out of memory
on allowed nodes - one might want to steal memory from other nodes, or
swap or kill or sleep or grant more nodes to the greedy application,
depending on administrative policies.  Eventually, the kernel should
provide minimal mechanisms in support of each such policy.

The essential attribute of NUMA systems is "non-uniform memory access"
(not surprisingly).  Getting memory placed is just as essential for
optimum performance as getting cpu usage placed.  The two must work
together.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373


Thread overview: 17+ messages
2003-09-24 16:30 [Lse-tech] CPUSET Proposal Stephen Hemminger
2003-09-24 17:02 ` David Mosberger
2003-09-24 19:32 ` Gerrit Huizenga
2003-09-24 21:42 ` Paul Jackson
2003-09-25  5:40 ` Paul Jackson
2003-09-25  5:44 ` Paul Jackson
2003-09-25  6:02 ` William Lee Irwin III
2003-09-25  6:57 ` David Mosberger
2003-09-25  7:07 ` David Mosberger
2003-09-25  7:08 ` William Lee Irwin III
2003-09-25  7:14 ` William Lee Irwin III
2003-09-25  9:04 ` Dave Hansen
2003-09-25 18:07 ` Shailabh Nagar
2003-09-25 18:08 ` William Lee Irwin III
2003-09-25 20:50 ` Shailabh Nagar
2003-09-26  7:36 ` Simon Derr
2003-09-26  9:58 ` Paul Jackson
