linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Gang Scheduling in linux
@ 2002-07-16 22:54 shreenivasa H V
  2002-07-18 16:21 ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: shreenivasa H V @ 2002-07-16 22:54 UTC (permalink / raw)
  To: linux-kernel

Hello,

I wanted to know whether there is any support for gang scheduling in the linux
kernel. If so, what is the status of the implementation and if there are any
documents about the same.

thanks,
shreeni.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-18 16:21 ` Ingo Molnar
@ 2002-07-17 17:40   ` William Lee Irwin III
  2002-07-18 17:40   ` Ingo Molnar
  1 sibling, 0 replies; 16+ messages in thread
From: William Lee Irwin III @ 2002-07-17 17:40 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: shreenivasa H V, linux-kernel

On Thu, Jul 18, 2002 at 06:21:41PM +0200, Ingo Molnar wrote:
> yes - the 'synchronous wakeup' feature is a form of gang scheduling. It in
> essence uses real process-communication information to migrate 'related'
> tasks to the same CPU. So it's automatic, no need to declare processes to
> be part of a 'gang' in some formal (and thus fundamentally imperfect) way.
> (another form of 'gang scheduling' can be achieved by binding the 'parent'
> process to a single CPU - all children will be bound to that CPU as well.)

Hit #1 on google.com: http://www.sw.nec.co.jp/hpc/sx-e/sx-world/no23/en10.pdf

   [SX-5 SERIES TECHNICAL HIGHLIGHTS]
   Overview of Gang Scheduling
   Koichi Nakanishi,
   Senior Manager
   Koji Suzuki,
   Assistant Manager                                                            
   4th Development Department, 1st Computers Software Division, Computers       
   Software Operations Unit, NEC Corporation                                    
   SX WORLD                                                                     
   Autumn 1998 No.23 Special Issue

[...]

   The Gang scheduling function has the func-     
   tion of simultaneously allocating the required  
   number of CPUs when scheduling parallel             
   programs, and allows you to obtain almost  
   the same performance when multiple parallel  
   programs are simultaneously executing, as if  
   the programs were running alone.              


I have approximately zero interest in this myself, but something seemed
off about the definition of gang scheduling being used in the post.


Cheers,
Bill

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-18 17:40   ` Ingo Molnar
@ 2002-07-17 17:47     ` William Lee Irwin III
  2002-07-17 20:14     ` Sam Mason
  2002-07-19 15:05     ` Hubertus Franke
  2 siblings, 0 replies; 16+ messages in thread
From: William Lee Irwin III @ 2002-07-17 17:47 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: shreenivasa H V, linux-kernel

On Thu, Jul 18, 2002 at 07:40:19PM +0200, Ingo Molnar wrote:
> you are right in that the Linux scheduler does not enable classic
> gang-scheduling: where multiple processes are scheduled 'at once' on
> multiple CPUs. Can you point out specific (real-life) workloads where this
> would be advantegous? Some testcode would be the best form of expressing
> this. Pretty much any job that uses sane (kernel-based or kernel-helped)
> synchronization should see good throughput.

I will not advocate it myself. I only remembered the definition.


Cheers,
Bill

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-18 17:40   ` Ingo Molnar
  2002-07-17 17:47     ` William Lee Irwin III
@ 2002-07-17 20:14     ` Sam Mason
  2002-07-18 20:08       ` Ingo Molnar
  2002-07-19 15:05     ` Hubertus Franke
  2 siblings, 1 reply; 16+ messages in thread
From: Sam Mason @ 2002-07-17 20:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: shreenivasa H V, linux-kernel

On Thu, Jul 18, 2002 at 07:40:19PM +0200, Ingo Molnar wrote:
>Can you point out specific (real-life) workloads where this
>would be advantegous?

It's mainly used for programs that needs lots of processing power
chucked at a specific problem, the problem is first broken down into
several small pieces and each part is sent off to a different
processor.  When each piece has been processed, they are all
recombined and the rest of the calculation is continued.  The problem
with this is that if any one of the pieces is delayed, all the
processors will be idle waiting for the interrupted piece to be
processed, before they can process the next set of pieces.

  Sam

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-18 20:08       ` Ingo Molnar
@ 2002-07-17 20:39         ` Sam Mason
  2002-07-18 20:32           ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: Sam Mason @ 2002-07-17 20:39 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: shreenivasa H V, linux-kernel

On Thu, Jul 18, 2002 at 10:08:13PM +0200, Ingo Molnar wrote:
>On Wed, 17 Jul 2002, Sam Mason wrote:
>> It's mainly used for programs that needs lots of processing power
>> chucked at a specific problem, the problem is first broken down into
>> several small pieces and each part is sent off to a different processor.
>> When each piece has been processed, they are all recombined and the rest
>> of the calculation is continued.  The problem with this is that if any
>> one of the pieces is delayed, all the processors will be idle waiting
>> for the interrupted piece to be processed, before they can process the
>> next set of pieces.
>well, how does gang scheduling solve this problem? Even gang-scheduled
>tasks might be interrupted anytime on any CPU, by higher-priority tasks,
>thus causing a delay.

The important thing to remember is that this isn't a normal scheduling
method, it's used for VERY specialised software which is assumed to
have (almost) complete control of the machine.  Gang scheduled
processes would have the highest priority possible and would get
executed before any other processes.  This works because the software
knows what it's doing and assumes that the user only ran one bit of
gang scheduled software, if all of these are valid assumptions
everything should work nicely.

Thinking about it, if a process just sets itself to be the highest
priority and constrains it's self to appropriate processors then it
wouldn't surprise me if this was just what you want to do gang
scheduled.


  Sam

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-18 20:32           ` Ingo Molnar
@ 2002-07-17 21:24             ` Sam Mason
  2002-07-18 12:43             ` Jean Wolter
  1 sibling, 0 replies; 16+ messages in thread
From: Sam Mason @ 2002-07-17 21:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: shreenivasa H V, linux-kernel

On Thu, Jul 18, 2002 at 10:32:04PM +0200, Ingo Molnar wrote:
>On Wed, 17 Jul 2002, Sam Mason wrote:
>> The important thing to remember is that this isn't a normal scheduling
>> method, it's used for VERY specialised software which is assumed to have
>> (almost) complete control of the machine. [...]
>so how does this differ from a normal Linux system that is used
>exclusively? The specialized tasks will get evenly distributed between
>CPUs (as long as the number of tasks is not higher than the number of
>CPUs), and nothing should interrupt them.

At the moment I can't think of anything, but I'm sure that someone
with a bit of real life experience will show up and prove me wrong.

>> [...] Gang scheduled processes would have the highest priority possible
>> and would get executed before any other processes.  This works because
>> the software knows what it's doing and assumes that the user only ran
>> one bit of gang scheduled software, if all of these are valid
>> assumptions everything should work nicely.
>>
>> Thinking about it, if a process just sets itself to be the highest
>> priority and constrains it's self to appropriate processors then it
>> wouldn't surprise me if this was just what you want to do gang
>> scheduled.
>
>yeah. You can schedule processes 'manually' by using affinities - this is
>for corner-cases which know it 100% well what they are doing. But the
>default scheduler should get the '8 tasks running on an 8-way system' case
>right as well - each CPU will run a single number-cruncher, and there wont
>be any bouncing.

I think that would work, I would also assume that the program has
enough knowledge to start only the required number of tasks and
therefore let the scheduler figure out where to put them.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-18 20:32           ` Ingo Molnar
  2002-07-17 21:24             ` Sam Mason
@ 2002-07-18 12:43             ` Jean Wolter
  1 sibling, 0 replies; 16+ messages in thread
From: Jean Wolter @ 2002-07-18 12:43 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Sam Mason, shreenivasa H V, linux-kernel

Ingo Molnar <mingo@elte.hu> writes:

> On Wed, 17 Jul 2002, Sam Mason wrote:
> 
> > The important thing to remember is that this isn't a normal scheduling
> > method, it's used for VERY specialised software which is assumed to have
> > (almost) complete control of the machine. [...]
> 
> so how does this differ from a normal Linux system that is used
> exclusively? The specialized tasks will get evenly distributed between
> CPUs (as long as the number of tasks is not higher than the number of
> CPUs), and nothing should interrupt them.

I think as long as there is only one task set, it doesn't matter. But
if you are trying to run several parallel applications at the same
time on the same machine you are trying to schedule them as a gang (at
the same time on different processors) to provide the impression, that
each task set would run alone.

And afaik these applications typically use shared resources which are
protected by some kind of synchronization primitive like (user level)
spinlocks. If one of the processes holds the lock and the kernel
preempts it for some reason no other process needing access to the
shared resource is able to make progress. So the idea is to try to
schedule them all at the same time on different processors to ensure
that blocking time on the shared resource is really short.

regards,
Jean

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-16 22:54 Gang Scheduling in linux shreenivasa H V
@ 2002-07-18 16:21 ` Ingo Molnar
  2002-07-17 17:40   ` William Lee Irwin III
  2002-07-18 17:40   ` Ingo Molnar
  0 siblings, 2 replies; 16+ messages in thread
From: Ingo Molnar @ 2002-07-18 16:21 UTC (permalink / raw)
  To: shreenivasa H V; +Cc: linux-kernel


On Tue, 16 Jul 2002, shreenivasa H V wrote:

> I wanted to know whether there is any support for gang scheduling in the
> linux kernel. If so, what is the status of the implementation and if
> there are any documents about the same.

yes - the 'synchronous wakeup' feature is a form of gang scheduling. It in
essence uses real process-communication information to migrate 'related'
tasks to the same CPU. So it's automatic, no need to declare processes to
be part of a 'gang' in some formal (and thus fundamentally imperfect) way.

(another form of 'gang scheduling' can be achieved by binding the 'parent'
process to a single CPU - all children will be bound to that CPU as well.)

	Ingo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-18 16:21 ` Ingo Molnar
  2002-07-17 17:40   ` William Lee Irwin III
@ 2002-07-18 17:40   ` Ingo Molnar
  2002-07-17 17:47     ` William Lee Irwin III
                       ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: Ingo Molnar @ 2002-07-18 17:40 UTC (permalink / raw)
  To: shreenivasa H V; +Cc: linux-kernel


you are right in that the Linux scheduler does not enable classic
gang-scheduling: where multiple processes are scheduled 'at once' on
multiple CPUs. Can you point out specific (real-life) workloads where this
would be advantegous? Some testcode would be the best form of expressing
this. Pretty much any job that uses sane (kernel-based or kernel-helped)
synchronization should see good throughput.

	Ingo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-17 20:14     ` Sam Mason
@ 2002-07-18 20:08       ` Ingo Molnar
  2002-07-17 20:39         ` Sam Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2002-07-18 20:08 UTC (permalink / raw)
  To: Sam Mason; +Cc: shreenivasa H V, linux-kernel


On Wed, 17 Jul 2002, Sam Mason wrote:

> It's mainly used for programs that needs lots of processing power
> chucked at a specific problem, the problem is first broken down into
> several small pieces and each part is sent off to a different processor.  
> When each piece has been processed, they are all recombined and the rest
> of the calculation is continued.  The problem with this is that if any
> one of the pieces is delayed, all the processors will be idle waiting
> for the interrupted piece to be processed, before they can process the
> next set of pieces.

well, how does gang scheduling solve this problem? Even gang-scheduled
tasks might be interrupted anytime on any CPU, by higher-priority tasks,
thus causing a delay.

	Ingo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-17 20:39         ` Sam Mason
@ 2002-07-18 20:32           ` Ingo Molnar
  2002-07-17 21:24             ` Sam Mason
  2002-07-18 12:43             ` Jean Wolter
  0 siblings, 2 replies; 16+ messages in thread
From: Ingo Molnar @ 2002-07-18 20:32 UTC (permalink / raw)
  To: Sam Mason; +Cc: shreenivasa H V, linux-kernel


On Wed, 17 Jul 2002, Sam Mason wrote:

> The important thing to remember is that this isn't a normal scheduling
> method, it's used for VERY specialised software which is assumed to have
> (almost) complete control of the machine. [...]

so how does this differ from a normal Linux system that is used
exclusively? The specialized tasks will get evenly distributed between
CPUs (as long as the number of tasks is not higher than the number of
CPUs), and nothing should interrupt them.

> [...] Gang scheduled processes would have the highest priority possible
> and would get executed before any other processes.  This works because
> the software knows what it's doing and assumes that the user only ran
> one bit of gang scheduled software, if all of these are valid
> assumptions everything should work nicely.
> 
> Thinking about it, if a process just sets itself to be the highest
> priority and constrains it's self to appropriate processors then it
> wouldn't surprise me if this was just what you want to do gang
> scheduled.

yeah. You can schedule processes 'manually' by using affinities - this is
for corner-cases which know it 100% well what they are doing. But the
default scheduler should get the '8 tasks running on an 8-way system' case
right as well - each CPU will run a single number-cruncher, and there wont
be any bouncing.

	Ingo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-18 17:40   ` Ingo Molnar
  2002-07-17 17:47     ` William Lee Irwin III
  2002-07-17 20:14     ` Sam Mason
@ 2002-07-19 15:05     ` Hubertus Franke
  2002-07-20 16:59       ` Ingo Molnar
  2 siblings, 1 reply; 16+ messages in thread
From: Hubertus Franke @ 2002-07-19 15:05 UTC (permalink / raw)
  To: Ingo Molnar, shreenivasa H V; +Cc: linux-kernel

On Thursday 18 July 2002 01:40 pm, Ingo Molnar wrote:
> you are right in that the Linux scheduler does not enable classic
> gang-scheduling: where multiple processes are scheduled 'at once' on
> multiple CPUs. Can you point out specific (real-life) workloads where this
> would be advantegous? Some testcode would be the best form of expressing
> this. Pretty much any job that uses sane (kernel-based or kernel-helped)
> synchronization should see good throughput.
>
> 	Ingo
>
> -

Go to any of the national labs. 
I was involved in the gangscheduler implementation for the IBM 340 node SP2 
cluster at Lawrence Livermore National Lab.
Implementation aside, one can show that the total system utilization can be 
raised from ~60% to a ~90% when doing gang scheduling rather than FIFO 
scheduling, which one would otherwise do to get a dedicated machine.
We got tons of papers on this. 

For this it seems sufficient to simply STOP apps on a larger granularity and 
have that done through a user level daemon. The kernel scheduler simply 
schedules the runnable threads that given the U-Sched would always amount
to a limited number of threads/tasks.

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-20 16:59       ` Ingo Molnar
@ 2002-07-19 19:25         ` Hubertus Franke
  2002-07-19 22:05           ` Richard Gooch
  0 siblings, 1 reply; 16+ messages in thread
From: Hubertus Franke @ 2002-07-19 19:25 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: shreenivasa H V, linux-kernel

On Saturday 20 July 2002 12:59 pm, Ingo Molnar wrote:
> On Fri, 19 Jul 2002, Hubertus Franke wrote:
> > For this it seems sufficient to simply STOP apps on a larger granularity
> > and have that done through a user level daemon. The kernel scheduler
> > simply schedules the runnable threads that given the U-Sched would
> > always amount to a limited number of threads/tasks.
>
> yep, this is my suggestion as well. Any reason to do gang scheduling in
> the scheduler and not via a userspace daemon that stops/resumes (and
> binds) managed tasks explicitly?
>
> 	Ingo

Not from our experience from a cluster based application. Each node was 
a SMP in that case.

On a single SMP I could imagine for instance for parallel reendering
or similar tightly integrated parallel programs that need data 
synchronization.  Most of these apps assume a tightly coupled non-virtual
resource, i.e., scheduling of tasks is aligned.

SGI used to have that stuff in their base kernel. Read a paper about this some 
years ago.
Again, at the beginning I'd go with a user level scheduler approach that 
certainly would satisfy national labs etc. Most of the cluster schedulers, 
like PBS and LoadLeveler etc., already provide that functionality.


-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-19 19:25         ` Hubertus Franke
@ 2002-07-19 22:05           ` Richard Gooch
  2002-07-22 19:52             ` Hubertus Franke
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Gooch @ 2002-07-19 22:05 UTC (permalink / raw)
  To: frankeh; +Cc: Ingo Molnar, shreenivasa H V, linux-kernel

Hubertus Franke writes:
> On a single SMP I could imagine for instance for parallel reendering
> or similar tightly integrated parallel programs that need data
> synchronization.  Most of these apps assume a tightly coupled
> non-virtual resource, i.e., scheduling of tasks is aligned.
> 
> SGI used to have that stuff in their base kernel. Read a paper about
> this some years ago.  Again, at the beginning I'd go with a user
> level scheduler approach that certainly would satisfy national labs
> etc. Most of the cluster schedulers, like PBS and LoadLeveler etc.,
> already provide that functionality.

A completely user-level solution may have some disadvantages, though,
such as delays in scheduling on/off (say if some daemon is used to
scan the process list). Perhaps we could add a small hack to the
scheduler such that when a task is about to be scheduled off, a signal
can be sent to a designated pid? Similarly, when a task is scheduled
on, another signal may be sent. An application that wanted to have
gang scheduling could then make use of this to STOP/CONT threads.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-19 15:05     ` Hubertus Franke
@ 2002-07-20 16:59       ` Ingo Molnar
  2002-07-19 19:25         ` Hubertus Franke
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2002-07-20 16:59 UTC (permalink / raw)
  To: Hubertus Franke; +Cc: shreenivasa H V, linux-kernel


On Fri, 19 Jul 2002, Hubertus Franke wrote:

> For this it seems sufficient to simply STOP apps on a larger granularity
> and have that done through a user level daemon. The kernel scheduler
> simply schedules the runnable threads that given the U-Sched would
> always amount to a limited number of threads/tasks.

yep, this is my suggestion as well. Any reason to do gang scheduling in
the scheduler and not via a userspace daemon that stops/resumes (and
binds) managed tasks explicitly?

	Ingo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Gang Scheduling in linux
  2002-07-19 22:05           ` Richard Gooch
@ 2002-07-22 19:52             ` Hubertus Franke
  0 siblings, 0 replies; 16+ messages in thread
From: Hubertus Franke @ 2002-07-22 19:52 UTC (permalink / raw)
  To: Richard Gooch; +Cc: Ingo Molnar, shreenivasa H V, linux-kernel

On Friday 19 July 2002 06:05 pm, Richard Gooch wrote:
> Hubertus Franke writes:
> > On a single SMP I could imagine for instance for parallel reendering
> > or similar tightly integrated parallel programs that need data
> > synchronization.  Most of these apps assume a tightly coupled
> > non-virtual resource, i.e., scheduling of tasks is aligned.
> >
> > SGI used to have that stuff in their base kernel. Read a paper about
> > this some years ago.  Again, at the beginning I'd go with a user
> > level scheduler approach that certainly would satisfy national labs
> > etc. Most of the cluster schedulers, like PBS and LoadLeveler etc.,
> > already provide that functionality.
>
> A completely user-level solution may have some disadvantages, though,
> such as delays in scheduling on/off (say if some daemon is used to
> scan the process list). Perhaps we could add a small hack to the
> scheduler such that when a task is about to be scheduled off, a signal
> can be sent to a designated pid? Similarly, when a task is scheduled
> on, another signal may be sent. An application that wanted to have
> gang scheduling could then make use of this to STOP/CONT threads.
>
> 				Regards,
>
> 					Richard....
> Permanent: rgooch@atnf.csiro.au
> Current:   rgooch@ras.ucalgary.ca

I am glad you brought this up. I'd love to have a generic callback for this.
AIX used/has a process change handler that is being called on start/exit.

In Linux, this idea could be done through a generic hook settable through a 
module... that should be sufficient and would allow for other stuff to 
be handled as well. For instance in the presence of fast user level
communication (e.g. user mapped windows to myrinet the current
process could be marked in the communication adapter).

Just thinking loud.

-- 
-- Hubertus Franke  (frankeh@watson.ibm.com)

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2002-07-22 20:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-07-16 22:54 Gang Scheduling in linux shreenivasa H V
2002-07-18 16:21 ` Ingo Molnar
2002-07-17 17:40   ` William Lee Irwin III
2002-07-18 17:40   ` Ingo Molnar
2002-07-17 17:47     ` William Lee Irwin III
2002-07-17 20:14     ` Sam Mason
2002-07-18 20:08       ` Ingo Molnar
2002-07-17 20:39         ` Sam Mason
2002-07-18 20:32           ` Ingo Molnar
2002-07-17 21:24             ` Sam Mason
2002-07-18 12:43             ` Jean Wolter
2002-07-19 15:05     ` Hubertus Franke
2002-07-20 16:59       ` Ingo Molnar
2002-07-19 19:25         ` Hubertus Franke
2002-07-19 22:05           ` Richard Gooch
2002-07-22 19:52             ` Hubertus Franke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).