* Linux scheduler capabilities for batch jobs.
@ 2009-06-01 13:41 J Louis
  2009-06-01 16:40 ` Rik van Riel
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: J Louis @ 2009-06-01 13:41 UTC
  To: linux-kernel

Hello All,

I have been writing user land server code and increasingly I find
myself writing resource management code which I think would be better
handled by the scheduler.  The problem can be described as "don't run
this process if the machine is swapping."  I would think that this
would be a common enough need that it was already in the kernel.  I'm
hoping it is and I have simply overlooked it (part of the reason for
this post), but I've looked around a good bit and most of the
scheduler enhancements have to do with real time and latency, not
batch jobs.

My problem is analogous to a parallel make.  Say I have an 8 CPU
machine, and I run "make -j8".  If the total memory of the 8 jobs
throws the machine into swap, it begins to thrash and runtime is
awful.  I believe this is aggravated by the scheduler trying to be
fair, and keeping all 8 processes running.  If it was possible to tell
the scheduler that it was OK not to be fair when scheduling these
processes, I think the total runtime could be reduced if it put some
of the processes to sleep while others completed.  Is there a way to
tell the scheduler it is allowed to do this?  Should there be?

Thank You,

Hands
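
The user-land check described in this message ("don't run this process
if the machine is swapping") can be sketched roughly as follows.  This
is only an illustration, not anything the kernel provides: it assumes
the pswpin/pswpout counters in /proc/vmstat are a good enough signal,
and the function names, the sampling interval, and the threshold of
100 pages per second are all hypothetical choices.

```python
import time


def read_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) parsed from the text of /proc/vmstat."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def is_swapping(before, after, interval_s, threshold_pages_per_s=100.0):
    """True if pages swapped in plus out per second exceed the threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    delta = (after[0] - before[0]) + (after[1] - before[1])
    return delta / interval_s > threshold_pages_per_s


def wait_until_quiet(interval_s=1.0, path="/proc/vmstat"):
    """Block until one sampling interval shows swap activity below threshold."""
    with open(path) as f:
        before = read_swap_counters(f.read())
    while True:
        time.sleep(interval_s)
        with open(path) as f:
            after = read_swap_counters(f.read())
        if not is_swapping(before, after, interval_s):
            return
        before = after
```

A batch launcher could call wait_until_quiet() before starting each new
job.  This only delays new work; it does nothing about jobs that are
already running, which is the part the message asks the scheduler to
handle.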


* Re: Linux scheduler capabilities for batch jobs.
  2009-06-01 13:41 Linux scheduler capabilities for batch jobs J Louis
@ 2009-06-01 16:40 ` Rik van Riel
  2009-06-01 17:04   ` Avi Kivity
  2009-06-03 21:14   ` Bill Davidsen
  2009-06-02  4:57 ` Mike Galbraith
  2009-06-17  9:38 ` Pavel Machek
  2 siblings, 2 replies; 6+ messages in thread
From: Rik van Riel @ 2009-06-01 16:40 UTC
  To: J Louis; +Cc: linux-kernel

J Louis wrote:

> If it was possible to tell
> the scheduler that it was OK not to be fair when scheduling these
> processes, I think the total runtime could be reduced if it put some
> of the processes to sleep while others completed.  Is there a way to
> tell the scheduler it is allowed to do this?  Should there be?

There is no way to do this currently, but I suspect that it
would not be too difficult to add.

Of course, if you have two tasks that are each a little larger
than memory, your idea could lead to one of the processes being
starved forever.  This is probably not acceptable :)

In fact, one single batch process that is swapping could trigger
the algorithm you described, halting itself.  Your idea would
need very careful implementation to avoid these kinds of issues,
but I believe it could definitely be done.

-- 
All rights reversed.


* Re: Linux scheduler capabilities for batch jobs.
  2009-06-01 16:40 ` Rik van Riel
@ 2009-06-01 17:04   ` Avi Kivity
  2009-06-03 21:14   ` Bill Davidsen
  1 sibling, 0 replies; 6+ messages in thread
From: Avi Kivity @ 2009-06-01 17:04 UTC
  To: Rik van Riel; +Cc: J Louis, linux-kernel

Rik van Riel wrote:
> J Louis wrote:
>
>> If it was possible to tell
>> the scheduler that it was OK not to be fair when scheduling these
>> processes, I think the total runtime could be reduced if it put some
>> of the processes to sleep while others completed.  Is there a way to
>> tell the scheduler it is allowed to do this?  Should there be?
>
> There is no way to do this currently, but I suspect that it
> would not be too difficult to add.
>
> Of course, if you have two tasks that are each a little larger
> than memory, your idea could lead to one of the processes being
> starved forever.  This is probably not acceptable :)
>
> In fact, one single batch process that is swapping could trigger
> the algorithm you described, halting itself.  Your idea would
> need very careful implementation to avoid these kinds of issues,
> but I believe it could definitely be done.

Some kind of interaction between the swap token and the scheduler,
perhaps, for SCHED_BATCH processes.


-- 
error compiling committee.c: too many arguments to function
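
For reference, marking a process as SCHED_BATCH from user land can be
sketched as below.  The helper name try_set_batch is hypothetical.
Note that SCHED_BATCH on its own only tells the scheduler that the task
is non-interactive; it does not take swap activity into account, which
is the interaction suggested above and which this sketch does not
provide.

```python
import os


def try_set_batch(pid=0):
    """Ask the kernel to schedule `pid` (0 = calling process) as SCHED_BATCH.

    Returns True on success, False if the platform lacks SCHED_BATCH or
    the kernel refuses the request.
    """
    if not hasattr(os, "SCHED_BATCH") or not hasattr(os, "sched_setscheduler"):
        return False
    try:
        # SCHED_BATCH requires a static priority of 0.
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    except OSError:
        return False
    return True
```

Child processes inherit the policy by default, so a job launcher could
call try_set_batch() once before spawning its workers.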



* Re: Linux scheduler capabilities for batch jobs.
  2009-06-01 13:41 Linux scheduler capabilities for batch jobs J Louis
  2009-06-01 16:40 ` Rik van Riel
@ 2009-06-02  4:57 ` Mike Galbraith
  2009-06-17  9:38 ` Pavel Machek
  2 siblings, 0 replies; 6+ messages in thread
From: Mike Galbraith @ 2009-06-02  4:57 UTC
  To: J Louis; +Cc: linux-kernel

On Mon, 2009-06-01 at 09:41 -0400, J Louis wrote:

> My problem is analogous to a parallel make.  Say I have an 8 CPU
> machine, and I run "make -j8".  If the total memory of the 8 jobs
> throws the machine into swap, it begins to thrash and runtime is
> awful.  

Thrashing is more of a VM/IO scheduling concern.

> I believe this is aggravated by the scheduler trying to be
> fair, and keeping all 8 processes running.  

Yup, fair CPU distribution is the process scheduler's mission, and that
allows tasks to compete for other resources.

> If it was possible to tell
> the scheduler that it was OK not to be fair when scheduling these
> processes, I think the total runtime could be reduced if it put some
> of the processes to sleep while others completed.  

The scheduler doesn't know that any given task _ever_ completes.

> Is there a way to
> tell the scheduler it is allowed to do this?  Should there be?

No, and I don't think it's feasible for existing classes.  You could
invent a new scheduling class, but I think you'd need to invent quite a
bit of infrastructure in the VM to make it work well.

OTOH, the process scheduler doesn't, and shouldn't, make IO resource
decisions; we have IO schedulers to manage who gets what IO bandwidth
when.  The same should apply to VM resources.  Seems to me what you
really want is a VM scheduler.

	-Mike



* Re: Linux scheduler capabilities for batch jobs.
  2009-06-01 16:40 ` Rik van Riel
  2009-06-01 17:04   ` Avi Kivity
@ 2009-06-03 21:14   ` Bill Davidsen
  1 sibling, 0 replies; 6+ messages in thread
From: Bill Davidsen @ 2009-06-03 21:14 UTC
  To: Rik van Riel; +Cc: J Louis, linux-kernel

Rik van Riel wrote:
> J Louis wrote:
> 
>> If it was possible to tell
>> the scheduler that it was OK not to be fair when scheduling these
>> processes, I think the total runtime could be reduced if it put some
>> of the processes to sleep while others completed.  Is there a way to
>> tell the scheduler it is allowed to do this?  Should there be?
> 
> There is no way to do this currently, but I suspect that it
> would not be too difficult to add.
> 
> Of course, if you have two tasks that are each a little larger
> than memory, your idea could lead to one of the processes being
> starved forever.  This is probably not acceptable :)
> 
> In fact, one single batch process that is swapping could trigger
> the algorithm you described, halting itself.  Your idea would
> need very careful implementation to avoid these kinds of issues,
> but I believe it could definitely be done.
> 
I think it gets doubly hard because the processes may have parts swapped even if 
there isn't memory pressure, so you can't just use percentage in memory. If you 
were going to do this in an effective way you would really need to monitor swap 
rate per process, and factor in total size, so it's not trivial. Of course in 
the extreme cases it would be hard to avoid improvement, so it need not be 
perfect to be helpful.

-- 
Bill Davidsen <davidsen@tmr.com>
   Even purely technical things can appear to be magic, if the documentation is
obscure enough. For example, PulseAudio is configured by dancing naked around a
fire at midnight, shaking a rattle with one hand and a LISP manual with the
other, while reciting the GNU manifesto in hexadecimal. The documentation fails
to note that you must circle the fire counter-clockwise in the southern
hemisphere.
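
The per-process measurement described in this message (swap rate
factored together with size, rather than percentage in memory alone)
could be approximated from user land as sketched below.  Everything
here is an assumption for illustration: the function names are made
up, and the major-fault counter from /proc/<pid>/stat is used as a
stand-in for swap-in rate even though it also counts file-backed
page-ins.

```python
def parse_majflt(stat_text):
    """Return the cumulative major-fault count from /proc/<pid>/stat text.

    The command name (field 2) may contain spaces or parentheses, so the
    line is split after the last ')'.  majflt is field 12 overall, which
    is index 9 of the fields that follow the command name.
    """
    rest = stat_text.rsplit(")", 1)[1].split()
    return int(rest[9])


def parse_status_kb(status_text, key):
    """Return a kB value such as VmRSS or VmSwap from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith(key + ":"):
            return int(line.split()[1])
    return 0


def swap_pressure(majflt_before, majflt_after, interval_s, vm_swap_kb, vm_rss_kb):
    """Return (major faults per second, fraction of the process swapped out)."""
    rate = (majflt_after - majflt_before) / interval_s
    total_kb = vm_swap_kb + vm_rss_kb
    swapped_fraction = vm_swap_kb / total_kb if total_kb else 0.0
    return rate, swapped_fraction
```

A process with a large swapped fraction but a fault rate near zero is
the case mentioned above: parts are swapped out without memory
pressure, so the fraction alone would be misleading.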



* Re: Linux scheduler capabilities for batch jobs.
  2009-06-01 13:41 Linux scheduler capabilities for batch jobs J Louis
  2009-06-01 16:40 ` Rik van Riel
  2009-06-02  4:57 ` Mike Galbraith
@ 2009-06-17  9:38 ` Pavel Machek
  2 siblings, 0 replies; 6+ messages in thread
From: Pavel Machek @ 2009-06-17  9:38 UTC
  To: J Louis; +Cc: linux-kernel

On Mon 2009-06-01 09:41:07, J Louis wrote:
> Hello All,
> 
> I have been writing user land server code and increasingly I find
> myself writing resource management code which I think would be better
> handled by the scheduler.  The problem can be described as "don't run
> this process if the machine is swapping."  I would think that this
> would be a common enough need that it was already in the kernel.  I'm
> hoping it is and I have simply overlooked it (part of the reason for
> this post), but I've looked around a good bit and most of the
> scheduler enhancements have to do with real time and latency, not
> batch jobs.
> 
> My problem is analogous to a parallel make.  Say I have an 8 CPU
> machine, and I run "make -j8".  If the total memory of the 8 jobs
> throws the machine into swap, it begins to thrash and runtime is
> awful.  I believe this is aggravated by the scheduler trying to be

This seems to be racy by design.

make -j:
     launch gcc1
     gcc1 does preprocessing (10mb)  
     ok, still not swapping
     launch gcc2
     gcc2 does preprocessing (10mb)  
     ...
     gcc100 does preprocessing (10mb, 1GB total)
now gcc1..gcc100 start optimizing (100mb needed) and boom.

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
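
The race in the trace above can be reproduced with a toy model.  This
is a sketch under stated assumptions: the 2048 MB of RAM in the usage
note is a made-up figure, the 100 MB optimizing footprint is read as
per job, and admission is decided purely on memory in use at launch
time.

```python
def simulate(num_jobs, ram_mb, preprocess_mb=10, optimize_mb=100):
    """Admit jobs while current usage fits in RAM, then let them all grow.

    Returns (jobs admitted, MB in use at admission time, MB needed once
    every admitted job reaches its optimizing phase).
    """
    admitted = 0
    used_mb = 0
    for _ in range(num_jobs):
        # The admission check only sees the small preprocessing footprint.
        if used_mb + preprocess_mb > ram_mb:
            break
        admitted += 1
        used_mb += preprocess_mb
    peak_mb = admitted * optimize_mb
    return admitted, used_mb, peak_mb
```

With simulate(100, 2048), all 100 jobs are admitted at about 1 GB in
use, and together they later need about 10 GB.  The check passed every
time it was made, yet the machine still ends up deep in swap.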

