linux-kernel.vger.kernel.org archive mirror
* SCHED_IDLE documentation
       [not found]   ` <20080303051719.GA26102@lst.de>
@ 2008-03-03  6:21     ` Arnd Bergmann
  2008-03-03  7:33       ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Arnd Bergmann @ 2008-03-03  6:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: cbe-oss-dev, Jeremy Kerr, linux-kernel, mingo, Michael Kerrisk

On Monday 03 March 2008, Christoph Hellwig wrote:
> On Mon, Mar 03, 2008 at 06:12:27AM +0100, Arnd Bergmann wrote:
> > > I'm okay with the algorithm, but please don't use policy == SCHED_IDLE
> > > as the flag for it.  SCHED_IDLE is a user-visible and user-settable
> > > scheduler class and we shouldn't overload this. 
> > 
> > Ah, didn't know that. Is that a fairly recent interface in Linux? I can't
> > find any reference to it in the sched_setscheduler man page, though the
> > mail archives have numerous discussions about the semantics throughout
> > the last 10 years.
> 
> It went in with the CFS scheduler and is indeed as far as I can tell
> totally undocumented (sigh..)

So was there no documentation written for it, or does the new man page
just need to make it into distro releases?

If we don't have any man page, what is the actual definition of SCHED_IDLE
anyway?

	Arnd <><

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: SCHED_IDLE documentation
  2008-03-03  6:21     ` SCHED_IDLE documentation Arnd Bergmann
@ 2008-03-03  7:33       ` Ingo Molnar
  2008-03-03  8:40         ` Michael Kerrisk
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-03-03  7:33 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel,
	Michael Kerrisk


* Arnd Bergmann <arnd@arndb.de> wrote:

> If we don't have any man page, what is the actual definition of 
> SCHED_IDLE anyway?

it's rather simple: "it's a priority level even lower priority than nice 
+19".

	Ingo


* Re: SCHED_IDLE documentation
  2008-03-03  7:33       ` Ingo Molnar
@ 2008-03-03  8:40         ` Michael Kerrisk
  2008-03-03  9:24           ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Kerrisk @ 2008-03-03  8:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel

On Mon, Mar 3, 2008 at 8:33 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Arnd Bergmann <arnd@arndb.de> wrote:
>
> > If we don't have any man page, what is the actual definition of
> > SCHED_IDLE anyway?
>
> it's rather simple: "it's a priority level even lower priority than nice
> +19".

Some other questions whose answers may be worth including in the man page:

* When was SCHED_IDLE added?  (Actually, who added it?)

* Why was it added?  (What are the particular benefits of the new
scheduling class as opposed to using a very low nice value for
SCHED_OTHER?)

* What's the difference between SCHED_IDLE and SCHED_BATCH?

Cheers,

Michael


-- 
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug?  Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html


* Re: SCHED_IDLE documentation
  2008-03-03  8:40         ` Michael Kerrisk
@ 2008-03-03  9:24           ` Ingo Molnar
  2008-03-03  9:31             ` Arnd Bergmann
                               ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Ingo Molnar @ 2008-03-03  9:24 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel


* Michael Kerrisk <mtk.manpages@googlemail.com> wrote:

> On Mon, Mar 3, 2008 at 8:33 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > > If we don't have any man page, what is the actual definition of
> > > SCHED_IDLE anyway?
> >
> > it's rather simple: "it's a priority level even lower priority than nice
> > +19".
> 
> Some other questions whose answers may be worth including in the man page:
> 
> * When was SCHED_IDLE added?  (Actually, who added it?)

"git-blame include/linux/sched.h" gives you that information, it was 
added by me as part of CFS:

  commit 0e6aca43e08a62a48d6770e9a159dbec167bf4c6
  Author: Ingo Molnar <mingo@elte.hu>
  Date:   Mon Jul 9 18:51:57 2007 +0200

      sched: add SCHED_IDLE policy
 
> * Why was it added?  (What are the particular benefits of the new 
> scheduling class as opposed to using a very low nice value for 
> SCHED_OTHER?)

because some people wanted even lower priorities than what nice +19 
gave, and extending nice levels wasn't possible for ABI reasons.

> * What's the difference between SCHED_IDLE and SCHED_BATCH?

SCHED_BATCH can still have nice levels from -20 to +19, it is a modified 
SCHED_OTHER/SCHED_NORMAL for "throughput oriented" workloads.

SCHED_IDLE overrides the nice settings and it means a "super idle" 
workload.

	Ingo


* Re: SCHED_IDLE documentation
  2008-03-03  9:24           ` Ingo Molnar
@ 2008-03-03  9:31             ` Arnd Bergmann
  2008-03-03 10:03               ` Michael Kerrisk
  2008-03-03 10:04               ` Ingo Molnar
  2008-03-03 10:07             ` Michael Kerrisk
  2008-03-03 12:31             ` Michael Kerrisk
  2 siblings, 2 replies; 18+ messages in thread
From: Arnd Bergmann @ 2008-03-03  9:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michael Kerrisk, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr,
	linux-kernel

On Monday 03 March 2008, Ingo Molnar wrote:
> > * What's the difference between SCHED_IDLE and SCHED_BATCH?
> 
> SCHED_BATCH can still have nice levels from -20 to +19, it is a modified 
> SCHED_OTHER/SCHED_NORMAL for "throughput oriented" workloads.
> 
> SCHED_IDLE overrides the nice settings and it means a "super idle" 
> workload.

Does that mean that a SCHED_IDLE task still runs some of the time if
you have a CPU hog running on +19, or can any other process starve the
SCHED_IDLE task?

What happens if you have two SCHED_IDLE tasks on a single CPU, do they get
equal share, or will they just run as batch jobs?

	Arnd <><


* Re: SCHED_IDLE documentation
  2008-03-03  9:31             ` Arnd Bergmann
@ 2008-03-03 10:03               ` Michael Kerrisk
  2008-03-03 10:04               ` Ingo Molnar
  1 sibling, 0 replies; 18+ messages in thread
From: Michael Kerrisk @ 2008-03-03 10:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ingo Molnar, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel

On Mon, Mar 3, 2008 at 10:31 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Monday 03 March 2008, Ingo Molnar wrote:
> > > * What's the difference between SCHED_IDLE and SCHED_BATCH?
> >
> > SCHED_BATCH can still have nice levels from -20 to +19, it is a modified
> > SCHED_OTHER/SCHED_NORMAL for "throughput oriented" workloads.
> >
> > SCHED_IDLE overrides the nice settings and it means a "super idle"
> > workload.
>
> Does that mean that a SCHED_IDLE task still runs some of the time if
> you have a CPU hog running on +19, or can any other process starve the
> SCHED_IDLE task?

Thanks.  That was going to be my next question!

I'm guessing that SCHED_IDLE doesn't get completely starved by a CPU
hog running nice +19, but it would still be interesting to know:
roughly how much of the CPU could the SCHED_IDLE process expect to get
in that case?

> What happens if you have two SCHED_IDLE tasks on a single CPU, do they get
> equal share, or will they just run as batch jobs?

Also might be useful to add the answer to this to the man page...

Cheers,

Michael


-- 
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug?  Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html


* Re: SCHED_IDLE documentation
  2008-03-03  9:31             ` Arnd Bergmann
  2008-03-03 10:03               ` Michael Kerrisk
@ 2008-03-03 10:04               ` Ingo Molnar
  2008-03-03 10:12                 ` Michael Kerrisk
  1 sibling, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-03-03 10:04 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Michael Kerrisk, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr,
	linux-kernel


* Arnd Bergmann <arnd@arndb.de> wrote:

> On Monday 03 March 2008, Ingo Molnar wrote:
> > > * What's the difference between SCHED_IDLE and SCHED_BATCH?
> > 
> > SCHED_BATCH can still have nice levels from -20 to +19, it is a modified 
> > SCHED_OTHER/SCHED_NORMAL for "throughput oriented" workloads.
> > 
> > SCHED_IDLE overrides the nice settings and it means a "super idle" 
> > workload.
> 
> Does that mean that a SCHED_IDLE task still runs some of the time if 
> you have a CPU hog running on +19, or can any other process starve the 
> SCHED_IDLE task?

yes, even SCHED_IDLE tasks get some CPU time - so complete starvation 
should not be possible. We don't really want to define the exact amount 
of time they need. (we might want to change that in the future) But it 
will always be less prio than nice +19 :-) You can think of it as if it 
was "nice +30".

> What happens if you have two SCHED_IDLE tasks on a single CPU, do they 
> get equal share, or will they just run as batch jobs?

they get equal share.

you can try it out too: if you have 2.6.23 or newer kernels then just 
pick up schedtool and use "schedtool -I" to run SCHED_IDLE tasks.
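
For anyone without schedtool at hand, roughly the same thing can be done
from a script.  This is only a sketch: it assumes a Linux system and a
Python 3 whose os module exposes sched_setscheduler() and SCHED_IDLE.
The POLICY_NAMES table and the helper names are made up for illustration;
the numeric values are the policy numbers from the Linux headers.

```python
import os

# Policy numbers as defined in the Linux headers; used here only to print a name.
POLICY_NAMES = {
    0: "SCHED_OTHER",
    1: "SCHED_FIFO",
    2: "SCHED_RR",
    3: "SCHED_BATCH",
    5: "SCHED_IDLE",
}


def policy_name(policy):
    """Return a readable name for a numeric scheduling policy."""
    return POLICY_NAMES.get(policy, "unknown(%d)" % policy)


def make_idle(pid=0):
    """Switch pid (0 means the calling process) to SCHED_IDLE.

    SCHED_IDLE only accepts static priority 0, hence sched_param(0).
    Returns the policy the kernel reports afterwards.
    """
    os.sched_setscheduler(pid, os.SCHED_IDLE, os.sched_param(0))
    return os.sched_getscheduler(pid)


if __name__ == "__main__":
    if hasattr(os, "SCHED_IDLE"):
        try:
            print(policy_name(make_idle()))
        except OSError as err:
            print("could not switch to SCHED_IDLE:", err)
    else:
        print("SCHED_IDLE is not available on this platform")
```

To run an existing program this way, call make_idle() and then os.execvp()
the program; the policy is kept across exec.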

	Ingo


* Re: SCHED_IDLE documentation
  2008-03-03  9:24           ` Ingo Molnar
  2008-03-03  9:31             ` Arnd Bergmann
@ 2008-03-03 10:07             ` Michael Kerrisk
  2008-03-03 10:17               ` Ingo Molnar
  2008-03-03 12:31             ` Michael Kerrisk
  2 siblings, 1 reply; 18+ messages in thread
From: Michael Kerrisk @ 2008-03-03 10:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel

On Mon, Mar 3, 2008 at 10:24 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Michael Kerrisk <mtk.manpages@googlemail.com> wrote:
>
> > On Mon, Mar 3, 2008 at 8:33 AM, Ingo Molnar <mingo@elte.hu> wrote:
> > >
> > > * Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > > If we don't have any man page, what is the actual definition of
> > > > SCHED_IDLE anyway?
> > >
> > > it's rather simple: "it's a priority level even lower priority than nice
> > > +19".
> >
> > Some other questions whose answers may be worth including in the man page:
> >
> > * When was SCHED_IDLE added?  (Actually, who added it?)
>
> "git-blame include/linux/sched.h" gives you that information, it was
> added by me as part of CFS:
>
>  commit 0e6aca43e08a62a48d6770e9a159dbec167bf4c6
>  Author: Ingo Molnar <mingo@elte.hu>
>  Date:   Mon Jul 9 18:51:57 2007 +0200

Yep -- I found it later -- thanks.

Ingo, could you please CC me when kernel-userland API changes go into
mainline?  Otherwise, they potentially end up undocumented, and
un(der)used.

Thanks,

Michael


-- 
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug?  Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html


* Re: SCHED_IDLE documentation
  2008-03-03 10:04               ` Ingo Molnar
@ 2008-03-03 10:12                 ` Michael Kerrisk
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Kerrisk @ 2008-03-03 10:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel

On Mon, Mar 3, 2008 at 11:04 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Arnd Bergmann <arnd@arndb.de> wrote:
>
> > On Monday 03 March 2008, Ingo Molnar wrote:
> > > > * What's the difference between SCHED_IDLE and SCHED_BATCH?
> > >
> > > SCHED_BATCH can still have nice levels from -20 to +19, it is a modified
> > > SCHED_OTHER/SCHED_NORMAL for "throughput oriented" workloads.
> > >
> > > SCHED_IDLE overrides the nice settings and it means a "super idle"
> > > workload.
> >
> > Does that mean that a SCHED_IDLE task still runs some of the time if
> > you have a CPU hog running on +19, or can any other process starve the
> > SCHED_IDLE task?
>
> yes, even SCHED_IDLE tasks get some CPU time - so complete starvation
> should not be possible. We don't really want to define the exact amount
> of time they need. (we might want to change that in the future) But it
> will always be less prio than nice +19 :-) You can think of it as if it
> was "nice +30".
>
> > What happens if you have two SCHED_IDLE tasks on a single CPU, do they
> > get equal share, or will they just run as batch jobs?
>
> they get equal share.
>
> you can try it out too: if you have 2.6.23 or newer kernels then just
> pick up schedtool and use "schedtool -I" to run SCHED_IDLE tasks.

Thanks for the answers Ingo!

Cheers,

Michael

-- 
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug?  Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html


* Re: SCHED_IDLE documentation
  2008-03-03 10:07             ` Michael Kerrisk
@ 2008-03-03 10:17               ` Ingo Molnar
  2008-03-03 10:20                 ` Michael Kerrisk
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-03-03 10:17 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr,
	linux-kernel, Andrew Morton


* Michael Kerrisk <mtk.manpages@googlemail.com> wrote:

> >  commit 0e6aca43e08a62a48d6770e9a159dbec167bf4c6
> >  Author: Ingo Molnar <mingo@elte.hu>
> >  Date:   Mon Jul 9 18:51:57 2007 +0200
> 
> Yep -- I found it later -- thanks.
> 
> Ingo, could you please CC me when kernel-userland API changes go into 
> mainline?  Otherwise, they potentially end up undocumented, and 
> un(der)used.

sure.

We could also add an in-commit-message marker for such changes, that you 
could periodically scan. Something like:

   User-ABI-extended-by: Ingo Molnar <mingo@elte.hu>

and:

   User-ABI-modified-by: Ingo Molnar <mingo@elte.hu>

that way you also know whom to contact about followup questions.

Declaring such changes would have other benefits as well: the review 
process becomes more streamlined. Also, any ABI side-effect would be 
known to be intentional versus unintentional, based on the commit 
headers alone. Undeclared ABI side-effects would be frowned upon and 
would be strong grounds for immediate reversal as well.
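
The "periodically scan" part could look something like the sketch below.
It assumes the two tag names proposed above were actually adopted (they
are only a proposal in this thread) and that the commit messages are
already available as strings, e.g. from "git log"; find_abi_markers() is
a hypothetical helper name.

```python
import re

# Matches the two proposed trailers, one per line:
#   User-ABI-extended-by: Name <email>
#   User-ABI-modified-by: Name <email>
ABI_TAG = re.compile(
    r"^[ \t]*User-ABI-(extended|modified)-by:[ \t]*(.+?)[ \t]*$",
    re.MULTILINE,
)


def find_abi_markers(commit_message):
    """Return a list of (kind, contact) tuples found in one commit message."""
    return [(m.group(1), m.group(2)) for m in ABI_TAG.finditer(commit_message)]


if __name__ == "__main__":
    example = (
        "sched: add SCHED_IDLE policy\n"
        "\n"
        "Signed-off-by: Ingo Molnar <mingo@elte.hu>\n"
        "User-ABI-extended-by: Ingo Molnar <mingo@elte.hu>\n"
    )
    for kind, contact in find_abi_markers(example):
        print(kind, contact)
```

A commit without either tag simply yields an empty list, so the scan can
be run over every commit in a release range.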

	Ingo


* Re: SCHED_IDLE documentation
  2008-03-03 10:17               ` Ingo Molnar
@ 2008-03-03 10:20                 ` Michael Kerrisk
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Kerrisk @ 2008-03-03 10:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr,
	linux-kernel, Andrew Morton

On Mon, Mar 3, 2008 at 11:17 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
>  * Michael Kerrisk <mtk.manpages@googlemail.com> wrote:
>
>
> > >  commit 0e6aca43e08a62a48d6770e9a159dbec167bf4c6
>  > >  Author: Ingo Molnar <mingo@elte.hu>
>  > >  Date:   Mon Jul 9 18:51:57 2007 +0200
>  >
>  > Yep -- I found it later -- thanks.
>  >
>  > Ingo, could you please CC me when kernel-userland API changes go into
>  > mainline?  Otherwise, they potentially end up undocumented, and
>  > un(der)used.
>
>  sure.
>
>  We could also add an in-commit-message marker for such changes, that you
>  could periodically scan. Something like:
>
>    User-ABI-extended-by: Ingo Molnar <mingo@elte.hu>
>
>  and:
>
>    User-ABI-modified-by: Ingo Molnar <mingo@elte.hu>
>
>  that way you also know whom to contact about followup questions.
>
>  Declaring such changes would have other benefits as well: the review
>  process becomes more streamlined. Also, any ABI side-effect would be
>  known to be intentional versus unintentional, based on the commit
>  headers alone. Undeclared ABI side-effects would be frowned upon and
>  would be strong grounds for immediate reversal as well.

Sounds like an excellent idea, if we could make it fly.  How can we make it fly?

Cheers,

Michael


-- 
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug?  Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html


* Re: SCHED_IDLE documentation
  2008-03-03  9:24           ` Ingo Molnar
  2008-03-03  9:31             ` Arnd Bergmann
  2008-03-03 10:07             ` Michael Kerrisk
@ 2008-03-03 12:31             ` Michael Kerrisk
  2008-03-03 12:52               ` Ingo Molnar
  2 siblings, 1 reply; 18+ messages in thread
From: Michael Kerrisk @ 2008-03-03 12:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel

Ingo,

> > * What's the difference between SCHED_IDLE and SCHED_BATCH?
>
> SCHED_BATCH can still have nice levels from -20 to +19, it is a modified
> SCHED_OTHER/SCHED_NORMAL for "throughput oriented" workloads.

So, suppose we have two CPU intensive jobs, one SCHED_OTHER and the
other SCHED_BATCH.  If they have the same nice value, will/should the
scheduler favour one over the other?

I've done some testing on 2.6.25-rc2, x86-32 for this case, and it
appears that the two jobs are treated the same by the scheduler (each
gets 50% of CPU).  Is that expected behavior?  If it is, can you give
an example where scheduling SCHED_OTHER versus SCHED_BATCH should show
a difference in the amount of CPU received by each process?

> SCHED_IDLE overrides the nice settings and it means a "super idle"
> workload.

Tested on 2.6.25-rc2, x86-32.  Two CPU intensive jobs, one
SCHED_OTHER, nice=+19, the other SCHED_IDLE.  The SCHED_OTHER job gets
~88% of CPU.  So SCHED_IDLE does indeed give a "super low nice"
effect.
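
The ~88% figure is about what one would expect if the CPU is split in
proportion to per-task load weights.  A back-of-the-envelope check,
assuming (this is not stated in the thread) a weight of 15 for nice +19
and a weight of 2 for SCHED_IDLE tasks, which is how I believe
kernel/sched.c defined them around 2.6.23-2.6.25:

```python
# Assumed load weights; both values are assumptions taken from memory of
# kernel/sched.c of that era, not from this thread.
WEIGHT_NICE_19 = 15   # last entry of the nice-to-weight table
WEIGHT_IDLEPRIO = 2   # weight used for SCHED_IDLE tasks


def cpu_share(weight, other_weight):
    """Fraction of one CPU a task gets when competing with one other task,
    if time is divided in proportion to the two weights."""
    return weight / (weight + other_weight)


if __name__ == "__main__":
    share = cpu_share(WEIGHT_NICE_19, WEIGHT_IDLEPRIO)
    print("nice +19 task: %.1f%%" % (100 * share))
    print("SCHED_IDLE task: %.1f%%" % (100 * (1 - share)))
```

With those weights the nice +19 job comes out at 15/17 of the CPU, i.e.
a little over 88%, in line with the measurement above.  If the weights
differ on another kernel version, the split will differ too.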

Cheers,

Michael

-- 
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug?  Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html


* Re: SCHED_IDLE documentation
  2008-03-03 12:31             ` Michael Kerrisk
@ 2008-03-03 12:52               ` Ingo Molnar
  2008-03-03 14:06                 ` Michael Kerrisk
  2008-03-03 14:42                 ` Michael Kerrisk
  0 siblings, 2 replies; 18+ messages in thread
From: Ingo Molnar @ 2008-03-03 12:52 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel


* Michael Kerrisk <mtk.manpages@googlemail.com> wrote:

> > > * What's the difference between SCHED_IDLE and SCHED_BATCH?
> >
> > SCHED_BATCH can still have nice levels from -20 to +19, it is a 
> > modified SCHED_OTHER/SCHED_NORMAL for "throughput oriented" 
> > workloads.
> 
> So, suppose we have two CPU intensive jobs, one SCHED_OTHER and the 
> other SCHED_BATCH.  If they have the same nice value, will/should the 
> scheduler favour one over the other?

yes - SCHED_BATCH does not modify the CPU usage proportion for 
CPU-intense tasks, it's their nice value that controls the proportion. 
What it will influence is wakeup behavior - i.e. wakeup-intense 
workloads should schedule less with SCHED_BATCH. (but how that is done 
is really fluid and will probably be tweaked in the future.)

	Ingo


* Re: SCHED_IDLE documentation
  2008-03-03 12:52               ` Ingo Molnar
@ 2008-03-03 14:06                 ` Michael Kerrisk
  2008-03-04 11:11                   ` Peter Zijlstra
  2008-03-03 14:42                 ` Michael Kerrisk
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Kerrisk @ 2008-03-03 14:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel

Ingo,

On Mon, Mar 3, 2008 at 1:52 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
>  * Michael Kerrisk <mtk.manpages@googlemail.com> wrote:
>
>
> > > > * What's the difference between SCHED_IDLE and SCHED_BATCH?
>  > >
>  > > SCHED_BATCH can still have nice levels from -20 to +19, it is a
>  > > modified SCHED_OTHER/SCHED_NORMAL for "throughput oriented"
>  > > workloads.
>  >
>  > So, suppose we have two CPU intensive jobs, one SCHED_OTHER and the
>  > other SCHED_BATCH.  If they have the same nice value, will/should the
>  > scheduler favour one over the other?
>
>  yes - SCHED_BATCH does not modify the CPU usage proportion for
>  CPU-intense tasks, it's their nice value that controls the proportion.
>  What it will influence is wakeup behavior - i.e. wakeup-intense
>  workloads should schedule less with SCHED_BATCH. (but how that is done
>  is really fluid and will probably be tweaked in the future.)
>
>         Ingo

So, I've tweaked the description of SCHED_BATCH in the
sched_setscheduler.2 man page, and added some text describing
SCHED_IDLE.  Relevant excerpts below.  Does this look okay to you?

       SCHED_OTHER is the default universal time-sharing  sched-
       uler  policy  used  by  most  processes.   SCHED_BATCH is
       intended  for  "batch"  style  execution  of   processes.
       SCHED_IDLE  is  intended  for  running  very low priority
       background jobs.  SCHED_FIFO and  SCHED_RR  are  intended
       for  special time-critical applications that need precise
       control over the way  in  which  runnable  processes  are
       selected for execution.

       Processes  scheduled  with  SCHED_OTHER,  SCHED_BATCH, or
       SCHED_IDLE must be assigned the static priority 0.   Pro-
       cesses  scheduled under SCHED_FIFO or SCHED_RR can have a
       static priority in the range 1 to 99.
       ...

   SCHED_BATCH: Scheduling batch processes
       (Since Linux 2.6.16.)  SCHED_BATCH can only  be  used  at
       static   priority   0.    This   policy   is  similar  to
       SCHED_OTHER, except that it will cause the  scheduler  to
       always  assume that the process is CPU-intensive.  Conse-
       quently, the scheduler  will  apply  a  small  scheduling
       penalty  with  respect  to wakeup behaviour, so that this
       process is mildly  disfavored  in  scheduling  decisions.
       This policy is useful for workloads that are non-interac-
       tive, but do not want to lower their nice value, and  for
       workloads  that  want  a  deterministic scheduling policy
       without interactivity causing extra preemptions  (between
       the workload's tasks).

   SCHED_IDLE: Scheduling very low priority jobs
       (Since  Linux  2.6.23.)   SCHED_IDLE  can only be used at
       static priority 0; the process nice value has  no  influ-
       ence  for  this policy.  This policy is intended for run-
       ning jobs at extremely low priority (lower  even  than  a
       +19  nice value with the SCHED_OTHER or SCHED_BATCH poli-
       cies).
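
The static priority rule in the excerpt can be written down as a small
check.  This is just an illustration of the rule as worded above, not
kernel code; policies are plain strings so that it runs anywhere, and
static_priority_ok() is a made-up name.

```python
# Policies that must use static priority 0, per the excerpt above.
ZERO_PRIORITY_POLICIES = {"SCHED_OTHER", "SCHED_BATCH", "SCHED_IDLE"}
# Policies that take a static priority in the range 1..99.
REALTIME_POLICIES = {"SCHED_FIFO", "SCHED_RR"}


def static_priority_ok(policy, priority):
    """Return True if (policy, priority) is a valid combination."""
    if policy in ZERO_PRIORITY_POLICIES:
        return priority == 0
    if policy in REALTIME_POLICIES:
        return 1 <= priority <= 99
    raise ValueError("unknown policy: %r" % (policy,))


if __name__ == "__main__":
    print(static_priority_ok("SCHED_IDLE", 0))
    print(static_priority_ok("SCHED_BATCH", 10))
    print(static_priority_ok("SCHED_FIFO", 50))
```

Note that this concerns the static priority only; the nice value is a
separate setting.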

Cheers,

Michael

-- 
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug?  Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html


* Re: SCHED_IDLE documentation
  2008-03-03 12:52               ` Ingo Molnar
  2008-03-03 14:06                 ` Michael Kerrisk
@ 2008-03-03 14:42                 ` Michael Kerrisk
  2008-03-05 15:02                   ` Michael Kerrisk
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Kerrisk @ 2008-03-03 14:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel

Ingo,

One more thought while we're in this thread.  In my recent tests I
notice that the magnitude of the effect of nice values has changed
quite a lot in recent times.  For example, back in 2.6.18, two CPU
intensive processes would get the following shares of the CPU over 100
seconds of run time (here, three different examples of nice value
settings):

A nice	B nice	%A	%B
-20	-10	58.3	41.7
-20	  0	89.2	10.8
-20	+19	99.7	 0.3

In 2.6.25-rc2, we have:

A nice	B nice	%A	%B
-20	-10	 90.5	9.5
-20	  0	 99.0	1.0
-20	+19	100.0	0.0

While I realise that nice values are not intended to guarantee any
particular degree of access to the CPU, these wide variations in the
effect of the nice value are surprising.  (For the 2.6.25-rc2 -20/+19
case, my test shows the low priority process is getting 0.000% of the
CPU -- i.e., < one thousandth of a percent.  In other words it is in
effect being totally starved of the CPU.)  Any comments?

Cheers,

Michael


* Re: SCHED_IDLE documentation
  2008-03-03 14:06                 ` Michael Kerrisk
@ 2008-03-04 11:11                   ` Peter Zijlstra
  2008-03-05 15:19                     ` Michael Kerrisk
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2008-03-04 11:11 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Ingo Molnar, Arnd Bergmann, Christoph Hellwig, cbe-oss-dev,
	Jeremy Kerr, linux-kernel


On Mon, 2008-03-03 at 15:06 +0100, Michael Kerrisk wrote:
> Ingo,
> 
> On Mon, Mar 3, 2008 at 1:52 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> >  * Michael Kerrisk <mtk.manpages@googlemail.com> wrote:
> >
> >
> > > > > * What's the difference between SCHED_IDLE and SCHED_BATCH?
> >  > >
> >  > > SCHED_BATCH can still have nice levels from -20 to +19, it is a
> >  > > modified SCHED_OTHER/SCHED_NORMAL for "throughput oriented"
> >  > > workloads.
> >  >
> >  > So, suppose we have two CPU intensive jobs, one SCHED_OTHER and the
> >  > other SCHED_BATCH.  If they have the same nice value, will/should the
> >  > scheduler favour one over the other?
> >
> >  yes - SCHED_BATCH does not modify the CPU usage proportion for
> >  CPU-intense tasks, it's their nice value that controls the proportion.
> >  What it will influence is wakeup behavior - i.e. wakeup-intense
> >  workloads should schedule less with SCHED_BATCH. (but how that is done
> >  is really fluid and will probably be tweaked in the future.)
> >
> >         Ingo
> 
> So, I've tweaked the description of SCHED_BATCH in the
> sched_setscheduler.2 man page, and added some text describing
> SCHED_IDLE.  Relevant excerpts below.  Does this look okay to you?
> 
>        SCHED_OTHER is the default universal time-sharing  sched-
>        uler  policy  used  by  most  processes.   SCHED_BATCH is
>        intended  for  "batch"  style  execution  of   processes.
>        SCHED_IDLE  is  intended  for  running  very low priority
>        background jobs.  SCHED_FIFO and  SCHED_RR  are  intended
>        for  special time-critical applications that need precise
>        control over the way  in  which  runnable  processes  are
>        selected for execution.
> 
>        Processes  scheduled  with  SCHED_OTHER,  SCHED_BATCH, or
>        SCHED_IDLE must be assigned the static priority 0.   Pro-
>        cesses  scheduled under SCHED_FIFO or SCHED_RR can have a
>        static priority in the range 1 to 99.
>        ...
> 
>    SCHED_BATCH: Scheduling batch processes
>        (Since Linux 2.6.16.)  SCHED_BATCH can only  be  used  at
>        static   priority   0.    This   policy   is  similar  to
>        SCHED_OTHER, except that it will cause the  scheduler  to
>        always  assume that the process is CPU-intensive.  Conse-
>        quently, the scheduler  will  apply  a  small  scheduling
>        penalty  with  respect  to wakeup behaviour, so that this
>        process is mildly  disfavored  in  scheduling  decisions.
>        This policy is useful for workloads that are non-interac-
>        tive, but do not want to lower their nice value, and  for
>        workloads  that  want  a  deterministic scheduling policy
>        without interactivity causing extra preemptions  (between
>        the workload's tasks).
> 
>    SCHED_IDLE: Scheduling very low priority jobs
>        (Since  Linux  2.6.23.)   SCHED_IDLE  can only be used at
>        static priority 0; the process nice value has  no  influ-
>        ence  for  this policy.  This policy is intended for run-
>        ning jobs at extremely low priority (lower  even  than  a
>        +19  nice value with the SCHED_OTHER or SCHED_BATCH poli-
>        cies).

Your SCHED_BATCH and SCHED_IDLE descriptions seem at odds, in that your
SCHED_IDLE description says you can run SCHED_BATCH +19, however your
SCHED_BATCH description says you can only run at nice 0.

To clarify: SCHED_BATCH _can_ indeed use the full nice range.




* Re: SCHED_IDLE documentation
  2008-03-03 14:42                 ` Michael Kerrisk
@ 2008-03-05 15:02                   ` Michael Kerrisk
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Kerrisk @ 2008-03-05 15:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnd Bergmann, Christoph Hellwig, cbe-oss-dev, Jeremy Kerr, linux-kernel

On Mon, Mar 3, 2008 at 3:42 PM, Michael Kerrisk
<mtk.manpages@googlemail.com> wrote:
> Ingo,
>
>  One more thought while we're in this thread.  In my recent tests I
>  notice that the magnitude of the effect of nice values has changed
>  quite a lot in recent times.  For example, back in 2.6.18, two CPU
>  intensive processes would get the following shares of the CPU over 100
>  seconds of run time (here, three different examples of nice value
>  settings):
>
>  A nice  B nice  %A      %B
>  -20     -10     58.3    41.7
>  -20       0     89.2    10.8
>  -20     +19     99.7     0.3
>
>  In 2.6.25-rc2, we have:
>
>  A nice  B nice  %A      %B
>  -20     -10      90.5   9.5
>  -20       0      99.0   1.0
>  -20     +19     100.0   0.0
>
>  While I realise that nice values are not intended to guarantee any
>  particular degree of access to the CPU, these wide variations in the
>  effect of the nice value are surprising.  (For the 2.6.25-rc2 -20/+19
>  case, my test shows the low priority process is getting 0.000% of the
>  CPU -- i.e., < one thousandth of a percent.  In other words it is in
>  effect being totally starved of the CPU.)  Any comments?

Off list, Ingo pointed me at
Documentation/scheduler/sched-nice-design.txt which explains the
changes that took effect for nice values in 2.6.23.  I've added a
reference to that doc to the getpriority.2 page, and also added the
following text to NOTES on that page:

       The degree to which their relative nice value affects the
       scheduling  of processes varies across Unix systems, and,
       on Linux, across kernel versions.  Starting  with  kernel
       2.6.23,  Linux  adopted an algorithm that causes relative
       differences in  nice  values  to  have  a  much  stronger
       effect.   This causes very low nice values (+19) to truly
       provide little CPU to a process  whenever  there  is  any
       other  higher priority load on the system, and makes high
       nice values (-20) deliver most of the CPU to applications
       that require it (e.g., some audio applications).

I also tweaked my test program to deliver slightly more accurate info.
With two CPU bound SCHED_OTHER processes, one at nice -20, the other
at nice +19, I found that the latter process is not _completely_
starved of the CPU: it gets about 0.006% of the CPU during a 5-minute
period.
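
As a rough cross-check of the 2.6.25-rc2 numbers, here is a sketch that
assumes each nice level scales a task's weight by about 1.25x, which is
my understanding of the design described in sched-nice-design.txt.  The
1.25 step and the function name are assumptions; the real kernel uses a
precomputed weight table, so treat the output as approximate.

```python
def expected_share(nice_a, nice_b, step=1.25):
    """Approximate CPU fraction for task A when A and B are both CPU bound.

    Assumes weight(nice) is proportional to step ** (-nice), so the ratio
    of the two weights depends only on the nice difference.
    """
    ratio = step ** (nice_b - nice_a)  # weight_a / weight_b
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    for nice_a, nice_b in [(-20, -10), (-20, 0), (-20, 19)]:
        share_a = expected_share(nice_a, nice_b)
        print("%4d %4d  A=%.3f%%  B=%.3f%%"
              % (nice_a, nice_b, 100 * share_a, 100 * (1 - share_a)))
```

For -20/-10 and -20/0 this gives roughly 90.3% and 98.9% for A, close to
the measured 90.5% and 99.0%.  For -20/+19 the model leaves B with a
share on the order of a hundredth of a percent, the same ballpark as the
measurement but not an exact match.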

Cheers,

Michael

-- 
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug?  Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html


* Re: SCHED_IDLE documentation
  2008-03-04 11:11                   ` Peter Zijlstra
@ 2008-03-05 15:19                     ` Michael Kerrisk
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Kerrisk @ 2008-03-05 15:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnd Bergmann, Christoph Hellwig, cbe-oss-dev,
	Jeremy Kerr, linux-kernel

On Tue, Mar 4, 2008 at 12:11 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
>
>  On Mon, 2008-03-03 at 15:06 +0100, Michael Kerrisk wrote:
>  > Ingo,
>  >
>  > On Mon, Mar 3, 2008 at 1:52 PM, Ingo Molnar <mingo@elte.hu> wrote:
>  > >
>  > >  * Michael Kerrisk <mtk.manpages@googlemail.com> wrote:
>  > >
>  > >
>  > > > > > * What's the difference between SCHED_IDLE and SCHED_BATCH?
>  > >  > >
>  > >  > > SCHED_BATCH can still have nice levels from -20 to +19, it is a
>  > >  > > modified SCHED_OTHER/SCHED_NORMAL for "throughput oriented"
>  > >  > > workloads.
>  > >  >
>  > >  > So, suppose we have two CPU intensive jobs, one SCHED_OTHER and the
>  > >  > other SCHED_BATCH.  If they have the same nice value, will/should the
>  > >  > scheduler favour one over the other?
>  > >
>  > >  no - SCHED_BATCH does not modify the CPU usage proportion for
>  > >  CPU-intense tasks, it's their nice value that controls the proportion.
>  > >  What it will influence is wakeup behavior - i.e. wakeup-intense
>  > >  workloads should schedule less with SCHED_BATCH. (but how that is done
>  > >  is really fluid and will probably be tweaked in the future.)
>  > >
>  > >         Ingo
>  >
>  > So, I've tweaked the description of SCHED_BATCH in the
>  > sched_setscheduler.2 man page, and added some text describing
>  > SCHED_IDLE.  Relevant excerpts below.  Does this look okay to you?
>  >
>  >        SCHED_OTHER is the default universal time-sharing  sched-
>  >        uler  policy  used  by  most  processes.   SCHED_BATCH is
>  >        intended  for  "batch"  style  execution  of   processes.
>  >        SCHED_IDLE  is  intended  for  running  very low priority
>  >        background jobs.  SCHED_FIFO and  SCHED_RR  are  intended
>  >        for  special time-critical applications that need precise
>  >        control over the way  in  which  runnable  processes  are
>  >        selected for execution.
>  >
>  >        Processes  scheduled  with  SCHED_OTHER,  SCHED_BATCH, or
>  >        SCHED_IDLE must be assigned the static priority 0.   Pro-
>  >        cesses  scheduled under SCHED_FIFO or SCHED_RR can have a
>  >        static priority in the range 1 to 99.
>  >        ...
>  >
>  >    SCHED_BATCH: Scheduling batch processes
>  >        (Since Linux 2.6.16.)  SCHED_BATCH can only  be  used  at
>  >        static   priority   0.    This   policy   is  similar  to
>  >        SCHED_OTHER, except that it will cause the  scheduler  to
>  >        always  assume that the process is CPU-intensive.  Conse-
>  >        quently, the scheduler  will  apply  a  small  scheduling
>  >        penalty  with  respect  to wakeup behaviour, so that this
>  >        process is mildly  disfavored  in  scheduling  decisions.
>  >        This policy is useful for workloads that are non-interac-
>  >        tive, but do not want to lower their nice value, and  for
>  >        workloads  that  want  a  deterministic scheduling policy
>  >        without interactivity causing extra preemptions  (between
>  >        the workload's tasks).
>  >
>  >    SCHED_IDLE: Scheduling very low priority jobs
>  >        (Since  Linux  2.6.23.)   SCHED_IDLE  can only be used at
>  >        static priority 0; the process nice value has  no  influ-
>  >        ence  for  this policy.  This policy is intended for run-
>  >        ning jobs at extremely low priority (lower  even  than  a
>  >        +19  nice value with the SCHED_OTHER or SCHED_BATCH poli-
>  >        cies).
>
>  Your SCHED_BATCH and SCHED_IDLE descriptions seem at odds, in that your
>  SCHED_IDLE description says you can run SCHED_BATCH +19, however your
>  SCHED_BATCH description says you can only run at nice 0.
>
>  To clarify SCHED_BATCH _can_ indeed use the full nice range.

Peter,

The problem is that the page is slightly confusing in that it talks
about two different types of priorities:

"static priorities" which can only be non-zero (1 to 99) for RR and
FIFO policies.

"dynamic priorities" (aka "nice value") which can be (in userspace) -20 to +19.

I tweaked the wording in the description of SCHED_BATCH a little to
make this clearer.

Thanks,

Michael



end of thread, other threads:[~2008-03-05 15:19 UTC | newest]

Thread overview: 18+ messages
     [not found] <1203376368.275756.252634247263.1.gpush@pokey>
     [not found] ` <200803030612.28039.arnd@arndb.de>
     [not found]   ` <20080303051719.GA26102@lst.de>
2008-03-03  6:21     ` SCHED_IDLE documentation Arnd Bergmann
2008-03-03  7:33       ` Ingo Molnar
2008-03-03  8:40         ` Michael Kerrisk
2008-03-03  9:24           ` Ingo Molnar
2008-03-03  9:31             ` Arnd Bergmann
2008-03-03 10:03               ` Michael Kerrisk
2008-03-03 10:04               ` Ingo Molnar
2008-03-03 10:12                 ` Michael Kerrisk
2008-03-03 10:07             ` Michael Kerrisk
2008-03-03 10:17               ` Ingo Molnar
2008-03-03 10:20                 ` Michael Kerrisk
2008-03-03 12:31             ` Michael Kerrisk
2008-03-03 12:52               ` Ingo Molnar
2008-03-03 14:06                 ` Michael Kerrisk
2008-03-04 11:11                   ` Peter Zijlstra
2008-03-05 15:19                     ` Michael Kerrisk
2008-03-03 14:42                 ` Michael Kerrisk
2008-03-05 15:02                   ` Michael Kerrisk
