linux-kernel.vger.kernel.org archive mirror
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Paul Mackerras <paulus@samba.org>
Cc: Dan Hecht <dhecht@vmware.com>,
	dwalker@mvista.com, cpufreq@lists.linux.org.uk,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Con Kolivas <kernel@kolivas.org>,
	Chris Wright <chrisw@sous-sol.org>,
	Virtualization Mailing List <virtualization@lists.osdl.org>,
	john stultz <johnstul@us.ibm.com>, Ingo Molnar <mingo@elte.hu>,
	Thomas Gleixner <tglx@linutronix.de>,
	schwidefsky@de.ibm.com, Rik van Riel <riel@redhat.com>
Subject: Re: Stolen and degraded time and schedulers
Date: Thu, 15 Mar 2007 12:33:47 -0700
Message-ID: <45F99F9B.40806@goop.org>
In-Reply-To: <17912.55404.556847.350399@cargo.ozlabs.ibm.com>

Paul Mackerras wrote:
> A cycle on one thread of a machine with SMT/hyperthreading when the
> other thread is idle *isn't* equivalent to a cycle when the other
> thread is busy.  We run into this on POWER5, where we have hardware
> that counts cycles when each of the two threads in each core gets to
> dispatch instructions (on each cycle, one thread or the other gets to
> dispatch).  That helps but still doesn't give a totally accurate
> estimate of how much computation a given process has managed to do.
>   

Yes, but it doesn't need to be 100% accurate to be useful; it just needs
to characterize the amount of work done better than raw time does.  You
could get a better approximation by using two scaling factors: one for
work done while the other thread is idle, and one for work done while
the other thread is busy.
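As a rough illustration of the two-factor idea, here is a minimal user-space sketch. The function name and both factors are hypothetical: real factors would have to be calibrated per core design, and the cycle counts are assumed to come from per-thread counters like the POWER5 ones described above.

```python
def estimate_work(cycles_sibling_idle: int, cycles_sibling_busy: int,
                  idle_factor: float, busy_factor: float) -> float:
    """Estimate work done by one SMT thread using two scaling factors.

    cycles_sibling_idle: cycles this thread ran while its sibling was idle
    cycles_sibling_busy: cycles this thread ran while its sibling was busy
    idle_factor, busy_factor: hypothetical calibration constants giving
    work per cycle in each mode, relative to some reference.
    """
    if cycles_sibling_idle < 0 or cycles_sibling_busy < 0:
        raise ValueError("cycle counts must be non-negative")
    return idle_factor * cycles_sibling_idle + busy_factor * cycles_sibling_busy


# Assumed (not measured): a busy sibling halves per-cycle throughput.
print(estimate_work(2_000_000, 2_000_000, idle_factor=1.0, busy_factor=0.5))  # 3000000.0
```

The estimate is still an approximation, but it separates the two regimes instead of treating every cycle as equal.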

>> I often nice my kernel builds
>> (which cpufreq takes as a hint to not ramp up the cpu speed) on my
>> laptop so to save power.
>>     
>
> Just as a side note - that's probably actually a bad strategy; you
> almost certainly consume less total energy by running the cpu at full
> speed until the build is done and then going to the deepest sleep mode
> you can achieve.
>   

It seems to me that a 5-minute build at 1/4 power is better than a
2.5-minute build at full power: voltage scaling gives you n^2 scaling of
power use, remember.  Not that I've measured it or anything.
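The arithmetic can be made explicit by comparing energy over the same five-minute window, in units of "full-power minutes". The 1/4 relative power is the unmeasured assumption from the text; idle_power is a hypothetical parameter standing in for whatever the deepest reachable sleep state draws.

```python
def window_energy(active_power: float, active_min: float,
                  idle_power: float, window_min: float) -> float:
    """Energy over a fixed window, in full-power minutes.

    The CPU runs at active_power (relative to full power) for active_min,
    then sits in a sleep state drawing idle_power for the rest of the window.
    """
    if not 0 <= active_min <= window_min:
        raise ValueError("active time must fit inside the window")
    return active_power * active_min + idle_power * (window_min - active_min)


slow = window_energy(0.25, 5.0, idle_power=0.0, window_min=5.0)  # niced, low operating point
fast = window_energy(1.0, 2.5, idle_power=0.0, window_min=5.0)   # race to idle
print(slow, fast)  # 1.25 2.5
```

With these numbers the slow build wins for any non-negative idle power. Racing to idle only comes out ahead if the low operating point draws more than 0.5 + 0.5 × idle_power of full power (from 5p > 2.5 + 2.5 × idle_power), so the real ratio, including static and platform power, is what would settle it.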

> What was the original proposal?  I came into this discussion late...
>   

My core proposal is basically that sched_clock() should try to return a
time which scales with the amount of work done by a CPU, rather than
measuring real time.  This helps solve two problems:

    * it accounts for time stolen by a hypervisor, since a stolen CPU
      does no work
    * it accounts for CPUs running at lower operating points, since they
      do less work per unit time

You could also use it to take into account time stolen by interrupts,
SMM, thermal limiting and so on.  The idea is that processes shouldn't
get penalized for CPU time that they had no opportunity to use.
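A minimal user-space model of such a clock, assuming the caller can report how much of each interval was lost (to a hypervisor, SMM, interrupts and so on) and what frequency the CPU ran at. WorkClock, advance() and the kHz parameters are hypothetical names; integer arithmetic is used because kernel code avoids floating point. Thermal throttling that lowers the effective clock rate could plausibly be folded into cur_khz.

```python
from dataclasses import dataclass


@dataclass
class WorkClock:
    """Hypothetical per-CPU clock that advances with work done, not wall time."""
    max_khz: int      # frequency at which 1 ns of wall time counts as 1 ns of work
    work_ns: int = 0  # accumulated work-scaled nanoseconds

    def advance(self, elapsed_ns: int, lost_ns: int, cur_khz: int) -> int:
        """Account for one interval and return the new clock value.

        elapsed_ns: wall time since the last update
        lost_ns:    part of that interval the CPU could not be used (stolen, SMM, ...)
        cur_khz:    operating frequency during the usable part
        """
        if not 0 <= lost_ns <= elapsed_ns:
            raise ValueError("lost time must lie within the elapsed interval")
        usable_ns = elapsed_ns - lost_ns
        # At half the maximum frequency, each usable nanosecond counts as half.
        self.work_ns += usable_ns * cur_khz // self.max_khz
        return self.work_ns


clock = WorkClock(max_khz=2_000_000)
clock.advance(10_000_000, 0, 2_000_000)                 # full speed, nothing stolen
print(clock.advance(10_000_000, 4_000_000, 1_000_000))  # 13000000
```

The second interval contributes only 3 ms of "work": 4 ms were lost outright and the remaining 6 ms ran at half speed.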

This is almost completely compatible with how sched_clock() is currently
used, except that the scheduler also uses it to measure sleeping time.
That doesn't make much sense under my proposal, because sched_clock() is
an inherently per-CPU time measure, and sleeping by definition doesn't
involve any CPU.  Nor does it make sense to say that a process slept for
less time simply because the CPU it wasn't using was stolen or running
slowly in the meantime.
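To make the sleep-time problem concrete, suppose sleep is measured as the difference between two readings of a work-scaled clock. The helper below is hypothetical and uses the same assumed model: lost time is subtracted, and the remainder is scaled by frequency.

```python
def work_clock_delta(wall_ns: int, lost_ns: int, cur_khz: int, max_khz: int) -> int:
    """How far a hypothetical work-scaled clock advances over wall_ns."""
    return (wall_ns - lost_ns) * cur_khz // max_khz


# A task sleeps for 10 ms. Meanwhile the hypervisor steals the CPU for 4 ms,
# and for the remaining 6 ms the CPU runs at half its maximum frequency.
slept_wall_ns = 10_000_000
slept_measured_ns = work_clock_delta(slept_wall_ns, 4_000_000, 1_000_000, 2_000_000)
print(slept_measured_ns)  # 3000000: the task appears to have slept only 3 ms
```

Any heuristic that rewards sleeping tasks would credit this one with less than a third of its real sleep, although nothing the task did changed.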

Despite this, it works better than expected, because the current
scheduler adjusts the sched_clock-derived process timestamps as
processes move between runqueues, so the timestamps never get too far
out of whack.
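The adjustment amounts to preserving a timestamp's age relative to each runqueue's notion of "now". This is a simplified model, not the scheduler's actual code; the function and parameter names are hypothetical.

```python
def rebase_timestamp(ts: int, src_now: int, dst_now: int) -> int:
    """Move a per-CPU timestamp from the source runqueue's clock to the
    destination's, keeping its age (src_now - ts) unchanged."""
    return ts - src_now + dst_now


# The source CPU's work clock lags because it was stolen or slowed;
# a timestamp 10 ms old there is still 10 ms old after migration.
print(rebase_timestamp(990_000_000, src_now=1_000_000_000, dst_now=1_500_000_000))
# 1490000000
```

The age itself is still measured in the source CPU's units, so cross-CPU skew is bounded by how old the timestamp is rather than eliminated.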

In the Xen paravirt_ops implementation, I've made sched_clock() count
only non-stolen CPU nanoseconds; we'll see how it works out.
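One way to derive "non-stolen" time from a hypervisor's per-vCPU accounting is sketched below. The structure is loosely modelled on Xen's vcpu_runstate_info, which records cumulative time in the running, runnable, blocked and offline states. Treating runnable plus offline as stolen follows the convention commonly used for Xen guests, but this is an illustrative sketch, not the paravirt_ops code.

```python
from dataclasses import dataclass

# State indices, mirroring the order of Xen's RUNSTATE_* constants.
RUNNING, RUNNABLE, BLOCKED, OFFLINE = range(4)


@dataclass(frozen=True)
class RunstateSnapshot:
    """Cumulative nanoseconds a vCPU has spent in each runstate."""
    time: tuple[int, int, int, int]

    def stolen_ns(self) -> int:
        # Wanted a physical CPU but didn't get one (runnable), or was
        # taken offline by the hypervisor (e.g. paused).
        return self.time[RUNNABLE] + self.time[OFFLINE]

    def non_stolen_ns(self) -> int:
        # Running, or blocked by its own choice (idle or sleeping).
        return self.time[RUNNING] + self.time[BLOCKED]


snap = RunstateSnapshot(time=(5_000_000, 2_000_000, 3_000_000, 1_000_000))
print(snap.stolen_ns(), snap.non_stolen_ns())  # 3000000 8000000
```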

    J
