linux-kernel.vger.kernel.org archive mirror
From: Con Kolivas <kernel@kolivas.org>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@suse.de>, Ingo Molnar <mingo@elte.hu>,
	Thomas Gleixner <tglx@linutronix.de>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Zachary Amsden <zach@vmware.com>,
	James Morris <jmorris@namei.org>,
	john stultz <johnstul@us.ibm.com>,
	Chris Wright <chrisw@sous-sol.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	cpufreq@lists.linux.org.uk,
	Virtualization Mailing List <virtualization@lists.osdl.org>
Subject: Re: Stolen and degraded time and schedulers
Date: Thu, 15 Mar 2007 08:40:48 +1100
Message-ID: <200703150840.49269.kernel@kolivas.org>
In-Reply-To: <200703150836.08670.kernel@kolivas.org>

On Thursday 15 March 2007 08:36, Con Kolivas wrote:
> On Wednesday 14 March 2007 03:31, Jeremy Fitzhardinge wrote:
> > The current Linux scheduler makes one big assumption: that 1ms of CPU
> > time is the same as any other 1ms of CPU time, and that therefore a
> > process makes the same amount of progress regardless of which particular
> > ms of time it gets.
> >
> > This assumption is wrong now, and will become more wrong as
> > virtualization gets more widely used.
> >
> > It's wrong now, because it fails to take into account several kinds
> > of missing time:
> >
> >    1. interrupts - time spent in an ISR is accounted to the current
> >       process, even though it gets no direct benefit
> >    2. SMM - time is completely lost from the kernel
> >    3. slow CPUs - 1ms of 600MHz CPU is less useful than 1ms of 2.4GHz CPU
> >
> > The first two - time lost to interrupts - are a well known problem, and
> > are generally considered to be a non issue.  If you're losing a
> > significant amount of time to interrupts, you probably have bigger
> > problems.  (Or maybe not?)
> >
> > The third is not something I've seen discussed before, but it seems like
> > it could be a significant problem today.  Certainly, I've noticed it
> > myself: an interactive program decides to do something CPU-intensive
> > (like start an animation), and it chugs until the conservative governor
> > brings the CPU up to speed.  Certainly some of this is because it's just
> > plain CPU-starved, but I think another factor is that it gets penalized
> > for running on a slow CPU: 1ms is not 1ms.  And for power reasons you
> > want to encourage processes to run on slow CPUs rather than penalize
> > them.
> >
> > Virtualization just exacerbates this.  If you have a busy machine
> > running multiple virtual CPUs, then each VCPU may only get a small
> > proportion of the total amount of available CPU time.  If the kernel's
> > scheduler asserts that "you were just scheduled for 1ms, therefore you
> > made 1ms of progress", then many timeslices will effectively end up
> > being 1ms of 0MHz CPU - because the VCPU wasn't scheduled and the real
> > CPU was doing something else.
> >
> >
> > So how to deal with this?  Basically we need a clock which measures "CPU
> > work units", and have the scheduler use this clock.
> >
> > A "CPU work unit" clock has these properties:
> >
> >     * inherently per-CPU (from the kernel's perspective, so it would be
> >       per-VCPU in a virtual machine)
> >     * monotonic - you can't do negative work
> >     * measured in "work units"
> >
> > A "work unit" is probably most simply expressed in cycles - you assume a
> > cycle of CPU time is equivalent in terms of work done to any other
> > cycle.  This means that 1 cycle at 600MHz is equivalent to 1 cycle at
> > 2.4GHz - but of course the 2.4GHz processor gets 4 times as many in any
> > real time interval.  (This is the instance where the worst kind of tsc -
> > varying speed which stops on idle - is actually exactly what you want.)
> >
> > You could also measure "work units" in terms of normalized time units:
> > if the fastest CPU on the machine is 2.4GHz, then 1ms of real time is
> > 1ms of work units on that CPU, but only 250us on the 600MHz CPU.
> >
> > It doesn't really matter what the unit is, so long as it is used
> > consistently to measure how much progress all processes made.
>
> I think you're looking for a complex solution to a problem that doesn't
> exist. The job of the process scheduler is to meter out the available cpu
> resources. It cannot make up cycles for a slow cpu or one that is
> throttled. If the problem is happening due to throttling it should be fixed
> by altering the throttle. The example you describe with the conservative
> governor is as easy to fix as changing to the ondemand governor.
> Differential power cpus on an SMP machine should be managed by SMP
> balancing choices based on power groups.
>
> It would be fine to implement some other accounting of this definition of
> time for other purposes

I mean such as for virtualisation purposes.

> but not for process scheduler decisions per se. 

>
> Sorry to chime in late.  My physical condition prevents me spending any
> extended period of time at the computer so I've tried to be succinct with
> my comments and may not be able to reply again.

-- 
-ck
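
The normalized "work unit" accounting proposed in the quoted text can be
sketched in C roughly as follows.  This is only an illustration of the
arithmetic, not code from this thread or from the kernel: the function name
work_units_us(), its parameters, and the idea that a hypervisor reports a
stolen_us figure for the interval are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): convert an interval of real
 * time into normalized work units, expressed as microseconds of the
 * fastest CPU's time.
 *
 *   wall_us   - real time the task was nominally scheduled for
 *   stolen_us - part of wall_us during which the (virtual) CPU was not
 *               actually running; assumed to be reported by a hypervisor
 *   cur_khz   - frequency the CPU ran at during the interval
 *   max_khz   - frequency of the fastest CPU, used as the reference
 */
static uint64_t work_units_us(uint64_t wall_us, uint64_t stolen_us,
                              uint32_t cur_khz, uint32_t max_khz)
{
    /* Assumption of this sketch: the reference is the fastest speed. */
    assert(max_khz != 0 && cur_khz <= max_khz);

    /* A fully stolen interval is "1ms of 0MHz CPU": no work was done. */
    if (stolen_us >= wall_us)
        return 0;

    /* Scale the time actually run by the ratio of current to max speed. */
    return (wall_us - stolen_us) * cur_khz / max_khz;
}
```

Under these assumptions, 1000us of real time on the 2.4GHz CPU counts as
1000 work units, the same 1000us on the 600MHz CPU counts as 250, and an
interval that was entirely stolen counts as 0.  The result only grows with
real time, which matches the monotonic property listed in the quoted text.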