From: Con Kolivas <kernel@kolivas.org>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@suse.de>, Ingo Molnar <mingo@elte.hu>,
Thomas Gleixner <tglx@linutronix.de>,
Rusty Russell <rusty@rustcorp.com.au>,
Zachary Amsden <zach@vmware.com>,
James Morris <jmorris@namei.org>,
john stultz <johnstul@us.ibm.com>,
Chris Wright <chrisw@sous-sol.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
cpufreq@lists.linux.org.uk,
Virtualization Mailing List <virtualization@lists.osdl.org>
Subject: Re: Stolen and degraded time and schedulers
Date: Thu, 15 Mar 2007 08:40:48 +1100
Message-ID: <200703150840.49269.kernel@kolivas.org>
In-Reply-To: <200703150836.08670.kernel@kolivas.org>

On Thursday 15 March 2007 08:36, Con Kolivas wrote:
> On Wednesday 14 March 2007 03:31, Jeremy Fitzhardinge wrote:
> > The current Linux scheduler makes one big assumption: that 1ms of CPU
> > time is the same as any other 1ms of CPU time, and that therefore a
> > process makes the same amount of progress regardless of which particular
> > ms of time it gets.
> >
> > This assumption is wrong now, and will become more wrong as
> > virtualization gets more widely used.
> >
> > It's wrong now, because it fails to take into account several kinds
> > of missing time:
> >
> > 1. interrupts - time spent in an ISR is accounted to the current
> > process, even though it gets no direct benefit
> > 2. SMM - time is completely lost from the kernel
> > 3. slow CPUs - 1ms of 600MHz CPU is less useful than 1ms of 2.4GHz CPU
> >
> > The first two - time lost to interrupts - are a well-known problem, and
> > are generally considered to be a non-issue. If you're losing a
> > significant amount of time to interrupts, you probably have bigger
> > problems. (Or maybe not?)
> >
> > The third is not something I've seen discussed before, but it seems like
> > it could be a significant problem today. Certainly, I've noticed it
> > myself: an interactive program decides to do something CPU-intensive
> > (like start an animation), and it chugs until the conservative governor
> > brings the CPU up to speed. Certainly some of this is because it's just
> > plain CPU-starved, but I think another factor is that it gets penalized
> > for running on a slow CPU: 1ms is not 1ms. And for power reasons you
> > want to encourage processes to run on slow CPUs rather than penalize
> > them.
> >
> > Virtualization just exacerbates this. If you have a busy machine
> > running multiple virtual CPUs, then each VCPU may only get a small
> > proportion of the total amount of available CPU time. If the kernel's
> > scheduler asserts that "you were just scheduled for 1ms, therefore you
> > made 1ms of progress", then many timeslices will effectively end up
> > being 1ms of 0MHz CPU - because the VCPU wasn't scheduled and the real
> > CPU was doing something else.
> >
> >
> > So how to deal with this? Basically we need a clock which measures "CPU
> > work units", and have the scheduler use this clock.
> >
> > A "CPU work unit" clock has these properties:
> >
> > * inherently per-CPU (from the kernel's perspective, so it would be
> > per-VCPU in a virtual machine)
> > * monotonic - you can't do negative work
> > * measured in "work units"
> >
> > A "work unit" is probably most simply expressed in cycles - you assume a
> > cycle of CPU time is equivalent in terms of work done to any other
> > cycle. This means that 1 cycle at 600MHz is equivalent to 1 cycle at
> > 2.4GHz - but of course the 2.4GHz processor gets 4 times as many in any
> > real time interval. (This is the instance where the worst kind of tsc -
> > varying speed which stops on idle - is actually exactly what you want.)
> >
> > You could also measure "work units" in terms of normalized time units:
> > if the fastest CPU on the machine is 2.4GHz, then 1ms of time is 1ms of
> > work on that CPU, but only 250us of work on the 600MHz CPU.
> >
> > It doesn't really matter what the unit is, so long as it is used
> > consistently to measure how much progress all processes made.
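
As an illustration of the two "work unit" definitions quoted above, here is a
minimal sketch, not kernel code. The function names and constants are
hypothetical and chosen only for this example; it assumes the CPU frequency is
constant over the interval being measured.

```python
# Hypothetical illustration of the "CPU work unit" idea from the quoted text.
# None of these names exist in the kernel.

NSEC_PER_SEC = 1_000_000_000


def work_cycles(freq_hz: int, ran_ns: int) -> int:
    """Work measured in cycles: cycles executed in ran_ns at freq_hz."""
    return freq_hz * ran_ns // NSEC_PER_SEC


def normalized_work_ns(freq_hz: int, max_freq_hz: int, ran_ns: int) -> int:
    """Work measured as the time the fastest CPU would need for the same cycles."""
    return ran_ns * freq_hz // max_freq_hz


MS = 1_000_000             # 1ms in nanoseconds
FAST_HZ = 2_400_000_000    # 2.4GHz
SLOW_HZ = 600_000_000      # 600MHz

fast_cycles = work_cycles(FAST_HZ, MS)
slow_cycles = work_cycles(SLOW_HZ, MS)
slow_normalized = normalized_work_ns(SLOW_HZ, FAST_HZ, MS)

# A VCPU that was not scheduled at all during a 1ms timeslice actually ran
# for 0ns, so it is credited with no work under either definition.
stolen_normalized = normalized_work_ns(FAST_HZ, FAST_HZ, 0)
```

With these numbers, 1ms at 600MHz counts as 250us of normalized work and a
quarter of the cycles of 1ms at 2.4GHz, matching the figures in the quoted
message.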
>
> I think you're looking for a complex solution to a problem that doesn't
> exist. The job of the process scheduler is to meter out the available cpu
> resources. It cannot make up cycles for a slow cpu or one that is
> throttled. If the problem is happening due to throttling it should be fixed
> by altering the throttle. The example you describe with the conservative
> governor is as easy to fix as changing to the ondemand governor.
> Differential power cpus on an SMP machine should be managed by SMP
> balancing choices based on power groups.
>
> It would be fine to implement some other accounting of this definition of
> time for other purposes

I mean such as for virtualisation purposes.

> but not for process scheduler decisions per se.
>
> Sorry to chime in late. My physical condition prevents me spending any
> extended period of time at the computer so I've tried to be succinct with
> my comments and may not be able to reply again.
--
-ck