linux-kernel.vger.kernel.org archive mirror
* RE: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
@ 2006-08-30 18:40 Pallipadi, Venkatesh
  2006-08-30 19:43 ` Matthew Garrett
  0 siblings, 1 reply; 27+ messages in thread
From: Pallipadi, Venkatesh @ 2006-08-30 18:40 UTC
  To: Adam Belay, Brown, Len
  Cc: ACPI ML, Linux Kernel ML, Dominik Brodowski, Arjan van de Ven

 

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org 
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Adam Belay
>Sent: Tuesday, August 29, 2006 1:51 PM
>To: Brown, Len
>Cc: ACPI ML; Linux Kernel ML; Dominik Brodowski; Arjan van de Ven
>Subject: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
>
>Hi All,
>
>This patch improves the ACPI c-state selection algorithm.  It also
>includes a major cleanup and simplification of the processor idle code.
>
>The new implementation considers the full menu of available c-states.
>Just as the previous implementation, decisions are primarily based on
>the residency time of the last c-state entry.  This is generally an
>effective metric because it allows for detection of interrupt activity.
>However, the new algorithm differs in that it does not promote or demote
>through the c-states in succession.  Rather, it immediately jumps to
>whatever c-state has the best expected power consumption advantage for
>the predicted residency time (i.e. the previously measured residency).
>If the residency time is too short during a deep c-state entry, then the
>cost of entering the state outweighs any power consumption advantage.
>Similarly, if a shallow c-state is entered and resident for an
>excessively long duration, then a potential opportunity to save more
>power is missed.
>
>The changes in this patch allow the ACPI idle processor mechanism to
>react more quickly to sudden bursts of activity because it can jump
>directly to whatever c-state is appropriate.  However, because of the
>"menu" nature of c-state selection, the code works best when ACPI
>implementations expose all of the c-states supported by hardware.
>
>The bus master activity mechanism has undergone similar improvements.
>During capability detection, the deepest c-state that allows bus master
>activity is determined.  BM_STS is then polled each time the ACPI code
>prepares to enter a c-state.  If bus master activity is detected, then
>the previously mentioned bus master capable c-state becomes the deepest
>c-state allowed for that quantum.  In contrast, the old implementation
>would permit bus master activity to cause a promotion from one C3-type
>state to the next shallower C3-type state, imposing unnecessary latency.
>As a further optimization, BM_STS is cleared each time
>acpi_processor_idle() is entered.  This prevents any stale bus master
>status from affecting c-state policy, as it may have occurred long ago
>during scheduled work.
>
>Finally, it's worth mentioning that the bulk of c-state policy
>calculations have been moved to take place before c-states are entered.
>This should further reduce exit latency when returning from a c-state.
>
>This algorithm has not yet been carefully benchmarked (e.g. bltk or
>power meters).  However, I can say with some confidence that it saves a
>small amount more power during an idle workload and a larger amount more
>power during typical user-input oriented workloads such as word
>processing.
>
>I would really appreciate any comments, suggestions, or testing.
>
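A rough sketch of the selection rule described above, under assumptions the patch description does not spell out: each state is modelled as a fixed entry+exit energy cost plus a steady power draw, the "predicted residency" is simply the last measured one, and BM_STS polling is reduced to a flag argument. All names and numbers are hypothetical; this is not the patch's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-state data; NOT the kernel's struct acpi_processor_cx. */
struct cstate {
	unsigned int power_mw;      /* power drawn while resident (mW) */
	unsigned int transition_uj; /* energy to enter plus exit the state (uJ) */
	int bm_capable;             /* nonzero if bus mastering is tolerated */
};

/* Made-up numbers, chosen only to make the trade-off visible. */
static const struct cstate example_states[] = {
	{ 500,   0, 1 },	/* C1 */
	{ 300,  10, 1 },	/* C2 */
	{ 100, 100, 0 },	/* C3-type: unusable during bus master activity */
};

/* Capability detection: deepest state that still allows bus mastering. */
static size_t deepest_bm_capable(const struct cstate *s, size_t n)
{
	size_t limit = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (s[i].bm_capable)
			limit = i;
	return limit;
}

/* Expected energy (uJ) for entering s and staying residency_us.
 * mW * us = nJ, hence the division by 1000. */
static unsigned long expected_uj(const struct cstate *s,
				 unsigned long residency_us)
{
	return s->transition_uj +
	       (unsigned long)s->power_mw * residency_us / 1000;
}

/*
 * Menu-style selection: consider every state up to the allowed depth and
 * jump straight to the cheapest one for the predicted residency, instead
 * of promoting/demoting one step at a time. bm_active stands in for
 * "BM_STS was set"; if so, the bus-master-capable limit caps the depth.
 */
static size_t menu_select(const struct cstate *s, size_t n,
			  unsigned long predicted_us, int bm_active)
{
	size_t max, best = 0;

	assert(n > 0);
	max = bm_active ? deepest_bm_capable(s, n) : n - 1;
	for (size_t i = 1; i <= max; i++)
		if (expected_uj(&s[i], predicted_us) <
		    expected_uj(&s[best], predicted_us))
			best = i;
	return best;
}
```

With these made-up numbers, a 20 us prediction picks C1, 200 us picks C2 and 1000 us picks C3; when bus master activity is flagged, the 1000 us case is capped at C2 in a single step.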

Nice changes. Will test and let you know how it goes.

While we are cleaning up the code, I think it would be much better to
move the C-state policy out of this ACPI code altogether. We should have
just a generic interface, where any low-level driver (ACPI) can
register/unregister an idle routine with latency, power and other
characteristics (BM_STS). That way the policy can be generic and
out of ACPI code. We had a patch earlier that does something like this
here:
http://www.mail-archive.com/linux-acpi@vger.kernel.org/msg00129.html
http://www.mail-archive.com/linux-acpi@vger.kernel.org/msg00130.html
But, that did not go anywhere at that time. Probably we can do some 
cleanup like that, along with this patch....
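A minimal sketch of the kind of interface proposed above, assuming a fixed-size registry and a deliberately simple policy (lowest power within a latency budget, skipping bus-master-sensitive states when needed). The names are hypothetical and do not reproduce the API of the patch linked above. The example driver registers its states without any ACPI involvement, which is the situation Matthew describes for OLPC further down.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDLE_STATES 8

/* Hypothetical descriptor a low-level driver (ACPI, or a platform driver
 * on a machine without ACPI) would hand to a generic idle layer. */
struct idle_state {
	const char *name;
	unsigned int exit_latency_us;
	unsigned int power_mw;
	int bm_sensitive;		/* unusable while bus masters are active */
	void (*enter)(void);		/* low-level entry routine */
};

static struct idle_state registry[MAX_IDLE_STATES];
static size_t nr_states;

static int idle_register_state(const struct idle_state *s)
{
	assert(s != NULL && s->enter != NULL);
	if (nr_states == MAX_IDLE_STATES)
		return -1;
	registry[nr_states++] = *s;
	return 0;
}

static void idle_unregister_all(void)
{
	nr_states = 0;
}

/* Generic policy, independent of who registered the states: the
 * lowest-power state whose exit latency fits the budget and which bus
 * master activity does not rule out. Returns -1 if nothing fits. */
static int idle_pick(unsigned int latency_budget_us, int bm_active)
{
	int best = -1;

	for (size_t i = 0; i < nr_states; i++) {
		const struct idle_state *s = &registry[i];

		if (s->exit_latency_us > latency_budget_us)
			continue;
		if (bm_active && s->bm_sensitive)
			continue;
		if (best < 0 || s->power_mw < registry[best].power_mw)
			best = (int)i;
	}
	return best;
}

/* Example "platform driver": two made-up states, no ACPI involved. */
static void halt_enter(void) { /* would execute hlt here */ }
static void deep_enter(void) { /* platform-specific deeper state */ }

static int example_platform_init(void)
{
	static const struct idle_state states[] = {
		{ "halt",   1, 500, 0, halt_enter },
		{ "deep", 100, 100, 1, deep_enter },
	};

	for (size_t i = 0; i < sizeof(states) / sizeof(states[0]); i++)
		if (idle_register_state(&states[i]) != 0)
			return -1;
	return 0;
}
```

The point of the split is that idle_pick() never needs to know whether a state came from ACPI tables or from a platform driver.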

Thanks,
Venki


* Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-08-30 18:40 [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements Pallipadi, Venkatesh
@ 2006-08-30 19:43 ` Matthew Garrett
  2006-08-31 23:13   ` Bjorn Helgaas
  0 siblings, 1 reply; 27+ messages in thread
From: Matthew Garrett @ 2006-08-30 19:43 UTC
  To: Pallipadi, Venkatesh
  Cc: Adam Belay, Brown, Len, ACPI ML, Linux Kernel ML,
	Dominik Brodowski, Arjan van de Ven, devel

On Wed, Aug 30, 2006 at 11:40:16AM -0700, Pallipadi, Venkatesh wrote:

(Added devel@laptop.org to the Cc:)

> While we are cleaning up the code, I think it would be much better to
> move the C-state policy out of this ACPI code altogether. We should have

That would be helpful. For the One Laptop Per Child project (or whatever 
it's called today), it would be advantageous to run without acpi. At the 
moment that would cost us deeper C states, so an interface to allow a 
platform driver to register and provide the same functionality without 
code duplication would be helpful.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-08-30 19:43 ` Matthew Garrett
@ 2006-08-31 23:13   ` Bjorn Helgaas
  2006-09-01  0:30     ` [OLPC-devel] " Jim Gettys
  0 siblings, 1 reply; 27+ messages in thread
From: Bjorn Helgaas @ 2006-08-31 23:13 UTC
  To: Matthew Garrett
  Cc: Pallipadi, Venkatesh, Adam Belay, Brown, Len, ACPI ML,
	Linux Kernel ML, Dominik Brodowski, Arjan van de Ven, devel

On Wednesday 30 August 2006 13:43, Matthew Garrett wrote:
> That would be helpful. For the One Laptop Per Child project (or whatever 
> it's called today), it would be advantageous to run without acpi.

Out of curiosity, what is the motivation for running without acpi?
It costs a lot to diverge from the mainstream in areas like that,
so there must be a big payoff.  But maybe if OLPC depends on acpi
being smarter about power or code size or whatever, those improvements
could be made and everybody would benefit.


* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-08-31 23:13   ` Bjorn Helgaas
@ 2006-09-01  0:30     ` Jim Gettys
  2006-09-01  3:53       ` Len Brown
  2006-09-04 13:09       ` Pavel Machek
  0 siblings, 2 replies; 27+ messages in thread
From: Jim Gettys @ 2006-09-01  0:30 UTC
  To: Bjorn Helgaas
  Cc: Matthew Garrett, Brown, Len, Linux Kernel ML, Dominik Brodowski,
	ACPI ML, Adam Belay, Pallipadi, Venkatesh, Arjan van de Ven,
	devel


On Thu, 2006-08-31 at 17:13 -0600, Bjorn Helgaas wrote:
> On Wednesday 30 August 2006 13:43, Matthew Garrett wrote:
> > That would be helpful. For the One Laptop Per Child project (or whatever 
> > it's called today), it would be advantageous to run without acpi.
> 
> Out of curiosity, what is the motivation for running without acpi?
> It costs a lot to diverge from the mainstream in areas like that,
> so there must be a big payoff.  But maybe if OLPC depends on acpi
> being smarter about power or code size or whatever, those improvements
> could be made and everybody would benefit.

Good question; I see Matthew beat me to part of the explanation, but
here is more detail:

Our screen consumes of order 1/10th the power of a conventional flat
panel, and can consume a half watt or so (yes, we now have working
screens; this is not mythological hardware; I got my own personal first
hand look at a prototype display running this afternoon :-); I always do
like new toys...).

Even though the base machine may take only a couple watts of power
(Geode GX + the rest of the base logic), 2-3 watts is too much power to
use; a small child can generate only 7-10 watts.  So if we want a decent
"learn" to "generate" ratio, we have to do better than the 2-4 to 1
ratio we might get conventionally.  In January, we saw this staring us
in the face, and knew we had to do better, or we'd have just told a good
fraction of the kids in the world they can't have the advantages of a
computer.  Our goal has always been a 10 to 1 ratio, for at least the
most important use cases (e.g. reading).
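Put as arithmetic with the figures above, a 10:1 ratio of generated to consumed power caps the consumption budget at roughly a watt:

```latex
\frac{P_{\text{gen}}}{P_{\text{use}}} \ge 10
\quad\Longrightarrow\quad
P_{\text{use}} \le \frac{P_{\text{gen}}}{10}
= \frac{7\text{--}10\ \text{W}}{10}
= 0.7\text{--}1.0\ \text{W}
```

which is why the under-one-watt figures below are the ones that matter.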

OK, what to do?  We built a chip that lets us suspend the processor and
keep the screen alive, and chose a wireless chip that will let us keep
the mesh network alive, and we intend to suspend/resume the processor
to/from RAM at the drop of a hat.  This gets our idle consumption from
about 2.5-3 watts (with screen and wireless on), to under one watt.
We'd need resume to be as close to imperceptible as possible: touch a
key or the touchpad, and the machine resumes so fast that you don't notice.

In short, we have novel hardware: we can have our screen on, and suspend
the processor to RAM, and use half a watt.  We can have our wireless
forwarding packets in our mesh networks, with the processor suspended,
consuming under 400 mW (we hope 300 mW by the time we ship).  Both on, and
we're still under one watt.

For keyboard activity, human perception is in the 100-200 millisecond
range; for some other stuff, it is even much less than that.  So that's
the necessity; now the invention.

I've done a straw poll among kernel gurus at OLS and elsewhere on how
fast Linux might be able to resume. I've gotten answers of typically
"one second".

But, on other platforms (see attached), I have data I've measured myself
showing Linux going from resume from RAM to *scheduling user level
processes* 100 times faster than that, on a wimpy 200 MHz ARM processor.
Yes, Matilda, Linux can, on non-braindead hardware, resume all the way
to scheduling user processes in 10 milliseconds on a 200 MHz processor.

This will, for most use cases (you are reading, or your machine is
sitting there between bursts of activity), likely double / triple /
quadruple our battery life depending on what you are doing.  Note that
on a conventional machine, with a conventional display, you'd not see
this large an improvement.  Worst case, of course, it will make no
difference at all (e.g. watching a video).
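A simple duty-cycle model (an illustrative assumption, not a measured result) shows where estimates like this come from. If the processor is awake a fraction f of the time, drawing P_on, and suspended to RAM otherwise, drawing P_susp, then compared with never suspending:

```latex
\bar{P} = f\,P_{\text{on}} + (1 - f)\,P_{\text{susp}},
\qquad
\text{gain} = \frac{P_{\text{on}}}{\bar{P}}
```

With P_on of about 3 W and P_susp of about 1 W (the idle figures given earlier) and f = 0.1, the average is 0.3 + 0.9 = 1.2 W and battery life improves by a factor of 2.5; as f approaches 1 (e.g. video), the gain falls to 1.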

Clearly we can't do any better than what our hardware allows
(stabilization of power supplies, PLLs, etc.).  I should have data on
that very shortly, now that I can measure it on LinuxBIOS pretty
directly.  For those of you building chips and systems: please make the
hardware restart time as fast as possible: it matters.  The CPU doesn't
have to go full speed instantly; just get it going at some speed as
quickly as you can.

Conventional PCs with conventional BIOSes using ACPI don't do anything
like as well. So, guess what?  We don't plan to use a conventional
commercial BIOS, (we're using LinuxBIOS and Linux as Bootloader) and
will do whatever it takes (including ignoring however much of ACPI turns
out to be necessary) to get our resume down to what we know is possible.
ACPI is mostly an x86 aberration; on most architectures it does not
exist.  So it does not require contorting Linux to not use ACPI, to the
extent we find it necessary.  Most of *real* power management is done by
Linux, and not by ACPI.

Boy, human powered machines really *do* focus the mind on power
management ;-).
                              Regards,
                                     - Jim Gettys

-- 
Jim Gettys
One Laptop Per Child


[-- Attachment #2: Attached message - Linux resume time on iPAQ (Linux resume can be *really* fast). --]

From: Jim Gettys <jg@laptop.org>
To: OLPC Developer's List <devel@laptop.org>
Subject: Linux resume time on iPAQ (Linux resume can be *really* fast).
Date: Fri, 14 Jul 2006 20:21:05 -0400
Message-ID: <1152922866.6001.332.camel@localhost.localdomain>


The iPAQ is by far the closest device we have to modeling the OLPC
system, though the one I chose is less than 1/2 the integer performance
of our machine. It has roughly comparable peripherals to the OLPC
system, and unified graphics like the Geode (just a dumb frame buffer on
the SA1100).

Here's the test:

I have a simple C program written for me by Joshua Wise that just writes
characters to /dev/tty.  It can either do so continuously, or open and
close the device between each character.  In the former case, you get
metronome like character output, as the characters are all interrupt
driven out of the kernel character buffers; in the latter case, the
close/open sequence enables the operating system to reschedule the
process as it sees fit.  This is a much more interesting test.
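The program itself isn't included here; a hypothetical reconstruction of its two modes might look like the following (the function name, arguments and character pattern are all assumptions, not Joshua Wise's code):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical reconstruction of the test described above.
 * Writes `count` characters to `path`:
 *   reopen == 0: one descriptor stays open; characters drain out of the
 *                kernel tty buffers at interrupt time ("metronome").
 *   reopen != 0: open/close around every character, giving the kernel a
 *                chance to reschedule the process each time.
 * Returns the number of characters written, or -1 if open() fails.
 */
static int emit_chars(const char *path, int count, int reopen)
{
	int fd = -1, written = 0;

	assert(path != NULL && count >= 0);
	if (!reopen && (fd = open(path, O_WRONLY)) < 0)
		return -1;

	for (int i = 0; i < count; i++) {
		char c = (char)('a' + i % 26);

		if (reopen && (fd = open(path, O_WRONLY)) < 0)
			return -1;
		if (write(fd, &c, 1) == 1)
			written++;
		if (reopen)
			close(fd);
	}
	if (!reopen)
		close(fd);
	return written;
}
```

Under these assumptions, calling emit_chars("/dev/tty", 100000, 1) from a small main() while suspending and resuming would make any post-resume gap visible on the terminal.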

I reconfirmed the data with Mike Bove this afternoon.

A: suspend on the iPAQ is amazingly fast; we could see no significant
delay from emitting a character to power off of the machine.

B: resume is also very fast, if not quite so fast (of order a few up to
10 milliseconds).

Here's the measurement methodology:

I suspend the iPAQ.

I wait some amount of time.

I resume the iPAQ.  Conveniently, there happens to be a debug message
emitted by the bootloader right when it transfers control to Linux.

1) The iPAQ does nothing for anywhere between 280-400 ms after resume
starts; we do not know (or care) what the cause actually is. We theorize
that it has some built in delay on how long the power supply takes to
stabilize or some such strangeness.   This will be a combination of
whatever hardware delays force us + any bootloader/BIOS delays.

2) within some few milliseconds of actually resuming (like of order 10
ms), Linux is in user space executing code, and some characters again
appear on the serial port.

3) There then appears to be an approximately 180 ms gap before characters
again start appearing on the output port.  
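Adding up the phases above gives a rough picture of where the time goes (the ranges simply add; the user-space gap is the one traced to cardmgr below):

```latex
t_{\text{visible}} \approx
\underbrace{280\text{--}400}_{\text{hardware/bootloader}}
+ \underbrace{{\sim}10}_{\text{kernel to user space}}
+ \underbrace{{\sim}180}_{\text{user-space gap}}
\approx 470\text{--}590\ \text{ms}
```

Without the user-space gap this is still about 290-410 ms, above the 100-200 ms keyboard perception range, so the hardware/bootloader delay dominates.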

The resume is triggering processes in user space; if I kill the cardmgr
process, used for hotplug of PCMCIA, this gap goes away.  There may be
very simple solutions too: e.g. running those processes at reduced
priority, but probably better is to try to arrange hotplug to work in
some other fashion.
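Running such processes at reduced priority could be sketched with the standard POSIX setpriority()/getpriority() calls; the helper names are hypothetical, and this does nothing about the hotplug design issue that is probably the better fix:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Drop a process (pid 0 = the caller) to the weakest nice level, 19.
 * Raising a nice value needs no privilege. Returns 0 or -1. */
static int deprioritize(pid_t pid)
{
	assert(pid >= 0);
	return setpriority(PRIO_PROCESS, (id_t)pid, 19);
}

/* Read back the nice value. getpriority() may legitimately return -1,
 * so errno is the only reliable error indicator. INT_MIN on error. */
static int current_nice(pid_t pid)
{
	int v;

	errno = 0;
	v = getpriority(PRIO_PROCESS, (id_t)pid);
	return (v == -1 && errno != 0) ? INT_MIN : v;
}
```

A resume hook would pass the helper's pid instead of 0; whether a reniced process still starves the foreground task depends on the scheduler, which is exactly the 2.4 caveat raised below.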

Conclusion
==========

Linux can resume *really, really, really* fast, if the hardware lets it,
and the device drivers don't have bad delays built into them.  

If they do have such bad delays, we might have to do Mark's fast
suspend/resume scheme, or something driver specific. I really like
Mark's fast suspend/resume idea, and on some big systems (or with really
bad hardware that has multiple very long delays) it may be a godsend.

We *will* have to do something about this user space behavior, which is
not at all surprising. One option might be to only attempt hotplug when
the lid is closed, or when you invoke some application, rather than on
resume from suspend to RAM; or it may be possible to do this on USB
provided hotplug events (but I haven't read the Geode errata sheet for a
while).

So: 
  o we need to vet the drivers we are using to see if any of them have
long built in delays on resume.  If our hardware is really braindead, we
might still have to do something about parallelizing the resume code.
The most likely driver to have problems with is clearly going to be USB,
and we need USB to talk to the Marvell chip.
  o The iPAQ is running Linux 2.4 which does not have a
particularly decent scheduler; it doesn't follow that we'll necessarily
see the same complete starvation of the original process (though we very
well might). We certainly don't want to be triggering hotplug at the
rate we'd like to suspend/resume.

Next step: perform the same tests on the OLPC system and see what that
tells us.  This is just now to the point of becoming feasible.
                               Regards,
                                 - Jim

-- 
Jim Gettys
One Laptop Per Child




* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-01  0:30     ` [OLPC-devel] " Jim Gettys
@ 2006-09-01  3:53       ` Len Brown
  2006-09-01  4:12         ` Matthew Garrett
                           ` (3 more replies)
  2006-09-04 13:09       ` Pavel Machek
  1 sibling, 4 replies; 27+ messages in thread
From: Len Brown @ 2006-09-01  3:53 UTC
  To: jg
  Cc: Bjorn Helgaas, Matthew Garrett, Linux Kernel ML,
	Dominik Brodowski, ACPI ML, Adam Belay, Pallipadi, Venkatesh,
	Arjan van de Ven, devel

On Thursday 31 August 2006 20:30, Jim Gettys wrote:
> On Thu, 2006-08-31 at 17:13 -0600, Bjorn Helgaas wrote:
> > On Wednesday 30 August 2006 13:43, Matthew Garrett wrote:
> > > That would be helpful. For the One Laptop Per Child project (or whatever 
> > > it's called today), it would be advantageous to run without acpi.
> > 
> > Out of curiosity, what is the motivation for running without acpi?
> > It costs a lot to diverge from the mainstream in areas like that,
> > so there must be a big payoff.  But maybe if OLPC depends on acpi
> > being smarter about power or code size or whatever, those improvements
> > could be made and everybody would benefit.
> 
> Good question; I see Matthew beat me to part of the explanation, but
> here is more detail:

I recommended that the OLPC guys not use ACPI.

I do not think it would benefit their system.  Although it is an i386
instruction set, their system is more like an embedded device than
like a traditional laptop.

The Geode doesn't support any C-states -- so ACPI wouldn't help them there anyway.

As Jim wrote, OLPC plans to suspend-to-ram from idle, and to keep video running,
so ACPI wouldn't help them on that either.

Re: optimizing suspend/resume speed
I expect suspend/resume speed has more to do with devices than with ACPI.
But frankly, with gaping functionality holes in Linux suspend/resume support such as
IDE and SATA, I think that optimizing for suspend/resume speed on a mainstream laptop
is somewhat "forward looking".

-Len


* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-01  3:53       ` Len Brown
@ 2006-09-01  4:12         ` Matthew Garrett
  2006-09-01 15:51           ` Jordan Crouse
  2006-09-01 13:14         ` [OLPC-devel] Re: [RFC][PATCH 1/2] " Carl-Daniel Hailfinger
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Matthew Garrett @ 2006-09-01  4:12 UTC
  To: Len Brown
  Cc: jg, Bjorn Helgaas, Linux Kernel ML, Dominik Brodowski, ACPI ML,
	Adam Belay, Pallipadi, Venkatesh, Arjan van de Ven, devel

On Thu, Aug 31, 2006 at 11:53:04PM -0400, Len Brown wrote:

> The Geode doesn't support any C-states -- so ACPI wouldn't help them there anyway.

Are you sure of that? The docs I have here suggest C1 and C2, but it's 
possible that that's just the companion chip and they aren't implemented 
in the CPU.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-01  3:53       ` Len Brown
  2006-09-01  4:12         ` Matthew Garrett
@ 2006-09-01 13:14         ` Carl-Daniel Hailfinger
  2006-09-01 21:52         ` Andi Kleen
  2006-09-04 13:13         ` Pavel Machek
  3 siblings, 0 replies; 27+ messages in thread
From: Carl-Daniel Hailfinger @ 2006-09-01 13:14 UTC
  To: Len Brown
  Cc: jg, Linux Kernel ML, Dominik Brodowski, ACPI ML, Adam Belay,
	Pallipadi, Venkatesh, Arjan van de Ven, devel, Bjorn Helgaas

Len Brown wrote:
> 
> But frankly, with gaping functionality holes in Linux suspend/resume support such as
> IDE and SATA, I think that optimizing for suspend/resume speed on a mainstream laptop
> is somewhat "forward looking".

OLPC has no IDE/SATA devices, just 512 MB of onboard NAND flash.

Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


* Re: ACPI: Idle Processor PM Improvements
  2006-09-01  4:12         ` Matthew Garrett
@ 2006-09-01 15:51           ` Jordan Crouse
  0 siblings, 0 replies; 27+ messages in thread
From: Jordan Crouse @ 2006-09-01 15:51 UTC
  To: Matthew Garrett
  Cc: Len Brown, Linux Kernel ML, Dominik Brodowski, ACPI ML,
	Adam Belay, Pallipadi, Venkatesh, Arjan van de Ven, devel,
	Bjorn Helgaas

On 01/09/06 05:12 +0100, Matthew Garrett wrote:
> On Thu, Aug 31, 2006 at 11:53:04PM -0400, Len Brown wrote:
> 
> > The Geode doesn't support any C-states -- so ACPI wouldn't help them there anyway.
> 
> Are you sure of that? The docs I have here suggest C1 and C2, but it's 
> possible that that's just the companion chip and they aren't implemented 
> in the CPU.

C1 is essentially suspend on hlt.  We have something called Automatic Hardware
Clock Gating that kicks in when the blocks go unused, so that saves a bit
more power (especially in the south bridge) than we would with just a simple
hlt.  In any event, this already happens without the assistance of ACPI.

The 5536 has support for a C2 state as well, but I don't know if that
has any effect on the GX or not.

Jordan

-- 
Jordan Crouse
Senior Linux Engineer
Advanced Micro Devices, Inc.
<www.amd.com/embeddedprocessors>




* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-01  3:53       ` Len Brown
  2006-09-01  4:12         ` Matthew Garrett
  2006-09-01 13:14         ` [OLPC-devel] Re: [RFC][PATCH 1/2] " Carl-Daniel Hailfinger
@ 2006-09-01 21:52         ` Andi Kleen
  2006-09-01 22:57           ` Alan Cox
  2006-09-04 13:13         ` Pavel Machek
  3 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2006-09-01 21:52 UTC
  To: Len Brown
  Cc: Bjorn Helgaas, Matthew Garrett, Linux Kernel ML,
	Dominik Brodowski, ACPI ML, Adam Belay, Pallipadi, Venkatesh,
	Arjan van de Ven, devel


Len Brown <len.brown@intel.com> writes:
> 
> Re: optimizing suspend/resume speed
> I expect suspend/resume speed has more to do with devices than with ACPI.
> But frankly, with gaping functionality holes in Linux suspend/resume support such as
> IDE and SATA, I think that optimizing for suspend/resume speed on a mainstream laptop
> is somewhat "forward looking".

What are these gaping holes? SATA seems to work at least on many
drivers with an out of tree patch (that will hopefully be merged soon)
And IDE mostly works too except for HPA on thinkpads (which can be
disabled in the BIOS). While certainly not perfect it doesn't seem
that bad to me.

-Andi




* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-01 21:52         ` Andi Kleen
@ 2006-09-01 22:57           ` Alan Cox
  0 siblings, 0 replies; 27+ messages in thread
From: Alan Cox @ 2006-09-01 22:57 UTC
  To: Andi Kleen
  Cc: Len Brown, Bjorn Helgaas, Matthew Garrett, Linux Kernel ML,
	Dominik Brodowski, ACPI ML, Adam Belay, Pallipadi, Venkatesh,
	Arjan van de Ven, devel

Ar Gwe, 2006-09-01 am 23:52 +0200, ysgrifennodd Andi Kleen:
> What are these gaping holes? SATA seems to work at least on many
> drivers with an out of tree patch (that will hopefully be merged soon)

SATA ought to be pretty good now. 

> And IDE mostly works too except for HPA on thinkpads (which can be
> disabled in the BIOS). While certainly not perfect it doesn't seem
> that bad to me.

IDE also fails for various chipsets where PLLs need a recalibration or
setup needs redoing, and some users report things like floating IRQ 14
hangs on suspend or resume.

HPA now has a -mm proposed patch.

Alan




* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-01  0:30     ` [OLPC-devel] " Jim Gettys
  2006-09-01  3:53       ` Len Brown
@ 2006-09-04 13:09       ` Pavel Machek
  2006-09-05 14:31         ` Jim Gettys
  1 sibling, 1 reply; 27+ messages in thread
From: Pavel Machek @ 2006-09-04 13:09 UTC
  To: Jim Gettys
  Cc: Bjorn Helgaas, Matthew Garrett, Brown, Len, Linux Kernel ML,
	Dominik Brodowski, ACPI ML, Adam Belay, Pallipadi, Venkatesh,
	Arjan van de Ven, devel

Hi!

> In short, we have novel hardware: we can have our screen on, and suspend
> the processor to RAM, and use a half a watt.  We can have our wireless
> forwarding packets in our mesh networks, with the processor suspended,
> consuming under 400mw (we hope 300mw by the time we ship).  Both on, and
> we're still under one watt.
> 
> For keyboard activity, human perception is in the 100-200 millisecond
> range; for some other stuff, it is even much less than that.  So that's
> the necessity; now the invention.
> 
> I've done a straw poll among kernel gurus at OLS and elsewhere on how
> fast Linux might be able to resume. I've gotten answers of typically
> "one second".
> 
> But, on other platforms (see attached), I have data I've measured myself
> showing Linux going from resume from RAM to *scheduling user level
> processes* 100 times faster than that, on a wimpy 200 MHz ARM processor.
> Yes, Matilda, Linux can, on non-braindead hardware, resume all the way
> to scheduling user processes in 10 milliseconds on a 200 MHz processor.

2.4 and 2.6 are *very* different here. You'll probably need to optimize freezer
in 2.6 a bit...
							Pavel
-- 
Thanks for all the (sleeping) penguins.


* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-01  3:53       ` Len Brown
                           ` (2 preceding siblings ...)
  2006-09-01 21:52         ` Andi Kleen
@ 2006-09-04 13:13         ` Pavel Machek
  3 siblings, 0 replies; 27+ messages in thread
From: Pavel Machek @ 2006-09-04 13:13 UTC
  To: Len Brown
  Cc: jg, Bjorn Helgaas, Matthew Garrett, Linux Kernel ML,
	Dominik Brodowski, ACPI ML, Adam Belay, Pallipadi, Venkatesh,
	Arjan van de Ven, devel

On Thu 31-08-06 23:53:04, Len Brown wrote:
> On Thursday 31 August 2006 20:30, Jim Gettys wrote:
> > On Thu, 2006-08-31 at 17:13 -0600, Bjorn Helgaas wrote:
> > > On Wednesday 30 August 2006 13:43, Matthew Garrett wrote:
> > > > That would be helpful. For the One Laptop Per Child project (or whatever 
> > > > it's called today), it would be advantageous to run without acpi.
> > > 
> > > Out of curiosity, what is the motivation for running without acpi?
> > > It costs a lot to diverge from the mainstream in areas like that,
> > > so there must be a big payoff.  But maybe if OLPC depends on acpi
> > > being smarter about power or code size or whatever, those improvements
> > > could be made and everybody would benefit.
> > 
> > Good question; I see Matthew beat me to part of the explanation, but
> > here is more detail:
> 
> I recommended that the OLPC guys not use ACPI.
> 
> I do not think it would benefit their system.  Although it is an i386
> instruction set, their system is more like an embedded device than
> like a traditional laptop.
> 
> The Geode doesn't support any C-states -- so ACPI wouldn't help them there anyway.
> 
> As Jim wrote, OLPC plans to suspend-to-ram from idle, and to keep video running,
> so ACPI wouldn't help them on that either.
> 
> Re: optimizing suspend/resume speed
> I expect suspend/resume speed has more to do with devices than with ACPI.
> But frankly, with gaping functionality holes in Linux suspend/resume support such as
> IDE and SATA, I think that optimizing for suspend/resume speed on a mainstream laptop
> is somewhat "forward looking".

Well, list of hardware where s2ram works okay is long and growing...
of course, help is always wanted. And yes, it would be nice if someone
optimized suspend/resume speed. There are some low-hanging fruits
there.

-- 
Thanks for all the (sleeping) penguins.


* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-04 13:09       ` Pavel Machek
@ 2006-09-05 14:31         ` Jim Gettys
  2006-09-06 10:37           ` Pavel Machek
  0 siblings, 1 reply; 27+ messages in thread
From: Jim Gettys @ 2006-09-05 14:31 UTC
  To: Pavel Machek
  Cc: Bjorn Helgaas, Matthew Garrett, Brown, Len, Linux Kernel ML,
	Dominik Brodowski, ACPI ML, Adam Belay, Pallipadi, Venkatesh,
	Arjan van de Ven, devel

On Mon, 2006-09-04 at 13:09 +0000, Pavel Machek wrote:

> 
> 2.4 and 2.6 are *very* different here. You'll probably need to optimize freezer
> in 2.6 a bit...
> 						

Among other problems: e.g. 2.4 did not automatically do a VT switch; 2.6
does; we'll have to have a way to signal "we're a sane display driver;
don't switch away from me on suspend".
                                 - Jim

-- 
Jim Gettys
One Laptop Per Child




* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-05 14:31         ` Jim Gettys
@ 2006-09-06 10:37           ` Pavel Machek
  2006-09-06 14:58             ` Jordan Crouse
  2006-09-06 15:19             ` [OLPC-devel] Re: [RFC][PATCH 1/2] " Jim Gettys
  0 siblings, 2 replies; 27+ messages in thread
From: Pavel Machek @ 2006-09-06 10:37 UTC
  To: Jim Gettys
  Cc: Bjorn Helgaas, Matthew Garrett, Brown, Len, Linux Kernel ML,
	Dominik Brodowski, ACPI ML, Adam Belay, Pallipadi, Venkatesh,
	Arjan van de Ven, devel

Hi!

> > 2.4 and 2.6 are *very* different here. You'll probably need to optimize freezer
> > in 2.6 a bit...
> > 						
> 
> Among other problems: e.g. 2.4 did not automatically do a VT switch; 2.6
> does; we'll have to have a way to signal "we're a sane display driver;
> don't switch away from me on suspend".

Not like that, please.

You are using X running over framebuffer, right? So the kernel is
controlling the graphics hardware. In such a case it is safe to avoid VT
switch.
									Pavel
-- 
Thanks, Sharp!


* Re: ACPI: Idle Processor PM Improvements
  2006-09-06 10:37           ` Pavel Machek
@ 2006-09-06 14:58             ` Jordan Crouse
  2006-09-12  9:21               ` Pavel Machek
  2006-09-06 15:19             ` [OLPC-devel] Re: [RFC][PATCH 1/2] " Jim Gettys
  1 sibling, 1 reply; 27+ messages in thread
From: Jordan Crouse @ 2006-09-06 14:58 UTC
  To: Pavel Machek
  Cc: Jim Gettys, Brown, Len, Linux Kernel ML, Dominik Brodowski,
	ACPI ML, Adam Belay, Pallipadi, Venkatesh, Arjan van de Ven,
	devel, Bjorn Helgaas

On 06/09/06 12:37 +0200, Pavel Machek wrote:
> Hi!
> 
> > > 2.4 and 2.6 are *very* different here. You'll probably need to optimize freezer
> > > in 2.6 a bit...
> > > 						
> > 
> > Among other problems: e.g. 2.4 did not automatically do a VT switch; 2.6
> > does; we'll have to have a way to signal "we're a sane display driver;
> > don't switch away from me on suspend".
> 
> Not like that, please.
> 
> You are using X running over framebuffer, right? So the kernel is
> controlling the graphics hardware. In such a case it is safe to avoid VT
> switch.

Actually not - the Geode GX has full 2D hardware acceleration with a complete
X driver to match.  No Xfbdev here.

Jordan

-- 
Jordan Crouse
Senior Linux Engineer
Advanced Micro Devices, Inc.
<www.amd.com/embeddedprocessors>




* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-06 10:37           ` Pavel Machek
  2006-09-06 14:58             ` Jordan Crouse
@ 2006-09-06 15:19             ` Jim Gettys
  2006-09-12  9:21               ` Pavel Machek
  1 sibling, 1 reply; 27+ messages in thread
From: Jim Gettys @ 2006-09-06 15:19 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Bjorn Helgaas, Matthew Garrett, Brown, Len, Linux Kernel ML,
	Dominik Brodowski, ACPI ML, Adam Belay, Pallipadi, Venkatesh,
	Arjan van de Ven, devel

On Wed, 2006-09-06 at 12:37 +0200, Pavel Machek wrote:
> Hi!
> 
> > > 2.4 and 2.6 are *very* different here. You'll probably need to optimize freezer
> > > in 2.6 a bit...
> > > 						
> > 
> > Among other problems: e.g. 2.4 did not automatically do a VT switch; 2.6
> > does; we'll have to have a way to signal "we're a sane display driver;
> > don't switch away from me on suspend".
> 
> Not like that, please.
> 
> You are using X running over framebuffer, right? So that kernel is
> controlling the graphics hardware. In such case it is safe to avoid VT
> switch.

It should be perfectly safe.

The Geode has significantly more than dumb frame buffer support, even
though it can't support 3D in hardware (we do get blit and alpha
blending, and YUV->RGB support in hardware).

We have an fbdev driver for the hardware (in fact, we finally have to have
a decent driver in general, as the transfer to and from the DCON-controlled
display has to happen at interrupt time).  We won't be doing evil things
in X behind the operating system's back the way most XF86 drivers do;
we'll work very much the way display drivers supported X before the strange
notion of completely OS-independent drivers without any kernel support
twisted the way XF86 drivers usually work.  Ah, back to the future
(past)....
                                        - Jim


-- 
Jim Gettys
One Laptop Per Child



* Re: ACPI: Idle Processor PM Improvements
  2006-09-06 14:58             ` Jordan Crouse
@ 2006-09-12  9:21               ` Pavel Machek
  2006-09-12 18:14                 ` Jim Gettys
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Machek @ 2006-09-12  9:21 UTC (permalink / raw)
  To: Jordan Crouse
  Cc: Jim Gettys, Brown, Len, Linux Kernel ML, Dominik Brodowski,
	ACPI ML, Adam Belay, Pallipadi, Venkatesh, Arjan van de Ven,
	devel, Bjorn Helgaas

On Wed 2006-09-06 08:58:49, Jordan Crouse wrote:
> On 06/09/06 12:37 +0200, Pavel Machek wrote:
> > Hi!
> > 
> > > > 2.4 and 2.6 are *very* different here. You'll probably need to optimize freezer
> > > > in 2.6 a bit...
> > > > 						
> > > 
> > > Among other problems: e.g. 2.4 did not automatically do a VT switch; 2.6
> > > does; we'll have to have a way to signal "we're a sane display driver;
> > > don't switch away from me on suspend".
> > 
> > Not like that, please.
> > 
> > You are using X running over framebuffer, right? So that kernel is
> > controlling the graphics hardware. In such case it is safe to avoid VT
> > switch.
> 
> Actually not - the Geode GX has full 2D hardware acceleration with a complete
> X driver to match.  No Xfbdev here.

Ok, so what is needed is a message to X "we are suspending", and X needs
to respond "okay, I'm ready, no need for console switch".

Alternatively, hack the kernel to take control from X without actually
switching consoles. That should be possible even with the current
interface.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [OLPC-devel] Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-06 15:19             ` [OLPC-devel] Re: [RFC][PATCH 1/2] " Jim Gettys
@ 2006-09-12  9:21               ` Pavel Machek
  0 siblings, 0 replies; 27+ messages in thread
From: Pavel Machek @ 2006-09-12  9:21 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Bjorn Helgaas, Matthew Garrett, Brown, Len, Linux Kernel ML,
	Dominik Brodowski, ACPI ML, Adam Belay, Pallipadi, Venkatesh,
	Arjan van de Ven, devel

On Wed 2006-09-06 11:19:09, Jim Gettys wrote:
> On Wed, 2006-09-06 at 12:37 +0200, Pavel Machek wrote:
> > Hi!
> > 
> > > > 2.4 and 2.6 are *very* different here. You'll probably need to optimize freezer
> > > > in 2.6 a bit...
> > > > 						
> > > 
> > > Among other problems: e.g. 2.4 did not automatically do a VT switch; 2.6
> > > does; we'll have to have a way to signal "we're a sane display driver;
> > > don't switch away from me on suspend".
> > 
> > Not like that, please.
> > 
> > You are using X running over framebuffer, right? So that kernel is
> > controlling the graphics hardware. In such case it is safe to avoid VT
> > switch.
> 
> It should be perfectly safe.

Okay, but a per-driver flag is the wrong way to go (see the other mail).
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: ACPI: Idle Processor PM Improvements
  2006-09-12  9:21               ` Pavel Machek
@ 2006-09-12 18:14                 ` Jim Gettys
  2006-09-12 18:27                   ` Mitch Bradley
                                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Jim Gettys @ 2006-09-12 18:14 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jordan Crouse, Brown, Len, Linux Kernel ML, Dominik Brodowski,
	ACPI ML, Adam Belay, Pallipadi, Venkatesh, Arjan van de Ven,
	devel, Bjorn Helgaas

On Tue, 2006-09-12 at 11:21 +0200, Pavel Machek wrote:

> Ok, so what is needed is message to X "we are suspending", and X needs
> to respond "okay, I'm ready, no need for console switch".

This presumes an external agent to X controlling the fast
suspend/resume, with messages having to flow to and from X, and to and
from the kernel, with the kernel in the middle.

Another simpler option is X itself just telling the kernel to suspend
without console switch, as the handoff of the display to the DCON chip
has to be done with X and with an interrupt signaling completion of the
handoff.  This would be triggered by an inactivity timeout in the X
server.

I'm not sure which is best right now: generality vs. simplicity.  We
just got samples of hardware to do some prototyping on in the last two
weeks. (see wiki.laptop.org for photographs of our screen and the DCON
in action).

> 
> Alternatively, hack kernel to take control from X without actually
> switching consoles. That should be possible even with current
> interface.

This would require saving/restoring all graphics state in the kernel
(and X already has that state internally).  Feasible, but seems like
duplication of effort.  I haven't checked if there are any write-only
registers in the Geode (though, thankfully, this kind of brain damage is
rarer than it once was).  This then raises interesting kernel/X
synchronization issues, of course.
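Write-only registers are what make kernel-side save/restore awkward: a value that cannot be read back at suspend time can only be restored if the driver remembered it when it wrote it. A common workaround is to keep a shadow copy of every write. The sketch below models that in plain user-space C; `struct gfx_dev`, the register count and all helper names are hypothetical and do not describe the Geode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NREGS 4

/* Hypothetical device: reads of a write-only register return junk. */
struct gfx_dev {
	uint32_t hw[NREGS];		/* simulated hardware registers */
	bool write_only[NREGS];
	uint32_t shadow[NREGS];		/* last value the driver wrote */
	uint32_t saved[NREGS];		/* snapshot taken at suspend */
};

static uint32_t reg_read(const struct gfx_dev *d, int r)
{
	return d->write_only[r] ? 0xFFFFFFFFu : d->hw[r];
}

void reg_write(struct gfx_dev *d, int r, uint32_t v)
{
	assert(r >= 0 && r < NREGS);
	d->hw[r] = v;
	d->shadow[r] = v;	/* remember it: it may not be readable later */
}

void gfx_suspend(struct gfx_dev *d)
{
	for (int r = 0; r < NREGS; r++)
		d->saved[r] = d->write_only[r] ? d->shadow[r] : reg_read(d, r);
}

void gfx_resume(struct gfx_dev *d)
{
	for (int r = 0; r < NREGS; r++)
		reg_write(d, r, d->saved[r]);
}
```

Reading a write-only register directly at suspend would have saved 0xFFFFFFFF; the shadow copy is what lets resume put the real value back. The price is that every write has to go through `reg_write()`.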
                                     - Jim


> 								Pavel
-- 
Jim Gettys
One Laptop Per Child



* Re: ACPI: Idle Processor PM Improvements
  2006-09-12 18:14                 ` Jim Gettys
@ 2006-09-12 18:27                   ` Mitch Bradley
  2006-09-12 20:18                   ` Jordan Crouse
  2006-09-14  9:18                   ` Pavel Machek
  2 siblings, 0 replies; 27+ messages in thread
From: Mitch Bradley @ 2006-09-12 18:27 UTC (permalink / raw)
  To: jg
  Cc: Pavel Machek, Brown, Len, ACPI ML, Linux Kernel ML,
	Dominik Brodowski, Adam Belay, Pallipadi, Venkatesh,
	Arjan van de Ven, devel, Bjorn Helgaas

Jim Gettys wrote:
>
>  I haven't checked if there are any write-only
> registers in the Geode (though, thankfully, this kind of brain damage is
> rarer than it once was). 
I've been going through the Geode and 5536 specs with a fine-toothed 
comb, and so far haven't seen any write-only registers apart from the 
ones in the ISA legacy devices.


* Re: ACPI: Idle Processor PM Improvements
  2006-09-12 18:14                 ` Jim Gettys
  2006-09-12 18:27                   ` Mitch Bradley
@ 2006-09-12 20:18                   ` Jordan Crouse
  2006-09-14  9:20                     ` Pavel Machek
  2006-09-14  9:18                   ` Pavel Machek
  2 siblings, 1 reply; 27+ messages in thread
From: Jordan Crouse @ 2006-09-12 20:18 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Pavel Machek, Brown, Len, Linux Kernel ML, Dominik Brodowski,
	ACPI ML, Adam Belay, Pallipadi, Venkatesh, Arjan van de Ven,
	devel, Bjorn Helgaas

On 12/09/06 14:14 -0400, Jim Gettys wrote:
> > Alternatively, hack kernel to take control from X without actually
> > switching consoles. That should be possible even with current
> > interface.
> 
> This would require saving/restoring all graphics state in the kernel
> (and X already has that state internally).  Feasible, but seems like
> duplication of effort.  I haven't checked if there are any write-only
> registers in the Geode (though, thankfully, this kind of brain damage is
> rarer than it once was).  This then begs interesting kernel/X
> synchronization issues, of course.

We don't need any kernel output during suspend or resume.  Thus, if the VT
doesn't change, then the kernel doesn't need to worry about saving or restoring
the graphics state, and that's the way it should be, IMHO.
Whoever owns the current VT should be in charge of saving and restoring
the registers.

So, we would need some way of indicating the "ownership" of the VT.  And
in reality, we really only need to know if the framebuffer console owns it or
not, so a boolean would suffice.  In the past, I've used KD_TEXT and
KD_GRAPHICS for this purpose.  As an example, on the Geode LX, I assume
that if the vc_mode is KD_GRAPHICS, then we don't own it, and we don't
do 2D acceleration.  If the mode is KD_TEXT then we are free to use the
2D engine.  All I needed to add was a notifier chain to let the framebuffer
know when the mode switched, and I was happy.  I'm not sure if that's the
smartest way to handle it permanently, but it works in a pinch.
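A user-space model of the ownership scheme described above, assuming a simple fixed-size callback list in place of the kernel's own notifier infrastructure. `VC_TEXT` and `VC_GRAPHICS` mirror the values of `KD_TEXT` (0x00) and `KD_GRAPHICS` (0x01) from `<linux/kd.h>`; `struct mode_chain`, `set_vc_mode()` and `fb_mode_changed()` are hypothetical names. The framebuffer only touches the 2D engine while the console is in text mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for KD_TEXT (0x00) and KD_GRAPHICS (0x01) from <linux/kd.h>. */
enum vc_mode { VC_TEXT = 0x00, VC_GRAPHICS = 0x01 };

typedef void (*mode_notifier_fn)(enum vc_mode new_mode, void *data);

#define MAX_NOTIFIERS 8

/* Hypothetical fixed-size notifier chain; not the kernel's notifier API. */
struct mode_chain {
	mode_notifier_fn fn[MAX_NOTIFIERS];
	void *data[MAX_NOTIFIERS];
	size_t n;
	enum vc_mode mode;	/* current console mode, starts as VC_TEXT */
};

bool chain_register(struct mode_chain *c, mode_notifier_fn fn, void *data)
{
	assert(c != NULL && fn != NULL);
	if (c->n == MAX_NOTIFIERS)
		return false;
	c->fn[c->n] = fn;
	c->data[c->n] = data;
	c->n++;
	return true;
}

/* Called on KDSETMODE-style changes: tell every listener the new mode. */
void set_vc_mode(struct mode_chain *c, enum vc_mode mode)
{
	if (c->mode == mode)
		return;		/* no change, nothing to report */
	c->mode = mode;
	for (size_t i = 0; i < c->n; i++)
		c->fn[i](mode, c->data[i]);
}

/* Framebuffer side: use the 2D engine only while the console owns the VT. */
struct fb_state {
	bool accel_allowed;
};

void fb_mode_changed(enum vc_mode new_mode, void *data)
{
	struct fb_state *fb = data;

	fb->accel_allowed = (new_mode == VC_TEXT);
}
```

The boolean Jordan mentions is just `accel_allowed` here; nothing in the model saves or restores registers, which matches the idea that whoever owns the VT is responsible for that.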

Jordan

-- 
Jordan Crouse
Senior Linux Engineer
Advanced Micro Devices, Inc.
<www.amd.com/embeddedprocessors>



* Re: ACPI: Idle Processor PM Improvements
  2006-09-12 18:14                 ` Jim Gettys
  2006-09-12 18:27                   ` Mitch Bradley
  2006-09-12 20:18                   ` Jordan Crouse
@ 2006-09-14  9:18                   ` Pavel Machek
  2006-09-14 11:29                     ` Jim Gettys
  2 siblings, 1 reply; 27+ messages in thread
From: Pavel Machek @ 2006-09-14  9:18 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Jordan Crouse, Brown, Len, Linux Kernel ML, Dominik Brodowski,
	ACPI ML, Adam Belay, Pallipadi, Venkatesh, Arjan van de Ven,
	devel, Bjorn Helgaas

On Tue 2006-09-12 14:14:30, Jim Gettys wrote:
> On Tue, 2006-09-12 at 11:21 +0200, Pavel Machek wrote:
> 
> > Ok, so what is needed is message to X "we are suspending", and X needs
> > to respond "okay, I'm ready, no need for console switch".
> 
> This presumes an external agent to X controlling the fast
> suspend/resume, with messages having to flow to and from X, and to and
> from the kernel, with the kernel in the middle.
> 
> Another simpler option is X itself just telling the kernel to suspend
> without console switch, as the handoff of the display to the DCON chip
> has to be done with X and with an interrupt signaling completion of the
> handoff.  This would be triggered by an inactivity timeout in the X
> server.

Whoa... that's a hack.. but yes, you can probably do that, and I think
kernel even has the necessary interfaces already. (They were needed for
uswsusp).

> > Alternatively, hack kernel to take control from X without actually
> > switching consoles. That should be possible even with current
> > interface.
> 
> This would require saving/restoring all graphics state in the kernel
> (and X already has that state internally).  Feasible, but seems like

Hmm, save/restore graphics state from the kernel would of course be
a clean solution, but you should have that anyway... what if someone
suspends without X running?

And of course you can just cheat, and not do kernel save-state on your
system.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: ACPI: Idle Processor PM Improvements
  2006-09-12 20:18                   ` Jordan Crouse
@ 2006-09-14  9:20                     ` Pavel Machek
  0 siblings, 0 replies; 27+ messages in thread
From: Pavel Machek @ 2006-09-14  9:20 UTC (permalink / raw)
  To: Jordan Crouse
  Cc: Jim Gettys, Brown, Len, Linux Kernel ML, Dominik Brodowski,
	ACPI ML, Adam Belay, Pallipadi, Venkatesh, Arjan van de Ven,
	devel, Bjorn Helgaas

On Tue 2006-09-12 14:18:05, Jordan Crouse wrote:
> On 12/09/06 14:14 -0400, Jim Gettys wrote:
> > > Alternatively, hack kernel to take control from X without actually
> > > switching consoles. That should be possible even with current
> > > interface.
> > 
> > This would require saving/restoring all graphics state in the kernel
> > (and X already has that state internally).  Feasible, but seems like
> > duplication of effort.  I haven't checked if there are any write-only
> > registers in the Geode (though, thankfully, this kind of brain damage is
> > rarer than it once was).  This then begs interesting kernel/X
> > synchronization issues, of course.
> 
> We don't need any kernel output during suspend or resume.  Thus, if the VT
> doesn't change, then the kernel doesn't need worry about saving or restoring 
> the graphics state, and thats the way it should be, IMHO.
> Whoever owns the current VT should be in charge of saving and restoring 
> the registers.
> 
> So, we would need some way of indicating the "ownership" of the VT.  And
> in reality, we really only to know if the framebuffer console owns it or
> not, so a boolean would suffice.  In the past, I've used KD_TEXT and 
> KD_GRAPHICS for this purpose.  As an example, on the Geode LX, I assume
> that if the vc_mode is KD_GRAPHICS, then we don't own it, and we don't
> do 2D accelerations.  If the mode is KD_TEXT then we are free to use the
> 2D engine.   All I needed to add ws a notifier chain to let the framebuffer
> know when the mode switched, and I was happy.  I'm not sure if thats the
> smartest way to handle it permanently, but it works in a pinch.

KD_TEXT vs. KD_GRAPHICS looks like the way to go. Just tell X you want
the console back, but then don't actually redraw/switch consoles. We
probably want that on normal PCs, too... console switch for
suspend-to-RAM looks ugly.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: ACPI: Idle Processor PM Improvements
  2006-09-14  9:18                   ` Pavel Machek
@ 2006-09-14 11:29                     ` Jim Gettys
  0 siblings, 0 replies; 27+ messages in thread
From: Jim Gettys @ 2006-09-14 11:29 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jordan Crouse, Brown, Len, Linux Kernel ML, Dominik Brodowski,
	ACPI ML, Adam Belay, Pallipadi, Venkatesh, Arjan van de Ven,
	devel, Bjorn Helgaas

On Thu, 2006-09-14 at 11:18 +0200, Pavel Machek wrote:
> On Tue, 2006-09-12 at 11:21 +0200, Pavel Machek wrote:
> > 
> > > Ok, so what is needed is message to X "we are suspending", and X needs
> > > to respond "okay, I'm ready, no need for console switch".
> > 
> > This presumes an external agent to X controlling the fast
> > suspend/resume, with messages having to flow to and from X, and to and
> > from the kernel, with the kernel in the middle.
> > 
> > Another simpler option is X itself just telling the kernel to suspend
> > without console switch, as the handoff of the display to the DCON chip
> > has to be done with X and with an interrupt signaling completion of the
> > handoff.  This would be triggered by an inactivity timeout in the X
> > server.
> 
> Whoa... that's a hack.. but yes, you can probably do that, and I think
> kernel even has neccessary interfaces already. (They were needed for
> uswsusp).

Glad you like it ;-).  Dunno which way we'll go yet, though it will get
to the top of the pile to implement this fall.  I suspect we may go this
route to get going, but explore the more general solution as we get more
sophisticated power management policies and standards in place.

> 
> > > Alternatively, hack kernel to take control from X without actually
> > > switching consoles. That should be possible even with current
> > > interface.
> > 
> > This would require saving/restoring all graphics state in the kernel
> > (and X already has that state internally).  Feasible, but seems like
> 
> Hmm, save/restore graphics state from the kernel would of course be
> clean solution, but you should have that anyway... what if someone
> suspends without X running?

X knows its graphics state: it has to remember all of it to know when
something needs to change.  On resume, the resume code can reinitialize
the graphics state to whatever the console wants/needs.

If you VT switch back to X, X can restore the graphics state to what it
remembers.

> 
> And of course you can just cheat, and not do kernel save-state on your
> system.

Yup, though it isn't clear to me I'd call it cheating.  In some ways,
what I just described to handle suspends when X is not running is really
robust and simple.  And you don't have divided responsibility for
remembering the state. Simple == good in my book.

                                  - Jim

-- 
Jim Gettys
One Laptop Per Child



* Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-09-04 12:59 ` Pavel Machek
@ 2006-09-05  2:19   ` Adam Belay
  0 siblings, 0 replies; 27+ messages in thread
From: Adam Belay @ 2006-09-05  2:19 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Len Brown, ACPI ML, Linux Kernel ML, Dominik Brodowski, Arjan van de Ven

Hi Pavel,

On Mon, 2006-09-04 at 12:59 +0000, Pavel Machek wrote:
> Hi!
> 
> > This patch improves the ACPI c-state selection algorithm.  It also
> > includes a major cleanup and simplification of the processor idle code.
> 
> Nice!
> 
> > @@ -1009,7 +883,7 @@
> >  
> >  	seq_printf(seq, "active state:            C%zd\n"
> >  		   "max_cstate:              C%d\n"
> > -		   "bus master activity:     %08x\n",
> > +		   "bus master activity:     %d\n",
> >  		   pr->power.state ? pr->power.state - pr->power.states : 0,
> >  		   max_cstate, (unsigned)pr->power.bm_activity);
> >  
> 
> This changes kernel - user interface. You should change the field
> description, or keep it in hex...

Good catch!  Essentially the field now counts the number of times bus
master activity was detected, rather than holding a shifted bitmask.  I'll change its
name in the next iteration.
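The interface concern can be illustrated with two formatting helpers in user-space C. The label `bus master events:` is a hypothetical name, not necessarily the one used in a later iteration. `%u` is used because the value is passed as `unsigned`, whereas the patch prints it with `%d`.

```c
#include <assert.h>
#include <stdio.h>

/* Old output: a hex bitmask, one bit per jiffy of recent BM activity. */
int format_bm_hex(char *buf, size_t len, unsigned int bm_history)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "bus master activity:     %08x\n", bm_history);
}

/* Hypothetical renamed field: a decimal count of detected BM activity. */
int format_bm_count(char *buf, size_t len, unsigned int bm_events)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "bus master events:       %u\n", bm_events);
}
```

If only the format had changed and the label stayed the same, a tool that still parses `bus master activity:` as hex would read a decimal `15` as 0x15, i.e. 21; renaming the field (or keeping hex) avoids that silent misread.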

> 
> BTW will you be on september's labs conference?

It's not currently in my plans, but I'd love to attend one at some
point.

> 
> 							Pavel

Thanks for the comments.

Regards,
Adam



* Re: [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
  2006-08-29 20:51 Adam Belay
@ 2006-09-04 12:59 ` Pavel Machek
  2006-09-05  2:19   ` Adam Belay
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Machek @ 2006-09-04 12:59 UTC (permalink / raw)
  To: Adam Belay
  Cc: Len Brown, ACPI ML, Linux Kernel ML, Dominik Brodowski, Arjan van de Ven

Hi!

> This patch improves the ACPI c-state selection algorithm.  It also
> includes a major cleanup and simplification of the processor idle code.

Nice!

> @@ -1009,7 +883,7 @@
>  
>  	seq_printf(seq, "active state:            C%zd\n"
>  		   "max_cstate:              C%d\n"
> -		   "bus master activity:     %08x\n",
> +		   "bus master activity:     %d\n",
>  		   pr->power.state ? pr->power.state - pr->power.states : 0,
>  		   max_cstate, (unsigned)pr->power.bm_activity);
>  

This changes kernel - user interface. You should change the field
description, or keep it in hex...

BTW will you be on september's labs conference?

							Pavel
-- 
Thanks for all the (sleeping) penguins.

* [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements
@ 2006-08-29 20:51 Adam Belay
  2006-09-04 12:59 ` Pavel Machek
  0 siblings, 1 reply; 27+ messages in thread
From: Adam Belay @ 2006-08-29 20:51 UTC (permalink / raw)
  To: Len Brown; +Cc: ACPI ML, Linux Kernel ML, Dominik Brodowski, Arjan van de Ven

Hi All,

This patch improves the ACPI c-state selection algorithm.  It also
includes a major cleanup and simplification of the processor idle code.

The new implementation considers the full menu of available c-states.
Just as the previous implementation, decisions are primarily based on
the residency time of the last c-state entry.  This is generally an
effective metric because it allows for detection of interrupt activity.
However, the new algorithm differs in that it does not promote or demote
through the c-states in succession.  Rather, it immediately jumps to
whatever c-state has the best expected power consumption advantage for
the predicted residency time (i.e. the previously measured residency).
If the residency time is too short during a deep c-state entry, then the
cost of entering the state outweighs any power consumption advantage.
Similarly, if a shallow c-state is entered and resident for an
excessively long duration, then a potential opportunity to save more
power is missed.
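The promotion/demotion walk from the patch can be exercised outside the kernel. The sketch below copies that loop into user-space C. Assumptions: `target_ticks` is taken as the exit latency converted to PM timer ticks times `RESIDENCY_TO_LATENCY_RATIO` (5, i.e. latency is 20% of residency), which this excerpt does not show the patch doing; `struct cstate`, `cstate_init()` and `select_cstate()` are hypothetical names, and the latencies used below are made up. The conversion macro is the patch's `US_TO_PM_TIMER_TICKS` with its argument parenthesised.

```c
#include <assert.h>
#include <stdbool.h>

#define PM_TIMER_FREQUENCY		3579545	/* ACPI PM timer, Hz */
/* The patch's conversion, with the argument parenthesised. */
#define US_TO_PM_TIMER_TICKS(t)		(((t) * (PM_TIMER_FREQUENCY/1000)) / 1000)
#define RESIDENCY_TO_LATENCY_RATIO	5
#define MAX_STATES			8

struct cstate {
	bool valid;
	unsigned int target_ticks;	/* residency needed to justify entry */
};

/* ASSUMPTION: target = latency (in PM ticks) * RESIDENCY_TO_LATENCY_RATIO. */
void cstate_init(struct cstate *cx, unsigned int latency_us)
{
	cx->valid = true;
	cx->target_ticks = US_TO_PM_TIMER_TICKS(latency_us) *
			   RESIDENCY_TO_LATENCY_RATIO;
}

/*
 * Same promotion/demotion walk as acpi_processor_idle() in the patch.
 * states[0] is unused (C0); max_idx plays the role of
 * min(pr->power.count, max_cstate).
 */
int select_cstate(const struct cstate *states, int max_idx, int idx,
		  unsigned int sleep_ticks)
{
	assert(max_idx < MAX_STATES && idx >= 1 && idx <= max_idx);

	if (states[idx].target_ticks < sleep_ticks) {	/* promotion */
		for (int i = idx + 1; i <= max_idx; i++) {
			if (!states[i].valid)
				continue;
			idx = i;
			if (states[i].target_ticks >= sleep_ticks)
				break;
		}
	} else {					/* demotion */
		for (int i = idx - 1; i > 0; i--) {
			if (!states[i].valid)
				continue;
			idx = i;
			if (states[i].target_ticks < sleep_ticks)
				break;
		}
	}
	return idx;
}
```

With hypothetical C1/C2/C3 latencies of 1, 10 and 100 us, the targets come out at 15, 175 and 1785 ticks. Setting `sleep_ticks` to C1's target plus one (the C1 workaround in the P.S.) moves the selection exactly one state deeper. Note that promotion stops at the first state whose target is at or above the prediction, while demotion stops at the first state below it. The loop index is a signed `int` here, so the countdown cannot wrap around the way an unsigned index would if the starting index were 0.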

The changes in this patch allow the ACPI idle processor mechanism to
react more quickly to sudden bursts of activity because it can jump
directly to whatever c-state is appropriate.  However, because of the
"menu" nature of c-state selection, the code works best when ACPI
implementations expose all of the c-states supported by hardware.

The bus master activity mechanism has undergone similar improvements.
During capability detection, the deepest c-state that allows bus master
activity is determined.  BM_STS is then polled each time the ACPI code
prepares to enter a c-state.  If bus master activity is detected, then
the previously mentioned bus master capable c-state becomes the deepest
c-state allowed for that quantum.  In contrast, the old implementation
would permit bus master activity to cause a demotion from one C3-type
state to the next shallower C3-type state, imposing unnecessary latency.
As a further optimization, BM_STS is cleared each time
acpi_processor_idle() is entered.  This prevents any stale bus master
status from affecting c-state policy, as it may have occurred long ago
during scheduled work.

Finally, it's worth mentioning that the bulk of c-state policy
calculations have been moved to take place before c-states are entered.
This should further reduce exit latency when returning from a c-state.

This algorithm has not yet been carefully benchmarked (e.g. bltk or
power meters).  However, I can say with some confidence that it saves a
small amount more power during an idle workload and a larger amount more
power during typical user-input oriented workloads such as word
processing.

I would really appreciate any comments, suggestions, or testing.

Cheers,
Adam

P.S.: It would be great if we had an accurate way to determine the ticks
spent in the C1 state.  Currently, I work around the issue by setting
"sleep_ticks" such that it promotes to the next deeper state during the
next quantum.

Patch is against 2.6.18-rc4.
Signed-off-by: Adam Belay <abelay@novell.com>

---
 drivers/acpi/processor_idle.c |  502 +++++++++++++++---------------------------
 include/acpi/processor.h      |   18 -
 2 files changed, 184 insertions(+), 336 deletions(-)

--- a/drivers/acpi/processor_idle.c	2006-08-28 17:14:40.000000000 -0400
+++ b/drivers/acpi/processor_idle.c	2006-08-28 17:13:56.000000000 -0400
@@ -8,6 +8,8 @@
  *  			- Added processor hotplug support
  *  Copyright (C) 2005  Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
  *  			- Added support for C3 on SMP
+ *  Copyright (C) 2006  Adam Belay <abelay@novell.com>
+ *  			- New policy algorithm, several cleanups
  *
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  *
@@ -52,8 +54,6 @@
 ACPI_MODULE_NAME("acpi_processor")
 #define ACPI_PROCESSOR_FILE_POWER	"power"
 #define US_TO_PM_TIMER_TICKS(t)		((t * (PM_TIMER_FREQUENCY/1000)) / 1000)
-#define C2_OVERHEAD			4	/* 1us (3.579 ticks per us) */
-#define C3_OVERHEAD			4	/* 1us (3.579 ticks per us) */
 static void (*pm_idle_save) (void) __read_mostly;
 module_param(max_cstate, uint, 0644);
 
@@ -61,15 +61,10 @@
 module_param(nocst, uint, 0000);
 
 /*
- * bm_history -- bit-mask with a bit per jiffy of bus-master activity
- * 1000 HZ: 0xFFFFFFFF: 32 jiffies = 32ms
- * 800 HZ: 0xFFFFFFFF: 32 jiffies = 40ms
- * 100 HZ: 0x0000000F: 4 jiffies = 40ms
- * reduce history for more aggressive entry into C3
+ * Currently, we aim for the entry/exit latency to be 20% of measured residency.
  */
-static unsigned int bm_history __read_mostly =
-    (HZ >= 800 ? 0xFFFFFFFF : ((1U << (HZ / 25)) - 1));
-module_param(bm_history, uint, 0644);
+#define RESIDENCY_TO_LATENCY_RATIO	5
+
 /* --------------------------------------------------------------------------
                                 Power Management
    -------------------------------------------------------------------------- */
@@ -165,6 +160,13 @@
 		return ((0xFFFFFFFF - t1) + t2);
 }
 
+static atomic_t c3_cpu_count;
+
+/**
+ * acpi_processor_power_activate - prepares for the next power state
+ * @power: power data
+ * @new: the target power state
+ */
 static void
 acpi_processor_power_activate(struct acpi_processor *pr,
 			      struct acpi_processor_cx *new)
@@ -176,10 +178,6 @@
 
 	old = pr->power.state;
 
-	if (old)
-		old->promotion.count = 0;
-	new->demotion.count = 0;
-
 	/* Cleanup from old state. */
 	if (old) {
 		switch (old->type) {
@@ -207,330 +205,216 @@
 	return;
 }
 
-static void acpi_safe_halt(void)
+
+/**
+ * acpi_check_bm_status - determines if there is BM activity
+ *
+ * Returns: a non-zero value to indicate BM activity
+ */
+static inline int acpi_check_bm_status(void)
 {
-	current_thread_info()->status &= ~TS_POLLING;
-	smp_mb__after_clear_bit();
-	if (!need_resched())
-		safe_halt();
-	current_thread_info()->status |= TS_POLLING;
-}
+	u32 bm_status;
 
-static atomic_t c3_cpu_count;
+	acpi_get_register(ACPI_BITREG_BUS_MASTER_STATUS,
+			  &bm_status, ACPI_MTX_DO_NOT_LOCK);
+	if (bm_status) {
+		acpi_set_register(ACPI_BITREG_BUS_MASTER_STATUS,
+				  1, ACPI_MTX_DO_NOT_LOCK);
+		return 1;
+	}
+	/*
+	 * PIIX4 Erratum #18: Note that BM_STS doesn't always reflect
+	 * the true state of bus mastering activity; forcing us to
+	 * manually check the BMIDEA bit of each IDE channel.
+	 */
+	else if (errata.piix4.bmisx) {
+		if ((inb_p(errata.piix4.bmisx + 0x02) & 0x01)
+		    || (inb_p(errata.piix4.bmisx + 0x0A) & 0x01))
+			return 1;
+	}
+
+	return 0;
+}
 
+/**
+ * acpi_processor_idle - the main ACPI idle loop
+ *
+ * This function determines and enters the most appropriate ACPI c-state based
+ * on current system conditions.
+ */
 static void acpi_processor_idle(void)
 {
 	struct acpi_processor *pr = NULL;
 	struct acpi_processor_cx *cx = NULL;
-	struct acpi_processor_cx *next_state = NULL;
-	int sleep_ticks = 0;
-	u32 t1, t2 = 0;
+	u32 sleep_ticks, state_idx, t1, t2, i;
 
 	pr = processors[smp_processor_id()];
 	if (!pr)
 		return;
 
 	/*
-	 * Interrupts must be disabled during bus mastering calculations and
-	 * for C2/C3 transitions.
-	 */
-	local_irq_disable();
-
-	/*
-	 * Check whether we truly need to go idle, or should
-	 * reschedule:
-	 */
-	if (unlikely(need_resched())) {
-		local_irq_enable();
-		return;
-	}
-
-	cx = pr->power.state;
-	if (!cx) {
-		if (pm_idle_save)
-			pm_idle_save();
-		else
-			acpi_safe_halt();
-		return;
-	}
-
-	/*
-	 * Check BM Activity
-	 * -----------------
-	 * Check for bus mastering activity (if required), record, and check
-	 * for demotion.
-	 */
-	if (pr->flags.bm_check) {
-		u32 bm_status = 0;
-		unsigned long diff = jiffies - pr->power.bm_check_timestamp;
-
-		if (diff > 31)
-			diff = 31;
-
-		pr->power.bm_activity <<= diff;
-
-		acpi_get_register(ACPI_BITREG_BUS_MASTER_STATUS,
-				  &bm_status, ACPI_MTX_DO_NOT_LOCK);
-		if (bm_status) {
-			pr->power.bm_activity |= 0x1;
-			acpi_set_register(ACPI_BITREG_BUS_MASTER_STATUS,
-					  1, ACPI_MTX_DO_NOT_LOCK);
+	 * We assume there's a good chance the idle conditions will be similar
+	 * to those before we scheduled work.  Therefore, the next state is
+	 * determined by the idle ticks of the last sleep state entered.
+	 */
+	sleep_ticks = pr->power.last_ticks;
+	state_idx = pr->power.count;
+
+	/*
+	 * We also clear BM_STS, as it may have been a while since we last
+	 * checked it.
+	 */
+	acpi_set_register(ACPI_BITREG_BUS_MASTER_STATUS,
+			  1, ACPI_MTX_DO_NOT_LOCK);
+
+	while (!need_resched()) {
+		int count = min(pr->power.count, (int) max_cstate);
+		cx = &pr->power.states[state_idx];
+
+		if (cx->target_ticks < sleep_ticks) { /* promotion */
+			for (i = state_idx + 1; i <= count; i++) {
+				cx = &pr->power.states[i];
+				if (!cx->valid)
+					continue;
+				state_idx = i;
+				if (cx->target_ticks >= sleep_ticks)
+					break;
+			}
+		} else { /* demotion */
+			for (i = state_idx - 1; i > 0; i--) {
+				cx = &pr->power.states[i];
+				if (!cx->valid)
+					continue;
+				state_idx = i;
+				if (cx->target_ticks < sleep_ticks)
+					break;
+			}
 		}
+
 		/*
-		 * PIIX4 Erratum #18: Note that BM_STS doesn't always reflect
-		 * the true state of bus mastering activity; forcing us to
-		 * manually check the BMIDEA bit of each IDE channel.
+		 * Interrupts must be disabled during bus mastering
+		 * calculations and for C-state transitions.
 		 */
-		else if (errata.piix4.bmisx) {
-			if ((inb_p(errata.piix4.bmisx + 0x02) & 0x01)
-			    || (inb_p(errata.piix4.bmisx + 0x0A) & 0x01))
-				pr->power.bm_activity |= 0x1;
-		}
+		local_irq_disable();
 
-		pr->power.bm_check_timestamp = jiffies;
+		if (unlikely(need_resched())) {
+			local_irq_enable();
+			return;
+		}
 
 		/*
-		 * If bus mastering is or was active this jiffy, demote
-		 * to avoid a faulty transition.  Note that the processor
-		 * won't enter a low-power state during this call (to this
-		 * function) but should upon the next.
-		 *
-		 * TBD: A better policy might be to fallback to the demotion
-		 *      state (use it for this quantum only) istead of
-		 *      demoting -- and rely on duration as our sole demotion
-		 *      qualification.  This may, however, introduce DMA
-		 *      issues (e.g. floppy DMA transfer overrun/underrun).
+		 * Check bus master status, if active ensure we enter a state
+		 * that allows bus master transactions.
 		 */
-		if ((pr->power.bm_activity & 0x1) &&
-		    cx->demotion.threshold.bm) {
-			local_irq_enable();
-			next_state = cx->demotion.state;
-			goto end;
+		if (pr->flags.bm_check && acpi_check_bm_status()) {
+			pr->power.bm_activity++;
+			state_idx = min(state_idx, pr->power.bm_veto_state);
 		}
-	}
 
 #ifdef CONFIG_HOTPLUG_CPU
-	/*
-	 * Check for P_LVL2_UP flag before entering C2 and above on
-	 * an SMP system. We do it here instead of doing it at _CST/P_LVL
-	 * detection phase, to work cleanly with logical CPU hotplug.
-	 */
-	if ((cx->type != ACPI_STATE_C1) && (num_online_cpus() > 1) && 
-	    !pr->flags.has_cst && !acpi_fadt.plvl2_up)
-		cx = &pr->power.states[ACPI_STATE_C1];
+		/*
+		 * Check for P_LVL2_UP flag before entering C2 and above on
+		 * an SMP system. We do it here instead of doing it at _CST/P_LVL
+		 * detection phase, to work cleanly with logical CPU hotplug.
+		 */
+		if ((cx->type != ACPI_STATE_C1) && (num_online_cpus() > 1) && 
+		    !pr->flags.has_cst && !acpi_fadt.plvl2_up)
+			state_idx = ACPI_STATE_C1;
 #endif
 
-	/*
-	 * Sleep:
-	 * ------
-	 * Invoke the current Cx state to put the processor to sleep.
-	 */
-	if (cx->type == ACPI_STATE_C2 || cx->type == ACPI_STATE_C3) {
+		cx = &pr->power.states[state_idx];
+
+		acpi_processor_power_activate(pr, cx);
+
 		current_thread_info()->status &= ~TS_POLLING;
 		smp_mb__after_clear_bit();
+
 		if (need_resched()) {
 			current_thread_info()->status |= TS_POLLING;
 			local_irq_enable();
 			return;
 		}
-	}
-
-	switch (cx->type) {
 
-	case ACPI_STATE_C1:
-		/*
-		 * Invoke C1.
-		 * Use the appropriate idle routine, the one that would
-		 * be used without acpi C-states.
-		 */
-		if (pm_idle_save)
-			pm_idle_save();
-		else
-			acpi_safe_halt();
-
-		/*
-		 * TBD: Can't get time duration while in C1, as resumes
-		 *      go to an ISR rather than here.  Need to instrument
-		 *      base interrupt handler.
-		 */
-		sleep_ticks = 0xFFFFFFFF;
-		break;
+		if (cx->type == ACPI_STATE_C1) { /* enter C1 */
+			safe_halt();
+			/*
+			 * TBD: Can't get time duration while in C1, as resumes
+			 *      go to an ISR rather than here.  Need to instrument
+			 *      base interrupt handler.
+			 */
+			sleep_ticks = cx->target_ticks + 1;
+		} else { /* enter C2 or C3 */
+			if (cx->type == ACPI_STATE_C3) {
+				if (pr->flags.bm_check) {
+					if (atomic_inc_return(&c3_cpu_count) ==
+						num_online_cpus()) {
+						/*
+						 * All CPUs are trying to go to C3
+						 * Disable bus master arbitration
+						 */
+						acpi_set_register(ACPI_BITREG_ARB_DISABLE, 1,
+								  ACPI_MTX_DO_NOT_LOCK);
+					}
+				} else {
+					/* SMP with no shared cache... Invalidate cache */
+					ACPI_FLUSH_CPU_CACHE();
+				}
+			}
 
-	case ACPI_STATE_C2:
-		/* Get start time (ticks) */
-		t1 = inl(acpi_fadt.xpm_tmr_blk.address);
-		/* Invoke C2 */
-		inb(cx->address);
-		/* Dummy wait op - must do something useless after P_LVL2 read
-		   because chipsets cannot guarantee that STPCLK# signal
-		   gets asserted in time to freeze execution properly. */
-		t2 = inl(acpi_fadt.xpm_tmr_blk.address);
-		/* Get end time (ticks) */
-		t2 = inl(acpi_fadt.xpm_tmr_blk.address);
+			/* Get start time (ticks) */
+			t1 = inl(acpi_fadt.xpm_tmr_blk.address);
+			/* invoke the target C-state */
+			inb(cx->address);
+			/* Dummy wait op - must do something useless after P_LVL2/3 read
+			   because chipsets cannot guarantee that STPCLK# signal gets
+			   asserted in time to freeze execution properly. */
+			t2 = inl(acpi_fadt.xpm_tmr_blk.address);
+			/* Get end time (ticks) */
+			t2 = inl(acpi_fadt.xpm_tmr_blk.address);
+
+			if (cx->type == ACPI_STATE_C3 && pr->flags.bm_check) {
+				/* Enable bus master arbitration */
+				atomic_dec(&c3_cpu_count);
+				acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0,
+						  ACPI_MTX_DO_NOT_LOCK);
+			}
 
 #ifdef CONFIG_GENERIC_TIME
-		/* TSC halts in C2, so notify users */
-		mark_tsc_unstable();
+			/* TSC halts, so notify users */
+			mark_tsc_unstable();
 #endif
-		/* Re-enable interrupts */
-		local_irq_enable();
-		current_thread_info()->status |= TS_POLLING;
-		/* Compute time (ticks) that we were actually asleep */
-		sleep_ticks =
-		    ticks_elapsed(t1, t2) - cx->latency_ticks - C2_OVERHEAD;
-		break;
 
-	case ACPI_STATE_C3:
-
-		if (pr->flags.bm_check) {
-			if (atomic_inc_return(&c3_cpu_count) ==
-			    num_online_cpus()) {
-				/*
-				 * All CPUs are trying to go to C3
-				 * Disable bus master arbitration
-				 */
-				acpi_set_register(ACPI_BITREG_ARB_DISABLE, 1,
-						  ACPI_MTX_DO_NOT_LOCK);
-			}
-		} else {
-			/* SMP with no shared cache... Invalidate cache  */
-			ACPI_FLUSH_CPU_CACHE();
+			/* Compute time (ticks) that we were actually asleep */
+			sleep_ticks = ticks_elapsed(t1, t2);
 		}
 
-		/* Get start time (ticks) */
-		t1 = inl(acpi_fadt.xpm_tmr_blk.address);
-		/* Invoke C3 */
-		inb(cx->address);
-		/* Dummy wait op (see above) */
-		t2 = inl(acpi_fadt.xpm_tmr_blk.address);
-		/* Get end time (ticks) */
-		t2 = inl(acpi_fadt.xpm_tmr_blk.address);
-		if (pr->flags.bm_check) {
-			/* Enable bus master arbitration */
-			atomic_dec(&c3_cpu_count);
-			acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0,
-					  ACPI_MTX_DO_NOT_LOCK);
-		}
-
-#ifdef CONFIG_GENERIC_TIME
-		/* TSC halts in C3, so notify users */
-		mark_tsc_unstable();
-#endif
-		/* Re-enable interrupts */
 		local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
-		/* Compute time (ticks) that we were actually asleep */
-		sleep_ticks =
-		    ticks_elapsed(t1, t2) - cx->latency_ticks - C3_OVERHEAD;
-		break;
 
-	default:
-		local_irq_enable();
-		return;
-	}
-	cx->usage++;
-	if ((cx->type != ACPI_STATE_C1) && (sleep_ticks > 0))
+		cx->usage++;
 		cx->time += sleep_ticks;
-
-	next_state = pr->power.state;
-
-#ifdef CONFIG_HOTPLUG_CPU
-	/* Don't do promotion/demotion */
-	if ((cx->type == ACPI_STATE_C1) && (num_online_cpus() > 1) &&
-	    !pr->flags.has_cst && !acpi_fadt.plvl2_up) {
-		next_state = cx;
-		goto end;
-	}
-#endif
-
-	/*
-	 * Promotion?
-	 * ----------
-	 * Track the number of longs (time asleep is greater than threshold)
-	 * and promote when the count threshold is reached.  Note that bus
-	 * mastering activity may prevent promotions.
-	 * Do not promote above max_cstate.
-	 */
-	if (cx->promotion.state &&
-	    ((cx->promotion.state - pr->power.states) <= max_cstate)) {
-		if (sleep_ticks > cx->promotion.threshold.ticks) {
-			cx->promotion.count++;
-			cx->demotion.count = 0;
-			if (cx->promotion.count >=
-			    cx->promotion.threshold.count) {
-				if (pr->flags.bm_check) {
-					if (!
-					    (pr->power.bm_activity & cx->
-					     promotion.threshold.bm)) {
-						next_state =
-						    cx->promotion.state;
-						goto end;
-					}
-				} else {
-					next_state = cx->promotion.state;
-					goto end;
-				}
-			}
-		}
-	}
-
-	/*
-	 * Demotion?
-	 * ---------
-	 * Track the number of shorts (time asleep is less than time threshold)
-	 * and demote when the usage threshold is reached.
-	 */
-	if (cx->demotion.state) {
-		if (sleep_ticks < cx->demotion.threshold.ticks) {
-			cx->demotion.count++;
-			cx->promotion.count = 0;
-			if (cx->demotion.count >= cx->demotion.threshold.count) {
-				next_state = cx->demotion.state;
-				goto end;
-			}
-		}
 	}
 
-      end:
-	/*
-	 * Demote if current state exceeds max_cstate
-	 */
-	if ((pr->power.state - pr->power.states) > max_cstate) {
-		if (cx->demotion.state)
-			next_state = cx->demotion.state;
-	}
-
-	/*
-	 * New Cx State?
-	 * -------------
-	 * If we're going to start using a new Cx state we must clean up
-	 * from the previous and prepare to use the new.
-	 */
-	if (next_state != pr->power.state)
-		acpi_processor_power_activate(pr, next_state);
+	pr->power.last_ticks = sleep_ticks;
 }
 
+/**
+ * acpi_processor_set_power_policy - sets the default idle policy
+ * @pr: the processor
+ *
+ * This function sets the default Cx state policy (OS idle handler).
+ * Note that the Cx state policy is completely customizable and can
+ * be altered dynamically.
+ */
 static int acpi_processor_set_power_policy(struct acpi_processor *pr)
 {
 	unsigned int i;
 	unsigned int state_is_set = 0;
-	struct acpi_processor_cx *lower = NULL;
-	struct acpi_processor_cx *higher = NULL;
 	struct acpi_processor_cx *cx;
 
-
 	if (!pr)
 		return -EINVAL;
 
-	/*
-	 * This function sets the default Cx state policy (OS idle handler).
-	 * Our scheme is to promote quickly to C2 but more conservatively
-	 * to C3.  We're favoring C2  for its characteristics of low latency
-	 * (quick response), good power savings, and ability to allow bus
-	 * mastering activity.  Note that the Cx state policy is completely
-	 * customizable and can be altered dynamically.
-	 */
-
 	/* startup state */
 	for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
 		cx = &pr->power.states[i];
@@ -546,41 +430,31 @@
 	if (!state_is_set)
 		return -ENODEV;
 
-	/* demotion */
-	for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
+	state_is_set = 0;
+
+	/* find deepest bus master compatible state */
+	for (i = (ACPI_PROCESSOR_MAX_POWER - 1); i > 0; i--) {
 		cx = &pr->power.states[i];
 		if (!cx->valid)
 			continue;
+		if (cx->type == ACPI_STATE_C3)
+			continue;
 
-		if (lower) {
-			cx->demotion.state = lower;
-			cx->demotion.threshold.ticks = cx->latency_ticks;
-			cx->demotion.threshold.count = 1;
-			if (cx->type == ACPI_STATE_C3)
-				cx->demotion.threshold.bm = bm_history;
-		}
-
-		lower = cx;
+		pr->power.bm_veto_state = i;
+		state_is_set = 1;
+		break;
 	}
 
-	/* promotion */
-	for (i = (ACPI_PROCESSOR_MAX_POWER - 1); i > 0; i--) {
+	if (!state_is_set)
+		return -ENODEV;
+
+	/* determine target sleep ticks */
+	for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
 		cx = &pr->power.states[i];
 		if (!cx->valid)
 			continue;
 
-		if (higher) {
-			cx->promotion.state = higher;
-			cx->promotion.threshold.ticks = cx->latency_ticks;
-			if (cx->type >= ACPI_STATE_C2)
-				cx->promotion.threshold.count = 4;
-			else
-				cx->promotion.threshold.count = 10;
-			if (higher->type == ACPI_STATE_C3)
-				cx->promotion.threshold.bm = bm_history;
-		}
-
-		higher = cx;
+		cx->target_ticks = cx->latency_ticks * RESIDENCY_TO_LATENCY_RATIO;
 	}
 
 	return 0;
@@ -1009,7 +883,7 @@
 
 	seq_printf(seq, "active state:            C%zd\n"
 		   "max_cstate:              C%d\n"
-		   "bus master activity:     %08x\n",
+		   "bus master activity:     %u\n",
 		   pr->power.state ? pr->power.state - pr->power.states : 0,
 		   max_cstate, (unsigned)pr->power.bm_activity);
 
@@ -1040,20 +914,6 @@
 			break;
 		}
 
-		if (pr->power.states[i].promotion.state)
-			seq_printf(seq, "promotion[C%zd] ",
-				   (pr->power.states[i].promotion.state -
-				    pr->power.states));
-		else
-			seq_puts(seq, "promotion[--] ");
-
-		if (pr->power.states[i].demotion.state)
-			seq_printf(seq, "demotion[C%zd] ",
-				   (pr->power.states[i].demotion.state -
-				    pr->power.states));
-		else
-			seq_puts(seq, "demotion[--] ");
-
 		seq_printf(seq, "latency[%03d] usage[%08d] duration[%020llu]\n",
 			   pr->power.states[i].latency,
 			   pr->power.states[i].usage,
--- a/include/acpi/processor.h	2006-08-28 17:14:40.000000000 -0400
+++ b/include/acpi/processor.h	2006-08-28 16:37:35.000000000 -0400
@@ -43,17 +43,6 @@
 	u64 address;
 } __attribute__ ((packed));
 
-struct acpi_processor_cx_policy {
-	u32 count;
-	struct acpi_processor_cx *state;
-	struct {
-		u32 time;
-		u32 ticks;
-		u32 count;
-		u32 bm;
-	} threshold;
-};
-
 struct acpi_processor_cx {
 	u8 valid;
 	u8 type;
@@ -63,15 +52,14 @@
 	u32 power;
 	u32 usage;
 	u64 time;
-	struct acpi_processor_cx_policy promotion;
-	struct acpi_processor_cx_policy demotion;
+	u32 target_ticks;
 };
 
 struct acpi_processor_power {
 	struct acpi_processor_cx *state;
-	unsigned long bm_check_timestamp;
-	u32 default_state;
 	u32 bm_activity;
+	u32 bm_veto_state;
+	u32 last_ticks;
 	int count;
 	struct acpi_processor_cx states[ACPI_PROCESSOR_MAX_POWER];
 };
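
For readers following the new policy without the earlier hunks: the state-selection loop itself falls outside this excerpt, so what follows is only a userspace sketch of the behaviour described above, not the patch's code. It assumes that index 1 of the state table is a valid C1, that deeper states sit at higher indices, and it uses a hypothetical RESIDENCY_RATIO of 6 in place of RESIDENCY_TO_LATENCY_RATIO, whose value is not shown here. It picks the deepest valid state whose target_ticks (latency_ticks times the ratio, as in acpi_processor_set_power_policy()) fits in the last measured residency, then clamps to the bus-master veto state (the deepest valid non-C3 state) when bus-master activity was seen, mirroring the min(state_idx, bm_veto_state) step.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real constants live in the kernel headers. */
#define MAX_POWER       8  /* plays the role of ACPI_PROCESSOR_MAX_POWER */
#define RESIDENCY_RATIO 6  /* assumed value for RESIDENCY_TO_LATENCY_RATIO */

enum { C1 = 1, C2 = 2, C3 = 3 };

struct cx {
	bool valid;
	int type;
	uint32_t latency_ticks;
	uint32_t target_ticks;
};

/* Like the patch: target residency = exit latency * ratio. */
static void set_targets(struct cx *states)
{
	for (int i = 1; i < MAX_POWER; i++)
		if (states[i].valid)
			states[i].target_ticks =
				states[i].latency_ticks * RESIDENCY_RATIO;
}

/* Deepest valid state that is not C3; 0 means none was found. */
static int find_bm_veto_state(const struct cx *states)
{
	for (int i = MAX_POWER - 1; i > 0; i--)
		if (states[i].valid && states[i].type != C3)
			return i;
	return 0;
}

/*
 * Deepest valid state whose target residency fits the last measured
 * residency, clamped to the veto state if bus mastering was seen.
 * Assumes states[1] is a valid C1, used as the fallback.
 */
static int select_state(const struct cx *states, uint32_t last_ticks,
			bool bm_active, int bm_veto_state)
{
	int idx = 1;

	for (int i = 2; i < MAX_POWER; i++)
		if (states[i].valid && states[i].target_ticks <= last_ticks)
			idx = i;

	if (bm_active && idx > bm_veto_state)
		idx = bm_veto_state;
	return idx;
}
```

Scanning every valid state rather than stopping at the first miss keeps the sketch correct even if the table's latencies are not strictly increasing with depth.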

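The C3 path keeps a shared counter so that bus-master arbitration is disabled only when the last online CPU enters C3; on the way out, the patch decrements the counter and clears ARB_DIS unconditionally, so any CPU waking from C3 re-enables arbitration. The sketch below models that handshake with C11 atomics. arb_disabled is a hypothetical flag standing in for ACPI_BITREG_ARB_DISABLE, and the single-threaded checks only illustrate the ordering, not real concurrency.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int c3_cpu_count;  /* CPUs currently inside C3 */
static bool arb_disabled;        /* hypothetical stand-in for ARB_DIS */

/* Called before entering C3 on a CPU with bm_check set. */
static void c3_enter(int online_cpus)
{
	/* atomic_fetch_add returns the old value; +1 gives the
	 * post-increment value, like the kernel's atomic_inc_return(). */
	if (atomic_fetch_add(&c3_cpu_count, 1) + 1 == online_cpus)
		arb_disabled = true;   /* last CPU in: block bus masters */
}

/* Called after waking from C3; any waking CPU re-enables arbitration. */
static void c3_exit(void)
{
	atomic_fetch_sub(&c3_cpu_count, 1);
	arb_disabled = false;
}
```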

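With the C2/C3 overhead corrections dropped, sleep_ticks now comes straight from ticks_elapsed(t1, t2) on the ACPI PM timer, which runs at 3.579545 MHz and is 24 bits wide unless the FADT's TMR_VAL_EXT flag reports a 32-bit counter. ticks_elapsed() itself is not part of this excerpt; the helpers below are a hypothetical wrap-aware equivalent that computes the difference modulo the counter width, plus a tick-to-microsecond conversion for comparing against _CST latencies.

```c
#include <stdbool.h>
#include <stdint.h>

#define PM_TIMER_HZ 3579545u  /* ACPI PM timer frequency */

/* Hypothetical helper: elapsed PM-timer ticks from t1 to t2, allowing
 * for at most one wrap of a 24- or 32-bit counter. */
static uint32_t pm_ticks_elapsed(uint32_t t1, uint32_t t2, bool timer_32bit)
{
	uint32_t mask = timer_32bit ? 0xFFFFFFFFu : 0x00FFFFFFu;

	/* Unsigned subtraction wraps modulo 2^32; the mask reduces it
	 * modulo 2^24 for the narrow timer. */
	return (t2 - t1) & mask;
}

/* Convert ticks to microseconds, widening to avoid overflow. */
static uint64_t pm_ticks_to_us(uint32_t ticks)
{
	return (uint64_t)ticks * 1000000u / PM_TIMER_HZ;
}
```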



Thread overview: 27+ messages
2006-08-30 18:40 [RFC][PATCH 1/2] ACPI: Idle Processor PM Improvements Pallipadi, Venkatesh
2006-08-30 19:43 ` Matthew Garrett
2006-08-31 23:13   ` Bjorn Helgaas
2006-09-01  0:30     ` [OLPC-devel] " Jim Gettys
2006-09-01  3:53       ` Len Brown
2006-09-01  4:12         ` Matthew Garrett
2006-09-01 15:51           ` Jordan Crouse
2006-09-01 13:14         ` [OLPC-devel] Re: [RFC][PATCH 1/2] " Carl-Daniel Hailfinger
2006-09-01 21:52         ` Andi Kleen
2006-09-01 22:57           ` Alan Cox
2006-09-04 13:13         ` Pavel Machek
2006-09-04 13:09       ` Pavel Machek
2006-09-05 14:31         ` Jim Gettys
2006-09-06 10:37           ` Pavel Machek
2006-09-06 14:58             ` Jordan Crouse
2006-09-12  9:21               ` Pavel Machek
2006-09-12 18:14                 ` Jim Gettys
2006-09-12 18:27                   ` Mitch Bradley
2006-09-12 20:18                   ` Jordan Crouse
2006-09-14  9:20                     ` Pavel Machek
2006-09-14  9:18                   ` Pavel Machek
2006-09-14 11:29                     ` Jim Gettys
2006-09-06 15:19             ` [OLPC-devel] Re: [RFC][PATCH 1/2] " Jim Gettys
2006-09-12  9:21               ` Pavel Machek
  -- strict thread matches above, loose matches on Subject: below --
2006-08-29 20:51 Adam Belay
2006-09-04 12:59 ` Pavel Machek
2006-09-05  2:19   ` Adam Belay
