* Re: So, what's the status on the recent patches here?
@ 2006-08-24 14:52 Woodruff, Richard
2006-08-25 19:58 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-24 14:52 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
> > > ...which is very bad interface for applications. See my other
> > > mail. Applications should not have to play with fast/medium/slow,
> > > explicitely. Instead, on opening /dev/dsp, you should power up the
> > > sound system (and maybe adjust cpu frequency if
> > > neccessary). Application should not have to do echo fast >
somewhere
> > > before opening /dev/dsp
> >
> > How does /dev/dsp know at what level it can run at? On the SOC I
> > control the speed of the DSP. I can adjust its MIPs rate.
>
> (I meant /dev/dsp -- OSS audio device, not Digital Signal Processor).
It can be all the same.
The device behind /dev/dsp doing the work likely is the main control
processor or it is a DSP (and some side mixer chip or not). The DSP in
the laptop case might be hidden inside some PCI composite device and
present its own interface. Thus you may treat it as some discrete
device with one register range. On the SOC it is all unrolled and you
must control all the pieces individually (and in concert with each
other). The DSP in both cases may have its own OS also running. This
is generally what you download as firmware.
When I want to frequency and/or voltage scale I must take into account
what the DSP is doing and what the applications processor is doing.
This is not so different from today's SMP/Core-Duo type systems where
both CPUs are in the same voltage plane. You can't change one with out
affecting the other.
The internal busses in SOCs wrap all these integrated peripherals in a
common way and add power hooks. This allows them to achieve massive
power optimizations which are not likely possible in the PC world.
> > A missing pieces is meaningful coordination between devices. Each
> > device is not an island. Not taking care of all devices on the
internal
> > interconnects may mean you don't get the big power savings. For the
DSP
>
> For notebooks, devices *are* islands. powerop tries to push
> everything-depends-on-everything model that may be good for some SoC,
> but sucks for notebooks. We need some middle ground.
USB being enabled and causing your laptop battery to dry up is a case
where laptop device dependency has been shown. There are likely many
more cases. I would expect BIOS/chip set developers are all too aware
of these in their sub-domains.
On a PC it might be hardware bugs and software bugs which are cause some
of the problems. This is the case for embedded also. An embedded SOC
does have another dimension in that they are designed to have global
system power states which include all devices (a processor is just
another device and their may be many). Their high level of integration
enables this. Linux's device model doesn't match up well with this.
There are X standard ways in which it is done by various vendors.
PowerOP at the low level provides a mechanism to abstract these
differences.
Thanks,
Richard W.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-09-05 16:03 Scott E. Preece
2006-09-05 20:42 ` Rafael J. Wysocki
2006-09-06 10:56 ` Pavel Machek
0 siblings, 2 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-09-05 16:03 UTC (permalink / raw)
To: rjw; +Cc: scott.preece, matthew.a.locke, linux-pm, pavel
| From: "Rafael J. Wysocki" <rjw@sisk.pl>
|
| On Monday, 4 September 2006 01:00, Scott E. Preece wrote:
| > policy decision to suspend is based on factors that are wholly different
| > than the factors that drive frequency/voltage changes. If that were the
| > case, then there would be no point to making the decisions in the same
| > place. Honestly, I'm not sure of the answer to that...
|
| I think the decision to suspend is made
| a) by the user,
| b) by a policy manager in case when, for example, the battery is running
| critical (ie. on emergency).
| and the decision to change a frequency/voltage is usually based on some
| efficiency factors.
|
| Also, the suspend "transitions" are never transparent to the user and the
| changes of frequency/voltage usually are, at least as far as CPUs are
| concerned.
---
Your scope is too narrow. In our domain (mobile phones) the user has no
control at all over power management and decisions to suspend are always
transparent.
In our own implementation, the user-space policy manager initiates
frequency and voltage changes and enabled suspends, but doesn't actually
initiate them. That is, the policy manager says "based on current user
activity, it would be OK to suspend now", and a kernel component then
looks for a good time to do it, based on the system being idle.
We have thought about merging the decisions into a single range of
operating points, but the added plumbing to get idle information back to
the policy manager seemed unappealing.
We don't use cpufreq, so Pavel's arguments about not changing the kernel
interface weren't a concern, for us. We're using a kernel interface that
is all our own (and unappealingly ioctl-based).
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-05 16:03 Scott E. Preece
@ 2006-09-05 20:42 ` Rafael J. Wysocki
2006-09-06 10:56 ` Pavel Machek
1 sibling, 0 replies; 136+ messages in thread
From: Rafael J. Wysocki @ 2006-09-05 20:42 UTC (permalink / raw)
Cc: scott.preece, matthew.a.locke, linux-pm, pavel
On Tuesday, 5 September 2006 18:03, Scott E. Preece wrote:
>
> | From: "Rafael J. Wysocki" <rjw@sisk.pl>
> |
> | On Monday, 4 September 2006 01:00, Scott E. Preece wrote:
> | > policy decision to suspend is based on factors that are wholly different
> | > than the factors that drive frequency/voltage changes. If that were the
> | > case, then there would be no point to making the decisions in the same
> | > place. Honestly, I'm not sure of the answer to that...
> |
> | I think the decision to suspend is made
> | a) by the user,
> | b) by a policy manager in case when, for example, the battery is running
> | critical (ie. on emergency).
> | and the decision to change a frequency/voltage is usually based on some
> | efficiency factors.
> |
> | Also, the suspend "transitions" are never transparent to the user and the
> | changes of frequency/voltage usually are, at least as far as CPUs are
> | concerned.
> ---
>
> Your scope is too narrow. In our domain (mobile phones) the user has no
> control at all over power management and decisions to suspend are always
> transparent.
>
> In our own implementation, the user-space policy manager initiates
> frequency and voltage changes and enabled suspends, but doesn't actually
> initiate them. That is, the policy manager says "based on current user
> activity, it would be OK to suspend now", and a kernel component then
> looks for a good time to do it, based on the system being idle.
Okay, but it's not like that on a PC.
IMHO there are architectures on which suspend states are distinct and
therefore they should be treated as such in general.
> We have thought about merging the decisions into a single range of
> operating points, but the added plumbing to get idle information back to
> the policy manager seemed unappealing.
>
> We don't use cpufreq, so Pavel's arguments about not changing the kernel
> interface weren't a concern, for us. We're using a kernel interface that
> is all our own (and unappealingly ioctl-based).
Fine, as far as I'm concerned. ;-)
Greetings,
Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-05 16:03 Scott E. Preece
2006-09-05 20:42 ` Rafael J. Wysocki
@ 2006-09-06 10:56 ` Pavel Machek
1 sibling, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-06 10:56 UTC (permalink / raw)
Cc: scott.preece, matthew.a.locke, linux-pm
Hi!
> We have thought about merging the decisions into a single range of
> operating points, but the added plumbing to get idle information back to
> the policy manager seemed unappealing.
>
> We don't use cpufreq, so Pavel's arguments about not changing the kernel
> interface weren't a concern, for us. We're using a kernel interface that
> is all our own (and unappealingly ioctl-based).
Well, but you can see that my arguments are quite important for
mainline merge? ;-).
Of course, powerop/oppoint patches are okay for your own use.
Pavel
--
Thanks, Sharp!
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-09-04 15:43 Scott E. Preece
0 siblings, 0 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-09-04 15:43 UTC (permalink / raw)
To: pavel; +Cc: scott.preece, matthew.a.locke, linux-pm
| From: Pavel Machek<pavel@ucw.cz>
| >...
| > My question is whether there are aspects of suspending, other than
| > latency, that the policy manager would need to consider in deciding
| > whether to suspend or not.
| >
| > Look at it this way. In one scheme the policy manager code is:
| >
| > new_OP = select_transition(current_OP, decision_factors);
| > set_OP(new_OP);
|
| No, it would be
|
| new_OP = select_transition(current_OP, decision_factors);
| if (new_OP == SUSPEND) {
| setup wakeup events ...
| }
| set_OP(new_OP);
---
Sorry; in our implementation, the devices are responsible for
configuring the wakeup events, either globally or in their suspend
routines, so it would look like my example code. However, I would have
expected the setup of wakeup events to happen in the kernel's set_OP
implementation, rather than in the policy manager, anyway.
Again, this model puts the decision to change in user space and the
implementation of the decision in the kernel.
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-09-03 23:05 Scott E. Preece
2006-09-04 9:09 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 23:05 UTC (permalink / raw)
To: pavel; +Cc: linux-pm
| From: Pavel Machek<pavel@ucw.cz>
|
| On Sun 2006-09-03 17:31:02, Scott E. Preece wrote:
| > | From: Pavel Machek<pavel@ucw.cz>
| > | Cc: <amit.kucheria@nokia.com>, <linux-pm@lists.osdl.org>
| > | User-Agent: Mutt/1.5.11+cvs20060126
| > |
| > | On Sun 2006-09-03 16:21:34, Scott E. Preece wrote:
| > | > | From: Pavel Machek<pavel@ucw.cz>
| > | > some reason, in your perception, why definition of operating points
| > | > really needs to be in the kernel? Definition of the operating points,
| > | > as opposed to changing from one OP to another, shouldn't have any timing
| > | > issues, so why isn't a privileged user-space manager a reasonable
| > | > approach?
| > |
| > | For one thing, is not powerop needed for boot? You need to boot in
| > | some operating point after all :-).
| > ---
| >
| > In the implementation we use, the initial OP is set by the boot
| > loader. Our kernel power management driver is a module. The assumption
| > is that booting involves a lot of processing and it makes sense to just
| > use the highest OP anyway. That makes sense in our environment, but I
| > wouldn't recommend our version as a general solution, anyway.
|
| That's actually regression; cpufreq saves power even while booting.
---
Well, the kernel boot phase is relatively brief. The module could be
loaded and managing the OP before you start bringing up the rest of user
space. However, I'm not arguing that our approach to that is generally
applicable - I was just giving an existence proof.
scott
|
| > | > As noted previously, OPs bundle together more than just the
| > | > frequency. Those of us supporting the OP model believe that you can't
| > | > intelligently change CPU frequency in isolation and you can't change
| > | > some of those parameters independently, because only certain
| > | > combinations work.
| > |
| > | That's okay. User gives you combination he wants, and you select "next
| > | higher" working operating point.
| > ---
| >
| > Hmm. I need to think about that. I guess the OP abstraction *could* be
| > entirely inside the user-space policy manager, with the kernel exposing
| > individual interfaces for every parameter that the policy manager would
| > need to adjust. However, that means that the whole mess has to be at
| > user level (kernel just implements simple knobs for individual
| > parameters), because the dependency management between the parameters
| > would only be known at user level, and means a relatively bulky kernel
| > interface, since it would expose more things at the interface.
|
| That would work for me.
|
| But my idea was actually opposite: expose individual knobs to
| userspace, and then select some operating point (inside kernel) that
| satisfies given knobs.
---
But then the kernel needs to know about the operating points, so aren't
you back where we were before?
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-03 23:05 Scott E. Preece
@ 2006-09-04 9:09 ` Pavel Machek
0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-04 9:09 UTC (permalink / raw)
To: Scott E. Preece; +Cc: linux-pm
Hi!
> | > In the implementation we use, the initial OP is set by the boot
> | > loader. Our kernel power management driver is a module. The assumption
> | > is that booting involves a lot of processing and it makes sense to just
> | > use the highest OP anyway. That makes sense in our environment, but I
> | > wouldn't recommend our version as a general solution, anyway.
> |
> | That's actually regression; cpufreq saves power even while booting.
> ---
>
> Well, the kernel boot phase is relatively brief. The module could be
> loaded and managing the OP before you start bringing up the rest of user
> space. However, I'm not arguing that our approach to that is generally
> applicable - I was just giving an existence proof.
powernow-k8 machines can't even run in highest OP point on battery
power.
> | > Hmm. I need to think about that. I guess the OP abstraction *could* be
> | > entirely inside the user-space policy manager, with the kernel exposing
> | > individual interfaces for every parameter that the policy manager would
> | > need to adjust. However, that means that the whole mess has to be at
> | > user level (kernel just implements simple knobs for individual
> | > parameters), because the dependency management between the parameters
> | > would only be known at user level, and means a relatively bulky kernel
> | > interface, since it would expose more things at the interface.
> |
> | That would work for me.
> |
> | But my idea was actually opposite: expose individual knobs to
> | userspace, and then select some operating point (inside kernel) that
> | satisfies given knobs.
> ---
>
> But then the kernel needs to know about the operating points, so aren't
> you back where we were before?
Yes, it will be similar to your solution, but
a) I'll need not use oppoints on PCs where parameters are independend
and
b) I'll still get reasonable user<->kernel interface.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-09-03 23:00 Scott E. Preece
2006-09-04 9:12 ` Pavel Machek
2006-09-05 10:31 ` Rafael J. Wysocki
0 siblings, 2 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 23:00 UTC (permalink / raw)
To: rjw; +Cc: scott.preece, matthew.a.locke, linux-pm, pavel
| From: "Rafael J. Wysocki" <rjw@sisk.pl>
|
| On Sunday, 3 September 2006 23:34, Scott E. Preece wrote:
| >
| > | From: "Rafael J. Wysocki" <rjw@sisk.pl>
| > |
| > | On Sunday, 3 September 2006 18:25, David Singleton wrote:
| > | > On 9/2/06, Rafael J. Wysocki <rjw@sisk.pl> wrote:
| > | > >
| > | > > That depends on the definition, but I think of suspend states as the ones
| > | > > that require processes to be frozen before they can be entered. IMHO it is
| > | > > quite clear that such states cannot be handled in the same way as those
| > | > > that do not require the freezing of processes, so they are not the same.
| > | >
| > | > You are correct, processes do need to be frozen before a suspend.
| > | > That's the prepare to suspend part of the suspend process, and
| > | > the transtition is the suspending and finish is the un-freezing
| > | > of the processes to resume execution.
| > | >
| > | > And those same steps are the same steps required to transition the
| > | > system to a new operating point, whether it's suspend or change
| > | > from 1.4GHz to 600MHz.
| > |
| > | There are only a few states that require the processes to be frozen and I
| > | think that's a good enough reason to handle them separately.
| >
| > ---
| >
| > But, surely that distinction can be handled in the implementation behind
| > the interface, rather than exsposed in the interface.
|
| I don't think you can handle that behind the interface in a satisfactory way.
| For example during a suspend to disk we carry out several transitions of
| devices within the suspend-resume cycle.
|
| > Does that distinction matter to the policy manager?
|
| I think so.
|
| > I would argue that it
| > increases the latency, which would be important to the policy manager,
| > but that the nature of the latency isn't important to making a policy
| > decision, and the proposed interface already exposes the latency as
| > something that can be used in making transition decisions.
|
| From the policy manager perspective it may be just a latency fator,
| but for all of the things _outside_ of the policy manager it's much more
| than that.
|
| For example transitions like a CPU frequency change are transparent for kernel
| threads, but the suspend "transitions" are not, because the kernel threads need
| to be informed that the system is suspending and they are expected to freeze
| themselves voluntarily.
|
| Really, I think that the "states" which are entered only after tasks are
| frozen should be considered as special and handled separately.
---
My point is that if the only kernel interface is set-op(), then the code
in the kernel that implements set-op() is the thing that's going to
drive the details of suspending the system, just as it does today. The
abstraction at the kernel interface is about as simple as it can be and
all the policy issues are moved outside the kernel.
My question is whether there are aspects of suspending, other than
latency, that the policy manager would need to consider in deciding
whether to suspend or not.
Look at it this way. In one scheme the policy manager code is:
new_OP = select_transition(current_OP, decision_factors);
set_OP(new_OP);
in the other the policy manager code is:
new_OP = select_transition(current_OP, decision_factors);
if (new_OP == SUSPEND)
suspend();
else
set_OP(new_OP);
The only practical difference is whether the kernel has one interface or
two; in the one-interface case, there's code in the kernel's
implementation of set_OP() that does the same conditional and calls the
same implementation of suspend. In Pavel's preferred idiom, the calls
to set_OP() are replaced by a sequence of
set_power_parameter(PARM, VALUE) calls
All dreadfully oversimplified, of course, but I know that the general
approach is possible, because our PM subsystem works in a vaguely
similar manner. The simplification isn't completely ignorable, though,
because the mechanisms driving the transitions involve input from the
kernel (entry to idle, interrupts, clock events, load information, etc.).
The interaction between the kernel and the policy manager may actually
be too complex to support doing all of policy management in user space
(our implementation actually has some kernel bits and some user-spec
bits). Not sure that affects the question of whether suspend is an
operating point, though - that seems (to me) to work the same whether
the policy decision is in the kernel or in user space.
The one question that I see as interesting on that score is whether the
policy decision to suspend is based on factors that are wholly different
than the factors that drive frequency/voltage changes. If that were the
case, then there would be no point to making the decisions in the same
place. Honestly, I'm not sure of the answer to that...
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-03 23:00 Scott E. Preece
@ 2006-09-04 9:12 ` Pavel Machek
2006-09-05 10:31 ` Rafael J. Wysocki
1 sibling, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-04 9:12 UTC (permalink / raw)
Cc: scott.preece, matthew.a.locke, linux-pm
Hi!
> | Really, I think that the "states" which are entered only after tasks are
> | frozen should be considered as special and handled separately.
> ---
>
> My point is that if the only kernel interface is set-op(), then the code
> in the kernel that implements set-op() is the thing that's going to
> drive the details of suspending the system, just as it does today. The
> abstraction at the kernel interface is about as simple as it can be and
> all the policy issues are moved outside the kernel.
>
> My question is whether there are aspects of suspending, other than
> latency, that the policy manager would need to consider in deciding
> whether to suspend or not.
>
> Look at it this way. In one scheme the policy manager code is:
>
> new_OP = select_transition(current_OP, decision_factors);
> set_OP(new_OP);
No, it would be
new_OP = select_transition(current_OP, decision_factors);
if (new_OP == SUSPEND) {
setup wakeup events ...
}
set_OP(new_OP);
> in the other the policy manager code is:
>
> new_OP = select_transition(current_OP, decision_factors);
> if (new_OP == SUSPEND)
> suspend();
> else
> set_OP(new_OP);
...
> The one question that I see as interesting on that score is whether the
> policy decision to suspend is based on factors that are wholly different
> than the factors that drive frequency/voltage changes. If that were the
> case, then there would be no point to making the decisions in the same
> place. Honestly, I'm not sure of the answer to that...
I'm pretty sure decision to suspend is other factors. Remember most
machines are non-functional during suspend.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-03 23:00 Scott E. Preece
2006-09-04 9:12 ` Pavel Machek
@ 2006-09-05 10:31 ` Rafael J. Wysocki
1 sibling, 0 replies; 136+ messages in thread
From: Rafael J. Wysocki @ 2006-09-05 10:31 UTC (permalink / raw)
Cc: scott.preece, matthew.a.locke, linux-pm, pavel
On Monday, 4 September 2006 01:00, Scott E. Preece wrote:
>
> | From: "Rafael J. Wysocki" <rjw@sisk.pl>
> |
> | On Sunday, 3 September 2006 23:34, Scott E. Preece wrote:
> | >
> | > | From: "Rafael J. Wysocki" <rjw@sisk.pl>
> | > |
> | > | On Sunday, 3 September 2006 18:25, David Singleton wrote:
> | > | > On 9/2/06, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> | > | > >
> | > | > > That depends on the definition, but I think of suspend states as the ones
> | > | > > that require processes to be frozen before they can be entered. IMHO it is
> | > | > > quite clear that such states cannot be handled in the same way as those
> | > | > > that do not require the freezing of processes, so they are not the same.
> | > | >
> | > | > You are correct, processes do need to be frozen before a suspend.
> | > | > That's the prepare to suspend part of the suspend process, and
> | > | > the transtition is the suspending and finish is the un-freezing
> | > | > of the processes to resume execution.
> | > | >
> | > | > And those same steps are the same steps required to transition the
> | > | > system to a new operating point, whether it's suspend or change
> | > | > from 1.4GHz to 600MHz.
> | > |
> | > | There are only a few states that require the processes to be frozen and I
> | > | think that's a good enough reason to handle them separately.
> | >
> | > ---
> | >
> | > But, surely that distinction can be handled in the implementation behind
> | > the interface, rather than exsposed in the interface.
> |
> | I don't think you can handle that behind the interface in a satisfactory way.
> | For example during a suspend to disk we carry out several transitions of
> | devices within the suspend-resume cycle.
> |
> | > Does that distinction matter to the policy manager?
> |
> | I think so.
> |
> | > I would argue that it
> | > increases the latency, which would be important to the policy manager,
> | > but that the nature of the latency isn't important to making a policy
> | > decision, and the proposed interface already exposes the latency as
> | > something that can be used in making transition decisions.
> |
> | From the policy manager perspective it may be just a latency fator,
> | but for all of the things _outside_ of the policy manager it's much more
> | than that.
> |
> | For example transitions like a CPU frequency change are transparent for kernel
> | threads, but the suspend "transitions" are not, because the kernel threads need
> | to be informed that the system is suspending and they are expected to freeze
> | themselves voluntarily.
> |
> | Really, I think that the "states" which are entered only after tasks are
> | frozen should be considered as special and handled separately.
> ---
>
> My point is that if the only kernel interface is set-op(), then the code
> in the kernel that implements set-op() is the thing that's going to
> drive the details of suspending the system, just as it does today.
It's not exactly correct in the case of the userland suspend when we have
a userland process that drives the suspend (eg. it writes the suspend image
to a storage). In that case the kernel is only asked to performe some
well defined atomic actions and not the entire transition.
> The abstraction at the kernel interface is about as simple as it can be and
> all the policy issues are moved outside the kernel.
>
> My question is whether there are aspects of suspending, other than
> latency, that the policy manager would need to consider in deciding
> whether to suspend or not.
>
> Look at it this way. In one scheme the policy manager code is:
>
> new_OP = select_transition(current_OP, decision_factors);
> set_OP(new_OP);
>
> in the other the policy manager code is:
>
> new_OP = select_transition(current_OP, decision_factors);
> if (new_OP == SUSPEND)
> suspend();
> else
> set_OP(new_OP);
>
> The only practical difference is whether the kernel has one interface or
> two; in the one-interface case, there's code in the kernel's
> implementation of set_OP() that does the same conditional and calls the
> same implementation of suspend. In Pavel's preferred idiom, the calls
> to set_OP() are replaced by a sequence of
>
> set_power_parameter(PARM, VALUE) calls
>
> All dreadfully oversimplified, of course, but I know that the general
> approach is possible, because our PM subsystem works in a vaguely
> similar manner. The simplification isn't completely ignorable, though,
> because the mechanisms driving the transitions involve input from the
> kernel (entry to idle, interrupts, clock events, load information, etc.).
> The interaction between the kernel and the policy manager may actually
> be too complex to support doing all of policy management in user space
> (our implementation actually has some kernel bits and some user-spec
> bits). Not sure that affects the question of whether suspend is an
> operating point, though - that seems (to me) to work the same whether
> the policy decision is in the kernel or in user space.
>
> The one question that I see as interesting on that score is whether the
> policy decision to suspend is based on factors that are wholly different
> than the factors that drive frequency/voltage changes. If that were the
> case, then there would be no point to making the decisions in the same
> place. Honestly, I'm not sure of the answer to that...
I think the decision to suspend is made
a) by the user,
b) by a policy manager in case when, for example, the battery is running
critical (ie. on emergency).
and the decision to change a frequency/voltage is usually based on some
efficiency factors.
Also, the suspend "transitions" are never transparent to the user and the
changes of frequency/voltage usually are, at least as far as CPUs are
concerned.
Greetings,
Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-09-03 22:40 Scott E. Preece
2006-09-04 9:06 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 22:40 UTC (permalink / raw)
To: pavel; +Cc: scott.preece, matthew.a.locke, linux-pm
| From: Pavel Machek<pavel@ucw.cz>
|
| On Sun 2006-09-03 17:12:22, Scott E. Preece wrote:
| > | From: Pavel Machek<pavel@ucw.cz>
|
| > | > But, surely that distinction can be handled in the implementation behind
| > | > the interface, rather than exsposed in the interface. Does that
| > | > distinction matter to the policy manager? I would argue that it
| > | > increases the latency, which would be important to the policy manager,
| > | > but that the nature of the latency isn't important to making a policy
| > | > decision, and the proposed interface already exposes the latency as
| > | > something that can be used in making transition decisions.
| > |
| > | Are we talking about the same thing?
| > |
| > | If policy manager decides to suspend-to-RAM, it will freeze
| > | itself. Puff, it is not running any more.
| > ---
| >
| > Well, I assume the policy manager is telling something in the kernel to
| > actually set the operating point. Once it has made that request, it
| > doesn't need to run any longer.
|
| And how will it tell the kernel to get back to some _operating_
| point? (As opposed to "off-suspended-to-disk"?)
|
| You see, that interface even causes problems in our (human!)
| comunication. Some of operating points are not really operating!
---
You mean, how will it initiate the transition out of "suspended"? Well,
obviously, it wouldn't be able to do that until the machine
resumed. But, from the perspective of the policy manager, that doesn't
really matter - no time passes (from its perspective), it just starts
running again, receives some kind of wakeup event from the kernel, and
decides what transition should happen.
---
|
| > | Of course, we could use same interface for both. No, it is not good
| > | idea. We want reasonably clean interface. If it means rewriting
| > | powerop two or three times... we'll need to do it.
| > ---
| >
| > Not speaking to either of the current code submissions, I would say that
| > having a kernel interface for defining OPs and a kernel interface for
| > setting the OP, was a reasonably clean interface.
|
| Well, me and Rafael disagree, and you do not really listen to
| arguments. Now you can either fix the interface, or try to submit code
| to lkml despite our NAKs. Go ahead and prepare for some flaming...
---
I think I'm listening to arguments just as much as you guys are! We just
disagree. What are your criteria for "a clean interface"? Why do you
think that n separate set-parameter() interfaces, with no consistency
relationship between them, are cleaner than one define-op() and one
set-op() interface?
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-03 22:40 Scott E. Preece
@ 2006-09-04 9:06 ` Pavel Machek
2006-09-05 16:45 ` Mark Gross
0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-09-04 9:06 UTC (permalink / raw)
Cc: scott.preece, matthew.a.locke, linux-pm
On Sun 2006-09-03 17:40:27, Scott E. Preece wrote:
>
> | From: Pavel Machek<pavel@ucw.cz>
> |
> | On Sun 2006-09-03 17:12:22, Scott E. Preece wrote:
> | > | From: Pavel Machek<pavel@ucw.cz>
> | > Not speaking to either of the current code submissions, I would say that
> | > having a kernel interface for defining OPs and a kernel interface for
> | > setting the OP, was a reasonably clean interface.
> |
> | Well, me and Rafael disagree, and you do not really listen to
> | arguments. Now you can either fix the interface, or try to submit code
> | to lkml despite our NAKs. Go ahead and prepare for some flaming...
> ---
>
> I think I'm listening to arguments just as much as you guys are! We just
> disagree. What are your criteria for "a clean interface"? Why do you
> think that n separate set-parameter() interfaces, with no consistency
> relationship between them, are cleaner than one define-op() and one
> set-op() interface?
Because we already have cpufreq-set-parameter() interface and
enter-suspend-state() interface. We can't really get rid of them.
If you add set-op() replacing both cpufreq-set-parameter() and
enter-suspend-state(), we'll end up with two different interfaces for
each interface; that's considered "mess".
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-04 9:06 ` Pavel Machek
@ 2006-09-05 16:45 ` Mark Gross
2006-09-06 10:59 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Mark Gross @ 2006-09-05 16:45 UTC (permalink / raw)
To: Pavel Machek; +Cc: matthew.a.locke, scott.preece, linux-pm
On Mon, Sep 04, 2006 at 11:06:45AM +0200, Pavel Machek wrote:
> On Sun 2006-09-03 17:40:27, Scott E. Preece wrote:
> >
> > | From: Pavel Machek<pavel@ucw.cz>
> > |
> > | On Sun 2006-09-03 17:12:22, Scott E. Preece wrote:
> > | > | From: Pavel Machek<pavel@ucw.cz>
>
> > | > Not speaking to either of the current code submissions, I would say that
> > | > having a kernel interface for defining OPs and a kernel interface for
> > | > setting the OP, was a reasonably clean interface.
> > |
> > | Well, me and Rafael disagree, and you do not really listen to
> > | arguments. Now you can either fix the interface, or try to submit code
> > | to lkml despite our NAKs. Go ahead and prepare for some flaming...
> > ---
> >
> > I think I'm listening to arguments just as much as you guys are! We just
> > disagree. What are your criteria for "a clean interface"? Why do you
> > think that n separate set-parameter() interfaces, with no consistency
> > relationship between them, are cleaner than one define-op() and one
> > set-op() interface?
>
> Because we already have cpufreq-set-parameter() interface and
> enter-suspend-state() interface. We can't really get rid of them.
>
This is true. Yet todays cpufreq interface is not up to the job of
providing power management for many embedded platforms.
> If you add set-op() replacing both cpufreq-set-parameter() and
> enter-suspend-state(), we'll end up with two different interfaces for
> each interface; that's considered "mess".
Why can't they coexist?
Are you arguing that the cpufreq interface be morphed to support power
op applications?
--mgross
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-05 16:45 ` Mark Gross
@ 2006-09-06 10:59 ` Pavel Machek
0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-06 10:59 UTC (permalink / raw)
To: Mark Gross; +Cc: matthew.a.locke, scott.preece, linux-pm
> > > I think I'm listening to arguments just as much as you guys are! We just
> > > disagree. What are your criteria for "a clean interface"? Why do you
> > > think that n separate set-parameter() interfaces, with no consistency
> > > relationship between them, are cleaner than one define-op() and one
> > > set-op() interface?
> >
> > Because we already have cpufreq-set-parameter() interface and
> > enter-suspend-state() interface. We can't really get rid of them.
> >
> This is true. Yet todays cpufreq interface is not up to the job of
> providing power management for many embedded platforms.
> > If you add set-op() replacing both cpufreq-set-parameter() and
> > enter-suspend-state(), we'll end up with two different interfaces for
> > each interface; that's considered "mess".
>
> Why can't they coexist?
>
> Are you arguing that the cpufreq interface be morphed to support power
> op applications?
No. I'm arguing that
* cpufreq interface should be used for changing cpu frequency
* additional interfaces should be created for changing memory clock
etc.
* existing interfaces should be used for turning devices on/off (and
new ones created when old ones do not exist)
* powerop should take a look what userspace wants, and just close
closest point to that.
Pavel
--
Thanks, Sharp!
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-09-03 22:31 Scott E. Preece
2006-09-03 22:41 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 22:31 UTC (permalink / raw)
To: pavel; +Cc: linux-pm
| From: Pavel Machek<pavel@ucw.cz>
| Cc: <amit.kucheria@nokia.com>, <linux-pm@lists.osdl.org>
| User-Agent: Mutt/1.5.11+cvs20060126
|
| On Sun 2006-09-03 16:21:34, Scott E. Preece wrote:
| > | From: Pavel Machek<pavel@ucw.cz>
| > some reason, in your perception, why definition of operating points
| > really needs to be in the kernel? Definition of the operating points,
| > as opposed to changing from one OP to another, shouldn't have any timing
| > issues, so why isn't a privileged user-space manager a reasonable
| > approach?
|
| For one thing, is not powerop needed for boot? You need to boot in
| some operating point after all :-).
---
In the implementation we use, the initial OP is set by the boot
loader. Our kernel power management driver is a module. The assumption
is that booting involves a lot of processing and it makes sense to just
use the highest OP anyway. That makes sense in our environment, but I
wouldn't recommend our version as a general solution, anyway.
---
|
| Yes, I see having points defined in userspace is useful for debugging,
| but having kernel depend on external daemon for its proper operation
| is not nice.
|
---
Again, as long as the kernel comes up in some OP it should run
properly. If there's any goal of moving policy out of the kernel, the
kernel is going to have to depend on user-space to support optimal
operation, but the kernel should operate correctly, if non-optimally,
without it.
---
| > | > The only other interface is the actually setting of a (named) operating
| > | > point and that is _required_ to do anything useful.
| > |
| > | No, they are not.
| > |
| > | We already have interface for selecting cpu frequency. Lets keep it.
| >
| > As noted previously, OPs bundle together more than just the
| > frequency. Those of us supporting the OP model believe that you can't
| > intelligently change CPU frequency in isolation and you can't change
| > some of those parameters independently, because only certain
| > combinations work.
|
| That's okay. User gives you combination he wants, and you select "next
| higher" working operating point.
---
Hmm. I need to think about that. I guess the OP abstraction *could* be
entirely inside the user-space policy manager, with the kernel exposing
individual interfaces for every parameter that the policy manager would
need to adjust. However, that means that the whole mess has to be at
user level (kernel just implements simple knobs for individual
parameters), because the dependency management between the parameters
would only be known at user level, and means a relatively bulky kernel
interface, since it would expose more things at the interface.
On the other hand, that complexity has to be somewhere, so maybe that
would be OK. As I said, I need to think about it...
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-03 22:31 Scott E. Preece
@ 2006-09-03 22:41 ` Pavel Machek
0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-03 22:41 UTC (permalink / raw)
To: Scott E. Preece; +Cc: linux-pm
On Sun 2006-09-03 17:31:02, Scott E. Preece wrote:
> | From: Pavel Machek<pavel@ucw.cz>
> | Cc: <amit.kucheria@nokia.com>, <linux-pm@lists.osdl.org>
> | User-Agent: Mutt/1.5.11+cvs20060126
> |
> | On Sun 2006-09-03 16:21:34, Scott E. Preece wrote:
> | > | From: Pavel Machek<pavel@ucw.cz>
> | > some reason, in your perception, why definition of operating points
> | > really needs to be in the kernel? Definition of the operating points,
> | > as opposed to changing from one OP to another, shouldn't have any timing
> | > issues, so why isn't a privileged user-space manager a reasonable
> | > approach?
> |
> | For one thing, is not powerop needed for boot? You need to boot in
> | some operating point after all :-).
> ---
>
> In the implementation we use, the initial OP is set by the boot
> loader. Our kernel power management driver is a module. The assumption
> is that booting involves a lot of processing and it makes sense to just
> use the highest OP anyway. That makes sense in our environment, but I
> wouldn't recommend our version as a general solution, anyway.
That's actually regression; cpufreq saves power even while booting.
> | > As noted previously, OPs bundle together more than just the
> | > frequency. Those of us supporting the OP model believe that you can't
> | > intelligently change CPU frequency in isolation and you can't change
> | > some of those parameters independently, because only certain
> | > combinations work.
> |
> | That's okay. User gives you combination he wants, and you select "next
> | higher" working operating point.
> ---
>
> Hmm. I need to think about that. I guess the OP abstraction *could* be
> entirely inside the user-space policy manager, with the kernel exposing
> individual interfaces for every parameter that the policy manager would
> need to adjust. However, that means that the whole mess has to be at
> user level (kernel just implements simple knobs for individual
> parameters), because the dependency management between the parameters
> would only be known at user level, and means a relatively bulky kernel
> interface, since it would expose more things at the interface.
That would work for me.
But my idea was actually opposite: expose individual knobs to
userspace, and then select some operating point (inside kernel) that
satisfies given knobs.
> On the other hand, that complexity has to be somewhere, so maybe that
> would be OK. As I said, I need to think about it...
Looking forward for new interface proposal ;-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-09-03 22:12 Scott E. Preece
2006-09-03 22:25 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 22:12 UTC (permalink / raw)
To: pavel; +Cc: scott.preece, matthew.a.locke, linux-pm
| From: Pavel Machek<pavel@ucw.cz>
|
| Hi!
|
| > | > > That depends on the definition, but I think of suspend states as the ones
| > | > > that require processes to be frozen before they can be entered. IMHO it is
| > | > > quite clear that such states cannot be handled in the same way as those
| > | > > that do not require the freezing of processes, so they are not the same.
| > | >
| > | > You are correct, processes do need to be frozen before a suspend.
| > | > That's the prepare to suspend part of the suspend process, and
| > | > the transtition is the suspending and finish is the un-freezing
| > | > of the processes to resume execution.
| > | >
| > | > And those same steps are the same steps required to transition the
| > | > system to a new operating point, whether it's suspend or change
| > | > from 1.4GHz to 600MHz.
| > |
| > | There are only a few states that require the processes to be frozen and I
| > | think that's a good enough reason to handle them separately.
| >
| > ---
| >
| > But, surely that distinction can be handled in the implementation behind
| > the interface, rather than exsposed in the interface. Does that
| > distinction matter to the policy manager? I would argue that it
| > increases the latency, which would be important to the policy manager,
| > but that the nature of the latency isn't important to making a policy
| > decision, and the proposed interface already exposes the latency as
| > something that can be used in making transition decisions.
|
| Are we talking about the same thing?
|
| If policy manager decides to suspend-to-RAM, it will freeze
| itself. Puff, it is not running any more.
---
Well, I assume the policy manager is telling something in the kernel to
actually set the operating point. Once it has made that request, it
doesn't need to run any longer.
---
|
| Yes, it is important that interfaces are different. Would you argue
| for using same interface for slowing down machine and for turning
| machine off?
|
| And suspend-to-disk *is* turning machine off.
---
An interesting question. While it's turning the machine off, it's not
turning it off in the same sense as shutdown, because otherwise you
wouldn't come back via resume.
In any case, I could imagine OFF being another point in the operating
point continuum, except that it's not something I would expect to be
part of the range available to a policy manager (probably; I guess there
are emergency situations where the policy manager might want to shut the
machine down).
---
|
| Of course, we could use same interface for both. No, it is not good
| idea. We want reasonably clean interface. If it means rewriting
| powerop two or three times... we'll need to do it.
---
Not speaking to either of the current code submissions, I would say that
having a kernel interface for defining OPs and a kernel interface for
setting the OP, was a reasonably clean interface.
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-03 22:12 Scott E. Preece
@ 2006-09-03 22:25 ` Pavel Machek
0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-03 22:25 UTC (permalink / raw)
Cc: scott.preece, matthew.a.locke, linux-pm
On Sun 2006-09-03 17:12:22, Scott E. Preece wrote:
> | From: Pavel Machek<pavel@ucw.cz>
> | > But, surely that distinction can be handled in the implementation behind
> | > the interface, rather than exsposed in the interface. Does that
> | > distinction matter to the policy manager? I would argue that it
> | > increases the latency, which would be important to the policy manager,
> | > but that the nature of the latency isn't important to making a policy
> | > decision, and the proposed interface already exposes the latency as
> | > something that can be used in making transition decisions.
> |
> | Are we talking about the same thing?
> |
> | If policy manager decides to suspend-to-RAM, it will freeze
> | itself. Puff, it is not running any more.
> ---
>
> Well, I assume the policy manager is telling something in the kernel to
> actually set the operating point. Once it has made that request, it
> doesn't need to run any longer.
And how will it tell the kernel to get back to some _operating_
point? (As opposed to "off-suspended-to-disk"?)
You see, that interface even causes problems in our (human!)
comunication. Some of operating points are not really operating!
> | Of course, we could use same interface for both. No, it is not good
> | idea. We want reasonably clean interface. If it means rewriting
> | powerop two or three times... we'll need to do it.
> ---
>
> Not speaking to either of the current code submissions, I would say that
> having a kernel interface for defining OPs and a kernel interface for
> setting the OP, was a reasonably clean interface.
Well, me and Rafael disagree, and you do not really listen to
arguments. Now you can either fix the interface, or try to submit code
to lkml despite our NAKs. Go ahead and prepare for some flaming...
(But I'd rather have you fix the interface.)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-09-03 21:34 Scott E. Preece
2006-09-03 21:43 ` Pavel Machek
2006-09-03 22:10 ` Rafael J. Wysocki
0 siblings, 2 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 21:34 UTC (permalink / raw)
To: rjw; +Cc: scott.preece, matthew.a.locke, linux-pm, pavel
| From: "Rafael J. Wysocki" <rjw@sisk.pl>
|
| On Sunday, 3 September 2006 18:25, David Singleton wrote:
| > On 9/2/06, Rafael J. Wysocki <rjw@sisk.pl> wrote:
| > >
| > > That depends on the definition, but I think of suspend states as the ones
| > > that require processes to be frozen before they can be entered. IMHO it is
| > > quite clear that such states cannot be handled in the same way as those
| > > that do not require the freezing of processes, so they are not the same.
| >
| > You are correct, processes do need to be frozen before a suspend.
| > That's the prepare to suspend part of the suspend process, and
| > the transtition is the suspending and finish is the un-freezing
| > of the processes to resume execution.
| >
| > And those same steps are the same steps required to transition the
| > system to a new operating point, whether it's suspend or change
| > from 1.4GHz to 600MHz.
|
| There are only a few states that require the processes to be frozen and I
| think that's a good enough reason to handle them separately.
---
But, surely that distinction can be handled in the implementation behind
the interface, rather than exsposed in the interface. Does that
distinction matter to the policy manager? I would argue that it
increases the latency, which would be important to the policy manager,
but that the nature of the latency isn't important to making a policy
decision, and the proposed interface already exposes the latency as
something that can be used in making transition decisions.
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-03 21:34 Scott E. Preece
@ 2006-09-03 21:43 ` Pavel Machek
2006-09-03 22:10 ` Rafael J. Wysocki
1 sibling, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-03 21:43 UTC (permalink / raw)
Cc: scott.preece, matthew.a.locke, linux-pm
Hi!
> | > > That depends on the definition, but I think of suspend states as the ones
> | > > that require processes to be frozen before they can be entered. IMHO it is
> | > > quite clear that such states cannot be handled in the same way as those
> | > > that do not require the freezing of processes, so they are not the same.
> | >
> | > You are correct, processes do need to be frozen before a suspend.
> | > That's the prepare to suspend part of the suspend process, and
> | > the transtition is the suspending and finish is the un-freezing
> | > of the processes to resume execution.
> | >
> | > And those same steps are the same steps required to transition the
> | > system to a new operating point, whether it's suspend or change
> | > from 1.4GHz to 600MHz.
> |
> | There are only a few states that require the processes to be frozen and I
> | think that's a good enough reason to handle them separately.
>
> ---
>
> But, surely that distinction can be handled in the implementation behind
> the interface, rather than exsposed in the interface. Does that
> distinction matter to the policy manager? I would argue that it
> increases the latency, which would be important to the policy manager,
> but that the nature of the latency isn't important to making a policy
> decision, and the proposed interface already exposes the latency as
> something that can be used in making transition decisions.
Are we talking about the same thing?
If policy manager decides to suspend-to-RAM, it will freeze
itself. Puff, it is not running any more.
Yes, it is important that interfaces are different. Would you argue
for using same interface for slowing down machine and for turning
machine off?
And suspend-to-disk *is* turning machine off.
Of course, we could use same interface for both. No, it is not good
idea. We want reasonably clean interface. If it means rewriting
powerop two or three times... we'll need to do it.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-03 21:34 Scott E. Preece
2006-09-03 21:43 ` Pavel Machek
@ 2006-09-03 22:10 ` Rafael J. Wysocki
1 sibling, 0 replies; 136+ messages in thread
From: Rafael J. Wysocki @ 2006-09-03 22:10 UTC (permalink / raw)
Cc: scott.preece, matthew.a.locke, linux-pm, pavel
On Sunday, 3 September 2006 23:34, Scott E. Preece wrote:
>
> | From: "Rafael J. Wysocki" <rjw@sisk.pl>
> |
> | On Sunday, 3 September 2006 18:25, David Singleton wrote:
> | > On 9/2/06, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> | > >
> | > > That depends on the definition, but I think of suspend states as the ones
> | > > that require processes to be frozen before they can be entered. IMHO it is
> | > > quite clear that such states cannot be handled in the same way as those
> | > > that do not require the freezing of processes, so they are not the same.
> | >
> | > You are correct, processes do need to be frozen before a suspend.
> | > That's the prepare to suspend part of the suspend process, and
> | > the transtition is the suspending and finish is the un-freezing
> | > of the processes to resume execution.
> | >
> | > And those same steps are the same steps required to transition the
> | > system to a new operating point, whether it's suspend or change
> | > from 1.4GHz to 600MHz.
> |
> | There are only a few states that require the processes to be frozen and I
> | think that's a good enough reason to handle them separately.
>
> ---
>
> But, surely that distinction can be handled in the implementation behind
> the interface, rather than exsposed in the interface.
I don't think you can handle that behind the interface in a satisfactory way.
For example during a suspend to disk we carry out several transitions of
devices within the suspend-resume cycle.
> Does that distinction matter to the policy manager?
I think so.
> I would argue that it
> increases the latency, which would be important to the policy manager,
> but that the nature of the latency isn't important to making a policy
> decision, and the proposed interface already exposes the latency as
> something that can be used in making transition decisions.
>From the policy manager perspective it may be just a latency fator,
but for all of the things _outside_ of the policy manager it's much more
than that.
For example transitions like a CPU frequency change are transparent for kernel
threads, but the suspend "transitions" are not, because the kernel threads need
to be informed that the system is suspending and they are expected to freeze
themselves voluntarily.
Really, I think that the "states" which are entered only after tasks are
frozen should be considered as special and handled separately.
Greetings,
Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-09-03 21:21 Scott E. Preece
2006-09-03 21:54 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-09-03 21:21 UTC (permalink / raw)
To: pavel; +Cc: linux-pm
| From: Pavel Machek<pavel@ucw.cz>
|
| Hi!
|
| On Thu 2006-08-31 16:44:12, Amit Kucheria wrote:
| > On Thu, 2006-08-31 at 00:36 +0200, ext Pavel Machek wrote:
| > > On Wed 2006-08-30 14:00:53, Amit Kucheria wrote:
| > > > But PowerOP would allow SoC-based systems to tune the operating points
| > > > to get the most out of their top-10 use-cases and sleep modes.
| > >
| > > Question is: can we get similar savings without ugly interface powerop
| > > presents?
| >
| > If I have understood correctly, your main objection is to defining new
| > operating points from userspace?
|
| Well, that is big objection, but not my main one. I believe that "new
| operating points from userspace" are non-starter. "So obviously wrong
| that noone would merge that".
---
Why? Are you interpreting "from user space" as "under user control"? A
lot of us have been taught for some time that it's a good thing to move
stuff out of the kernel, unless it really needs to be there. Is there
some reason, in your perception, why definition of operating points
really needs to be in the kernel? Definition of the operating points,
as opposed to changing from one OP to another, shouldn't have any timing
issues, so why isn't a privileged user-space manager a reasonable
approach?
---
|
| > The only other interface is the actually setting of a (named) operating
| > point and that is _required_ to do anything useful.
|
| No, they are not.
|
| We already have interface for selecting cpu frequency. Lets keep it.
---
As noted previously, OPs bundle together more than just the
frequency. Those of us supporting the OP model believe that you can't
intelligently change CPU frequency in isolation and you can't change
some of those parameters independently, because only certain
combinations work.
---
| ...
| Now, it should be up-to the powerop framework to select best operating
| point given "cpu speed, dsp speed, usb on/off" state. But I argue that
| this should be done in-kernel and hidden from user.
---
Well, I agree with hiding it from the user, but there's no particular
reason that means it needs to be done in the kernel. Again, we like to
have it run from user-space, so we can replace it easily (without
recompiling/restarting the kernel) in development.
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-09-03 21:21 Scott E. Preece
@ 2006-09-03 21:54 ` Pavel Machek
0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-03 21:54 UTC (permalink / raw)
To: Scott E. Preece; +Cc: linux-pm
On Sun 2006-09-03 16:21:34, Scott E. Preece wrote:
> | From: Pavel Machek<pavel@ucw.cz>
> | On Thu 2006-08-31 16:44:12, Amit Kucheria wrote:
> | > On Thu, 2006-08-31 at 00:36 +0200, ext Pavel Machek wrote:
> | > > On Wed 2006-08-30 14:00:53, Amit Kucheria wrote:
> | > > > But PowerOP would allow SoC-based systems to tune the operating points
> | > > > to get the most out of their top-10 use-cases and sleep modes.
> | > >
> | > > Question is: can we get similar savings without ugly interface powerop
> | > > presents?
> | >
> | > If I have understood correctly, your main objection is to defining new
> | > operating points from userspace?
> |
> | Well, that is big objection, but not my main one. I believe that "new
> | operating points from userspace" are non-starter. "So obviously wrong
> | that noone would merge that".
> ---
>
> Why? Are you interpreting "from user space" as "under user control"? A
> lot of us have been taught for some time that it's a good thing to move
> stuff out of the kernel, unless it really needs to be there. Is
> there
Moving stuff out of kernel is one important design principe. Keeping
user<->kernel interface reasonably clean is another one.
> some reason, in your perception, why definition of operating points
> really needs to be in the kernel? Definition of the operating points,
> as opposed to changing from one OP to another, shouldn't have any timing
> issues, so why isn't a privileged user-space manager a reasonable
> approach?
For one thing, is not powerop needed for boot? You need to boot in
some operating point after all :-).
Yes, I see having points defined in userspace is useful for debugging,
but having kernel depend on external daemon for its proper operation
is not nice.
> | > The only other interface is the actually setting of a (named) operating
> | > point and that is _required_ to do anything useful.
> |
> | No, they are not.
> |
> | We already have interface for selecting cpu frequency. Lets keep it.
>
> As noted previously, OPs bundle together more than just the
> frequency. Those of us supporting the OP model believe that you can't
> intelligently change CPU frequency in isolation and you can't change
> some of those parameters independently, because only certain
> combinations work.
That's okay. User gives you combination he wants, and you select "next
higher" working operating point.
> | Now, it should be up-to the powerop framework to select best operating
> | point given "cpu speed, dsp speed, usb on/off" state. But I argue that
> | this should be done in-kernel and hidden from user.
>
> Well, I agree with hiding it from the user, but there's no particular
> reason that means it needs to be done in the kernel. Again, we like to
> have it run from user-space, so we can replace it easily (without
> recompiling/restarting the kernel) in development.
Do whatever you want for development (that includes patching your
kernel). For production, nice interface is more important.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-09-01 14:49 Scott E. Preece
0 siblings, 0 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-09-01 14:49 UTC (permalink / raw)
To: amit.kucheria; +Cc: linux-pm, scott.preece
| From Amit.Kucheria@nokia.com Fri Sep 1 03:14:57 2006
|
| On Thu, 2006-08-31 at 15:22 -0400, ext Preece Scott-PREECE wrote:
| ...
| > So, if a driver had set acceptable_latency to 300ms, the
| > Power-Management policy manager could look at the range of
| > available Ops and pick the lowest-power OP that met the
| > expected load and would also meet the required latency
| > guarantee. [And note that the acceptable latency has to include
| > both the resume time and whatever part of suspend happens with
| > interrupts blocked and can't be aborted.]
|
| Thinking of it that way, latency is possibly useful. Needs more
| thinking. But what latency values are associated with the OP? The values
| from the spec sheet provided by the silicon vendor do not take into
| account the other operations necessary before you can safely switch to a
| new OP. Some of these operations require indeterminate amount of time.
---
That's something the system designer would have to work out and provide
as part of the information associated with each possible OP transtion
(that is, it would potentially be different for each (currentOP, newOP)
pair).
The system designer would also need to decide whether the latencies had
to be worst-case guarantees or whether the system could tolerate
occasionally missing a latency deadline. This would vary depending on
the system (a heart pacemaker might find deadlines to be more important
than a PDA).
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-31 15:14 Scott E. Preece
0 siblings, 0 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-08-31 15:14 UTC (permalink / raw)
To: pavel; +Cc: linux-pm, matthew.a.locke, scott.preece
| From: Pavel Machek<pavel@suse.cz>
| ...
| > I'm not sure how you distinguish between a "system" sleep state
| > and a "CPU" sleep state - seems like there's a collection of
| > things that can be shut down or not; except for true OFF, there's
| > always something on.
|
| Well, even in "true OFF", RTC keeps ticking. And in "disk" state
| (swsusp), machine is basically "true OFF" but it still retains state.
---
In our sleep state (which I would aligned with "mem", in the previous
list), the application processor part of the system is basically true
off, but retains state in memory. In our systems, of course, there's a
second processor that is independently going in and out of its own
low-power modes while waking up every so many milliseconds to stay
camped on a cellular network.
One "interesting" diffference between "disk" and "mem" (as we would use
them, though we don't have a disk, so we don't have a "disk" state), is
that suspend-to-disk today requires rebooting, while suspend-to-RAM
doesn't. I don't see why that distinction can't still be below the
interface abstraction presented to a user-space power manager, but it's
the most qualitative difference across the range of proposed operating
points...
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-31 2:41 Woodruff, Richard
0 siblings, 0 replies; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-31 2:41 UTC (permalink / raw)
To: Preece Scott-PREECE, Pavel Machek; +Cc: Matthew Locke, linux-pm
> Well, we have some hardware where we can sleep everything but
> memory and some where we can also leave the display active (and
> backlit). In fact, however, today the latency for going to sleep
> is too great to do so between frames, so we just do a wait there.
[Woodruff, Richard]
In effect OMAP2/3 can auto idle to low power states in between LCD FIFO
refills. The SDRAM, the DPLL and interconnects can be auto-idled
between LCD FIFO loads. By carefully setting your LCD FIFO thresholds
you can spin back up the DPLL, memory and interconnect in time to load
up the LCD FIFO, then back to sleep.
Its not just LCD, other devices can do the same. Say pushing data into
an audio codec's FIFO.
Getting this effect does require device coordination. If a device
objects you don't hit the big savings.
Regards,
Richard W.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-31 0:52 Scott E. Preece
0 siblings, 0 replies; 136+ messages in thread
From: Scott E. Preece @ 2006-08-31 0:52 UTC (permalink / raw)
To: pavel; +Cc: linux-pm
| From linux-pm-bounces@lists.osdl.org Wed Aug 30 17:48:27 2006
|
| On Tue 2006-08-29 21:52:26, David Singleton wrote:
| > >> >> /sys/power/operating_points/mem
| > >> >> /sys/power/operating_points/standby
| ....
| > >That does not make mixing them right.
| >
| > Both OpPoint and PowerOp are going to 'mix' frequency, voltage
| > and sleep states into their operating point concepts.
| >
| > The point was not to make it look like I was mixing sleep states and
| > CPU frequency states, but to present all the power states
| > supported by the system in one place and with one interface. It simplifies
| > not only kernel code, but power manager code as well.
|
| It is also wrong. And no, I do not think your power manager can
| properly use "mem" state.
|
| You see, "mem" is very different from lowest. To exit lowest, you have
| to "echo highest > state". To exit "mem", you need power
| button. That's very different operation.
---
Not sure exactly what is meant by "mem" operating point. I was assuming
it meant "suspend-to-RAM" (almost everything shut down, memory self
refreshing). In my world, our current policy manager does manage mem
(which we call "sleep" and is the deepest suspend we do) separately from
frequency changes, but that's accident rather than intention.
I agree that there is some difference between them, since we do
frequency changes in response to load, but sleep-state changes based on
idleness. However, there's no real reason why those can't be inputs to
the same policy manager. We actually do make both decisions in the Idle
handler (well, there's more plumbing than that, but they're both driven
by going idle).
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-25 22:11 Woodruff, Richard
0 siblings, 0 replies; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-25 22:11 UTC (permalink / raw)
To: Alan Stern; +Cc: linux-pm, Pavel Machek
> Note that you don't need to replace one device with another having the
> same PID/VID. It could be the very same device but with new media
loaded.
> That would be just as bad.
Yes, I see your point this time. Thanks. If I'm hooking up to a modem
inside a phone via an internal transceiver-less link it has a chance,
but not so well with a USB card reader.
Completely generalized solutions in the power domain seem pretty hard to
come by. You end up with 'if this class of device and not that class of
device' when you try and optimize. One way of slicing it up is with
discrete sets...kind of like operating point parameters :)
Regards,
Richard W.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-25 21:21 Woodruff, Richard
2006-08-25 21:42 ` Alan Stern
0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-25 21:21 UTC (permalink / raw)
To: Alan Stern; +Cc: linux-pm, Pavel Machek
> > > ? Wouldn't suspending the entire bus completely stop the
throughput
> > of
> > > any attached device?
> >
> > Not necessarily (right?). If I shut off VBUS then yes then I need
to
> > re-enumerate for sure. One might figure out a cleaver way of
shutting
> > it off and turning it back on in between host requests.
>
> No, you can't do that. Without VBUS power there's no reliable way to
> detect disconnects or media changes.
That's a good point. I wonder in this case if it is possible to keep a
list of what was there. When you re-power if its not there, it should
be like it was disconnected, then act accordingly. If it is there all
is ok. It seems unlikely that the same PID/VID device would have
replaced it. New devices would show up as not being there before.
Coding that all up would likely be a bunch of work assuming it was
possible.
Thanks,
Richard W.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-25 21:21 Woodruff, Richard
@ 2006-08-25 21:42 ` Alan Stern
0 siblings, 0 replies; 136+ messages in thread
From: Alan Stern @ 2006-08-25 21:42 UTC (permalink / raw)
To: Woodruff, Richard; +Cc: linux-pm, Pavel Machek
On Fri, 25 Aug 2006, Woodruff, Richard wrote:
> > > > ? Wouldn't suspending the entire bus completely stop the
> throughput
> > > of
> > > > any attached device?
> > >
> > > Not necessarily (right?). If I shut off VBUS then yes then I need
> to
> > > re-enumerate for sure. One might figure out a cleaver way of
> shutting
> > > it off and turning it back on in between host requests.
> >
> > No, you can't do that. Without VBUS power there's no reliable way to
> > detect disconnects or media changes.
>
> That's a good point. I wonder in this case if it is possible to keep a
> list of what was there. When you re-power if its not there, it should
> be like it was disconnected, then act accordingly. If it is there all
> is ok. It seems unlikely that the same PID/VID device would have
> replaced it. New devices would show up as not being there before.
>
> Coding that all up would likely be a bunch of work assuming it was
> possible.
There have been long discussions about this in the past, mainly focused on
suspend-to-disk (which generally turns off all power to the USB
controllers). They were pretty much inconclusive; it's safe to assume no
progress will be made on supporting this for a long time, if ever.
Note that you don't need to replace one device with another having the
same PID/VID. It could be the very same device but with new media loaded.
That would be just as bad.
Alan Stern
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-25 20:57 Woodruff, Richard
2006-08-25 21:13 ` Alan Stern
0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-25 20:57 UTC (permalink / raw)
To: Alan Stern; +Cc: linux-pm, Pavel Machek
> ? Wouldn't suspending the entire bus completely stop the throughput
of
> any attached device?
Not necessarily (right?). If I shut off VBUS then yes then I need to
re-enumerate for sure. One might figure out a cleaver way of shutting
it off and turning it back on in between host requests. If I have a USB
drive attached it might well be the protocol layers are smart enough
keep using the device once it is restarted at the low layers. They just
have to wait longer for the data to arrive.
If I just do USB bus suspend the device at the other end can signal
remote wake up to me. I tell him suspend and he agrees to drop to a low
power state, then I lower current capacity. When he has some data he
can let me know and I'll up the VBUS current capacity and then go talk
to him. If I have data I want from him I directly wake him up.
In both cases the extra overhead causes my throughput to drop, but I
still have an effective connection to the device.
At the PM summit Len made a nice observation that you can map all the
processor ACPI-States right to devices. You can have T states with them
if you 'throttle' them (slow down their clock). You can have P states
with voltage/frequency changes. And you can have C-states (1,2,3) by
shutting down devices in-between accesses, etc. You can likely have a
single state representation which covers processors and devices. You
might even implement 'race to idle' conditions at the devices. Get the
work done then shut off to as low a state as you can and still have
acceptable latency. This was what sparked the 'on-ness' discussion a
while back on this list.
> > The frequency of entering this state should not interfere with my
active
> > use case.
> >
> > -B- After I've put down the USB device, I now can program the
internal
> > SOC bus wrapper for the USB to allow idling of the interconnect. I
also
> > need to associate the USB remote wake interrupt with a wake up
interrupt
> > to restart my interconnect. All devices on that interconnect must
be in
> > the same state for the big savings to happen.
> >
> > Certainly for this embedded system, not coordinating the device
states
> > means I can't get the big power savings.
>
> Part of this programming has to be done in the architecture-specific
> driver for the interconnect. There already is code being developed to
> suspend USB buses when they aren't in use (although determining _when_
> they aren't in use has not yet been implemented). However this code
stops
> at the level of the USB controller. Further development is being
stymied
> by lack of information about how to detect the controller's wake-up
events
> on regular desktop systems; it's possible someone might implement this
> first on an embedded platform.
That seems possible.
> But this doesn't require any over-reaching global coordination. All
it
> needs is for each driver to know when it's not being used.
Applying an activity time of sorts to each driver 'might' end up in
situations where you get good power savings. Assuming everything lines
up. However, given hardware and software bugs and errata, it seems
forcing the situation is much more likely to succeed and make sure you
hit your targets.
Even with a per-driver activity timer how does one set the time out
levels for the whole system? You need some kind of policy pieces to set
all the knobs. Letting the self adjust won't likely work for QOS
(quality of service type things) unless they are set very
conservatively.
Thanks,
Richard W.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-25 20:57 Woodruff, Richard
@ 2006-08-25 21:13 ` Alan Stern
0 siblings, 0 replies; 136+ messages in thread
From: Alan Stern @ 2006-08-25 21:13 UTC (permalink / raw)
To: Woodruff, Richard; +Cc: linux-pm, Pavel Machek
On Fri, 25 Aug 2006, Woodruff, Richard wrote:
> > ? Wouldn't suspending the entire bus completely stop the throughput
> of
> > any attached device?
>
> Not necessarily (right?). If I shut off VBUS then yes then I need to
> re-enumerate for sure. One might figure out a cleaver way of shutting
> it off and turning it back on in between host requests.
No, you can't do that. Without VBUS power there's no reliable way to
detect disconnects or media changes.
> If I have a USB
> drive attached it might well be the protocol layers are smart enough
> keep using the device once it is restarted at the low layers. They just
> have to wait longer for the data to arrive.
>
> If I just do USB bus suspend the device at the other end can signal
> remote wake up to me. I tell him suspend and he agrees to drop to a low
> power state, then I lower current capacity. When he has some data he
> can let me know and I'll up the VBUS current capacity and then go talk
> to him. If I have data I want from him I directly wake him up.
Assuming the device has remote wake-up capability. And assuming the
latency of a remote wakeup isn't too high. I tried doing some tests using
a USB keyboard; when the device was suspended and I woke it up by typing
on it, nearly every time the first few keystrokes were lost.
> > But this doesn't require any over-reaching global coordination. All
> it
> > needs is for each driver to know when it's not being used.
>
> Applying an activity time of sorts to each driver 'might' end up in
> situations where you get good power savings. Assuming everything lines
> up. However, given hardware and software bugs and errata, it seems
> forcing the situation is much more likely to succeed and make sure you
> hit your targets.
Activity timers might be appropriate for some devices but not for others.
For instance, a USB mass-storage device will always have a lower-level
driver below the USB driver (for instance, a disk or CD driver that uses
the USB driver for its transport). Suspending the USB driver can't be
done unless the lower-level driver is suspended first, because it might
have unexpected side effects such as spinning down a drive. My point
being that an inactivity timer might be appropriate at the level of the
disk or CD driver, but not at the level of a USB mass-storage driver.
> Even with a per-driver activity timer how does one set the time out
> levels for the whole system? You need some kind of policy pieces to set
> all the knobs. Letting the self adjust won't likely work for QOS
> (quality of service type things) unless they are set very
> conservatively.
I would imagine it depends very heavily on the type of system you're
talking about. On desktops and laptops, for example, Windows seems to get
along okay with a small handful of system-level inactivity timers.
Alan Stern
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-25 20:22 Woodruff, Richard
2006-08-25 20:34 ` Alan Stern
0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-25 20:22 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
> > > No, it is because USB enabled prevents cpu from sleeping; it is
> > > actually well known.
> >
> > I vaguely recall hearing the why. It has some DMAs which are going
on
> > and I suppose the processor must service the completions.
> > Now, if you coordinated with the USB device some how, you could try
> > > and
>
> If I coordinated with USB device somehow, I'd know when it is possible
> to shutoff usb bus. This can be done locally at usb driver, no need
> for big framework. Just someone needs to write that code.
There are two sides to this in the case for the embedded processor I'm
using.
-A- you have to put the USB bus into suspend mode.
- This will lower the throughput of the device on the other end.
The frequency of entering this state should not interfere with my active
use case.
-B- After I've put down the USB device, I now can program the internal
SOC bus wrapper for the USB to allow idling of the interconnect. I also
need to associate the USB remote wake interrupt with a wake up interrupt
to restart my interconnect. All devices on that interconnect must be in
the same state for the big savings to happen.
Certainly for this embedded system, not coordinating the device states
means I can't get the big power savings.
Now, programming up millions of combinations is not feasible. However
you can profile your usage and target common use cases. Playing MP3 and
reading a document for instance.
Thanks,
Richard W.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-25 20:22 Woodruff, Richard
@ 2006-08-25 20:34 ` Alan Stern
2006-08-25 21:27 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Alan Stern @ 2006-08-25 20:34 UTC (permalink / raw)
To: Woodruff, Richard; +Cc: linux-pm, Pavel Machek
On Fri, 25 Aug 2006, Woodruff, Richard wrote:
> There are two sides to this in the case for the embedded processor I'm
> using.
>
> -A- you have to put the USB bus into suspend mode.
> - This will lower the throughput of the device on the other end.
? Wouldn't suspending the entire bus completely stop the throughput of
any attached device?
> The frequency of entering this state should not interfere with my active
> use case.
>
> -B- After I've put down the USB device, I now can program the internal
> SOC bus wrapper for the USB to allow idling of the interconnect. I also
> need to associate the USB remote wake interrupt with a wake up interrupt
> to restart my interconnect. All devices on that interconnect must be in
> the same state for the big savings to happen.
>
> Certainly for this embedded system, not coordinating the device states
> means I can't get the big power savings.
Part of this programming has to be done in the architecture-specific
driver for the interconnect. There already is code being developed to
suspend USB buses when they aren't in use (although determining _when_
they aren't in use has not yet been implemented). However this code stops
at the level of the USB controller. Further development is being stymied
by lack of information about how to detect the controller's wake-up events
on regular desktop systems; it's possible someone might implement this
first on an embedded platform.
But this doesn't require any over-reaching global coordination. All it
needs is for each driver to know when it's not being used.
Alan Stern
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-25 20:34 ` Alan Stern
@ 2006-08-25 21:27 ` Pavel Machek
2006-08-25 21:46 ` Alan Stern
0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-25 21:27 UTC (permalink / raw)
To: Alan Stern; +Cc: linux-pm
Hi!
> > The frequency of entering this state should not interfere with my active
> > use case.
> >
> > -B- After I've put down the USB device, I now can program the internal
> > SOC bus wrapper for the USB to allow idling of the interconnect. I also
> > need to associate the USB remote wake interrupt with a wake up interrupt
> > to restart my interconnect. All devices on that interconnect must be in
> > the same state for the big savings to happen.
> >
> > Certainly for this embedded system, not coordinating the device states
> > means I can't get the big power savings.
>
> Part of this programming has to be done in the architecture-specific
> driver for the interconnect. There already is code being developed to
> suspend USB buses when they aren't in use (although determining _when_
> they aren't in use has not yet been implemented). However this code
> stops
Are there some patches to test? I'd like to power down USB bus, even
when it has device connected (I do not user fingerprint scanner that
much).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-25 21:27 ` Pavel Machek
@ 2006-08-25 21:46 ` Alan Stern
2006-08-25 22:03 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Alan Stern @ 2006-08-25 21:46 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
On Fri, 25 Aug 2006, Pavel Machek wrote:
> Hi!
>
> > > The frequency of entering this state should not interfere with my active
> > > use case.
> > >
> > > -B- After I've put down the USB device, I now can program the internal
> > > SOC bus wrapper for the USB to allow idling of the interconnect. I also
> > > need to associate the USB remote wake interrupt with a wake up interrupt
> > > to restart my interconnect. All devices on that interconnect must be in
> > > the same state for the big savings to happen.
> > >
> > > Certainly for this embedded system, not coordinating the device states
> > > means I can't get the big power savings.
> >
> > Part of this programming has to be done in the architecture-specific
> > driver for the interconnect. There already is code being developed to
> > suspend USB buses when they aren't in use (although determining _when_
> > they aren't in use has not yet been implemented). However this code
> > stops
>
> Are there some patches to test? I'd like to power down USB bus, even
> when it has device connected (I do not user fingerprint scanner that
> much).
There are some old patches. I could update them to the current -mm kernel
and post them next week.
The idea of the patches is that they will autosuspend a USB hub when it
has no active (i.e., unsuspended) children, and autosuspending a root hub
stops the USB controller from doing DMA. However, non-hub devices are not
yet automatically suspended, so you will have to suspend the fingerprint
scanner by hand.
Alan Stern
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-25 21:46 ` Alan Stern
@ 2006-08-25 22:03 ` Pavel Machek
2006-08-26 2:21 ` Alan Stern
0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-25 22:03 UTC (permalink / raw)
To: Alan Stern; +Cc: linux-pm
Hi!
> > Are there some patches to test? I'd like to power down USB bus, even
> > when it has device connected (I do not user fingerprint scanner that
> > much).
>
> There are some old patches. I could update them to the current -mm kernel
> and post them next week.
Yes, that would be great.
> The idea of the patches is that they will autosuspend a USB hub when it
> has no active (i.e., unsuspended) children, and autosuspending a root hub
> stops the USB controller from doing DMA. However, non-hub devices are not
> yet automatically suspended, so you will have to suspend the fingerprint
> scanner by hand.
That is okay, I can do that. It saves 2 hours of battery life on my
machine...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-25 22:03 ` Pavel Machek
@ 2006-08-26 2:21 ` Alan Stern
0 siblings, 0 replies; 136+ messages in thread
From: Alan Stern @ 2006-08-26 2:21 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
On Sat, 26 Aug 2006, Pavel Machek wrote:
> Hi!
>
> > > Are there some patches to test? I'd like to power down USB bus, even
> > > when it has device connected (I do not user fingerprint scanner that
> > > much).
> >
> > There are some old patches. I could update them to the current -mm kernel
> > and post them next week.
>
> Yes, that would be great.
>
> > The idea of the patches is that they will autosuspend a USB hub when it
> > has no active (i.e., unsuspended) children, and autosuspending a root hub
> > stops the USB controller from doing DMA. However, non-hub devices are not
> > yet automatically suspended, so you will have to suspend the fingerprint
> > scanner by hand.
>
> That is okay, I can do that. It saves 2 hours of battery life on my
> machine...
Come to think of it, you don't need the autosuspend patches to turn these
devices off. You can do it right now with your existing kernel, although
it's a little easier with -mm. (The reason is that -mm contains a
development patch which ties a USB device's interfaces to the device
itself; suspending the device will automatically suspend all its
interfaces, and likewise resuming the device will automatically resume all
its interfaces. With a vanilla kernel you must manually suspend the
interfaces before you can suspend the device and manually resume them
after resuming the device.)
Anyway, you can use the deprecated
echo -n 2 >/sys/devices/.../power/state
mechanism to suspend all the USB interfaces, devices, and controllers you
want -- provided you work your way up from the bottom of the device tree.
The autosuspend patch just makes it simpler, since it takes care of
suspending and resuming all the hubs for you.
Alan Stern
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-25 20:05 Woodruff, Richard
2006-08-25 20:08 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-25 20:05 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
> > > For notebooks, devices *are* islands. powerop tries to push
> > > everything-depends-on-everything model that may be good for some
SoC,
> > > but sucks for notebooks. We need some middle ground.
> >
> > USB being enabled and causing your laptop battery to dry up is a
case
> > where laptop device dependency has been shown. There are likely
many
> > more cases. I would expect BIOS/chip set developers are all too
aware
> > of these in their sub-domains.
>
> No, it is because USB enabled prevents cpu from sleeping; it is
> actually well known.
I vaguely recall hearing the why. It has some DMAs which are going on
and I suppose the processor must service the completions.
Now, if you coordinated with the USB device some how, you could try and
place the USB bus into suspend mode, and only wake up on USB remote wake
up or data to be sent, they you could likely spend a lot more time in a
lower P state.
How are you to know when it is ok to shut off the USB bus? Is that
something which could be coordinated with the processor and the active
use case. If I don't need high performance I could go in and out of
suspend to save power. Knowing high, or low performance helps in this
case :)
Regards,
Richard W.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-25 20:05 Woodruff, Richard
@ 2006-08-25 20:08 ` Pavel Machek
0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-25 20:08 UTC (permalink / raw)
To: Woodruff, Richard; +Cc: linux-pm
Hi!
> > No, it is because USB enabled prevents cpu from sleeping; it is
> > actually well known.
>
> I vaguely recall hearing the why. It has some DMAs which are going on
> and I suppose the processor must service the completions.
> Now, if you coordinated with the USB device some how, you could try
> > and
If I coordinated with USB device somehow, I'd know when it is possible
to shutoff usb bus. This can be done locally at usb driver, no need
for big framework. Just someone needs to write that code.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-24 12:16 Woodruff, Richard
2006-08-24 12:29 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-24 12:16 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
> > To some extent having lots of specific policies in the embedded
space is
> > inevitable. The hardware is very tightly coupled. You may have
>
> Maybe. But you certainly do not have to export that uglyness to
> userspace.
Some aspects are easier to manage. If data base like operations and
such are needed it's a more friendly place.
Sysfs exporting every gory detail about PCI or USB doesn't seem so far
from this kind of thing.
> > I have some notion that a policy manager can create a state with
simple
> > & general names like fast, medium, slow (whatever) which is the
> > interface in which applications might speak. A complex policy
> > manager
>
> ...which is very bad interface for applications. See my other
> mail. Applications should not have to play with fast/medium/slow,
> explicitely. Instead, on opening /dev/dsp, you should power up the
> sound system (and maybe adjust cpu frequency if
> neccessary). Application should not have to do echo fast > somewhere
> before opening /dev/dsp
How does /dev/dsp know at what level it can run at? On the SOC I
control the speed of the DSP. I can adjust its MIPs rate.
We do internally have some run time algorithms on the DSP which allow it
to feed statistics back about how well it is doing... like did I drop
any frames, and how close was I to my deadline in decoding a sample. So
there is some low level things which can be done. The DSP is currently
has management code in the kernel (bridge driver) and it has a user
space component which can load algorithms and such through the bridge to
do things.
A missing pieces is meaningful coordination between devices. Each
device is not an island. Not taking care of all devices on the internal
interconnects may mean you don't get the big power savings. For the DSP
and the Control processor to work you need each device enabled to a
sufficient performance level. Setting them to all high means you don't
get the savings.
Doing this kind of coordination which is very specific to your use case
is difficult to achieve in a generalized fashion. Splitting some of
this work between user and kernel space can help.
Regards,
Richard W.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-24 12:16 Woodruff, Richard
@ 2006-08-24 12:29 ` Pavel Machek
0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-24 12:29 UTC (permalink / raw)
To: Woodruff, Richard; +Cc: linux-pm
> > > I have some notion that a policy manager can create a state with
> simple
> > > & general names like fast, medium, slow (whatever) which is the
> > > interface in which applications might speak. A complex policy
> > > manager
> >
> > ...which is very bad interface for applications. See my other
> > mail. Applications should not have to play with fast/medium/slow,
> > explicitely. Instead, on opening /dev/dsp, you should power up the
> > sound system (and maybe adjust cpu frequency if
> > neccessary). Application should not have to do echo fast > somewhere
> > before opening /dev/dsp
>
> How does /dev/dsp know at what level it can run at? On the SOC I
> control the speed of the DSP. I can adjust its MIPs rate.
(I meant /dev/dsp -- OSS audio device, not Digital Signal Processor).
> A missing pieces is meaningful coordination between devices. Each
> device is not an island. Not taking care of all devices on the internal
> interconnects may mean you don't get the big power savings. For the DSP
For notebooks, devices *are* islands. powerop tries to push
everything-depends-on-everything model that may be good for some SoC,
but sucks for notebooks. We need some middle ground.
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-23 19:20 Woodruff, Richard
2006-08-24 8:03 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-23 19:20 UTC (permalink / raw)
To: Pavel Machek, Vitaly Wool; +Cc: linux-pm
> > I guess it just defines an appropriate policy. You can call it
> > 'usb_mp3' if you wish ;)
> > I don't think it's too embedded-specific.
>
> Well, it leads to exponential number of policies -- not nice. Having
> usb_mp3_fileserver_webserver is not nice.
To some extent having lots of specific policies in the embedded space is
inevitable. The hardware is very tightly coupled. You may have almost
all the functionality of a PC some 5 years back on a single chip. In
that kind of environment not taking into account the chip as a whole
means you do power at a 10x or say 100x of optimal. You don't get the
big interconnect savings unless you link the all the individual device
states with the processor states. A 400mA@1.3v might sound good to a PC
centric person, but when the design target is 4mA@1.3v it is not good.
I have some notion that a policy manager can create a state with simple
& general names like fast, medium, slow (whatever) which is the
interface in which applications might speak. A complex policy manager
will associate this name with device and cpu states in great detail. A
more general purpose one only need map it to some governor and its run
time parameters.
Regards,
Richard W.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-23 19:20 Woodruff, Richard
@ 2006-08-24 8:03 ` Pavel Machek
0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-24 8:03 UTC (permalink / raw)
To: Woodruff, Richard; +Cc: linux-pm
Hi!
> > > I guess it just defines an appropriate policy. You can call it
> > > 'usb_mp3' if you wish ;)
> > > I don't think it's too embedded-specific.
> >
> > Well, it leads to exponential number of policies -- not nice. Having
> > usb_mp3_fileserver_webserver is not nice.
>
> To some extent having lots of specific policies in the embedded space is
> inevitable. The hardware is very tightly coupled. You may have
Maybe. But you certainly do not have to export that uglyness to
userspace.
> I have some notion that a policy manager can create a state with simple
> & general names like fast, medium, slow (whatever) which is the
> interface in which applications might speak. A complex policy
> manager
...which is very bad interface for applications. See my other
mail. Applications should not have to play with fast/medium/slow,
explicitely. Instead, on opening /dev/dsp, you should power up the
sound system (and maybe adjust cpu frequency if
neccessary). Application should not have to do echo fast > somewhere
before opening /dev/dsp.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-20 13:36 Woodruff, Richard
0 siblings, 0 replies; 136+ messages in thread
From: Woodruff, Richard @ 2006-08-20 13:36 UTC (permalink / raw)
To: David Singleton; +Cc: linux-pm
>
> Oppoint could replace large pieces of the cpufreq code
> in the kernel, most notably the policy and governor code,
which I
> believe belongs in user space in the power manager daemon.
I've not actually looked a CPUFreq implementation to know how all this
maps...
In general policy is better in user space and depending on your system
it might all live there.
However, when response time counts, it can be necessary to have a level
of algorithm be executed in kernel space. Some might associate this
level with policy.
Cpufreq has both user and kernel space governess. For sure the choice
of what govner to use, in which context it executes, and tracking of its
performance likely would be easiest in user space.
Regards,
Richard W.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
@ 2006-08-16 1:27 Scott E. Preece
2006-08-16 15:25 ` Mark Gross
0 siblings, 1 reply; 136+ messages in thread
From: Scott E. Preece @ 2006-08-16 1:27 UTC (permalink / raw)
To: davej; +Cc: linux-pm
| From: Dave Jones<davej@redhat.com>
|
| On Tue, Aug 15, 2006 at 01:35:15PM +0300, Amit Kucheria wrote:
|
| > d. In the end, all this is leading to an interface for a user-space
| > policy manager that will control _system_ power state based on
| > constraints imposed by HW peripherals or on policies implemented by
| > device manufacturer/distro maintainer.
|
| How does that interface look from a userspace point of view ?
| Hopefully not anything like the tuple described above.
| Why would userspace ever care about "interconnect freq" ?
|
| Userspace cares about "save power" or "go fast".
| Historically, I wish we had never exposed frequencies, but instead
| a performance percentage, so that the various userspace tools
| didn't have to care about things like 'what frequencies are available'.
| Adding the same mistake for voltages doesn't strike me as a fantastic idea.
---
For us, "userspace" means a power policy manager that potentially has a
lot of awareness about the power needs of specific applications and the
overall use cases driving the device. There is no interface available or
visible to a "user". The policy manager does want to know about
specific frequencies and voltages and their interaction, because they
determine the circumstances under which it makes sense to make
particular transitions.
As I think I mentioned at the PM Summit in April, it's important to
recognize that the power and performance implications of operating
points are not simply based on frequency. Sometimes you want so shift
"sideways", because changing one parameter may be preferable to changing
another.
Note that we also want to be able to run the same code on a range of
devices that may have significantly different hardware performance, so
an abstract set of names (fastest to slowest, for instance) is also a
problem.
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-16 1:27 Scott E. Preece
@ 2006-08-16 15:25 ` Mark Gross
0 siblings, 0 replies; 136+ messages in thread
From: Mark Gross @ 2006-08-16 15:25 UTC (permalink / raw)
To: Scott E. Preece; +Cc: linux-pm
On Tue, Aug 15, 2006 at 08:27:49PM -0500, Scott E. Preece wrote:
>
>
> | From: Dave Jones<davej@redhat.com>
> |
> | On Tue, Aug 15, 2006 at 01:35:15PM +0300, Amit Kucheria wrote:
> |
> | > d. In the end, all this is leading to an interface for a user-space
> | > policy manager that will control _system_ power state based on
> | > constraints imposed by HW peripherals or on policies implemented by
> | > device manufacturer/distro maintainer.
> |
> | How does that interface look from a userspace point of view ?
> | Hopefully not anything like the tuple described above.
> | Why would userspace ever care about "interconnect freq" ?
> |
> | Userspace cares about "save power" or "go fast".
> | Historically, I wish we had never exposed frequencies, but instead
> | a performance percentage, so that the various userspace tools
> | didn't have to care about things like 'what frequencies are available'.
> | Adding the same mistake for voltages doesn't strike me as a fantastic idea.
> ---
>
> For us, "userspace" means a power policy manager that potentially has a
> lot of awareness about the power needs of specific applications and the
> overall use cases driving the device. There is no interface available or
> visible to a "user". The policy manager does want to know about
> specific frequencies and voltages and their interaction, because they
> determine the circumstances under which it makes sense to make
> particular transitions.
>
> As I think I mentioned at the PM Summit in April, it's important to
> recognize that the power and performance implications of operating
> points are not simply based on frequency. Sometimes you want so shift
> "sideways", because changing one parameter may be preferable to changing
> another.
Yes, over time it will become unrealistic to assume that voltage is a 1
to 1 function of frequency in a power management implementation for most
architectures. Additionally the control of more than just CPU power
consumption will become only a fraction of the runtime platform PM
story.
What is trying to happen with this work is to take some initial steps to
enable more global power load controls by adding infrastructure to
expose the types of platform knobs to the system needed to implement
more power savings.
The target is to enable cpufreq styled power load control to multiple
platform components. Plugging a PowerOP interface in under CPUFREQ is
one way to try to get this while not breaking existing work.
I don't know if its ready for the mm tree yet, it should at least build
for i386 or x86_64 even if today there is not obvious value in non-ACPI
PM platform throttling for these guys.
It is true that the embedded folks will be the early adopters of this
type of thing, but the big iron folks will not be far behind, and
eventually the desktop and laptop crowd will likely follow.
>
> Note that we also want to be able to run the same code on a range of
> devices that may have significantly different hardware performance, so
> an abstract set of names (fastest to slowest, for instance) is also a
> problem.
The problem of what to expose to user space will vary from platform to
platform and use-case to use-case. I don't think we'll find a one size
fits all solution to this issue.
--mgross
^ permalink raw reply [flat|nested] 136+ messages in thread
* So, what's the status on the recent patches here?
@ 2006-08-14 20:07 Greg KH
2006-08-14 22:24 ` Matthew Locke
0 siblings, 1 reply; 136+ messages in thread
From: Greg KH @ 2006-08-14 20:07 UTC (permalink / raw)
To: linux-pm
I'm seeing a lot of threads without very much resolution on the
differing patches that are flying around here in regards to the rework
of the power management stuff (not suspend stuff...)
So, should I just grab a random patchset from here and add it to my
trees and get it into -mm for testing, or does someone want to possibly
guide me to the set that everyone seems to agree apon?
Or, is there two (or more) competing patch sets here that need to get
resolved?
(If you can't tell I'm getting a bit annoyed at having to tell people
all the time that yes, power management on Linux is bad, and yes, people
are working on it, but no, I have no idea when it will ever see the
light of day...)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-14 20:07 Greg KH
@ 2006-08-14 22:24 ` Matthew Locke
2006-08-14 22:46 ` Dave Jones
2006-08-14 23:29 ` Dominik Brodowski
0 siblings, 2 replies; 136+ messages in thread
From: Matthew Locke @ 2006-08-14 22:24 UTC (permalink / raw)
To: Greg KH; +Cc: linux-pm
On Aug 14, 2006, at 1:07 PM, Greg KH wrote:
> I'm seeing a lot of threads without very much resolution on the
> differing patches that are flying around here in regards to the rework
> of the power management stuff (not suspend stuff...)
>
RIght now there are two sets of patches with the name powerop.
One set (from Eugeny and myself) is focused on getting agreement for
the PowerOP interface and operating point definition. I believe the
last patchset Eugeny submitted as incorporated all the comments about
PowerOP so far. I don't think integrating PowerOP with suspend
(/sys/power/state) is appropriate at this time (as others agreed). I
would rather see PowerOP accepted and used by cpufreq before we tackle
suspend/resume.
The other set posted by Dave Singleton is geared towards showing how
PowerOP can be used by both cpufreq and suspend code. It contains lots
of features that have not been reviewed or discussed.
> So, should I just grab a random patchset from here and add it to my
> trees and get it into -mm for testing, or does someone want to possibly
> guide me to the set that everyone seems to agree apon?
No, please don't grab a random patchset:) IMO, the patches from
Eugeny and myself are the ones to grab and put into -mm. We were
hoping to get some feedback on the set posted here
http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html but I
think the two patchsets have confused the situation. We are working
on the next rev of these patches which will mostly be some clean up and
tighter integration with cpufreq. Our plan was to get the next rev
out before we request inclusion in -mm. However if you are ready to
look at and play with patches. Start with the ones at the link above.
I am a little concerned that none of the cpufreq developers have
responded. I was hoping to get their feedback.
>
> Or, is there two (or more) competing patch sets here that need to get
> resolved?
I don't view the two patchsets as competing. Eugeny and I are focused
on getting the basic building block necessary to do advanced frequency
and voltage scaling accepted. If we can get PowerOP in the mainline,
then we can add more feature by feature. As Dave outlined in his
email, his patches are a starting point for further discussion about
integrating with other subsystems and additional features. Let's focus
on getting PowerOP accepted by starting with Eugeny's patches which
provides powerop as a separate component and integration with cpufreq.
> (If you can't tell I'm getting a bit annoyed at having to tell people
> all the time that yes, power management on Linux is bad, and yes,
> people
> are working on it, but no, I have no idea when it will ever see the
> light of day...)
Well, we are working on it. I think we had some really good
discussion/feedback over the last weeks and we are almost there.
Unfortunately, the discussion tapered off recently when we needed some
final feedback. Probably related to having two patchsets with the name
powerop. Let's try to get something acceptable in -mm over the next
couple days.
Thanks
Matt
>
> thanks,
>
> greg k-h
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-14 22:24 ` Matthew Locke
@ 2006-08-14 22:46 ` Dave Jones
2006-08-14 23:24 ` Matthew Locke
2006-08-14 23:29 ` Dominik Brodowski
1 sibling, 1 reply; 136+ messages in thread
From: Dave Jones @ 2006-08-14 22:46 UTC (permalink / raw)
To: Matthew Locke; +Cc: linux-pm
On Mon, Aug 14, 2006 at 03:24:19PM -0700, Matthew Locke wrote:
> I am a little concerned that none of the cpufreq developers have
> responded. I was hoping to get their feedback.
I was waiting for the dust to settle before spending a significant
amount of time reviewing. I have to admit, the two patchsets thing
did confuse me too. (Though I've also been swamped with bugs since
I got back from OLS, so I've appreciated the breathing room :)
If we're arriving any closer to consensus on whats mergable from the
cpufreq side, and what needs more input, I'll find the time to review
soon, but there still seems to be ongoing discussion which is why I
decided to leave it sort itself out :)
> > (If you can't tell I'm getting a bit annoyed at having to tell people
> > all the time that yes, power management on Linux is bad, and yes,
> > people
> > are working on it, but no, I have no idea when it will ever see the
> > light of day...)
>
> Well, we are working on it.
Sadly powerop is but a tiny piece of the puzzle.
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-14 22:46 ` Dave Jones
@ 2006-08-14 23:24 ` Matthew Locke
2006-08-14 23:48 ` Dave Jones
0 siblings, 1 reply; 136+ messages in thread
From: Matthew Locke @ 2006-08-14 23:24 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-pm
On Aug 14, 2006, at 3:46 PM, Dave Jones wrote:
> On Mon, Aug 14, 2006 at 03:24:19PM -0700, Matthew Locke wrote:
>
>> I am a little concerned that none of the cpufreq developers have
>> responded. I was hoping to get their feedback.
>
> I was waiting for the dust to settle before spending a significant
> amount of time reviewing. I have to admit, the two patchsets thing
> did confuse me too. (Though I've also been swamped with bugs since
> I got back from OLS, so I've appreciated the breathing room :)
Yeah, I understand that. I'm still catching up as well.
>
> If we're arriving any closer to consensus on whats mergable from the
> cpufreq side, and what needs more input, I'll find the time to review
> soon, but there still seems to be ongoing discussion which is why I
> decided to leave it sort itself out :)
I think we are at the stage of need more input on the last set of
Eugeny's patches. (the ones I point to in my email) The cpufreq
patches, so far, are more for example. We need a bit of work before
they are ready for merging. However, I would prefer to have your
feedback now rather than later.
>
>>> (If you can't tell I'm getting a bit annoyed at having to tell people
>>> all the time that yes, power management on Linux is bad, and yes,
>>> people
>>> are working on it, but no, I have no idea when it will ever see the
>>> light of day...)
>>
>> Well, we are working on it.
>
> Sadly powerop is but a tiny piece of the puzzle.
Cheer up guys. Power management will get better one piece at a time;
just like the rest of Linux:)
>
> Dave
>
> --
> http://www.codemonkey.org.uk
>
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-14 23:24 ` Matthew Locke
@ 2006-08-14 23:48 ` Dave Jones
2006-08-15 1:00 ` Greg KH
0 siblings, 1 reply; 136+ messages in thread
From: Dave Jones @ 2006-08-14 23:48 UTC (permalink / raw)
To: Matthew Locke; +Cc: linux-pm
On Mon, Aug 14, 2006 at 04:24:33PM -0700, Matthew Locke wrote:
> > If we're arriving any closer to consensus on whats mergable from the
> > cpufreq side, and what needs more input, I'll find the time to review
> > soon, but there still seems to be ongoing discussion which is why I
> > decided to leave it sort itself out :)
>
> I think we are at the stage of need more input on the last set of
> Eugeny's patches. (the ones I point to in my email) The cpufreq
> patches, so far, are more for example. We need a bit of work before
> they are ready for merging. However, I would prefer to have your
> feedback now rather than later.
I gave them a quick lookover, and there are the to-be-expected minor
nits, but there's something more fundamental that I'm still not getting.
This adds a whole bunch of new code, and doesn't seem to make any
existing code any simpler (to me at least). From a cpufreq point of view,
what does adding this buy us? What problem do we have today that is
being solved by all this?
Every explanation of powerop I've seen so far dives into microdetails,
whilst the 10,000ft view has always passed me by other than "this is
what we've had in the embedded world".
The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
also confuses me. I was under the impression that powerop was adding additional
userspace interfaces. If we're not changing how things from a userspace
point of view, we're churning a lot of kernel code,.. why?
Clue me in here, I'm feeling thick.
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-14 23:48 ` Dave Jones
@ 2006-08-15 1:00 ` Greg KH
2006-08-15 3:03 ` Dave Jones
` (3 more replies)
0 siblings, 4 replies; 136+ messages in thread
From: Greg KH @ 2006-08-15 1:00 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-pm
On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
>
> This adds a whole bunch of new code, and doesn't seem to make any
> existing code any simpler (to me at least). From a cpufreq point of view,
> what does adding this buy us? What problem do we have today that is
> being solved by all this?
>
> Every explanation of powerop I've seen so far dives into microdetails,
> whilst the 10,000ft view has always passed me by other than "this is
> what we've had in the embedded world".
>
> The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> also confuses me. I was under the impression that powerop was adding additional
> userspace interfaces. If we're not changing how things from a userspace
> point of view, we're churning a lot of kernel code,.. why?
>
> Clue me in here, I'm feeling thick.
You're not alone, I really don't get it either.
But I guess we'll just wait for the next round of unified patches and
then go from there.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-15 1:00 ` Greg KH
@ 2006-08-15 3:03 ` Dave Jones
2006-08-15 10:35 ` Amit Kucheria
` (2 subsequent siblings)
3 siblings, 0 replies; 136+ messages in thread
From: Dave Jones @ 2006-08-15 3:03 UTC (permalink / raw)
To: Greg KH; +Cc: linux-pm
On Mon, Aug 14, 2006 at 06:00:20PM -0700, Greg Kroah-Hartman wrote:
> > This adds a whole bunch of new code, and doesn't seem to make any
> > existing code any simpler (to me at least). From a cpufreq point of view,
> > what does adding this buy us? What problem do we have today that is
> > being solved by all this?
> >
> > Every explanation of powerop I've seen so far dives into microdetails,
> > whilst the 10,000ft view has always passed me by other than "this is
> > what we've had in the embedded world".
> >
> > The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> > also confuses me. I was under the impression that powerop was adding additional
> > userspace interfaces. If we're not changing how things from a userspace
> > point of view, we're churning a lot of kernel code,.. why?
> >
> > Clue me in here, I'm feeling thick.
>
> You're not alone, I really don't get it either.
>
> But I guess we'll just wait for the next round of unified patches and
> then go from there.
I have concerns over this because the cpufreq code has gotten pretty damned
complicated in parts, and it's really impacting our ability to fix bugs in
the thing. Every time something new falls out we have to play archaeologist
looking up a lot of ancient changes to figure why we did x in y way, and why
z didn't work, and it's getting quite unfun. In a lot of cases even the
original authors of the problematic parts can't remember their reasoning.
I've got a fairly good handle on most parts, but things like the recent
cpufreq vs hotplug-cpu fiasco (which went in via some other route rather than the
cpufreq tree) really threw me a curve-ball, and no-one other than Linus, Andrew
and myself stepped up to the plate to fix that mess, despite there being
the better part of a half dozen people who had hacked on it during its integration.
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-15 1:00 ` Greg KH
2006-08-15 3:03 ` Dave Jones
@ 2006-08-15 10:35 ` Amit Kucheria
2006-08-15 19:04 ` Dave Jones
2006-08-17 21:24 ` Pavel Machek
2006-08-19 6:10 ` David Singleton
2006-08-19 6:19 ` David Singleton
3 siblings, 2 replies; 136+ messages in thread
From: Amit Kucheria @ 2006-08-15 10:35 UTC (permalink / raw)
To: ext Greg KH; +Cc: linux-pm
On Mon, 2006-08-14 at 18:00 -0700, ext Greg KH wrote:
> On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
> >
> > This adds a whole bunch of new code, and doesn't seem to make any
> > existing code any simpler (to me at least). From a cpufreq point of view,
> > what does adding this buy us? What problem do we have today that is
> > being solved by all this?
> >
> > Every explanation of powerop I've seen so far dives into microdetails,
> > whilst the 10,000ft view has always passed me by other than "this is
> > what we've had in the embedded world".
> >
> > The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> > also confuses me. I was under the impression that powerop was adding additional
> > userspace interfaces. If we're not changing how things from a userspace
> > point of view, we're churning a lot of kernel code,.. why?
> >
> > Clue me in here, I'm feeling thick.
>
> You're not alone, I really don't get it either.
>
> But I guess we'll just wait for the next round of unified patches and
> then go from there.
Here is my shot at providing a 10,000ft view:
a. For embedded platforms, cpufreq just does not cut it when specifying
an operating point for the device. These platforms use tuples such as
<voltage, pll_freq, core1_freq, core2_freq, interconnect_freq>
to signify an 'operating point'. Note that core1 and core2 can be
totally different processors e.g. ARM and DSP. x86 platforms hide this
complexity behind ACPI.
b. The clocking and voltage dependency tree in embedded devices can be
summarised in the clock framework ( find ./arch -name clock* ) and
soon-to-be-available voltage framework. These is again done to large
extent by ACPI on x86.
c. PowerOP provides an interface, that most embedded developers agree,
is a good starting point to encapsulate platform information in a
consistent way without having to resort to subarch-specific kludges.
d. In the end, all this is leading to an interface for a user-space
policy manager that will control _system_ power state based on
constraints imposed by HW peripherals or on policies implemented by
device manufacturer/distro maintainer.
In conclusion, PowerOP only allows embedded platforms to join the PM fun
without affecting cpufreq-supported platforms adversely. Or that is the
original idea. If that rule is being violated, please feel free to point
that out.
The way forward as I see is:
1. PowerOP integration into mainline [patch 1/3]
- PM Core drivers (OMAP, x86) [patch 2/3]
- cpufreq drivers modified to use PowerOP (OMAP, x86) [patch 3/3]
At this point, PowerOP is an optional component in mainline tree.
Cpufreq drivers can _choose_ to use it or not. But now embedded
platforms can do the PM dance in a consistent way.
2. Support for more architectures - PPC, x86_64, XScale, MIPS? Platform
expertise needed here.
3. Move clock/voltage framework from arch/arm to kernel/clock and
kernel/voltage. This would allow more embedded platforms and potentially
future PC platforms to utilise the framework for 'automated'
clock/voltage dependency management.
4. Userspace policy managers are created to provide basic control of
system operating points. Embedded system integrators might want to
extend these policy managers to fully utilise the 'knobs' available on
their platforms.
5. <Crystal Ball Gazing/Wishful Thinking>
- Clock/Voltage FW <--> ACPI logical mappings allows us to use global
state names in /sys/power/state for system power state transitions.
- Drivers on PC platforms are fixed to use/unuse resources dynamically
to allow asynchronous peripheral power state transitions. Embedded
platforms use clock framework for this.
- PowerOP can address needs of all platforms, allowing removal of
cpufreq.
</CBG/WT>
Regards,
Amit
--
Amit Kucheria <amit.kucheria@nokia.com>
Nokia
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-15 10:35 ` Amit Kucheria
@ 2006-08-15 19:04 ` Dave Jones
2006-08-16 12:58 ` Igor Stoppa
` (2 more replies)
2006-08-17 21:24 ` Pavel Machek
1 sibling, 3 replies; 136+ messages in thread
From: Dave Jones @ 2006-08-15 19:04 UTC (permalink / raw)
To: Amit Kucheria; +Cc: linux-pm
On Tue, Aug 15, 2006 at 01:35:15PM +0300, Amit Kucheria wrote:
> Here is my shot at providing a 10,000ft view:
>
> a. For embedded platforms, cpufreq just does not cut it when specifying
> an operating point for the device. These platforms use tuples such as
>
> <voltage, pll_freq, core1_freq, core2_freq, interconnect_freq>
>
> to signify an 'operating point'. Note that core1 and core2 can be
> totally different processors e.g. ARM and DSP. x86 platforms hide this
> complexity behind ACPI.
If there are dependancies inherently linking core1 and core2, cpufreq
should already be programming both parts. For example, the SA1100
driver programs both CPU and SDRAM controller. If there isn't any dependancy
between them, I don't see the attraction of creating an artificial one
in the way suggested for no real purpose.
Things like voltage and frequency are closely tied together, so offering
any means of controlling them independantly makes no sense afaics.
> b. The clocking and voltage dependency tree in embedded devices can be
> summarised in the clock framework ( find ./arch -name clock* ) and
> soon-to-be-available voltage framework. These is again done to large
> extent by ACPI on x86.
Of the 14 x86 cpufreq drivers, 3 of them _optionally_ use ACPI.
powernow-k7 for example doesn't use it, and is possibly one of the most
stable cpufreq drivers we've had in the tree (for x86 at least).
> d. In the end, all this is leading to an interface for a user-space
> policy manager that will control _system_ power state based on
> constraints imposed by HW peripherals or on policies implemented by
> device manufacturer/distro maintainer.
How does that interface look from a userspace point of view ?
Hopefully not anything like the tuple described above.
Why would userspace ever care about "interconnect freq" ?
Userspace cares about "save power" or "go fast".
Historically, I wish we had never exposed frequencies, but instead
a performance percentage, so that the various userspace tools
didn't have to care about things like 'what frequencies are available'.
Adding the same mistake for voltages doesn't strike me as a fantastic idea.
> At this point, PowerOP is an optional component in mainline tree.
> Cpufreq drivers can _choose_ to use it or not. But now embedded
> platforms can do the PM dance in a consistent way.
That's about the only part I really like so far. The option to opt-out
where it makes absolutely no sense to pointlessly abstract stuff
(which for x86 seems to be the case). For ARM, I'm going to leave
Russell to comment/review.
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-15 19:04 ` Dave Jones
@ 2006-08-16 12:58 ` Igor Stoppa
2006-08-17 21:39 ` Pavel Machek
2006-08-17 5:20 ` Matthew Locke
2006-08-17 9:18 ` Amit Kucheria
2 siblings, 1 reply; 136+ messages in thread
From: Igor Stoppa @ 2006-08-16 12:58 UTC (permalink / raw)
To: ext Dave Jones; +Cc: linux-pm, Kucheria Amit (Nokia-M/Tampere)
On Tue, 2006-08-15 at 22:04 +0300, ext Dave Jones wrote:
> On Tue, Aug 15, 2006 at 01:35:15PM +0300, Amit Kucheria wrote:
>
> > Here is my shot at providing a 10,000ft view:
> >
> > a. For embedded platforms, cpufreq just does not cut it when
> specifying
> > an operating point for the device. These platforms use tuples such
> as
> >
> > <voltage, pll_freq, core1_freq, core2_freq, interconnect_freq>
> >
> > to signify an 'operating point'. Note that core1 and core2 can be
> > totally different processors e.g. ARM and DSP. x86 platforms hide
> this
> > complexity behind ACPI.
>
> If there are dependancies inherently linking core1 and core2, cpufreq
> should already be programming both parts. For example, the SA1100
> driver programs both CPU and SDRAM controller. If there isn't any
> dependancy
> between them, I don't see the attraction of creating an artificial one
> in the way suggested for no real purpose.
>
> Things like voltage and frequency are closely tied together, so
> offering
> any means of controlling them independantly makes no sense afaics.
Yet a certain subsystem (for example an onboard camera, in a phone)
might require a higher voltage when it's active, effectively loosening
the tight coupling between freq and voltage that the porcessor is
enforcing.
>
> > b. The clocking and voltage dependency tree in embedded devices can
> be
> > summarised in the clock framework ( find ./arch -name clock* ) and
> > soon-to-be-available voltage framework. These is again done to
> large
> > extent by ACPI on x86.
>
> Of the 14 x86 cpufreq drivers, 3 of them _optionally_ use ACPI.
> powernow-k7 for example doesn't use it, and is possibly one of the
> most
> stable cpufreq drivers we've had in the tree (for x86 at least).
>
> > d. In the end, all this is leading to an interface for a user-space
> > policy manager that will control _system_ power state based on
> > constraints imposed by HW peripherals or on policies implemented by
> > device manufacturer/distro maintainer.
>
> How does that interface look from a userspace point of view ?
> Hopefully not anything like the tuple described above.
> Why would userspace ever care about "interconnect freq" ?
>
> Userspace cares about "save power" or "go fast".
> Historically, I wish we had never exposed frequencies, but instead
> a performance percentage, so that the various userspace tools
> didn't have to care about things like 'what frequencies are
> available'.
> Adding the same mistake for voltages doesn't strike me as a fantastic
> idea.
Such generic definitions are not enough for embedded userspace, the
complexity of the tuning is expected and accepted as long as it allows
to leverage the HW performance.
>
> > At this point, PowerOP is an optional component in mainline tree.
> > Cpufreq drivers can _choose_ to use it or not. But now embedded
> > platforms can do the PM dance in a consistent way.
>
> That's about the only part I really like so far. The option to opt-out
> where it makes absolutely no sense to pointlessly abstract stuff
> (which for x86 seems to be the case). For ARM, I'm going to leave
> Russell to comment/review.
>
> Dave
>
> --
> http://www.codemonkey.org.uk
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>
>
--
Cheers,
Igor
Igor Stoppa (Nokia M - OSSO / Tampere)
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-16 12:58 ` Igor Stoppa
@ 2006-08-17 21:39 ` Pavel Machek
2006-08-18 10:02 ` Igor Stoppa
0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-17 21:39 UTC (permalink / raw)
To: Igor Stoppa; +Cc: linux-pm, Kucheria Amit (Nokia-M/Tampere)
Hi!
> > If there are dependancies inherently linking core1 and core2, cpufreq
> > should already be programming both parts. For example, the SA1100
> > driver programs both CPU and SDRAM controller. If there isn't any
> > dependancy
> > between them, I don't see the attraction of creating an artificial one
> > in the way suggested for no real purpose.
> >
> > Things like voltage and frequency are closely tied together, so
> > offering
> > any means of controlling them independantly makes no sense afaics.
> Yet a certain subsystem (for example an onboard camera, in a phone)
> might require a higher voltage when it's active, effectively loosening
> the tight coupling between freq and voltage that the porcessor is
> enforcing.
So... you expect userland to echo high > state before camera can be
used?
I'd rather have kernel automagically up the voltage when /dev/video0
is opened...
Pavel
--
Thanks for all the (sleeping) penguins.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-17 21:39 ` Pavel Machek
@ 2006-08-18 10:02 ` Igor Stoppa
2006-08-18 15:29 ` Alexey Starikovskiy
0 siblings, 1 reply; 136+ messages in thread
From: Igor Stoppa @ 2006-08-18 10:02 UTC (permalink / raw)
To: ext Pavel Machek; +Cc: linux-pm, Kucheria Amit (Nokia-M/Tampere)
On Fri, 2006-08-18 at 00:39 +0300, ext Pavel Machek wrote:
> Hi!
>
> > > If there are dependancies inherently linking core1 and core2,
> cpufreq
> > > should already be programming both parts. For example, the SA1100
> > > driver programs both CPU and SDRAM controller. If there isn't any
> > > dependancy
> > > between them, I don't see the attraction of creating an artificial
> one
> > > in the way suggested for no real purpose.
> > >
> > > Things like voltage and frequency are closely tied together, so
> > > offering
> > > any means of controlling them independantly makes no sense afaics.
>
> > Yet a certain subsystem (for example an onboard camera, in a phone)
> > might require a higher voltage when it's active, effectively
> loosening
> > the tight coupling between freq and voltage that the porcessor is
> > enforcing.
>
> So... you expect userland to echo high > state before camera can be
> used?
>
> I'd rather have kernel automagically up the voltage when /dev/video0
> is opened...
Not really, I meant that the CPU is not the only customer of power
domains (depend on the HW design), so the relation freq <-> voltage is
not always true.
--
Cheers,
Igor
Igor Stoppa (Nokia M - OSSO / Tampere)
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-18 10:02 ` Igor Stoppa
@ 2006-08-18 15:29 ` Alexey Starikovskiy
2006-08-18 17:54 ` Igor Stoppa
0 siblings, 1 reply; 136+ messages in thread
From: Alexey Starikovskiy @ 2006-08-18 15:29 UTC (permalink / raw)
To: Igor Stoppa; +Cc: linux-pm, ext Pavel Machek, Kucheria Amit (Nokia-M/Tampere)
Igor Stoppa wrote:
> On Fri, 2006-08-18 at 00:39 +0300, ext Pavel Machek wrote:
>> Hi!
>>
>>>> If there are dependancies inherently linking core1 and core2,
>> cpufreq
>>>> should already be programming both parts. For example, the SA1100
>>>> driver programs both CPU and SDRAM controller. If there isn't any
>>>> dependancy
>>>> between them, I don't see the attraction of creating an artificial
>> one
>>>> in the way suggested for no real purpose.
>>>>
>>>> Things like voltage and frequency are closely tied together, so
>>>> offering
>>>> any means of controlling them independantly makes no sense afaics.
>>> Yet a certain subsystem (for example an onboard camera, in a phone)
>>> might require a higher voltage when it's active, effectively
>> loosening
>>> the tight coupling between freq and voltage that the porcessor is
>>> enforcing.
>> So... you expect userland to echo high > state before camera can be
>> used?
>>
>> I'd rather have kernel automagically up the voltage when /dev/video0
>> is opened...
>
> Not really, I meant that the CPU is not the only customer of power
> domains (depend on the HW design), so the relation freq <-> voltage is
> not always true.
>
Then you need to introduce power domains and associate your devices with them, isn't it?
So if your camera appears in the same domain with CPU, the voltage of
that domain will go up either with camera=on, or CPU going to higher frequency.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-18 15:29 ` Alexey Starikovskiy
@ 2006-08-18 17:54 ` Igor Stoppa
2006-08-18 21:05 ` Alexey Starikovskiy
0 siblings, 1 reply; 136+ messages in thread
From: Igor Stoppa @ 2006-08-18 17:54 UTC (permalink / raw)
To: ext Alexey Starikovskiy
Cc: linux-pm, ext Pavel Machek, Kucheria Amit (Nokia-M/Tampere)
On Fri, 2006-08-18 at 18:29 +0300, ext Alexey Starikovskiy wrote:
> Igor Stoppa wrote:
> > On Fri, 2006-08-18 at 00:39 +0300, ext Pavel Machek wrote:
> >> Hi!
> >>
> >>>> If there are dependancies inherently linking core1 and core2,
> >> cpufreq
> >>>> should already be programming both parts. For example, the SA1100
> >>>> driver programs both CPU and SDRAM controller. If there isn't
> any
> >>>> dependancy
> >>>> between them, I don't see the attraction of creating an
> artificial
> >> one
> >>>> in the way suggested for no real purpose.
> >>>>
> >>>> Things like voltage and frequency are closely tied together, so
> >>>> offering
> >>>> any means of controlling them independantly makes no sense
> afaics.
> >>> Yet a certain subsystem (for example an onboard camera, in a
> phone)
> >>> might require a higher voltage when it's active, effectively
> >> loosening
> >>> the tight coupling between freq and voltage that the porcessor is
> >>> enforcing.
> >> So... you expect userland to echo high > state before camera can be
> >> used?
> >>
> >> I'd rather have kernel automagically up the voltage
> when /dev/video0
> >> is opened...
> >
> > Not really, I meant that the CPU is not the only customer of power
> > domains (depend on the HW design), so the relation freq <-> voltage
> is
> > not always true.
> >
> Then you need to introduce power domains and associate your devices
> with them, isn't it?
> So if your camera appears in the same domain with CPU, the voltage of
> that domain will go up either with camera=on, or CPU going to higher
> frequency.
I used the expression "power domain" to refer to a generic domain,
either voltage or frequency, to indicate that changing either freq or
voltage in a domain implies changing the domain power level.
Of course it is changing linearly with frequency, while the dependency
from voltage is quadratic.
So in the camera example we might have 2 different cases:
-the one mentioned above, where the camera shares the same voltage
domain with CPU and the correlation is the one you described
-another case where the clock frequency provided to the camera is
related to the resolution being used
camera off => no constraints
low res => low freq, high voltage
high res => high freq, high voltage
in such case the currently active resolution would affect whatever
device shares the camera clock, if any.
But no need to introduce power domains.
--
Cheers,
Igor
Igor Stoppa (Nokia M - OSSO / Tampere)
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-18 17:54 ` Igor Stoppa
@ 2006-08-18 21:05 ` Alexey Starikovskiy
2006-08-20 13:19 ` Igor Stoppa
0 siblings, 1 reply; 136+ messages in thread
From: Alexey Starikovskiy @ 2006-08-18 21:05 UTC (permalink / raw)
To: Igor Stoppa; +Cc: linux-pm, ext Pavel Machek, Kucheria Amit (Nokia-M/Tampere)
Igor Stoppa wrote:
> On Fri, 2006-08-18 at 18:29 +0300, ext Alexey Starikovskiy wrote:
>> Igor Stoppa wrote:
>>> On Fri, 2006-08-18 at 00:39 +0300, ext Pavel Machek wrote:
>>>> Hi!
>>>>
>>>>>> If there are dependancies inherently linking core1 and core2,
>>>> cpufreq
>>>>>> should already be programming both parts. For example, the SA1100
>>>>>> driver programs both CPU and SDRAM controller. If there isn't
>> any
>>>>>> dependancy
>>>>>> between them, I don't see the attraction of creating an
>> artificial
>>>> one
>>>>>> in the way suggested for no real purpose.
>>>>>>
>>>>>> Things like voltage and frequency are closely tied together, so
>>>>>> offering
>>>>>> any means of controlling them independantly makes no sense
>> afaics.
>>>>> Yet a certain subsystem (for example an onboard camera, in a
>> phone)
>>>>> might require a higher voltage when it's active, effectively
>>>> loosening
>>>>> the tight coupling between freq and voltage that the porcessor is
>>>>> enforcing.
>>>> So... you expect userland to echo high > state before camera can be
>>>> used?
>>>>
>>>> I'd rather have kernel automagically up the voltage
>> when /dev/video0
>>>> is opened...
>>> Not really, I meant that the CPU is not the only customer of power
>>> domains (depend on the HW design), so the relation freq <-> voltage
>> is
>>> not always true.
>>>
>> Then you need to introduce power domains and associate your devices
>> with them, isn't it?
>> So if your camera appears in the same domain with CPU, the voltage of
>> that domain will go up either with camera=on, or CPU going to higher
>> frequency.
>
> I used the expression "power domain" to refer to a generic domain,
> either voltage or frequency, to indicate that changing either freq or
> voltage in a domain implies changing the domain power level.
>
> Of course it is changing linearly with frequency, while the dependency
> from voltage is quadratic.
>
> So in the camera example we might have 2 different cases:
>
> -the one mentioned above, where the camera shares the same voltage
> domain with CPU and the correlation is the one you described
>
> -another case where the clock frequency provided to the camera is
> related to the resolution being used
>
> camera off => no constraints
> low res => low freq, high voltage
> high res => high freq, high voltage
>
> in such case the currently active resolution would affect whatever
> device shares the camera clock, if any.
>
> But no need to introduce power domains.
>
How about introducing a frequency domain as well?
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-18 21:05 ` Alexey Starikovskiy
@ 2006-08-20 13:19 ` Igor Stoppa
0 siblings, 0 replies; 136+ messages in thread
From: Igor Stoppa @ 2006-08-20 13:19 UTC (permalink / raw)
To: ext Alexey Starikovskiy
Cc: linux-pm, ext Pavel Machek, Kucheria Amit (Nokia-M/Tampere)
On Sat, 2006-08-19 at 00:05 +0300, ext Alexey Starikovskiy wrote:
> Igor Stoppa wrote:
> > On Fri, 2006-08-18 at 18:29 +0300, ext Alexey Starikovskiy wrote:
> >> Igor Stoppa wrote:
> >>> On Fri, 2006-08-18 at 00:39 +0300, ext Pavel Machek wrote:
> >>>> Hi!
> >>>>
> >>>>>> If there are dependancies inherently linking core1 and core2,
> >>>> cpufreq
> >>>>>> should already be programming both parts. For example, the
> SA1100
> >>>>>> driver programs both CPU and SDRAM controller. If there isn't
> >> any
> >>>>>> dependancy
> >>>>>> between them, I don't see the attraction of creating an
> >> artificial
> >>>> one
> >>>>>> in the way suggested for no real purpose.
> >>>>>>
> >>>>>> Things like voltage and frequency are closely tied together, so
> >>>>>> offering
> >>>>>> any means of controlling them independantly makes no sense
> >> afaics.
> >>>>> Yet a certain subsystem (for example an onboard camera, in a
> >> phone)
> >>>>> might require a higher voltage when it's active, effectively
> >>>> loosening
> >>>>> the tight coupling between freq and voltage that the porcessor
> is
> >>>>> enforcing.
> >>>> So... you expect userland to echo high > state before camera can
> be
> >>>> used?
> >>>>
> >>>> I'd rather have kernel automagically up the voltage
> >> when /dev/video0
> >>>> is opened...
> >>> Not really, I meant that the CPU is not the only customer of power
> >>> domains (depend on the HW design), so the relation freq <->
> voltage
> >> is
> >>> not always true.
> >>>
> >> Then you need to introduce power domains and associate your devices
> >> with them, isn't it?
> >> So if your camera appears in the same domain with CPU, the voltage
> of
> >> that domain will go up either with camera=on, or CPU going to
> higher
> >> frequency.
> >
> > I used the expression "power domain" to refer to a generic domain,
> > either voltage or frequency, to indicate that changing either freq
> or
> > voltage in a domain implies changing the domain power level.
> >
> > Of course it is changing linearly with frequency, while the
> dependency
> > from voltage is quadratic.
> >
> > So in the camera example we might have 2 different cases:
> >
> > -the one mentioned above, where the camera shares the same voltage
> > domain with CPU and the correlation is the one you described
> >
> > -another case where the clock frequency provided to the camera is
> > related to the resolution being used
> >
> > camera off => no constraints
> > low res => low freq, high voltage
> > high res => high freq, high voltage
> >
> > in such case the currently active resolution would affect whatever
> > device shares the camera clock, if any.
> >
> > But no need to introduce power domains.
> >
> How about introducing a frequency domain as well?
The clock framework deals with clock correlations and dependencies.
--
Cheers,
Igor
Igor Stoppa (Nokia M - OSSO / Tampere)
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-15 19:04 ` Dave Jones
2006-08-16 12:58 ` Igor Stoppa
@ 2006-08-17 5:20 ` Matthew Locke
2006-08-17 7:20 ` Paul Mundt
2006-08-17 9:18 ` Amit Kucheria
2 siblings, 1 reply; 136+ messages in thread
From: Matthew Locke @ 2006-08-17 5:20 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-pm
On Aug 15, 2006, at 12:04 PM, Dave Jones wrote:
> On Tue, Aug 15, 2006 at 01:35:15PM +0300, Amit Kucheria wrote:
>
>> Here is my shot at providing a 10,000ft view:
>>
>> a. For embedded platforms, cpufreq just does not cut it when
>> specifying
>> an operating point for the device. These platforms use tuples such as
>>
>> <voltage, pll_freq, core1_freq, core2_freq, interconnect_freq>
>>
>> to signify an 'operating point'. Note that core1 and core2 can be
>> totally different processors e.g. ARM and DSP. x86 platforms hide this
>> complexity behind ACPI.
>
> If there are dependancies inherently linking core1 and core2, cpufreq
> should already be programming both parts. For example, the SA1100
> driver programs both CPU and SDRAM controller. If there isn't any
> dependancy
> between them, I don't see the attraction of creating an artificial one
> in the way suggested for no real purpose.
Are you arguing against the operating point concept because it creates
an artificial dependency? I assume your definition of dependency means
a physical dependency.
The operating point represents both a physical and operational
dependency. It is a collection of parameters that can/will be adjusted
to reduce power consumption. However, adjusting these parameters can
have a severe impact to performance and operational state of the
system. The parameters can not be adjusted individually and still
achieve the goal of an operational and power efficient system. SoC's
have a fixed number of values in a fixed number of combinations that
keep the system operational and power efficient. Using power op, a
piece of controlling software can tell the system to go to specific
instance of the power parameters that provide the best combination of
power savings and performance/operational integrity according to the
current state of the system. This instance is represented by a string.
PowerOP is needed to do advanced power management on embedded mobile
devices.
>
> Things like voltage and frequency are closely tied together, so
> offering
> any means of controlling them independantly makes no sense afaics.
It's not about controlling parameters independently. We need to be
able to control them as described above.
>
>> b. The clocking and voltage dependency tree in embedded devices can be
>> summarised in the clock framework ( find ./arch -name clock* ) and
>> soon-to-be-available voltage framework. These is again done to large
>> extent by ACPI on x86.
>
> Of the 14 x86 cpufreq drivers, 3 of them _optionally_ use ACPI.
> powernow-k7 for example doesn't use it, and is possibly one of the most
> stable cpufreq drivers we've had in the tree (for x86 at least).
>
>> d. In the end, all this is leading to an interface for a user-space
>> policy manager that will control _system_ power state based on
>> constraints imposed by HW peripherals or on policies implemented by
>> device manufacturer/distro maintainer.
>
> How does that interface look from a userspace point of view ?
> Hopefully not anything like the tuple described above.
> Why would userspace ever care about "interconnect freq" ?
>
> Userspace cares about "save power" or "go fast".
> Historically, I wish we had never exposed frequencies, but instead
> a performance percentage, so that the various userspace tools
> didn't have to care about things like 'what frequencies are available'.
> Adding the same mistake for voltages doesn't strike me as a fantastic
> idea.
I'm not sure I follow your comments here. We are not making the same
mistake. In fact we are fixing it with PowerOP. The power parameters
are represented by a name and you create whatever name makes sense for
your system. In fact the names can all be the same for the various x86
platforms if you so desire. The abstraction allows userspace to use
the name and not know anything about the frequencies or voltages. As
Scott pointed out, some power managers will need to know lots of
architecture and board specific details to be able to reduce power
consumption and keep the system operational. The abstraction enables
this as well.
>
>> At this point, PowerOP is an optional component in mainline tree.
>> Cpufreq drivers can _choose_ to use it or not. But now embedded
>> platforms can do the PM dance in a consistent way.
>
> That's about the only part I really like so far. The option to opt-out
> where it makes absolutely no sense to pointlessly abstract stuff
> (which for x86 seems to be the case). For ARM, I'm going to leave
> Russell to comment/review.
I'm not following why you think PowerOP isn't needed for x86. It seems
to address the issues with cpufreq that you point out above. The
conclusion we reached at the PM summit was that cpufreq/PowerOP
integration was useful and desired.
If we need to, I'm happy to put the integration of cpufreq/PowerOP
aside and just work on getting PowerOP accepted.
>
> Dave
>
> --
> http://www.codemonkey.org.uk
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-17 5:20 ` Matthew Locke
@ 2006-08-17 7:20 ` Paul Mundt
0 siblings, 0 replies; 136+ messages in thread
From: Paul Mundt @ 2006-08-17 7:20 UTC (permalink / raw)
To: Matthew Locke; +Cc: linux-pm
On Wed, Aug 16, 2006 at 10:20:42PM -0700, Matthew Locke wrote:
> On Aug 15, 2006, at 12:04 PM, Dave Jones wrote:
> > If there are dependancies inherently linking core1 and core2, cpufreq
> > should already be programming both parts. For example, the SA1100
> > driver programs both CPU and SDRAM controller. If there isn't any
> > dependancy between them, I don't see the attraction of creating an
> > artificial one in the way suggested for no real purpose.
>
> Are you arguing against the operating point concept because it creates
> an artificial dependency? I assume your definition of dependency means
> a physical dependency.
>
> The operating point represents both a physical and operational
> dependency. It is a collection of parameters that can/will be adjusted
> to reduce power consumption. However, adjusting these parameters can
> have a severe impact to performance and operational state of the
> system. The parameters can not be adjusted individually and still
> achieve the goal of an operational and power efficient system. SoC's
> have a fixed number of values in a fixed number of combinations that
> keep the system operational and power efficient. Using power op, a
> piece of controlling software can tell the system to go to specific
> instance of the power parameters that provide the best combination of
> power savings and performance/operational integrity according to the
> current state of the system. This instance is represented by a string.
>
The core1 and core interdependencies are something that cpufreq doesn't
handle particularly well. If it's something as simple as recalibrating
your baud rate generator or adjusting the SDRAM controller, these are
all things that need to be done for normal operation to continue, not
things that end up being exposed or configurable, so it tends to largely
ignore the interdependency issue.
The problem occurs when you have independent cores that want to be
throttled or scaled independently, yet still have some fixed dependency
between them (say, enabling a synchronization circuit), where failure to
handle this will ultimately result in core reset or otherwise undefined
behaviour.
In order to handle this cleanly, one would need multiple drivers for
each core, as well as some shared common code for handling the clocks,
voltage, and sanity checks. This alone already begins to enter the scope
for what things like PowerOP are reasonably suited for. The operating
point semantics work well here, as independent core states can be
trivially defined based off of vendor-defined usage profiles. It's
necessary to have a big picture view of the operating point in order to
sanely handle sanity checks and rate validation, which is something that
gets rather ugly with cpufreq without referencing common validation code
between each core (which will also cause problems for code reuse in the
cases where one of the cores is used in another processor).
PowerOP seems well suited for these sorts of cases, and it's not clear
that trying to beat cpufreq in to submission will offer any benefits in
this case, particularly since it's something x86 people largely don't
seem to care about, given that ACPI does most of the heavy lifting.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-15 19:04 ` Dave Jones
2006-08-16 12:58 ` Igor Stoppa
2006-08-17 5:20 ` Matthew Locke
@ 2006-08-17 9:18 ` Amit Kucheria
2006-08-17 21:40 ` Pavel Machek
2 siblings, 1 reply; 136+ messages in thread
From: Amit Kucheria @ 2006-08-17 9:18 UTC (permalink / raw)
To: ext Dave Jones; +Cc: linux-pm
On Tue, 2006-08-15 at 15:04 -0400, ext Dave Jones wrote:
>
> > d. In the end, all this is leading to an interface for a user-space
> > policy manager that will control _system_ power state based on
> > constraints imposed by HW peripherals or on policies implemented by
> > device manufacturer/distro maintainer.
>
> How does that interface look from a userspace point of view ?
> Hopefully not anything like the tuple described above.
> Why would userspace ever care about "interconnect freq" ?
>
> Userspace cares about "save power" or "go fast".
> Historically, I wish we had never exposed frequencies, but instead
> a performance percentage, so that the various userspace tools
> didn't have to care about things like 'what frequencies are
> available'.
> Adding the same mistake for voltages doesn't strike me as a fantastic
> idea.
The userspace interface in Eungeny's patches is for other userspace
programs (policy managers) to activate/deactivate valid operating points
in the system dynamically and if necessary, introduce new ones into the
system. It will also allow the operating points to be referenced by name
instead of the tuple.
Then, we will be able to use names like 'video', 'mp3', 'fast',
'powersave', 'usb' to switch to the relevant operating point based on
configuration of the policy manager.
Regards,
Amit
--
Amit Kucheria <amit.kucheria@nokia.com>
Nokia
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-17 9:18 ` Amit Kucheria
@ 2006-08-17 21:40 ` Pavel Machek
2006-08-18 5:42 ` Vitaly Wool
2006-08-18 11:48 ` Amit Kucheria
0 siblings, 2 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-17 21:40 UTC (permalink / raw)
To: Amit Kucheria; +Cc: linux-pm
Hi!
> > Userspace cares about "save power" or "go fast".
> > Historically, I wish we had never exposed frequencies, but instead
> > a performance percentage, so that the various userspace tools
> > didn't have to care about things like 'what frequencies are
> > available'.
> > Adding the same mistake for voltages doesn't strike me as a fantastic
> > idea.
>
> The userspace interface in Eungeny's patches is for other userspace
> programs (policy managers) to activate/deactivate valid operating points
> in the system dynamically and if necessary, introduce new ones into the
> system. It will also allow the operating points to be referenced by name
> instead of the tuple.
>
> Then, we will be able to use names like 'video', 'mp3', 'fast',
> 'powersave', 'usb' to switch to the relevant operating point based on
> configuration of the policy manager.
This seems to be too specific to embedded machine.
If userspace wants to work with usb and play mp3s at the same time,
what does it do?
Pavel
--
Thanks for all the (sleeping) penguins.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-17 21:40 ` Pavel Machek
@ 2006-08-18 5:42 ` Vitaly Wool
2006-08-23 12:28 ` Pavel Machek
2006-08-18 11:48 ` Amit Kucheria
1 sibling, 1 reply; 136+ messages in thread
From: Vitaly Wool @ 2006-08-18 5:42 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
On 8/18/06, Pavel Machek <pavel@ucw.cz> wrote:
> > Then, we will be able to use names like 'video', 'mp3', 'fast',
> > 'powersave', 'usb' to switch to the relevant operating point based on
> > configuration of the policy manager.
>
> This seems to be too specific to embedded machine.
>
> If userspace wants to work with usb and play mp3s at the same time,
> what does it do?
I guess it just defines an appropriate policy. You can call it
'usb_mp3' if you wish ;)
I don't think it's too embedded-specific.
Vitaly
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-18 5:42 ` Vitaly Wool
@ 2006-08-23 12:28 ` Pavel Machek
2006-08-23 15:26 ` Igor Stoppa
2006-08-24 12:58 ` Vitaly Wool
0 siblings, 2 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-23 12:28 UTC (permalink / raw)
To: Vitaly Wool; +Cc: linux-pm
> On 8/18/06, Pavel Machek <pavel@ucw.cz> wrote:
> >> Then, we will be able to use names like 'video', 'mp3', 'fast',
> >> 'powersave', 'usb' to switch to the relevant operating point based on
> >> configuration of the policy manager.
> >
> >This seems to be too specific to embedded machine.
> >
> >If userspace wants to work with usb and play mp3s at the same time,
> >what does it do?
>
> I guess it just defines an appropriate policy. You can call it
> 'usb_mp3' if you wish ;)
> I don't think it's too embedded-specific.
Well, it leads to exponential number of policies -- not nice. Having
usb_mp3_fileserver_webserver is not nice.
Pavel
--
Thanks, Sharp!
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-23 12:28 ` Pavel Machek
@ 2006-08-23 15:26 ` Igor Stoppa
2006-08-24 12:58 ` Vitaly Wool
1 sibling, 0 replies; 136+ messages in thread
From: Igor Stoppa @ 2006-08-23 15:26 UTC (permalink / raw)
To: ext Pavel Machek; +Cc: linux-pm
On Wed, 2006-08-23 at 15:28 +0300, ext Pavel Machek wrote:
> > On 8/18/06, Pavel Machek <pavel@ucw.cz> wrote:
> > >> Then, we will be able to use names like 'video', 'mp3', 'fast',
> > >> 'powersave', 'usb' to switch to the relevant operating point
> based on
> > >> configuration of the policy manager.
> > >
> > >This seems to be too specific to embedded machine.
> > >
> > >If userspace wants to work with usb and play mp3s at the same time,
> > >what does it do?
> >
> > I guess it just defines an appropriate policy. You can call it
> > 'usb_mp3' if you wish ;)
> > I don't think it's too embedded-specific.
>
> Well, it leads to exponential number of policies -- not nice. Having
> usb_mp3_fileserver_webserver is not nice.
The whole idea is that you have generic "good enough" policies for
unknown cases-combinations, plus specific policies for well known cases.
This tends to be simpler for embedded systems, of course.
But even for your laptops you can identify few major use cases, that
incidentally tend to overlap with embedded devices use cases:
-mp3
-video
-browsing
-name your own
...
--
Cheers,
Igor
Igor Stoppa (Nokia M - OSSO / Tampere)
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-23 12:28 ` Pavel Machek
2006-08-23 15:26 ` Igor Stoppa
@ 2006-08-24 12:58 ` Vitaly Wool
2006-08-25 19:55 ` Pavel Machek
1 sibling, 1 reply; 136+ messages in thread
From: Vitaly Wool @ 2006-08-24 12:58 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
On 8/23/06, Pavel Machek <pavel@ucw.cz> wrote:
> > >
> > >This seems to be too specific to embedded machine.
> > >
> > >If userspace wants to work with usb and play mp3s at the same time,
> > >what does it do?
> >
> > I guess it just defines an appropriate policy. You can call it
> > 'usb_mp3' if you wish ;)
> > I don't think it's too embedded-specific.
>
> Well, it leads to exponential number of policies -- not nice. Having
> usb_mp3_fileserver_webserver is not nice.
No. The reason is there's no _real_ difference in 'fileserver' and
'webserver' from the PM POV, so this will never happen.
There's no reason to introduce different policies for different use
cases which however imply similar peripherals utilization.
Moreover, I never play MP3s on a fileserver/webserver. The example
you've given is pretty much artificial.
Vitaly
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-24 12:58 ` Vitaly Wool
@ 2006-08-25 19:55 ` Pavel Machek
2006-08-25 23:26 ` Vitaly Wool
0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-25 19:55 UTC (permalink / raw)
To: Vitaly Wool; +Cc: linux-pm
On Thu 2006-08-24 16:58:50, Vitaly Wool wrote:
> On 8/23/06, Pavel Machek <pavel@ucw.cz> wrote:
> >> >
> >> >This seems to be too specific to embedded machine.
> >> >
> >> >If userspace wants to work with usb and play mp3s at the same time,
> >> >what does it do?
> >>
> >> I guess it just defines an appropriate policy. You can call it
> >> 'usb_mp3' if you wish ;)
> >> I don't think it's too embedded-specific.
> >
> >Well, it leads to exponential number of policies -- not nice. Having
> >usb_mp3_fileserver_webserver is not nice.
>
> No. The reason is there's no _real_ difference in 'fileserver' and
> 'webserver' from the PM POV, so this will never happen.
> There's no reason to introduce different policies for different use
> cases which however imply similar peripherals utilization.
> Moreover, I never play MP3s on a fileserver/webserver. The example
> you've given is pretty much artificial.
My notebook has 23 different devices. Do you really want to have
8388608 policies for different perihepal utilizations?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-25 19:55 ` Pavel Machek
@ 2006-08-25 23:26 ` Vitaly Wool
2006-08-26 10:18 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Vitaly Wool @ 2006-08-25 23:26 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
On 8/25/06, Pavel Machek <pavel@ucw.cz> wrote:
> > No. The reason is there's no _real_ difference in 'fileserver' and
> > 'webserver' from the PM POV, so this will never happen.
> > There's no reason to introduce different policies for different use
> > cases which however imply similar peripherals utilization.
> > Moreover, I never play MP3s on a fileserver/webserver. The example
> > you've given is pretty much artificial.
>
> My notebook has 23 different devices. Do you really want to have
> 8388608 policies for different perihepal utilizations?
Can you please elaborate on how this number corresponds to the reality?
Looks like you don't catch what I'm saying. I'm talking about use
case-driven model in which you will need to invent 8388608 use cases
basically in order to have 8388608 policies. IOW, not any combination
is valid within these 8388608.
Vitaly
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-25 23:26 ` Vitaly Wool
@ 2006-08-26 10:18 ` Pavel Machek
2006-08-26 13:30 ` Vitaly Wool
0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-26 10:18 UTC (permalink / raw)
To: Vitaly Wool; +Cc: linux-pm
Hi!
> >> No. The reason is there's no _real_ difference in 'fileserver' and
> >> 'webserver' from the PM POV, so this will never happen.
> >> There's no reason to introduce different policies for different use
> >> cases which however imply similar peripherals utilization.
> >> Moreover, I never play MP3s on a fileserver/webserver. The example
> >> you've given is pretty much artificial.
> >
> >My notebook has 23 different devices. Do you really want to have
> >8388608 policies for different perihepal utilizations?
>
> Can you please elaborate on how this number corresponds to the reality?
> Looks like you don't catch what I'm saying. I'm talking about use
> case-driven model in which you will need to invent 8388608 use cases
> basically in order to have 8388608 policies. IOW, not any combination
> is valid within these 8388608.
I'm saying that usecase-driven model is not acceptable for a
kernel. It is not kernel's business to limit user to particular usage
models.
That's why your model works for closed machines like a cellphones, but
is totally broken for notebook. Sorry.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-26 10:18 ` Pavel Machek
@ 2006-08-26 13:30 ` Vitaly Wool
2006-08-26 13:46 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Vitaly Wool @ 2006-08-26 13:30 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
Hi Pavel,
On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
> > >> No. The reason is there's no _real_ difference in 'fileserver' and
> > >> 'webserver' from the PM POV, so this will never happen.
> > >> There's no reason to introduce different policies for different use
> > >> cases which however imply similar peripherals utilization.
> > >> Moreover, I never play MP3s on a fileserver/webserver. The example
> > >> you've given is pretty much artificial.
> > >
> > >My notebook has 23 different devices. Do you really want to have
> > >8388608 policies for different perihepal utilizations?
> >
> > Can you please elaborate on how this number corresponds to the reality?
> > Looks like you don't catch what I'm saying. I'm talking about use
> > case-driven model in which you will need to invent 8388608 use cases
> > basically in order to have 8388608 policies. IOW, not any combination
> > is valid within these 8388608.
>
> I'm saying that usecase-driven model is not acceptable for a
> kernel. It is not kernel's business to limit user to particular usage
> models.
>
> That's why your model works for closed machines like a cellphones, but
> is totally broken for notebook. Sorry.
Who talks about kernel? A policy is an userspace thing. I guess we're
not quite understanding each other :)
Vitaly
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-26 13:30 ` Vitaly Wool
@ 2006-08-26 13:46 ` Pavel Machek
2006-08-28 16:40 ` Mark Gross
0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-26 13:46 UTC (permalink / raw)
To: Vitaly Wool; +Cc: linux-pm
On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
> Hi Pavel,
>
> On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> >Hi!
> >
> >> >> No. The reason is there's no _real_ difference in 'fileserver' and
> >> >> 'webserver' from the PM POV, so this will never happen.
> >> >> There's no reason to introduce different policies for different use
> >> >> cases which however imply similar peripherals utilization.
> >> >> Moreover, I never play MP3s on a fileserver/webserver. The example
> >> >> you've given is pretty much artificial.
> >> >
> >> >My notebook has 23 different devices. Do you really want to have
> >> >8388608 policies for different perihepal utilizations?
> >>
> >> Can you please elaborate on how this number corresponds to the reality?
> >> Looks like you don't catch what I'm saying. I'm talking about use
> >> case-driven model in which you will need to invent 8388608 use cases
> >> basically in order to have 8388608 policies. IOW, not any combination
> >> is valid within these 8388608.
> >
> >I'm saying that usecase-driven model is not acceptable for a
> >kernel. It is not kernel's business to limit user to particular usage
> >models.
> >
> >That's why your model works for closed machines like a cellphones, but
> >is totally broken for notebook. Sorry.
>
> Who talks about kernel? A policy is an userspace thing. I guess we're
> not quite understanding each other :)
You upload policies to kernel. You want 5 policies for your cellphone,
and thats fine, but I'm telling you I'd need 8388608 policies for my
notebook, because devices are independent and users want separate
control.
Because 8388608 policies is clearly not reasonable, powerop can not
help here, and something better should be developed... like power
domains someone proposed here.
(Or to say it in another words, powerop forces one big power domain,
which is bad model for notebook-style machine).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-26 13:46 ` Pavel Machek
@ 2006-08-28 16:40 ` Mark Gross
2006-08-28 17:39 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Mark Gross @ 2006-08-28 16:40 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
On Sat, Aug 26, 2006 at 03:46:53PM +0200, Pavel Machek wrote:
> On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
> > Hi Pavel,
> >
> > On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> > >Hi!
> > >
> > >> >> No. The reason is there's no _real_ difference in 'fileserver' and
> > >> >> 'webserver' from the PM POV, so this will never happen.
> > >> >> There's no reason to introduce different policies for different use
> > >> >> cases which however imply similar peripherals utilization.
> > >> >> Moreover, I never play MP3s on a fileserver/webserver. The example
> > >> >> you've given is pretty much artificial.
> > >> >
> > >> >My notebook has 23 different devices. Do you really want to have
> > >> >8388608 policies for different perihepal utilizations?
> > >>
> > >> Can you please elaborate on how this number corresponds to the reality?
> > >> Looks like you don't catch what I'm saying. I'm talking about use
> > >> case-driven model in which you will need to invent 8388608 use cases
> > >> basically in order to have 8388608 policies. IOW, not any combination
> > >> is valid within these 8388608.
> > >
> > >I'm saying that usecase-driven model is not acceptable for a
> > >kernel. It is not kernel's business to limit user to particular usage
> > >models.
> > >
> > >That's why your model works for closed machines like a cellphones, but
> > >is totally broken for notebook. Sorry.
> >
> > Who talks about kernel? A policy is an userspace thing. I guess we're
> > not quite understanding each other :)
>
> You upload policies to kernel. You want 5 policies for your cellphone,
> and thats fine, but I'm telling you I'd need 8388608 policies for my
> notebook, because devices are independent and users want separate
> control.
No. Users do not, and if they do they won't deal directly with that large
of a number of power states.
Do not confuse policies with operating points. The policies define the
sets of operating points that are valid at a given time and the policy
manager attempts to set the optimum OP.
The user will only deal with the set of policies exported by whatever
policy manager is in use. Not the operating points.
power op is attempting to build a power management stack and is near the
bottom of the stack.
>
> Because 8388608 policies is clearly not reasonable, powerop can not
> help here, and something better should be developed... like power
> domains someone proposed here.
>
> (Or to say it in another words, powerop forces one big power domain,
> which is bad model for notebook-style machine).
I doubt notebook-style machines will ever us power op in any
significant way. HPC and embedded will be the first users.
Power domains will likely build on top power op.
Power domains adds complexities themselves. Dealing with
dependencies and constraints between domains will be a challenge.
It is an interesting thought about implementing powerop interfaces on a
per power domain bases....
--mgross
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-28 16:40 ` Mark Gross
@ 2006-08-28 17:39 ` Pavel Machek
2006-08-29 7:51 ` Matthew Locke
2006-08-30 22:13 ` Mark Gross
0 siblings, 2 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-28 17:39 UTC (permalink / raw)
To: Mark Gross; +Cc: linux-pm
On Mon 2006-08-28 09:40:38, Mark Gross wrote:
> On Sat, Aug 26, 2006 at 03:46:53PM +0200, Pavel Machek wrote:
> > On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
> > > On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> > Because 8388608 policies is clearly not reasonable, powerop can not
> > help here, and something better should be developed... like power
> > domains someone proposed here.
> >
> > (Or to say it in another words, powerop forces one big power domain,
> > which is bad model for notebook-style machine).
>
> I doubt notebook-style machines will ever us power op in any
> significant way. HPC and embedded will be the first users.
I agree here... power op look useless for notebooks. But I doubt power
op authors would agree...
> Power domains will likely build on top power op.
>
> Power domains adds complexities themselves. Dealing with
> dependencies and constraints between domains will be a challenge.
Once we have power domains in/solved... do we still need power op? I
thought power op could be useful for solving constrains _inside_ one
domain, but...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-28 17:39 ` Pavel Machek
@ 2006-08-29 7:51 ` Matthew Locke
2006-08-30 22:13 ` Mark Gross
1 sibling, 0 replies; 136+ messages in thread
From: Matthew Locke @ 2006-08-29 7:51 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
On Aug 28, 2006, at 10:39 AM, Pavel Machek wrote:
> On Mon 2006-08-28 09:40:38, Mark Gross wrote:
>> On Sat, Aug 26, 2006 at 03:46:53PM +0200, Pavel Machek wrote:
>>> On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
>>>> On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
>>> Because 8388608 policies is clearly not reasonable, powerop can not
>>> help here, and something better should be developed... like power
>>> domains someone proposed here.
>>>
>>> (Or to say it in another words, powerop forces one big power domain,
>>> which is bad model for notebook-style machine).
>>
>> I doubt notebook-style machines will ever us power op in any
>> significant way. HPC and embedded will be the first users.
>
> I agree here... power op look useless for notebooks. But I doubt power
> op authors would agree...
Agree that something I work on is useless? Never:) I know I sound like
a broken record but...
PowerOP is the basic building block for scaling power management. Its
as useless or useful as the cpufreq_driver layer of cpufreq is on
laptops. You can think of PowerOP as a redesign of cpufreq_driver that
enables other software in the PM stack to select a group of power
parameter values by a string. On x86 this other software can continue
to be cpufreq. On embedded devices the other software can use the
powerop sysfs api or kernel APIs.
>
>> Power domains will likely build on top power op.
>>
>> Power domains adds complexities themselves. Dealing with
>> dependencies and constraints between domains will be a challenge.
>
> Once we have power domains in/solved... do we still need power op? I
> thought power op could be useful for solving constrains _inside_ one
> domain, but...
I don't have a specific answer for this. We will deal with it when we
port to hardware that has power domain control.
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-28 17:39 ` Pavel Machek
2006-08-29 7:51 ` Matthew Locke
@ 2006-08-30 22:13 ` Mark Gross
2006-08-30 22:27 ` Pavel Machek
1 sibling, 1 reply; 136+ messages in thread
From: Mark Gross @ 2006-08-30 22:13 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm
On Mon, Aug 28, 2006 at 07:39:57PM +0200, Pavel Machek wrote:
> On Mon 2006-08-28 09:40:38, Mark Gross wrote:
> > On Sat, Aug 26, 2006 at 03:46:53PM +0200, Pavel Machek wrote:
> > > On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
> > > > On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> > > Because 8388608 policies is clearly not reasonable, powerop can not
> > > help here, and something better should be developed... like power
> > > domains someone proposed here.
> > >
> > > (Or to say it in another words, powerop forces one big power domain,
> > > which is bad model for notebook-style machine).
> >
> > I doubt notebook-style machines will ever us power op in any
> > significant way. HPC and embedded will be the first users.
>
> I agree here... power op look useless for notebooks. But I doubt power
> op authors would agree...
Concluding that it will be useless for notebooks may be premature.
I see powerop as the bottom of an future PM stack. As the upper layers
take shape who knows what platforms will use it?
>
> > Power domains will likely build on top power op.
> >
> > Power domains adds complexities themselves. Dealing with
> > dependencies and constraints between domains will be a challenge.
>
> Once we have power domains in/solved... do we still need power op? I
> thought power op could be useful for solving constrains _inside_ one
> domain, but...
Power domains and the components within them will likely be accessed as
operating points. I think we need to build the power domain
abstractions on top of operating points. This is why I want to see
support for multiple power_op_driver instances or a story for how
operating points are added to a running system or even platform to
enable and deal with domains.
--mgross
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-30 22:13 ` Mark Gross
@ 2006-08-30 22:27 ` Pavel Machek
0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-30 22:27 UTC (permalink / raw)
To: Mark Gross; +Cc: linux-pm
On Wed 2006-08-30 15:13:54, Mark Gross wrote:
> On Mon, Aug 28, 2006 at 07:39:57PM +0200, Pavel Machek wrote:
> > On Mon 2006-08-28 09:40:38, Mark Gross wrote:
> > > On Sat, Aug 26, 2006 at 03:46:53PM +0200, Pavel Machek wrote:
> > > > On Sat 2006-08-26 17:30:40, Vitaly Wool wrote:
> > > > > On 8/26/06, Pavel Machek <pavel@suse.cz> wrote:
> > > > Because 8388608 policies is clearly not reasonable, powerop can not
> > > > help here, and something better should be developed... like power
> > > > domains someone proposed here.
> > > >
> > > > (Or to say it in another words, powerop forces one big power domain,
> > > > which is bad model for notebook-style machine).
> > >
> > > I doubt notebook-style machines will ever us power op in any
> > > significant way. HPC and embedded will be the first users.
> >
> > I agree here... power op look useless for notebooks. But I doubt power
> > op authors would agree...
>
> Concluding that it will be useless for notebooks may be premature.
>
> I see powerop as the bottom of an future PM stack. As the upper layers
> take shape who knows what platforms will use it?
Well, PCs are generaly designed in a way where individual devices are
separate, and that means that we do not have linked-parameters-problem
powerop tries to solve. But okay, perhaps someone created such
notebook in future...
> > > Power domains will likely build on top power op.
> > >
> > > Power domains adds complexities themselves. Dealing with
> > > dependencies and constraints between domains will be a challenge.
> >
> > Once we have power domains in/solved... do we still need power op? I
> > thought power op could be useful for solving constrains _inside_ one
> > domain, but...
>
> Power domains and the components within them will likely be accessed as
> operating points. I think we need to build the power domain
> abstractions on top of operating points. This is why I want to see
> support for multiple power_op_driver instances or a story for how
> operating points are added to a running system or even platform to
> enable and deal with domains.
Yes, multiple power_op_drivers -- one per power domain -- makes some
sense.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-17 21:40 ` Pavel Machek
2006-08-18 5:42 ` Vitaly Wool
@ 2006-08-18 11:48 ` Amit Kucheria
2006-08-24 7:59 ` Pavel Machek
1 sibling, 1 reply; 136+ messages in thread
From: Amit Kucheria @ 2006-08-18 11:48 UTC (permalink / raw)
To: ext Pavel Machek; +Cc: linux-pm
On Thu, 2006-08-17 at 21:40 +0000, ext Pavel Machek wrote:
> > The userspace interface in Eungeny's patches is for other userspace
> > programs (policy managers) to activate/deactivate valid operating points
> > in the system dynamically and if necessary, introduce new ones into the
> > system. It will also allow the operating points to be referenced by name
> > instead of the tuple.
> >
> > Then, we will be able to use names like 'video', 'mp3', 'fast',
> > 'powersave', 'usb' to switch to the relevant operating point based on
> > configuration of the policy manager.
>
> This seems to be too specific to embedded machine.
>
> If userspace wants to work with usb and play mp3s at the same time,
> what does it do?
Switch to 'fast'?
The operating point for a use-case specifies the _minimum_ required for
the use-case. You can always go up.
The system designer is responsible for 'designing' operating points that
take into account multiple use-cases. Designing here refers to mapping
use-cases to HW operating points.
Consider an example system with a main CPU and a DSP. To simplify
discussion, lets assume 3 levels for CPU and DSP speeds and system
voltage. Then, here is what an example operating-point to use-case
mapping table could look like:
# CPU speed DSP speed Voltage use-case
----------------------------------------------------------
1. high high high fast, video
2. med high high
3. med med med usb[1]
4. low med med mp3
5. low low low powersave
[1] USB has voltage constraint (voltage >= med)
Mapping
=======
Performance related: fast, video, mp3
Power related: powersave
Miscellaneous: usb
- Now if we are playing mp3, we switch to OP 4.
- Add usb and we switch to OP 3.
- Now our performance monitor (e.g load avg) indicates that we need more
CPU processing. So we switch to OP 2.
Regards,
Amit
--
Amit Kucheria <amit.kucheria@nokia.com>
Nokia
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-18 11:48 ` Amit Kucheria
@ 2006-08-24 7:59 ` Pavel Machek
2006-08-30 11:00 ` Amit Kucheria
0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-24 7:59 UTC (permalink / raw)
To: Amit Kucheria; +Cc: linux-pm
Hi!
> > > The userspace interface in Eungeny's patches is for other userspace
> > > programs (policy managers) to activate/deactivate valid operating points
> > > in the system dynamically and if necessary, introduce new ones into the
> > > system. It will also allow the operating points to be referenced by name
> > > instead of the tuple.
> > >
> > > Then, we will be able to use names like 'video', 'mp3', 'fast',
> > > 'powersave', 'usb' to switch to the relevant operating point based on
> > > configuration of the policy manager.
> >
> > This seems to be too specific to embedded machine.
> >
> > If userspace wants to work with usb and play mp3s at the same time,
> > what does it do?
>
> Switch to 'fast'?
>
> The operating point for a use-case specifies the _minimum_ required for
> the use-case. You can always go up.
> The system designer is responsible for 'designing' operating points that
> take into account multiple use-cases. Designing here refers to mapping
> use-cases to HW operating points.
Yes, and that's why I argue this is unsuitable for notebook: there are
just too many usecases for a notebook.
> Consider an example system with a main CPU and a DSP. To simplify
> discussion, lets assume 3 levels for CPU and DSP speeds and system
> voltage. Then, here is what an example operating-point to use-case
> mapping table could look like:
>
> # CPU speed DSP speed Voltage use-case
> ----------------------------------------------------------
> 1. high high high fast, video
> 2. med high high
> 3. med med med usb[1]
> 4. low med med mp3
> 5. low low low powersave
>
> [1] USB has voltage constraint (voltage >= med)
So... you take three independend parametrs and merge them into one,
named parameter. Bad idea.
What about simply having these parameters:
usb on or off
cpu speed (controlled by cpufreq)
dsp speed (controlled by userspace)
Then you can have infrastructure that is able to compute system
voltage from usb/cpu/dsp speed, and users stll have interface they can
understand.
(How are they supposed to know if video use case is compatible with
usb? They should not have to).
> - Now if we are playing mp3, we switch to OP 4.
Do you expect all mp3 playing applications to play with
/sys/.../powerop-point? How do you tell if mp3's are playing? These
are hard questions for a notebook.
> - Add usb and we switch to OP 3.
> - Now our performance monitor (e.g load avg) indicates that we need more
> CPU processing. So we switch to OP 2.
That's cpufreq job, please
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-24 7:59 ` Pavel Machek
@ 2006-08-30 11:00 ` Amit Kucheria
2006-08-30 22:36 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Amit Kucheria @ 2006-08-30 11:00 UTC (permalink / raw)
To: ext Pavel Machek; +Cc: linux-pm
On Thu, 2006-08-24 at 09:59 +0200, ext Pavel Machek wrote:
> Hi!
>
> > > > The userspace interface in Eungeny's patches is for other userspace
> > > > programs (policy managers) to activate/deactivate valid operating points
> > > > in the system dynamically and if necessary, introduce new ones into the
> > > > system. It will also allow the operating points to be referenced by name
> > > > instead of the tuple.
> > > >
> > > > Then, we will be able to use names like 'video', 'mp3', 'fast',
> > > > 'powersave', 'usb' to switch to the relevant operating point based on
> > > > configuration of the policy manager.
> > >
> > > This seems to be too specific to embedded machine.
> > >
> > > If userspace wants to work with usb and play mp3s at the same time,
> > > what does it do?
> >
> > Switch to 'fast'?
> >
> > The operating point for a use-case specifies the _minimum_ required for
> > the use-case. You can always go up.
>
> > The system designer is responsible for 'designing' operating points that
> > take into account multiple use-cases. Designing here refers to mapping
> > use-cases to HW operating points.
>
> Yes, and that's why I argue this is unsuitable for notebook: there are
> just too many usecases for a notebook.
You are trying to make it sound more complex than it really is. For a
notebook, as you yourself pointed out, things could be handled with the
present adaptive, load-based system. So you don't need to map _every_
use-case to an operating point. So you don't need to move to use PowerOP
today.
But if someone (distro vendors?) takes the time and effort to map
possible use-cases, then the power manager could do better prediction of
performance requirements.
Optionally, _if_ applications were power aware, they would send
information about their activity to power manager or modify their own
class-of-service requirements. e.g. Rendering webpage has different
requirements than simply showing the page. This is NOT being discussed
at the moment.
But PowerOP would allow SoC-based systems to tune the operating points
to get the most out of their top-10 use-cases and sleep modes.
> > Consider an example system with a main CPU and a DSP. To simplify
> > discussion, lets assume 3 levels for CPU and DSP speeds and system
> > voltage. Then, here is what an example operating-point to use-case
> > mapping table could look like:
> >
> > # CPU speed DSP speed Voltage use-case
> > ----------------------------------------------------------
> > 1. high high high fast, video
> > 2. med high high
> > 3. med med med usb[1]
> > 4. low med med mp3
> > 5. low low low powersave
> >
> > [1] USB has voltage constraint (voltage >= med)
>
> So... you take three independend parametrs and merge them into one,
> named parameter. Bad idea.
But they are NOT independent parameters! Which is why we want to
encapsulate them into an 'Operating Point'. We have completely failed in
our effort to explain the concept of an operating point if that has been
your assumption all along.
> What about simply having these parameters:
>
> usb on or off
>
> cpu speed (controlled by cpufreq)
>
> dsp speed (controlled by userspace)
>
> Then you can have infrastructure that is able to compute system
> voltage from usb/cpu/dsp speed, and users stll have interface they can
> understand.
This is moot for the reason above - cpu/dsp/volt are NOT independent.
And USB (or any device information) is NOT part of the operating point.
It is just an asynchronous constraint whose appearance/disappearance
influences operating point tangentially. IOW, on some systems USB could
run at any operating point, so there would be no constraint. On others,
use of USB would automatically cause usb clocks to go high which in turn
would switch the system to an operating point that satisfies the
constraint - this is handled by clock/voltage framework.
> (How are they supposed to know if video use case is compatible with
> usb? They should not have to).
Only one human 'user' needs to worry about this detail - the system
designer, and that too only in SoC-based systems. For PC systems, such
constraints don't exist; you can be happy with the load-based scaling.
> > - Now if we are playing mp3, we switch to OP 4.
>
> Do you expect all mp3 playing applications to play with
> /sys/.../powerop-point? How do you tell if mp3's are playing? These
> are hard questions for a notebook.
There are two ways I can think of:
1. Modify every application - Every application then sends messages to
the power manager about its state e.g. paused, playing, ffwd, stopped.
This is not meant for PC applications due to their sheer numbers. But it
is not uncommon on embedded systems to tune application behaviour to
ease/improve power management.
2. Modify central launcher application - Click on an application icon
gives us information about what application is about to be launched that
allows us to change operating point. This might be doable for PC
applications by modifying KDE/Gnome launchers. But it only tells us what
applications are loaded, even though they might be idle. Which is where
the load average helps.
> > - Add usb and we switch to OP 3.
> > - Now our performance monitor (e.g load avg) indicates that we need more
> > CPU processing. So we switch to OP 2.
>
> That's cpufreq job, please
Yes. Or more particularly, the ondemand governor, right? But load
average is not the only input used to make decisions. There could be
thermal alarms, battery alarms, etc. And deciding which of these
conflicting inputs is given priority is a policy decision made by the
device manager. We discussed some of this at the PM summit.
Device manager
|
-------------------------------------
| | | |
Power Performance Thermal Misc.
manager manager manager
(e.g. ondemand)
Regards,
Amit
--
Amit Kucheria <amit.kucheria@nokia.com>
Nokia
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-30 11:00 ` Amit Kucheria
@ 2006-08-30 22:36 ` Pavel Machek
2006-08-31 13:44 ` Amit Kucheria
0 siblings, 1 reply; 136+ messages in thread
From: Pavel Machek @ 2006-08-30 22:36 UTC (permalink / raw)
To: Amit Kucheria; +Cc: linux-pm
On Wed 2006-08-30 14:00:53, Amit Kucheria wrote:
> On Thu, 2006-08-24 at 09:59 +0200, ext Pavel Machek wrote:
> > Hi!
> >
> > > > > The userspace interface in Eungeny's patches is for other userspace
> > > > > programs (policy managers) to activate/deactivate valid operating points
> > > > > in the system dynamically and if necessary, introduce new ones into the
> > > > > system. It will also allow the operating points to be referenced by name
> > > > > instead of the tuple.
> > > > >
> > > > > Then, we will be able to use names like 'video', 'mp3', 'fast',
> > > > > 'powersave', 'usb' to switch to the relevant operating point based on
> > > > > configuration of the policy manager.
> > > >
> > > > This seems to be too specific to embedded machine.
> > > >
> > > > If userspace wants to work with usb and play mp3s at the same time,
> > > > what does it do?
> > >
> > > Switch to 'fast'?
> > >
> > > The operating point for a use-case specifies the _minimum_ required for
> > > the use-case. You can always go up.
> >
> > > The system designer is responsible for 'designing' operating points that
> > > take into account multiple use-cases. Designing here refers to mapping
> > > use-cases to HW operating points.
> >
> > Yes, and that's why I argue this is unsuitable for notebook: there are
> > just too many usecases for a notebook.
>
> You are trying to make it sound more complex than it really is. For a
> notebook, as you yourself pointed out, things could be handled with the
> present adaptive, load-based system. So you don't need to map _every_
> use-case to an operating point. So you don't need to move to use PowerOP
> today.
Ok, but please do not try to replace cpufreq with
powerop/oppoint. That is not possible.
> But PowerOP would allow SoC-based systems to tune the operating points
> to get the most out of their top-10 use-cases and sleep modes.
Question is: can we get similar savings without ugly interface powerop
presents?
> > > Consider an example system with a main CPU and a DSP. To simplify
> > > discussion, lets assume 3 levels for CPU and DSP speeds and system
> > > voltage. Then, here is what an example operating-point to use-case
> > > mapping table could look like:
> > >
> > > # CPU speed DSP speed Voltage use-case
> > > ----------------------------------------------------------
> > > 1. high high high fast, video
> > > 2. med high high
> > > 3. med med med usb[1]
> > > 4. low med med mp3
> > > 5. low low low powersave
> > >
> > > [1] USB has voltage constraint (voltage >= med)
> >
> > So... you take three independend parametrs and merge them into one,
> > named parameter. Bad idea.
>
> But they are NOT independent parameters! Which is why we want to
> encapsulate them into an 'Operating Point'. We have completely failed in
> our effort to explain the concept of an operating point if that has been
> your assumption all along.
They are independed, at least from application point of view. And
that's probably right interface to present to userland. Application
tells you its dsp speed desired, you take current cpu frequency
requirements from cpufreq, and select ooperating point with lowest
consumption based on that constraints.
> > What about simply having these parameters:
> >
> > usb on or off
> >
> > cpu speed (controlled by cpufreq)
> >
> > dsp speed (controlled by userspace)
> >
> > Then you can have infrastructure that is able to compute system
> > voltage from usb/cpu/dsp speed, and users stll have interface they can
> > understand.
>
> This is moot for the reason above - cpu/dsp/volt are NOT independent.
>
> And USB (or any device information) is NOT part of the operating point.
> It is just an asynchronous constraint whose appearance/disappearance
> influences operating point tangentially. IOW, on some systems USB could
> run at any operating point, so there would be no constraint. On others,
> use of USB would automatically cause usb clocks to go high which in turn
> would switch the system to an operating point that satisfies the
> constraint - this is handled by clock/voltage framework.
Okay, and why can't we handle _all_ the constraints in this style? Ask
userspace what constraints are there, and automagically select best
operating point, without having operating points explicit at userspace
interface.
> > > - Add usb and we switch to OP 3.
> > > - Now our performance monitor (e.g load avg) indicates that we need more
> > > CPU processing. So we switch to OP 2.
> >
> > That's cpufreq job, please
>
> Yes. Or more particularly, the ondemand governor, right? But load
> average is not the only input used to make decisions. There could be
> thermal alarms, battery alarms, etc. And deciding which of these
> conflicting inputs is given priority is a policy decision made by the
> device manager. We discussed some of this at the PM summit.
cpufreq already knows about thermal. (There's no policy in there, you
can't allow system to overheat).
cpufreq already knows about battery. (On some powernow-k8, high cpu
frequencies are not available on battery power, because battery is not
powerful enough). If you have aditional constraints (may not use
400MHz when battery is below 20%, because li-ion has too big internal
resistancy at that point?), please use cpufreq framework to enforce
them.
Is there anything cpufreq can't do?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-30 22:36 ` Pavel Machek
@ 2006-08-31 13:44 ` Amit Kucheria
2006-09-02 11:17 ` Pavel Machek
0 siblings, 1 reply; 136+ messages in thread
From: Amit Kucheria @ 2006-08-31 13:44 UTC (permalink / raw)
To: ext Pavel Machek; +Cc: linux-pm
On Thu, 2006-08-31 at 00:36 +0200, ext Pavel Machek wrote:
> On Wed 2006-08-30 14:00:53, Amit Kucheria wrote:
<snip>
> > You are trying to make it sound more complex than it really is. For a
> > notebook, as you yourself pointed out, things could be handled with the
> > present adaptive, load-based system. So you don't need to map _every_
> > use-case to an operating point. So you don't need to move to use PowerOP
> > today.
>
> Ok, but please do not try to replace cpufreq with
> powerop/oppoint. That is not possible.
No one wants to replace cpufreq with PowerOP today. Which is why the
patches make PowerOP optional. But embedded systems need it today.
> > But PowerOP would allow SoC-based systems to tune the operating points
> > to get the most out of their top-10 use-cases and sleep modes.
>
> Question is: can we get similar savings without ugly interface powerop
> presents?
>
If I have understood correctly, your main objection is to defining new
operating points from userspace?
The only other interface is the actually setting of a (named) operating
point and that is _required_ to do anything useful.
<snip>
> > But they are NOT independent parameters! Which is why we want to
> > encapsulate them into an 'Operating Point'. We have completely failed in
> > our effort to explain the concept of an operating point if that has been
> > your assumption all along.
>
> They are independed, at least from application point of view. And
> that's probably right interface to present to userland. Application
> tells you its dsp speed desired, you take current cpu frequency
> requirements from cpufreq, and select ooperating point with lowest
> consumption based on that constraints.
You are violating your own principles here - why should application know
about 'DSP speed'?
And individual applications don't know about operating points either.
They just present their requirements in terms of increased load or a
constraint (usb, temp, etc.). The device manager gets inputs about this
increased load and constraints and programs the appropriate OP. So we
agree here.
<snip>
> > And USB (or any device information) is NOT part of the operating point.
> > It is just an asynchronous constraint whose appearance/disappearance
> > influences operating point tangentially. IOW, on some systems USB could
> > run at any operating point, so there would be no constraint. On others,
> > use of USB would automatically cause usb clocks to go high which in turn
> > would switch the system to an operating point that satisfies the
> > constraint - this is handled by clock/voltage framework.
>
> Okay, and why can't we handle _all_ the constraints in this style? Ask
> userspace what constraints are there, and automagically select best
> operating point, without having operating points explicit at userspace
> interface.
>
OP change _will_ happen automatically in the kernel for quick
transitions. But the device manager will sometimes override this with
policy decisions. Because in certain cases, userspace knows best.
<snip>
> > Yes. Or more particularly, the ondemand governor, right? But load
> > average is not the only input used to make decisions. There could be
> > thermal alarms, battery alarms, etc. And deciding which of these
> > conflicting inputs is given priority is a policy decision made by the
> > device manager. We discussed some of this at the PM summit.
>
> cpufreq already knows about thermal. (There's no policy in there, you
> can't allow system to overheat).
>
> cpufreq already knows about battery. (On some powernow-k8, high cpu
> frequencies are not available on battery power, because battery is not
> powerful enough). If you have aditional constraints (may not use
> 400MHz when battery is below 20%, because li-ion has too big internal
> resistancy at that point?), please use cpufreq framework to enforce
> them.
I will look at support for thermal/battery events in cpufreq in greater
detail.
> Is there anything cpufreq can't do?
- Embedded systems _want_ to deal with performance/power management in
terms of Operating Points that encapsulate the complete state of the SoC
(core speeds, voltages, buses speeds, etc.) instead of only CPU
frequency.
- There is not much in cpufreq for handling rate propagation and
dependency tracking for clocks and voltages. This is what the clock
framework and the upcoming voltage framework handle quite well.
- Most implementations of cpufreq drivers have a fixed rate table (freq,
voltage). With rate propagation and dependencies in SoCs, available
rates can vary dynamically based on states of various cores,
peripherals, etc.
Regards,
Amit
--
Amit Kucheria <amit.kucheria@nokia.com>
Nokia
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-31 13:44 ` Amit Kucheria
@ 2006-09-02 11:17 ` Pavel Machek
0 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-09-02 11:17 UTC (permalink / raw)
To: Amit Kucheria; +Cc: linux-pm
Hi!
On Thu 2006-08-31 16:44:12, Amit Kucheria wrote:
> On Thu, 2006-08-31 at 00:36 +0200, ext Pavel Machek wrote:
> > On Wed 2006-08-30 14:00:53, Amit Kucheria wrote:
> > > But PowerOP would allow SoC-based systems to tune the operating points
> > > to get the most out of their top-10 use-cases and sleep modes.
> >
> > Question is: can we get similar savings without ugly interface powerop
> > presents?
>
> If I have understood correctly, your main objection is to defining new
> operating points from userspace?
Well, that is big objection, but not my main one. I believe that "new
operating points from userspace" are non-starter. "So obviously wrong
that noone would merge that".
> The only other interface is the actually setting of a (named) operating
> point and that is _required_ to do anything useful.
No, they are not.
We already have interface for selecting cpu frequency. Lets keep it.
We may need new interface for selecting DSP frequency. If that is
needed, lets *add* that interface.
We may need new interface to say if usb needs to be enabled or
not. How to do that interface right is a question, but lets say we
*add* that interface.
Now, it should be up-to the powerop framework to select best operating
point given "cpu speed, dsp speed, usb on/off" state. But I argue that
this should be done in-kernel and hidden from user.
> > Is there anything cpufreq can't do?
>
> - Embedded systems _want_ to deal with performance/power management in
> terms of Operating Points that encapsulate the complete state of the SoC
> (core speeds, voltages, buses speeds, etc.) instead of only CPU
> frequency.
I'm not saying we'll not have complete-SoC-state at some layer. But I
do not think we want it at userspace-kernelspace interface.
> - There is not much in cpufreq for handling rate propagation and
> dependency tracking for clocks and voltages. This is what the clock
> framework and the upcoming voltage framework handle quite well.
Good.
> - Most implementations of cpufreq drivers have a fixed rate table (freq,
> voltage). With rate propagation and dependencies in SoCs, available
> rates can vary dynamically based on states of various cores,
> peripherals, etc.
But that's cpufreq-implementation-detail, right?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-15 10:35 ` Amit Kucheria
2006-08-15 19:04 ` Dave Jones
@ 2006-08-17 21:24 ` Pavel Machek
1 sibling, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-17 21:24 UTC (permalink / raw)
To: Amit Kucheria; +Cc: linux-pm
Hi!
> 5. <Crystal Ball Gazing/Wishful Thinking>
> - Clock/Voltage FW <--> ACPI logical mappings allows us to use global
> state names in /sys/power/state for system power state transitions.
We probably do not want cpufreq-like controls in /sys/power/state.
--
Thanks for all the (sleeping) penguins.
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-15 1:00 ` Greg KH
2006-08-15 3:03 ` Dave Jones
2006-08-15 10:35 ` Amit Kucheria
@ 2006-08-19 6:10 ` David Singleton
2006-08-22 2:13 ` Greg KH
` (2 more replies)
2006-08-19 6:19 ` David Singleton
3 siblings, 3 replies; 136+ messages in thread
From: David Singleton @ 2006-08-19 6:10 UTC (permalink / raw)
To: Greg KH; +Cc: linux-pm
[-- Attachment #1: Type: text/plain, Size: 4055 bytes --]
On 8/14/06, Greg KH <greg@kroah.com> wrote:
> On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
> >
> > This adds a whole bunch of new code, and doesn't seem to make any
> > existing code any simpler (to me at least). From a cpufreq point of view,
> > what does adding this buy us? What problem do we have today that is
> > being solved by all this?
Greg and Dave,
there are two competing patch sets for a new power management
framework. The patch set I sent out simplifies power management,
from both the cpufreq perspective and the embedded world's view of
power management.
I've renamed my patch oppoint so as not confuse it
with the powerop set from Matt Locke (which will probably make
it even more confusing). I've renamed it so it can be seen as an
alternative design approach, not just an alternative implementation
of the same ideas. I've also incorporated suggestions from
Pavel in cleaning up the original patches.
If you'd be willing to take a look at, or try out, the patches
in my patch set you should be able to see how oppoint could simplify
cpufreq code. The first patch is the oppoint-cpufreq.patch and
the second is the oppoint-x86-centrino.patch.
Oppoint could replace large pieces of the cpufreq code
in the kernel, most notably the policy and governor code, which I
believe belongs in user space in the power manager daemon.
You'll notice that the oppoint-cpufreq.patch only touches
two files, cpufreq.c and cpufreq.h. It only creates two new interfaces
to the cpufreq frequency scaling notifier lists to support driver pre
and post scaling routines, already supported in the kernel.
The oppoint-x86-centrino.patch completes the replacement
of cpufreq code by introducing the transition routine to
change frequencies and creates operating points for the
centrino-speedstep processors already supported by Linux.
(although I've recieved a note from Intel that the data I've copied
from the centrino-speedstep cpufreq tables is known to be inaccurate
and unsupported)
This code could replace cpufreq code and simplify it quite a
bit in the process. The kernel drivers that support cpufreq frequency
scaling would not have to be changed. Operating points for the rest
of the processors that support cpufreq would have to be created, but
as you can see it's quite a straight forward transformation from
a cpufreq table to a set of operating points for a processor.
The entire patch set can be found at:
http://source.mvista.com/~dsingleton/2.6.18-rc4/
The patch set consists of:
oppoint-core.patch
oppoint-cpufreq.patch
oppoint-x86-centrino.patch
oppoint-arm-pxa27x.patch
I'll attach oppoint-cpufreq.patch to this email and
send out oppoint-x86-centrino.patch next.
David
> >
> > Every explanation of powerop I've seen so far dives into microdetails,
> > whilst the 10,000ft view has always passed me by other than "this is
> > what we've had in the embedded world".
> >
> > The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> > also confuses me. I was under the impression that powerop was adding additional
> > userspace interfaces. If we're not changing how things from a userspace
> > point of view, we're churning a lot of kernel code,.. why?
> >
> > Clue me in here, I'm feeling thick.
>
> You're not alone, I really don't get it either.
>
> But I guess we'll just wait for the next round of unified patches and
> then go from there.
>
> thanks,
>
> greg k-h
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>
[-- Attachment #2: oppoint-cpufreq.patch --]
[-- Type: application/octet-stream, Size: 2433 bytes --]
Signed-Off-by: David Singleton <dsingleton@mvista.com>
drivers/cpufreq/cpufreq.c | 36 ++++++++++++++++++++++++++++++++++++
include/linux/cpufreq.h | 2 ++
2 files changed, 38 insertions(+)
Index: linux-2.6.17/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.17.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6.17/drivers/cpufreq/cpufreq.c
@@ -226,6 +226,35 @@ static void adjust_jiffies(unsigned long
static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci) { return; }
#endif
+int cpufreq_prepare_transition(struct oppoint *cur, struct oppoint *new)
+{
+ struct cpufreq_freqs freqs;
+
+ freqs.old = cur->frequency;
+ freqs.new = new->frequency;
+ freqs.cpu = 0;
+ freqs.flags = cpufreq_driver->flags;
+ blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+ CPUFREQ_PRECHANGE, &freqs);
+ adjust_jiffies(CPUFREQ_PRECHANGE, &freqs);
+ return 0;
+}
+EXPORT_SYMBOL(cpufreq_prepare_transition);
+
+int cpufreq_finish_transition(struct oppoint *cur, struct oppoint *new)
+{
+ struct cpufreq_freqs freqs;
+
+ freqs.old = cur->frequency;
+ freqs.new = new->frequency;
+ freqs.cpu = 0;
+ freqs.flags = cpufreq_driver->flags;
+ adjust_jiffies(CPUFREQ_POSTCHANGE, &freqs);
+ blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+ CPUFREQ_POSTCHANGE, &freqs);
+ return 0;
+}
+EXPORT_SYMBOL(cpufreq_finish_transition);
/**
* cpufreq_notify_transition - call notifier chain and adjust_jiffies
@@ -920,6 +949,12 @@ static void cpufreq_out_of_sync(unsigned
}
+#ifdef CONFIG_PM
+unsigned int cpufreq_quick_get(unsigned int cpu)
+{
+ return (current_state->frequency * 1000);
+}
+#else
/**
* cpufreq_quick_get - get the CPU frequency (in kHz) frpm policy->cur
* @cpu: CPU number
@@ -941,6 +976,7 @@ unsigned int cpufreq_quick_get(unsigned
return (ret);
}
+#endif
EXPORT_SYMBOL(cpufreq_quick_get);
Index: linux-2.6.17/include/linux/cpufreq.h
===================================================================
--- linux-2.6.17.orig/include/linux/cpufreq.h
+++ linux-2.6.17/include/linux/cpufreq.h
@@ -268,6 +268,8 @@ static inline unsigned int cpufreq_quick
return 0;
}
#endif
+int cpufreq_prepare_transition(struct oppoint *cur, struct oppoint *new);
+int cpufreq_finish_transition(struct oppoint *cur, struct oppoint *new);
/*********************************************************************
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-19 6:10 ` David Singleton
@ 2006-08-22 2:13 ` Greg KH
2006-08-22 5:20 ` David Singleton
2006-08-23 19:05 ` Mark Gross
2006-08-24 12:39 ` Pavel Machek
2 siblings, 1 reply; 136+ messages in thread
From: Greg KH @ 2006-08-22 2:13 UTC (permalink / raw)
To: David Singleton; +Cc: linux-pm
On Fri, Aug 18, 2006 at 11:10:02PM -0700, David Singleton wrote:
> Oppoint could replace large pieces of the cpufreq code
> in the kernel, most notably the policy and governor code, which I
> believe belongs in user space in the power manager daemon.
>
> You'll notice that the oppoint-cpufreq.patch only touches
> two files, cpufreq.c and cpufreq.h. It only creates two new
> interfaces
> to the cpufreq frequency scaling notifier lists to support driver pre
> and post scaling routines, already supported in the kernel.
>
> The oppoint-x86-centrino.patch completes the replacement
> of cpufreq code by introducing the transition routine to
> change frequencies and creates operating points for the
> centrino-speedstep processors already supported by Linux.
>
> (although I've recieved a note from Intel that the data I've copied
> from the centrino-speedstep cpufreq tables is known to be inaccurate
> and unsupported)
>
> This code could replace cpufreq code and simplify it quite a
> bit in the process. The kernel drivers that support cpufreq
> frequency
> scaling would not have to be changed. Operating points for the rest
> of the processors that support cpufreq would have to be created, but
> as you can see it's quite a straight forward transformation from
> a cpufreq table to a set of operating points for a processor.
This only touches on the cpu frequency stuff. I am assuming that the
current driver interface to the different power management states is
acceptable to you?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-22 2:13 ` Greg KH
@ 2006-08-22 5:20 ` David Singleton
0 siblings, 0 replies; 136+ messages in thread
From: David Singleton @ 2006-08-22 5:20 UTC (permalink / raw)
To: Greg KH; +Cc: linux-pm
On 8/21/06, Greg KH <greg@kroah.com> wrote:
> On Fri, Aug 18, 2006 at 11:10:02PM -0700, David Singleton wrote:
> > Oppoint could replace large pieces of the cpufreq code
> > in the kernel, most notably the policy and governor code, which I
> > believe belongs in user space in the power manager daemon.
> >
> > You'll notice that the oppoint-cpufreq.patch only touches
> > two files, cpufreq.c and cpufreq.h. It only creates two new
> > interfaces
> > to the cpufreq frequency scaling notifier lists to support driver pre
> > and post scaling routines, already supported in the kernel.
> >
> > The oppoint-x86-centrino.patch completes the replacement
> > of cpufreq code by introducing the transition routine to
> > change frequencies and creates operating points for the
> > centrino-speedstep processors already supported by Linux.
> >
> > (although I've recieved a note from Intel that the data I've copied
> > from the centrino-speedstep cpufreq tables is known to be inaccurate
> > and unsupported)
> >
> > This code could replace cpufreq code and simplify it quite a
> > bit in the process. The kernel drivers that support cpufreq
> > frequency
> > scaling would not have to be changed. Operating points for the rest
> > of the processors that support cpufreq would have to be created, but
> > as you can see it's quite a straight forward transformation from
> > a cpufreq table to a set of operating points for a processor.
>
> This only touches on the cpu frequency stuff. I am assuming that the
> current driver interface to the different power management states is
> acceptable to you?
Yes. The driver interface works perfectlly with the operating point design.
The power manager can simply set whatever operating point it wants the system
to be in and still have the flexibility to suspend individual devices through
the current driver interface. Drivers do not have be changed in any way
to operate correctly with the operating point model.
The cpufreq driver scaling code doesn't need to changed either. OpPoint
calls the same scaling routines throught the same notifier chain as
cpufreq does.
That's two of the big advantages to the OpPoint design,
the driver interace doesn't need to change and the existing driver frequency
scaling code doesn't need to change either.
Which makes sense since the operating point design is performing the same
functionality, just in a simpler manner.
I still have to write a power manager for this so all the
policy/class stuff
that's being discussed now can be plug-ins for the power manager. You'd
set you power management classes and policies up in the power manager,
wether its' performance, or energy efficiency, or thermal constraints,
or battery
constraints, and let it simply set operating states and handle
individual devices
as it sees fit.
That's really where all the policy/class code belongs, in the
power manager.
OpPoint just provides a simpler interface for the power manager. The operating
points are set by their name and the device control works just as it does today
through the sysfs interface.
David
>
> thanks,
>
> greg k-h
>
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-19 6:10 ` David Singleton
2006-08-22 2:13 ` Greg KH
@ 2006-08-23 19:05 ` Mark Gross
2006-08-24 12:39 ` Pavel Machek
2 siblings, 0 replies; 136+ messages in thread
From: Mark Gross @ 2006-08-23 19:05 UTC (permalink / raw)
To: David Singleton; +Cc: linux-pm
On Fri, Aug 18, 2006 at 11:10:02PM -0700, David Singleton wrote:
> On 8/14/06, Greg KH <greg@kroah.com> wrote:
> >On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
> >>
> >> This adds a whole bunch of new code, and doesn't seem to make any
> >> existing code any simpler (to me at least). From a cpufreq point of
> >view,
> >> what does adding this buy us? What problem do we have today that is
> >> being solved by all this?
>
> Greg and Dave,
>
> there are two competing patch sets for a new power
> management
> framework. The patch set I sent out simplifies power management,
> from both the cpufreq perspective and the embedded world's view of
> power management.
Why can't we have one evolve a single powerOP framework? Both of these
patches are derived from The MV/Todd Poynor's patches. It seems "funny"
to not coordinate these two patch sets.
>
> I've renamed my patch oppoint so as not confuse it
> with the powerop set from Matt Locke (which will probably make
> it even more confusing). I've renamed it so it can be seen as an
> alternative design approach, not just an alternative implementation
> of the same ideas. I've also incorporated suggestions from
> Pavel in cleaning up the original patches.
>
> If you'd be willing to take a look at, or try out, the
> patches
> in my patch set you should be able to see how oppoint could simplify
> cpufreq code. The first patch is the oppoint-cpufreq.patch and
> the second is the oppoint-x86-centrino.patch.
How would the ACPI cpufreq_driver be integrated with this design?
>
> Oppoint could replace large pieces of the cpufreq code
> in the kernel, most notably the policy and governor code, which I
> believe belongs in user space in the power manager daemon.
How will the users of on-demand make use of this design?
I don't think you can just dump the governor function of CPUFREQ for
user defined performance control.
>
> You'll notice that the oppoint-cpufreq.patch only touches
> two files, cpufreq.c and cpufreq.h. It only creates two new
> interfaces
> to the cpufreq frequency scaling notifier lists to support driver pre
> and post scaling routines, already supported in the kernel.
re-using the cpufreq notification infrastructure makes sense.
> The oppoint-x86-centrino.patch completes the replacement
> of cpufreq code by introducing the transition routine to
> change frequencies and creates operating points for the
> centrino-speedstep processors already supported by Linux.
>
> (although I've recieved a note from Intel that the data I've copied
> from the centrino-speedstep cpufreq tables is known to be inaccurate
> and unsupported)
>
> This code could replace cpufreq code and simplify it quite a
> bit in the process. The kernel drivers that support cpufreq
> frequency
Only for user mode governors, I believe kernel mode governors still have
role in Linux.
--mgross
> scaling would not have to be changed. Operating points for the rest
> of the processors that support cpufreq would have to be created, but
> as you can see it's quite a straight forward transformation from
> a cpufreq table to a set of operating points for a processor.
>
> The entire patch set can be found at:
>
> http://source.mvista.com/~dsingleton/2.6.18-rc4/
>
> The patch set consists of:
>
> oppoint-core.patch
> oppoint-cpufreq.patch
> oppoint-x86-centrino.patch
> oppoint-arm-pxa27x.patch
>
> I'll attach oppoint-cpufreq.patch to this email and
> send out oppoint-x86-centrino.patch next.
>
>
> David
>
>
>
> >>
> >> Every explanation of powerop I've seen so far dives into microdetails,
> >> whilst the 10,000ft view has always passed me by other than "this is
> >> what we've had in the embedded world".
> >>
> >> The diagram at
> >http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> >> also confuses me. I was under the impression that powerop was adding
> >additional
> >> userspace interfaces. If we're not changing how things from a userspace
> >> point of view, we're churning a lot of kernel code,.. why?
> >>
> >> Clue me in here, I'm feeling thick.
> >
> >You're not alone, I really don't get it either.
> >
> >But I guess we'll just wait for the next round of unified patches and
> >then go from there.
> >
> >thanks,
> >
> >greg k-h
> >_______________________________________________
> >linux-pm mailing list
> >linux-pm@lists.osdl.org
> >https://lists.osdl.org/mailman/listinfo/linux-pm
> >
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-19 6:10 ` David Singleton
2006-08-22 2:13 ` Greg KH
2006-08-23 19:05 ` Mark Gross
@ 2006-08-24 12:39 ` Pavel Machek
2 siblings, 0 replies; 136+ messages in thread
From: Pavel Machek @ 2006-08-24 12:39 UTC (permalink / raw)
To: David Singleton; +Cc: linux-pm
Hi!
> >> This adds a whole bunch of new code, and doesn't seem to make any
> >> existing code any simpler (to me at least). From a cpufreq point of
> >view,
> >> what does adding this buy us? What problem do we have today that is
> >> being solved by all this?
>
> Greg and Dave,
>
> there are two competing patch sets for a new power
> management
> framework. The patch set I sent out simplifies power management,
> from both the cpufreq perspective and the embedded world's view of
> power management.
>
> I've renamed my patch oppoint so as not confuse it
> with the powerop set from Matt Locke (which will probably make
> it even more confusing). I've renamed it so it can be seen as an
> alternative design approach, not just an alternative implementation
> of the same ideas. I've also incorporated suggestions from
> Pavel in cleaning up the original patches.
>
> If you'd be willing to take a look at, or try out, the
> patches
> in my patch set you should be able to see how oppoint could simplify
> cpufreq code. The first patch is the oppoint-cpufreq.patch and
> the second is the oppoint-x86-centrino.patch.
>
> Oppoint could replace large pieces of the cpufreq code
> in the kernel, most notably the policy and governor code, which I
> believe belongs in user space in the power manager daemon.
I was told (by intel folks) that you can't push governor code into
userspace, because it is latency-critical on new cpus... so I do not
think this is going to simplify cpufreq.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-15 1:00 ` Greg KH
` (2 preceding siblings ...)
2006-08-19 6:10 ` David Singleton
@ 2006-08-19 6:19 ` David Singleton
[not found] ` <20060819184843.GB15644@redhat.com>
3 siblings, 1 reply; 136+ messages in thread
From: David Singleton @ 2006-08-19 6:19 UTC (permalink / raw)
To: Greg KH; +Cc: linux-pm
[-- Attachment #1: Type: text/plain, Size: 1800 bytes --]
On 8/14/06, Greg KH <greg@kroah.com> wrote:
> On Mon, Aug 14, 2006 at 07:48:01PM -0400, Dave Jones wrote:
> >
> > This adds a whole bunch of new code, and doesn't seem to make any
> > existing code any simpler (to me at least). From a cpufreq point of view,
> > what does adding this buy us? What problem do we have today that is
> > being solved by all this?
Greg and Dave,
Here is the patch the provides the cpufreq functionality in an
operating point
fashion for the centrino-speedstep. Cpufreq tables are transformed
into operating
points which can be simply set by writing the name of the operating
point into /sys/power/state.
These two patches implement the cpufreq functionality of changing
processsor
frequency and voltage. The huge amount of code that tries to make decisions
about what operating point to set and which devices can be suspended
or not is left to
the power manager.
David
> >
> > Every explanation of powerop I've seen so far dives into microdetails,
> > whilst the 10,000ft view has always passed me by other than "this is
> > what we've had in the embedded world".
> >
> > The diagram at http://lists.osdl.org/pipermail/linux-pm/2006-August/003196.html
> > also confuses me. I was under the impression that powerop was adding additional
> > userspace interfaces. If we're not changing how things from a userspace
> > point of view, we're churning a lot of kernel code,.. why?
> >
> > Clue me in here, I'm feeling thick.
>
> You're not alone, I really don't get it either.
>
> But I guess we'll just wait for the next round of unified patches and
> then go from there.
>
> thanks,
>
> greg k-h
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>
[-- Attachment #2: oppoint-x86-centrino.patch --]
[-- Type: application/octet-stream, Size: 16865 bytes --]
Signed-Off-by: David Singleton <dsingleton@mvista.com>
arch/i386/kernel/cpu/Makefile | 1
arch/i386/kernel/cpu/oppoint/Makefile | 2
arch/i386/kernel/cpu/oppoint/centrino-dynamic-oppoint.c | 71 ++
arch/i386/kernel/cpu/oppoint/centrino-oppoint.c | 460 ++++++++++++++++
arch/i386/kernel/i386_ksyms.c | 4
5 files changed, 538 insertions(+)
Index: linux-2.6.17/arch/i386/kernel/cpu/Makefile
===================================================================
--- linux-2.6.17.orig/arch/i386/kernel/cpu/Makefile
+++ linux-2.6.17/arch/i386/kernel/cpu/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_X86_MCE) += mcheck/
obj-$(CONFIG_MTRR) += mtrr/
obj-$(CONFIG_CPU_FREQ) += cpufreq/
+obj-$(CONFIG_PM) += oppoint/
Index: linux-2.6.17/arch/i386/kernel/i386_ksyms.c
===================================================================
--- linux-2.6.17.orig/arch/i386/kernel/i386_ksyms.c
+++ linux-2.6.17/arch/i386/kernel/i386_ksyms.c
@@ -28,3 +28,7 @@ EXPORT_SYMBOL(__read_lock_failed);
#endif
EXPORT_SYMBOL(csum_partial);
+#ifdef CONFIG_PM
+#include <linux/pm.h>
+EXPORT_SYMBOL(pm_states);
+#endif
Index: linux-2.6.17/arch/i386/kernel/cpu/oppoint/Makefile
===================================================================
--- /dev/null
+++ linux-2.6.17/arch/i386/kernel/cpu/oppoint/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += centrino-oppoint.o
+obj-m += centrino-dynamic-oppoint.o
Index: linux-2.6.17/arch/i386/kernel/cpu/oppoint/centrino-dynamic-oppoint.c
===================================================================
--- /dev/null
+++ linux-2.6.17/arch/i386/kernel/cpu/oppoint/centrino-dynamic-oppoint.c
@@ -0,0 +1,71 @@
+/*
+ * oppoint/centrino-dynamic-oppoint.c
+ *
+ * This is the template to create dynamic operating points for power management.
+ *
+ * Author: David Singleton dsingleton@mvista.com MontaVista Software, Inc.
+ *
+ * 2006 (c) MontaVista Software, Inc. This file is licensed under
+ * the terms of the GNU General Public License version 2. This program
+ * is licensed "as is" without any warranty of any kind, whether express
+ * or implied.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/pm.h>
+#include <linux/cpufreq.h>
+#include <linux/moduleparam.h>
+#include <linux/moduleloader.h>
+
+int centrino_transition(struct oppoint *cur, struct oppoint *new);
+
+static char oppoint_name[PM_NAME_SIZE] = "dynamic";
+static unsigned int voltage = 1308;
+static unsigned int latency = 100;
+module_param_named(name, oppoint_name, char *, 0);
+module_param_named(frequency, frequency, uint, 0);
+module_param_named(voltage, voltage, uint, 0);
+module_param_named(latency, latency, uint, 0);
+MODULE_PARM_DESC(frequency, "cpu frequency in kHz");
+MODULE_PARM_DESC(voltage, "cpu voltage in mV");
+MODULE_PARM_DESC(latency, "transition latency in us");
+
+/* Register both the driver and the device */
+
+static struct oppoint dynamic_op = {
+ .type = PM_FREQ_CHANGE,
+ .name = "Dynamic",
+ .prepare_transition = cpufreq_prepare_transition,
+ .transition = centrino_transition,
+ .finish_transition = cpufreq_finish_transition,
+};
+
+extern void centrino_set_frequency(struct oppoint *op, uint freq, uint volt);
+
+int __init dynamic_oppoint_init(void)
+{
+
+ printk("Dynamic PowerOp operating point for speedstep centrino\n");
+ dynamic_op.name = name;
+ dynamic_op.frequency = frequency;
+ dynamic_op.voltage = voltage;
+ dynamic_op.latency = latency;
+ centrino_set_frequency(&dynamic_op, frequency / 1000, voltage);
+ printk("freq %d volt %d msr 0x%x\n", dynamic_op.frequency,
+ dynamic_op.voltage, (unsigned int)dynamic_op.md_data);
+ list_add_tail(&dynamic_op.list, &pm_states.list);
+ return 0;
+}
+
+void __exit dynamic_oppoint_cleanup(void)
+{
+ list_del_init(&dynamic_op.list);
+}
+
+module_init(dynamic_oppoint_init);
+module_exit(dynamic_oppoint_cleanup);
+
+MODULE_DESCRIPTION("Dynamic Powerop module");
+MODULE_LICENSE("GPL");
Index: linux-2.6.17/arch/i386/kernel/cpu/oppoint/centrino-oppoint.c
===================================================================
--- /dev/null
+++ linux-2.6.17/arch/i386/kernel/cpu/oppoint/centrino-oppoint.c
@@ -0,0 +1,460 @@
+/*
+ * PowerOp support for Enhanced SpeedStep, as found in Intel's Pentium
+ * M (part of the Centrino chipset).
+ *
+ * Modelled on speedstep-centrino.c
+ *
+ * Copyright (C) 2006 David Singleton <dsingleton@mvista.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/cpufreq.h>
+#include <linux/delay.h>
+#include <linux/compiler.h>
+
+#include <asm/msr.h>
+#include <asm/processor.h>
+#include <asm/cpufeature.h>
+
+struct cpu_id
+{
+ __u8 x86; /* CPU family */
+ __u8 x86_model; /* model */
+ __u8 x86_mask; /* stepping */
+};
+
+enum {
+ CPU_BANIAS,
+ CPU_DOTHAN_A1,
+ CPU_DOTHAN_A2,
+ CPU_DOTHAN_B0,
+ CPU_MP4HT_D0,
+ CPU_MP4HT_E0,
+};
+
+static const struct cpu_id cpu_ids[] = {
+ [CPU_BANIAS] = { 6, 9, 5 },
+ [CPU_DOTHAN_A1] = { 6, 13, 1 },
+ [CPU_DOTHAN_A2] = { 6, 13, 2 },
+ [CPU_DOTHAN_B0] = { 6, 13, 6 },
+ [CPU_MP4HT_D0] = {15, 3, 4 },
+ [CPU_MP4HT_E0] = {15, 4, 1 },
+};
+#define N_IDS ARRAY_SIZE(cpu_ids)
+
+struct cpu_model
+{
+ const struct cpu_id *cpu_id;
+ const char *model_name;
+ unsigned max_freq; /* max clock in kHz */
+
+ struct cpufreq_frequency_table *op_points; /* clock/voltage pairs */
+};
+static int centrino_verify_cpu_id(const struct cpuinfo_x86 *c, const struct cpu_id *x);
+
+void centrino_set_frequency(struct oppoint *op, uint freq, uint volt)
+{
+ op->frequency = freq * 1000;
+ op->voltage = volt;
+ op->md_data = (void *)(((freq / 100) << 8) | (volt - 700) / 16);
+ printk("freq %d volt %d msr 0x%x\n", op->frequency, op->voltage,
+ (unsigned int)op->md_data);
+}
+EXPORT_SYMBOL(centrino_set_frequency);
+
+int centrino_transition(struct oppoint *cur, struct oppoint *new)
+{
+ unsigned int msr, oldmsr = 0, h = 0;
+
+ if (cur == new)
+ return 0;
+
+ msr = (unsigned int)new->md_data;
+ rdmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+
+ /* all but 16 LSB are reserved, treat them with care */
+ oldmsr &= ~0xffff;
+ msr &= 0xffff;
+ oldmsr |= msr;
+
+ wrmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+
+ udelay(new->latency);
+
+ return 0;
+}
+EXPORT_SYMBOL(centrino_transition);
+
+#define OP(mhz, mv) \
+ { \
+ .frequency = (mhz) * 1000, \
+ .index = (((mhz)/100) << 8) | ((mv - 700) / 16) \
+ }
+
+/*
+ * These voltage tables were derived from the Intel Pentium M
+ * datasheet, document 25261202.pdf, Table 5. I have verified they
+ * are consistent with my IBM ThinkPad X31, which has a 1.3GHz Pentium
+ * M.
+ */
+
+/* Ultra Low Voltage Intel Pentium M processor 900MHz (Banias) */
+static struct cpufreq_frequency_table banias_900[] =
+{
+ OP(600, 844),
+ OP(800, 988),
+ OP(900, 1004),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+/* Ultra Low Voltage Intel Pentium M processor 1000MHz (Banias) */
+static struct cpufreq_frequency_table banias_1000[] =
+{
+ OP(600, 844),
+ OP(800, 972),
+ OP(900, 988),
+ OP(1000, 1004),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Low Voltage Intel Pentium M processor 1.10GHz (Banias) */
+static struct cpufreq_frequency_table banias_1100[] =
+{
+ OP( 600, 956),
+ OP( 800, 1020),
+ OP( 900, 1100),
+ OP(1000, 1164),
+ OP(1100, 1180),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+
+/* Low Voltage Intel Pentium M processor 1.20GHz (Banias) */
+static struct cpufreq_frequency_table banias_1200[] =
+{
+ OP( 600, 956),
+ OP( 800, 1004),
+ OP( 900, 1020),
+ OP(1000, 1100),
+ OP(1100, 1164),
+ OP(1200, 1180),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.30GHz (Banias) */
+static struct cpufreq_frequency_table banias_1300[] =
+{
+ OP( 600, 956),
+ OP( 800, 1260),
+ OP(1000, 1292),
+ OP(1200, 1356),
+ OP(1300, 1388),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.40GHz (Banias) */
+static struct cpufreq_frequency_table banias_1400[] =
+{
+ OP( 600, 956),
+ OP( 800, 1180),
+ OP(1000, 1308),
+ OP(1200, 1436),
+ OP(1400, 1484),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.50GHz (Banias) */
+static struct cpufreq_frequency_table banias_1500[] =
+{
+ OP( 600, 956),
+ OP( 800, 1116),
+ OP(1000, 1228),
+ OP(1200, 1356),
+ OP(1400, 1452),
+ OP(1500, 1484),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.60GHz (Banias) */
+static struct cpufreq_frequency_table banias_1600[] =
+{
+ OP( 600, 956),
+ OP( 800, 1036),
+ OP(1000, 1164),
+ OP(1200, 1276),
+ OP(1400, 1420),
+ OP(1600, 1484),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+/* Intel Pentium M processor 1.70GHz (Banias) */
+static struct cpufreq_frequency_table banias_1700[] =
+{
+ OP( 600, 956),
+ OP( 800, 1004),
+ OP(1000, 1116),
+ OP(1200, 1228),
+ OP(1400, 1308),
+ OP(1700, 1484),
+ { .frequency = CPUFREQ_TABLE_END }
+};
+
+#define _BANIAS(cpuid, max, name) \
+{ .cpu_id = cpuid, \
+ .model_name = "Intel(R) Pentium(R) M processor " name "MHz", \
+ .max_freq = (max)*1000, \
+ .op_points = banias_##max, \
+}
+#define BANIAS(max) _BANIAS(&cpu_ids[CPU_BANIAS], max, #max)
+
+static struct oppoint lowest = {
+ .name = "lowest",
+ .type = PM_FREQ_CHANGE,
+ .frequency = 0,
+ .voltage = 0,
+ .latency = 15,
+ .prepare_transition = cpufreq_prepare_transition,
+ .transition = centrino_transition,
+ .finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint low = {
+ .name = "low",
+ .type = PM_FREQ_CHANGE,
+ .latency = 15,
+ .prepare_transition = cpufreq_prepare_transition,
+ .transition = centrino_transition,
+ .finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint mediumlow = {
+ .name = "mediumlow",
+ .type = PM_FREQ_CHANGE,
+ .latency = 15,
+ .prepare_transition = cpufreq_prepare_transition,
+ .transition = centrino_transition,
+ .finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint medium = {
+ .name = "medium",
+ .type = PM_FREQ_CHANGE,
+ .latency = 15,
+ .prepare_transition = cpufreq_prepare_transition,
+ .transition = centrino_transition,
+ .finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint mediumhigh = {
+ .name = "mediumhigh",
+ .type = PM_FREQ_CHANGE,
+ .latency = 15,
+ .prepare_transition = cpufreq_prepare_transition,
+ .transition = centrino_transition,
+ .finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint high = {
+ .name = "high",
+ .type = PM_FREQ_CHANGE,
+ .latency = 15,
+ .prepare_transition = cpufreq_prepare_transition,
+ .transition = centrino_transition,
+ .finish_transition = cpufreq_finish_transition,
+};
+
+static struct oppoint highest = {
+ .name = "highest",
+ .type = PM_FREQ_CHANGE,
+ .latency = 15,
+ .prepare_transition = cpufreq_prepare_transition,
+ .transition = centrino_transition,
+ .finish_transition = cpufreq_finish_transition,
+};
+
+/* CPU models, their operating frequency range, and freq/voltage
+ operating points */
+static struct cpu_model models[] =
+{
+ _BANIAS(&cpu_ids[CPU_BANIAS], 900, " 900"),
+ BANIAS(1000),
+ BANIAS(1100),
+ BANIAS(1200),
+ BANIAS(1300),
+ BANIAS(1400),
+ BANIAS(1500),
+ BANIAS(1600),
+ BANIAS(1700),
+
+ /* NULL model_name is a wildcard */
+ { &cpu_ids[CPU_DOTHAN_A1], NULL, 0, NULL },
+ { &cpu_ids[CPU_DOTHAN_A2], NULL, 0, NULL },
+ { &cpu_ids[CPU_DOTHAN_B0], NULL, 0, NULL },
+ { &cpu_ids[CPU_MP4HT_D0], NULL, 0, NULL },
+ { &cpu_ids[CPU_MP4HT_E0], NULL, 0, NULL },
+
+ { NULL, }
+};
+#undef _BANIAS
+#undef BANIAS
+
+static int __init centrino_init_oppoint(void)
+{
+ struct cpuinfo_x86 *cpu = &cpu_data[0];
+ struct cpu_model *model;
+
+ for(model = models; model->cpu_id != NULL; model++) {
+ if (centrino_verify_cpu_id(cpu, model->cpu_id) &&
+ (model->model_name == NULL ||
+ strcmp(cpu->x86_model_id, model->model_name) == 0))
+ break;
+ }
+
+ if (model->cpu_id == NULL) {
+ /* No match at all */
+ printk("no support for CPU model %s\n", cpu->x86_model_id);
+ return -ENOENT;
+ }
+
+ printk("found \"%s\": max frequency: %dkHz\n",
+ model->model_name, model->max_freq);
+ switch (model->max_freq) {
+ case (900000) :
+ {
+ centrino_set_frequency(&low, 600, 844);
+ centrino_set_frequency(&medium, 800, 988);
+ centrino_set_frequency(&high, 900, 1004);
+ break;
+ }
+ case (1000000) :
+ {
+ centrino_set_frequency(&low, 600, 844);
+ centrino_set_frequency(&medium, 800, 972);
+ centrino_set_frequency(&high, 900, 988);
+ centrino_set_frequency(&highest, 1000, 1004);
+ break;
+ }
+ case (1100000) :
+ {
+ centrino_set_frequency(&lowest, 600, 956);
+ centrino_set_frequency(&low, 800, 1020);
+ centrino_set_frequency(&medium, 900, 1100);
+ centrino_set_frequency(&high, 1000, 1164);
+ centrino_set_frequency(&highest, 1100, 1180);
+ break;
+ }
+ case (1200000) :
+ {
+ centrino_set_frequency(&lowest, 600, 956);
+ centrino_set_frequency(&low, 800, 1004);
+ centrino_set_frequency(&medium, 900, 1020);
+ centrino_set_frequency(&mediumhigh, 1000, 1100);
+ centrino_set_frequency(&high, 1100, 1164);
+ centrino_set_frequency(&highest, 1200, 1180);
+ break;
+ }
+ case (1300000) :
+ {
+ centrino_set_frequency(&lowest, 600, 956);
+ centrino_set_frequency(&low, 800, 1260);
+ centrino_set_frequency(&medium, 1000, 1292);
+ centrino_set_frequency(&high, 1200, 1356);
+ centrino_set_frequency(&highest, 1300, 1388);
+ break;
+ }
+ case (1400000) :
+ {
+ centrino_set_frequency(&lowest, 600, 956);
+ centrino_set_frequency(&low, 800, 1180);
+ centrino_set_frequency(&medium, 1000, 1308);
+ centrino_set_frequency(&high, 1200, 1436);
+ centrino_set_frequency(&highest, 1400, 1484);
+ break;
+ }
+ case (1500000) :
+ {
+ centrino_set_frequency(&lowest, 600, 956);
+ centrino_set_frequency(&low, 800, 1116);
+ centrino_set_frequency(&medium, 1000, 1228);
+ centrino_set_frequency(&mediumhigh, 1200, 1356);
+ centrino_set_frequency(&high, 1400, 1452);
+ centrino_set_frequency(&highest, 1500, 1484);
+ break;
+ }
+ case (1600000) :
+ {
+ centrino_set_frequency(&lowest, 600, 956);
+ centrino_set_frequency(&low, 800, 1036);
+ centrino_set_frequency(&medium, 1000, 1164);
+ centrino_set_frequency(&mediumhigh, 1200, 1276);
+ centrino_set_frequency(&high, 1400, 1420);
+ centrino_set_frequency(&highest, 1600, 1484);
+ break;
+ }
+ case (1700000) :
+ {
+ centrino_set_frequency(&lowest, 600, 956);
+ centrino_set_frequency(&low, 800, 1004);
+ centrino_set_frequency(&medium, 1000, 1116);
+ centrino_set_frequency(&mediumhigh, 1200, 1228);
+ centrino_set_frequency(&high, 1400, 1308);
+ centrino_set_frequency(&highest, 1700, 1484);
+ break;
+ }
+ }
+ if (lowest.frequency)
+ list_add_tail(&lowest.list, &pm_states.list);
+ if (low.frequency)
+ list_add_tail(&low.list, &pm_states.list);
+ if (mediumlow.frequency)
+ list_add_tail(&mediumlow.list, &pm_states.list);
+ if (medium.frequency)
+ list_add_tail(&medium.list, &pm_states.list);
+ if (mediumhigh.frequency)
+ list_add_tail(&mediumhigh.list, &pm_states.list);
+ if (high.frequency) {
+ list_add_tail(&high.list, &pm_states.list);
+ current_state = &high;
+ }
+ if (highest.frequency) {
+ list_add_tail(&highest.list, &pm_states.list);
+ current_state = &highest;
+ }
+ return 0;
+}
+
+static void centrino_exit_oppoint(void)
+{
+ if (lowest.frequency)
+ list_del_init(&lowest.list);
+ if (low.frequency)
+ list_del_init(&low.list);
+ if (mediumlow.frequency)
+ list_del_init(&mediumlow.list);
+ if (medium.frequency)
+ list_del_init(&medium.list);
+ if (mediumhigh.frequency)
+ list_del_init(&mediumhigh.list);
+ if (high.frequency)
+ list_del_init(&high.list);
+ if (highest.frequency)
+ list_del_init(&highest.list);
+ return;
+}
+
+static int centrino_verify_cpu_id(const struct cpuinfo_x86 *c, const struct cpu_id *x)
+{
+ if ((c->x86 == x->x86) &&
+ (c->x86_model == x->x86_model) &&
+ (c->x86_mask == x->x86_mask))
+ return 1;
+ return 0;
+}
+
+MODULE_AUTHOR ("David Singleton <dsingleton@mvista.com>");
+MODULE_DESCRIPTION ("PowerOp operting points for Intel Pentium M processors.");
+MODULE_LICENSE ("GPL");
+
+late_initcall(centrino_init_oppoint);
+module_exit(centrino_exit_oppoint);
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-14 22:24 ` Matthew Locke
2006-08-14 22:46 ` Dave Jones
@ 2006-08-14 23:29 ` Dominik Brodowski
2006-08-14 23:48 ` Matthew Locke
1 sibling, 1 reply; 136+ messages in thread
From: Dominik Brodowski @ 2006-08-14 23:29 UTC (permalink / raw)
To: Matthew Locke; +Cc: linux-pm
Hi,
On Mon, Aug 14, 2006 at 03:24:19PM -0700, Matthew Locke wrote:
> I am a little concerned that none of the cpufreq developers have
> responded. I was hoping to get their feedback.
Graduating from one law school, moving to the US, and adapting to another law
school proved to be quite time-consuming for me, but I hope to get back to
linux-related things within this and the next week -- so please excuse my
delay so far...
Thanks,
Dominik
^ permalink raw reply [flat|nested] 136+ messages in thread
* Re: So, what's the status on the recent patches here?
2006-08-14 23:29 ` Dominik Brodowski
@ 2006-08-14 23:48 ` Matthew Locke
0 siblings, 0 replies; 136+ messages in thread
From: Matthew Locke @ 2006-08-14 23:48 UTC (permalink / raw)
To: Dominik Brodowski; +Cc: linux-pm
Dominik,
On Aug 14, 2006, at 4:29 PM, Dominik Brodowski wrote:
> Hi,
>
> On Mon, Aug 14, 2006 at 03:24:19PM -0700, Matthew Locke wrote:
>> I am a little concerned that none of the cpufreq developers have
>> responded. I was hoping to get their feedback.
>
> Graduating from one law school, moving to the US, and adapting to
> another law
> school proved to be quite time-consuming for me, but I hope to get
> back to
> linux-related things within this and the next week -- so please excuse
> my
> delay so far...
I hope the transition is happening smoothly. Looking forward to your
comments when you have time.
>
> Thanks,
> Dominik
>
^ permalink raw reply [flat|nested] 136+ messages in thread
end of thread, other threads:[~2006-09-11 18:58 UTC | newest]
Thread overview: 136+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-24 14:52 So, what's the status on the recent patches here? Woodruff, Richard
2006-08-25 19:58 ` Pavel Machek
-- strict thread matches above, loose matches on Subject: below --
2006-09-05 16:03 Scott E. Preece
2006-09-05 20:42 ` Rafael J. Wysocki
2006-09-06 10:56 ` Pavel Machek
2006-09-04 15:43 Scott E. Preece
2006-09-03 23:05 Scott E. Preece
2006-09-04 9:09 ` Pavel Machek
2006-09-03 23:00 Scott E. Preece
2006-09-04 9:12 ` Pavel Machek
2006-09-05 10:31 ` Rafael J. Wysocki
2006-09-03 22:40 Scott E. Preece
2006-09-04 9:06 ` Pavel Machek
2006-09-05 16:45 ` Mark Gross
2006-09-06 10:59 ` Pavel Machek
2006-09-03 22:31 Scott E. Preece
2006-09-03 22:41 ` Pavel Machek
2006-09-03 22:12 Scott E. Preece
2006-09-03 22:25 ` Pavel Machek
2006-09-03 21:34 Scott E. Preece
2006-09-03 21:43 ` Pavel Machek
2006-09-03 22:10 ` Rafael J. Wysocki
2006-09-03 21:21 Scott E. Preece
2006-09-03 21:54 ` Pavel Machek
2006-09-01 14:49 Scott E. Preece
2006-08-31 15:14 Scott E. Preece
2006-08-31 2:41 Woodruff, Richard
2006-08-31 0:52 Scott E. Preece
2006-08-25 22:11 Woodruff, Richard
2006-08-25 21:21 Woodruff, Richard
2006-08-25 21:42 ` Alan Stern
2006-08-25 20:57 Woodruff, Richard
2006-08-25 21:13 ` Alan Stern
2006-08-25 20:22 Woodruff, Richard
2006-08-25 20:34 ` Alan Stern
2006-08-25 21:27 ` Pavel Machek
2006-08-25 21:46 ` Alan Stern
2006-08-25 22:03 ` Pavel Machek
2006-08-26 2:21 ` Alan Stern
2006-08-25 20:05 Woodruff, Richard
2006-08-25 20:08 ` Pavel Machek
2006-08-24 12:16 Woodruff, Richard
2006-08-24 12:29 ` Pavel Machek
2006-08-23 19:20 Woodruff, Richard
2006-08-24 8:03 ` Pavel Machek
2006-08-20 13:36 Woodruff, Richard
2006-08-16 1:27 Scott E. Preece
2006-08-16 15:25 ` Mark Gross
2006-08-14 20:07 Greg KH
2006-08-14 22:24 ` Matthew Locke
2006-08-14 22:46 ` Dave Jones
2006-08-14 23:24 ` Matthew Locke
2006-08-14 23:48 ` Dave Jones
2006-08-15 1:00 ` Greg KH
2006-08-15 3:03 ` Dave Jones
2006-08-15 10:35 ` Amit Kucheria
2006-08-15 19:04 ` Dave Jones
2006-08-16 12:58 ` Igor Stoppa
2006-08-17 21:39 ` Pavel Machek
2006-08-18 10:02 ` Igor Stoppa
2006-08-18 15:29 ` Alexey Starikovskiy
2006-08-18 17:54 ` Igor Stoppa
2006-08-18 21:05 ` Alexey Starikovskiy
2006-08-20 13:19 ` Igor Stoppa
2006-08-17 5:20 ` Matthew Locke
2006-08-17 7:20 ` Paul Mundt
2006-08-17 9:18 ` Amit Kucheria
2006-08-17 21:40 ` Pavel Machek
2006-08-18 5:42 ` Vitaly Wool
2006-08-23 12:28 ` Pavel Machek
2006-08-23 15:26 ` Igor Stoppa
2006-08-24 12:58 ` Vitaly Wool
2006-08-25 19:55 ` Pavel Machek
2006-08-25 23:26 ` Vitaly Wool
2006-08-26 10:18 ` Pavel Machek
2006-08-26 13:30 ` Vitaly Wool
2006-08-26 13:46 ` Pavel Machek
2006-08-28 16:40 ` Mark Gross
2006-08-28 17:39 ` Pavel Machek
2006-08-29 7:51 ` Matthew Locke
2006-08-30 22:13 ` Mark Gross
2006-08-30 22:27 ` Pavel Machek
2006-08-18 11:48 ` Amit Kucheria
2006-08-24 7:59 ` Pavel Machek
2006-08-30 11:00 ` Amit Kucheria
2006-08-30 22:36 ` Pavel Machek
2006-08-31 13:44 ` Amit Kucheria
2006-09-02 11:17 ` Pavel Machek
2006-08-17 21:24 ` Pavel Machek
2006-08-19 6:10 ` David Singleton
2006-08-22 2:13 ` Greg KH
2006-08-22 5:20 ` David Singleton
2006-08-23 19:05 ` Mark Gross
2006-08-24 12:39 ` Pavel Machek
2006-08-19 6:19 ` David Singleton
[not found] ` <20060819184843.GB15644@redhat.com>
2006-08-20 3:20 ` David Singleton
2006-08-20 3:30 ` Dave Jones
2006-08-23 18:50 ` Mark Gross
2006-08-27 4:37 ` David Singleton
2006-08-27 15:41 ` Pavel Machek
2006-08-29 15:55 ` David Singleton
2006-08-29 16:34 ` Pavel Machek
2006-08-29 17:49 ` Preece Scott-PREECE
2006-08-30 6:20 ` Matthew Locke
2006-08-30 13:26 ` Preece Scott-PREECE
2006-08-30 22:50 ` Pavel Machek
2006-08-31 0:22 ` Preece Scott-PREECE
2006-08-31 12:04 ` Pavel Machek
2006-09-02 18:05 ` David Singleton
2006-09-02 19:30 ` Rafael J. Wysocki
2006-09-03 16:25 ` David Singleton
2006-09-03 20:57 ` Rafael J. Wysocki
2006-09-03 21:33 ` Pavel Machek
2006-09-09 0:39 ` David Singleton
2006-09-09 0:48 ` David Singleton
2006-09-09 16:13 ` Pavel Machek
2006-09-09 12:17 ` Pavel Machek
2006-09-11 15:11 ` David Singleton
2006-09-11 17:14 ` Pavel Machek
2006-09-11 18:58 ` Matthew Locke
2006-08-30 4:52 ` David Singleton
2006-08-30 5:52 ` Matthew Locke
2006-08-30 13:39 ` Preece Scott-PREECE
2006-08-30 22:43 ` Pavel Machek
2006-08-27 19:48 ` Greg KH
2006-08-28 0:07 ` David Singleton
2006-08-27 20:54 ` Eugeny S. Mints
2006-08-28 22:18 ` Pavel Machek
2006-08-29 21:46 ` Eugeny S. Mints
2006-08-29 1:29 ` David Singleton
2006-08-29 22:39 ` Eugeny S. Mints
2006-08-31 13:27 ` Amit Kucheria
2006-08-31 19:22 ` Preece Scott-PREECE
2006-09-01 8:11 ` Amit Kucheria
2006-08-14 23:29 ` Dominik Brodowski
2006-08-14 23:48 ` Matthew Locke
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.