linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RFC: /sys/power/policy_preference
@ 2010-06-16 21:05 Len Brown
  2010-06-17  6:03 ` [linux-pm] " Igor.Stoppa
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Len Brown @ 2010-06-16 21:05 UTC (permalink / raw)
  To: Linux Power Management List, Linux Kernel Mailing List, linux-acpi

Create /sys/power/policy_preference, giving user-space
the ability to express its preference for kernel based
power vs. performance decisions in a single place.

This gives kernel sub-systems and drivers a central place
to discover this system-wide policy preference.
It also allows user-space to not have to be updated
every time a sub-system or driver adds a new power/perf knob.

policy_preference has 5 levels, from max_performance
through max_powersave.  Here is how 4 parts of the kernel
might respond to those 5 levels:

max_performance (unwilling to sacrifice any performance)
	scheduler: default (optimized for performance)
	cpuidle: disable all C-states except polling mode
	ondemand: disable all P-states except max perf
	msr_ia32_energy_perf_bias: 0 of 15

performance (care primarily about performance)
	scheduler: default (optimized for performance)
	cpuidle: enable all C-states subject to QOS
	ondemand: all P-states, using no bias
	msr_ia32_energy_perf_bias: 3 of 15

balanced (default)
	scheduler: enable sched_mc_power_savings
	cpuidle: enable all C-states subject to QOS
	ondemand: all P-states, powersave_bias=5
	msr_ia32_energy_perf_bias: 7 of 15

powersave (can sacrifice measurable performance)
	scheduler: enable sched_smt_power_savings
	cpuidle: enable all C-states, subject to QOS
	ondemand: disable turbo mode, powersave_bias=10
	msr_ia32_energy_perf_bias: 11 of 15

max_powersave (can sacrifice significant performance)
	scheduler: enable sched_smt_power_savings
	cpuidle: enable all C-states, subject to QOS
	ondemand: min P-state (do not invoke T-states)
	msr_ia32_energy_perf_bias: 15 of 15

Note that today Linux is typically operating in the mode
called "performance" above, rather than "balanced",
which is proposed to be the default.  While a system
should work well if left in "balanced" mode, it is likely
that some users would want to use "powersave" when on
battery and perhaps shift to "performance" on A/C.

Please let me know what you think.

thanks,
Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [linux-pm] RFC: /sys/power/policy_preference
  2010-06-16 21:05 RFC: /sys/power/policy_preference Len Brown
@ 2010-06-17  6:03 ` Igor.Stoppa
  2010-06-17 19:00   ` Len Brown
  2010-06-17 16:14 ` Victor Lowther
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Igor.Stoppa @ 2010-06-17  6:03 UTC (permalink / raw)
  To: lenb, linux-pm, linux-kernel, linux-acpi

hi,

> From:  Len Brown [lenb@kernel.org]

policy_preference has 5 levels, from max_performance
through max_powersave.  Here is how 4 parts of the kernel
might respond to those 5 levels:

[levels description]

i do understand that you are mostly targetting acpi based systems, but even there, based on static leaks, it might not be always true that lower frequencies are correlated to higher power savings (or maybe i have misunderstood your draft - i am not so fluent in acpi)

> it is likely
> that some users would want to use "powersave" when on
> battery and perhaps shift to "performance" on A/C.

if we consider also the thermal envelope and the fact that "performance" might steal power from a charging battery, even ton A/C it might not be possible to settle down in one state permanently.

Or do you expect other mechanisms to intervene?

Cheers, igor

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-pm] RFC: /sys/power/policy_preference
  2010-06-16 21:05 RFC: /sys/power/policy_preference Len Brown
  2010-06-17  6:03 ` [linux-pm] " Igor.Stoppa
@ 2010-06-17 16:14 ` Victor Lowther
  2010-06-17 19:02   ` Len Brown
  2010-06-19 15:17   ` Vaidyanathan Srinivasan
  2010-06-17 20:48 ` Mike Chan
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Victor Lowther @ 2010-06-17 16:14 UTC (permalink / raw)
  To: Len Brown
  Cc: Linux Power Management List, Linux Kernel Mailing List, linux-acpi





On Jun 16, 2010, at 4:05 PM, Len Brown <lenb@kernel.org> wrote:

> Create /sys/power/policy_preference, giving user-space
> the ability to express its preference for kernel based
> power vs. performance decisions in a single place.
>
> This gives kernel sub-systems and drivers a central place
> to discover this system-wide policy preference.
> It also allows user-space to not have to be updated
> every time a sub-system or driver adds a new power/perf knob.

I would prefer documenting all the current knobs and adding them to pm- 
utils so that pm-powersave knows about and can manage them. Once that  
is done, creating arbitrary powersave levels should be fairly simple.

> policy_preference has 5 levels, from max_performance
> through max_powersave.  Here is how 4 parts of the kernel
> might respond to those 5 levels:
>
> max_performance (unwilling to sacrifice any performance)
>    scheduler: default (optimized for performance)
>    cpuidle: disable all C-states except polling mode
>    ondemand: disable all P-states except max perf
>    msr_ia32_energy_perf_bias: 0 of 15
>
> performance (care primarily about performance)
>    scheduler: default (optimized for performance)
>    cpuidle: enable all C-states subject to QOS
>    ondemand: all P-states, using no bias
>    msr_ia32_energy_perf_bias: 3 of 15
>
> balanced (default)
>    scheduler: enable sched_mc_power_savings
>    cpuidle: enable all C-states subject to QOS
>    ondemand: all P-states, powersave_bias=5
>    msr_ia32_energy_perf_bias: 7 of 15
>
> powersave (can sacrifice measurable performance)
>    scheduler: enable sched_smt_power_savings
>    cpuidle: enable all C-states, subject to QOS
>    ondemand: disable turbo mode, powersave_bias=10
>    msr_ia32_energy_perf_bias: 11 of 15
>
> max_powersave (can sacrifice significant performance)
>    scheduler: enable sched_smt_power_savings
>    cpuidle: enable all C-states, subject to QOS
>    ondemand: min P-state (do not invoke T-states)
>    msr_ia32_energy_perf_bias: 15 of 15
>
> Note that today Linux is typically operating in the mode
> called "performance" above, rather than "balanced",
> which is proposed to be the default.  While a system
> should work well if left in "balanced" mode, it is likely
> that some users would want to use "powersave" when on
> battery and perhaps shift to "performance" on A/C.
>
> Please let me know what you think.
>
> thanks,
> Len Brown, Intel Open Source Technology Center
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [linux-pm] RFC: /sys/power/policy_preference
  2010-06-17  6:03 ` [linux-pm] " Igor.Stoppa
@ 2010-06-17 19:00   ` Len Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Len Brown @ 2010-06-17 19:00 UTC (permalink / raw)
  To: Igor.Stoppa; +Cc: linux-pm, linux-kernel, linux-acpi

On Thu, 17 Jun 2010, Igor.Stoppa@nokia.com wrote:

> i do understand that you are mostly targetting acpi based systems,
> but even there, based on static leaks, it might not be always true
> that lower frequencies are correlated to higher power savings
> (or maybe i have misunderstood your draft - i am not so fluent in acpi)

Right, my assertion is that ondemand deals only with P-states,
where, by defintion, the deeper the P-state the lower the voltage,
the higher the efficiency.

I assume that ondemand is not used to enable T-states
where the clock is throttled w/o lowering the voltage.
I put a note to try to make that clear under
max_powersave:

"ondemand: min P-state (do not invoke T-states)"

Of course it is also possible for a processor to do a poor job
implementing P-states and a great job optimizing idle states
such that race to idle were always a win.  However, on such
a processor it would make more sense to simply disable P-states.

> > it is likely
> > that some users would want to use "powersave" when on
> > battery and perhaps shift to "performance" on A/C.
> 
> if we consider also the thermal envelope and the fact that "performance"
> might steal power from a charging battery, even ton A/C it might not be 
> possible to settle down in one state permanently.
> 
> Or do you expect other mechanisms to intervene?

Typical laptop BIOS commonly implement a scheme where
they maximize performance on AC and bias towards saving energy
on DC.

That, of course, is just one example use-model.
Here Linux user-space can choose whatever policy
makes sense for them at run-time.

cheers,
-Len Brown, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-pm] RFC: /sys/power/policy_preference
  2010-06-17 16:14 ` Victor Lowther
@ 2010-06-17 19:02   ` Len Brown
  2010-06-17 22:23     ` Victor Lowther
  2010-06-19 15:17   ` Vaidyanathan Srinivasan
  1 sibling, 1 reply; 26+ messages in thread
From: Len Brown @ 2010-06-17 19:02 UTC (permalink / raw)
  To: Victor Lowther
  Cc: Linux Power Management List, Linux Kernel Mailing List, linux-acpi


> On Jun 16, 2010, at 4:05 PM, Len Brown <lenb@kernel.org> wrote:
> 
> > Create /sys/power/policy_preference, giving user-space
> > the ability to express its preference for kernel based
> > power vs. performance decisions in a single place.
> > 
> > This gives kernel sub-systems and drivers a central place
> > to discover this system-wide policy preference.
> > It also allows user-space to not have to be updated
> > every time a sub-system or driver adds a new power/perf knob.
> 
> I would prefer documenting all the current knobs and adding them to pm-utils
> so that pm-powersave knows about and can manage them. Once that is done,
> creating arbitrary powersave levels should be fairly simple.


The idea here is to not require user-space to need updating
whenever a future knob is invented.  We can do a great job
at documenting the past, but a poor job of documenting the future:-)

cheers,
Len Brown, Intel Open Source Technolgy Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC: /sys/power/policy_preference
  2010-06-16 21:05 RFC: /sys/power/policy_preference Len Brown
  2010-06-17  6:03 ` [linux-pm] " Igor.Stoppa
  2010-06-17 16:14 ` Victor Lowther
@ 2010-06-17 20:48 ` Mike Chan
  2010-06-18  6:25   ` Len Brown
  2010-06-21 20:10 ` [linux-pm] " Dipankar Sarma
  2010-09-28 16:17 ` x86_energy_perf_policy.c Len Brown
  4 siblings, 1 reply; 26+ messages in thread
From: Mike Chan @ 2010-06-17 20:48 UTC (permalink / raw)
  To: Len Brown
  Cc: Linux Power Management List, Linux Kernel Mailing List, linux-acpi

On Wed, Jun 16, 2010 at 2:05 PM, Len Brown <lenb@kernel.org> wrote:
> Create /sys/power/policy_preference, giving user-space
> the ability to express its preference for kernel based
> power vs. performance decisions in a single place.
>
> This gives kernel sub-systems and drivers a central place
> to discover this system-wide policy preference.
> It also allows user-space to not have to be updated
> every time a sub-system or driver adds a new power/perf knob.
>

This might be ok as a convince feature for userspace, but if that is
the sole intention, is 5 states enough? Are these values sufficient? I
can say at least for Android this will probably won't be as useful
(but perhaps on your platforms it makes sense).

As for a place for subsystems and drivers to check for what
performance mode you're in, do my driver how to check two places now?
Whats stopping someone from overriding cpufreq, or cpuidle? I might be
confused here (if I am someone please correct me) but isn't this
somewhat along he lines of pm runtime / pm qos if drivers want to
check what power / performance state the system is in?

-- Mike

> policy_preference has 5 levels, from max_performance
> through max_powersave.  Here is how 4 parts of the kernel
> might respond to those 5 levels:
>
> max_performance (unwilling to sacrifice any performance)
>        scheduler: default (optimized for performance)
>        cpuidle: disable all C-states except polling mode
>        ondemand: disable all P-states except max perf
>        msr_ia32_energy_perf_bias: 0 of 15
>
> performance (care primarily about performance)
>        scheduler: default (optimized for performance)
>        cpuidle: enable all C-states subject to QOS
>        ondemand: all P-states, using no bias
>        msr_ia32_energy_perf_bias: 3 of 15
>
> balanced (default)
>        scheduler: enable sched_mc_power_savings
>        cpuidle: enable all C-states subject to QOS
>        ondemand: all P-states, powersave_bias=5
>        msr_ia32_energy_perf_bias: 7 of 15
>
> powersave (can sacrifice measurable performance)
>        scheduler: enable sched_smt_power_savings
>        cpuidle: enable all C-states, subject to QOS
>        ondemand: disable turbo mode, powersave_bias=10
>        msr_ia32_energy_perf_bias: 11 of 15
>
> max_powersave (can sacrifice significant performance)
>        scheduler: enable sched_smt_power_savings
>        cpuidle: enable all C-states, subject to QOS
>        ondemand: min P-state (do not invoke T-states)
>        msr_ia32_energy_perf_bias: 15 of 15
>
> Note that today Linux is typically operating in the mode
> called "performance" above, rather than "balanced",
> which is proposed to be the default.  While a system
> should work well if left in "balanced" mode, it is likely
> that some users would want to use "powersave" when on
> battery and perhaps shift to "performance" on A/C.
>
> Please let me know what you think.
>
> thanks,
> Len Brown, Intel Open Source Technology Center
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-pm] RFC: /sys/power/policy_preference
  2010-06-17 19:02   ` Len Brown
@ 2010-06-17 22:23     ` Victor Lowther
  2010-06-18  5:56       ` Len Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Victor Lowther @ 2010-06-17 22:23 UTC (permalink / raw)
  To: Len Brown
  Cc: Linux Power Management List, Linux Kernel Mailing List, linux-acpi

On Thu, Jun 17, 2010 at 2:02 PM, Len Brown <lenb@kernel.org> wrote:
>
>> On Jun 16, 2010, at 4:05 PM, Len Brown <lenb@kernel.org> wrote:
>>
>> > Create /sys/power/policy_preference, giving user-space
>> > the ability to express its preference for kernel based
>> > power vs. performance decisions in a single place.
>> >
>> > This gives kernel sub-systems and drivers a central place
>> > to discover this system-wide policy preference.
>> > It also allows user-space to not have to be updated
>> > every time a sub-system or driver adds a new power/perf knob.
>>
>> I would prefer documenting all the current knobs and adding them to pm-utils
>> so that pm-powersave knows about and can manage them. Once that is done,
>> creating arbitrary powersave levels should be fairly simple.
>
>
> The idea here is to not require user-space to need updating
> whenever a future knob is invented.  We can do a great job
> at documenting the past, but a poor job of documenting the future:-)

Well, I would suggest that the habit of not documenting what is
happening with power management in the kernel needs to change, then.

Having the documentation and example code for how to tweak the various
power management settings from userspace is inherently more flexible
than trying to expose a single knob from the kernel to userspace for
power management, with little loss of flexibility.
> cheers,
> Len Brown, Intel Open Source Technolgy Center
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-pm] RFC: /sys/power/policy_preference
  2010-06-17 22:23     ` Victor Lowther
@ 2010-06-18  5:56       ` Len Brown
  2010-06-18 11:55         ` Victor Lowther
  0 siblings, 1 reply; 26+ messages in thread
From: Len Brown @ 2010-06-18  5:56 UTC (permalink / raw)
  To: Victor Lowther
  Cc: Linux Power Management List, Linux Kernel Mailing List, linux-acpi

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1269 bytes --]

On Thu, 17 Jun 2010, Victor Lowther wrote:

> > The idea here is to not require user-space to need updating
> > whenever a future knob is invented.  We can do a great job
> > at documenting the past, but a poor job of documenting the future:-)
> 
> Well, I would suggest that the habit of not documenting what is
> happening with power management in the kernel needs to change, then.

Actually some of the knobs I showed in the examples
have been documented for *years*, yet are ignored
by user-space today.  I don't want to insult user-space
programmers, but the reality is that simpler is usually better.

> Having the documentation and example code for how to tweak the various
> power management settings from userspace is inherently more flexible
> than trying to expose a single knob from the kernel to userspace for
> power management, with little loss of flexibility.

Yes, the ultimate in flexibility is to update user-space whenever
some new driver or new knob appears in the kernel.  I'm not proposing
that ability be taken away.  I'm proposing that in many cases it
is unnecessary.

The idea is to have the ability to add something to the 
kernel and avoid the need to make any change to user-space.

thanks,
-Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RFC: /sys/power/policy_preference
  2010-06-17 20:48 ` Mike Chan
@ 2010-06-18  6:25   ` Len Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Len Brown @ 2010-06-18  6:25 UTC (permalink / raw)
  To: Mike Chan
  Cc: Linux Power Management List, Linux Kernel Mailing List, linux-acpi

On Thu, 17 Jun 2010, Mike Chan wrote:

> On Wed, Jun 16, 2010 at 2:05 PM, Len Brown <lenb@kernel.org> wrote:
> > Create /sys/power/policy_preference, giving user-space
> > the ability to express its preference for kernel based
> > power vs. performance decisions in a single place.
> >
> > This gives kernel sub-systems and drivers a central place
> > to discover this system-wide policy preference.
> > It also allows user-space to not have to be updated
> > every time a sub-system or driver adds a new power/perf knob.
> >
> 
> This might be ok as a convince feature for userspace, but if that is
> the sole intention, is 5 states enough?
>
> Are these values sufficient? I
> can say at least for Android this will probably won't be as useful
> (but perhaps on your platforms it makes sense).

Honestly, my first thought was to use 100 values -- a percentage.
But I got quickly taked out of it by people much wiser than me.

Consider that the vendors that are cleaning Linux's clock
on laptops seem quite content with 3 values at the user-interface.
So one might argue that 5 levels is already 66% more complexity
than needed:-)

Some suggested special case states, eg for HPC.
But those needs didn't fit into this simple power vs performance
continuum, and every consumer of this interface needs to undertand
every state, so adding special states would be a mistake.

The folks that do HPC and the folks that do embedded devices
are smart enough to tune their systems without using this
rather blunt instrument.  They should continue to do so,
and this mechanism should not get in their way.

For example, if this mechanism is used to update powersave_bias
inside ondemand, but at the same time somebody tunes powersave_bias
by hand, the by-hand tuning must win.

> As for a place for subsystems and drivers to check for what
> performance mode you're in, do my driver how to check two places now?
> Whats stopping someone from overriding cpufreq, or cpuidle? I might be
> confused here (if I am someone please correct me) but isn't this
> somewhat along he lines of pm runtime / pm qos if drivers want to
> check what power / performance state the system is in?

pm runtime and pm qos are much bigger hammers, and this
mechanism is intended to complement them, not replace them.

Simply stated, this mechanism is intended just to give
a global hint of the user's power vs. performance preference
at a given time.  There are places in the kernel and drivers
where power vs performance decisions are made with zero
concept of user preference, and this hint can help there.

Other parts of the kernel don't care, or have sufficient
information to make informed decisions, and thus they
simply wouldn't need to make use of this hint.

thanks,
Len Brown, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-pm] RFC: /sys/power/policy_preference
  2010-06-18  5:56       ` Len Brown
@ 2010-06-18 11:55         ` Victor Lowther
  0 siblings, 0 replies; 26+ messages in thread
From: Victor Lowther @ 2010-06-18 11:55 UTC (permalink / raw)
  To: Len Brown
  Cc: Linux Power Management List, Linux Kernel Mailing List, linux-acpi

On Fri, Jun 18, 2010 at 12:56 AM, Len Brown <lenb@kernel.org> wrote:
> On Thu, 17 Jun 2010, Victor Lowther wrote:
>
>> > The idea here is to not require user-space to need updating
>> > whenever a future knob is invented.  We can do a great job
>> > at documenting the past, but a poor job of documenting the future:-)
>>
>> Well, I would suggest that the habit of not documenting what is
>> happening with power management in the kernel needs to change, then.
>
> Actually some of the knobs I showed in the examples
> have been documented for *years*, yet are ignored
> by user-space today.  I don't want to insult user-space
> programmers, but the reality is that simpler is usually better.

Let me explain where I am coming from, then.  I maintain pm-utils, one
of the main low-level bodies of userspace code that concerns itself
with power management.  I am currently in the process of standardizing
some of the more common power management tweaks so that they will work
in a cross distro manner, and know from this that the documentation we
have is badly fragmented -- if you know exactly what you are looking
for, you can google or grep for it, but if you do not, there is no
easy way to find a list of all the power management settings you can
tune.

>> Having the documentation and example code for how to tweak the various
>> power management settings from userspace is inherently more flexible
>> than trying to expose a single knob from the kernel to userspace for
>> power management, with little loss of flexibility.
>
> Yes, the ultimate in flexibility is to update user-space whenever
> some new driver or new knob appears in the kernel.  I'm not proposing
> that ability be taken away.  I'm proposing that in many cases it
> is unnecessary.

I disagree.  Most of userspace does not care about how the system is
trying to save power.  I maintain one that does, and I do not like the
idea of adding another knob whose entire purpose is to map other,
already existing knobs onto a line, especially when we can do that in
userspace easily enough if anyone actually wants it.

> The idea is to have the ability to add something to the
> kernel and avoid the need to make any change to user-space.

Userspace in this case consists mainly of
acpi-scripts/pm-utils/laptop-mode-tools, upower,
g-p-m/kpowersave/x-p-m, and X. I can only speak for pm-utils, but the
model pm-utils, acpi-scripts, and laptop-mode-tools use does not map
to your proposed knob at all.  We use a two-state model -- either we
are on AC power and use the kernel's default power state, or we are on
battery power and set power management to a set of distro or user
chosen set of parameters.  I am working on making pm-utils contain
some predefined powersaving policies, but I do not expect them to
change the two-state model much more than changing which power
management tweaks are used in the on-ac and on-battery states.

> thanks,
> -Len Brown, Intel Open Source Technology Center
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-pm] RFC: /sys/power/policy_preference
  2010-06-17 16:14 ` Victor Lowther
  2010-06-17 19:02   ` Len Brown
@ 2010-06-19 15:17   ` Vaidyanathan Srinivasan
  2010-06-19 19:04     ` Rafael J. Wysocki
  1 sibling, 1 reply; 26+ messages in thread
From: Vaidyanathan Srinivasan @ 2010-06-19 15:17 UTC (permalink / raw)
  To: Victor Lowther
  Cc: Len Brown, Linux Power Management List,
	Linux Kernel Mailing List, linux-acpi

* Victor Lowther <victor.lowther@gmail.com> [2010-06-17 11:14:50]:

> 
> 
> 
> 
> On Jun 16, 2010, at 4:05 PM, Len Brown <lenb@kernel.org> wrote:
> 
> >Create /sys/power/policy_preference, giving user-space
> >the ability to express its preference for kernel based
> >power vs. performance decisions in a single place.
> >
> >This gives kernel sub-systems and drivers a central place
> >to discover this system-wide policy preference.
> >It also allows user-space to not have to be updated
> >every time a sub-system or driver adds a new power/perf knob.
> 
> I would prefer documenting all the current knobs and adding them to
> pm-utils so that pm-powersave knows about and can manage them. Once
> that is done, creating arbitrary powersave levels should be fairly
> simple.

Hi Len,

Reading through this thread, I prefer the above recommendation.  We
have three main dimensions of (power savings) control (cpufreq,
cpuidle and scheduler) and you are combining them into a single policy
in the kernel.  The challenges are as follows:

* Number of policies will always limit flexibility
* More dimensions of control will be added in future and your
  intention is to transparently include them within these defined
  polices
* Even with the current implementations, power savings and performance
  impact widely vary based on system topology and workload.  There is
  no easy method to define modes such that one mode will _always_
  consume less power than the other
* Each subsystem can override the policy settings and create more
  combinations anyway

Your argument is that these modes can serve as a good default and allow
the user to tune the knobs directly for more sophisticated policies.
But in that case all kernel subsystem should default to the balanced
policy and let the user tweak individual subsystems for other modes.

On the other hand having the policy definitions in user space allows
us to create more flexible policies by considering higher level
factors like workload behavior, utilization, platform features,
power/thermal constraints etc.

--Vaidy

[snip]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-pm] RFC: /sys/power/policy_preference
  2010-06-19 15:17   ` Vaidyanathan Srinivasan
@ 2010-06-19 19:04     ` Rafael J. Wysocki
  0 siblings, 0 replies; 26+ messages in thread
From: Rafael J. Wysocki @ 2010-06-19 19:04 UTC (permalink / raw)
  To: svaidy, Linux Kernel Mailing List
  Cc: Victor Lowther, Len Brown, linux-acpi, Matthew Garrett, linux-pm

On Saturday, June 19, 2010, Vaidyanathan Srinivasan wrote:
> * Victor Lowther <victor.lowther@gmail.com> [2010-06-17 11:14:50]:
> 
> > 
> > 
> > 
> > 
> > On Jun 16, 2010, at 4:05 PM, Len Brown <lenb@kernel.org> wrote:
> > 
> > >Create /sys/power/policy_preference, giving user-space
> > >the ability to express its preference for kernel based
> > >power vs. performance decisions in a single place.
> > >
> > >This gives kernel sub-systems and drivers a central place
> > >to discover this system-wide policy preference.
> > >It also allows user-space to not have to be updated
> > >every time a sub-system or driver adds a new power/perf knob.
> > 
> > I would prefer documenting all the current knobs and adding them to
> > pm-utils so that pm-powersave knows about and can manage them. Once
> > that is done, creating arbitrary powersave levels should be fairly
> > simple.
> 
> Hi Len,
> 
> Reading through this thread, I prefer the above recommendation.

It also reflects my opinion quite well.

> We have three main dimensions of (power savings) control (cpufreq,
> cpuidle and scheduler) and you are combining them into a single policy
> in the kernel.

There's more than that, because we're in the process of adding runtime PM
features to I/O device drivers.

> The challenges are as follows:
> 
> * Number of policies will always limit flexibility
> * More dimensions of control will be added in future and your
>   intention is to transparently include them within these defined
>   polices
> * Even with the current implementations, power savings and performance
>   impact widely vary based on system topology and workload.  There is
>   no easy method to define modes such that one mode will _always_
>   consume less power than the other
> * Each subsystem can override the policy settings and create more
>   combinations anyway
> 
> Your argument is that these modes can serve as a good default and allow
> the user to tune the knobs directly for more sophisticated policies.
> But in that case all kernel subsystem should default to the balanced
> policy and let the user tweak individual subsystems for other modes.
> 
> On the other hand having the policy definitions in user space allows
> us to create more flexible policies by considering higher level
> factors like workload behavior, utilization, platform features,
> power/thermal constraints etc.

The policy_preference levels as proposed are also really arbitrary and they
will usually mean different things on different systems.  If the interpretation
of these values is left to device drivers, then (for example) different network
adapter drivers may interpret "performance" differently and that will lead to
different types of behavior depending on which of them is used.  I think we
should rather use interfaces that unambiguously tell the driver what to do.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-pm] RFC: /sys/power/policy_preference
  2010-06-16 21:05 RFC: /sys/power/policy_preference Len Brown
                   ` (2 preceding siblings ...)
  2010-06-17 20:48 ` Mike Chan
@ 2010-06-21 20:10 ` Dipankar Sarma
  2010-09-28 16:17 ` x86_energy_perf_policy.c Len Brown
  4 siblings, 0 replies; 26+ messages in thread
From: Dipankar Sarma @ 2010-06-21 20:10 UTC (permalink / raw)
  To: Len Brown
  Cc: Linux Power Management List, Linux Kernel Mailing List, linux-acpi

On Wed, Jun 16, 2010 at 05:05:26PM -0400, Len Brown wrote:
> Create /sys/power/policy_preference, giving user-space
> the ability to express its preference for kernel based
> power vs. performance decisions in a single place.
> 
> policy_preference has 5 levels, from max_performance
> through max_powersave.  Here is how 4 parts of the kernel
> might respond to those 5 levels:

In theory this makes sense. We have been toying with something
like this, but the difficulty is that outside of benchmarking
environment, it is hard to figure out what mode to set when.
Also, the impact could be different for different workloads.
We should probably have a broader discussion around this
with data - I will share some measurements on impact
of such power modes.


> max_performance (unwilling to sacrifice any performance)
> 	scheduler: default (optimized for performance)
> 	cpuidle: disable all C-states except polling mode
> 	ondemand: disable all P-states except max perf
> 	msr_ia32_energy_perf_bias: 0 of 15
> 
> performance (care primarily about performance)
> 	scheduler: default (optimized for performance)
> 	cpuidle: enable all C-states subject to QOS
> 	ondemand: all P-states, using no bias
> 	msr_ia32_energy_perf_bias: 3 of 15
> 
> balanced (default)
> 	scheduler: enable sched_mc_power_savings
> 	cpuidle: enable all C-states subject to QOS
> 	ondemand: all P-states, powersave_bias=5
> 	msr_ia32_energy_perf_bias: 7 of 15

Would there be sufficient difference between performance
and balanced ?


> 
> powersave (can sacrifice measurable performance)
> 	scheduler: enable sched_smt_power_savings
> 	cpuidle: enable all C-states, subject to QOS
> 	ondemand: disable turbo mode, powersave_bias=10
> 	msr_ia32_energy_perf_bias: 11 of 15
> 
> max_powersave (can sacrifice significant performance)
> 	scheduler: enable sched_smt_power_savings
> 	cpuidle: enable all C-states, subject to QOS
> 	ondemand: min P-state (do not invoke T-states)
> 	msr_ia32_energy_perf_bias: 15 of 15


Thanks
Dipankar

^ permalink raw reply	[flat|nested] 26+ messages in thread

* x86_energy_perf_policy.c
  2010-06-16 21:05 RFC: /sys/power/policy_preference Len Brown
                   ` (3 preceding siblings ...)
  2010-06-21 20:10 ` [linux-pm] " Dipankar Sarma
@ 2010-09-28 16:17 ` Len Brown
  2010-10-23  4:40   ` [PATCH] tools: add x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS Len Brown
  4 siblings, 1 reply; 26+ messages in thread
From: Len Brown @ 2010-09-28 16:17 UTC (permalink / raw)
  To: Linux Power Management List, Linux Kernel Mailing List, linux-acpi, x86

/* In June, I proposed /sys/power/policy_preference
to consolidate the knobs that user-space needs to turn
to tell the kernel its performance/energy preference.

The feedback I got was that user-space doesn't want the
kernel to consolidate anything, but instead wants the
kernel to expose everything and user-space will be able
to keep up with new devices and hooks, as long as
they are sufficiently documented.

I think that past history and the current state of affairs
suggests that user-space will come up short, but who am I to judge?

So here is a utility to implement the user-space
approach for Intel's new ENERGY_PERFR_BIAS MSR.
(You'll see it on some Westmere, and all Sandy Bridge processors)

The utility translates the words "powersave",
"normal", or "performance" into the right bits for
this register, and scribbles on /dev/cpu/*/msr,
as appropriate.

I'll be delighted to re-implement this in a different way
if consensus emerges that a better way exists.

thanks,
Len Brown
Intel Open Source Technology Center
*/

/*
 * x86_energy_perf_policy -- set the energy versus performance
 * policy preference bias on recent X86 processors.
 */
/*
 * Copyright (c) 2010, Intel Corporation.
 * Len Brown <len.brown@intel.com>
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms and conditions of the GNU General Public License,
 * version 2, as published by the Free Software Foundation.
 *
 * This program is distributed in the hope it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 * more details.
 *
 * You should have received a copy of the GNU General Public License along with
 * this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
 */

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/resource.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/time.h>
#include <stdlib.h>

unsigned int verbose;		/* set with -v */
unsigned int read_only;		/* set with -r */
char *progname;
unsigned long long new_bias;
int cpu = -1;

/*
 * Usage:
 *
 * -c cpu: limit action to a single CPU (default is all CPUs)
 * -v: verbose output (can invoke more than once)
 * -r: read-only, don't change any settings
 *
 *  performance
 *	Performance is paramount.
 *	Unwilling to sacrafice any performance
 *	for the sake of energy saving. (hardware default)
 *
 *  normal
 *	Can tolerate minor performance compromise
 *	for potentially significant energy savings.
 *	(reasonable default for most desktops and servers)
 *
 *  powersave
 *	Can tolerate significant performance hit
 *	to maximize energy savings.
 *
 * n
 * 	a numerical value to write to the underlying MSR.
 */
void usage(void)
{
	printf("%s: [-c cpu] [-v] "
		"(-r | 'performance' | 'normal' | 'powersave' | n)\n",
		progname);
}

/*
 * MSR_IA32_ENERGY_PERF_BIAS allows software to convey
 * its policy for the relative importance of performance
 * versus energy savings.
 *
 * The hardware uses this information in model-specific ways
 * when it must choose trade-offs between performance and
 * energy consumption.
 *
 * This policy hint does not supercede Processor Performance states
 * (P-states) or CPU Idle power states (C-states), but allows
 * software to have influence where it has been unable to
 * express a preference in the past.
 *
 * For example, this setting may tell the hardware how
 * aggressively or conservatively to control frequency
 * in the "turbo range" above the explicitly OS-controlled
 * P-state frequency range.  It may also tell the hardware
 * how aggressively is should enter the OS requestec C-states.
 *
 * The support for this feature is indicated by CPUID.06H.ECX.bit3
 * per the Intel Architectures Software Developer's Manual.
 */

#define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0

#define	BIAS_PERFORMANCE		0
#define BIAS_BALANCE			6
#define	BIAS_POWERSAVE			15

cmdline(int argc, char **argv) {
	int opt;

	progname = argv[0];

	while((opt = getopt(argc, argv, "+rvc:")) != -1) {
		switch (opt) {
		case 'c':
			cpu = atoi(optarg);
			break;
		case 'r':
			read_only = 1;
			break;
		case 'v':
			verbose++;
			break;
		default:
			usage();
			exit(-1);
		}
	}
	/* if -r, then should be no additional optind */
	if (read_only && (argc > optind)) {
		usage();
		exit(-1);
	}

	/*
	 * if no -r , then must be one additional optind
	 */
	if (!read_only) {

		if (argc != optind + 1 ) {
			printf("must supply -r or policy param\n");
			usage();
			exit(-1);
			}

		if (!strcmp("performance", argv[optind])) {
			new_bias = BIAS_PERFORMANCE;
		} else if (!strcmp("normal", argv[optind])) {
			new_bias = BIAS_BALANCE;
		} else if (!strcmp("powersave", argv[optind])) {
			new_bias = BIAS_POWERSAVE;
		} else {
			new_bias = atoll(argv[optind]);
			if (new_bias > BIAS_POWERSAVE) {
				usage();
				exit(-1);
			}
		}
		printf("new_bias 0x%016llx\n", new_bias);
	}
}

/*
 * validate_cpuid()
 * returns on success, quietly exits on failure (make verbose with -v)
 */
void validate_cpuid(void) {
	unsigned int eax, ebx, ecx, edx, max_level;
	char brand[16];
	unsigned int fms, family, model, stepping, ht_capable;

	eax = ebx = ecx = edx = 0;

	asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (0));

	sprintf(brand, "%.4s%.4s%.4s", &ebx, &edx, &ecx);

	if (strncmp(brand, "GenuineIntel", 12)) {
		if (verbose) printf("CPUID: %s != GenuineIntel\n",
			brand);
		exit(-1);
	}

	asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
	family = (fms >> 8) & 0xf;
	model = (fms >> 4) & 0xf;
	stepping = fms & 0xf;
	if (family == 6 || family == 0xf)
		model += ((fms >> 16) & 0xf) << 4;

	if (verbose > 1)
		printf("CPUID %s %d levels family:model:stepping "
			"0x%x:%x:%x (%d:%d:%d)\n",
			brand, max_level, family, model, stepping, family, model, stepping);

	if (!(edx & (1 << 5))) {
		if (verbose)
			printf("CPUID: no MSR\n");
		exit(-1);
	}

 	/*
 	 * Support for MSR_IA32_ENERGY_PERF_BIAS is indicated by CPUID.06H.ECX.bit3
 	 */
	asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (6));
	if (verbose) printf("CPUID.06H.ECX: 0x%x\n", ecx);
	if (!(ecx & (1 << 3))) {
		if (verbose)
			printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n");
		exit(-1);
	}
	return;	/* success */
}

check_dev_msr() {
	struct stat sb;

	if (stat("/dev/cpu/0/msr", &sb)) {
		printf("no /dev/cpu/0/msr\n");
		printf("Try \"# modprobe msr\"\n");
		exit(-5);
	}
}

unsigned long long get_msr(int cpu, int offset)
{
	unsigned long long msr;
	char msr_path[32];
	int retval;
	int fd;

	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
	fd = open(msr_path, O_RDONLY);
	if (fd < 0) {
		perror(msr_path);
		exit(-1);
	}

	retval = pread(fd, &msr, sizeof msr, offset);

	if (retval != sizeof msr) {
		printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
		exit(-2);
	}
	close(fd);
	return msr;
}

unsigned long long  put_msr(int cpu, unsigned long long new_msr, int offset)
{
	unsigned long long old_msr;
	char msr_path[32];
	int retval;
	int fd;

	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
	fd = open(msr_path, O_RDWR);
	if (fd < 0) {
		perror(msr_path);
		exit(-1);
	}

	retval = pread(fd, &old_msr, sizeof old_msr, offset);
	if (retval != sizeof old_msr) {
		perror("pwrite");
		printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
		exit(-2);
	}

	retval = pwrite(fd, &new_msr, sizeof new_msr, offset);
	if (retval != sizeof new_msr) {
		perror("pwrite");
		printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval);
		exit(-2);
	}

	close(fd);

	return old_msr;
}

void print_msr(int cpu)
{
	printf("cpu%d: 0x%016llx\n", cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS));
}

void update_msr(int cpu)
{
	unsigned long long previous_msr;

	previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS);

	if (verbose)
		printf("cpu%d  msr0x%x 0x%016llx -> 0x%016llx\n",
			cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias);

	return;
}

char *proc_stat = "/proc/stat";
/*
 * run func() on every cpu in /dev/cpu
 */
void for_every_cpu(void (func)(int)) {
	FILE *fp;
	int cpu_count;
	int retval;

	fp = fopen(proc_stat, "r");
	if (fp == NULL) {
		perror(proc_stat);
		exit(-1);
	}

	retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
	if (retval != 0) {
		perror("/proc/stat format");
		exit(-1);
	}

	for (cpu_count = 0; ;cpu_count++) {
		int cpu;

		retval = fscanf(fp, "cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n", &cpu);
		if (retval != 1)
			return;

		func(cpu);
	}
	fclose(fp);
}

int main(int argc, char **argv)
{
	cmdline(argc, argv);

	if (verbose > 1)
		printf("x86_energy_perf_policy Aug 2, 2010"
				" - Len Brown <lenb@kernel.org>\n");
	if (verbose > 1 && !read_only)
		printf("new_bias %lld\n", new_bias);

	validate_cpuid();
	check_dev_msr();

	if (cpu != -1) {
		if (read_only)
			print_msr(cpu);
		else
			update_msr(cpu);
	} else {
		if (read_only) {
			for_every_cpu(print_msr);
		} else {
			for_every_cpu(update_msr);
		}
	}
		
	return 0;
}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] tools: add x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS
  2010-09-28 16:17 ` x86_energy_perf_policy.c Len Brown
@ 2010-10-23  4:40   ` Len Brown
  2010-10-27  3:23     ` Andrew Morton
  2010-11-15 16:07     ` [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy " Len Brown
  0 siblings, 2 replies; 26+ messages in thread
From: Len Brown @ 2010-10-23  4:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-pm, linux-kernel, linux-acpi, x86

From: Len Brown <len.brown@intel.com>

MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
It is expected to become increasingly important in subsequent generations.

x86_energy_perf_policy is a user-space utility to set this
hardware energy vs performance policy hint in the processor.
Most systems would benefit from "x86_energy_perf_policy normal"
at system startup, as the hardware default is maximum performance
at the expense of energy efficiency.  See the comments
in the source code for more information.

Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
though the kernel does not actually program the MSR.

In March, Venkatesh Pallipadi proposed a small driver
that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
the cpufreq governor in use.  It also offered
a boot-time cmdline option to override.
http://lkml.org/lkml/2010/3/4/457
But hiding the hardware policy behind the
governor choice was deemed "kinda icky".

So in June, I proposed a generic user/kernel API to
consolidate the power/performance policy trade-off.
"RFC: /sys/power/policy_preference"
http://lkml.org/lkml/2010/6/16/399
That is my preference for implementing this capability,
but I received no support on the list.

So in September, I sent x86_energy_perf_policy.c to LKML,
a user-space utility that scribbles directly to the MSR.
http://lkml.org/lkml/2010/9/28/246

Here is the same utility re-sent, this time proposed
to reside in the kernel tools directory.

Signed-off-by: Len Brown <len.brown@intel.com>
---
 tools/power/x86/x86_energy_perf_policy/Makefile    |    7 +
 .../x86_energy_perf_policy.c                       |  358 ++++++++++++++++++++
 2 files changed, 365 insertions(+), 0 deletions(-)
 create mode 100644 tools/power/x86/x86_energy_perf_policy/Makefile
 create mode 100644 tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c

diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile b/tools/power/x86/x86_energy_perf_policy/Makefile
new file mode 100644
index 0000000..b0763da
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/Makefile
@@ -0,0 +1,7 @@
+x86_energy_perf_policy : x86_energy_perf_policy.c
+
+clean :
+	rm -f x86_energy_perf_policy
+
+install :
+	install x86_energy_perf_policy /usr/bin/x86_energy_perf_policy
diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
new file mode 100644
index 0000000..89394d9
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
@@ -0,0 +1,358 @@
+/*
+ * x86_energy_perf_policy -- set the energy versus performance
+ * policy preference bias on recent X86 processors.
+ */
+/*
+ * Copyright (c) 2010, Intel Corporation.
+ * Len Brown <len.brown@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/resource.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <sys/time.h>
+#include <stdlib.h>
+
+unsigned int verbose;		/* set with -v */
+unsigned int read_only;		/* set with -r */
+char *progname;
+unsigned long long new_bias;
+int cpu = -1;
+
+/*
+ * Usage:
+ *
+ * -c cpu: limit action to a single CPU (default is all CPUs)
+ * -v: verbose output (can invoke more than once)
+ * -r: read-only, don't change any settings
+ *
+ *  performance
+ *	Performance is paramount.
+ *	Unwilling to sacrafice any performance
+ *	for the sake of energy saving. (hardware default)
+ *
+ *  normal
+ *	Can tolerate minor performance compromise
+ *	for potentially significant energy savings.
+ *	(reasonable default for most desktops and servers)
+ *
+ *  powersave
+ *	Can tolerate significant performance hit
+ *	to maximize energy savings.
+ *
+ * n
+ *	a numerical value to write to the underlying MSR.
+ */
+void usage(void)
+{
+	printf("%s: [-c cpu] [-v] "
+		"(-r | 'performance' | 'normal' | 'powersave' | n)\n",
+		progname);
+}
+
+/*
+ * MSR_IA32_ENERGY_PERF_BIAS allows software to convey
+ * its policy for the relative importance of performance
+ * versus energy savings.
+ *
+ * The hardware uses this information in model-specific ways
+ * when it must choose trade-offs between performance and
+ * energy consumption.
+ *
+ * This policy hint does not supercede Processor Performance states
+ * (P-states) or CPU Idle power states (C-states), but allows
+ * software to have influence where it has been unable to
+ * express a preference in the past.
+ *
+ * For example, this setting may tell the hardware how
+ * aggressively or conservatively to control frequency
+ * in the "turbo range" above the explicitly OS-controlled
+ * P-state frequency range.  It may also tell the hardware
+ * how aggressively is should enter the OS requestec C-states.
+ *
+ * The support for this feature is indicated by CPUID.06H.ECX.bit3
+ * per the Intel Architectures Software Developer's Manual.
+ */
+
+#define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
+
+#define	BIAS_PERFORMANCE		0
+#define BIAS_BALANCE			6
+#define	BIAS_POWERSAVE			15
+
+cmdline(int argc, char **argv) {
+	int opt;
+
+	progname = argv[0];
+
+	while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
+		switch (opt) {
+		case 'c':
+			cpu = atoi(optarg);
+			break;
+		case 'r':
+			read_only = 1;
+			break;
+		case 'v':
+			verbose++;
+			break;
+		default:
+			usage();
+			exit(-1);
+		}
+	}
+	/* if -r, then should be no additional optind */
+	if (read_only && (argc > optind)) {
+		usage();
+		exit(-1);
+	}
+
+	/*
+	 * if no -r , then must be one additional optind
+	 */
+	if (!read_only) {
+
+		if (argc != optind + 1) {
+			printf("must supply -r or policy param\n");
+			usage();
+			exit(-1);
+			}
+
+		if (!strcmp("performance", argv[optind])) {
+			new_bias = BIAS_PERFORMANCE;
+		} else if (!strcmp("normal", argv[optind])) {
+			new_bias = BIAS_BALANCE;
+		} else if (!strcmp("powersave", argv[optind])) {
+			new_bias = BIAS_POWERSAVE;
+		} else {
+			new_bias = atoll(argv[optind]);
+			if (new_bias > BIAS_POWERSAVE) {
+				usage();
+				exit(-1);
+			}
+		}
+	}
+}
+
+/*
+ * validate_cpuid()
+ * returns on success, quietly exits on failure (make verbose with -v)
+ */
+void validate_cpuid(void)
+{
+	unsigned int eax, ebx, ecx, edx, max_level;
+	char brand[16];
+	unsigned int fms, family, model, stepping, ht_capable;
+
+	eax = ebx = ecx = edx = 0;
+
+	asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
+		"=d" (edx) : "a" (0));
+
+	sprintf(brand, "%.4s%.4s%.4s", &ebx, &edx, &ecx);
+
+	if (strncmp(brand, "GenuineIntel", 12)) {
+		if (verbose)
+			printf("CPUID: %s != GenuineIntel\n", brand);
+		exit(-1);
+	}
+
+	asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
+	family = (fms >> 8) & 0xf;
+	model = (fms >> 4) & 0xf;
+	stepping = fms & 0xf;
+	if (family == 6 || family == 0xf)
+		model += ((fms >> 16) & 0xf) << 4;
+
+	if (verbose > 1)
+		printf("CPUID %s %d levels family:model:stepping "
+			"0x%x:%x:%x (%d:%d:%d)\n", brand, max_level,
+			family, model, stepping, family, model, stepping);
+
+	if (!(edx & (1 << 5))) {
+		if (verbose)
+			printf("CPUID: no MSR\n");
+		exit(-1);
+	}
+
+	/*
+	 * Support for MSR_IA32_ENERGY_PERF_BIAS
+	 * is indicated by CPUID.06H.ECX.bit3
+	 */
+	asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (6));
+	if (verbose)
+		printf("CPUID.06H.ECX: 0x%x\n", ecx);
+	if (!(ecx & (1 << 3))) {
+		if (verbose)
+			printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n");
+		exit(-1);
+	}
+	return;	/* success */
+}
+
+check_dev_msr() {
+	struct stat sb;
+
+	if (stat("/dev/cpu/0/msr", &sb)) {
+		printf("no /dev/cpu/0/msr\n");
+		printf("Try \"# modprobe msr\"\n");
+		exit(-5);
+	}
+}
+
+unsigned long long get_msr(int cpu, int offset)
+{
+	unsigned long long msr;
+	char msr_path[32];
+	int retval;
+	int fd;
+
+	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
+	fd = open(msr_path, O_RDONLY);
+	if (fd < 0) {
+		perror(msr_path);
+		exit(-1);
+	}
+
+	retval = pread(fd, &msr, sizeof msr, offset);
+
+	if (retval != sizeof msr) {
+		printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
+		exit(-2);
+	}
+	close(fd);
+	return msr;
+}
+
+unsigned long long  put_msr(int cpu, unsigned long long new_msr, int offset)
+{
+	unsigned long long old_msr;
+	char msr_path[32];
+	int retval;
+	int fd;
+
+	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
+	fd = open(msr_path, O_RDWR);
+	if (fd < 0) {
+		perror(msr_path);
+		exit(-1);
+	}
+
+	retval = pread(fd, &old_msr, sizeof old_msr, offset);
+	if (retval != sizeof old_msr) {
+		perror("pwrite");
+		printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
+		exit(-2);
+	}
+
+	retval = pwrite(fd, &new_msr, sizeof new_msr, offset);
+	if (retval != sizeof new_msr) {
+		perror("pwrite");
+		printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval);
+		exit(-2);
+	}
+
+	close(fd);
+
+	return old_msr;
+}
+
+void print_msr(int cpu)
+{
+	printf("cpu%d: 0x%016llx\n",
+		cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS));
+}
+
+void update_msr(int cpu)
+{
+	unsigned long long previous_msr;
+
+	previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS);
+
+	if (verbose)
+		printf("cpu%d  msr0x%x 0x%016llx -> 0x%016llx\n",
+			cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias);
+
+	return;
+}
+
+char *proc_stat = "/proc/stat";
+/*
+ * run func() on every cpu in /dev/cpu
+ */
+void for_every_cpu(void (func)(int))
+{
+	FILE *fp;
+	int cpu_count;
+	int retval;
+
+	fp = fopen(proc_stat, "r");
+	if (fp == NULL) {
+		perror(proc_stat);
+		exit(-1);
+	}
+
+	retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
+	if (retval != 0) {
+		perror("/proc/stat format");
+		exit(-1);
+	}
+
+	for (cpu_count = 0; ; cpu_count++) {
+		int cpu;
+
+		retval = fscanf(fp,
+			"cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n",
+			&cpu);
+		if (retval != 1)
+			return;
+
+		func(cpu);
+	}
+	fclose(fp);
+}
+
+int main(int argc, char **argv)
+{
+	cmdline(argc, argv);
+
+	if (verbose > 1)
+		printf("x86_energy_perf_policy Aug 2, 2010"
+				" - Len Brown <lenb@kernel.org>\n");
+	if (verbose > 1 && !read_only)
+		printf("new_bias %lld\n", new_bias);
+
+	validate_cpuid();
+	check_dev_msr();
+
+	if (cpu != -1) {
+		if (read_only)
+			print_msr(cpu);
+		else
+			update_msr(cpu);
+	} else {
+		if (read_only)
+			for_every_cpu(print_msr);
+		else
+			for_every_cpu(update_msr);
+	}
+
+	return 0;
+}
-- 
1.7.3.1.127.g1bb28



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] tools: add x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS
  2010-10-23  4:40   ` [PATCH] tools: add x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS Len Brown
@ 2010-10-27  3:23     ` Andrew Morton
  2010-10-27  6:01       ` Ingo Molnar
  2010-11-15 16:07     ` [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy " Len Brown
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2010-10-27  3:23 UTC (permalink / raw)
  To: Len Brown; +Cc: linux-pm, linux-kernel, linux-acpi, x86

On Sat, 23 Oct 2010 00:40:18 -0400 (EDT) Len Brown <lenb@kernel.org> wrote:

> MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
> It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
> It is expected to become increasingly important in subsequent generations.
> 
> x86_energy_perf_policy is a user-space utility to set this
> hardware energy vs performance policy hint in the processor.
> Most systems would benefit from "x86_energy_perf_policy normal"
> at system startup, as the hardware default is maximum performance
> at the expense of energy efficiency.  See the comments
> in the source code for more information.
> 
> Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
> if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
> though the kernel does not actually program the MSR.
> 
> In March, Venkatesh Pallipadi proposed a small driver
> that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
> the cpufreq governor in use.  It also offered
> a boot-time cmdline option to override.
> http://lkml.org/lkml/2010/3/4/457
> But hiding the hardware policy behind the
> governor choice was deemed "kinda icky".
> 
> So in June, I proposed a generic user/kernel API to
> consolidate the power/performance policy trade-off.
> "RFC: /sys/power/policy_preference"
> http://lkml.org/lkml/2010/6/16/399
> That is my preference for implementing this capability,
> but I received no support on the list.
> 
> So in September, I sent x86_energy_perf_policy.c to LKML,
> a user-space utility that scribbles directly to the MSR.
> http://lkml.org/lkml/2010/9/28/246
> 
> Here is the same utility re-sent, this time proposed
> to reside in the kernel tools directory.
> 
> Signed-off-by: Len Brown <len.brown@intel.com>
> ---
>  tools/power/x86/x86_energy_perf_policy/Makefile    |    7 +
>  .../x86_energy_perf_policy.c                       |  358 ++++++++++++++++++++
>  2 files changed, 365 insertions(+), 0 deletions(-)
>  create mode 100644 tools/power/x86/x86_energy_perf_policy/Makefile
>  create mode 100644 tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c

tools/power/x86, eh?  It seems a better place than under
Documentation/, where such things have thus far landed!

I looked briefly, wondering about the kbuild situation.  It doesn't
appear to be wired up, so one has to manually enter that directory and
type `make'?

I guess that's OK as an interim thing but longer-term I suppose we
should have some more complete build and deployment system.  So
(thinking out loud) a `make' would invoke a `make tools', and that
`make tools' would build the tools which are specific to the target
arch[*], and any generic ones.  And a `make tools_install' would install
those tools in, I guess, /lib/modules/$(uname -r)/bin.

Or something else.  We'd need input from the distro guys to get this
right.

[*]: building tools for the `target arch' would require a far more
extensive cross-build environment than is needed for just kernel
cross-compilation.  This is perhaps Just Too Hard and perhaps a `make
tools_install' should copy the *source* into /lib/modules/$(uname
-r)/src and you then finish the build on the target.  Or something
else.  The mind boggles.

So for now, just parking the source down in ./tools/ and deferring the
problem sounds a fine idea ;)

A number of programs down under Documentation/ should be moved into
tools/ as well.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] tools: add x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS
  2010-10-27  3:23     ` Andrew Morton
@ 2010-10-27  6:01       ` Ingo Molnar
  2010-10-27 11:43         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2010-10-27  6:01 UTC (permalink / raw)
  To: Andrew Morton, Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: Len Brown, linux-pm, linux-kernel, linux-acpi, x86


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Sat, 23 Oct 2010 00:40:18 -0400 (EDT) Len Brown <lenb@kernel.org> wrote:
> 
> > MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
> > It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
> > It is expected to become increasingly important in subsequent generations.
> > 
> > x86_energy_perf_policy is a user-space utility to set this
> > hardware energy vs performance policy hint in the processor.
> > Most systems would benefit from "x86_energy_perf_policy normal"
> > at system startup, as the hardware default is maximum performance
> > at the expense of energy efficiency.  See the comments
> > in the source code for more information.
> > 
> > Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
> > if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
> > though the kernel does not actually program the MSR.
> > 
> > In March, Venkatesh Pallipadi proposed a small driver
> > that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
> > the cpufreq governor in use.  It also offered
> > a boot-time cmdline option to override.
> > http://lkml.org/lkml/2010/3/4/457
> > But hiding the hardware policy behind the
> > governor choice was deemed "kinda icky".
> > 
> > So in June, I proposed a generic user/kernel API to
> > consolidate the power/performance policy trade-off.
> > "RFC: /sys/power/policy_preference"
> > http://lkml.org/lkml/2010/6/16/399
> > That is my preference for implementing this capability,
> > but I received no support on the list.
> > 
> > So in September, I sent x86_energy_perf_policy.c to LKML,
> > a user-space utility that scribbles directly to the MSR.
> > http://lkml.org/lkml/2010/9/28/246
> > 
> > Here is the same utility re-sent, this time proposed
> > to reside in the kernel tools directory.
> > 
> > Signed-off-by: Len Brown <len.brown@intel.com>
> > ---
> >  tools/power/x86/x86_energy_perf_policy/Makefile    |    7 +
> >  .../x86_energy_perf_policy.c                       |  358 ++++++++++++++++++++
> >  2 files changed, 365 insertions(+), 0 deletions(-)
> >  create mode 100644 tools/power/x86/x86_energy_perf_policy/Makefile
> >  create mode 100644 tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
> 
> tools/power/x86, eh?  It seems a better place than under
> Documentation/, where such things have thus far landed!
> 
> I looked briefly, wondering about the kbuild situation.  It doesn't
> appear to be wired up, so one has to manually enter that directory and
> type `make'?
> 
> I guess that's OK as an interim thing but longer-term I suppose we
> should have some more complete build and deployment system.  So
> (thinking out loud) a `make' would invoke a `make tools', and that
> `make tools' would build the tools which are specific to the target
> arch[*], and any generic ones.  And a `make tools_install' would install
> those tools in, I guess, /lib/modules/$(uname -r)/bin.

In terms of build and documentation environment, tools/perf/ has one 
cloned/inherited from Git, which is rather good and functional.

Sharing it with the kernel's build system depends on the kbuild developers being 
interested in it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] tools: add x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS
  2010-10-27  6:01       ` Ingo Molnar
@ 2010-10-27 11:43         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-10-27 11:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Peter Zijlstra, Len Brown, linux-pm, linux-kernel,
	linux-acpi, x86

Em Wed, Oct 27, 2010 at 08:01:39AM +0200, Ingo Molnar escreveu:
> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Sat, 23 Oct 2010 00:40:18 -0400 (EDT) Len Brown <lenb@kernel.org> wrote:
> > tools/power/x86, eh?  It seems a better place than under
> > Documentation/, where such things have thus far landed!

> > I looked briefly, wondering about the kbuild situation.  It doesn't
> > appear to be wired up, so one has to manually enter that directory
> > and type `make'?

> > I guess that's OK as an interim thing but longer-term I suppose we
> > should have some more complete build and deployment system.  So
> > (thinking out loud) a `make' would invoke a `make tools', and that
> > `make tools' would build the tools which are specific to the target
> > arch[*], and any generic ones.  And a `make tools_install' would
> > install those tools in, I guess, /lib/modules/$(uname -r)/bin.
 
> In terms of build and documentation environment, tools/perf/ has one
> cloned/inherited from Git, which is rather good and functional.
 
> Sharing it with the kernel's build system depends on the kbuild
> developers being interested in it.

Yes, that is how it is today, I glued it to the main makefile in at
least one case:

[acme@doppio linux]$ make help | grep perf
  perf-tar-src-pkg    - Build perf-2.6.36-rc7.tar source tarball
  perf-targz-src-pkg  - Build perf-2.6.36-rc7.tar.gz source tarball
  perf-tarbz2-src-pkg - Build perf-2.6.36-rc7.tar.bz2 source tarball
[acme@doppio linux]$

I'd love to glue it some more, even using Kconfig and 'make toolsconfig'
for configuring the tools:

	. Want the TUI?
        . Want to link with DWARF? Needed for features x, y and z

Getting it done this way will provide examples that hopefully would lead
to more kernel coding practices and infrastructure being adopted by
(hell is freezing) userland programmers.

This is specially important now that there are more kernel programmers
writing userland code, lets hope that at least them continue to use
those practices and infrastructures ;-)

- Arnaldo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS
  2010-10-23  4:40   ` [PATCH] tools: add x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS Len Brown
  2010-10-27  3:23     ` Andrew Morton
@ 2010-11-15 16:07     ` Len Brown
  2010-11-17 11:35       ` Andi Kleen
  2010-11-24  5:31       ` [PATCH v2] tools: create power/x86/x86_energy_perf_policy Len Brown
  1 sibling, 2 replies; 26+ messages in thread
From: Len Brown @ 2010-11-15 16:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linux-pm, linux-kernel, linux-acpi, x86

From: Len Brown <len.brown@intel.com>

MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
It is expected to become increasingly important in subsequent generations.

x86_energy_perf_policy is a user-space utility to set this
hardware energy vs performance policy hint in the processor.
Most systems would benefit from "x86_energy_perf_policy normal"
at system startup, as the hardware default is maximum performance
at the expense of energy efficiency.  See the comments
in the source code for more information.

Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
though the kernel does not actually program the MSR.

In March, Venkatesh Pallipadi proposed a small driver
that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
the cpufreq governor in use.  It also offered
a boot-time cmdline option to override.
http://lkml.org/lkml/2010/3/4/457
But hiding the hardware policy behind the
governor choice was deemed "kinda icky".

In June, I proposed a generic user/kernel API to
consolidate the power/performance policy trade-off.
"RFC: /sys/power/policy_preference"
http://lkml.org/lkml/2010/6/16/399
That is my preference for implementing this capability,
but I received no support on the list.

In September, I sent x86_energy_perf_policy.c to LKML,
a user-space utility that scribbles directly to the MSR.
http://lkml.org/lkml/2010/9/28/246

Here is the same utility re-sent, this time proposed
to reside in the kernel tools directory.

Signed-off-by: Len Brown <len.brown@intel.com>
---
 tools/power/x86/x86_energy_perf_policy/Makefile    |    7 +
 .../x86_energy_perf_policy.c                       |  358 ++++++++++++++++++++
 2 files changed, 365 insertions(+), 0 deletions(-)
 create mode 100644 tools/power/x86/x86_energy_perf_policy/Makefile
 create mode 100644 tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c

diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile b/tools/power/x86/x86_energy_perf_policy/Makefile
new file mode 100644
index 0000000..b0763da
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/Makefile
@@ -0,0 +1,7 @@
+x86_energy_perf_policy : x86_energy_perf_policy.c
+
+clean :
+	rm -f x86_energy_perf_policy
+
+install :
+	install x86_energy_perf_policy /usr/bin/x86_energy_perf_policy
diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
new file mode 100644
index 0000000..89394d9
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
@@ -0,0 +1,358 @@
+/*
+ * x86_energy_perf_policy -- set the energy versus performance
+ * policy preference bias on recent X86 processors.
+ */
+/*
+ * Copyright (c) 2010, Intel Corporation.
+ * Len Brown <len.brown@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/resource.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <sys/time.h>
+#include <stdlib.h>
+
+unsigned int verbose;		/* set with -v */
+unsigned int read_only;		/* set with -r */
+char *progname;
+unsigned long long new_bias;
+int cpu = -1;
+
+/*
+ * Usage:
+ *
+ * -c cpu: limit action to a single CPU (default is all CPUs)
+ * -v: verbose output (can invoke more than once)
+ * -r: read-only, don't change any settings
+ *
+ *  performance
+ *	Performance is paramount.
+ *	Unwilling to sacrafice any performance
+ *	for the sake of energy saving. (hardware default)
+ *
+ *  normal
+ *	Can tolerate minor performance compromise
+ *	for potentially significant energy savings.
+ *	(reasonable default for most desktops and servers)
+ *
+ *  powersave
+ *	Can tolerate significant performance hit
+ *	to maximize energy savings.
+ *
+ * n
+ *	a numerical value to write to the underlying MSR.
+ */
+void usage(void)
+{
+	printf("%s: [-c cpu] [-v] "
+		"(-r | 'performance' | 'normal' | 'powersave' | n)\n",
+		progname);
+}
+
+/*
+ * MSR_IA32_ENERGY_PERF_BIAS allows software to convey
+ * its policy for the relative importance of performance
+ * versus energy savings.
+ *
+ * The hardware uses this information in model-specific ways
+ * when it must choose trade-offs between performance and
+ * energy consumption.
+ *
+ * This policy hint does not supercede Processor Performance states
+ * (P-states) or CPU Idle power states (C-states), but allows
+ * software to have influence where it has been unable to
+ * express a preference in the past.
+ *
+ * For example, this setting may tell the hardware how
+ * aggressively or conservatively to control frequency
+ * in the "turbo range" above the explicitly OS-controlled
+ * P-state frequency range.  It may also tell the hardware
+ * how aggressively is should enter the OS requestec C-states.
+ *
+ * The support for this feature is indicated by CPUID.06H.ECX.bit3
+ * per the Intel Architectures Software Developer's Manual.
+ */
+
+#define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
+
+#define	BIAS_PERFORMANCE		0
+#define BIAS_BALANCE			6
+#define	BIAS_POWERSAVE			15
+
+cmdline(int argc, char **argv) {
+	int opt;
+
+	progname = argv[0];
+
+	while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
+		switch (opt) {
+		case 'c':
+			cpu = atoi(optarg);
+			break;
+		case 'r':
+			read_only = 1;
+			break;
+		case 'v':
+			verbose++;
+			break;
+		default:
+			usage();
+			exit(-1);
+		}
+	}
+	/* if -r, then should be no additional optind */
+	if (read_only && (argc > optind)) {
+		usage();
+		exit(-1);
+	}
+
+	/*
+	 * if no -r , then must be one additional optind
+	 */
+	if (!read_only) {
+
+		if (argc != optind + 1) {
+			printf("must supply -r or policy param\n");
+			usage();
+			exit(-1);
+			}
+
+		if (!strcmp("performance", argv[optind])) {
+			new_bias = BIAS_PERFORMANCE;
+		} else if (!strcmp("normal", argv[optind])) {
+			new_bias = BIAS_BALANCE;
+		} else if (!strcmp("powersave", argv[optind])) {
+			new_bias = BIAS_POWERSAVE;
+		} else {
+			new_bias = atoll(argv[optind]);
+			if (new_bias > BIAS_POWERSAVE) {
+				usage();
+				exit(-1);
+			}
+		}
+	}
+}
+
+/*
+ * validate_cpuid()
+ * returns on success, quietly exits on failure (make verbose with -v)
+ */
+void validate_cpuid(void)
+{
+	unsigned int eax, ebx, ecx, edx, max_level;
+	char brand[16];
+	unsigned int fms, family, model, stepping, ht_capable;
+
+	eax = ebx = ecx = edx = 0;
+
+	asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
+		"=d" (edx) : "a" (0));
+
+	sprintf(brand, "%.4s%.4s%.4s", &ebx, &edx, &ecx);
+
+	if (strncmp(brand, "GenuineIntel", 12)) {
+		if (verbose)
+			printf("CPUID: %s != GenuineIntel\n", brand);
+		exit(-1);
+	}
+
+	asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
+	family = (fms >> 8) & 0xf;
+	model = (fms >> 4) & 0xf;
+	stepping = fms & 0xf;
+	if (family == 6 || family == 0xf)
+		model += ((fms >> 16) & 0xf) << 4;
+
+	if (verbose > 1)
+		printf("CPUID %s %d levels family:model:stepping "
+			"0x%x:%x:%x (%d:%d:%d)\n", brand, max_level,
+			family, model, stepping, family, model, stepping);
+
+	if (!(edx & (1 << 5))) {
+		if (verbose)
+			printf("CPUID: no MSR\n");
+		exit(-1);
+	}
+
+	/*
+	 * Support for MSR_IA32_ENERGY_PERF_BIAS
+	 * is indicated by CPUID.06H.ECX.bit3
+	 */
+	asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (6));
+	if (verbose)
+		printf("CPUID.06H.ECX: 0x%x\n", ecx);
+	if (!(ecx & (1 << 3))) {
+		if (verbose)
+			printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n");
+		exit(-1);
+	}
+	return;	/* success */
+}
+
+check_dev_msr() {
+	struct stat sb;
+
+	if (stat("/dev/cpu/0/msr", &sb)) {
+		printf("no /dev/cpu/0/msr\n");
+		printf("Try \"# modprobe msr\"\n");
+		exit(-5);
+	}
+}
+
+unsigned long long get_msr(int cpu, int offset)
+{
+	unsigned long long msr;
+	char msr_path[32];
+	int retval;
+	int fd;
+
+	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
+	fd = open(msr_path, O_RDONLY);
+	if (fd < 0) {
+		perror(msr_path);
+		exit(-1);
+	}
+
+	retval = pread(fd, &msr, sizeof msr, offset);
+
+	if (retval != sizeof msr) {
+		printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
+		exit(-2);
+	}
+	close(fd);
+	return msr;
+}
+
+unsigned long long  put_msr(int cpu, unsigned long long new_msr, int offset)
+{
+	unsigned long long old_msr;
+	char msr_path[32];
+	int retval;
+	int fd;
+
+	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
+	fd = open(msr_path, O_RDWR);
+	if (fd < 0) {
+		perror(msr_path);
+		exit(-1);
+	}
+
+	retval = pread(fd, &old_msr, sizeof old_msr, offset);
+	if (retval != sizeof old_msr) {
+		perror("pwrite");
+		printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
+		exit(-2);
+	}
+
+	retval = pwrite(fd, &new_msr, sizeof new_msr, offset);
+	if (retval != sizeof new_msr) {
+		perror("pwrite");
+		printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval);
+		exit(-2);
+	}
+
+	close(fd);
+
+	return old_msr;
+}
+
+void print_msr(int cpu)
+{
+	printf("cpu%d: 0x%016llx\n",
+		cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS));
+}
+
+void update_msr(int cpu)
+{
+	unsigned long long previous_msr;
+
+	previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS);
+
+	if (verbose)
+		printf("cpu%d  msr0x%x 0x%016llx -> 0x%016llx\n",
+			cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias);
+
+	return;
+}
+
+char *proc_stat = "/proc/stat";
+/*
+ * run func() on every cpu in /dev/cpu
+ */
+void for_every_cpu(void (func)(int))
+{
+	FILE *fp;
+	int cpu_count;
+	int retval;
+
+	fp = fopen(proc_stat, "r");
+	if (fp == NULL) {
+		perror(proc_stat);
+		exit(-1);
+	}
+
+	retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
+	if (retval != 0) {
+		perror("/proc/stat format");
+		exit(-1);
+	}
+
+	for (cpu_count = 0; ; cpu_count++) {
+		int cpu;
+
+		retval = fscanf(fp,
+			"cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n",
+			&cpu);
+		if (retval != 1)
+			return;
+
+		func(cpu);
+	}
+	fclose(fp);
+}
+
+int main(int argc, char **argv)
+{
+	cmdline(argc, argv);
+
+	if (verbose > 1)
+		printf("x86_energy_perf_policy Aug 2, 2010"
+				" - Len Brown <lenb@kernel.org>\n");
+	if (verbose > 1 && !read_only)
+		printf("new_bias %lld\n", new_bias);
+
+	validate_cpuid();
+	check_dev_msr();
+
+	if (cpu != -1) {
+		if (read_only)
+			print_msr(cpu);
+		else
+			update_msr(cpu);
+	} else {
+		if (read_only)
+			for_every_cpu(print_msr);
+		else
+			for_every_cpu(update_msr);
+	}
+
+	return 0;
+}
-- 
1.7.3.1.127.g1bb28

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS
  2010-11-15 16:07     ` [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy " Len Brown
@ 2010-11-17 11:35       ` Andi Kleen
  2010-11-22 20:13         ` Len Brown
  2010-11-24  5:31       ` [PATCH v2] tools: create power/x86/x86_energy_perf_policy Len Brown
  1 sibling, 1 reply; 26+ messages in thread
From: Andi Kleen @ 2010-11-17 11:35 UTC (permalink / raw)
  To: Len Brown; +Cc: Greg Kroah-Hartman, linux-pm, linux-kernel, linux-acpi, x86

Len Brown <lenb@kernel.org> writes:
> @@ -0,0 +1,7 @@
> +x86_energy_perf_policy : x86_energy_perf_policy.c
> +
> +clean :
> +	rm -f x86_energy_perf_policy
> +
> +install :
> +	install x86_energy_perf_policy /usr/bin/x86_energy_perf_policy

It's not clear to me how this Makefile ensures it's only
build on x86. 

If someone on another architecture does a full tools build
in the future (I think that is not wired up yet, but should
eventually) such a mechanism would be needed.


> +
> +/*
> + * Usage:

...

This full comment and parts of the following comments describing the
semantics need to be available somewhere to the user who may not have
easy access to the source. Can you make it display in usage or convert
it to a manpage? I would prefer a manpage

> +
> +cmdline(int argc, char **argv) {

No type?

> +	int opt;
> +
> +	progname = argv[0];
> +
> +	while ((opt = getopt(argc, argv, "+rvc:")) != -1) {

Maybe it's me, but I prefer having long options too (getopt_long)
These are easier to memorize.

> +
> +	/*
> +	 * if no -r , then must be one additional optind
> +	 */
> +	if (!read_only) {
> +
> +		if (argc != optind + 1) {
> +			printf("must supply -r or policy param\n");
> +			usage();
> +			exit(-1);

-1 is an unusual exit code. Better use 1.

An obvious improvement would be to put the exit() into usage()

> +			}
> +
> +		if (!strcmp("performance", argv[optind])) {
> +			new_bias = BIAS_PERFORMANCE;
> +		} else if (!strcmp("normal", argv[optind])) {
> +			new_bias = BIAS_BALANCE;
> +		} else if (!strcmp("powersave", argv[optind])) {
> +			new_bias = BIAS_POWERSAVE;
> +		} else {
> +			new_bias = atoll(argv[optind]);

If you used strtoull() you could actually check if the input
is really a number (end == argv[optind])

> +	eax = ebx = ecx = edx = 0;
> +
> +	asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
> +		"=d" (edx) : "a" (0));

Strictly for 386/early 486 you would need to check if cpuid
is available using pushf too. Perhaps it's safer to use cpuinfo

> +
> +check_dev_msr() {

Return type missing again

> +	struct stat sb;
> +
> +	if (stat("/dev/cpu/0/msr", &sb)) {
> +		printf("no /dev/cpu/0/msr\n");

This will fail if we eventually implement cpu 0 hotplug...
Better readdir or similar.

> +		printf("Try \"# modprobe msr\"\n");
> +		exit(-5);

Again -5 is unusual.


> +	char msr_path[32];
> +	int retval;
> +	int fd;
> +
> +	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
> +	fd = open(msr_path, O_RDONLY);
> +	if (fd < 0) {
> +		perror(msr_path);
> +		exit(-1);

This should be a soft error because the CPU can go away
any time.


> +/*
> + * run func() on every cpu in /dev/cpu
> + */
> +void for_every_cpu(void (func)(int))
> +{
> +	FILE *fp;
> +	int cpu_count;
> +	int retval;
> +
> +	fp = fopen(proc_stat, "r");

Using /proc/stat to get the number of CPUs is unusual
and you don't handle holes in the cpu numbers which
can happen due to hotplug.

I would just readdir or fnmatch the MSR /dev/cpu/* directories.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS
  2010-11-17 11:35       ` Andi Kleen
@ 2010-11-22 20:13         ` Len Brown
  2010-11-22 20:33           ` Andi Kleen
  0 siblings, 1 reply; 26+ messages in thread
From: Len Brown @ 2010-11-22 20:13 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Greg Kroah-Hartman, linux-pm, linux-kernel, linux-acpi, x86

Hi Andy,

Thank you for the review!

responses below.

> > +install :
> > +	install x86_energy_perf_policy /usr/bin/x86_energy_perf_policy
> 
> It's not clear to me how this Makefile ensures it's only
> build on x86. 
> 
> If someone on another architecture does a full tools build
> in the future (I think that is not wired up yet, but should
> eventually) such a mechanism would be needed.

Per the comments from Andrew and others, the concept of a
"full tools build" doesn't actually exit (yet).

So I guess the only assurance that somebody not on x86 would run
make in this directory this utility lives in tools/power/x86/

Note that there are other utilities under tools
which have no Makefile at all...

> ...I would prefer a manpage

I'll be happy to write a manpage.
Is there good example I should follow?

> > +cmdline(int argc, char **argv) {
> 
> No type?

okay,  now void.

> > +	while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
> 
> Maybe it's me, but I prefer having long options too (getopt_long)
> These are easier to memorize.

I'm not inclined to bother, as the use-case for this utility
is to be invoked by another program, and the options available
are really there just for verification/debugging, and don't
really merit being memorized by a human after that task.

> An obvious improvement would be to put the exit() into usage()

done.

> > +			new_bias = atoll(argv[optind]);
> 
> If you used strtoull() you could actually check if the input
> is really a number (end == argv[optind])

done.

> > +	asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
> > +		"=d" (edx) : "a" (0));
> 
> Strictly for 386/early 486 you would need to check if cpuid
> is available using pushf too. Perhaps it's safer to use cpuinfo

Meh, maybe simpler to crash on 486 and earlier?:-)
I'm not fond of parsing /proc/cpuinfo.

> > +check_dev_msr() {
> 
> Return type missing again

routine deleted.

> > +	struct stat sb;
> > +
> > +	if (stat("/dev/cpu/0/msr", &sb)) {
> > +		printf("no /dev/cpu/0/msr\n");
> 
> This will fail if we eventually implement cpu 0 hotplug...
> Better readdir or similar.

simpler to delete check_dev_msr() and stumble forward
assuming /dev/cpu/*/msr exists, and print a message and
exit if it doesn't.

> > +		printf("Try \"# modprobe msr\"\n");
> > +		exit(-5);
> 
> Again -5 is unusual.

okay, I canged all the exits to 1.

> > +	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
> > +	fd = open(msr_path, O_RDONLY);
> > +	if (fd < 0) {
> > +		perror(msr_path);
> > +		exit(-1);
> 
> This should be a soft error because the CPU can go away
> any time.

In the highly unlikely scenario that somebody uses
the -r option to excerise the read-only code,
and simultaneously invokes and completes a cpu hot remove
during the execution of this utility,
I think the utility exiting is just as useful,
and less complicated, than handling soft error.
Since in either case, the user would probably
simply re-invoke the utility to see what the
current state of the settled machine is.

> > +/*
> > + * run func() on every cpu in /dev/cpu
> > + */
...
> > +	fp = fopen(proc_stat, "r");
> 
> Using /proc/stat to get the number of CPUs is unusual
> and you don't handle holes in the cpu numbers which
> can happen due to hotplug.

The code does handle holes in cpu number namespace.

The "num_cpus" variable was a hold-over from
an older version that did not, and so I've deleted it.

> I would just readdir or fnmatch the MSR /dev/cpu/* directories.

I used to do that, but Arjan convinced me to use /proc/stat.
turbostat, rdmsr, and wrmsr all use /proc/stat.

thanks,
-Len Brown, Intel Open Source Technology Center





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS
  2010-11-22 20:13         ` Len Brown
@ 2010-11-22 20:33           ` Andi Kleen
  2010-11-23  4:48             ` Len Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Andi Kleen @ 2010-11-22 20:33 UTC (permalink / raw)
  To: Len Brown
  Cc: Andi Kleen, Greg Kroah-Hartman, linux-pm, linux-kernel, linux-acpi, x86

On Mon, Nov 22, 2010 at 03:13:24PM -0500, Len Brown wrote:
> Per the comments from Andrew and others, the concept of a
> "full tools build" doesn't actually exit (yet).
> 
> So I guess the only assurance that somebody not on x86 would run
> make in this directory this utility lives in tools/power/x86/
> 
> Note that there are other utilities under tools
> which have no Makefile at all...

I suspect this will need to be fixed at some point.

e.g. kernel rpms probably don't want to hard code all of this
but just call some standard make file target. And the kernel
eventually needs a make install_user or similar.

> 
> > ...I would prefer a manpage
> 
> I'll be happy to write a manpage.
> Is there good example I should follow?

Just pick one from /usr/share/man.  You can grep for my 
name if you want one written by me, but I don't claim they are
necessarily better than others @)

> I'm not inclined to bother, as the use-case for this utility
> is to be invoked by another program, and the options available

What other program?

I could well imagine administrators sticking this 
into their boot.locals to set the policy they want.

> In the highly unlikely scenario that somebody uses
> the -r option to excerise the read-only code,
> and simultaneously invokes and completes a cpu hot remove

FWIW there are setups where core offlining can happen
automatically in response to an error.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS
  2010-11-22 20:33           ` Andi Kleen
@ 2010-11-23  4:48             ` Len Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Len Brown @ 2010-11-23  4:48 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Greg Kroah-Hartman, linux-pm, linux-kernel, linux-acpi, x86

On Mon, 22 Nov 2010, Andi Kleen wrote:

> On Mon, Nov 22, 2010 at 03:13:24PM -0500, Len Brown wrote:
> > Per the comments from Andrew and others, the concept of a
> > "full tools build" doesn't actually exit (yet).
> > 
> > So I guess the only assurance that somebody not on x86 would run
> > make in this directory this utility lives in tools/power/x86/
> > 
> > Note that there are other utilities under tools
> > which have no Makefile at all...
> 
> I suspect this will need to be fixed at some point.
> 
> e.g. kernel rpms probably don't want to hard code all of this
> but just call some standard make file target. And the kernel
> eventually needs a make install_user or similar.

I agree, but I don't volunteer to set up such
a build system as part of this particular patch.
As I mentioned, supplying any Makefile is
a step better than some of the peers...

> > I'm not inclined to bother, as the use-case for this utility
> > is to be invoked by another program, and the options available
> 
> What other program?
> 
> I could well imagine administrators sticking this 
> into their boot.locals to set the policy they want.

right, and that would be a program.
It is unlikely that users are going to be typing this
command, except into an admin script.

> > In the highly unlikely scenario that somebody uses
> > the -r option to excerise the read-only code,
> > and simultaneously invokes and completes a cpu hot remove
> 
> FWIW there are setups where core offlining can happen
> automatically in response to an error.

Understood.  I think it is fine if this utility
simply exits if that error occurs while it is running.

(turbostat, OTOH, may be long running, and it treats
 vanishing processors as a recoverable error)

thanks,
-Len Brown, Intel Open Source Technology Center



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2] tools: create power/x86/x86_energy_perf_policy
  2010-11-15 16:07     ` [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy " Len Brown
  2010-11-17 11:35       ` Andi Kleen
@ 2010-11-24  5:31       ` Len Brown
  2010-11-25  5:52         ` Chen Gong
  1 sibling, 1 reply; 26+ messages in thread
From: Len Brown @ 2010-11-24  5:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linux-pm, linux-kernel, linux-acpi, x86

From: Len Brown <len.brown@intel.com>

MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
It is expected to become increasingly important in subsequent generations.

x86_energy_perf_policy is a user-space utility to set this
hardware energy vs performance policy hint in the processor.
Most systems would benefit from "x86_energy_perf_policy normal"
at system startup, as the hardware default is maximum performance
at the expense of energy efficiency.

Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
though the kernel does not actually program the MSR.

In March, Venkatesh Pallipadi proposed a small driver
that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
the cpufreq governor in use.  It also offered
a boot-time cmdline option to override.
http://lkml.org/lkml/2010/3/4/457
But hiding the hardware policy behind the
governor choice was deemed "kinda icky".

So in June, I proposed a generic user/kernel API to
consolidate the power/performance policy trade-off.
"RFC: /sys/power/policy_preference"
http://lkml.org/lkml/2010/6/16/399
That is my preference for implementing this capability,
but I received no support on the list.

So in September, I sent x86_energy_perf_policy.c to LKML,
a user-space utility that scribbles directly to the MSR.
http://lkml.org/lkml/2010/9/28/246

Here is the same utility re-sent, this time proposed
to reside in the kernel tools directory.

Signed-off-by: Len Brown <len.brown@intel.com>
---
v2
create man page
minor tweaks in response to review comments

tools/power/x86/x86_energy_perf_policy/Makefile    |    8 +
 .../x86_energy_perf_policy.8                       |  104 +++++++
 .../x86_energy_perf_policy.c                       |  325 ++++++++++++++++++++

diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile b/tools/power/x86/x86_energy_perf_policy/Makefile
new file mode 100644
index 0000000..f458237
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/Makefile
@@ -0,0 +1,8 @@
+x86_energy_perf_policy : x86_energy_perf_policy.c
+
+clean :
+	rm -f x86_energy_perf_policy
+
+install :
+	install x86_energy_perf_policy /usr/bin/
+	install x86_energy_perf_policy.8 /usr/share/man/man8/
diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8 b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
new file mode 100644
index 0000000..8eaaad6
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
@@ -0,0 +1,104 @@
+.\"  This page Copyright (C) 2010 Len Brown <len.brown@intel.com>
+.\"  Distributed under the GPL, Copyleft 1994.
+.TH X86_ENERGY_PERF_POLICY 8
+.SH NAME
+x86_energy_perf_policy \- read or write MSR_IA32_ENERGY_PERF_BIAS
+.SH SYNOPSIS
+.ft B
+.B x86_energy_perf_policy
+.RB [ "\-c cpu" ]
+.RB [ "\-v" ]
+.RB "\-r"
+.br
+.B x86_energy_perf_policy
+.RB [ "\-c cpu" ]
+.RB [ "\-v" ]
+.RB 'performance'
+.br
+.B x86_energy_perf_policy
+.RB [ "\-c cpu" ]
+.RB [ "\-v" ]
+.RB 'normal'
+.br
+.B x86_energy_perf_policy
+.RB [ "\-c cpu" ]
+.RB [ "\-v" ]
+.RB 'powersave'
+.br
+.B x86_energy_perf_policy
+.RB [ "\-c cpu" ]
+.RB [ "\-v" ]
+.RB n
+.br
+.SH DESCRIPTION
+\fBx86_energy_perf_policy\fP
+allows software to convey
+its policy for the relative importance of performance
+versus energy savings to the processor.
+
+The processor uses this information in model-specific ways
+when it must select trade-offs between performance and
+energy efficiency.
+
+This policy hint does not supersede Processor Performance states
+(P-states) or CPU Idle power states (C-states), but allows
+software to have influence where it would otherwise be unable
+to express a preference.
+
+For example, this setting may tell the hardware how
+aggressively or conservatively to control frequency
+in the "turbo range" above the explicitly OS-controlled
+P-state frequency range.  It may also tell the hardware
+how aggressively is should enter the OS requested C-states.
+
+Support for this feature is indicated by CPUID.06H.ECX.bit3
+per the Intel Architectures Software Developer's Manual.
+
+.SS Options
+\fB-c\fP limits operation to a single CPU.
+The default is to operate on all CPUs.
+Note that MSR_IA32_ENERGY_PERF_BIAS is defined per
+logical processor, but that the initial implementations
+of the MSR were shared among all processors in each package.
+.PP
+\fB-v\fP increases verbosity.  By default
+x86_energy_perf_policy is silent.
+.PP
+\fB-r\fP is for "read-only" mode - the unchanged state
+is read and displayed.
+.PP
+.I performance
+Set a policy where performance is paramount.
+The processor will be unwilling to sacrifice any performance
+for the sake of energy saving. This is the hardware default.
+.PP
+.I normal
+Set a policy with a normal balance between performance and energy efficiency.
+The processor will tolerate minor performance compromise
+for potentially significant energy savings.
+This reasonable default for most desktops and servers.
+.PP
+.I powersave
+Set a policy where the processor can accept
+a measurable performance hit to maximize energy efficiency.
+.PP
+.I n
+Set MSR_IA32_ENERGY_PERF_BIAS to the specified number.
+The range of valid numbers is 0-15, where 0 is maximum
+performance and 15 is maximum energy efficiency.
+
+.SH NOTES
+.B "x86_energy_perf_policy "
+runs only as root.
+.SH FILES
+.ta
+.nf
+/dev/cpu/*/msr
+.fi
+
+.SH "SEE ALSO"
+msr(4)
+.PP
+.SH AUTHORS
+.nf
+Written by Len Brown <len.brown@intel.com>
diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
new file mode 100644
index 0000000..b539923
--- /dev/null
+++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
@@ -0,0 +1,325 @@
+/*
+ * x86_energy_perf_policy -- set the energy versus performance
+ * policy preference bias on recent X86 processors.
+ */
+/*
+ * Copyright (c) 2010, Intel Corporation.
+ * Len Brown <len.brown@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/resource.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <sys/time.h>
+#include <stdlib.h>
+#include <string.h>
+
+unsigned int verbose;		/* set with -v */
+unsigned int read_only;		/* set with -r */
+char *progname;
+unsigned long long new_bias;
+int cpu = -1;
+
+/*
+ * Usage:
+ *
+ * -c cpu: limit action to a single CPU (default is all CPUs)
+ * -v: verbose output (can invoke more than once)
+ * -r: read-only, don't change any settings
+ *
+ *  performance
+ *	Performance is paramount.
+ *	Unwilling to sacrafice any performance
+ *	for the sake of energy saving. (hardware default)
+ *
+ *  normal
+ *	Can tolerate minor performance compromise
+ *	for potentially significant energy savings.
+ *	(reasonable default for most desktops and servers)
+ *
+ *  powersave
+ *	Can tolerate significant performance hit
+ *	to maximize energy savings.
+ *
+ * n
+ *	a numerical value to write to the underlying MSR.
+ */
+void usage(void)
+{
+	printf("%s: [-c cpu] [-v] "
+		"(-r | 'performance' | 'normal' | 'powersave' | n)\n",
+		progname);
+	exit(1);
+}
+
+#define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
+
+#define	BIAS_PERFORMANCE		0
+#define BIAS_BALANCE			6
+#define	BIAS_POWERSAVE			15
+
+void cmdline(int argc, char **argv)
+{
+	int opt;
+
+	progname = argv[0];
+
+	while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
+		switch (opt) {
+		case 'c':
+			cpu = atoi(optarg);
+			break;
+		case 'r':
+			read_only = 1;
+			break;
+		case 'v':
+			verbose++;
+			break;
+		default:
+			usage();
+		}
+	}
+	/* if -r, then should be no additional optind */
+	if (read_only && (argc > optind))
+		usage();
+
+	/*
+	 * if no -r , then must be one additional optind
+	 */
+	if (!read_only) {
+
+		if (argc != optind + 1) {
+			printf("must supply -r or policy param\n");
+			usage();
+			}
+
+		if (!strcmp("performance", argv[optind])) {
+			new_bias = BIAS_PERFORMANCE;
+		} else if (!strcmp("normal", argv[optind])) {
+			new_bias = BIAS_BALANCE;
+		} else if (!strcmp("powersave", argv[optind])) {
+			new_bias = BIAS_POWERSAVE;
+		} else {
+			char *endptr;
+
+			new_bias = strtoull(argv[optind], &endptr, 0);
+			if (endptr == argv[optind] ||
+				new_bias > BIAS_POWERSAVE) {
+					fprintf(stderr, "invalid value: %s\n",
+						argv[optind]);
+				usage();
+			}
+		}
+	}
+}
+
+/*
+ * validate_cpuid()
+ * returns on success, quietly exits on failure (make verbose with -v)
+ */
+void validate_cpuid(void)
+{
+	unsigned int eax, ebx, ecx, edx, max_level;
+	char brand[16];
+	unsigned int fms, family, model, stepping;
+
+	eax = ebx = ecx = edx = 0;
+
+	asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
+		"=d" (edx) : "a" (0));
+
+	if (ebx != 0x756e6547 || edx != 0x49656e69 || ecx != 0x6c65746e) {
+		if (verbose)
+			fprintf(stderr, "%.4s%.4s%.4s != GenuineIntel",
+				(char *)&ebx, (char *)&edx, (char *)&ecx);
+		exit(1);
+	}
+
+	asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
+	family = (fms >> 8) & 0xf;
+	model = (fms >> 4) & 0xf;
+	stepping = fms & 0xf;
+	if (family == 6 || family == 0xf)
+		model += ((fms >> 16) & 0xf) << 4;
+
+	if (verbose > 1)
+		printf("CPUID %s %d levels family:model:stepping "
+			"0x%x:%x:%x (%d:%d:%d)\n", brand, max_level,
+			family, model, stepping, family, model, stepping);
+
+	if (!(edx & (1 << 5))) {
+		if (verbose)
+			printf("CPUID: no MSR\n");
+		exit(1);
+	}
+
+	/*
+	 * Support for MSR_IA32_ENERGY_PERF_BIAS
+	 * is indicated by CPUID.06H.ECX.bit3
+	 */
+	asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (6));
+	if (verbose)
+		printf("CPUID.06H.ECX: 0x%x\n", ecx);
+	if (!(ecx & (1 << 3))) {
+		if (verbose)
+			printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n");
+		exit(1);
+	}
+	return;	/* success */
+}
+
+unsigned long long get_msr(int cpu, int offset)
+{
+	unsigned long long msr;
+	char msr_path[32];
+	int retval;
+	int fd;
+
+	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
+	fd = open(msr_path, O_RDONLY);
+	if (fd < 0) {
+		printf("Try \"# modprobe msr\"\n");
+		perror(msr_path);
+		exit(1);
+	}
+
+	retval = pread(fd, &msr, sizeof msr, offset);
+
+	if (retval != sizeof msr) {
+		printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
+		exit(-2);
+	}
+	close(fd);
+	return msr;
+}
+
+unsigned long long  put_msr(int cpu, unsigned long long new_msr, int offset)
+{
+	unsigned long long old_msr;
+	char msr_path[32];
+	int retval;
+	int fd;
+
+	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
+	fd = open(msr_path, O_RDWR);
+	if (fd < 0) {
+		perror(msr_path);
+		exit(1);
+	}
+
+	retval = pread(fd, &old_msr, sizeof old_msr, offset);
+	if (retval != sizeof old_msr) {
+		perror("pwrite");
+		printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
+		exit(-2);
+	}
+
+	retval = pwrite(fd, &new_msr, sizeof new_msr, offset);
+	if (retval != sizeof new_msr) {
+		perror("pwrite");
+		printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval);
+		exit(-2);
+	}
+
+	close(fd);
+
+	return old_msr;
+}
+
+void print_msr(int cpu)
+{
+	printf("cpu%d: 0x%016llx\n",
+		cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS));
+}
+
+void update_msr(int cpu)
+{
+	unsigned long long previous_msr;
+
+	previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS);
+
+	if (verbose)
+		printf("cpu%d  msr0x%x 0x%016llx -> 0x%016llx\n",
+			cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias);
+
+	return;
+}
+
+char *proc_stat = "/proc/stat";
+/*
+ * run func() on every cpu in /dev/cpu
+ */
+void for_every_cpu(void (func)(int))
+{
+	FILE *fp;
+	int retval;
+
+	fp = fopen(proc_stat, "r");
+	if (fp == NULL) {
+		perror(proc_stat);
+		exit(1);
+	}
+
+	retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
+	if (retval != 0) {
+		perror("/proc/stat format");
+		exit(1);
+	}
+
+	while (1) {
+		int cpu;
+
+		retval = fscanf(fp,
+			"cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n",
+			&cpu);
+		if (retval != 1)
+			return;
+
+		func(cpu);
+	}
+	fclose(fp);
+}
+
+int main(int argc, char **argv)
+{
+	cmdline(argc, argv);
+
+	if (verbose > 1)
+		printf("x86_energy_perf_policy Nov 24, 2010"
+				" - Len Brown <lenb@kernel.org>\n");
+	if (verbose > 1 && !read_only)
+		printf("new_bias %lld\n", new_bias);
+
+	validate_cpuid();
+
+	if (cpu != -1) {
+		if (read_only)
+			print_msr(cpu);
+		else
+			update_msr(cpu);
+	} else {
+		if (read_only)
+			for_every_cpu(print_msr);
+		else
+			for_every_cpu(update_msr);
+	}
+
+	return 0;
+}


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] tools: create power/x86/x86_energy_perf_policy
  2010-11-24  5:31       ` [PATCH v2] tools: create power/x86/x86_energy_perf_policy Len Brown
@ 2010-11-25  5:52         ` Chen Gong
  2010-11-25  8:59           ` Chen Gong
  0 siblings, 1 reply; 26+ messages in thread
From: Chen Gong @ 2010-11-25  5:52 UTC (permalink / raw)
  To: Len Brown; +Cc: Greg Kroah-Hartman, linux-pm, linux-kernel, linux-acpi, x86

于 11/24/2010 1:31 PM, Len Brown 写道:
> From: Len Brown<len.brown@intel.com>
>
> MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
> It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
> It is expected to become increasingly important in subsequent generations.
>
> x86_energy_perf_policy is a user-space utility to set this
> hardware energy vs performance policy hint in the processor.
> Most systems would benefit from "x86_energy_perf_policy normal"
> at system startup, as the hardware default is maximum performance
> at the expense of energy efficiency.
>
> Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
> if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
> though the kernel does not actually program the MSR.
>
> In March, Venkatesh Pallipadi proposed a small driver
> that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
> the cpufreq governor in use.  It also offered
> a boot-time cmdline option to override.
> http://lkml.org/lkml/2010/3/4/457
> But hiding the hardware policy behind the
> governor choice was deemed "kinda icky".
>
> So in June, I proposed a generic user/kernel API to
> consolidate the power/performance policy trade-off.
> "RFC: /sys/power/policy_preference"
> http://lkml.org/lkml/2010/6/16/399
> That is my preference for implementing this capability,
> but I received no support on the list.
>
> So in September, I sent x86_energy_perf_policy.c to LKML,
> a user-space utility that scribbles directly to the MSR.
> http://lkml.org/lkml/2010/9/28/246
>
> Here is the same utility re-sent, this time proposed
> to reside in the kernel tools directory.
>
> Signed-off-by: Len Brown<len.brown@intel.com>
> ---
> v2
> create man page
> minor tweaks in response to review comments
>
> tools/power/x86/x86_energy_perf_policy/Makefile    |    8 +
>   .../x86_energy_perf_policy.8                       |  104 +++++++
>   .../x86_energy_perf_policy.c                       |  325 ++++++++++++++++++++
>
> diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile b/tools/power/x86/x86_energy_perf_policy/Makefile
> new file mode 100644
> index 0000000..f458237
> --- /dev/null
> +++ b/tools/power/x86/x86_energy_perf_policy/Makefile
> @@ -0,0 +1,8 @@
> +x86_energy_perf_policy : x86_energy_perf_policy.c
> +
> +clean :
> +	rm -f x86_energy_perf_policy
> +
> +install :
> +	install x86_energy_perf_policy /usr/bin/
> +	install x86_energy_perf_policy.8 /usr/share/man/man8/
> diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8 b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
> new file mode 100644
> index 0000000..8eaaad6
> --- /dev/null
> +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
> @@ -0,0 +1,104 @@
> +.\"  This page Copyright (C) 2010 Len Brown<len.brown@intel.com>
> +.\"  Distributed under the GPL, Copyleft 1994.
> +.TH X86_ENERGY_PERF_POLICY 8
> +.SH NAME
> +x86_energy_perf_policy \- read or write MSR_IA32_ENERGY_PERF_BIAS
> +.SH SYNOPSIS
> +.ft B
> +.B x86_energy_perf_policy
> +.RB [ "\-c cpu" ]
> +.RB [ "\-v" ]
> +.RB "\-r"
> +.br
> +.B x86_energy_perf_policy
> +.RB [ "\-c cpu" ]
> +.RB [ "\-v" ]
> +.RB 'performance'
> +.br
> +.B x86_energy_perf_policy
> +.RB [ "\-c cpu" ]
> +.RB [ "\-v" ]
> +.RB 'normal'
> +.br
> +.B x86_energy_perf_policy
> +.RB [ "\-c cpu" ]
> +.RB [ "\-v" ]
> +.RB 'powersave'
> +.br
> +.B x86_energy_perf_policy
> +.RB [ "\-c cpu" ]
> +.RB [ "\-v" ]
> +.RB n
> +.br
> +.SH DESCRIPTION
> +\fBx86_energy_perf_policy\fP
> +allows software to convey
> +its policy for the relative importance of performance
> +versus energy savings to the processor.
> +
> +The processor uses this information in model-specific ways
> +when it must select trade-offs between performance and
> +energy efficiency.
> +
> +This policy hint does not supersede Processor Performance states
> +(P-states) or CPU Idle power states (C-states), but allows
> +software to have influence where it would otherwise be unable
> +to express a preference.
> +
> +For example, this setting may tell the hardware how
> +aggressively or conservatively to control frequency
> +in the "turbo range" above the explicitly OS-controlled
> +P-state frequency range.  It may also tell the hardware
> +how aggressively is should enter the OS requested C-states.
> +
> +Support for this feature is indicated by CPUID.06H.ECX.bit3
> +per the Intel Architectures Software Developer's Manual.
> +
> +.SS Options
> +\fB-c\fP limits operation to a single CPU.
> +The default is to operate on all CPUs.
> +Note that MSR_IA32_ENERGY_PERF_BIAS is defined per
> +logical processor, but that the initial implementations
> +of the MSR were shared among all processors in each package.
> +.PP
> +\fB-v\fP increases verbosity.  By default
> +x86_energy_perf_policy is silent.
> +.PP
> +\fB-r\fP is for "read-only" mode - the unchanged state
> +is read and displayed.
> +.PP
> +.I performance
> +Set a policy where performance is paramount.
> +The processor will be unwilling to sacrifice any performance
> +for the sake of energy saving. This is the hardware default.
> +.PP
> +.I normal
> +Set a policy with a normal balance between performance and energy efficiency.
> +The processor will tolerate minor performance compromise
> +for potentially significant energy savings.
> +This reasonable default for most desktops and servers.
> +.PP
> +.I powersave
> +Set a policy where the processor can accept
> +a measurable performance hit to maximize energy efficiency.
> +.PP
> +.I n
> +Set MSR_IA32_ENERGY_PERF_BIAS to the specified number.
> +The range of valid numbers is 0-15, where 0 is maximum
> +performance and 15 is maximum energy efficiency.
> +
> +.SH NOTES
> +.B "x86_energy_perf_policy "
> +runs only as root.
> +.SH FILES
> +.ta
> +.nf
> +/dev/cpu/*/msr
> +.fi
> +
> +.SH "SEE ALSO"
> +msr(4)
> +.PP
> +.SH AUTHORS
> +.nf
> +Written by Len Brown<len.brown@intel.com>
> diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
> new file mode 100644
> index 0000000..b539923
> --- /dev/null
> +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
> @@ -0,0 +1,325 @@
> +/*
> + * x86_energy_perf_policy -- set the energy versus performance
> + * policy preference bias on recent X86 processors.
> + */
> +/*
> + * Copyright (c) 2010, Intel Corporation.
> + * Len Brown<len.brown@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +
> +#include<stdio.h>
> +#include<unistd.h>
> +#include<sys/types.h>
> +#include<sys/stat.h>
> +#include<sys/resource.h>
> +#include<fcntl.h>
> +#include<signal.h>
> +#include<sys/time.h>
> +#include<stdlib.h>
> +#include<string.h>
> +
> +unsigned int verbose;		/* set with -v */
> +unsigned int read_only;		/* set with -r */
> +char *progname;
> +unsigned long long new_bias;
> +int cpu = -1;
> +
> +/*
> + * Usage:
> + *
> + * -c cpu: limit action to a single CPU (default is all CPUs)
> + * -v: verbose output (can invoke more than once)
> + * -r: read-only, don't change any settings
> + *
> + *  performance
> + *	Performance is paramount.
> + *	Unwilling to sacrafice any performance
> + *	for the sake of energy saving. (hardware default)
> + *
> + *  normal
> + *	Can tolerate minor performance compromise
> + *	for potentially significant energy savings.
> + *	(reasonable default for most desktops and servers)
> + *
> + *  powersave
> + *	Can tolerate significant performance hit
> + *	to maximize energy savings.
> + *
> + * n
> + *	a numerical value to write to the underlying MSR.
> + */
> +void usage(void)
> +{
> +	printf("%s: [-c cpu] [-v] "
> +		"(-r | 'performance' | 'normal' | 'powersave' | n)\n",
> +		progname);
> +	exit(1);
> +}
> +
> +#define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
> +
> +#define	BIAS_PERFORMANCE		0
> +#define BIAS_BALANCE			6
> +#define	BIAS_POWERSAVE			15
> +
> +void cmdline(int argc, char **argv)
> +{
> +	int opt;
> +
> +	progname = argv[0];
> +
> +	while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
> +		switch (opt) {
> +		case 'c':
> +			cpu = atoi(optarg);
> +			break;
> +		case 'r':
> +			read_only = 1;
> +			break;
> +		case 'v':
> +			verbose++;
> +			break;
> +		default:
> +			usage();
> +		}
> +	}
> +	/* if -r, then should be no additional optind */
> +	if (read_only&&  (argc>  optind))
> +		usage();
> +
> +	/*
> +	 * if no -r , then must be one additional optind
> +	 */
> +	if (!read_only) {
> +
> +		if (argc != optind + 1) {
> +			printf("must supply -r or policy param\n");
> +			usage();
> +			}
> +
> +		if (!strcmp("performance", argv[optind])) {
> +			new_bias = BIAS_PERFORMANCE;
> +		} else if (!strcmp("normal", argv[optind])) {
> +			new_bias = BIAS_BALANCE;
> +		} else if (!strcmp("powersave", argv[optind])) {
> +			new_bias = BIAS_POWERSAVE;
> +		} else {
> +			char *endptr;
> +
> +			new_bias = strtoull(argv[optind],&endptr, 0);
> +			if (endptr == argv[optind] ||
> +				new_bias>  BIAS_POWERSAVE) {
> +					fprintf(stderr, "invalid value: %s\n",
> +						argv[optind]);
> +				usage();
> +			}
> +		}
> +	}
> +}
> +
> +/*
> + * validate_cpuid()
> + * returns on success, quietly exits on failure (make verbose with -v)
> + */
> +void validate_cpuid(void)
> +{
> +	unsigned int eax, ebx, ecx, edx, max_level;
> +	char brand[16];
> +	unsigned int fms, family, model, stepping;
> +
> +	eax = ebx = ecx = edx = 0;
> +
> +	asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
> +		"=d" (edx) : "a" (0));
> +
> +	if (ebx != 0x756e6547 || edx != 0x49656e69 || ecx != 0x6c65746e) {
> +		if (verbose)
> +			fprintf(stderr, "%.4s%.4s%.4s != GenuineIntel",
> +				(char *)&ebx, (char *)&edx, (char *)&ecx);
> +		exit(1);
> +	}
> +
> +	asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
> +	family = (fms>>  8)&  0xf;
> +	model = (fms>>  4)&  0xf;
> +	stepping = fms&  0xf;
> +	if (family == 6 || family == 0xf)
> +		model += ((fms>>  16)&  0xf)<<  4;
> +
> +	if (verbose>  1)
> +		printf("CPUID %s %d levels family:model:stepping "
> +			"0x%x:%x:%x (%d:%d:%d)\n", brand, max_level,
> +			family, model, stepping, family, model, stepping);
> +
> +	if (!(edx&  (1<<  5))) {
> +		if (verbose)
> +			printf("CPUID: no MSR\n");
> +		exit(1);
> +	}
> +
> +	/*
> +	 * Support for MSR_IA32_ENERGY_PERF_BIAS
> +	 * is indicated by CPUID.06H.ECX.bit3
> +	 */
> +	asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a" (6));
> +	if (verbose)
> +		printf("CPUID.06H.ECX: 0x%x\n", ecx);
> +	if (!(ecx&  (1<<  3))) {
> +		if (verbose)
> +			printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n");
> +		exit(1);
> +	}
> +	return;	/* success */
> +}
> +
> +unsigned long long get_msr(int cpu, int offset)
> +{
> +	unsigned long long msr;
> +	char msr_path[32];
> +	int retval;
> +	int fd;
> +
> +	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
> +	fd = open(msr_path, O_RDONLY);
> +	if (fd<  0) {
> +		printf("Try \"# modprobe msr\"\n");
> +		perror(msr_path);
> +		exit(1);
> +	}
> +
> +	retval = pread(fd,&msr, sizeof msr, offset);
> +
> +	if (retval != sizeof msr) {
> +		printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
> +		exit(-2);
> +	}
> +	close(fd);
> +	return msr;
> +}
> +
> +unsigned long long  put_msr(int cpu, unsigned long long new_msr, int offset)
> +{
> +	unsigned long long old_msr;
> +	char msr_path[32];
> +	int retval;
> +	int fd;
> +
> +	sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
> +	fd = open(msr_path, O_RDWR);
> +	if (fd<  0) {
> +		perror(msr_path);
> +		exit(1);
> +	}
> +
> +	retval = pread(fd,&old_msr, sizeof old_msr, offset);
> +	if (retval != sizeof old_msr) {
> +		perror("pwrite");
> +		printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
> +		exit(-2);
> +	}
> +
> +	retval = pwrite(fd,&new_msr, sizeof new_msr, offset);
> +	if (retval != sizeof new_msr) {
> +		perror("pwrite");
> +		printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval);
> +		exit(-2);
> +	}
> +
> +	close(fd);
> +
> +	return old_msr;
> +}
> +
> +void print_msr(int cpu)
> +{
> +	printf("cpu%d: 0x%016llx\n",
> +		cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS));
> +}
> +
> +void update_msr(int cpu)
> +{
> +	unsigned long long previous_msr;
> +
> +	previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS);
> +
> +	if (verbose)
> +		printf("cpu%d  msr0x%x 0x%016llx ->  0x%016llx\n",
> +			cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias);
> +
> +	return;
> +}
> +
> +char *proc_stat = "/proc/stat";
> +/*
> + * run func() on every cpu in /dev/cpu
> + */
> +void for_every_cpu(void (func)(int))
> +{
> +	FILE *fp;
> +	int retval;
> +
> +	fp = fopen(proc_stat, "r");
> +	if (fp == NULL) {
> +		perror(proc_stat);
> +		exit(1);
> +	}
> +
> +	retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
> +	if (retval != 0) {
> +		perror("/proc/stat format");
> +		exit(1);
> +	}
> +
> +	while (1) {
> +		int cpu;
> +
> +		retval = fscanf(fp,
> +			"cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n",
> +			&cpu);
> +		if (retval != 1)
> +			return;
> +
> +		func(cpu);
> +	}
> +	fclose(fp);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	cmdline(argc, argv);
> +
> +	if (verbose>  1)
> +		printf("x86_energy_perf_policy Nov 24, 2010"
> +				" - Len Brown<lenb@kernel.org>\n");
> +	if (verbose>  1&&  !read_only)
> +		printf("new_bias %lld\n", new_bias);
> +
> +	validate_cpuid();
> +
> +	if (cpu != -1) {
> +		if (read_only)
> +			print_msr(cpu);
> +		else
> +			update_msr(cpu);
> +	} else {
> +		if (read_only)
> +			for_every_cpu(print_msr);
> +		else
> +			for_every_cpu(update_msr);
> +	}
> +
> +	return 0;
> +}
>
I have 2 questions.

1. the usage looks too simple. If I haven't read the comments
in the source codes, I even can't know the exact meaning of these
parameters. Such as -v, -vv etc. How about adding the comments
as the part of the usage ?

2. the paramter "noraml | performance | powersave | n" looks weird.
why it can't look like other paramter (-r, -v etc.). For example,
I can't use it such as
"./x86_energy_perf_policy  -c 0  normal -v"

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] tools: create power/x86/x86_energy_perf_policy
  2010-11-25  5:52         ` Chen Gong
@ 2010-11-25  8:59           ` Chen Gong
  0 siblings, 0 replies; 26+ messages in thread
From: Chen Gong @ 2010-11-25  8:59 UTC (permalink / raw)
  To: Chen Gong
  Cc: Len Brown, Greg Kroah-Hartman, linux-pm, linux-kernel, linux-acpi, x86

于 11/25/2010 1:52 PM, Chen Gong 写道:
> 于 11/24/2010 1:31 PM, Len Brown 写道:
>> From: Len Brown<len.brown@intel.com>
>>
>> MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
>> It is implemented in all Sandy Bridge processors -- mobile, desktop
>> and server.
>> It is expected to become increasingly important in subsequent
>> generations.
>>
>> x86_energy_perf_policy is a user-space utility to set this
>> hardware energy vs performance policy hint in the processor.
>> Most systems would benefit from "x86_energy_perf_policy normal"
>> at system startup, as the hardware default is maximum performance
>> at the expense of energy efficiency.
>>
>> Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
>> if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
>> though the kernel does not actually program the MSR.
>>
>> In March, Venkatesh Pallipadi proposed a small driver
>> that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
>> the cpufreq governor in use. It also offered
>> a boot-time cmdline option to override.
>> http://lkml.org/lkml/2010/3/4/457
>> But hiding the hardware policy behind the
>> governor choice was deemed "kinda icky".
>>
>> So in June, I proposed a generic user/kernel API to
>> consolidate the power/performance policy trade-off.
>> "RFC: /sys/power/policy_preference"
>> http://lkml.org/lkml/2010/6/16/399
>> That is my preference for implementing this capability,
>> but I received no support on the list.
>>
>> So in September, I sent x86_energy_perf_policy.c to LKML,
>> a user-space utility that scribbles directly to the MSR.
>> http://lkml.org/lkml/2010/9/28/246
>>
>> Here is the same utility re-sent, this time proposed
>> to reside in the kernel tools directory.
>>
>> Signed-off-by: Len Brown<len.brown@intel.com>
>> ---
>> v2
>> create man page
>> minor tweaks in response to review comments
>>
>> tools/power/x86/x86_energy_perf_policy/Makefile | 8 +
>> .../x86_energy_perf_policy.8 | 104 +++++++
>> .../x86_energy_perf_policy.c | 325 ++++++++++++++++++++
>>
>> diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile
>> b/tools/power/x86/x86_energy_perf_policy/Makefile
>> new file mode 100644
>> index 0000000..f458237
>> --- /dev/null
>> +++ b/tools/power/x86/x86_energy_perf_policy/Makefile
>> @@ -0,0 +1,8 @@
>> +x86_energy_perf_policy : x86_energy_perf_policy.c
>> +
>> +clean :
>> + rm -f x86_energy_perf_policy
>> +
>> +install :
>> + install x86_energy_perf_policy /usr/bin/
>> + install x86_energy_perf_policy.8 /usr/share/man/man8/
>> diff --git
>> a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
>> b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
>> new file mode 100644
>> index 0000000..8eaaad6
>> --- /dev/null
>> +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
>> @@ -0,0 +1,104 @@
>> +.\" This page Copyright (C) 2010 Len Brown<len.brown@intel.com>
>> +.\" Distributed under the GPL, Copyleft 1994.
>> +.TH X86_ENERGY_PERF_POLICY 8
>> +.SH NAME
>> +x86_energy_perf_policy \- read or write MSR_IA32_ENERGY_PERF_BIAS
>> +.SH SYNOPSIS
>> +.ft B
>> +.B x86_energy_perf_policy
>> +.RB [ "\-c cpu" ]
>> +.RB [ "\-v" ]
>> +.RB "\-r"
>> +.br
>> +.B x86_energy_perf_policy
>> +.RB [ "\-c cpu" ]
>> +.RB [ "\-v" ]
>> +.RB 'performance'
>> +.br
>> +.B x86_energy_perf_policy
>> +.RB [ "\-c cpu" ]
>> +.RB [ "\-v" ]
>> +.RB 'normal'
>> +.br
>> +.B x86_energy_perf_policy
>> +.RB [ "\-c cpu" ]
>> +.RB [ "\-v" ]
>> +.RB 'powersave'
>> +.br
>> +.B x86_energy_perf_policy
>> +.RB [ "\-c cpu" ]
>> +.RB [ "\-v" ]
>> +.RB n
>> +.br
>> +.SH DESCRIPTION
>> +\fBx86_energy_perf_policy\fP
>> +allows software to convey
>> +its policy for the relative importance of performance
>> +versus energy savings to the processor.
>> +
>> +The processor uses this information in model-specific ways
>> +when it must select trade-offs between performance and
>> +energy efficiency.
>> +
>> +This policy hint does not supersede Processor Performance states
>> +(P-states) or CPU Idle power states (C-states), but allows
>> +software to have influence where it would otherwise be unable
>> +to express a preference.
>> +
>> +For example, this setting may tell the hardware how
>> +aggressively or conservatively to control frequency
>> +in the "turbo range" above the explicitly OS-controlled
>> +P-state frequency range. It may also tell the hardware
>> +how aggressively is should enter the OS requested C-states.
>> +
>> +Support for this feature is indicated by CPUID.06H.ECX.bit3
>> +per the Intel Architectures Software Developer's Manual.
>> +
>> +.SS Options
>> +\fB-c\fP limits operation to a single CPU.
>> +The default is to operate on all CPUs.
>> +Note that MSR_IA32_ENERGY_PERF_BIAS is defined per
>> +logical processor, but that the initial implementations
>> +of the MSR were shared among all processors in each package.
>> +.PP
>> +\fB-v\fP increases verbosity. By default
>> +x86_energy_perf_policy is silent.
>> +.PP
>> +\fB-r\fP is for "read-only" mode - the unchanged state
>> +is read and displayed.
>> +.PP
>> +.I performance
>> +Set a policy where performance is paramount.
>> +The processor will be unwilling to sacrifice any performance
>> +for the sake of energy saving. This is the hardware default.
>> +.PP
>> +.I normal
>> +Set a policy with a normal balance between performance and energy
>> efficiency.
>> +The processor will tolerate minor performance compromise
>> +for potentially significant energy savings.
>> +This reasonable default for most desktops and servers.
>> +.PP
>> +.I powersave
>> +Set a policy where the processor can accept
>> +a measurable performance hit to maximize energy efficiency.
>> +.PP
>> +.I n
>> +Set MSR_IA32_ENERGY_PERF_BIAS to the specified number.
>> +The range of valid numbers is 0-15, where 0 is maximum
>> +performance and 15 is maximum energy efficiency.
>> +
>> +.SH NOTES
>> +.B "x86_energy_perf_policy "
>> +runs only as root.
>> +.SH FILES
>> +.ta
>> +.nf
>> +/dev/cpu/*/msr
>> +.fi
>> +
>> +.SH "SEE ALSO"
>> +msr(4)
>> +.PP
>> +.SH AUTHORS
>> +.nf
>> +Written by Len Brown<len.brown@intel.com>
>> diff --git
>> a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
>> b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
>> new file mode 100644
>> index 0000000..b539923
>> --- /dev/null
>> +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
>> @@ -0,0 +1,325 @@
>> +/*
>> + * x86_energy_perf_policy -- set the energy versus performance
>> + * policy preference bias on recent X86 processors.
>> + */
>> +/*
>> + * Copyright (c) 2010, Intel Corporation.
>> + * Len Brown<len.brown@intel.com>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
>> License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> along with
>> + * this program; if not, write to the Free Software Foundation, Inc.,
>> + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
>> + */
>> +
>> +#include<stdio.h>
>> +#include<unistd.h>
>> +#include<sys/types.h>
>> +#include<sys/stat.h>
>> +#include<sys/resource.h>
>> +#include<fcntl.h>
>> +#include<signal.h>
>> +#include<sys/time.h>
>> +#include<stdlib.h>
>> +#include<string.h>
>> +
>> +unsigned int verbose; /* set with -v */
>> +unsigned int read_only; /* set with -r */
>> +char *progname;
>> +unsigned long long new_bias;
>> +int cpu = -1;
>> +
>> +/*
>> + * Usage:
>> + *
>> + * -c cpu: limit action to a single CPU (default is all CPUs)
>> + * -v: verbose output (can invoke more than once)
>> + * -r: read-only, don't change any settings
>> + *
>> + * performance
>> + * Performance is paramount.
>> + * Unwilling to sacrafice any performance
>> + * for the sake of energy saving. (hardware default)
>> + *
>> + * normal
>> + * Can tolerate minor performance compromise
>> + * for potentially significant energy savings.
>> + * (reasonable default for most desktops and servers)
>> + *
>> + * powersave
>> + * Can tolerate significant performance hit
>> + * to maximize energy savings.
>> + *
>> + * n
>> + * a numerical value to write to the underlying MSR.
>> + */
>> +void usage(void)
>> +{
>> + printf("%s: [-c cpu] [-v] "
>> + "(-r | 'performance' | 'normal' | 'powersave' | n)\n",
>> + progname);
>> + exit(1);
>> +}
>> +
>> +#define MSR_IA32_ENERGY_PERF_BIAS 0x000001b0
>> +
>> +#define BIAS_PERFORMANCE 0
>> +#define BIAS_BALANCE 6
>> +#define BIAS_POWERSAVE 15
>> +
>> +void cmdline(int argc, char **argv)
>> +{
>> + int opt;
>> +
>> + progname = argv[0];
>> +
>> + while ((opt = getopt(argc, argv, "+rvc:")) != -1) {
>> + switch (opt) {
>> + case 'c':
>> + cpu = atoi(optarg);
>> + break;
>> + case 'r':
>> + read_only = 1;
>> + break;
>> + case 'v':
>> + verbose++;
>> + break;
>> + default:
>> + usage();
>> + }
>> + }
>> + /* if -r, then should be no additional optind */
>> + if (read_only&& (argc> optind))
>> + usage();
>> +
>> + /*
>> + * if no -r , then must be one additional optind
>> + */
>> + if (!read_only) {
>> +
>> + if (argc != optind + 1) {
>> + printf("must supply -r or policy param\n");
>> + usage();
>> + }
>> +
>> + if (!strcmp("performance", argv[optind])) {
>> + new_bias = BIAS_PERFORMANCE;
>> + } else if (!strcmp("normal", argv[optind])) {
>> + new_bias = BIAS_BALANCE;
>> + } else if (!strcmp("powersave", argv[optind])) {
>> + new_bias = BIAS_POWERSAVE;
>> + } else {
>> + char *endptr;
>> +
>> + new_bias = strtoull(argv[optind],&endptr, 0);
>> + if (endptr == argv[optind] ||
>> + new_bias> BIAS_POWERSAVE) {
>> + fprintf(stderr, "invalid value: %s\n",
>> + argv[optind]);
>> + usage();
>> + }
>> + }
>> + }
>> +}
>> +
>> +/*
>> + * validate_cpuid()
>> + * returns on success, quietly exits on failure (make verbose with -v)
>> + */
>> +void validate_cpuid(void)
>> +{
>> + unsigned int eax, ebx, ecx, edx, max_level;
>> + char brand[16];
>> + unsigned int fms, family, model, stepping;
>> +
>> + eax = ebx = ecx = edx = 0;
>> +
>> + asm("cpuid" : "=a" (max_level), "=b" (ebx), "=c" (ecx),
>> + "=d" (edx) : "a" (0));
>> +
>> + if (ebx != 0x756e6547 || edx != 0x49656e69 || ecx != 0x6c65746e) {
>> + if (verbose)
>> + fprintf(stderr, "%.4s%.4s%.4s != GenuineIntel",
>> + (char *)&ebx, (char *)&edx, (char *)&ecx);
>> + exit(1);
>> + }
>> +
>> + asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
>> + family = (fms>> 8)& 0xf;
>> + model = (fms>> 4)& 0xf;
>> + stepping = fms& 0xf;
>> + if (family == 6 || family == 0xf)
>> + model += ((fms>> 16)& 0xf)<< 4;
>> +
>> + if (verbose> 1)
>> + printf("CPUID %s %d levels family:model:stepping "
>> + "0x%x:%x:%x (%d:%d:%d)\n", brand, max_level,
>> + family, model, stepping, family, model, stepping);
>> +
>> + if (!(edx& (1<< 5))) {
>> + if (verbose)
>> + printf("CPUID: no MSR\n");
>> + exit(1);
>> + }
>> +
>> + /*
>> + * Support for MSR_IA32_ENERGY_PERF_BIAS
>> + * is indicated by CPUID.06H.ECX.bit3
>> + */
>> + asm("cpuid" : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) : "a"
>> (6));
>> + if (verbose)
>> + printf("CPUID.06H.ECX: 0x%x\n", ecx);
>> + if (!(ecx& (1<< 3))) {
>> + if (verbose)
>> + printf("CPUID: No MSR_IA32_ENERGY_PERF_BIAS\n");
>> + exit(1);
>> + }
>> + return; /* success */
>> +}
>> +
>> +unsigned long long get_msr(int cpu, int offset)
>> +{
>> + unsigned long long msr;
>> + char msr_path[32];
>> + int retval;
>> + int fd;
>> +
>> + sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
>> + fd = open(msr_path, O_RDONLY);
>> + if (fd< 0) {
>> + printf("Try \"# modprobe msr\"\n");
>> + perror(msr_path);
>> + exit(1);
>> + }
>> +
>> + retval = pread(fd,&msr, sizeof msr, offset);
>> +
>> + if (retval != sizeof msr) {
>> + printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
>> + exit(-2);
>> + }
>> + close(fd);
>> + return msr;
>> +}
>> +
>> +unsigned long long put_msr(int cpu, unsigned long long new_msr, int
>> offset)
>> +{
>> + unsigned long long old_msr;
>> + char msr_path[32];
>> + int retval;
>> + int fd;
>> +
>> + sprintf(msr_path, "/dev/cpu/%d/msr", cpu);
>> + fd = open(msr_path, O_RDWR);
>> + if (fd< 0) {
>> + perror(msr_path);
>> + exit(1);
>> + }
>> +
>> + retval = pread(fd,&old_msr, sizeof old_msr, offset);
>> + if (retval != sizeof old_msr) {
>> + perror("pwrite");
>> + printf("pread cpu%d 0x%x = %d\n", cpu, offset, retval);
>> + exit(-2);
>> + }
>> +
>> + retval = pwrite(fd,&new_msr, sizeof new_msr, offset);
>> + if (retval != sizeof new_msr) {
>> + perror("pwrite");
>> + printf("pwrite cpu%d 0x%x = %d\n", cpu, offset, retval);
>> + exit(-2);
>> + }
>> +
>> + close(fd);
>> +
>> + return old_msr;
>> +}
>> +
>> +void print_msr(int cpu)
>> +{
>> + printf("cpu%d: 0x%016llx\n",
>> + cpu, get_msr(cpu, MSR_IA32_ENERGY_PERF_BIAS));
>> +}
>> +
>> +void update_msr(int cpu)
>> +{
>> + unsigned long long previous_msr;
>> +
>> + previous_msr = put_msr(cpu, new_bias, MSR_IA32_ENERGY_PERF_BIAS);
>> +
>> + if (verbose)
>> + printf("cpu%d msr0x%x 0x%016llx -> 0x%016llx\n",
>> + cpu, MSR_IA32_ENERGY_PERF_BIAS, previous_msr, new_bias);
>> +
>> + return;
>> +}
>> +
>> +char *proc_stat = "/proc/stat";
>> +/*
>> + * run func() on every cpu in /dev/cpu
>> + */
>> +void for_every_cpu(void (func)(int))
>> +{
>> + FILE *fp;
>> + int retval;
>> +
>> + fp = fopen(proc_stat, "r");
>> + if (fp == NULL) {
>> + perror(proc_stat);
>> + exit(1);
>> + }
>> +
>> + retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
>> + if (retval != 0) {
>> + perror("/proc/stat format");
>> + exit(1);
>> + }
>> +
>> + while (1) {
>> + int cpu;
>> +
>> + retval = fscanf(fp,
>> + "cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n",
>> + &cpu);
>> + if (retval != 1)
>> + return;
>> +
>> + func(cpu);
>> + }
>> + fclose(fp);
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> + cmdline(argc, argv);
>> +
>> + if (verbose> 1)
>> + printf("x86_energy_perf_policy Nov 24, 2010"
>> + " - Len Brown<lenb@kernel.org>\n");
>> + if (verbose> 1&& !read_only)
>> + printf("new_bias %lld\n", new_bias);
>> +
>> + validate_cpuid();
>> +
>> + if (cpu != -1) {
>> + if (read_only)
>> + print_msr(cpu);
>> + else
>> + update_msr(cpu);
>> + } else {
>> + if (read_only)
>> + for_every_cpu(print_msr);
>> + else
>> + for_every_cpu(update_msr);
>> + }
>> +
>> + return 0;
>> +}
>>
> I have 2 questions.
>
> 1. the usage looks too simple. If I haven't read the comments
> in the source codes, I even can't know the exact meaning of these
> parameters. Such as -v, -vv etc. How about adding the comments
> as the part of the usage ?
>
> 2. the paramter "noraml | performance | powersave | n" looks weird.
> why it can't look like other paramter (-r, -v etc.). For example,
> I can't use it such as
> "./x86_energy_perf_policy -c 0 normal -v"
> --

One more question. From the spec, it should write 1 to the MSR 0x1FC[18]
to enable this function after setting the Energy Policy on all threads 
in one package.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2010-11-25  8:59 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-16 21:05 RFC: /sys/power/policy_preference Len Brown
2010-06-17  6:03 ` [linux-pm] " Igor.Stoppa
2010-06-17 19:00   ` Len Brown
2010-06-17 16:14 ` Victor Lowther
2010-06-17 19:02   ` Len Brown
2010-06-17 22:23     ` Victor Lowther
2010-06-18  5:56       ` Len Brown
2010-06-18 11:55         ` Victor Lowther
2010-06-19 15:17   ` Vaidyanathan Srinivasan
2010-06-19 19:04     ` Rafael J. Wysocki
2010-06-17 20:48 ` Mike Chan
2010-06-18  6:25   ` Len Brown
2010-06-21 20:10 ` [linux-pm] " Dipankar Sarma
2010-09-28 16:17 ` x86_energy_perf_policy.c Len Brown
2010-10-23  4:40   ` [PATCH] tools: add x86_energy_perf_policy to program MSR_IA32_ENERGY_PERF_BIAS Len Brown
2010-10-27  3:23     ` Andrew Morton
2010-10-27  6:01       ` Ingo Molnar
2010-10-27 11:43         ` Arnaldo Carvalho de Melo
2010-11-15 16:07     ` [PATCH RESEND] tools: add power/x86/x86_energy_perf_policy " Len Brown
2010-11-17 11:35       ` Andi Kleen
2010-11-22 20:13         ` Len Brown
2010-11-22 20:33           ` Andi Kleen
2010-11-23  4:48             ` Len Brown
2010-11-24  5:31       ` [PATCH v2] tools: create power/x86/x86_energy_perf_policy Len Brown
2010-11-25  5:52         ` Chen Gong
2010-11-25  8:59           ` Chen Gong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).