* [PATCH] Documentation: cpufreq: intel_pstate: enhance documentation
@ 2015-12-21 18:05 Srinivas Pandruvada
  2015-12-22 17:26 ` Doug Smythies
  0 siblings, 1 reply; 3+ messages in thread
From: Srinivas Pandruvada @ 2015-12-21 18:05 UTC (permalink / raw)
  To: rafael; +Cc: len.brown, linux-pm, dsmythies, trenn, prarit, Srinivas Pandruvada

This is an attempt to make documentation more user friendly.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 Documentation/cpu-freq/intel-pstate.txt | 183 ++++++++++++++++++++++++--------
 1 file changed, 141 insertions(+), 42 deletions(-)

diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt
index be8d400..592f3d1 100644
--- a/Documentation/cpu-freq/intel-pstate.txt
+++ b/Documentation/cpu-freq/intel-pstate.txt
@@ -1,61 +1,131 @@
-Intel P-state driver
+Intel P-State driver
 --------------------
 
-This driver provides an interface to control the P state selection for
-SandyBridge+ Intel processors.  The driver can operate two different
-modes based on the processor model, legacy mode and Hardware P state (HWP)
-mode.
-
-In legacy mode, the Intel P-state implements two internal governors,
-performance and powersave, that differ from the general cpufreq governors of
-the same name (the general cpufreq governors implement target(), whereas the
-internal Intel P-state governors implement setpolicy()).  The internal
-performance governor sets the max_perf_pct and min_perf_pct to 100; that is,
-the governor selects the highest available P state to maximize the performance
-of the core.  The internal powersave governor selects the appropriate P state
-based on the current load on the CPU.
-
-In HWP mode P state selection is implemented in the processor
-itself. The driver provides the interfaces between the cpufreq core and
-the processor to control P state selection based on user preferences
-and reporting frequency to the cpufreq core.  In this mode the
-internal Intel P-state governor code is disabled.
-
-In addition to the interfaces provided by the cpufreq core for
-controlling frequency the driver provides sysfs files for
-controlling P state selection. These files have been added to
-/sys/devices/system/cpu/intel_pstate/
-
-      max_perf_pct: limits the maximum P state that will be requested by
-      the driver stated as a percentage of the available performance. The
-      available (P states) performance may be reduced by the no_turbo
+This driver provides an interface to control the P-State selection for the
+SandyBridge+ Intel processors.
+
+The following document explains P-States:
+http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf
+As stated in the document, P-State doesn’t exactly mean a frequency. However, for
+the sake of the relationship with cpufreq, P-State and frequency are used
+interchangeably.
+
+Understanding the cpufreq core governors and policies is important before
+discussing more details about the Intel P-State driver. Based on what callbacks
+a cpufreq driver provides to the cpufreq core, it can support two types of
+drivers:
+- with target_index() callback: In this mode, the drivers using cpufreq core
+simply provide the minimum and maximum frequency limits and an additional
+interface target_index() to set the current frequency. The cpufreq subsystem
+has a number of scaling governors ("performance", "powersave", "ondemand",
+etc.). Depending on which governor is in use, cpufreq core will call for
+transitions to a specific frequency using target_index() callback.
+- setpolicy() callback: In this mode, drivers do not provide target_index()
+callback, so cpufreq core can't request a transition to a specific frequency.
+The driver provides minimum and maximum frequency limits and callbacks to set a
+policy. The policy in cpufreq sysfs is referred to as the "scaling governor".
+The cpufreq core can request the driver to operate in either of two policies:
+"performance" and "powersave". The driver decides which frequency to use based
+on the above policy selection considering minimum and maximum frequency limits.
+
+The Intel P-State driver falls under the latter category, which implements the
+setpolicy() callback. This driver decides what P-State to use based on the
+requested policy from the cpufreq core. If the processor is capable of
+selecting its next P-State internally, then the driver will offload this
+responsibility to the processor (aka HWP: Hardware P-States). If not, the
+driver implements algorithms to select the next P-State.
+
+Since these policies are implemented in the driver, they are not the same as
+the cpufreq scaling governors implementation, even if they have the same name
+in the cpufreq sysfs (scaling_governor). For example, the "performance" policy
+is similar to cpufreq’s "performance" governor, but "powersave" is completely
+different from the cpufreq "powersave" governor. The strategy here is similar
+to cpufreq "ondemand", where the requested P-State is related to the system load.
+
+Sysfs Interface
+
+In addition to the frequency-controlling interfaces provided by the cpufreq
+core, the driver provides its own sysfs files to control the P-State selection.
+These files have been added to /sys/devices/system/cpu/intel_pstate/.
+Any changes made to these files are applicable to all CPUs (even in a
+multi-package system).
+
+      max_perf_pct: Limits the maximum P-State that will be requested by
+      the driver. It is stated as a percentage of the available performance. The
+      available (P-State) performance may be reduced by the no_turbo
       setting described below.
 
-      min_perf_pct: limits the minimum P state that will be  requested by
-      the driver stated as a percentage of the max (non-turbo)
+      min_perf_pct: Limits the minimum P-State that will be requested by
+      the driver. It is stated as a percentage of the max (non-turbo)
       performance level.
 
-      no_turbo: limits the driver to selecting P states below the turbo
+      no_turbo: Limits the driver to selecting P-States below the turbo
       frequency range.
 
-      turbo_pct: displays the percentage of the total performance that
-      is supported by hardware that is in the turbo range.  This number
+      turbo_pct: Displays the percentage of the total performance that
+      is supported by hardware that is in the turbo range. This number
       is independent of whether turbo has been disabled or not.
 
-      num_pstates: displays the number of pstates that are supported
-      by hardware.  This number is independent of whether turbo has
+      num_pstates: Displays the number of P-States that are supported
+      by hardware. This number is independent of whether turbo has
       been disabled or not.
 
+For example, if a system has these parameters:
+	Max 1 core turbo ratio: 0x21 (Max 1 core ratio is the maximum P-State)
+	Max non turbo ratio: 0x17
+	Minimum ratio: 0x08 (Here the ratio is called max efficiency ratio)
+
+Sysfs will show:
+	max_perf_pct:100, which corresponds to max 1 core ratio
+	min_perf_pct:24 = max efficiency ratio / max 1 core ratio
+	no_turbo:0, turbo is not disabled
+	num_pstates:26 = (max 1 core ratio - max efficiency ratio + 1)
+	turbo_pct:39 = (max 1 core ratio - max non turbo ratio) / num_pstates
+
+Refer to "Intel® 64 and IA-32 Architectures Software Developer’s Manual
+Volume 3: System Programming Guide" to understand ratios.
+
+cpufreq sysfs for Intel P-State
+
+Since this driver registers with cpufreq, cpufreq sysfs is also presented.
+There are some important differences, which need to be considered.
+
+scaling_cur_freq: This displays the real frequency which was used during
+the last sample period instead of what is requested. Some other cpufreq drivers,
+like acpi-cpufreq, display what is requested (some changes are on the
+way to fix this for the acpi-cpufreq driver). The same is true for frequencies
+displayed in /proc/cpuinfo.
+
+scaling_governor: This displays current active policy. Since each CPU has a
+cpufreq sysfs, it is possible to set a scaling governor to each CPU. But this
+is not possible with Intel P-States, as there is one common policy for all
+CPUs. Here, the last requested policy will be applicable to all CPUs. It is
+suggested that use the cpupower utility to change policy to all CPUs at the
+same time.
+
+scaling_setspeed: This attribute can never be used with Intel P-State.
+
+scaling_max_freq/scaling_min_freq: This interface can be used similarly to
+the max_perf_pct/min_perf_pct of Intel P-State sysfs. However, since frequencies
+are converted to the nearest possible P-State, this is prone to rounding errors.
+This is not the preferred method for limiting performance.
+
+affected_cpus: Not used
+related_cpus: Not used
+
 For contemporary Intel processors, the frequency is controlled by the
-processor itself and the P-states exposed to software are related to
+processor itself and the P-States exposed to software are related to
 performance levels.  The idea that frequency can be set to a single
-frequency is fiction for Intel Core processors. Even if the scaling
-driver selects a single P state the actual frequency the processor
+frequency is fictional for Intel Core processors. Even if the scaling
+driver selects a single P-State, the actual frequency the processor
 will run at is selected by the processor itself.
 
-For legacy mode debugfs files have also been added to allow tuning of
-the internal governor algorythm. These files are located at
-/sys/kernel/debug/pstate_snb/ These files are NOT present in HWP mode.
+Tuning Intel P-State driver
+
+When HWP mode is not used, debugfs files have also been added to allow the
+tuning of the internal governor algorithm. These files are located at
+/sys/kernel/debug/pstate_snb/. The algorithm uses a PID (proportional-
+integral-derivative) controller. The PID tunable parameters are:
 
       deadband
       d_gain_pct
@@ -63,3 +133,32 @@ the internal governor algorythm. These files are located at
       p_gain_pct
       sample_rate_ms
       setpoint
+
+To adjust these parameters, some understanding of the driver implementation is
+necessary. There are some tweaks described here, but be very careful. Adjusting
+them requires expert-level understanding of the power/performance relationship.
+These parameters are only useful when the "powersave" policy is active.
+
+- To make the system more responsive to load changes, sample_rate_ms can
+be adjusted (current default is 10ms).
+- To make the system use higher performance, even if the load is lower, setpoint
+can be adjusted to a lower number.
+If there are no derivative and integral coefficients, the next P-State will be
+equal to:
+	current P-State - ((setpoint - current cpu load) * p_gain_pct / 100)
+
+For example, if the current PID parameters are:
+      deadband = 0
+      d_gain_pct = 0
+      i_gain_pct = 0
+      p_gain_pct = 20
+      sample_rate_ms = 10
+      setpoint = 80
+
+If the current P-State = 0x08 and current load = 100, this will result in the
+next P-State = 0x08 - ((80 - 100) * 0.2) = 12
+For the same load at setpoint = 60 this will result in the next P-State
+= 0x08 - ((60 - 100) * 0.2) = 16
+So by changing the setpoint from 80 to 60, there is an increase of the
+next P-State from 12 to 16. So this will make processor to execute at
+higher P-State for the same CPU load.
-- 
2.4.3
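
The sysfs example in the patch can be cross-checked with a short sketch.
It only redoes the arithmetic from the three ratios given in the text; the
function name and the rounding choices (truncation for min_perf_pct,
rounding up for turbo_pct) are assumptions picked to reproduce the quoted
24 and 39, and are not taken from the driver source.

```python
import math

# Ratios from the example in the patch.
MAX_TURBO_RATIO = 0x21       # max 1 core turbo ratio (maximum P-State)
MAX_NON_TURBO_RATIO = 0x17   # max non turbo ratio
MIN_RATIO = 0x08             # max efficiency ratio


def expected_sysfs(turbo, non_turbo, minimum):
    """Recompute the example's sysfs values (hypothetical helper)."""
    num_pstates = turbo - minimum + 1
    # Assumption: truncating division, which yields the 24 in the example.
    min_perf_pct = minimum * 100 // turbo
    # Assumption: rounded up; the exact quotient is about 38.46.
    turbo_pct = math.ceil((turbo - non_turbo) * 100 / num_pstates)
    return {
        "num_pstates": num_pstates,
        "min_perf_pct": min_perf_pct,
        "turbo_pct": turbo_pct,
    }


values = expected_sysfs(MAX_TURBO_RATIO, MAX_NON_TURBO_RATIO, MIN_RATIO)
for name, value in values.items():
    print(f"{name}:{value}")
```

With the ratios above this gives num_pstates:26, min_perf_pct:24 and
turbo_pct:39, matching the figures in the example.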

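The setpoint example at the end of the patch can likewise be written as a
small sketch. It covers only the proportional term, as in the text
(d_gain_pct = i_gain_pct = deadband = 0), treats p_gain_pct as a percentage
(20 meaning 0.2, as the worked numbers do), and does not model clamping to
the minimum and maximum P-State. The function name is hypothetical.

```python
def next_pstate(current_pstate, load, setpoint, p_gain_pct):
    """Proportional-only step; derivative and integral gains assumed zero."""
    error = setpoint - load
    # p_gain_pct is a percentage, so 20 scales the error by 0.2.
    return current_pstate - error * p_gain_pct / 100


# Same load (100) and starting P-State (0x08), two different setpoints.
at_80 = next_pstate(0x08, load=100, setpoint=80, p_gain_pct=20)
at_60 = next_pstate(0x08, load=100, setpoint=60, p_gain_pct=20)
print(at_80, at_60)
```

This yields 12 for setpoint = 80 and 16 for setpoint = 60, so lowering the
setpoint raises the next P-State for the same load.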


* RE: [PATCH] Documentation: cpufreq: intel_pstate: enhance documentation
  2015-12-21 18:05 [PATCH] Documentation: cpufreq: intel_pstate: enhance documentation Srinivas Pandruvada
@ 2015-12-22 17:26 ` Doug Smythies
  2015-12-22 17:48   ` Srinivas Pandruvada
  0 siblings, 1 reply; 3+ messages in thread
From: Doug Smythies @ 2015-12-22 17:26 UTC (permalink / raw)
  To: 'Srinivas Pandruvada'
  Cc: len.brown, linux-pm, trenn, prarit, rafael, Doug Smythies

Hi Srinivas,
Just two typos.

On 2015.12.21 10:05 Srinivas Pandruvada wrote:

> +scaling_governor: This displays current active policy. Since each CPU has a
> +cpufreq sysfs, it is possible to set a scaling governor to each CPU. But this
> +is not possible with Intel P-States, as there is one common policy for all
> +CPUs. Here, the last requested policy will be applicable to all CPUs. It is

> >suggested that use the cpupower utility to change policy to all CPUs at the

+suggested that one use the cpupower utility to change policy to all CPUs at the

> +same time.
> +
> +scaling_setspeed: This attribute can never be used with Intel P-State.

> +      setpoint = 80
> +
> +If the current P-State = 0x08 and current load = 100, this will result in the
> +next P-State = 0x08 - ((80 - 100) * 0.2) = 12
> +For the same load at setpoint = 60 this will result in the next P-State
> += 0x08 - ((60 - 100) * 0.2) = 16
> +So by changing the setpoint from 80 to 60, there is an increase of the

> >next P-State from 12 to 16. So this will make processor to execute at

+ next P-State from 12 to 16. So this will make the processor execute at

> +higher P-State for the same CPU load.




* Re: [PATCH] Documentation: cpufreq: intel_pstate: enhance documentation
  2015-12-22 17:26 ` Doug Smythies
@ 2015-12-22 17:48   ` Srinivas Pandruvada
  0 siblings, 0 replies; 3+ messages in thread
From: Srinivas Pandruvada @ 2015-12-22 17:48 UTC (permalink / raw)
  To: Doug Smythies; +Cc: len.brown, linux-pm, trenn, prarit, rafael

On Tue, 2015-12-22 at 09:26 -0800, Doug Smythies wrote:
> Hi Srinivas,
Hi Doug,

Thanks for the review. Let me know if we can make any other
improvements.

Thanks,
Srinivas

