All of lore.kernel.org
 help / color / mirror / Atom feed
From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: rafael@kernel.org
Cc: len.brown@intel.com, linux-pm@vger.kernel.org,
	dsmythies@telus.net, trenn@suse.de, prarit@redhat.com,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Subject: [PATCH] Documentation: cpufreq: intel_pstate: enhance documentation
Date: Mon, 21 Dec 2015 10:05:17 -0800	[thread overview]
Message-ID: <1450721117-7620-1-git-send-email-srinivas.pandruvada@linux.intel.com> (raw)

This is an attempt to make documentation more user friendly.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 Documentation/cpu-freq/intel-pstate.txt | 183 ++++++++++++++++++++++++--------
 1 file changed, 141 insertions(+), 42 deletions(-)

diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt
index be8d400..592f3d1 100644
--- a/Documentation/cpu-freq/intel-pstate.txt
+++ b/Documentation/cpu-freq/intel-pstate.txt
@@ -1,61 +1,131 @@
-Intel P-state driver
+Intel P-State driver
 --------------------
 
-This driver provides an interface to control the P state selection for
-SandyBridge+ Intel processors.  The driver can operate two different
-modes based on the processor model, legacy mode and Hardware P state (HWP)
-mode.
-
-In legacy mode, the Intel P-state implements two internal governors,
-performance and powersave, that differ from the general cpufreq governors of
-the same name (the general cpufreq governors implement target(), whereas the
-internal Intel P-state governors implement setpolicy()).  The internal
-performance governor sets the max_perf_pct and min_perf_pct to 100; that is,
-the governor selects the highest available P state to maximize the performance
-of the core.  The internal powersave governor selects the appropriate P state
-based on the current load on the CPU.
-
-In HWP mode P state selection is implemented in the processor
-itself. The driver provides the interfaces between the cpufreq core and
-the processor to control P state selection based on user preferences
-and reporting frequency to the cpufreq core.  In this mode the
-internal Intel P-state governor code is disabled.
-
-In addition to the interfaces provided by the cpufreq core for
-controlling frequency the driver provides sysfs files for
-controlling P state selection. These files have been added to
-/sys/devices/system/cpu/intel_pstate/
-
-      max_perf_pct: limits the maximum P state that will be requested by
-      the driver stated as a percentage of the available performance. The
-      available (P states) performance may be reduced by the no_turbo
+This driver provides an interface to control the P-State selection for the
+SandyBridge+ Intel processors.
+
+The following document explains P-States:
+http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf
+As stated in the document, P-State doesn’t exactly mean a frequency. However, for
+the sake of the relationship with cpufreq, P-State and frequency are used
+interchangeably.
+
+Understanding the cpufreq core governors and policies are important before
+discussing more details about the Intel P-State driver. Based on what callbacks
+a cpufreq driver provides to the cpufreq core, it can support two types of
+drivers:
+- with target_index() callback: In this mode, the drivers using cpufreq core
+simply provide the minimum and maximum frequency limits and an additional
+interface target_index() to set the current frequency. The cpufreq subsystem
+has a number of scaling governors ("performance", "powersave", "ondemand",
+etc.). Depending on which governor is in use, cpufreq core will call for
+transitions to a specific frequency using target_index() callback.
+- setpolicy() callback: In this mode, drivers do not provide target_index()
+callback, so cpufreq core can't request a transition to a specific frequency.
+The driver provides minimum and maximum frequency limits and callbacks to set a
+policy. The policy in cpufreq sysfs is referred to as the "scaling governor".
+The cpufreq core can request the driver to operate in any of the two policies:
+"performance: and "powersave". The driver decides which frequency to use based
+on the above policy selection considering minimum and maximum frequency limits.
+
+The Intel P-State driver falls under the latter category, which implements the
+setpolicy() callback. This driver decides what P-State to use based on the
+requested policy from the cpufreq core. If the processor is capable of
+selecting its next P-State internally, then the driver will offload this
+responsibility to the processor (aka HWP: Hardware P-States). If not, the
+driver implements algorithms to select the next P-State.
+
+Since these policies are implemented in the driver, they are not same as the
+cpufreq scaling governors implementation, even if they have the same name in
+the cpufreq sysfs (scaling_governors). For example the "performance" policy is
+similar to cpufreq’s "performance" governor, but "powersave" is completely
+different than the cpufreq "powersave" governor. The strategy here is similar
+to cpufreq "ondemand", where the requested P-State is related to the system load.
+
+Sysfs Interface
+
+In addition to the frequency-controlling interfaces provided by the cpufreq
+core, the driver provides its own sysfs files to control the P-State selection.
+These files have been added to /sys/devices/system/cpu/intel_pstate/.
+Any changes made to these files are applicable to all CPUs (even in a
+multi-package system).
+
+      max_perf_pct: Limits the maximum P-State that will be requested by
+      the driver. It states it as a percentage of the available performance. The
+      available (P-State) performance may be reduced by the no_turbo
       setting described below.
 
-      min_perf_pct: limits the minimum P state that will be  requested by
-      the driver stated as a percentage of the max (non-turbo)
+      min_perf_pct: Limits the minimum P-State that will be requested by
+      the driver. It states it as a percentage of the max (non-turbo)
       performance level.
 
-      no_turbo: limits the driver to selecting P states below the turbo
+      no_turbo: Limits the driver to selecting P-State below the turbo
       frequency range.
 
-      turbo_pct: displays the percentage of the total performance that
-      is supported by hardware that is in the turbo range.  This number
+      turbo_pct: Displays the percentage of the total performance that
+      is supported by hardware that is in the turbo range. This number
       is independent of whether turbo has been disabled or not.
 
-      num_pstates: displays the number of pstates that are supported
-      by hardware.  This number is independent of whether turbo has
+      num_pstates: Displays the number of P-States that are supported
+      by hardware. This number is independent of whether turbo has
       been disabled or not.
 
+For example, if a system has these parameters:
+	Max 1 core turbo ratio: 0x21 (Max 1 core ratio is the maximum P-State)
+	Max non turbo ratio: 0x17
+	Minimum ratio : 0x08 (Here the ratio is called max efficiency ratio)
+
+Sysfs will show :
+	max_perf_pct:100, which corresponds to 1 core ratio
+	min_perf_pct:24, max_efficiency_ratio / max 1 Core ratio
+	no_turbo:0, turbo is not disabled
+	num_pstates:26 = (max 1 Core ratio - Max Efficiency Ratio + 1)
+	turbo_pct:39 = (max 1 core ratio - max non turbo ratio) / num_pstates
+
+Refer to "Intel® 64 and IA-32 Architectures Software Developer’s Manual
+Volume 3: System Programming Guide" to understand ratios.
+
+cpufreq sysfs for Intel P-State
+
+Since this driver registers with cpufreq, cpufreq sysfs is also presented.
+There are some important differences, which need to be considered.
+
+scaling_cur_freq: This displays the real frequency which was used during
+the last sample period instead of what is requested. Some other cpufreq driver,
+like acpi-cpufreq, displays what is requested (Some changes are on the
+way to fix this for acpi-cpufreq driver). The same is true for frequencies
+displayed at /proc/cpuinfo.
+
+scaling_governor: This displays current active policy. Since each CPU has a
+cpufreq sysfs, it is possible to set a scaling governor to each CPU. But this
+is not possible with Intel P-States, as there is one common policy for all
+CPUs. Here, the last requested policy will be applicable to all CPUs. It is
+suggested that use the cpupower utility to change policy to all CPUs at the
+same time.
+
+scaling_setspeed: This attribute can never be used with Intel P-State.
+
+scaling_max_freq/scaling_min_freq: This interface can be used similarly to
+the max_perf_pct/min_perf_pct of Intel P-State sysfs. However since frequencies
+are converted to nearest possible P-State, this is prone to rounding errors.
+This method is not preferred to limit performance.
+
+affected_cpus: Not used
+related_cpus: Not used
+
 For contemporary Intel processors, the frequency is controlled by the
-processor itself and the P-states exposed to software are related to
+processor itself and the P-State exposed to software are related to
 performance levels.  The idea that frequency can be set to a single
-frequency is fiction for Intel Core processors. Even if the scaling
-driver selects a single P state the actual frequency the processor
+frequency is fictional for Intel Core processors. Even if the scaling
+driver selects a single P-State, the actual frequency the processor
 will run at is selected by the processor itself.
 
-For legacy mode debugfs files have also been added to allow tuning of
-the internal governor algorythm. These files are located at
-/sys/kernel/debug/pstate_snb/ These files are NOT present in HWP mode.
+Tuning Intel P-State driver
+
+When HWP mode is not used, debugfs files have also been added to allow the
+tuning of the internal governor algorithm. These files are located at
+/sys/kernel/debug/pstate_snb/. The algorithm uses a PID (A proportional–
+integral–derivative) controller. The PID tuninable parameters are:
 
       deadband
       d_gain_pct
@@ -63,3 +133,32 @@ the internal governor algorythm. These files are located at
       p_gain_pct
       sample_rate_ms
       setpoint
+
+To adjust these parameters, some understanding of driver implementation is
+necessary. There are some tweeks described here, but be very careful. Adjusting
+them requires expert level understanding of power and performance relationship.
+These limits are only useful when the "powersave" policy is active.
+
+-To make the system more responsive to load changes, sample_rate_ms can
+be adjusted  (current default is 10ms).
+-To make the system use higher performance, even if the load is lower, setpoint
+can be adjusted to a lower number.
+If there are no derivative and integral coefficients, The next P-State will be
+equal to:
+	current P-State - ((setpoint - current cpu load) * p_gain_pct)
+
+For example, if the current PID parameters are:
+      deadband = 0
+      d_gain_pct = 0
+      i_gain_pct = 0
+      p_gain_pct = 20
+      sample_rate_ms = 10
+      setpoint = 80
+
+If the current P-State = 0x08 and current load = 100, this will result in the
+next P-State = 0x08 - ((80 - 100) * 0.2) = 12
+For the same load at setpoint = 60 this will result in the next P-State
+= 0x08 - ((60 - 100) * 0.2) = 16
+So by changing the setpoint from 80 to 60, there is an increase of the
+next P-State from 12 to 16. So this will make processor to execute at
+higher P-State for the same CPU load.
-- 
2.4.3


             reply	other threads:[~2015-12-21 18:07 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-21 18:05 Srinivas Pandruvada [this message]
2015-12-22 17:26 ` [PATCH] Documentation: cpufreq: intel_pstate: enhance documentation Doug Smythies
2015-12-22 17:48   ` Srinivas Pandruvada

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1450721117-7620-1-git-send-email-srinivas.pandruvada@linux.intel.com \
    --to=srinivas.pandruvada@linux.intel.com \
    --cc=dsmythies@telus.net \
    --cc=len.brown@intel.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=prarit@redhat.com \
    --cc=rafael@kernel.org \
    --cc=trenn@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.