From: "Doug Smythies" <dsmythies@telus.net>
To: "'Rafael J. Wysocki'" <rjw@rjwysocki.net>,
"'Linux PM'" <linux-pm@vger.kernel.org>
Cc: "'LKML'" <linux-kernel@vger.kernel.org>,
"'Len Brown'" <len.brown@intel.com>,
"'Srinivas Pandruvada'" <srinivas.pandruvada@linux.intel.com>,
"'Peter Zijlstra'" <peterz@infradead.org>,
"'Giovanni Gherdovich'" <ggherdovich@suse.cz>,
"'Francisco Jerez'" <francisco.jerez.plata@intel.com>
Subject: RE: [RFC/RFT][PATCH] cpufreq: intel_pstate: Work in passive mode with HWP enabled
Date: Sun, 14 Jun 2020 23:18:16 -0700
Message-ID: <002801d642dc$c5225c10$4f671430$@net>
In-Reply-To: <3169564.ZRsPWhXyMD@kreacher>
On 2020.05.21 10:16 Rafael J. Wysocki wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Allow intel_pstate to work in the passive mode with HWP enabled and
> make it translate the target frequency supplied by the cpufreq
> governor in use into an EPP value to be written to the HWP request
> MSR (high frequencies are mapped to low EPP values that mean more
> performance-oriented HWP operation) as a hint for the HWP algorithm
> in the processor, so as to prevent it and the CPU scheduler from
> working against each other at least when the schedutil governor is
> in use.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> This is a prototype not intended for production use (based on linux-next).
>
> Please test it if you can (on HWP systems, of course) and let me know the
> results.
>
> The INTEL_CPUFREQ_TRANSITION_DELAY_HWP value has been guessed and it very well
> may turn out to be either too high or too low for the general use, which is one
> reason why getting as much testing coverage as possible is key here.
>
> If you can play with different INTEL_CPUFREQ_TRANSITION_DELAY_HWP values,
> please do so and let me know the conclusions.
>
> Cheers,
> Rafael
To anyone trying this patch:
You will need to monitor EPP (Energy Performance Preference) carefully.
It changes depending on whether the driver is in passive or active mode, and
on whether the system was booted active, passive, or no-hwp and switched later.
Originally, I was not specifically monitoring EPP, nor the path taken since boot
to reach the current driver (intel_pstate or intel_cpufreq) and governor, so I
will now have to set aside my earlier test results.
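For anyone logging raw MSR values rather than just the EPP bitfield, here is a minimal sketch of pulling the fields out of a 64-bit read of MSR 0x774 (IA32_HWP_REQUEST). The struct and function names are hypothetical and not from the kernel. The EPP-to-name mapping reflects the values intel_pstate uses for its sysfs strings as I understand them, so treat it as an assumption and check it against your kernel.

```c
#include <stdint.h>

/*
 * Fields in the low 32 bits of IA32_HWP_REQUEST (MSR 0x774).
 * EPP is bits 31:24, the bitfield that
 * "rdmsr --bitfield 31:24 -u -a 0x774" prints.
 * Names here are illustrative, not kernel identifiers.
 */
struct hwp_request_fields {
	uint8_t min_perf;     /* bits  7:0  */
	uint8_t max_perf;     /* bits 15:8  */
	uint8_t desired_perf; /* bits 23:16 */
	uint8_t epp;          /* bits 31:24 */
};

/* Split a raw MSR value into its four 8-bit fields. */
static struct hwp_request_fields hwp_request_decode(uint64_t msr)
{
	struct hwp_request_fields f;

	f.min_perf = msr & 0xff;
	f.max_perf = (msr >> 8) & 0xff;
	f.desired_perf = (msr >> 16) & 0xff;
	f.epp = (msr >> 24) & 0xff;
	return f;
}

/*
 * Map an EPP value to the sysfs preference name (assumed mapping),
 * or "raw" for any value in between.
 */
static const char *epp_name(uint8_t epp)
{
	switch (epp) {
	case 0x00: return "performance";
	case 0x80: return "balance_performance";
	case 0xc0: return "balance_power";
	case 0xff: return "power";
	default:   return "raw";
	}
}
```

As a made-up example, a raw value of 0x80002e08 decodes to min 8, max 46, desired 0 and EPP 128, which maps to balance_performance.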
@Rafael: I am still having problems with my test computer and HWP. However, I can
observe the energy saving potential of this "passive-yet-active HWP mode".
At this point, I am actually trying to make my newer test computer
simply behave and do what it is told with respect to CPU frequency scaling,
because even acpi-cpufreq misbehaves for performance governor under
some conditions [1].
[1] https://marc.info/?l=linux-pm&m=159155067328641&w=2
To my way of thinking:
1.) it is imperative that we be able to decouple the governor servo
from the processor servo. At a minimum this is needed for system testing,
debugging and reference baselines. At a maximum users could, perhaps, decide
for themselves. Myself, I would prefer "passive" to mean "do what you
have been told", and that is now what I am testing.
2.) I have always thought of, indeed relied on, performance mode as being
more than a hint. On my older i7-2600K it never disobeyed orders, except
for the most minuscule of workloads.
This newer i5-9600K seems to have a mind of its own which I would like
to be able to turn off, yet still be able to use intel_pstate trace
with schedutil.
Recall that last week I said:
> moving forward the typical CPU frequency scaling
> configuration for my test system will be:
>
> driver: intel-cpufreq, forced at boot.
> governor: schedutil
> hwp: forced off at boot.
The problem is that baseline references are still needed
and performance mode is unreliable. Maybe other stuff also,
I simply don't know at this point.
Example of EPP changing (no need to read on) (from fresh boot):
Current EPP:
root@s18:/home/doug# rdmsr --bitfield 31:24 -u -a 0x774
128
128
128
128
128
128
root@s18:/home/doug# grep . /sys/devices/system/cpu/cpu3/cpufreq/*
/sys/devices/system/cpu/cpu3/cpufreq/affected_cpus:3
/sys/devices/system/cpu/cpu3/cpufreq/base_frequency:3700000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_max_freq:4600000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_min_freq:800000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_transition_latency:0
/sys/devices/system/cpu/cpu3/cpufreq/energy_performance_available_preferences:default performance balance_performance balance_power power
/sys/devices/system/cpu/cpu3/cpufreq/energy_performance_preference:balance_performance
/sys/devices/system/cpu/cpu3/cpufreq/related_cpus:3
/sys/devices/system/cpu/cpu3/cpufreq/scaling_available_governors:performance powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:800102
/sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:4600000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq:800000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_setspeed:<unsupported>
Now, switch to passive mode:
echo passive > /sys/devices/system/cpu/intel_pstate/status
And observe EPP:
root@s18:/home/doug# rdmsr --bitfield 31:24 -u -a 0x774
255
255
255
255
255
255
root@s18:/home/doug# grep . /sys/devices/system/cpu/cpu3/cpufreq/*
/sys/devices/system/cpu/cpu3/cpufreq/affected_cpus:3
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_max_freq:4600000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_min_freq:800000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_transition_latency:20000
/sys/devices/system/cpu/cpu3/cpufreq/related_cpus:3
/sys/devices/system/cpu/cpu3/cpufreq/scaling_available_governors:conservative ondemand userspace powersave performance schedutil
Hey, where did the ability to adjust the energy_performance_preference setting go?
/sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:3400313
/sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_cpufreq
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:performance
/sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:4600000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq:800000
/sys/devices/system/cpu/cpu3/cpufreq/scaling_setspeed:<unsupported>
Kernel is 5.7 plus this patch:
root@s18:/home/doug# uname -a
Linux s18 5.7.0-hwp10 #786 SMP PREEMPT Tue Jun 9 20:15:18 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux
223e5c33f927 (HEAD -> k57-doug-hwp) cpufreq: intel_pstate: Accept passive mode with HWP enabled
5d890a14763d cpufreq: intel_pstate: Use passive mode by default without HWP
3d77e6a8804a (tag: v5.7) Linux 5.7
The below is on top of this patch, and is how I am attempting to move forward:
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 4ab8bc1476c9..6c28ec49b192 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2331,33 +2331,32 @@ static void intel_cpufreq_update_hwp_request(struct cpudata *cpu, u32 min_perf)
value |= HWP_MIN_PERF(min_perf);
/*
- * The entire MSR needs to be updated in order to update the HWP min
- * field in it, so opportunistically update the max too if needed.
+ * Update the max also, setting it to the same value as the min.
*/
value &= ~HWP_MAX_PERF(~0L);
- value |= HWP_MAX_PERF(cpu->max_perf_ratio);
+ value |= HWP_MAX_PERF(min_perf);
if (value != prev)
wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, value);
}
/**
- * intel_cpufreq_adjust_hwp - Adjust the HWP reuqest register.
+ * intel_cpufreq_adjust_hwp - Adjust the HWP request register.
* @cpu: Target CPU.
* @target_pstate: P-state corresponding to the target frequency.
*
- * Set the HWP minimum performance limit to 75% of @target_pstate taking the
+ * Set the HWP minimum performance limit to @target_pstate taking the
* global min and max policy limits into account.
*
- * The purpose of this is to avoid situations in which the kernel and the HWP
- * algorithm work against each other by giving a hint about the expectations of
- * the former to the latter.
+ * The purpose of this is to force the slave (passive) servo to do what
+ * it has been told, not whatever it wants.
+ * This is NOT a hint. EPP (responsiveness) is managed from elsewhere.
*/
static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 target_pstate)
{
u32 min_perf;
- min_perf = max_t(u32, (3 * target_pstate) / 4, cpu->min_perf_ratio);
+ min_perf = max_t(u32, target_pstate, cpu->min_perf_ratio);
min_perf = min_t(u32, min_perf, cpu->max_perf_ratio);
if (min_perf != cpu->pstate.current_pstate) {
cpu->pstate.current_pstate = min_perf;
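
To make the difference between the two behaviours concrete, here is a standalone user-space sketch of the arithmetic in the hunks above. It is not kernel code: the macros are local stand-ins for the kernel's HWP_MIN_PERF()/HWP_MAX_PERF(), the function names are hypothetical, and the step that clears the min field is assumed from context because the diff does not show it.

```c
#include <stdint.h>

/* Local stand-ins for the kernel macros: min is bits 7:0, max is bits 15:8. */
#define HWP_MIN_PERF(x) ((uint64_t)((x) & 0xff))
#define HWP_MAX_PERF(x) ((uint64_t)((x) & 0xff) << 8)

/* Clamp v into [lo, hi], same order as the max_t()/min_t() pair. */
static uint32_t clamp_perf(uint32_t v, uint32_t lo, uint32_t hi)
{
	if (v < lo)
		v = lo;
	if (v > hi)
		v = hi;
	return v;
}

/* Original patch: HWP min is 75% of the target, i.e. a hint. */
static uint32_t min_perf_as_hint(uint32_t target_pstate,
				 uint32_t min_ratio, uint32_t max_ratio)
{
	return clamp_perf((3 * target_pstate) / 4, min_ratio, max_ratio);
}

/* Modified version: HWP min is the target itself, i.e. an order. */
static uint32_t min_perf_as_order(uint32_t target_pstate,
				  uint32_t min_ratio, uint32_t max_ratio)
{
	return clamp_perf(target_pstate, min_ratio, max_ratio);
}

/*
 * Write the same value into both the min and max fields of a request
 * value, leaving the other bits (desired perf, EPP, ...) untouched.
 */
static uint64_t hwp_request_pin(uint64_t value, uint32_t perf)
{
	value &= ~HWP_MIN_PERF(~0u);	/* assumed: clear min before setting */
	value |= HWP_MIN_PERF(perf);
	value &= ~HWP_MAX_PERF(~0u);
	value |= HWP_MAX_PERF(perf);
	return value;
}
```

With illustrative limits of 8 and 46 and a target P-state of 40, the hint variant yields a minimum of 30 and leaves the processor free up to the max. The order variant yields 40, and pinning min and max to it leaves the HWP algorithm no range to choose from.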
... Doug
... [deleted] ...