xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [RFC 00/12] Intel Hardware P-States (HWP) support
@ 2021-03-08 21:01 Jason Andryuk
  2021-03-08 21:01 ` [RFC 01/12] cpufreq: Allow restricting to internal governors only Jason Andryuk
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:01 UTC (permalink / raw)
  To: xen-devel
  Cc: pawel, marmarek, Jason Andryuk, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

Hi,

This patch series adds Hardware-Controlled Performance States (HWP) for
Intel processors to Xen.

With HWP, the processor makes its own determinations for frequency
selection depending, though users can set some parameters and
preferences.  There is also Turbo Boost which pushes the max frequency
if possible.

The existing governors don't work with HWP since they select frequencies
and HWP doesn't expose those.  Therefore a dummy hwp-interal governor is
used that doesn't do anything.

xenpm get-cpufreq-para is extended to show HWP parameters, and
set-cpufreq-hwp is added to set them.

A lightly loaded OpenXT laptop showed ~1W power savings according to
powertop.  A mostly idle Fedora system (dom0 only) showed a more modest
power savings.

This fir for a 10th gen 6-core 1600 MHz base 4900 MHZ max cpu.  In the
default balance mode, Turbo Boost doesn't exceed 4GHz.  Tweaking the
energy_perf preference with `xenpm set-cpufreq-hwp balance ene:64`,
I've seen the CPU hit 4.7GHz before throttling down and bouncing around
between 4.3 and 4.5 GHz.  Curiously the other cores read ~4GHz when
turbo boost takes affect.  This was done after pinning all dom0 cores,
and using taskset to pin to vCPU/pCPU 11 and running a bash tightloop.

RFC since there are a few open questions:

hwp defaults to enabled and running in balanced mode.  Maybe this wants
changing?

This series unilaterally enables Hardware Duty Cycling (HDC) which is
another feature to autonomously powerdown things.  That is enabled if
HWP is enabled.  Maybe that want to be configurable?

I've only tested on an 8th gen and a 10th gen systems.  The don't have
fast MSR support, and they do have activity window and energy_perf
support.  So the respective other modes are untested.

This changes the systcl_pm_op hypercall, so that wants review.

I wanted to get this out since I know Qubes is also interested.

Regards,
Jason

Jason Andryuk (12):
  cpufreq: Allow restricting to internal governors only
  cpufreq: Add perf_freq to cpuinfo
  cpufreq: Export intel_feature_detect
  cpufreq: Add Hardware P-State (HWP) driver
  xenpm: Change get-cpufreq-para output for internal
  cpufreq: Export HWP parameters to userspace
  libxc: Include hwp_para in definitions
  xenpm: Print HWP parameters
  xenpm: Factor out a non-fatal cpuid_parse variant
  xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op
  libxc: Add xc_set_cpufreq_hwp
  xenpm: Add set-cpufreq-hwp subcommand

 docs/misc/xen-command-line.pandoc         |   9 +
 tools/include/xenctrl.h                   |   6 +
 tools/libs/ctrl/xc_pm.c                   |  18 +
 tools/misc/xenpm.c                        | 375 +++++++++++-
 xen/arch/x86/acpi/cpufreq/Makefile        |   1 +
 xen/arch/x86/acpi/cpufreq/cpufreq.c       |  15 +-
 xen/arch/x86/acpi/cpufreq/hwp.c           | 671 ++++++++++++++++++++++
 xen/drivers/acpi/pmstat.c                 |  30 +
 xen/drivers/cpufreq/cpufreq.c             |   4 +
 xen/drivers/cpufreq/utility.c             |   1 +
 xen/include/acpi/cpufreq/cpufreq.h        |   9 +
 xen/include/acpi/cpufreq/processor_perf.h |   5 +
 xen/include/asm-x86/cpufeature.h          |  11 +-
 xen/include/asm-x86/msr-index.h           |  21 +
 xen/include/public/sysctl.h               |  52 +-
 15 files changed, 1199 insertions(+), 29 deletions(-)
 create mode 100644 xen/arch/x86/acpi/cpufreq/hwp.c

-- 
2.29.2



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC 01/12] cpufreq: Allow restricting to internal governors only
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
@ 2021-03-08 21:01 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 02/12] cpufreq: Add perf_freq to cpuinfo Jason Andryuk
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:01 UTC (permalink / raw)
  To: xen-devel; +Cc: pawel, marmarek, Jason Andryuk, Jan Beulich

For hwp, the standard governors are not usable, and only the internal
one is applicable.  Add the cpufreq_governor_internal boolean to
indicate when an internal governor, like hwp-internal, will be used.
This is set during presmp_initcall, so that it can suppress governor
registration during initcall.  Only a governor with a name containing
"internal" will be allowed in that case.

This way, the unuseable governors are not registered, so they internal
one is the only one returned to userspace.  This means incompatible
governors won't be advertised to userspace.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 xen/drivers/cpufreq/cpufreq.c      | 4 ++++
 xen/include/acpi/cpufreq/cpufreq.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c
index e630a47419..b1f668b715 100644
--- a/xen/drivers/cpufreq/cpufreq.c
+++ b/xen/drivers/cpufreq/cpufreq.c
@@ -57,6 +57,7 @@ struct cpufreq_dom {
 };
 static LIST_HEAD_READ_MOSTLY(cpufreq_dom_list_head);
 
+bool __read_mostly cpufreq_governor_internal;
 struct cpufreq_governor *__read_mostly cpufreq_opt_governor;
 LIST_HEAD_READ_MOSTLY(cpufreq_governor_list);
 
@@ -122,6 +123,9 @@ int __init cpufreq_register_governor(struct cpufreq_governor *governor)
     if (!governor)
         return -EINVAL;
 
+    if (cpufreq_governor_internal && strstr(governor->name, "internal") == NULL)
+        return -EINVAL;
+
     if (__find_governor(governor->name) != NULL)
         return -EEXIST;
 
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index e88b20bfed..56df5eebed 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -115,6 +115,7 @@ extern struct cpufreq_governor cpufreq_gov_dbs;
 extern struct cpufreq_governor cpufreq_gov_userspace;
 extern struct cpufreq_governor cpufreq_gov_performance;
 extern struct cpufreq_governor cpufreq_gov_powersave;
+extern bool cpufreq_governor_internal;
 
 extern struct list_head cpufreq_governor_list;
 
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 02/12] cpufreq: Add perf_freq to cpuinfo
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
  2021-03-08 21:01 ` [RFC 01/12] cpufreq: Allow restricting to internal governors only Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 03/12] cpufreq: Export intel_feature_detect Jason Andryuk
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel
  Cc: pawel, marmarek, Jason Andryuk, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu

acpi-cpufreq scales the aperf/mperf measurements by max_freq, but HWP
needs to scale by base frequency.  Settings max_freq to base_freq
"works" but the code is not obvious, and returning values to userspace
is tricky.  Add an additonal perf_freq member which is used for scaling
aperf/mperf measurements.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
I don't like this, but it seems the best way to re-use the common
aperf/mperf code.  The other option would be to add wrappers that then
do the acpi vs. hwp scaling.
---
 xen/arch/x86/acpi/cpufreq/cpufreq.c | 2 +-
 xen/drivers/cpufreq/utility.c       | 1 +
 xen/include/acpi/cpufreq/cpufreq.h  | 3 +++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index f1f3c6923f..5eac2f7321 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -317,7 +317,7 @@ unsigned int get_measured_perf(unsigned int cpu, unsigned int flag)
     else
         perf_percent = 0;
 
-    return policy->cpuinfo.max_freq * perf_percent / 100;
+    return policy->cpuinfo.perf_freq * perf_percent / 100;
 }
 
 static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
diff --git a/xen/drivers/cpufreq/utility.c b/xen/drivers/cpufreq/utility.c
index b93895d4dd..788929e079 100644
--- a/xen/drivers/cpufreq/utility.c
+++ b/xen/drivers/cpufreq/utility.c
@@ -236,6 +236,7 @@ int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
 
     policy->min = policy->cpuinfo.min_freq = min_freq;
     policy->max = policy->cpuinfo.max_freq = max_freq;
+    policy->cpuinfo.perf_freq = max_freq;
     policy->cpuinfo.second_max_freq = second_max_freq;
 
     if (policy->min == ~0)
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 56df5eebed..b91859ce5d 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -37,6 +37,9 @@ extern struct acpi_cpufreq_data *cpufreq_drv_data[NR_CPUS];
 struct cpufreq_cpuinfo {
     unsigned int        max_freq;
     unsigned int        second_max_freq;    /* P1 if Turbo Mode is on */
+    unsigned int        perf_freq; /* Scaling freq for aperf/mpref.
+                                      acpi-cpufreq uses max_freq, but HWP uses
+                                      base_freq.*/
     unsigned int        min_freq;
     unsigned int        transition_latency; /* in 10^(-9) s = nanoseconds */
 };
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 03/12] cpufreq: Export intel_feature_detect
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
  2021-03-08 21:01 ` [RFC 01/12] cpufreq: Allow restricting to internal governors only Jason Andryuk
  2021-03-08 21:02 ` [RFC 02/12] cpufreq: Add perf_freq to cpuinfo Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 04/12] cpufreq: Add Hardware P-State (HWP) driver Jason Andryuk
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel
  Cc: pawel, marmarek, Jason Andryuk, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu

Export feature_detect as intel_feature_detect so it can be re-used by
HWP.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 xen/arch/x86/acpi/cpufreq/cpufreq.c       | 4 ++--
 xen/include/acpi/cpufreq/processor_perf.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index 5eac2f7321..8aae9b534d 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -340,7 +340,7 @@ static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
     return extract_freq(get_cur_val(cpumask_of(cpu)), data);
 }
 
-static void feature_detect(void *info)
+void intel_feature_detect(void *info)
 {
     struct cpufreq_policy *policy = info;
     unsigned int eax;
@@ -596,7 +596,7 @@ acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
     /* Check for APERF/MPERF support in hardware
      * also check for boost support */
     if (c->x86_vendor == X86_VENDOR_INTEL && c->cpuid_level >= 6)
-        on_selected_cpus(cpumask_of(cpu), feature_detect, policy, 1);
+        on_selected_cpus(cpumask_of(cpu), intel_feature_detect, policy, 1);
 
     /*
      * the first call to ->target() should result in us actually
diff --git a/xen/include/acpi/cpufreq/processor_perf.h b/xen/include/acpi/cpufreq/processor_perf.h
index d8a1ba68a6..e2c08f0e6d 100644
--- a/xen/include/acpi/cpufreq/processor_perf.h
+++ b/xen/include/acpi/cpufreq/processor_perf.h
@@ -7,6 +7,8 @@
 
 #define XEN_PX_INIT 0x80000000
 
+void intel_feature_detect(void *info);
+
 int powernow_cpufreq_init(void);
 unsigned int powernow_register_driver(void);
 unsigned int get_measured_perf(unsigned int cpu, unsigned int flag);
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 04/12] cpufreq: Add Hardware P-State (HWP) driver
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (2 preceding siblings ...)
  2021-03-08 21:02 ` [RFC 03/12] cpufreq: Export intel_feature_detect Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 05/12] xenpm: Change get-cpufreq-para output for internal Jason Andryuk
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel
  Cc: pawel, marmarek, Jason Andryuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Stefano Stabellini,
	Wei Liu, Roger Pau Monné

From the Intel SDM: "Hardware-Controlled Performance States (HWP), which
autonomously selects performance states while utilizing OS supplied
performance guidance hints."

Enable HWP to run in autonomous mode by poking the correct MSRs.

There is no interface to configure - it hardcodes the default 0x80 (out
of 0x0-0xff) energy/performance preference.  xen_sysctl_pm_op/xenpm will
be to be extended to configure in subsequent patches.

Unscientific powertop measurement of an mostly idle, customized OpenXT
install:
A 10th gen 6-core laptop showed battery discharge drop from ~9.x to
~7.x watts.
A 8th gen 4-core laptop dropped from ~10 to ~9

Power usage depends on many factors, especially display brightness, but
this does show an power saving in balanced mode when CPU utilization is
low.

HWP isn't compatible with an external governor - it doesn't take
explicit frequency requests.  Therefore a minimal internal governor,
hwp-internal, is also added as a placeholder.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

---

We disable on cpuid_level < 0x16.  cpuid(0x16) is used to get the cpu
frequencies for calculating the APERF/MPERF.  Without it, things would
still work, but the averge cpufrequency output would be wrong.

If HWP Energy_Performance_Preference isn't supported, the code falls
back to IA32_ENERGY_PERF_BIAS.  Right now, we don't check
CPUID.06H:ECX.SETBH[bit 3] before using that MSR.  The SDM reads like
it'll be available, and I assume it was available by the time Skylake
introduced HWP.

My 8th & 10th gen test systems both report:
(XEN) HWP: 1 notify: 1 act_window: 1 energy_perf: 1 pkg_level: 0 peci: 0
(XEN) HWP: FAST_IA32_HWP_REQUEST not supported
(XEN) HWP: Hardware Duty Cycling (HDC) supported
(XEN) HWP: HW_FEEDBACK not supported

So FAST_IA32_HWP_REQUEST and IA32_ENERGY_PERF_BIAS have not been tested.
---
 docs/misc/xen-command-line.pandoc         |   9 +
 xen/arch/x86/acpi/cpufreq/Makefile        |   1 +
 xen/arch/x86/acpi/cpufreq/cpufreq.c       |   9 +-
 xen/arch/x86/acpi/cpufreq/hwp.c           | 533 ++++++++++++++++++++++
 xen/include/acpi/cpufreq/processor_perf.h |   3 +
 xen/include/asm-x86/cpufeature.h          |  11 +-
 xen/include/asm-x86/msr-index.h           |  21 +
 7 files changed, 583 insertions(+), 4 deletions(-)
 create mode 100644 xen/arch/x86/acpi/cpufreq/hwp.c

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 4737c92bfe..799391a5ab 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1310,6 +1310,15 @@ Specify whether guests are to be given access to physical port 80
 (often used for debugging purposes), to override the DMI based
 detection of systems known to misbehave upon accesses to that port.
 
+### hwp (x86)
+> `= <boolean>`
+
+> Default: `false`
+
+Specifies whether Xen uses Hardware-Controlled Performance States (HWP)
+on supported Intel hardware.  HWP is a Skylake+ feature which provides
+better CPU power management.
+
 ### idle_latency_factor (x86)
 > `= <integer>`
 
diff --git a/xen/arch/x86/acpi/cpufreq/Makefile b/xen/arch/x86/acpi/cpufreq/Makefile
index f75da9b9ca..db83aa6b14 100644
--- a/xen/arch/x86/acpi/cpufreq/Makefile
+++ b/xen/arch/x86/acpi/cpufreq/Makefile
@@ -1,2 +1,3 @@
 obj-y += cpufreq.o
+obj-y += hwp.o
 obj-y += powernow.o
diff --git a/xen/arch/x86/acpi/cpufreq/cpufreq.c b/xen/arch/x86/acpi/cpufreq/cpufreq.c
index 8aae9b534d..966490bda1 100644
--- a/xen/arch/x86/acpi/cpufreq/cpufreq.c
+++ b/xen/arch/x86/acpi/cpufreq/cpufreq.c
@@ -641,9 +641,12 @@ static int __init cpufreq_driver_init(void)
     int ret = 0;
 
     if ((cpufreq_controller == FREQCTL_xen) &&
-        (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL))
-        ret = cpufreq_register_driver(&acpi_cpufreq_driver);
-    else if ((cpufreq_controller == FREQCTL_xen) &&
+        (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)) {
+        if (hwp_available())
+            ret = hwp_register_driver();
+        else
+            ret = cpufreq_register_driver(&acpi_cpufreq_driver);
+    } else if ((cpufreq_controller == FREQCTL_xen) &&
         (boot_cpu_data.x86_vendor &
          (X86_VENDOR_AMD | X86_VENDOR_HYGON)))
         ret = powernow_register_driver();
diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
new file mode 100644
index 0000000000..f8e6fdbd41
--- /dev/null
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -0,0 +1,533 @@
+/*
+ * hwp.c cpufreq driver to run Intel Hardware P-States (HWP)
+ *
+ * Copyright (C) 2021 Jason Andryuk <jandryuk@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/cpumask.h>
+#include <xen/init.h>
+#include <xen/param.h>
+#include <xen/xmalloc.h>
+#include <asm/msr.h>
+#include <asm/io.h>
+#include <acpi/cpufreq/cpufreq.h>
+
+static bool feature_hwp;
+static bool feature_hwp_notification;
+static bool feature_hwp_activity_window;
+static bool feature_hwp_energy_perf;
+static bool feature_hwp_pkg_level_ctl;
+static bool feature_hwp_peci;
+
+static bool feature_hdc;
+static bool feature_fast_msr;
+
+bool opt_hwp = true;
+boolean_param("hwp", opt_hwp);
+
+union hwp_request
+{
+    struct
+    {
+        uint64_t min_perf:8;
+        uint64_t max_perf:8;
+        uint64_t desired:8;
+        uint64_t energy_perf:8;
+        uint64_t activity_window:10;
+        uint64_t package_control:1;
+        uint64_t reserved:16;
+        uint64_t activity_window_valid:1;
+        uint64_t energy_perf_valid:1;
+        uint64_t desired_valid:1;
+        uint64_t max_perf_valid:1;
+        uint64_t min_perf_valid:1;
+    };
+    uint64_t raw;
+};
+
+struct hwp_drv_data
+{
+    union
+    {
+        uint64_t hwp_caps;
+        struct
+        {
+            uint64_t hw_highest:8;
+            uint64_t hw_guaranteed:8;
+            uint64_t hw_most_efficient:8;
+            uint64_t hw_lowest:8;
+            uint64_t hw_reserved:32;
+        };
+    };
+    union hwp_request curr_req;
+    uint16_t activity_window;
+    uint8_t minimum;
+    uint8_t maximum;
+    uint8_t desired;
+    uint8_t energy_perf;
+};
+struct hwp_drv_data *hwp_drv_data[NR_CPUS];
+
+#define hwp_err(...)     printk(XENLOG_ERR __VA_ARGS__)
+#define hwp_info(...)    printk(XENLOG_INFO __VA_ARGS__)
+#define hwp_verbose(...)                   \
+({                                         \
+    if ( cpufreq_verbose )                 \
+    {                                      \
+        printk(XENLOG_DEBUG __VA_ARGS__);  \
+    }                                      \
+})
+#define hwp_verbose_cont(...)              \
+({                                         \
+    if ( cpufreq_verbose )                 \
+    {                                      \
+        printk(             __VA_ARGS__);  \
+    }                                      \
+})
+
+static int hwp_governor(struct cpufreq_policy *policy,
+                        unsigned int event)
+{
+    int ret;
+
+    if ( policy == NULL )
+        return -EINVAL;
+
+    switch (event)
+    {
+    case CPUFREQ_GOV_START:
+        ret = 0;
+        break;
+    case CPUFREQ_GOV_STOP:
+        ret = -EINVAL;
+        break;
+    case CPUFREQ_GOV_LIMITS:
+        ret = 0;
+        break;
+    default:
+        ret = -EINVAL;
+    }
+
+    return ret;
+}
+
+static struct cpufreq_governor hwp_cpufreq_governor =
+{
+    .name          = "hwp-internal",
+    .governor      = hwp_governor,
+};
+
+static int __init cpufreq_gov_hwp_init(void)
+{
+    return cpufreq_register_governor(&hwp_cpufreq_governor);
+}
+__initcall(cpufreq_gov_hwp_init);
+
+bool hwp_available(void)
+{
+    uint32_t eax;
+    uint64_t val;
+    bool use_hwp;
+
+    if ( boot_cpu_data.cpuid_level < CPUID_PM_LEAF )
+    {
+        hwp_verbose("cpuid_level (%u) lacks HWP support\n", boot_cpu_data.cpuid_level);
+
+        return false;
+    }
+
+    eax = cpuid_eax(CPUID_PM_LEAF);
+    feature_hwp                 = !!(eax & CPUID6_EAX_HWP);
+    feature_hwp_notification    = !!(eax & CPUID6_EAX_HWP_Notification);
+    feature_hwp_activity_window = !!(eax & CPUID6_EAX_HWP_Activity_Window);
+    feature_hwp_energy_perf     =
+        !!(eax & CPUID6_EAX_HWP_Energy_Performance_Preference);
+    feature_hwp_pkg_level_ctl   =
+        !!(eax & CPUID6_EAX_HWP_Package_Level_Request);
+    feature_hwp_peci            = !!(eax & CPUID6_EAX_HWP_PECI);
+
+    hwp_verbose("HWP: %d notify: %d act_window: %d energy_perf: %d pkg_level: %d peci: %d\n",
+                feature_hwp, feature_hwp_notification,
+                feature_hwp_activity_window, feature_hwp_energy_perf,
+                feature_hwp_pkg_level_ctl, feature_hwp_peci);
+
+    if ( !feature_hwp )
+    {
+        hwp_verbose("Hardware does not support HWP\n");
+
+        return false;
+    }
+
+    if ( boot_cpu_data.cpuid_level < 0x16 )
+    {
+        hwp_info("HWP disabled: cpuid_level %x < 0x16 lacks CPU freq info\n",
+                 boot_cpu_data.cpuid_level);
+
+        return false;
+    }
+
+    hwp_verbose("HWP: FAST_IA32_HWP_REQUEST %ssupported\n",
+                eax & CPUID6_EAX_FAST_HWP_MSR ? "" : "not ");
+    if ( eax & CPUID6_EAX_FAST_HWP_MSR )
+    {
+        if ( rdmsr_safe(MSR_FAST_UNCORE_MSRS_CAPABILITY, val) )
+            hwp_err("error rdmsr_safe(MSR_FAST_UNCORE_MSRS_CAPABILITY)\n");
+
+        hwp_verbose("HWP: MSR_FAST_UNCORE_MSRS_CAPABILITY: %016lx\n", val);
+        if (val & FAST_IA32_HWP_REQUEST )
+        {
+            hwp_verbose("HWP: FAST_IA32_HWP_REQUEST MSR available\n");
+            feature_fast_msr = true;
+        }
+    }
+
+    feature_hdc = !!(eax & CPUID6_EAX_HDC);
+
+    hwp_verbose("HWP: Hardware Duty Cycling (HDC) %ssupported\n",
+                feature_hdc ? "" : "not ");
+
+    hwp_verbose("HWP: HW_FEEDBACK %ssupported\n",
+                (eax & CPUID6_EAX_HW_FEEDBACK) ? "" : "not ");
+
+    use_hwp = feature_hwp && opt_hwp;
+    cpufreq_governor_internal = use_hwp;
+
+    if ( use_hwp )
+        hwp_info("Using HWP for cpufreq\n");
+
+    return use_hwp;
+}
+
+static void hdc_set_pkg_hdc_ctl(bool val)
+{
+    uint64_t msr;
+
+    if ( rdmsr_safe(MSR_IA32_PKG_HDC_CTL, msr) )
+    {
+        hwp_err("error rdmsr_safe(MSR_IA32_PKG_HDC_CTL)\n");
+
+        return;
+    }
+
+    msr = val ? IA32_PKG_HDC_CTL_HDC_PKG_Enable : 0;
+
+    if ( wrmsr_safe(MSR_IA32_PKG_HDC_CTL, msr) )
+        hwp_err("error wrmsr_safe(MSR_IA32_PKG_HDC_CTL): %016lx\n", msr);
+}
+
+static void hdc_set_pm_ctl1(bool val)
+{
+    uint64_t msr;
+
+    if ( rdmsr_safe(MSR_IA32_PM_CTL1, msr) )
+    {
+        hwp_err("error rdmsr_safe(MSR_IA32_PM_CTL1)\n");
+
+        return;
+    }
+
+    msr = val ? IA32_PM_CTL1_HDC_Allow_Block : 0;
+
+    if ( wrmsr_safe(MSR_IA32_PM_CTL1, msr) )
+        hwp_err("error wrmsr_safe(MSR_IA32_PM_CTL1): %016lx\n", msr);
+}
+
+static void hwp_fast_uncore_msrs_ctl(bool val)
+{
+    uint64_t msr;
+
+    if ( rdmsr_safe(MSR_FAST_UNCORE_MSRS_CTL, msr) )
+        hwp_err("error rdmsr_safe(MSR_FAST_UNCORE_MSRS_CTL)\n");
+
+    msr = val;
+
+    if ( wrmsr_safe(MSR_FAST_UNCORE_MSRS_CTL, msr) )
+        hwp_err("error wrmsr_safe(MSR_FAST_UNCORE_MSRS_CTL): %016lx\n", msr);
+}
+
+static void hwp_get_cpu_speeds(struct cpufreq_policy *policy)
+{
+    uint32_t base_khz, max_khz, bus_khz, edx;
+
+    cpuid(0x16, &base_khz, &max_khz, &bus_khz, &edx);
+
+    /* aperf/mperf scales base. */
+    policy->cpuinfo.perf_freq = base_khz * 1000;
+    policy->cpuinfo.min_freq = base_khz * 1000;
+    policy->cpuinfo.max_freq = max_khz * 1000;
+    policy->min = base_khz * 1000;
+    policy->max = max_khz * 1000;
+    policy->cur = 0;
+}
+
+static void hwp_read_capabilities(void *info)
+{
+    struct cpufreq_policy *policy = info;
+    struct hwp_drv_data *data = hwp_drv_data[policy->cpu];
+
+    if ( rdmsr_safe(MSR_IA32_HWP_CAPABILITIES, data->hwp_caps) )
+    {
+        hwp_err("CPU%u: error rdmsr_safe(MSR_IA32_HWP_CAPABILITIES)\n",
+                policy->cpu);
+
+        return;
+    }
+
+    if ( rdmsr_safe(MSR_IA32_HWP_REQUEST, data->curr_req.raw) )
+    {
+        hwp_err("CPU%u: error rdmsr_safe(MSR_IA32_HWP_REQUEST)\n", policy->cpu);
+
+        return;
+    }
+}
+
+static void hwp_init_msrs(void *info)
+{
+    struct cpufreq_policy *policy = info;
+    uint64_t val;
+
+    /* Package level MSR, but we don't have a good idea of packages here, so
+     * just do it everytime. */
+    if ( rdmsr_safe(MSR_IA32_PM_ENABLE, val) )
+    {
+        hwp_err("CPU%u: error rdmsr_safe(MSR_IA32_PM_ENABLE)\n", policy->cpu);
+
+        return;
+    }
+
+    hwp_verbose("CPU%u: MSR_IA32_PM_ENABLE: %016lx\n", policy->cpu, val);
+    if ( val != IA32_PM_ENABLE_HWP_ENABLE )
+    {
+        val = IA32_PM_ENABLE_HWP_ENABLE;
+        if ( wrmsr_safe(MSR_IA32_PM_ENABLE, val) )
+            hwp_err("CPU%u: error wrmsr_safe(MSR_IA32_PM_ENABLE, %lx)\n",
+                    policy->cpu, val);
+    }
+
+    hwp_read_capabilities(info);
+
+    /* Check for APERF/MPERF support in hardware
+     * also check for boost/turbo support */
+    intel_feature_detect(policy);
+
+    if ( feature_hdc )
+    {
+        hdc_set_pkg_hdc_ctl(true);
+        hdc_set_pm_ctl1(true);
+    }
+
+    if ( feature_fast_msr )
+        hwp_fast_uncore_msrs_ctl(true);
+
+    hwp_get_cpu_speeds(policy);
+}
+
+static int hwp_cpufreq_verify(struct cpufreq_policy *policy)
+{
+    unsigned int cpu = policy->cpu;
+    struct hwp_drv_data *data = hwp_drv_data[cpu];
+
+    if ( !feature_hwp_energy_perf && data->energy_perf )
+    {
+        if ( data->energy_perf > 15 )
+        {
+            hwp_err("energy_perf %d exceeds IA32_ENERGY_PERF_BIAS range 0-15\n",
+                    data->energy_perf);
+
+            return -EINVAL;
+        }
+    }
+
+    if ( !feature_hwp_activity_window && data->activity_window )
+    {
+        hwp_err("HWP activity window not supported.\n");
+
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+/* val 0 - highest performance, 15 - maximum energy savings */
+static void hwp_energy_perf_bias(void *info)
+{
+    uint64_t msr;
+    struct hwp_drv_data *data = info;
+    uint8_t val = data->energy_perf;
+
+    ASSERT(val <= 15);
+
+    if ( rdmsr_safe(MSR_IA32_ENERGY_PERF_BIAS, msr) )
+    {
+        hwp_err("error rdmsr_safe(MSR_IA32_ENERGY_PERF_BIAS)\n");
+
+        return;
+    }
+
+    msr &= ~(0xf);
+    msr |= val;
+
+    if ( wrmsr_safe(MSR_IA32_ENERGY_PERF_BIAS, msr) )
+        hwp_err("error wrmsr_safe(MSR_IA32_ENERGY_PERF_BIAS): %016lx\n", msr);
+}
+
+static void hwp_write_request(void *info)
+{
+    struct cpufreq_policy *policy = info;
+    struct hwp_drv_data *data = hwp_drv_data[policy->cpu];
+    union hwp_request hwp_req = data->curr_req;
+
+    BUILD_BUG_ON(sizeof(union hwp_request) != sizeof(uint64_t));
+    if ( wrmsr_safe(MSR_IA32_HWP_REQUEST, hwp_req.raw) )
+    {
+        hwp_err("CPU%u: error wrmsr_safe(MSR_IA32_HWP_REQUEST, %lx)\n",
+                policy->cpu, hwp_req.raw);
+        rdmsr_safe(MSR_IA32_HWP_REQUEST, data->curr_req.raw);
+    }
+}
+
+static int hwp_cpufreq_target(struct cpufreq_policy *policy,
+                              unsigned int target_freq, unsigned int relation)
+{
+    unsigned int cpu = policy->cpu;
+    struct hwp_drv_data *data = hwp_drv_data[cpu];
+    union hwp_request hwp_req;
+
+    /* Zero everything to ensure reserved bits are zero... */
+    hwp_req.raw = 0;
+    /* .. and update from there */
+    hwp_req.min_perf = data->minimum;
+    hwp_req.max_perf = data->maximum;
+    hwp_req.desired = data->desired;
+    if ( feature_hwp_energy_perf )
+        hwp_req.energy_perf = data->energy_perf;
+    if ( feature_hwp_activity_window )
+        hwp_req.activity_window = data->activity_window;
+
+    if ( hwp_req.raw == data->curr_req.raw )
+        return 0;
+
+    data->curr_req.raw = hwp_req.raw;
+
+    hwp_verbose("CPU%u: wrmsr HWP_REQUEST %016lx\n", cpu, hwp_req.raw);
+    on_selected_cpus(cpumask_of(cpu), hwp_write_request, policy, 1);
+
+    if ( !feature_hwp_energy_perf && data->energy_perf )
+    {
+        on_selected_cpus(cpumask_of(cpu), hwp_energy_perf_bias,
+                         data, 1);
+    }
+
+    return 0;
+}
+
+static int hwp_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+    unsigned int cpu = policy->cpu;
+    struct hwp_drv_data *data;
+
+    if ( cpufreq_opt_governor )
+    {
+        printk(XENLOG_WARNING
+               "HWP: governor \"%s\" is incompatible with hwp. Using default \"%s\"\n",
+               cpufreq_opt_governor->name, hwp_cpufreq_governor.name);
+    }
+    policy->governor = &hwp_cpufreq_governor;
+
+    data = xzalloc(typeof(*data));
+    if ( !data )
+        return -ENOMEM;
+
+    hwp_drv_data[cpu] = data;
+
+    on_selected_cpus(cpumask_of(cpu), hwp_init_msrs, policy, 1);
+
+    data->minimum = data->hw_lowest;
+    data->maximum = data->hw_highest;
+    data->desired = 0; /* default to HW autonomous */
+    if ( feature_hwp_energy_perf )
+        data->energy_perf = 0x80;
+    else
+        data->energy_perf = 7;
+
+    hwp_verbose("CPU%u: IA32_HWP_CAPABILITIES: %016lx\n", cpu, data->hwp_caps);
+
+    hwp_verbose("CPU%u: rdmsr HWP_REQUEST %016lx\n", cpu, data->curr_req.raw);
+
+    return 0;
+}
+
+static int hwp_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+    unsigned int cpu = policy->cpu;
+
+    xfree(hwp_drv_data[cpu]);
+    hwp_drv_data[cpu] = NULL;
+
+    return 0;
+}
+
+/* The SDM reads like turbo should be disabled with MSR_IA32_PERF_CTL and
+ * PERF_CTL_TURBO_DISENGAGE, but that does not seem to actually work, at least
+ * with my HWP testing.  MSR_IA32_MISC_ENABLE and MISC_ENABLE_TURBO_DISENGAGE
+ * is what Linux uses and seems to work. */
+static void hwp_set_misc_turbo(void *info)
+{
+    struct cpufreq_policy *policy = info;
+    uint64_t msr;
+
+    if ( rdmsr_safe(MSR_IA32_MISC_ENABLE, msr) )
+    {
+        hwp_err("CPU%u: error rdmsr_safe(MSR_IA32_MISC_ENABLE)\n", policy->cpu);
+
+        return;
+    }
+
+    if ( policy->turbo == CPUFREQ_TURBO_ENABLED )
+        msr &= ~MSR_IA32_MISC_ENABLE_TURBO_DISENGAGE;
+    else
+        msr |= MSR_IA32_MISC_ENABLE_TURBO_DISENGAGE;
+
+    if ( wrmsr_safe(MSR_IA32_MISC_ENABLE, msr) )
+        hwp_err("CPU%u: error wrmsr_safe(MSR_IA32_MISC_ENABLE): %016lx\n",
+                policy->cpu, msr);
+}
+
+static int hwp_cpufreq_update(int cpuid, struct cpufreq_policy *policy)
+{
+    on_selected_cpus(cpumask_of(cpuid), hwp_set_misc_turbo, policy, 1);
+
+    return 0;
+}
+
+static const struct cpufreq_driver __initconstrel hwp_cpufreq_driver =
+{
+    .name   = "hwp-cpufreq",
+    .verify = hwp_cpufreq_verify,
+    .target = hwp_cpufreq_target,
+    .init   = hwp_cpufreq_cpu_init,
+    .exit   = hwp_cpufreq_cpu_exit,
+    .update = hwp_cpufreq_update,
+};
+
+int hwp_register_driver(void)
+{
+    int ret;
+
+    ret = cpufreq_register_driver(&hwp_cpufreq_driver);
+
+    return ret;
+}
diff --git a/xen/include/acpi/cpufreq/processor_perf.h b/xen/include/acpi/cpufreq/processor_perf.h
index e2c08f0e6d..2e67e667e0 100644
--- a/xen/include/acpi/cpufreq/processor_perf.h
+++ b/xen/include/acpi/cpufreq/processor_perf.h
@@ -9,6 +9,9 @@
 
 void intel_feature_detect(void *info);
 
+bool hwp_available(void);
+int hwp_register_driver(void);
+
 int powernow_cpufreq_init(void);
 unsigned int powernow_register_driver(void);
 unsigned int get_measured_perf(unsigned int cpu, unsigned int flag);
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 33b2257888..1900c90f90 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -26,7 +26,16 @@
 #define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1
 #define CPUID5_ECX_INTERRUPT_BREAK      0x2
 
-#define CPUID_PM_LEAF                    6
+#define CPUID_PM_LEAF                                6
+#define CPUID6_EAX_HWP                               (_AC(1, U) <<  7)
+#define CPUID6_EAX_HWP_Notification                  (_AC(1, U) <<  8)
+#define CPUID6_EAX_HWP_Activity_Window               (_AC(1, U) <<  9)
+#define CPUID6_EAX_HWP_Energy_Performance_Preference (_AC(1, U) << 10)
+#define CPUID6_EAX_HWP_Package_Level_Request         (_AC(1, U) << 11)
+#define CPUID6_EAX_HDC                               (_AC(1, U) << 13)
+#define CPUID6_EAX_HWP_PECI                          (_AC(1, U) << 16)
+#define CPUID6_EAX_FAST_HWP_MSR                      (_AC(1, U) << 18)
+#define CPUID6_EAX_HW_FEEDBACK                       (_AC(1, U) << 19)
 #define CPUID6_ECX_APERFMPERF_CAPABILITY 0x1
 
 /* CPUID level 0x00000001.edx */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 1f5a5d0e38..6f1f62c0a1 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -101,6 +101,12 @@
 #define MSR_RTIT_ADDR_A(n)                 (0x00000580 + (n) * 2)
 #define MSR_RTIT_ADDR_B(n)                 (0x00000581 + (n) * 2)
 
+#define MSR_FAST_UNCORE_MSRS_CTL            0x00000657
+#define  FAST_IA32_HWP_REQUEST_MSR_ENABLE   (_AC(1, ULL) <<  0)
+
+#define MSR_FAST_UNCORE_MSRS_CAPABILITY     0x0000065f
+#define  FAST_IA32_HWP_REQUEST              (_AC(1, ULL) <<  0)
+
 #define MSR_U_CET                           0x000006a0
 #define MSR_S_CET                           0x000006a2
 #define  CET_SHSTK_EN                       (_AC(1, ULL) <<  0)
@@ -112,10 +118,24 @@
 #define MSR_PL3_SSP                         0x000006a7
 #define MSR_INTERRUPT_SSP_TABLE             0x000006a8
 
+#define MSR_IA32_PM_ENABLE                  0x00000770
+#define  IA32_PM_ENABLE_HWP_ENABLE          (_AC(1, ULL) <<  0)
+#define MSR_IA32_HWP_CAPABILITIES           0x00000771
+#define MSR_IA32_HWP_REQUEST_PKG            0x00000772
+#define MSR_IA32_HWP_INTERRUPT              0x00000773
+#define MSR_IA32_HWP_REQUEST                0x00000774
+#define MSR_IA32_HWP_PECI_REQUEST_INFO      0x00000775
+#define MSR_IA32_HWP_STATUS                 0x00000777
+
 #define MSR_PASID                           0x00000d93
 #define  PASID_PASID_MASK                   0x000fffff
 #define  PASID_VALID                        (_AC(1, ULL) << 31)
 
+#define MSR_IA32_PKG_HDC_CTL                0x00000db0
+#define  IA32_PKG_HDC_CTL_HDC_PKG_Enable    (_AC(1, ULL) <<  0)
+#define MSR_IA32_PM_CTL1                    0x00000db1
+#define  IA32_PM_CTL1_HDC_Allow_Block       (_AC(1, ULL) <<  0)
+
 #define MSR_K8_VM_CR                        0xc0010114
 #define  VM_CR_INIT_REDIRECTION             (_AC(1, ULL) <<  1)
 #define  VM_CR_SVM_DISABLE                  (_AC(1, ULL) <<  4)
@@ -458,6 +478,7 @@
 #define MSR_IA32_MISC_ENABLE_LIMIT_CPUID  (1<<22)
 #define MSR_IA32_MISC_ENABLE_XTPR_DISABLE (1<<23)
 #define MSR_IA32_MISC_ENABLE_XD_DISABLE	(1ULL << 34)
+#define MSR_IA32_MISC_ENABLE_TURBO_DISENGAGE (1ULL << 38)
 
 #define MSR_IA32_TSC_DEADLINE		0x000006E0
 #define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 05/12] xenpm: Change get-cpufreq-para output for internal
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (3 preceding siblings ...)
  2021-03-08 21:02 ` [RFC 04/12] cpufreq: Add Hardware P-State (HWP) driver Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 06/12] cpufreq: Export HWP parameters to userspace Jason Andryuk
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel; +Cc: pawel, marmarek, Jason Andryuk, Ian Jackson, Wei Liu

When using HWP, some of the returned data is not applicable.  In that
case, we should just omit it to avoid confusing the user.  So switch to
printing the base and turbo frequencies since those are relevant to HWP.
Similarly, stop printing the CPU frequencies since those do not apply.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 tools/misc/xenpm.c | 45 +++++++++++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index d0191d4984..35223b98df 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -711,6 +711,7 @@ void start_gather_func(int argc, char *argv[])
 /* print out parameters about cpu frequency */
 static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
 {
+    bool internal = strstr(p_cpufreq->scaling_governor, "internal");
     int i;
 
     printf("cpu id               : %d\n", cpuid);
@@ -720,10 +721,19 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
         printf(" %d", p_cpufreq->affected_cpus[i]);
     printf("\n");
 
-    printf("cpuinfo frequency    : max [%u] min [%u] cur [%u]\n",
-           p_cpufreq->cpuinfo_max_freq,
-           p_cpufreq->cpuinfo_min_freq,
-           p_cpufreq->cpuinfo_cur_freq);
+    if (internal)
+    {
+        printf("cpuinfo frequency    : base [%u] turbo [%u]\n",
+               p_cpufreq->cpuinfo_min_freq,
+               p_cpufreq->cpuinfo_max_freq);
+    }
+    else
+    {
+        printf("cpuinfo frequency    : max [%u] min [%u] cur [%u]\n",
+               p_cpufreq->cpuinfo_max_freq,
+               p_cpufreq->cpuinfo_min_freq,
+               p_cpufreq->cpuinfo_cur_freq);
+    }
 
     printf("scaling_driver       : %s\n", p_cpufreq->scaling_driver);
 
@@ -750,19 +760,22 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
                p_cpufreq->u.ondemand.up_threshold);
     }
 
-    printf("scaling_avail_freq   :");
-    for ( i = 0; i < p_cpufreq->freq_num; i++ )
-        if ( p_cpufreq->scaling_available_frequencies[i] ==
-             p_cpufreq->scaling_cur_freq )
-            printf(" *%d", p_cpufreq->scaling_available_frequencies[i]);
-        else
-            printf(" %d", p_cpufreq->scaling_available_frequencies[i]);
-    printf("\n");
+    if (!internal)
+    {
+        printf("scaling_avail_freq   :");
+        for ( i = 0; i < p_cpufreq->freq_num; i++ )
+            if ( p_cpufreq->scaling_available_frequencies[i] ==
+                 p_cpufreq->scaling_cur_freq )
+                printf(" *%d", p_cpufreq->scaling_available_frequencies[i]);
+            else
+                printf(" %d", p_cpufreq->scaling_available_frequencies[i]);
+        printf("\n");
 
-    printf("scaling frequency    : max [%u] min [%u] cur [%u]\n",
-           p_cpufreq->scaling_max_freq,
-           p_cpufreq->scaling_min_freq,
-           p_cpufreq->scaling_cur_freq);
+        printf("scaling frequency    : max [%u] min [%u] cur [%u]\n",
+               p_cpufreq->scaling_max_freq,
+               p_cpufreq->scaling_min_freq,
+               p_cpufreq->scaling_cur_freq);
+    }
 
     printf("turbo mode           : %s\n",
            p_cpufreq->turbo_enabled ? "enabled" : "disabled or n/a");
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 06/12] cpufreq: Export HWP parameters to userspace
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (4 preceding siblings ...)
  2021-03-08 21:02 ` [RFC 05/12] xenpm: Change get-cpufreq-para output for internal Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 07/12] libxc: Include hwp_para in definitions Jason Andryuk
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel
  Cc: pawel, marmarek, Jason Andryuk, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

Extend xen_get_cpufreq_para to return hwp parameters.  These match the
hardware rather closely.

We need the hw_features bitmask to indicated fields supported by the
actual hardware.

The use of uint8_t parameters matches the hardware size.  uint32_t
entries grows the sysctl_t past the build assertion in setup.c.  The
uint8_t ranges are supported across multiple generations, so hopefully
they won't change.

Increment XEN_SYSCTL_INTERFACE_VERSION for the new fields.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 xen/arch/x86/acpi/cpufreq/hwp.c    | 24 ++++++++++++++++++++++++
 xen/drivers/acpi/pmstat.c          |  6 ++++++
 xen/include/acpi/cpufreq/cpufreq.h |  3 +++
 xen/include/public/sysctl.h        | 20 +++++++++++++++++++-
 4 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index f8e6fdbd41..92222d6d85 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -523,6 +523,30 @@ static const struct cpufreq_driver __initconstrel hwp_cpufreq_driver =
     .update = hwp_cpufreq_update,
 };
 
+int get_hwp_para(struct cpufreq_policy *policy, struct xen_hwp_para *hwp_para)
+{
+    unsigned int cpu = policy->cpu;
+    struct hwp_drv_data *data = hwp_drv_data[cpu];
+
+    if ( data == NULL )
+        return -EINVAL;
+
+    hwp_para->hw_feature        =
+        feature_hwp_activity_window ? XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  : 0 |
+        feature_hwp_energy_perf     ? XEN_SYSCTL_HWP_FEAT_ENERGY_PERF : 0;
+    hwp_para->hw_lowest         = data->hw_lowest;
+    hwp_para->hw_most_efficient = data->hw_most_efficient;
+    hwp_para->hw_guaranteed     = data->hw_guaranteed;
+    hwp_para->hw_highest        = data->hw_highest;
+    hwp_para->minimum           = data->minimum;
+    hwp_para->maximum           = data->maximum;
+    hwp_para->energy_perf       = data->energy_perf;
+    hwp_para->activity_window   = data->activity_window;
+    hwp_para->desired           = data->desired;
+
+    return 0;
+}
+
 int hwp_register_driver(void)
 {
     int ret;
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 2f528f9ca3..4c6ae5d768 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -290,6 +290,12 @@ static int get_cpufreq_para(struct xen_sysctl_pm_op *op)
             &op->u.get_para.u.ondemand.sampling_rate,
             &op->u.get_para.u.ondemand.up_threshold);
     }
+
+    if ( !strnicmp(op->u.get_para.scaling_governor,
+                   "hwp-internal", CPUFREQ_NAME_LEN) )
+    {
+        ret = get_hwp_para(policy, &op->u.get_para.u.hwp_para);
+    }
     op->u.get_para.turbo_enabled = cpufreq_get_turbo_status(op->cpuid);
 
     return ret;
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index b91859ce5d..42146ca2cf 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -246,4 +246,7 @@ int write_userspace_scaling_setspeed(unsigned int cpu, unsigned int freq);
 void cpufreq_dbs_timer_suspend(void);
 void cpufreq_dbs_timer_resume(void);
 
+/********************** hwp hypercall helper *************************/
+int get_hwp_para(struct cpufreq_policy *policy, struct xen_hwp_para *hwp_para);
+
 #endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 039ccf885c..1a6c6397ea 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -35,7 +35,7 @@
 #include "domctl.h"
 #include "physdev.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x00000013
+#define XEN_SYSCTL_INTERFACE_VERSION 0x00000014
 
 /*
  * Read console content from Xen buffer ring.
@@ -301,6 +301,23 @@ struct xen_ondemand {
     uint32_t up_threshold;
 };
 
+struct xen_hwp_para {
+    uint16_t activity_window; /* 7bit mantissa and 3bit exponent */
+#define XEN_SYSCTL_HWP_FEAT_ENERGY_PERF (1 << 0) /* energy_perf range 0-255 if
+                                                    1. Otherwise 0-15 */
+#define XEN_SYSCTL_HWP_FEAT_ACT_WINDOW  (1 << 1) /* activity_window supported
+                                                    if 1 */
+    uint8_t hw_feature; /* bit flags for features */
+    uint8_t hw_lowest;
+    uint8_t hw_most_efficient;
+    uint8_t hw_guaranteed;
+    uint8_t hw_highest;
+    uint8_t minimum;
+    uint8_t maximum;
+    uint8_t desired;
+    uint8_t energy_perf;
+};
+
 /*
  * cpufreq para name of this structure named
  * same as sysfs file name of native linux
@@ -332,6 +349,7 @@ struct xen_get_cpufreq_para {
     union {
         struct  xen_userspace userspace;
         struct  xen_ondemand ondemand;
+        struct  xen_hwp_para hwp_para;
     } u;
 
     int32_t turbo_enabled;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 07/12] libxc: Include hwp_para in definitions
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (5 preceding siblings ...)
  2021-03-08 21:02 ` [RFC 06/12] cpufreq: Export HWP parameters to userspace Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 08/12] xenpm: Print HWP parameters Jason Andryuk
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel; +Cc: pawel, marmarek, Jason Andryuk, Ian Jackson, Wei Liu

Expose the hwp_para fields through libxc.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 tools/include/xenctrl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 318920166c..3b0ca62fc7 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1960,6 +1960,7 @@ int xc_smt_disable(xc_interface *xch);
  */
 typedef struct xen_userspace xc_userspace_t;
 typedef struct xen_ondemand xc_ondemand_t;
+typedef struct xen_hwp_para xc_hwp_para_t;
 
 struct xc_get_cpufreq_para {
     /* IN/OUT variable */
@@ -1987,6 +1988,7 @@ struct xc_get_cpufreq_para {
     union {
         xc_userspace_t userspace;
         xc_ondemand_t ondemand;
+        xc_hwp_para_t hwp_para;
     } u;
 
     int32_t turbo_enabled;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 08/12] xenpm: Print HWP parameters
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (6 preceding siblings ...)
  2021-03-08 21:02 ` [RFC 07/12] libxc: Include hwp_para in definitions Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 09/12] xenpm: Factor out a non-fatal cpuid_parse variant Jason Andryuk
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel; +Cc: pawel, marmarek, Jason Andryuk, Ian Jackson, Wei Liu

Print HWP-specific parameters.  Some are always present, but others
depend on hardware support.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 tools/misc/xenpm.c | 75 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 35223b98df..9588dac991 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -708,6 +708,43 @@ void start_gather_func(int argc, char *argv[])
     pause();
 }
 
+static void calculate_hwp_activity_window(const xc_hwp_para_t *hwp,
+                                          unsigned int *activity_window,
+                                          const char **units)
+{
+    unsigned int mantissa = hwp->activity_window & 0x7f;
+    unsigned int exponent = ( hwp->activity_window >> 7 ) & 0x7;
+    unsigned int multiplier = 1;
+
+    if ( hwp->activity_window == 0 )
+    {
+        *units = "hardware selected";
+        *activity_window = 0;
+
+        return;
+    }
+
+    if ( exponent >= 6 )
+    {
+        *units = "s";
+        exponent -= 6;
+    }
+    else if ( exponent >= 3 )
+    {
+        *units = "ms";
+        exponent -= 3;
+    }
+    else
+    {
+        *units = "us";
+    }
+
+    for ( unsigned int i = 0; i < exponent; i++ )
+        multiplier *= 10;
+
+    *activity_window = mantissa * multiplier;
+}
+
 /* print out parameters about cpu frequency */
 static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
 {
@@ -721,7 +758,7 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
         printf(" %d", p_cpufreq->affected_cpus[i]);
     printf("\n");
 
-    if (internal)
+    if ( internal )
     {
         printf("cpuinfo frequency    : base [%u] turbo [%u]\n",
                p_cpufreq->cpuinfo_min_freq,
@@ -760,7 +797,7 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
                p_cpufreq->u.ondemand.up_threshold);
     }
 
-    if (!internal)
+    if ( !internal )
     {
         printf("scaling_avail_freq   :");
         for ( i = 0; i < p_cpufreq->freq_num; i++ )
@@ -777,6 +814,40 @@ static void print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
                p_cpufreq->scaling_cur_freq);
     }
 
+    if ( strcmp(p_cpufreq->scaling_governor, "hwp-internal") == 0 )
+    {
+        const xc_hwp_para_t *hwp = &p_cpufreq->u.hwp_para;
+
+        printf("hwp variables        :\n");
+        printf("  hardware limits    : lowest [%u] most_efficient [%u]\n",
+               hwp->hw_lowest, hwp->hw_most_efficient);
+        printf("  hardware limits    : guaranteed [%u] highest [%u]\n",
+               hwp->hw_guaranteed, hwp->hw_highest);
+        printf("  configured limits  : min [%u] max [%u] energy_perf [%u]\n",
+               hwp->minimum, hwp->maximum, hwp->energy_perf);
+
+        if ( hwp->hw_feature & XEN_SYSCTL_HWP_FEAT_ENERGY_PERF )
+        {
+            printf("  configured limits  : energy_perf [%u%s]\n",
+                   hwp->energy_perf,
+                   hwp->energy_perf ? "" : " hw autonomous");
+        }
+
+        if ( hwp->hw_feature & XEN_SYSCTL_HWP_FEAT_ACT_WINDOW )
+        {
+            unsigned int activity_window;
+            const char *units;
+
+            calculate_hwp_activity_window(hwp, &activity_window, &units);
+            printf("  configured limits  : activity_window [%u %s]\n",
+                   activity_window, units);
+        }
+
+        printf("  configured limits  : desired [%u%s]\n",
+               hwp->desired,
+               hwp->desired ? "" : " hw autonomous");
+    }
+
     printf("turbo mode           : %s\n",
            p_cpufreq->turbo_enabled ? "enabled" : "disabled or n/a");
     printf("\n");
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 09/12] xenpm: Factor out a non-fatal cpuid_parse variant
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (7 preceding siblings ...)
  2021-03-08 21:02 ` [RFC 08/12] xenpm: Print HWP parameters Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 10/12] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op Jason Andryuk
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel; +Cc: pawel, marmarek, Jason Andryuk, Ian Jackson, Wei Liu

Allow cpuid_prase to be re-used without terminating xenpm.  HWP
will re-use it to optionally parse a cpuid.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 tools/misc/xenpm.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index 9588dac991..a686f8f46e 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -79,17 +79,26 @@ void help_func(int argc, char *argv[])
     show_help();
 }
 
-static void parse_cpuid(const char *arg, int *cpuid)
+static int parse_cpuid_non_fatal(const char *arg, int *cpuid)
 {
     if ( sscanf(arg, "%d", cpuid) != 1 || *cpuid < 0 )
     {
         if ( strcasecmp(arg, "all") )
-        {
-            fprintf(stderr, "Invalid CPU identifier: '%s'\n", arg);
-            exit(EINVAL);
-        }
+            return -1;
+
         *cpuid = -1;
     }
+
+    return 0;
+}
+
+static void parse_cpuid(const char *arg, int *cpuid)
+{
+    if ( parse_cpuid_non_fatal(arg, cpuid) )
+    {
+        fprintf(stderr, "Invalid CPU identifier: '%s'\n", arg);
+        exit(EINVAL);
+    }
 }
 
 static void parse_cpuid_and_int(int argc, char *argv[],
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 10/12] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (8 preceding siblings ...)
  2021-03-08 21:02 ` [RFC 09/12] xenpm: Factor out a non-fatal cpuid_parse variant Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 11/12] libxc: Add xc_set_cpufreq_hwp Jason Andryuk
  2021-03-08 21:02 ` [RFC 12/12] xenpm: Add set-cpufreq-hwp subcommand Jason Andryuk
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel
  Cc: pawel, marmarek, Jason Andryuk, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini

Add SET_CPUFREQ_HWP xen_sysctl_pm_op to set HWP parameters.  The sysctl
supports setting multiple values simultaneously as indicated by the
set_params bits.  This allows atomically applying new HWP configuration
via a single wrmsr.

XEN_SYSCTL_HWP_SET_PRESET_BALANCE/PERFORMANCE/POWERSAVE provide three
common presets.  Setting them depends on hardware limits which the
hypervisor is already caching.  So using them allows skipping a
hypercall to query the limits (hw_lowest/highest) to then set those same
values.  The code is organized to allow a preset to be refined with
additional stuff if desired.

"most_efficient" and "guaranteed" could be additional presets in the
future, but the are not added now.  Those levels can change at runtime,
but we don't have code in place to monitor and update for those events.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 xen/arch/x86/acpi/cpufreq/hwp.c    | 113 +++++++++++++++++++++++++++++
 xen/drivers/acpi/pmstat.c          |  24 ++++++
 xen/include/acpi/cpufreq/cpufreq.h |   2 +
 xen/include/public/sysctl.h        |  32 ++++++++
 4 files changed, 171 insertions(+)

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 92222d6d85..8f4b18d246 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -547,6 +547,119 @@ int get_hwp_para(struct cpufreq_policy *policy, struct xen_hwp_para *hwp_para)
     return 0;
 }
 
+int set_hwp_para(struct cpufreq_policy *policy,
+                 struct xen_set_hwp_para *set_hwp)
+{
+    unsigned int cpu = policy->cpu;
+    struct hwp_drv_data *data = hwp_drv_data[cpu];
+
+    if ( data == NULL )
+        return -EINVAL;
+
+    /* Validate all parameters first */
+    if ( set_hwp->set_params & ~XEN_SYSCTL_HWP_SET_PARAM_MASK )
+    {
+        hwp_err("Invalid bits in hwp set_params %u\n",
+                set_hwp->set_params);
+
+        return -EINVAL;
+    }
+
+    if ( set_hwp->activity_window & ~XEN_SYSCTL_HWP_ACT_WINDOW_MASK )
+    {
+        hwp_err("Invalid bits in activity window %u\n",
+                set_hwp->activity_window);
+
+        return -EINVAL;
+    }
+
+    if ( !feature_hwp_energy_perf &&
+         set_hwp->set_params & XEN_SYSCTL_HWP_SET_ENERGY_PERF &&
+         set_hwp->energy_perf > 0xf )
+    {
+        hwp_err("energy_perf %u out of range for IA32_ENERGY_PERF_BIAS\n",
+                set_hwp->energy_perf);
+
+        return -EINVAL;
+    }
+
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_DESIRED &&
+         ( set_hwp->desired < data->hw_lowest ||
+           set_hwp->desired > data->hw_highest ) )
+    {
+        hwp_err("hwp desired %u is out of range (%u ... %u)\n",
+                set_hwp->desired, data->hw_lowest, data->hw_highest);
+
+        return -EINVAL;
+    }
+
+    /*
+     * minimum & maximum are not validated as hardware doesn't seem to care
+     * and the SDM says CPUs will clip internally.
+     */
+
+    /* Apply presets */
+    switch ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_PRESET_MASK )
+    {
+    case XEN_SYSCTL_HWP_SET_PRESET_POWERSAVE:
+        data->minimum = data->hw_lowest;
+        data->maximum = data->hw_lowest;
+        data->activity_window = 0;
+        if ( feature_hwp_energy_perf )
+            data->energy_perf = 0xff;
+        else
+            data->energy_perf = 0xf;
+        data->desired = 0;
+        break;
+    case XEN_SYSCTL_HWP_SET_PRESET_PERFORMANCE:
+        data->minimum = data->hw_highest;
+        data->maximum = data->hw_highest;
+        data->activity_window = 0;
+        data->energy_perf = 0;
+        data->desired = 0;
+        break;
+    case XEN_SYSCTL_HWP_SET_PRESET_BALANCE:
+        data->minimum = data->hw_lowest;
+        data->maximum = data->hw_highest;
+        data->activity_window = 0;
+        data->energy_perf = 0x80;
+        if ( feature_hwp_energy_perf )
+            data->energy_perf = 0x80;
+        else
+            data->energy_perf = 0x7;
+        data->desired = 0;
+        break;
+    case XEN_SYSCTL_HWP_SET_PRESET_NONE:
+        break;
+    default:
+        printk("HWP: Invalid preset value: %u\n",
+               set_hwp->set_params & XEN_SYSCTL_HWP_SET_PRESET_MASK);
+
+        return -EINVAL;
+    }
+
+    /* Further customize presets if needed */
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_MINIMUM )
+        data->minimum = set_hwp->minimum;
+
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_MAXIMUM )
+        data->maximum = set_hwp->maximum;
+
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_ENERGY_PERF )
+        data->energy_perf = set_hwp->energy_perf;
+
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_DESIRED )
+        data->desired = set_hwp->desired;
+
+    if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_ACT_WINDOW )
+        data->activity_window = set_hwp->activity_window &
+                                XEN_SYSCTL_HWP_ACT_WINDOW_MASK;
+
+    hwp_cpufreq_target(policy, 0, 0);
+
+    return 0;
+}
+
 int hwp_register_driver(void)
 {
     int ret;
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 4c6ae5d768..860bf4db61 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -318,6 +318,24 @@ static int set_cpufreq_gov(struct xen_sysctl_pm_op *op)
     return __cpufreq_set_policy(old_policy, &new_policy);
 }
 
+static int set_cpufreq_hwp(struct xen_sysctl_pm_op *op)
+{
+    struct cpufreq_policy *policy;
+
+    if ( !cpufreq_governor_internal )
+        return -EINVAL;
+
+    policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
+
+    if ( !policy || !policy->governor )
+        return -EINVAL;
+
+    if ( strnicmp(policy->governor->name, "hwp-internal", CPUFREQ_NAME_LEN) )
+        return -EINVAL;
+
+    return set_hwp_para(policy, &op->u.set_hwp);
+}
+
 static int set_cpufreq_para(struct xen_sysctl_pm_op *op)
 {
     int ret = 0;
@@ -465,6 +483,12 @@ int do_pm_op(struct xen_sysctl_pm_op *op)
         break;
     }
 
+    case SET_CPUFREQ_HWP:
+    {
+        ret = set_cpufreq_hwp(op);
+        break;
+    }
+
     case SET_CPUFREQ_PARA:
     {
         ret = set_cpufreq_para(op);
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index 42146ca2cf..7ff7d0d4bb 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -248,5 +248,7 @@ void cpufreq_dbs_timer_resume(void);
 
 /********************** hwp hypercall helper *************************/
 int get_hwp_para(struct cpufreq_policy *policy, struct xen_hwp_para *hwp_para);
+int set_hwp_para(struct cpufreq_policy *policy,
+                 struct xen_set_hwp_para *set_hwp);
 
 #endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 1a6c6397ea..3f18a3d522 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -318,6 +318,36 @@ struct xen_hwp_para {
     uint8_t energy_perf;
 };
 
+/* set multiple values simultaneously when set_args bit is set */
+struct xen_set_hwp_para {
+    uint16_t set_params; /* bitflags for valid values */
+#define XEN_SYSCTL_HWP_SET_DESIRED              (1U << 0)
+#define XEN_SYSCTL_HWP_SET_ENERGY_PERF          (1U << 1)
+#define XEN_SYSCTL_HWP_SET_ACT_WINDOW           (1U << 2)
+#define XEN_SYSCTL_HWP_SET_MINIMUM              (1U << 3)
+#define XEN_SYSCTL_HWP_SET_MAXIMUM              (1U << 4)
+#define XEN_SYSCTL_HWP_SET_PRESET_MASK          (0xf000)
+#define XEN_SYSCTL_HWP_SET_PRESET_NONE          (0x0000)
+#define XEN_SYSCTL_HWP_SET_PRESET_BALANCE       (0x1000)
+#define XEN_SYSCTL_HWP_SET_PRESET_POWERSAVE     (0x2000)
+#define XEN_SYSCTL_HWP_SET_PRESET_PERFORMANCE   (0x3000)
+#define XEN_SYSCTL_HWP_SET_PARAM_MASK ((uint16_t)( \
+                                  XEN_SYSCTL_HWP_SET_PRESET_MASK | \
+                                  XEN_SYSCTL_HWP_SET_DESIRED     | \
+                                  XEN_SYSCTL_HWP_SET_ENERGY_PERF | \
+                                  XEN_SYSCTL_HWP_SET_ACT_WINDOW  | \
+                                  XEN_SYSCTL_HWP_SET_MINIMUM     | \
+                                  XEN_SYSCTL_HWP_SET_MAXIMUM     ))
+
+    uint16_t activity_window; /* 7bit mantissa and 3bit exponent */
+#define XEN_SYSCTL_HWP_ACT_WINDOW_MASK          (0x03ff)
+    uint8_t minimum;
+    uint8_t maximum;
+    uint8_t desired;
+    uint8_t energy_perf; /* 0-255 or 0-15 depending on HW support */
+};
+
+
 /*
  * cpufreq para name of this structure named
  * same as sysfs file name of native linux
@@ -379,6 +409,7 @@ struct xen_sysctl_pm_op {
     #define SET_CPUFREQ_GOV            (CPUFREQ_PARA | 0x02)
     #define SET_CPUFREQ_PARA           (CPUFREQ_PARA | 0x03)
     #define GET_CPUFREQ_AVGFREQ        (CPUFREQ_PARA | 0x04)
+    #define SET_CPUFREQ_HWP            (CPUFREQ_PARA | 0x05)
 
     /* set/reset scheduler power saving option */
     #define XEN_SYSCTL_pm_op_set_sched_opt_smt    0x21
@@ -405,6 +436,7 @@ struct xen_sysctl_pm_op {
         struct xen_get_cpufreq_para get_para;
         struct xen_set_cpufreq_gov  set_gov;
         struct xen_set_cpufreq_para set_para;
+        struct xen_set_hwp_para     set_hwp;
         uint64_aligned_t get_avgfreq;
         uint32_t                    set_sched_opt_smt;
 #define XEN_SYSCTL_CX_UNLIMITED 0xffffffff
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 11/12] libxc: Add xc_set_cpufreq_hwp
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (9 preceding siblings ...)
  2021-03-08 21:02 ` [RFC 10/12] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  2021-03-08 21:02 ` [RFC 12/12] xenpm: Add set-cpufreq-hwp subcommand Jason Andryuk
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel; +Cc: pawel, marmarek, Jason Andryuk, Ian Jackson, Wei Liu

Add xc_set_cpufreq_hwp to allow calling xen_systctl_pm_op
SET_CPUFREQ_HWP.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

---
Am I allowed to do set_hwp = *set_hwp struct assignment?
---
 tools/include/xenctrl.h |  4 ++++
 tools/libs/ctrl/xc_pm.c | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 3b0ca62fc7..dc8e1bb23a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1994,11 +1994,15 @@ struct xc_get_cpufreq_para {
     int32_t turbo_enabled;
 };
 
+typedef struct xen_set_hwp_para xc_set_hwp_para_t;
+
 int xc_get_cpufreq_para(xc_interface *xch, int cpuid,
                         struct xc_get_cpufreq_para *user_para);
 int xc_set_cpufreq_gov(xc_interface *xch, int cpuid, char *govname);
 int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
                         int ctrl_type, int ctrl_value);
+int xc_set_cpufreq_hwp(xc_interface *xch, int cpuid,
+                       xc_set_hwp_para_t *set_hwp);
 int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq);
 
 int xc_set_sched_opt_smt(xc_interface *xch, uint32_t value);
diff --git a/tools/libs/ctrl/xc_pm.c b/tools/libs/ctrl/xc_pm.c
index 76d7eb7f26..407a24d2aa 100644
--- a/tools/libs/ctrl/xc_pm.c
+++ b/tools/libs/ctrl/xc_pm.c
@@ -330,6 +330,24 @@ int xc_set_cpufreq_para(xc_interface *xch, int cpuid,
     return xc_sysctl(xch, &sysctl);
 }
 
+int xc_set_cpufreq_hwp(xc_interface *xch, int cpuid,
+                       xc_set_hwp_para_t *set_hwp)
+{
+    DECLARE_SYSCTL;
+
+    if ( !xch )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+    sysctl.cmd = XEN_SYSCTL_pm_op;
+    sysctl.u.pm_op.cmd = SET_CPUFREQ_HWP;
+    sysctl.u.pm_op.cpuid = cpuid;
+    sysctl.u.pm_op.u.set_hwp = *set_hwp;
+
+    return xc_sysctl(xch, &sysctl);
+}
+
 int xc_get_cpufreq_avgfreq(xc_interface *xch, int cpuid, int *avg_freq)
 {
     int ret = 0;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 12/12] xenpm: Add set-cpufreq-hwp subcommand
  2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
                   ` (10 preceding siblings ...)
  2021-03-08 21:02 ` [RFC 11/12] libxc: Add xc_set_cpufreq_hwp Jason Andryuk
@ 2021-03-08 21:02 ` Jason Andryuk
  11 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2021-03-08 21:02 UTC (permalink / raw)
  To: xen-devel
  Cc: pawel, marmarek, Jason Andryuk, Ian Jackson, Wei Liu,
	Jan Beulich, Andrew Cooper, Roger Pau Monné

set-cpufreq-hwp allows setting the Hardware P-State (HWP) parameters.

It can be run on all or just a snigle cpu.  There are preset of
balance, powersave & performance.  Those can be further tweaked by
param:val arguments as explained in the usage description.

parameter names are just checked to the first 3 characters to shorten
typing.

Some options are hardware dependent, and ranges can be found in
get-cpufreq-para.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 tools/misc/xenpm.c              | 240 ++++++++++++++++++++++++++++++++
 xen/arch/x86/acpi/cpufreq/hwp.c |   1 +
 2 files changed, 241 insertions(+)

diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index a686f8f46e..d3bcaf3b58 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -67,6 +67,25 @@ void show_help(void)
             " set-max-cstate        <num>|'unlimited' [<num2>|'unlimited']\n"
             "                                     set the C-State limitation (<num> >= 0) and\n"
             "                                     optionally the C-sub-state limitation (<num2> >= 0)\n"
+            " set-cpufreq-hwp       [cpuid] [balance|performance|powersave] <param:val>*\n"
+            "                                     set Hardware P-State (HWP) parameters\n"
+            "                                     optionally a preset of one of\n"
+            "                                       balance|performance|powersave\n"
+            "                                     an optional list of param:val arguments\n"
+            "                                       minimum:N  hw_lowest ... hw_highest\n"
+            "                                       maximum:N  hw_lowest ... hw_highest\n"
+            "                                       desired:N  hw_lowest ... hw_highest\n"
+            "                                           Set explicit performance target.\n"
+            "                                           non-zero disables auto-HWP mode.\n"
+            "                                       energy_perf:0-255 (or 0-15)\n"
+            "                                                   energy/performance hint\n"
+            "                                                   lower favor performance\n"
+            "                                                   higher favor powersave\n"
+            "                                                   127 (or 7) balance\n"
+            "                                       act_window:N{,m,u}s range 0us-1270s\n"
+            "                                           window for internal calculations.\n"
+            "                                           0 lets the hardware decide.\n"
+            "                                     get-cpufreq-para returns hw_lowest/highest.\n"
             " start [seconds]                     start collect Cx/Px statistics,\n"
             "                                     output after CTRL-C or SIGINT or several seconds.\n"
             " enable-turbo-mode     [cpuid]       enable Turbo Mode for processors that support it.\n"
@@ -1309,6 +1328,226 @@ void disable_turbo_mode(int argc, char *argv[])
                 errno, strerror(errno));
 }
 
+/*
+ * Parse activity_window:NNN{us,ms,s} and validate range.
+ *
+ * Activity window is a 7bit mantissa (0-127) with a 3bit exponent (0-7) base
+ * 10 in microseconds.  So the range is 1 microsecond to 1270 seconds.  A value
+ * of 0 lets the hardware autonomously select the window.
+ *
+ * Return 0 on success
+ *       -1 on error
+ *        1 Not activity_window. i.e. try parsing as another argument
+ */
+static int parse_activity_window(xc_set_hwp_para_t *set_hwp, char *p)
+{
+    char *param = NULL, *val = NULL, *suffix = NULL;
+    unsigned int u;
+    unsigned int exponent = 0;
+    unsigned int multiplier = 1;
+    int ret;
+
+    ret = sscanf(p, "%m[a-z_A-Z]:%ms", &param, &val);
+    if ( ret != 2 )
+    {
+        return -1;
+    }
+
+    if ( strncasecmp(param, "act", 3) != 0 )
+    {
+        ret = 1;
+
+        goto out;
+    }
+
+    free(param);
+    param = NULL;
+
+    ret = sscanf(val, "%u%ms", &u, &suffix);
+    if ( ret != 1 && ret != 2 )
+    {
+        fprintf(stderr, "invalid activity window: %s\n", val);
+
+        ret = -1;
+
+        goto out;
+    }
+
+    if ( ret == 2 && suffix )
+    {
+        if ( strcasecmp(suffix, "s") == 0 )
+        {
+            multiplier = 1000 * 1000;
+            exponent = 6;
+        }
+        else if ( strcasecmp(suffix, "ms") == 0 )
+        {
+            multiplier = 1000;
+            exponent = 3;
+        }
+        else if ( strcasecmp(suffix, "us") == 0 )
+        {
+            multiplier = 1;
+            exponent = 0;
+        }
+        else
+        {
+            fprintf(stderr, "invalid activity window units: %s\n", suffix);
+
+            ret = -1;
+            goto out;
+        }
+    }
+
+    if ( u > 1270 * 1000 * 1000 / multiplier )
+    {
+        fprintf(stderr, "activity window %s too large\n", val);
+
+        ret = -1;
+        goto out;
+    }
+
+    /* looking for 7 bits of mantissa and 3 bits of exponent */
+    while ( u > 127 )
+    {
+        u /= 10;
+        exponent += 1;
+    }
+
+    set_hwp->activity_window = ( exponent & 0x7 ) << 7 | ( u & 0x7f );
+    set_hwp->set_params |= XEN_SYSCTL_HWP_SET_ACT_WINDOW;
+
+    ret = 0;
+
+ out:
+    free(suffix);
+    free(param);
+    free(val);
+
+    return ret;
+}
+
+static int parse_hwp_opts(xc_set_hwp_para_t *set_hwp, int *cpuid,
+                          int argc, char *argv[])
+{
+    int i = 0;
+
+    if ( argc < 1 )
+        return -1;
+
+    if ( parse_cpuid_non_fatal(argv[i], cpuid) == 0 )
+    {
+        i++;
+    }
+
+    if ( i == argc )
+        return -1;
+
+    if ( strcasecmp(argv[i], "powersave") == 0 )
+    {
+        set_hwp->set_params = XEN_SYSCTL_HWP_SET_PRESET_POWERSAVE;
+        i++;
+    }
+    else if ( strcasecmp(argv[i], "performance") == 0 )
+    {
+        set_hwp->set_params = XEN_SYSCTL_HWP_SET_PRESET_PERFORMANCE;
+        i++;
+    }
+    else if ( strcasecmp(argv[i], "balance") == 0 )
+    {
+        set_hwp->set_params = XEN_SYSCTL_HWP_SET_PRESET_BALANCE;
+        i++;
+    }
+
+    for ( ; i < argc; i++)
+    {
+        unsigned int val;
+        char *param;
+        int ret;
+
+        ret = parse_activity_window(set_hwp, argv[i]);
+        switch ( ret )
+        {
+        case -1:
+            return -1;
+        case 0:
+            continue;
+            break;
+        case 1:
+            /* try other parsing */
+            break;
+        }
+
+        /* sscanf can't handle split on ':' for "%ms:%u'  */
+        ret = sscanf(argv[i], "%m[a-zA-Z_]:%u", &param, &val);
+        if ( ret != 2 )
+        {
+            fprintf(stderr, "%s is an invalid hwp parameter.\n", argv[i]);
+            return -1;
+        }
+
+        if ( val > 255 )
+        {
+            fprintf(stderr, "%s value %u is out of range.\n", param, val);
+            return -1;
+        }
+
+        if ( strncasecmp(param, "min", 3) == 0 )
+        {
+            set_hwp->minimum = val;
+            set_hwp->set_params |= XEN_SYSCTL_HWP_SET_MINIMUM;
+        }
+        else if ( strncasecmp(param, "max", 3) == 0 )
+        {
+            set_hwp->maximum = val;
+            set_hwp->set_params |= XEN_SYSCTL_HWP_SET_MAXIMUM;
+        }
+        else if ( strncasecmp(param, "des", 3) == 0 )
+        {
+            set_hwp->desired = val;
+            set_hwp->set_params |= XEN_SYSCTL_HWP_SET_DESIRED;
+        }
+        else if ( strncasecmp(param, "ene", 3) == 0 )
+        {
+            set_hwp->energy_perf = val;
+            set_hwp->set_params |= XEN_SYSCTL_HWP_SET_ENERGY_PERF;
+        }
+        else
+        {
+            fprintf(stderr, "%s is an invalid parameter\n.", param);
+            return -1;
+        }
+
+        free(param);
+    }
+
+    return 0;
+}
+
+static void hwp_set_func(int argc, char *argv[])
+{
+    xc_set_hwp_para_t set_hwp = {};
+    int cpuid = -1;
+    int i = 0;
+
+    if ( parse_hwp_opts(&set_hwp, &cpuid, argc, argv) )
+    {
+        fprintf(stderr, "Missing, excess, or invalid argument(s)\n");
+        exit(EINVAL);
+    }
+
+    if ( cpuid != -1 )
+    {
+        i = cpuid;
+        max_cpu_nr = i + 1;
+    }
+
+    for ( ; i < max_cpu_nr; i++ )
+        if ( xc_set_cpufreq_hwp(xc_handle, i, &set_hwp) )
+            fprintf(stderr, "[CPU%d] failed to set hwp params (%d - %s)\n",
+                    i, errno, strerror(errno));
+}
+
 struct {
     const char *name;
     void (*function)(int argc, char *argv[]);
@@ -1319,6 +1558,7 @@ struct {
     { "get-cpufreq-average", cpufreq_func },
     { "start", start_gather_func },
     { "get-cpufreq-para", cpufreq_para_func },
+    { "set-cpufreq-hwp", hwp_set_func },
     { "set-scaling-maxfreq", scaling_max_freq_func },
     { "set-scaling-minfreq", scaling_min_freq_func },
     { "set-scaling-governor", scaling_governor_func },
diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 8f4b18d246..0fd70d76a8 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -584,6 +584,7 @@ int set_hwp_para(struct cpufreq_policy *policy,
     }
 
     if ( set_hwp->set_params & XEN_SYSCTL_HWP_SET_DESIRED &&
+         set_hwp->desired != 0 &&
          ( set_hwp->desired < data->hw_lowest ||
            set_hwp->desired > data->hw_highest ) )
     {
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-03-08 21:08 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-08 21:01 [RFC 00/12] Intel Hardware P-States (HWP) support Jason Andryuk
2021-03-08 21:01 ` [RFC 01/12] cpufreq: Allow restricting to internal governors only Jason Andryuk
2021-03-08 21:02 ` [RFC 02/12] cpufreq: Add perf_freq to cpuinfo Jason Andryuk
2021-03-08 21:02 ` [RFC 03/12] cpufreq: Export intel_feature_detect Jason Andryuk
2021-03-08 21:02 ` [RFC 04/12] cpufreq: Add Hardware P-State (HWP) driver Jason Andryuk
2021-03-08 21:02 ` [RFC 05/12] xenpm: Change get-cpufreq-para output for internal Jason Andryuk
2021-03-08 21:02 ` [RFC 06/12] cpufreq: Export HWP parameters to userspace Jason Andryuk
2021-03-08 21:02 ` [RFC 07/12] libxc: Include hwp_para in definitions Jason Andryuk
2021-03-08 21:02 ` [RFC 08/12] xenpm: Print HWP parameters Jason Andryuk
2021-03-08 21:02 ` [RFC 09/12] xenpm: Factor out a non-fatal cpuid_parse variant Jason Andryuk
2021-03-08 21:02 ` [RFC 10/12] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op Jason Andryuk
2021-03-08 21:02 ` [RFC 11/12] libxc: Add xc_set_cpufreq_hwp Jason Andryuk
2021-03-08 21:02 ` [RFC 12/12] xenpm: Add set-cpufreq-hwp subcommand Jason Andryuk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).