linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism
@ 2021-10-29 13:02 Huang Rui
  2021-10-29 13:02 ` [PATCH v3 01/21] x86/cpufreatures: add AMD Collaborative Processor Performance Control feature flag Huang Rui
                   ` (21 more replies)
  0 siblings, 22 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

Hi all,

We would like to introduce a new AMD CPU frequency control mechanism, the
"amd-pstate" driver, for modern AMD Zen based CPU series in the Linux kernel.
The new mechanism is based on Collaborative Processor Performance Control
(CPPC), which provides finer-grained frequency management than legacy ACPI
hardware P-States. Current AMD CPU platforms use the ACPI P-states driver to
manage CPU frequency and clocks, switching among only 3 P-states. AMD
P-States is intended to replace the ACPI P-states controls and provides a
flexible, low-latency interface for the Linux kernel to communicate
performance hints directly to the hardware.

"amd-pstate" leverages Linux kernel governors such as *schedutil* and
*ondemand* to manage the performance hints provided by the CPPC hardware
functionality. The first version of amd-pstate supports one of the Zen3
processors; we will support more in the future after we verify the
hardware and SBIOS functionality.

There are two types of hardware implementations for amd-pstate: one has full
MSR support and the other has shared memory support. The
X86_FEATURE_AMD_CPPC feature flag distinguishes the two types.
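
As a rough illustration of how a single flag can select between the two
implementations, the sketch below models the choice with a pair of backend
descriptors. The structure, the function names and the boolean that stands
in for X86_FEATURE_AMD_CPPC are all hypothetical; this is a userspace model
under those assumptions, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical backend descriptor; not the driver's real structure. */
struct perf_backend {
	const char *name;
	int (*update_perf)(unsigned int min, unsigned int des, unsigned int max);
};

/* Stand-in for the full MSR path. */
static int msr_update_perf(unsigned int min, unsigned int des, unsigned int max)
{
	assert(min <= des && des <= max);
	return 0; /* a real backend would write the request MSR here */
}

/* Stand-in for the shared memory path. */
static int shmem_update_perf(unsigned int min, unsigned int des, unsigned int max)
{
	assert(min <= des && des <= max);
	return 0; /* a real backend would go through the ACPI CPPC library here */
}

static const struct perf_backend msr_backend = {
	.name = "full MSR",
	.update_perf = msr_update_perf,
};

static const struct perf_backend shmem_backend = {
	.name = "shared memory",
	.update_perf = shmem_update_perf,
};

/* has_amd_cppc_flag models "X86_FEATURE_AMD_CPPC is set on this CPU". */
static const struct perf_backend *select_backend(bool has_amd_cppc_flag)
{
	return has_amd_cppc_flag ? &msr_backend : &shmem_backend;
}
```

A caller would pick the backend once at init time and then issue every
performance request through the selected update_perf() pointer.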

Using the new AMD P-States method together with the kernel governors
(*schedutil*, *ondemand*, ...) to manage frequency updates is the most
appropriate bridge between AMD Zen based processors and the Linux kernel:
the processor is able to adjust to the most efficient frequency according
to the kernel scheduler load.

Performance Per Watt (PPW) Calculation:

The PPW calculation follows the paper below:
https://software.intel.com/content/dam/develop/external/us/en/documents/performance-per-what-paper.pdf

The following formula from that reference is used to measure the PPW:

(F / t) / P = F * t / (t * E) = F / E,

"F" is the number of frames per second.
"t" is the elapsed time in seconds, so that P = E / t.
"P" is power measured in watts.
"E" is energy measured in joules.

We use the RAPL interface with the "perf" tool to read the package energy
data.
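
To make the arithmetic concrete, here is a minimal sketch of the formula
above in userspace C. The function names are ours and any inputs passed to
them are illustrative values, not measurements from this series.

```c
#include <assert.h>

/* Average power in watts: energy in joules over an interval in seconds. */
static double average_power(double energy_j, double seconds)
{
	assert(seconds > 0.0);
	return energy_j / seconds;
}

/* Performance per watt, written as the left-hand side (F / t) / P. */
static double ppw(double f, double seconds, double energy_j)
{
	assert(seconds > 0.0 && energy_j > 0.0);
	return (f / seconds) / average_power(energy_j, seconds);
}
```

With the made-up inputs f = 3000, t = 60 s and E = 1200 J, ppw() returns
2.5, which is the same as F / E = 3000 / 1200; the elapsed time cancels out
as the formula states.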

The comparisons below between the amd-pstate and acpi-cpufreq modules were
measured on an AMD Cezanne processor:

1) TBench CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|               TBench (Performance Per Watt)                         |
|                                                    Higher is better |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
|                   |  Unit: MB / (s * J)    |  Unit: MB / (s * J)    |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    acpi-cpufreq   |         3.022          |        2.969           |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|     amd-pstate    |         3.131          |        3.284           |
|                   |                        |                        |
+-------------------+------------------------+------------------------+

2) Gitsource CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|               Gitsource (Performance Per Watt)                      |
|                                                    Higher is better |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
|                   |  Unit: 1 / (s * J)     |  Unit: 1 / (s * J)     |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    acpi-cpufreq   |     3.42172E-07        |     2.74508E-07        |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|     amd-pstate    |     4.09141E-07        |     3.47610E-07        |
|                   |                        |                        |
+-------------------+------------------------+------------------------+

3) Speedometer 2.0 CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|               Speedometer 2.0 (Performance Per Watt)                |
|                                                    Higher is better |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
|                   |  Unit: 1 / (s * J)     |  Unit: 1 / (s * J)     |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    acpi-cpufreq   |      0.116111767       |      0.110321664       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|     amd-pstate    |      0.115825281       |      0.122024299       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+


According to the average data above, this solution shows better performance
per watt scaling on mobile CPU benchmarks in most cases (5 of the 6
benchmark/governor combinations; Speedometer 2.0 with schedutil is
marginally lower).
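
For readers who want the relative differences, the sketch below computes
the percentage change of amd-pstate over acpi-cpufreq from the numbers in
the three tables above. The structure and function names are hypothetical;
only the values are taken from the tables.

```c
#include <assert.h>

struct ppw_row {
	const char *benchmark;
	const char *governor;
	double acpi_cpufreq;
	double amd_pstate;
};

/* Values copied from the TBench, Gitsource and Speedometer 2.0 tables. */
static const struct ppw_row ppw_rows[] = {
	{ "TBench",          "schedutil", 3.022,       3.131       },
	{ "TBench",          "ondemand",  2.969,       3.284       },
	{ "Gitsource",       "schedutil", 3.42172E-07, 4.09141E-07 },
	{ "Gitsource",       "ondemand",  2.74508E-07, 3.47610E-07 },
	{ "Speedometer 2.0", "schedutil", 0.116111767, 0.115825281 },
	{ "Speedometer 2.0", "ondemand",  0.110321664, 0.122024299 },
};

/* Relative change of amd-pstate over acpi-cpufreq, in percent. */
static double ppw_gain_percent(const struct ppw_row *row)
{
	assert(row->acpi_cpufreq > 0.0);
	return (row->amd_pstate / row->acpi_cpufreq - 1.0) * 100.0;
}
```

Applied to each row, this gives roughly +3.6% and +10.6% for TBench, +19.6%
and +26.6% for Gitsource, and -0.2% and +10.6% for Speedometer 2.0
(schedutil and ondemand respectively).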

This patch series depends on the "hotplug capable" CPU fix below (only a few
CPU parts with "un-hotplug" cores will encounter the issue, and Mario is
working on the fix):
https://lore.kernel.org/linux-pm/20210813161842.222414-1-mario.limonciello@amd.com/

The patch series is also available in the git repo below:
V1: https://git.kernel.org/pub/scm/linux/kernel/git/rui/linux.git/log/?h=amd-pstate-dev-v1
V2: https://git.kernel.org/pub/scm/linux/kernel/git/rui/linux.git/log/?h=amd-pstate-dev-v2
V3: https://git.kernel.org/pub/scm/linux/kernel/git/rui/linux.git/log/?h=amd-pstate-dev-v3

For a detailed introduction, please see patch 19.

Changes from V1 -> V2:
- cpufreq:
- - Add detailed description in the commit log.
- - Clean up the "extension" suffix in the x86 feature flag.
- - Revise cppc_set_enable helper.
- - Add a fix to check online cpus in cppc_acpi.
- - Use static calls to avoid retpolines.
- - Revise the comment style.
- - Remove amd_pstate_boost_supported() function.
- - Revise the return value in sysfs attribute functions.
- cpupower:
- - Refine the commit log for cpupower patches.
- - Expose a function to get the sysfs value from a specific table.
- - Move amd-pstate sysfs definitions and functions into amd helper file.
- - Move the boost init function into amd helper file and explain the
  details in the commit log.
- - Remove amd_pstate_get_data from lib/cpufreq.c to keep the lib limited to
  common operations.
- - Move print_speed function into misc helper file.
- - Add amd_pstate_show_perf_and_freq() function in amd helper for
  cpufreq-info print.

Changes from V2 -> V3:
- cpufreq:
- - Add a patch from Steven to add SystemIO register support in the cppc lib.
  (Thanks for verifying the driver on his platform.)
- - Update online cpu mask to present cpu.
- - Enhance cppc_set_enable to cover all valid use cases.
- - Add more description in the Kconfig definition.
- - Clean up some redundant functions and data members.
- - Revise amd-pstate trace event prints.
- - Move the amd-pstate traces into the power trace system and set the driver
  as built-in instead of module.
- - Clean up the sysfs attributes duplicated with the core cpufreq driver.
- - Revise the amd-pstate RST documentation.
- cpupower:
- - Revise the cpupower_amd_pstate_enabled() function to use the
  cpufreq_get_driver helper instead of reading sysfs.
- - Clean up the amd-pstate max/min frequency APIs, because they are
  actually the same as the cpufreq info sysfs.

Thanks,
Ray

Huang Rui (18):
  x86/cpufreatures: add AMD Collaborative Processor Performance Control
    feature flag
  x86/msr: add AMD CPPC MSR definitions
  cpufreq: amd: introduce a new amd pstate driver to support future
    processors
  cpufreq: amd: add fast switch function for amd-pstate
  cpufreq: amd: add acpi cppc function as the backend for legacy
    processors
  cpufreq: amd: add trace for amd-pstate module
  cpufreq: amd: add boost mode support for amd-pstate
  cpufreq: amd: add amd-pstate frequencies attributes
  cpufreq: amd: add amd-pstate performance attributes
  cpupower: add AMD P-state capability flag
  cpupower: add the function to check amd-pstate enabled
  cpupower: initial AMD P-state capability
  cpupower: add the function to get the sysfs value from specific table
  cpupower: add amd-pstate sysfs definition and access helper
  cpupower: enable boost state support for amd-pstate module
  cpupower: move print_speed function into misc helper
  cpupower: print amd-pstate information on cpupower
  Documentation: amd-pstate: add amd-pstate driver introduction

Jinzhou Su (1):
  ACPI: CPPC: add cppc enable register function

Mario Limonciello (1):
  ACPI: CPPC: Check present CPUs for determining _CPC is valid

Steven Noonan (1):
  ACPI: CPPC: implement support for SystemIO registers

 Documentation/admin-guide/pm/amd-pstate.rst   | 373 ++++++++++
 .../admin-guide/pm/working-state.rst          |   1 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |  17 +
 drivers/acpi/cppc_acpi.c                      |  93 ++-
 drivers/cpufreq/Kconfig.x86                   |  17 +
 drivers/cpufreq/Makefile                      |   1 +
 drivers/cpufreq/amd-pstate.c                  | 663 ++++++++++++++++++
 include/acpi/cppc_acpi.h                      |   5 +
 include/trace/events/power.h                  |  46 ++
 tools/power/cpupower/lib/cpufreq.c            |  21 +-
 tools/power/cpupower/lib/cpufreq.h            |  12 +
 tools/power/cpupower/utils/cpufreq-info.c     |  68 +-
 tools/power/cpupower/utils/helpers/amd.c      |  87 +++
 tools/power/cpupower/utils/helpers/cpuid.c    |  13 +
 tools/power/cpupower/utils/helpers/helpers.h  |  22 +
 tools/power/cpupower/utils/helpers/misc.c     |  62 ++
 17 files changed, 1441 insertions(+), 61 deletions(-)
 create mode 100644 Documentation/admin-guide/pm/amd-pstate.rst
 create mode 100644 drivers/cpufreq/amd-pstate.c

-- 
2.25.1



* [PATCH v3 01/21] x86/cpufreatures: add AMD Collaborative Processor Performance Control feature flag
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 14:39   ` Borislav Petkov
  2021-11-06 10:28   ` Borislav Petkov
  2021-10-29 13:02 ` [PATCH v3 02/21] x86/msr: add AMD CPPC MSR definitions Huang Rui
                   ` (20 subsequent siblings)
  21 siblings, 2 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

Add the Collaborative Processor Performance Control feature flag for AMD
processors.

This feature flag will be used by the amd-pstate driver added later in this
series. The amd-pstate driver has two approaches to implementing frequency
control, depending on the CPU hardware implementation: "Full MSR Support"
and "Shared Memory Support". The feature flag indicates that the current
processor has "Full MSR Support".
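
As a small illustration of the flag encoding used in cpufeatures.h (word
index times 32 plus bit index), the userspace sketch below decodes the new
definition and tests the bit in a caller-supplied capability word. The
helper names are hypothetical. In the kernel header, word 13 holds the
AMD-defined features reported by CPUID leaf 0x80000008 (EBX); how the
caller obtains that register value is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same encoding as the definition added by this patch: word * 32 + bit. */
#define X86_FEATURE_AMD_CPPC	(13*32+27)

/* Index of the 32-bit capability word that holds the feature. */
static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of the feature inside its capability word. */
static unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Test a feature bit in a caller-supplied 32-bit capability word. */
static bool word_has_feature(uint32_t word_value, unsigned int feature)
{
	return (word_value >> feature_bit(feature)) & 1u;
}

/* Sanity check of the decoding for the new flag. */
static void check_amd_cppc_encoding(void)
{
	assert(feature_word(X86_FEATURE_AMD_CPPC) == 13);
	assert(feature_bit(X86_FEATURE_AMD_CPPC) == 27);
}
```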

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..f23dc1abd485 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -313,6 +313,7 @@
 #define X86_FEATURE_AMD_SSBD		(13*32+24) /* "" Speculative Store Bypass Disable */
 #define X86_FEATURE_VIRT_SSBD		(13*32+25) /* Virtualized Speculative Store Bypass Disable */
 #define X86_FEATURE_AMD_SSB_NO		(13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
+#define X86_FEATURE_AMD_CPPC		(13*32+27) /* Collaborative Processor Performance Control */
 
 /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
 #define X86_FEATURE_DTHERM		(14*32+ 0) /* Digital Thermal Sensor */
-- 
2.25.1



* [PATCH v3 02/21] x86/msr: add AMD CPPC MSR definitions
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
  2021-10-29 13:02 ` [PATCH v3 01/21] x86/cpufreatures: add AMD Collaborative Processor Performance Control feature flag Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 03/21] ACPI: CPPC: implement support for SystemIO registers Huang Rui
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

The AMD CPPC (Collaborative Processor Performance Control) function uses MSR
registers to manage the performance hints, so add the MSR register macros
here.
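
As a usage illustration, the sketch below copies the field macros from this
patch into a userspace program, unpacks a capability value and packs a
request value. The helper build_perf_request() and any register contents
passed to it are hypothetical, not values read from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Field accessors copied from this patch. */
#define CAP1_LOWEST_PERF(x)	(((x) >> 0) & 0xff)
#define CAP1_LOWNONLIN_PERF(x)	(((x) >> 8) & 0xff)
#define CAP1_NOMINAL_PERF(x)	(((x) >> 16) & 0xff)
#define CAP1_HIGHEST_PERF(x)	(((x) >> 24) & 0xff)

#define REQ_MAX_PERF(x)		(((x) & 0xff) << 0)
#define REQ_MIN_PERF(x)		(((x) & 0xff) << 8)
#define REQ_DES_PERF(x)		(((x) & 0xff) << 16)
#define REQ_ENERGY_PERF_PREF(x)	(((x) & 0xff) << 24)

/*
 * Hypothetical helper: pack min/desired/max perf levels into one request
 * value, leaving the energy performance preference field at 0.
 */
static uint64_t build_perf_request(uint8_t min, uint8_t des, uint8_t max)
{
	assert(min <= max);
	return REQ_MAX_PERF((uint64_t)max) |
	       REQ_MIN_PERF((uint64_t)min) |
	       REQ_DES_PERF((uint64_t)des);
}
```

For a made-up capability value of 0xC4805A10, the CAP1 macros yield 0x10
(lowest), 0x5A (lowest nonlinear), 0x80 (nominal) and 0xC4 (highest), and
build_perf_request(0x10, 0x80, 0xC4) yields 0x008010C4.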

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/include/asm/msr-index.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a7c413432b33..ce42e15cf303 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -486,6 +486,23 @@
 
 #define MSR_AMD64_VIRT_SPEC_CTRL	0xc001011f
 
+/* AMD Collaborative Processor Performance Control MSRs */
+#define MSR_AMD_CPPC_CAP1		0xc00102b0
+#define MSR_AMD_CPPC_ENABLE		0xc00102b1
+#define MSR_AMD_CPPC_CAP2		0xc00102b2
+#define MSR_AMD_CPPC_REQ		0xc00102b3
+#define MSR_AMD_CPPC_STATUS		0xc00102b4
+
+#define CAP1_LOWEST_PERF(x)	(((x) >> 0) & 0xff)
+#define CAP1_LOWNONLIN_PERF(x)	(((x) >> 8) & 0xff)
+#define CAP1_NOMINAL_PERF(x)	(((x) >> 16) & 0xff)
+#define CAP1_HIGHEST_PERF(x)	(((x) >> 24) & 0xff)
+
+#define REQ_MAX_PERF(x)		(((x) & 0xff) << 0)
+#define REQ_MIN_PERF(x)		(((x) & 0xff) << 8)
+#define REQ_DES_PERF(x)		(((x) & 0xff) << 16)
+#define REQ_ENERGY_PERF_PREF(x)	(((x) & 0xff) << 24)
+
 /* Fam 17h MSRs */
 #define MSR_F17H_IRPERF			0xc00000e9
 
-- 
2.25.1



* [PATCH v3 03/21] ACPI: CPPC: implement support for SystemIO registers
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
  2021-10-29 13:02 ` [PATCH v3 01/21] x86/cpufreatures: add AMD Collaborative Processor Performance Control feature flag Huang Rui
  2021-10-29 13:02 ` [PATCH v3 02/21] x86/msr: add AMD CPPC MSR definitions Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 04/21] ACPI: CPPC: Check present CPUs for determining _CPC is valid Huang Rui
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

From: Steven Noonan <steven@valvesoftware.com>

According to the ACPI v6.2 (and later) specification, SystemIO can be
used for _CPC registers. This teaches cppc_acpi how to handle such
registers.

This patch was tested using the amd_pstate driver on my Zephyrus G15
(model GA503QS) using the current version 410 BIOS, which uses
a SystemIO register for the HighestPerformance element in _CPC.
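
The probe-time checks and the width computation in this patch can be
modelled in userspace as follows. The two helper names are hypothetical;
the conditions themselves mirror the diff (access width 1 to 3, meaning
8, 16 or 32 bits, and a port address that fits in 16 bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the validity checks for SystemIO registers. */
static bool sysio_reg_valid(uint8_t access_width, uint64_t address)
{
	/* 1 = 8-bit, 2 = 16-bit, 3 = 32-bit; no 64-bit SystemIO registers. */
	if (access_width < 1 || access_width > 3)
		return false;
	/* SystemIO registers use 16-bit port addresses. */
	if (address & ~0xFFFFULL)
		return false;
	return true;
}

/* Access width code to width in bits, as computed in cpc_read()/cpc_write(). */
static unsigned int sysio_width_bits(uint8_t access_width)
{
	assert(access_width >= 1 && access_width <= 3);
	return 8u << (access_width - 1);
}
```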

Signed-off-by: Steven Noonan <steven@valvesoftware.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/acpi/cppc_acpi.c | 46 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index bd482108310c..444c7a4605ad 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -759,9 +759,24 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr)
 						goto out_free;
 					cpc_ptr->cpc_regs[i-2].sys_mem_vaddr = addr;
 				}
+			} else if (gas_t->space_id == ACPI_ADR_SPACE_SYSTEM_IO) {
+				if (gas_t->access_width < 1 || gas_t->access_width > 3) {
+					/* 1 = 8-bit, 2 = 16-bit, and 3 = 32-bit. SystemIO doesn't
+					 * implement 64-bit registers.
+					 */
+					pr_debug("Invalid access width %d for SystemIO register\n",
+						gas_t->access_width);
+					goto out_free;
+				}
+				if (gas_t->address & ~0xFFFFULL) {
+					/* SystemIO registers use 16-bit integer addresses */
+					pr_debug("Invalid IO port %llu for SystemIO register\n",
+						gas_t->address);
+					goto out_free;
+				}
 			} else {
 				if (gas_t->space_id != ACPI_ADR_SPACE_FIXED_HARDWARE || !cpc_ffh_supported()) {
-					/* Support only PCC ,SYS MEM and FFH type regs */
+					/* Support only PCC, SystemMemory, SystemIO, and FFH type regs. */
 					pr_debug("Unsupported register type: %d\n", gas_t->space_id);
 					goto out_free;
 				}
@@ -936,7 +951,20 @@ static int cpc_read(int cpu, struct cpc_register_resource *reg_res, u64 *val)
 	}
 
 	*val = 0;
-	if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
+
+	if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_IO) {
+		u32 width = 8 << (reg->access_width - 1);
+		acpi_status status;
+
+		status = acpi_os_read_port((acpi_io_address)reg->address, (u32 *)val, width);
+
+		if (status != AE_OK) {
+			pr_debug("Error: Failed to read SystemIO port %llx\n", reg->address);
+			return -EFAULT;
+		}
+
+		return 0;
+	} else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
 		vaddr = GET_PCC_VADDR(reg->address, pcc_ss_id);
 	else if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
 		vaddr = reg_res->sys_mem_vaddr;
@@ -975,7 +1003,19 @@ static int cpc_write(int cpu, struct cpc_register_resource *reg_res, u64 val)
 	int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
 	struct cpc_reg *reg = &reg_res->cpc_entry.reg;
 
-	if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
+	if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_IO) {
+		u32 width = 8 << (reg->access_width - 1);
+		acpi_status status;
+
+		status = acpi_os_write_port((acpi_io_address)reg->address, (u32)val, width);
+
+		if (status != AE_OK) {
+			pr_debug("Error: Failed to write SystemIO port %llx\n", reg->address);
+			return -EFAULT;
+		}
+
+		return 0;
+	} else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
 		vaddr = GET_PCC_VADDR(reg->address, pcc_ss_id);
 	else if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
 		vaddr = reg_res->sys_mem_vaddr;
-- 
2.25.1



* [PATCH v3 04/21] ACPI: CPPC: Check present CPUs for determining _CPC is valid
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (2 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 03/21] ACPI: CPPC: implement support for SystemIO registers Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 05/21] ACPI: CPPC: add cppc enable register function Huang Rui
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

From: Mario Limonciello <mario.limonciello@amd.com>

As this is a static check, it should be based upon what is currently
present on the system. This makes probing more deterministic.

If the local APIC flags field (lapic_flags) of a CPU core in the MADT table
is 0, the core won't be enabled. In this case, _CPC won't be found for
that core, and walking through all possible CPUs (including disabled CPUs)
reports _CPC as invalid. This is not expected, so switch to checking
present CPUs instead.
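
The effect can be modelled in userspace with two bitmasks. In the sketch
below, the structure, the mask layout and the function name are
hypothetical stand-ins for the kernel's CPU masks and per-CPU _CPC
descriptors; a NULL entry models a disabled core for which no _CPC was
parsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical model: non-NULL cpc_desc[cpu] means _CPC was parsed. */
struct cpu_model {
	uint32_t possible_mask;
	uint32_t present_mask;
	const void *cpc_desc[MODEL_NR_CPUS];
};

/* Return true if every CPU selected by mask has a _CPC descriptor. */
static bool cpc_valid_over(const struct cpu_model *m, uint32_t mask)
{
	assert(m != NULL);
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;
		if (!m->cpc_desc[cpu])
			return false;
	}
	return true;
}
```

With four possible CPUs of which only three are present, the check over
the possible mask fails on the disabled core, while the check over the
present mask succeeds.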

Reported-by: Jinzhou Su <Jinzhou.Su@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/acpi/cppc_acpi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 444c7a4605ad..c9169c221209 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -411,7 +411,7 @@ bool acpi_cpc_valid(void)
 	struct cpc_desc *cpc_ptr;
 	int cpu;
 
-	for_each_possible_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		cpc_ptr = per_cpu(cpc_desc_ptr, cpu);
 		if (!cpc_ptr)
 			return false;
-- 
2.25.1



* [PATCH v3 05/21] ACPI: CPPC: add cppc enable register function
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (3 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 04/21] ACPI: CPPC: Check present CPUs for determining _CPC is valid Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 14:15   ` Limonciello, Mario
  2021-10-29 13:02 ` [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors Huang Rui
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

From: Jinzhou Su <Jinzhou.Su@amd.com>

Add a new function to enable the CPPC feature. This function
writes the EnableRegister field of the Continuous Performance Control
package on the processor.

The CPPC EnableRegister is described in section 8.4.7.1 of ACPI 6.4:
This element is optional. If supported, contains a resource descriptor
with a single Register() descriptor that describes a register to which
OSPM writes a One to enable CPPC on this processor. Before this register
is set, the processor will be controlled by legacy mechanisms (ACPI
Pstates, firmware, etc.).

AMD processors will use this register to enable the amd-pstate
function instead of legacy ACPI P-States.
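
The control flow of the new helper can be summarised with the userspace
model below. The structure and its fields are hypothetical; the model only
captures the two paths in the diff (a register in a PCC subspace needs a
valid subspace id and a follow-up command to the platform, any other
register is written directly) and does no locking or real I/O.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an EnableRegister; all fields are illustrative. */
struct enable_reg_model {
	bool in_pcc;			/* register lives in a PCC subspace */
	int pcc_ss_id;			/* negative when no subspace is assigned */
	unsigned int value;		/* last value written */
	unsigned int pcc_cmds_sent;	/* write commands handed to the platform */
};

static int model_set_enable(struct enable_reg_model *reg, bool enable)
{
	assert(reg != NULL);

	/* A PCC register without a valid subspace cannot be written. */
	if (reg->in_pcc && reg->pcc_ss_id < 0)
		return -EIO;

	reg->value = enable ? 1 : 0;

	/* After writing, hand ownership of the PCC channel to the platform. */
	if (reg->in_pcc)
		reg->pcc_cmds_sent++;

	return 0;
}
```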

Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/acpi/cppc_acpi.c | 45 ++++++++++++++++++++++++++++++++++++++++
 include/acpi/cppc_acpi.h |  5 +++++
 2 files changed, 50 insertions(+)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index c9169c221209..2d2297ef5bf9 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -1275,6 +1275,51 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
 }
 EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
 
+/**
+ * cppc_set_enable - Set to enable CPPC on the processor by writing the
+ * Continuous Performance Control package EnableRegister field.
+ * @cpu: CPU for which to enable CPPC register.
+ * @enable: 0 - disable, 1 - enable CPPC feature on the processor.
+ *
+ * Return: 0 for success, -ERRNO or -EIO otherwise.
+ */
+int cppc_set_enable(int cpu, bool enable)
+{
+	int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
+	struct cpc_register_resource *enable_reg;
+	struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
+	struct cppc_pcc_data *pcc_ss_data = NULL;
+	int ret = -EINVAL;
+
+	if (!cpc_desc) {
+		pr_debug("No CPC descriptor for CPU:%d\n", cpu);
+		return -EINVAL;
+	}
+
+	enable_reg = &cpc_desc->cpc_regs[ENABLE];
+
+	if (CPC_IN_PCC(enable_reg)) {
+
+		if (pcc_ss_id < 0)
+			return -EIO;
+
+		ret = cpc_write(cpu, enable_reg, enable);
+		if (ret)
+			return ret;
+
+		pcc_ss_data = pcc_data[pcc_ss_id];
+
+		down_write(&pcc_ss_data->pcc_lock);
+		/* after writing CPC, transfer the ownership of PCC to platform */
+		ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
+		up_write(&pcc_ss_data->pcc_lock);
+		return ret;
+	}
+
+	return cpc_write(cpu, enable_reg, enable);
+}
+EXPORT_SYMBOL_GPL(cppc_set_enable);
+
 /**
  * cppc_set_perf - Set a CPU's performance controls.
  * @cpu: CPU for which to set performance controls.
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index bc159a9b4a73..92b7ea8d8f5e 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -138,6 +138,7 @@ extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
 extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf);
 extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
 extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
+extern int cppc_set_enable(int cpu, bool enable);
 extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
 extern bool acpi_cpc_valid(void);
 extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data);
@@ -162,6 +163,10 @@ static inline int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
 {
 	return -ENOTSUPP;
 }
+static inline int cppc_set_enable(int cpu, bool enable)
+{
+	return -ENOTSUPP;
+}
 static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps)
 {
 	return -ENOTSUPP;
-- 
2.25.1



* [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (4 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 05/21] ACPI: CPPC: add cppc enable register function Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-11-02 18:52   ` Limonciello, Mario
  2021-11-02 19:38   ` Nathan Fontenot
  2021-10-29 13:02 ` [PATCH v3 07/21] cpufreq: amd: add fast switch function for amd-pstate Huang Rui
                   ` (15 subsequent siblings)
  21 siblings, 2 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

amd-pstate is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism on AMD Zen based CPU series in the
Linux kernel. The new mechanism is based on Collaborative Processor
Performance Control (CPPC), which provides finer-grained frequency
management than legacy ACPI hardware P-States. Current AMD CPU platforms
use the ACPI P-states driver to manage CPU frequency and clocks,
switching among only 3 P-states. AMD P-States is intended to replace the
ACPI P-states controls and provides a flexible, low-latency interface for
the Linux kernel to communicate performance hints directly to the hardware.

"amd-pstate" leverages Linux kernel governors such as *schedutil* and
*ondemand* to manage the performance hints provided by the CPPC hardware
functionality. The first version of amd-pstate supports one of the Zen3
processors; we will support more in the future after we verify the
hardware and SBIOS functionality.

There are two types of hardware implementations for amd-pstate: one has full
MSR support and the other has shared memory support. The
X86_FEATURE_AMD_CPPC feature flag distinguishes the two types.

Using the new AMD P-States method together with the kernel governors
(*schedutil*, *ondemand*, ...) to manage frequency updates is the most
appropriate bridge between AMD Zen based processors and the Linux kernel:
the processor is able to adjust to the most efficient frequency according
to the kernel scheduler load.

Performance Per Watt (PPW) Calculation:

The PPW calculation follows the paper below:
https://software.intel.com/content/dam/develop/external/us/en/documents/performance-per-what-paper.pdf

The following formula from that reference is used to measure the PPW:

(F / t) / P = F * t / (t * E) = F / E,

"F" is the number of frames per second.
"t" is the elapsed time in seconds, so that P = E / t.
"P" is power measured in watts.
"E" is energy measured in joules.

We use the RAPL interface with the "perf" tool to read the package energy
data.

The comparisons below between the amd-pstate and acpi-cpufreq modules were
measured on an AMD Cezanne processor:

1) TBench CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|               TBench (Performance Per Watt)                         |
|                                                    Higher is better |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
|                   |  Unit: MB / (s * J)    |  Unit: MB / (s * J)    |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    acpi-cpufreq   |         3.022          |        2.969           |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|     amd-pstate    |         3.131          |        3.284           |
|                   |                        |                        |
+-------------------+------------------------+------------------------+

2) Gitsource CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|               Gitsource (Performance Per Watt)                      |
|                                                    Higher is better |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
|                   |  Unit: 1 / (s * J)     |  Unit: 1 / (s * J)     |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    acpi-cpufreq   |     3.42172E-07        |     2.74508E-07        |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|     amd-pstate    |     4.09141E-07        |     3.47610E-07        |
|                   |                        |                        |
+-------------------+------------------------+------------------------+

3) Speedometer 2.0 CPU benchmark:

+---------------------------------------------------------------------+
|                                                                     |
|               Speedometer 2.0 (Performance Per Watt)                |
|                                                    Higher is better |
+-------------------+------------------------+------------------------+
|                   |  Performance Per Watt  |  Performance Per Watt  |
|   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
|                   |  Unit: 1 / (s * J)     |  Unit: 1 / (s * J)     |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|    acpi-cpufreq   |      0.116111767       |      0.110321664       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+
|                   |                        |                        |
|     amd-pstate    |      0.115825281       |      0.122024299       |
|                   |                        |                        |
+-------------------+------------------------+------------------------+

According to the average data above, amd-pstate shows better performance
per watt than acpi-cpufreq on mobile CPU benchmarks in most cases. The only
exception is Speedometer 2.0 with schedutil, where the two are nearly equal.
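
To make the comparison easier to read, the relative change of amd-pstate
over acpi-cpufreq can be computed from the table values. The following is
only a sketch: the struct and function names (ppw_row, gain_percent,
print_gains) are made up for illustration and are not part of the patch;
the numbers are copied from the three tables above.

```c
#include <stddef.h>
#include <stdio.h>

/* One table cell pair: performance per watt for both drivers. */
struct ppw_row {
	const char *benchmark;
	const char *governor;
	double acpi_cpufreq;
	double amd_pstate;
};

/* Values copied from the TBench, Gitsource and Speedometer 2.0 tables. */
const struct ppw_row ppw_rows[] = {
	{ "TBench",          "schedutil", 3.022,       3.131       },
	{ "TBench",          "ondemand",  2.969,       3.284       },
	{ "Gitsource",       "schedutil", 3.42172e-07, 4.09141e-07 },
	{ "Gitsource",       "ondemand",  2.74508e-07, 3.47610e-07 },
	{ "Speedometer 2.0", "schedutil", 0.116111767, 0.115825281 },
	{ "Speedometer 2.0", "ondemand",  0.110321664, 0.122024299 },
};

#define PPW_ROW_COUNT (sizeof(ppw_rows) / sizeof(ppw_rows[0]))

/* Relative change of amd-pstate over acpi-cpufreq, in percent. */
double gain_percent(double acpi_cpufreq, double amd_pstate)
{
	return (amd_pstate - acpi_cpufreq) / acpi_cpufreq * 100.0;
}

/* Print one line per row; return how many rows favor amd-pstate. */
size_t print_gains(void)
{
	size_t i, improved = 0;

	for (i = 0; i < PPW_ROW_COUNT; i++) {
		double gain = gain_percent(ppw_rows[i].acpi_cpufreq,
					   ppw_rows[i].amd_pstate);

		printf("%-16s %-10s %+7.2f%%\n", ppw_rows[i].benchmark,
		       ppw_rows[i].governor, gain);
		if (gain > 0.0)
			improved++;
	}

	return improved;
}
```

Calling print_gains() from a small main() lists the six comparisons; five
of them favor amd-pstate.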

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/cpufreq/Kconfig.x86  |  17 ++
 drivers/cpufreq/Makefile     |   1 +
 drivers/cpufreq/amd-pstate.c | 413 +++++++++++++++++++++++++++++++++++
 3 files changed, 431 insertions(+)
 create mode 100644 drivers/cpufreq/amd-pstate.c

diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index 92701a18bdd9..2e798b2c0bdb 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -34,6 +34,23 @@ config X86_PCC_CPUFREQ
 
 	  If in doubt, say N.
 
+config X86_AMD_PSTATE
+	bool "AMD Processor P-State driver"
+	depends on X86
+	select ACPI_PROCESSOR if ACPI
+	select ACPI_CPPC_LIB if X86_64 && ACPI && SCHED_MC_PRIO
+	select CPU_FREQ_GOV_SCHEDUTIL if SMP
+	help
+	  This driver adds a CPUFreq driver which utilizes a fine grain
+	  processor performance frequency control range instead of legacy
+	  performance levels. This driver supports the AMD processors with
+	  _CPC object in the SBIOS.
+
+	  For details, take a look at:
+	  <file:Documentation/admin-guide/pm/amd-pstate.rst>.
+
+	  If in doubt, say N.
+
 config X86_ACPI_CPUFREQ
 	tristate "ACPI Processor P-States driver"
 	depends on ACPI_PROCESSOR
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 48ee5859030c..c8d307010922 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_CPUFREQ_DT_PLATDEV)	+= cpufreq-dt-platdev.o
 # speedstep-* is preferred over p4-clockmod.
 
 obj-$(CONFIG_X86_ACPI_CPUFREQ)		+= acpi-cpufreq.o
+obj-$(CONFIG_X86_AMD_PSTATE)		+= amd-pstate.o
 obj-$(CONFIG_X86_POWERNOW_K8)		+= powernow-k8.o
 obj-$(CONFIG_X86_PCC_CPUFREQ)		+= pcc-cpufreq.o
 obj-$(CONFIG_X86_POWERNOW_K6)		+= powernow-k6.o
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
new file mode 100644
index 000000000000..a400861c7fdc
--- /dev/null
+++ b/drivers/cpufreq/amd-pstate.c
@@ -0,0 +1,413 @@
+/*
+ * amd-pstate.c - AMD Processor P-state Frequency Driver
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Author: Huang Rui <ray.huang@amd.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+#include <linux/sched.h>
+#include <linux/cpufreq.h>
+#include <linux/compiler.h>
+#include <linux/dmi.h>
+#include <linux/slab.h>
+#include <linux/acpi.h>
+#include <linux/io.h>
+#include <linux/delay.h>
+#include <linux/uaccess.h>
+#include <linux/static_call.h>
+
+#include <acpi/processor.h>
+#include <acpi/cppc_acpi.h>
+
+#include <asm/msr.h>
+#include <asm/processor.h>
+#include <asm/cpufeature.h>
+#include <asm/cpu_device_id.h>
+
+#define AMD_PSTATE_TRANSITION_LATENCY	0x20000
+#define AMD_PSTATE_TRANSITION_DELAY	500
+
+static struct cpufreq_driver amd_pstate_driver;
+
+struct amd_cpudata {
+	int	cpu;
+
+	struct freq_qos_request req[2];
+
+	u64	cppc_req_cached;
+
+	u32	highest_perf;
+	u32	nominal_perf;
+	u32	lowest_nonlinear_perf;
+	u32	lowest_perf;
+
+	u32	max_freq;
+	u32	min_freq;
+	u32	nominal_freq;
+	u32	lowest_nonlinear_freq;
+};
+
+static inline int pstate_enable(bool enable)
+{
+	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
+}
+
+DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
+
+static inline int amd_pstate_enable(bool enable)
+{
+	return static_call(amd_pstate_enable)(enable);
+}
+
+static int pstate_init_perf(struct amd_cpudata *cpudata)
+{
+	u64 cap1;
+
+	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
+				     &cap1);
+	if (ret)
+		return ret;
+
+	/*
+	 * TODO: Introduce AMD specific power feature.
+	 *
+	 * CPPC entry doesn't indicate the highest performance in some ASICs.
+	 */
+	WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
+
+	WRITE_ONCE(cpudata->nominal_perf, CAP1_NOMINAL_PERF(cap1));
+	WRITE_ONCE(cpudata->lowest_nonlinear_perf, CAP1_LOWNONLIN_PERF(cap1));
+	WRITE_ONCE(cpudata->lowest_perf, CAP1_LOWEST_PERF(cap1));
+
+	return 0;
+}
+
+DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
+
+static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
+{
+	return static_call(amd_pstate_init_perf)(cpudata);
+}
+
+static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
+			       u32 des_perf, u32 max_perf, bool fast_switch)
+{
+	if (fast_switch)
+		wrmsrl(MSR_AMD_CPPC_REQ, READ_ONCE(cpudata->cppc_req_cached));
+	else
+		wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
+			      READ_ONCE(cpudata->cppc_req_cached));
+}
+
+DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
+
+static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
+					  u32 min_perf, u32 des_perf,
+					  u32 max_perf, bool fast_switch)
+{
+	static_call(amd_pstate_update_perf)(cpudata, min_perf, des_perf,
+					    max_perf, fast_switch);
+}
+
+static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
+			      u32 des_perf, u32 max_perf, bool fast_switch)
+{
+	u64 prev = READ_ONCE(cpudata->cppc_req_cached);
+	u64 value = prev;
+
+	value &= ~REQ_MIN_PERF(~0L);
+	value |= REQ_MIN_PERF(min_perf);
+
+	value &= ~REQ_DES_PERF(~0L);
+	value |= REQ_DES_PERF(des_perf);
+
+	value &= ~REQ_MAX_PERF(~0L);
+	value |= REQ_MAX_PERF(max_perf);
+
+	if (value == prev)
+		return;
+
+	WRITE_ONCE(cpudata->cppc_req_cached, value);
+
+	amd_pstate_update_perf(cpudata, min_perf, des_perf,
+			       max_perf, fast_switch);
+}
+
+static int amd_pstate_verify(struct cpufreq_policy_data *policy)
+{
+	cpufreq_verify_within_cpu_limits(policy);
+
+	return 0;
+}
+
+static int amd_pstate_target(struct cpufreq_policy *policy,
+			     unsigned int target_freq,
+			     unsigned int relation)
+{
+	struct cpufreq_freqs freqs;
+	struct amd_cpudata *cpudata = policy->driver_data;
+	unsigned long amd_max_perf, amd_min_perf, amd_des_perf,
+		      amd_cap_perf;
+
+	if (!cpudata->max_freq)
+		return -ENODEV;
+
+	amd_cap_perf = READ_ONCE(cpudata->highest_perf);
+	amd_min_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+	amd_max_perf = amd_cap_perf;
+
+	freqs.old = policy->cur;
+	freqs.new = target_freq;
+
+	amd_des_perf = DIV_ROUND_CLOSEST(target_freq * amd_cap_perf,
+					 cpudata->max_freq);
+
+	cpufreq_freq_transition_begin(policy, &freqs);
+	amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
+			  amd_max_perf, false);
+	cpufreq_freq_transition_end(policy, &freqs, false);
+
+	return 0;
+}
+
+static int amd_get_min_freq(struct amd_cpudata *cpudata)
+{
+	struct cppc_perf_caps cppc_perf;
+
+	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+	if (ret)
+		return ret;
+
+	/* Switch to khz */
+	return cppc_perf.lowest_freq * 1000;
+}
+
+static int amd_get_max_freq(struct amd_cpudata *cpudata)
+{
+	struct cppc_perf_caps cppc_perf;
+	u32 max_perf, max_freq, nominal_freq, nominal_perf;
+	u64 boost_ratio;
+
+	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+	if (ret)
+		return ret;
+
+	nominal_freq = cppc_perf.nominal_freq;
+	nominal_perf = READ_ONCE(cpudata->nominal_perf);
+	max_perf = READ_ONCE(cpudata->highest_perf);
+
+	boost_ratio = div_u64(max_perf << SCHED_CAPACITY_SHIFT,
+			      nominal_perf);
+
+	max_freq = nominal_freq * boost_ratio >> SCHED_CAPACITY_SHIFT;
+
+	/* Switch to khz */
+	return max_freq * 1000;
+}
+
+static int amd_get_nominal_freq(struct amd_cpudata *cpudata)
+{
+	struct cppc_perf_caps cppc_perf;
+	u32 nominal_freq;
+
+	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+	if (ret)
+		return ret;
+
+	nominal_freq = cppc_perf.nominal_freq;
+
+	/* Switch to khz */
+	return nominal_freq * 1000;
+}
+
+static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata)
+{
+	struct cppc_perf_caps cppc_perf;
+	u32 lowest_nonlinear_freq, lowest_nonlinear_perf,
+	    nominal_freq, nominal_perf;
+	u64 lowest_nonlinear_ratio;
+
+	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+	if (ret)
+		return ret;
+
+	nominal_freq = cppc_perf.nominal_freq;
+	nominal_perf = READ_ONCE(cpudata->nominal_perf);
+
+	lowest_nonlinear_perf = cppc_perf.lowest_nonlinear_perf;
+
+	lowest_nonlinear_ratio = div_u64(lowest_nonlinear_perf <<
+					 SCHED_CAPACITY_SHIFT, nominal_perf);
+
+	lowest_nonlinear_freq = nominal_freq * lowest_nonlinear_ratio >> SCHED_CAPACITY_SHIFT;
+
+	/* Switch to khz */
+	return lowest_nonlinear_freq * 1000;
+}
+
+static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
+{
+	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
+	unsigned int cpu = policy->cpu;
+	struct device *dev;
+	struct amd_cpudata *cpudata;
+
+	dev = get_cpu_device(policy->cpu);
+	if (!dev)
+		return -ENODEV;
+
+	cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
+	if (!cpudata)
+		return -ENOMEM;
+
+	cpudata->cpu = cpu;
+
+	ret = amd_pstate_init_perf(cpudata);
+	if (ret)
+		goto free_cpudata1;
+
+	min_freq = amd_get_min_freq(cpudata);
+	max_freq = amd_get_max_freq(cpudata);
+	nominal_freq = amd_get_nominal_freq(cpudata);
+	lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
+
+	if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
+		dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
+			min_freq, max_freq);
+		ret = -EINVAL;
+		goto free_cpudata1;
+	}
+
+	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
+	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
+
+	policy->min = min_freq;
+	policy->max = max_freq;
+
+	policy->cpuinfo.min_freq = min_freq;
+	policy->cpuinfo.max_freq = max_freq;
+
+	/* It will be updated by governor */
+	policy->cur = policy->cpuinfo.min_freq;
+
+	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
+				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
+	if (ret < 0) {
+		dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret);
+		goto free_cpudata1;
+	}
+
+	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[1],
+				   FREQ_QOS_MAX, policy->cpuinfo.max_freq);
+	if (ret < 0) {
+		dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret);
+		goto free_cpudata2;
+	}
+
+	/* Initial processor data capability frequencies */
+	cpudata->max_freq = max_freq;
+	cpudata->min_freq = min_freq;
+	cpudata->nominal_freq = nominal_freq;
+	cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
+
+	policy->driver_data = cpudata;
+
+	return 0;
+
+	freq_qos_remove_request(&cpudata->req[1]);
+free_cpudata2:
+	freq_qos_remove_request(&cpudata->req[0]);
+free_cpudata1:
+	kfree(cpudata);
+	return ret;
+}
+
+static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
+{
+	struct amd_cpudata *cpudata;
+
+	cpudata = policy->driver_data;
+
+	freq_qos_remove_request(&cpudata->req[1]);
+	freq_qos_remove_request(&cpudata->req[0]);
+	kfree(cpudata);
+
+	return 0;
+}
+
+static struct cpufreq_driver amd_pstate_driver = {
+	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
+	.verify		= amd_pstate_verify,
+	.target		= amd_pstate_target,
+	.init		= amd_pstate_cpu_init,
+	.exit		= amd_pstate_cpu_exit,
+	.name		= "amd-pstate",
+};
+
+static int __init amd_pstate_init(void)
+{
+	int ret;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+		return -ENODEV;
+
+	if (!acpi_cpc_valid()) {
+		pr_debug("%s, the _CPC object is not present in SBIOS\n",
+			 __func__);
+		return -ENODEV;
+	}
+
+	/* don't keep reloading if cpufreq_driver exists */
+	if (cpufreq_get_current_driver())
+		return -EEXIST;
+
+	/* capability check */
+	if (!boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
+		pr_debug("%s, AMD CPPC MSR based functionality is not supported\n",
+			 __func__);
+		return -ENODEV;
+	}
+
+	/* enable amd pstate feature */
+	ret = amd_pstate_enable(true);
+	if (ret) {
+		pr_err("%s, failed to enable amd-pstate with return %d\n",
+		       __func__, ret);
+		return ret;
+	}
+
+	ret = cpufreq_register_driver(&amd_pstate_driver);
+	if (ret) {
+		pr_err("%s, return %d\n", __func__, ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+device_initcall(amd_pstate_init);
+
+MODULE_AUTHOR("Huang Rui <ray.huang@amd.com>");
+MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");
+MODULE_LICENSE("GPL");
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 07/21] cpufreq: amd: add fast switch function for amd-pstate
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (5 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 14:16   ` Limonciello, Mario
  2021-11-02 19:56   ` Nathan Fontenot
  2021-10-29 13:02 ` [PATCH v3 08/21] cpufreq: amd: add acpi cppc function as the backend for legacy processors Huang Rui
                   ` (14 subsequent siblings)
  21 siblings, 2 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

Introduce the fast switch function for amd-pstate on AMD processors that
support full MSR register control. It reduces the latency in interrupt
context.
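
The scaling done in the fast switch path can be sketched in plain user
space C. This is an illustration only, not the driver code: the
perf_request struct and the scale_perf() name are hypothetical, and the
sketch sets the desired value to highest_perf when target_perf is not
below capacity, which is an assumption of this sketch.

```c
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical container for the three values written to the request. */
struct perf_request {
	unsigned long min_perf;
	unsigned long des_perf;
	unsigned long max_perf;
};

/*
 * Map the scheduler's min_perf/target_perf (both relative to capacity)
 * onto the CPPC performance scale bounded by highest_perf.
 */
struct perf_request scale_perf(unsigned long highest_perf,
			       unsigned long lowest_nonlinear_perf,
			       unsigned long min_perf,
			       unsigned long target_perf,
			       unsigned long capacity)
{
	struct perf_request req;

	/* Assumption: ask for the highest performance by default. */
	req.des_perf = highest_perf;
	if (target_perf < capacity)
		req.des_perf = DIV_ROUND_UP(highest_perf * target_perf,
					    capacity);

	req.min_perf = highest_perf;
	if (min_perf < capacity)
		req.min_perf = DIV_ROUND_UP(highest_perf * min_perf,
					    capacity);

	/* Never request less than the lowest non-linear performance. */
	if (req.min_perf < lowest_nonlinear_perf)
		req.min_perf = lowest_nonlinear_perf;

	req.max_perf = highest_perf;
	if (req.max_perf < req.min_perf)
		req.max_perf = req.min_perf;

	/* Keep the desired value inside [min_perf, max_perf]. */
	if (req.des_perf < req.min_perf)
		req.des_perf = req.min_perf;
	if (req.des_perf > req.max_perf)
		req.des_perf = req.max_perf;

	return req;
}
```

For example, with made-up values highest_perf = 166 and
lowest_nonlinear_perf = 39, a target of 512 out of a capacity of 1024 maps
to a desired performance of 83.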

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 38 ++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index a400861c7fdc..55ff03f85608 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -191,6 +191,41 @@ static int amd_pstate_target(struct cpufreq_policy *policy,
 	return 0;
 }
 
+static void amd_pstate_adjust_perf(unsigned int cpu,
+				   unsigned long min_perf,
+				   unsigned long target_perf,
+				   unsigned long capacity)
+{
+	unsigned long amd_max_perf, amd_min_perf, amd_des_perf,
+		      amd_cap_perf, lowest_nonlinear_perf;
+	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+	struct amd_cpudata *cpudata = policy->driver_data;
+
+	amd_cap_perf = READ_ONCE(cpudata->highest_perf);
+	lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+
+	amd_des_perf = amd_cap_perf;
+	if (target_perf < capacity)
+		amd_des_perf = DIV_ROUND_UP(amd_cap_perf * target_perf, capacity);
+
+	amd_min_perf = READ_ONCE(cpudata->highest_perf);
+	if (min_perf < capacity)
+		amd_min_perf = DIV_ROUND_UP(amd_cap_perf * min_perf, capacity);
+
+	if (amd_min_perf < lowest_nonlinear_perf)
+		amd_min_perf = lowest_nonlinear_perf;
+
+	amd_max_perf = amd_cap_perf;
+	if (amd_max_perf < amd_min_perf)
+		amd_max_perf = amd_min_perf;
+
+	amd_des_perf = clamp_t(unsigned long, amd_des_perf,
+			       amd_min_perf, amd_max_perf);
+
+	amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
+			  amd_max_perf, true);
+}
+
 static int amd_get_min_freq(struct amd_cpudata *cpudata)
 {
 	struct cppc_perf_caps cppc_perf;
@@ -311,6 +346,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 	/* It will be updated by governor */
 	policy->cur = policy->cpuinfo.min_freq;
 
+	policy->fast_switch_possible = true;
+
 	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
 				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
 	if (ret < 0) {
@@ -360,6 +397,7 @@ static struct cpufreq_driver amd_pstate_driver = {
 	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
 	.verify		= amd_pstate_verify,
 	.target		= amd_pstate_target,
+	.adjust_perf    = amd_pstate_adjust_perf,
 	.init		= amd_pstate_cpu_init,
 	.exit		= amd_pstate_cpu_exit,
 	.name		= "amd-pstate",
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 08/21] cpufreq: amd: add acpi cppc function as the backend for legacy processors
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (6 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 07/21] cpufreq: amd: add fast switch function for amd-pstate Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 14:20   ` Limonciello, Mario
  2021-11-02 18:46   ` Nathan Fontenot
  2021-10-29 13:02 ` [PATCH v3 09/21] cpufreq: amd: add trace for amd-pstate module Huang Rui
                   ` (13 subsequent siblings)
  21 siblings, 2 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

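Some older Zen based processors do not have the full MSR support and use
the shared memory that is exposed by the ACPI SBIOS instead. Add the ACPI
CPPC functions as the backend for these processors.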
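
The backend selection can be pictured with ordinary function pointers. The
sketch below is a user space illustration under stated assumptions: plain
function pointers stand in for the kernel's static calls, and the names
(pstate_ops, select_backend, msr_enable, shmem_enable) and the stub bodies
are hypothetical. Only the idea is taken from the patch: default to the
MSR backend, and switch to the shared memory backend without fast switch
when the CPPC feature flag is absent.

```c
#include <stdbool.h>

/* Hypothetical per-backend operations table. */
struct pstate_ops {
	const char *name;
	int (*enable)(bool enable);
	bool fast_switch_possible;
};

/* Stub: the real MSR backend would write the enable MSR here. */
int msr_enable(bool enable)
{
	(void)enable;
	return 0;
}

/* Stub: the real shared memory backend would enable CPPC on each CPU. */
int shmem_enable(bool enable)
{
	(void)enable;
	return 0;
}

/* Pick the backend from the (assumed) result of the feature flag check. */
struct pstate_ops select_backend(bool has_cppc_msr)
{
	struct pstate_ops ops = {
		.name = "msr",
		.enable = msr_enable,
		.fast_switch_possible = true,
	};

	if (!has_cppc_msr) {
		ops.name = "shared-memory";
		ops.enable = shmem_enable;
		/* Fast switch is only offered with full MSR support. */
		ops.fast_switch_possible = false;
	}

	return ops;
}
```

A caller would run select_backend() once at init time and then use
ops.enable(true) without caring which backend is behind it.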

Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 58 ++++++++++++++++++++++++++++++++----
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 55ff03f85608..d399938d6d85 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -73,6 +73,19 @@ static inline int pstate_enable(bool enable)
 	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
 }
 
+static int cppc_enable(bool enable)
+{
+	int cpu, ret = 0;
+
+	for_each_online_cpu(cpu) {
+		ret = cppc_set_enable(cpu, enable ? 1 : 0);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
+}
+
 DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
 
 static inline int amd_pstate_enable(bool enable)
@@ -103,6 +116,24 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
 	return 0;
 }
 
+static int cppc_init_perf(struct amd_cpudata *cpudata)
+{
+	struct cppc_perf_caps cppc_perf;
+
+	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+	if (ret)
+		return ret;
+
+	WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
+
+	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
+	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
+		   cppc_perf.lowest_nonlinear_perf);
+	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
+
+	return 0;
+}
+
 DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
 
 static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
@@ -120,6 +151,19 @@ static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
 			      READ_ONCE(cpudata->cppc_req_cached));
 }
 
+static void cppc_update_perf(struct amd_cpudata *cpudata,
+			     u32 min_perf, u32 des_perf,
+			     u32 max_perf, bool fast_switch)
+{
+	struct cppc_perf_ctrls perf_ctrls;
+
+	perf_ctrls.max_perf = max_perf;
+	perf_ctrls.min_perf = min_perf;
+	perf_ctrls.desired_perf = des_perf;
+
+	cppc_set_perf(cpudata->cpu, &perf_ctrls);
+}
+
 DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
 
 static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
@@ -346,7 +390,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 	/* It will be updated by governor */
 	policy->cur = policy->cpuinfo.min_freq;
 
-	policy->fast_switch_possible = true;
+	if (boot_cpu_has(X86_FEATURE_AMD_CPPC))
+		policy->fast_switch_possible = true;
 
 	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
 				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
@@ -397,7 +442,6 @@ static struct cpufreq_driver amd_pstate_driver = {
 	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
 	.verify		= amd_pstate_verify,
 	.target		= amd_pstate_target,
-	.adjust_perf    = amd_pstate_adjust_perf,
 	.init		= amd_pstate_cpu_init,
 	.exit		= amd_pstate_cpu_exit,
 	.name		= "amd-pstate",
@@ -421,10 +465,14 @@ static int __init amd_pstate_init(void)
 		return -EEXIST;
 
 	/* capability check */
-	if (!boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
-		pr_debug("%s, AMD CPPC MSR based functionality is not supported\n",
+	if (boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
+		pr_debug("%s, AMD CPPC MSR based functionality is supported\n",
 			 __func__);
-		return -ENODEV;
+		amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
+	} else {
+		static_call_update(amd_pstate_enable, cppc_enable);
+		static_call_update(amd_pstate_init_perf, cppc_init_perf);
+		static_call_update(amd_pstate_update_perf, cppc_update_perf);
 	}
 
 	/* enable amd pstate feature */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 09/21] cpufreq: amd: add trace for amd-pstate module
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (7 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 08/21] cpufreq: amd: add acpi cppc function as the backend for legacy processors Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 10/21] cpufreq: amd: add boost mode support for amd-pstate Huang Rui
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

Add a trace event to monitor the performance value changes that are
controlled by the CPU governors.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/cpufreq/amd-pstate.c |  4 ++++
 include/trace/events/power.h | 46 ++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index d399938d6d85..6037590e82a6 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -36,6 +36,7 @@
 #include <linux/delay.h>
 #include <linux/uaccess.h>
 #include <linux/static_call.h>
+#include <trace/events/power.h>
 
 #include <acpi/processor.h>
 #include <acpi/cppc_acpi.h>
@@ -189,6 +190,9 @@ static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
 	value &= ~REQ_MAX_PERF(~0L);
 	value |= REQ_MAX_PERF(max_perf);
 
+	trace_amd_pstate_perf(min_perf, des_perf, max_perf, cpudata->cpu,
+			      (value != prev), fast_switch);
+
 	if (value == prev)
 		return;
 
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index af5018aa9517..c95c0b8d443d 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -173,6 +173,52 @@ TRACE_EVENT(cpu_frequency_limits,
 		  (unsigned long)__entry->cpu_id)
 );
 
+TRACE_EVENT(amd_pstate_perf,
+
+	TP_PROTO(unsigned long min_perf,
+		 unsigned long target_perf,
+		 unsigned long capacity,
+		 unsigned int cpu_id,
+		 bool changed,
+		 bool fast_switch
+		 ),
+
+	TP_ARGS(min_perf,
+		target_perf,
+		capacity,
+		cpu_id,
+		changed,
+		fast_switch
+		),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, min_perf)
+		__field(unsigned long, target_perf)
+		__field(unsigned long, capacity)
+		__field(unsigned int, cpu_id)
+		__field(bool, changed)
+		__field(bool, fast_switch)
+		),
+
+	TP_fast_assign(
+		__entry->min_perf = min_perf;
+		__entry->target_perf = target_perf;
+		__entry->capacity = capacity;
+		__entry->cpu_id = cpu_id;
+		__entry->changed = changed;
+		__entry->fast_switch = fast_switch;
+		),
+
+	TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu cpu_id=%u changed=%s fast_switch=%s",
+		  (unsigned long)__entry->min_perf,
+		  (unsigned long)__entry->target_perf,
+		  (unsigned long)__entry->capacity,
+		  (unsigned int)__entry->cpu_id,
+		  (__entry->changed) ? "true" : "false",
+		  (__entry->fast_switch) ? "true" : "false"
+		 )
+);
+
 TRACE_EVENT(device_pm_callback_start,
 
 	TP_PROTO(struct device *dev, const char *pm_ops, int event),
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 10/21] cpufreq: amd: add boost mode support for amd-pstate
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (8 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 09/21] cpufreq: amd: add trace for amd-pstate module Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 11/21] cpufreq: amd: add amd-pstate frequencies attributes Huang Rui
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

If the SBIOS supports the boost mode of amd-pstate, enable boost by
default.
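
As a rough user space sketch of the arithmetic involved: boost is treated
as supported when highest_perf is above nominal_perf, and the maximum
frequency is the nominal frequency scaled by that ratio in fixed point.
The function names and the sample values below are hypothetical;
CAPACITY_SHIFT mirrors the kernel's SCHED_CAPACITY_SHIFT of 10.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAPACITY_SHIFT 10 /* mirrors SCHED_CAPACITY_SHIFT */

/* Boost is available only if there is headroom above nominal. */
bool boost_supported(uint32_t highest_perf, uint32_t nominal_perf)
{
	return highest_perf > nominal_perf;
}

/* Scale the nominal frequency (MHz) by highest/nominal, return kHz. */
uint32_t max_freq_khz(uint32_t nominal_freq_mhz, uint32_t highest_perf,
		      uint32_t nominal_perf)
{
	uint64_t boost_ratio = ((uint64_t)highest_perf << CAPACITY_SHIFT) /
			       nominal_perf;
	uint64_t max_freq_mhz = ((uint64_t)nominal_freq_mhz * boost_ratio) >>
				CAPACITY_SHIFT;

	return (uint32_t)(max_freq_mhz * 1000);
}

/* Frequency ceiling reported to the policy for a given boost state. */
uint32_t policy_max_khz(bool boost_on, uint32_t max_khz,
			uint32_t nominal_khz)
{
	return boost_on ? max_khz : nominal_khz;
}
```

With made-up values of highest_perf = 166, nominal_perf = 117 and a
nominal frequency of 3300 MHz, max_freq_khz() yields 4679000 kHz; with
boost disabled the policy ceiling falls back to 3300000 kHz.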

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 44 ++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 6037590e82a6..9af27ac1f818 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -67,6 +67,8 @@ struct amd_cpudata {
 	u32	min_freq;
 	u32	nominal_freq;
 	u32	lowest_nonlinear_freq;
+
+	bool	boost_supported;
 };
 
 static inline int pstate_enable(bool enable)
@@ -349,6 +351,45 @@ static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata)
 	return lowest_nonlinear_freq * 1000;
 }
 
+static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state)
+{
+	struct amd_cpudata *cpudata = policy->driver_data;
+	int ret;
+
+	if (!cpudata->boost_supported) {
+		pr_err("Boost mode is not supported by this processor or SBIOS\n");
+		return -EINVAL;
+	}
+
+	if (state)
+		policy->cpuinfo.max_freq = cpudata->max_freq;
+	else
+		policy->cpuinfo.max_freq = cpudata->nominal_freq;
+
+	policy->max = policy->cpuinfo.max_freq;
+
+	ret = freq_qos_update_request(&cpudata->req[1],
+				      policy->cpuinfo.max_freq);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
+{
+	u32 highest_perf, nominal_perf;
+
+	highest_perf = READ_ONCE(cpudata->highest_perf);
+	nominal_perf = READ_ONCE(cpudata->nominal_perf);
+
+	if (highest_perf <= nominal_perf)
+		return;
+
+	cpudata->boost_supported = true;
+	amd_pstate_driver.boost_enabled = true;
+}
+
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
 	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
@@ -419,6 +460,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 
 	policy->driver_data = cpudata;
 
+	amd_pstate_boost_init(cpudata);
+
 	return 0;
 
 	freq_qos_remove_request(&cpudata->req[1]);
@@ -448,6 +491,7 @@ static struct cpufreq_driver amd_pstate_driver = {
 	.target		= amd_pstate_target,
 	.init		= amd_pstate_cpu_init,
 	.exit		= amd_pstate_cpu_exit,
+	.set_boost	= amd_pstate_set_boost,
 	.name		= "amd-pstate",
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 11/21] cpufreq: amd: add amd-pstate frequencies attributes
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (9 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 10/21] cpufreq: amd: add boost mode support for amd-pstate Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-11-05 18:59   ` Nathan Fontenot
  2021-10-29 13:02 ` [PATCH v3 12/21] cpufreq: amd: add amd-pstate performance attributes Huang Rui
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

Introduce sysfs attributes to expose the processor frequencies at the
different levels (maximum, nominal and lowest non-linear).
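
A user space reader for one of these attributes could look like the sketch
below. It is only an illustration: read_freq_khz() is a hypothetical
helper, and the path in the comment assumes the attributes appear in the
usual per-CPU cpufreq sysfs directory, which this patch does not state
explicitly.

```c
#include <stdio.h>

/*
 * Read a single frequency value in kHz from a sysfs style file.
 * Assumed example path (not confirmed by the patch):
 *   /sys/devices/system/cpu/cpu0/cpufreq/amd_pstate_max_freq
 * Returns 0 on success and -1 if the file is missing or malformed.
 */
int read_freq_khz(const char *path, unsigned int *khz)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = fscanf(f, "%u", khz) == 1;
	fclose(f);

	return ok ? 0 : -1;
}
```

The same helper works for amd_pstate_nominal_freq and
amd_pstate_lowest_nonlinear_freq, since each show function prints a single
decimal number followed by a newline.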

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 63 ++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 9af27ac1f818..8cf1e80f44e0 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -485,6 +485,68 @@ static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
 	return 0;
 }
 
+/* Sysfs attributes */
+
+/* This frequency is to indicate the maximum hardware frequency.
+ * If boost is not active but supported, the frequency will be larger than the
+ * one in cpuinfo.
+ */
+static ssize_t show_amd_pstate_max_freq(struct cpufreq_policy *policy,
+					char *buf)
+{
+	int max_freq;
+	struct amd_cpudata *cpudata;
+
+	cpudata = policy->driver_data;
+
+	max_freq = amd_get_max_freq(cpudata);
+	if (max_freq < 0)
+		return max_freq;
+
+	return sprintf(&buf[0], "%u\n", max_freq);
+}
+
+static ssize_t show_amd_pstate_nominal_freq(struct cpufreq_policy *policy,
+					    char *buf)
+{
+	int nominal_freq;
+	struct amd_cpudata *cpudata;
+
+	cpudata = policy->driver_data;
+
+	nominal_freq = amd_get_nominal_freq(cpudata);
+	if (nominal_freq < 0)
+		return nominal_freq;
+
+	return sprintf(&buf[0], "%u\n", nominal_freq);
+}
+
+static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *policy,
+						     char *buf)
+{
+	int freq;
+	struct amd_cpudata *cpudata;
+
+	cpudata = policy->driver_data;
+
+	freq = amd_get_lowest_nonlinear_freq(cpudata);
+	if (freq < 0)
+		return freq;
+
+	return sprintf(&buf[0], "%u\n", freq);
+}
+
+cpufreq_freq_attr_ro(amd_pstate_max_freq);
+cpufreq_freq_attr_ro(amd_pstate_nominal_freq);
+cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
+
+static struct freq_attr *amd_pstate_attr[] = {
+	&amd_pstate_max_freq,
+	&amd_pstate_nominal_freq,
+	&amd_pstate_lowest_nonlinear_freq,
+	NULL,
+};
+
 static struct cpufreq_driver amd_pstate_driver = {
 	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
 	.verify		= amd_pstate_verify,
@@ -493,6 +555,7 @@ static struct cpufreq_driver amd_pstate_driver = {
 	.exit		= amd_pstate_cpu_exit,
 	.set_boost	= amd_pstate_set_boost,
 	.name		= "amd-pstate",
+	.attr		= amd_pstate_attr,
 };
 
 static int __init amd_pstate_init(void)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 12/21] cpufreq: amd: add amd-pstate performance attributes
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (10 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 11/21] cpufreq: amd: add amd-pstate frequencies attributes Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-11-05 18:50   ` Nathan Fontenot
  2021-10-29 13:02 ` [PATCH v3 13/21] cpupower: add AMD P-state capability flag Huang Rui
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

Introduce sysfs attributes that expose the amd-pstate performance levels:
highest, nominal, lowest nonlinear, and lowest.
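
Each attribute is emitted as a single unsigned number followed by a
newline. A consumer could parse such a line roughly as follows; this is
only a sketch, parse_perf_value() is a hypothetical name, and the sample
values used with it are made up rather than read from hardware.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse one attribute line such as "166\n" into *out.
 * Returns 0 on success, -1 if the line holds no number, overflows,
 * or has trailing characters other than a single newline.
 */
static int parse_perf_value(const char *line, unsigned long *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(line, &end, 10);
	if (end == line || errno == ERANGE)
		return -1;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	*out = v;
	return 0;
}
```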

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 53 ++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 8cf1e80f44e0..58ee50bf492b 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -536,14 +536,67 @@ static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *poli
 	return sprintf(&buf[0], "%u\n", freq);
 }
 
+static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
+					    char *buf)
+{
+	u32 perf;
+	struct amd_cpudata *cpudata = policy->driver_data;
+
+	perf = READ_ONCE(cpudata->highest_perf);
+
+	return sprintf(&buf[0], "%u\n", perf);
+}
+
+static ssize_t show_amd_pstate_nominal_perf(struct cpufreq_policy *policy,
+					    char *buf)
+{
+	u32 perf;
+	struct amd_cpudata *cpudata = policy->driver_data;
+
+	perf = READ_ONCE(cpudata->nominal_perf);
+
+	return sprintf(&buf[0], "%u\n", perf);
+}
+
+static ssize_t show_amd_pstate_lowest_nonlinear_perf(struct cpufreq_policy *policy,
+						     char *buf)
+{
+	u32 perf;
+	struct amd_cpudata *cpudata = policy->driver_data;
+
+	perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+
+	return sprintf(&buf[0], "%u\n", perf);
+}
+
+static ssize_t show_amd_pstate_lowest_perf(struct cpufreq_policy *policy,
+					   char *buf)
+{
+	u32 perf;
+	struct amd_cpudata *cpudata = policy->driver_data;
+
+	perf = READ_ONCE(cpudata->lowest_perf);
+
+	return sprintf(&buf[0], "%u\n", perf);
+}
+
 cpufreq_freq_attr_ro(amd_pstate_max_freq);
 cpufreq_freq_attr_ro(amd_pstate_nominal_freq);
 cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
 
+cpufreq_freq_attr_ro(amd_pstate_highest_perf);
+cpufreq_freq_attr_ro(amd_pstate_nominal_perf);
+cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_perf);
+cpufreq_freq_attr_ro(amd_pstate_lowest_perf);
+
 static struct freq_attr *amd_pstate_attr[] = {
 	&amd_pstate_max_freq,
 	&amd_pstate_nominal_freq,
 	&amd_pstate_lowest_nonlinear_freq,
+	&amd_pstate_highest_perf,
+	&amd_pstate_nominal_perf,
+	&amd_pstate_lowest_nonlinear_perf,
+	&amd_pstate_lowest_perf,
 	NULL,
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 13/21] cpupower: add AMD P-state capability flag
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (11 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 12/21] cpufreq: amd: add amd-pstate performance attributes Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 14/21] cpupower: add the function to check amd-pstate enabled Huang Rui
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

Add an AMD P-state capability flag to cpupower to indicate that the new
AMD P-state kernel module is supported on Ryzen processors.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 tools/power/cpupower/utils/helpers/helpers.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
index 33ffacee7fcb..b4813efdfb00 100644
--- a/tools/power/cpupower/utils/helpers/helpers.h
+++ b/tools/power/cpupower/utils/helpers/helpers.h
@@ -73,6 +73,7 @@ enum cpupower_cpu_vendor {X86_VENDOR_UNKNOWN = 0, X86_VENDOR_INTEL,
 #define CPUPOWER_CAP_AMD_HW_PSTATE	0x00000100
 #define CPUPOWER_CAP_AMD_PSTATEDEF	0x00000200
 #define CPUPOWER_CAP_AMD_CPB_MSR	0x00000400
+#define CPUPOWER_CAP_AMD_PSTATE		0x00000800
 
 #define CPUPOWER_AMD_CPBDIS		0x02000000
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 14/21] cpupower: add the function to check amd-pstate enabled
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (12 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 13/21] cpupower: add AMD P-state capability flag Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 15/21] cpupower: initial AMD P-state capability Huang Rui
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

A processor that supports amd-pstate also supports the legacy ACPI
hardware P-States feature. Once the driver enables amd-pstate, the
processor responds to the finer-grained amd-pstate feature instead of
legacy ACPI P-States. Introduce cpupower_amd_pstate_enabled() to check
whether the running kernel uses the amd-pstate or the acpi-cpufreq
module.
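
The check boils down to comparing the name of the active scaling driver
against "amd-pstate". The sketch below isolates that comparison so it can
be exercised without sysfs; is_amd_pstate_driver() is a hypothetical
helper, and in cpupower the string would come from cpufreq_get_driver().

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true only if the reported driver name is exactly "amd-pstate".
 * A NULL name (no cpufreq driver loaded) counts as "not enabled".
 */
static bool is_amd_pstate_driver(const char *driver)
{
	return driver && strcmp(driver, "amd-pstate") == 0;
}
```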

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 tools/power/cpupower/utils/helpers/helpers.h | 10 ++++++++++
 tools/power/cpupower/utils/helpers/misc.c    | 18 ++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
index b4813efdfb00..e03cc97297aa 100644
--- a/tools/power/cpupower/utils/helpers/helpers.h
+++ b/tools/power/cpupower/utils/helpers/helpers.h
@@ -11,6 +11,7 @@
 
 #include <libintl.h>
 #include <locale.h>
+#include <stdbool.h>
 
 #include "helpers/bitmask.h"
 #include <cpupower.h>
@@ -136,6 +137,12 @@ extern int decode_pstates(unsigned int cpu, int boost_states,
 
 extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
 				     int *active, int * states);
+
+/* AMD P-States stuff **************************/
+extern bool cpupower_amd_pstate_enabled(void);
+
+/* AMD P-States stuff **************************/
+
 /*
  * CPUID functions returning a single datum
  */
@@ -168,6 +175,9 @@ static inline int cpufreq_has_boost_support(unsigned int cpu, int *support,
 					    int *active, int * states)
 { return -1; }
 
+static inline bool cpupower_amd_pstate_enabled(void)
+{ return false; }
+
 /* cpuid and cpuinfo helpers  **************************/
 
 static inline unsigned int cpuid_eax(unsigned int op) { return 0; };
diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
index fc6e34511721..0c483cdefcc2 100644
--- a/tools/power/cpupower/utils/helpers/misc.c
+++ b/tools/power/cpupower/utils/helpers/misc.c
@@ -3,9 +3,11 @@
 #include <stdio.h>
 #include <errno.h>
 #include <stdlib.h>
+#include <string.h>
 
 #include "helpers/helpers.h"
 #include "helpers/sysfs.h"
+#include "cpufreq.h"
 
 #if defined(__i386__) || defined(__x86_64__)
 
@@ -83,6 +85,22 @@ int cpupower_intel_set_perf_bias(unsigned int cpu, unsigned int val)
 	return 0;
 }
 
+bool cpupower_amd_pstate_enabled(void)
+{
+	char *driver = cpufreq_get_driver(0);
+	bool ret = false;
+
+	if (!driver)
+		return ret;
+
+	if (!strcmp(driver, "amd-pstate"))
+		ret = true;
+
+	cpufreq_put_driver(driver);
+
+	return ret;
+}
+
 #endif /* #if defined(__i386__) || defined(__x86_64__) */
 
 /* get_cpustate
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 15/21] cpupower: initial AMD P-state capability
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (13 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 14/21] cpupower: add the function to check amd-pstate enabled Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 16/21] cpupower: add the function to get the sysfs value from specific table Huang Rui
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

If the kernel loads the amd-pstate module, cpupower initializes the
capability flag CPUPOWER_CAP_AMD_PSTATE. Once the amd-pstate capability
is set, the legacy ACPI-related capabilities are no longer needed, so
they are cleared.
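
The effect on the capability mask can be sketched as a pure function.
The CAP_* names are local stand-ins for the CPUPOWER_CAP_* macros; four
of the values are taken from helpers.h as shown in this series, while
the value of CAP_AMD_CPB is an assumption because that definition does
not appear in these patches. apply_amd_pstate_caps() is a hypothetical
name.

```c
#define CAP_AMD_CPB		0x00000004u /* assumed value, not shown in this series */
#define CAP_AMD_HW_PSTATE	0x00000100u
#define CAP_AMD_PSTATEDEF	0x00000200u
#define CAP_AMD_CPB_MSR		0x00000400u
#define CAP_AMD_PSTATE		0x00000800u

/*
 * If amd-pstate is enabled, set its capability bit and drop the legacy
 * ACPI P-state and CPB bits. All other bits are left untouched.
 */
static unsigned int apply_amd_pstate_caps(unsigned int caps, int enabled)
{
	const unsigned int legacy = CAP_AMD_CPB | CAP_AMD_CPB_MSR |
				    CAP_AMD_HW_PSTATE | CAP_AMD_PSTATEDEF;

	if (!enabled)
		return caps;

	return (caps | CAP_AMD_PSTATE) & ~legacy;
}
```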

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 tools/power/cpupower/utils/helpers/cpuid.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tools/power/cpupower/utils/helpers/cpuid.c b/tools/power/cpupower/utils/helpers/cpuid.c
index 72eb43593180..2a6dc104e76b 100644
--- a/tools/power/cpupower/utils/helpers/cpuid.c
+++ b/tools/power/cpupower/utils/helpers/cpuid.c
@@ -149,6 +149,19 @@ int get_cpu_info(struct cpupower_cpu_info *cpu_info)
 		if (ext_cpuid_level >= 0x80000008 &&
 		    cpuid_ebx(0x80000008) & (1 << 4))
 			cpu_info->caps |= CPUPOWER_CAP_AMD_RDPRU;
+
+		if (cpupower_amd_pstate_enabled()) {
+			cpu_info->caps |= CPUPOWER_CAP_AMD_PSTATE;
+
+			/*
+			 * If AMD P-state is enabled, the firmware will treat
+			 * AMD P-state function as high priority.
+			 */
+			cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB;
+			cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB_MSR;
+			cpu_info->caps &= ~CPUPOWER_CAP_AMD_HW_PSTATE;
+			cpu_info->caps &= ~CPUPOWER_CAP_AMD_PSTATEDEF;
+		}
 	}
 
 	if (cpu_info->vendor == X86_VENDOR_INTEL) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 16/21] cpupower: add the function to get the sysfs value from specific table
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (14 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 15/21] cpupower: initial AMD P-state capability Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 17/21] cpupower: add amd-pstate sysfs definition and access helper Huang Rui
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

Expose the helper in the cpufreq header so that helper code for a cpufreq
driver with its own sysfs interfaces can use it to read their values.
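
A table lookup of this kind has to validate its arguments before it
dereferences anything, and the order of the checks matters: the table
pointer first, then the index against the size, and only then the entry
itself. A minimal sketch of that pattern, using the hypothetical helper
table_entry():

```c
#include <stddef.h>

/*
 * Return the file name stored at table[index], or NULL if the table is
 * missing, the index is out of range, or the slot is empty. The checks
 * are joined with || so that evaluation stops before table[index] is
 * touched for an invalid table or index.
 */
static const char *table_entry(const char **table, unsigned int index,
			       unsigned int size)
{
	if (!table || index >= size || !table[index])
		return NULL;

	return table[index];
}
```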

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 tools/power/cpupower/lib/cpufreq.c | 21 +++++++++++++++------
 tools/power/cpupower/lib/cpufreq.h | 12 ++++++++++++
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c
index c3b56db8b921..02719cc400a1 100644
--- a/tools/power/cpupower/lib/cpufreq.c
+++ b/tools/power/cpupower/lib/cpufreq.c
@@ -83,20 +83,21 @@ static const char *cpufreq_value_files[MAX_CPUFREQ_VALUE_READ_FILES] = {
 	[STATS_NUM_TRANSITIONS] = "stats/total_trans"
 };
 
-
-static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
-						 enum cpufreq_value which)
+unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu,
+						 const char **table,
+						 unsigned index,
+						 unsigned size)
 {
 	unsigned long value;
 	unsigned int len;
 	char linebuf[MAX_LINE_LEN];
 	char *endp;
 
-	if (which >= MAX_CPUFREQ_VALUE_READ_FILES)
+	if (!table || index >= size || !table[index])
 		return 0;
 
-	len = sysfs_cpufreq_read_file(cpu, cpufreq_value_files[which],
-				linebuf, sizeof(linebuf));
+	len = sysfs_cpufreq_read_file(cpu, table[index], linebuf,
+				      sizeof(linebuf));
 
 	if (len == 0)
 		return 0;
@@ -109,6 +110,14 @@ static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
 	return value;
 }
 
+static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
+						 enum cpufreq_value which)
+{
+	return cpufreq_get_sysfs_value_from_table(cpu, cpufreq_value_files,
+						  which,
+						  MAX_CPUFREQ_VALUE_READ_FILES);
+}
+
 /* read access to files which contain one string */
 
 enum cpufreq_string {
diff --git a/tools/power/cpupower/lib/cpufreq.h b/tools/power/cpupower/lib/cpufreq.h
index 95f4fd9e2656..107668c0c454 100644
--- a/tools/power/cpupower/lib/cpufreq.h
+++ b/tools/power/cpupower/lib/cpufreq.h
@@ -203,6 +203,18 @@ int cpufreq_modify_policy_governor(unsigned int cpu, char *governor);
 int cpufreq_set_frequency(unsigned int cpu,
 				unsigned long target_frequency);
 
+/*
+ * get the sysfs value from a specific table
+ *
+ * Read the value of the sysfs file named in the given table. Only
+ * works if the cpufreq driver provides the specific sysfs interfaces.
+ */
+
+unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu,
+						 const char **table,
+						 unsigned index,
+						 unsigned size);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 17/21] cpupower: add amd-pstate sysfs definition and access helper
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (15 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 16/21] cpupower: add the function to get the sysfs value from specific table Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 14:10   ` Limonciello, Mario
  2021-10-29 13:02 ` [PATCH v3 18/21] cpupower: enable boost state support for amd-pstate module Huang Rui
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

Introduce the macro definitions and access helper function for the
amd-pstate sysfs interfaces, such as the performance levels and frequency
levels, in the AMD helper file. The cpupower utilities will use them to
read the sysfs attributes of the amd-pstate cpufreq driver.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 tools/power/cpupower/utils/helpers/amd.c | 37 ++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c
index 97f2c857048e..f233a6ab75ac 100644
--- a/tools/power/cpupower/utils/helpers/amd.c
+++ b/tools/power/cpupower/utils/helpers/amd.c
@@ -8,7 +8,9 @@
 #include <pci/pci.h>
 
 #include "helpers/helpers.h"
+#include "cpufreq.h"
 
+/* ACPI P-States Helper Functions for AMD Processors ***************/
 #define MSR_AMD_PSTATE_STATUS	0xc0010063
 #define MSR_AMD_PSTATE		0xc0010064
 #define MSR_AMD_PSTATE_LIMIT	0xc0010061
@@ -146,4 +148,39 @@ int amd_pci_get_num_boost_states(int *active, int *states)
 	pci_cleanup(pci_acc);
 	return 0;
 }
+
+/* ACPI P-States Helper Functions for AMD Processors ***************/
+
+/* AMD P-States Helper Functions ***************/
+enum amd_pstate_value {
+	AMD_PSTATE_HIGHEST_PERF,
+	AMD_PSTATE_NOMINAL_PERF,
+	AMD_PSTATE_LOWEST_NONLINEAR_PERF,
+	AMD_PSTATE_LOWEST_PERF,
+	AMD_PSTATE_MAX_FREQ,
+	AMD_PSTATE_NOMINAL_FREQ,
+	AMD_PSTATE_LOWEST_NONLINEAR_FREQ,
+	MAX_AMD_PSTATE_VALUE_READ_FILES
+};
+
+static const char *amd_pstate_value_files[MAX_AMD_PSTATE_VALUE_READ_FILES] = {
+	[AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
+	[AMD_PSTATE_NOMINAL_PERF] = "amd_pstate_nominal_perf",
+	[AMD_PSTATE_LOWEST_NONLINEAR_PERF] = "amd_pstate_lowest_nonlinear_perf",
+	[AMD_PSTATE_LOWEST_PERF] = "amd_pstate_lowest_perf",
+	[AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
+	[AMD_PSTATE_NOMINAL_FREQ] = "amd_pstate_nominal_freq",
+	[AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq",
+};
+
+static unsigned long amd_pstate_get_data(unsigned int cpu,
+					 enum amd_pstate_value value)
+{
+	return cpufreq_get_sysfs_value_from_table(cpu,
+						  amd_pstate_value_files,
+						  value,
+						  MAX_AMD_PSTATE_VALUE_READ_FILES);
+}
+
+/* AMD P-States Helper Functions ***************/
 #endif /* defined(__i386__) || defined(__x86_64__) */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 18/21] cpupower: enable boost state support for amd-pstate module
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (16 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 17/21] cpupower: add amd-pstate sysfs definition and access helper Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-11-02 20:11   ` Nathan Fontenot
  2021-10-29 13:02 ` [PATCH v3 19/21] cpupower: move print_speed function into misc helper Huang Rui
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

The legacy ACPI hardware P-States function has 3 P-States in the ACPI
table, and the CPU frequency can only be switched between those 3
P-States. If the processor supports boost, there is an additional boost
state whose frequency can be higher than the P0 state; that state is
decoded by decode_pstates() and read by amd_pci_get_num_boost_states().

However, the new AMD P-States function differs from the legacy ACPI
hardware P-States on AMD processors: it has a finer-grained frequency
range between the highest and lowest frequencies. The boost frequency is
the frequency mapped to the highest performance ratio, and the frequency
similar to the previous P0 frequency is mapped to the nominal performance
ratio. If the highest performance of the processor is higher than the
nominal performance, the processor is considered to support the boost
state. amd_pstate_boost_init() is used to initialize boost for the AMD
P-States function.
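
The decision described above can be written as a small pure function,
which makes the two conditions easy to see. This is a sketch only:
struct boost_state and amd_boost_state() are hypothetical names, and the
arguments stand for the values that cpupower reads from sysfs.

```c
struct boost_state {
	int support;	/* 1 if highest_perf exceeds nominal_perf */
	int active;	/* 1 if cpuinfo max equals the amd-pstate max */
};

/*
 * Boost is supported when the highest performance level is above the
 * nominal one. It is active when the cpuinfo maximum frequency already
 * equals the amd-pstate maximum frequency. Without support, active
 * stays 0.
 */
static struct boost_state amd_boost_state(unsigned long highest_perf,
					  unsigned long nominal_perf,
					  unsigned long cpuinfo_max_khz,
					  unsigned long amd_pstate_max_khz)
{
	struct boost_state s = { 0, 0 };

	s.support = highest_perf > nominal_perf;
	if (s.support)
		s.active = cpuinfo_max_khz == amd_pstate_max_khz;

	return s;
}
```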

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 tools/power/cpupower/utils/helpers/amd.c     | 18 ++++++++++++++++++
 tools/power/cpupower/utils/helpers/helpers.h |  5 +++++
 tools/power/cpupower/utils/helpers/misc.c    |  2 ++
 3 files changed, 25 insertions(+)

diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c
index f233a6ab75ac..92b9fb631768 100644
--- a/tools/power/cpupower/utils/helpers/amd.c
+++ b/tools/power/cpupower/utils/helpers/amd.c
@@ -182,5 +182,23 @@ static unsigned long amd_pstate_get_data(unsigned int cpu,
 						  MAX_AMD_PSTATE_VALUE_READ_FILES);
 }
 
+void amd_pstate_boost_init(unsigned int cpu, int *support, int *active)
+{
+	unsigned long highest_perf, nominal_perf, cpuinfo_min,
+		      cpuinfo_max, amd_pstate_max;
+
+	highest_perf = amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF);
+	nominal_perf = amd_pstate_get_data(cpu, AMD_PSTATE_NOMINAL_PERF);
+
+	*support = highest_perf > nominal_perf ? 1 : 0;
+	if (!(*support))
+		return;
+
+	cpufreq_get_hardware_limits(cpu, &cpuinfo_min, &cpuinfo_max);
+	amd_pstate_max = amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ);
+
+	*active = cpuinfo_max == amd_pstate_max ? 1 : 0;
+}
+
 /* AMD P-States Helper Functions ***************/
 #endif /* defined(__i386__) || defined(__x86_64__) */
diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
index e03cc97297aa..c03925bea655 100644
--- a/tools/power/cpupower/utils/helpers/helpers.h
+++ b/tools/power/cpupower/utils/helpers/helpers.h
@@ -140,6 +140,8 @@ extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
 
 /* AMD P-States stuff **************************/
 extern bool cpupower_amd_pstate_enabled(void);
+extern void amd_pstate_boost_init(unsigned int cpu,
+				  int *support, int *active);
 
 /* AMD P-States stuff **************************/
 
@@ -177,6 +179,9 @@ static inline int cpufreq_has_boost_support(unsigned int cpu, int *support,
 
 static inline bool cpupower_amd_pstate_enabled(void)
 { return false; }
+static inline void amd_pstate_boost_init(unsigned int cpu,
+					 int *support, int *active)
+{ return; }
 
 /* cpuid and cpuinfo helpers  **************************/
 
diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
index 0c483cdefcc2..e0d3145434d3 100644
--- a/tools/power/cpupower/utils/helpers/misc.c
+++ b/tools/power/cpupower/utils/helpers/misc.c
@@ -41,6 +41,8 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active,
 			if (ret)
 				return ret;
 		}
+	} else if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
+		amd_pstate_boost_init(cpu, support, active);
 	} else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA)
 		*support = *active = 1;
 	return 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 19/21] cpupower: move print_speed function into misc helper
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (17 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 18/21] cpupower: enable boost state support for amd-pstate module Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 20/21] cpupower: print amd-pstate information on cpupower Huang Rui
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

print_speed can serve as a common function, so expose it in the misc
helper header. It can then be used by other helper files as well.
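
To make the unit selection and rounding easier to follow, here is a
sketch of the same thresholds writing into a buffer instead of stdout.
format_speed() is a hypothetical variant, not the function moved by this
patch: it keeps the arithmetic in unsigned long instead of casting to
unsigned int, and it falls back to kHz for rounded values of 1000 kHz or
less, where print_speed prints nothing.

```c
#include <stdio.h>

/*
 * Format a frequency given in kHz. With no_rounding the value is printed
 * in full; otherwise it is rounded to two decimals (GHz), whole MHz
 * (above 100 MHz), or one decimal (MHz).
 */
static int format_speed(char *buf, size_t len, unsigned long khz,
			int no_rounding)
{
	if (no_rounding) {
		if (khz > 1000000)
			return snprintf(buf, len, "%lu.%06lu GHz",
					khz / 1000000, khz % 1000000);
		if (khz > 1000)
			return snprintf(buf, len, "%lu.%03lu MHz",
					khz / 1000, khz % 1000);
		return snprintf(buf, len, "%lu kHz", khz);
	}

	if (khz > 1000000) {
		if (khz % 10000 >= 5000)	/* round to 10 MHz steps */
			khz += 10000;
		return snprintf(buf, len, "%lu.%02lu GHz",
				khz / 1000000, (khz % 1000000) / 10000);
	}
	if (khz > 100000) {
		if (khz % 1000 >= 500)		/* round to whole MHz */
			khz += 1000;
		return snprintf(buf, len, "%lu MHz", khz / 1000);
	}
	if (khz > 1000) {
		if (khz % 100 >= 50)		/* round to 0.1 MHz steps */
			khz += 100;
		return snprintf(buf, len, "%lu.%01lu MHz",
				khz / 1000, (khz % 1000) / 100);
	}

	/* Deviation from the patch: print small values instead of nothing. */
	return snprintf(buf, len, "%lu kHz", khz);
}
```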

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 tools/power/cpupower/utils/cpufreq-info.c    | 59 ++++----------------
 tools/power/cpupower/utils/helpers/helpers.h |  1 +
 tools/power/cpupower/utils/helpers/misc.c    | 42 ++++++++++++++
 3 files changed, 54 insertions(+), 48 deletions(-)

diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c
index f9895e31ff5a..b429454bf3ae 100644
--- a/tools/power/cpupower/utils/cpufreq-info.c
+++ b/tools/power/cpupower/utils/cpufreq-info.c
@@ -84,43 +84,6 @@ static void proc_cpufreq_output(void)
 }
 
 static int no_rounding;
-static void print_speed(unsigned long speed)
-{
-	unsigned long tmp;
-
-	if (no_rounding) {
-		if (speed > 1000000)
-			printf("%u.%06u GHz", ((unsigned int) speed/1000000),
-				((unsigned int) speed%1000000));
-		else if (speed > 1000)
-			printf("%u.%03u MHz", ((unsigned int) speed/1000),
-				(unsigned int) (speed%1000));
-		else
-			printf("%lu kHz", speed);
-	} else {
-		if (speed > 1000000) {
-			tmp = speed%10000;
-			if (tmp >= 5000)
-				speed += 10000;
-			printf("%u.%02u GHz", ((unsigned int) speed/1000000),
-				((unsigned int) (speed%1000000)/10000));
-		} else if (speed > 100000) {
-			tmp = speed%1000;
-			if (tmp >= 500)
-				speed += 1000;
-			printf("%u MHz", ((unsigned int) speed/1000));
-		} else if (speed > 1000) {
-			tmp = speed%100;
-			if (tmp >= 50)
-				speed += 100;
-			printf("%u.%01u MHz", ((unsigned int) speed/1000),
-				((unsigned int) (speed%1000)/100));
-		}
-	}
-
-	return;
-}
-
 static void print_duration(unsigned long duration)
 {
 	unsigned long tmp;
@@ -254,11 +217,11 @@ static int get_boost_mode(unsigned int cpu)
 	if (freqs) {
 		printf(_("  boost frequency steps: "));
 		while (freqs->next) {
-			print_speed(freqs->frequency);
+			print_speed(freqs->frequency, no_rounding);
 			printf(", ");
 			freqs = freqs->next;
 		}
-		print_speed(freqs->frequency);
+		print_speed(freqs->frequency, no_rounding);
 		printf("\n");
 		cpufreq_put_available_frequencies(freqs);
 	}
@@ -277,7 +240,7 @@ static int get_freq_kernel(unsigned int cpu, unsigned int human)
 		return -EINVAL;
 	}
 	if (human) {
-		print_speed(freq);
+		print_speed(freq, no_rounding);
 	} else
 		printf("%lu", freq);
 	printf(_(" (asserted by call to kernel)\n"));
@@ -296,7 +259,7 @@ static int get_freq_hardware(unsigned int cpu, unsigned int human)
 		return -EINVAL;
 	}
 	if (human) {
-		print_speed(freq);
+		print_speed(freq, no_rounding);
 	} else
 		printf("%lu", freq);
 	printf(_(" (asserted by call to hardware)\n"));
@@ -316,9 +279,9 @@ static int get_hardware_limits(unsigned int cpu, unsigned int human)
 
 	if (human) {
 		printf(_("  hardware limits: "));
-		print_speed(min);
+		print_speed(min, no_rounding);
 		printf(" - ");
-		print_speed(max);
+		print_speed(max, no_rounding);
 		printf("\n");
 	} else {
 		printf("%lu %lu\n", min, max);
@@ -350,9 +313,9 @@ static int get_policy(unsigned int cpu)
 		return -EINVAL;
 	}
 	printf(_("  current policy: frequency should be within "));
-	print_speed(policy->min);
+	print_speed(policy->min, no_rounding);
 	printf(_(" and "));
-	print_speed(policy->max);
+	print_speed(policy->max, no_rounding);
 
 	printf(".\n                  ");
 	printf(_("The governor \"%s\" may decide which speed to use\n"
@@ -436,7 +399,7 @@ static int get_freq_stats(unsigned int cpu, unsigned int human)
 	struct cpufreq_stats *stats = cpufreq_get_stats(cpu, &total_time);
 	while (stats) {
 		if (human) {
-			print_speed(stats->frequency);
+			print_speed(stats->frequency, no_rounding);
 			printf(":%.2f%%",
 				(100.0 * stats->time_in_state) / total_time);
 		} else
@@ -486,11 +449,11 @@ static void debug_output_one(unsigned int cpu)
 	if (freqs) {
 		printf(_("  available frequency steps:  "));
 		while (freqs->next) {
-			print_speed(freqs->frequency);
+			print_speed(freqs->frequency, no_rounding);
 			printf(", ");
 			freqs = freqs->next;
 		}
-		print_speed(freqs->frequency);
+		print_speed(freqs->frequency, no_rounding);
 		printf("\n");
 		cpufreq_put_available_frequencies(freqs);
 	}
diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
index c03925bea655..fbbfa6047c83 100644
--- a/tools/power/cpupower/utils/helpers/helpers.h
+++ b/tools/power/cpupower/utils/helpers/helpers.h
@@ -200,5 +200,6 @@ extern struct bitmask *offline_cpus;
 void get_cpustate(void);
 void print_online_cpus(void);
 void print_offline_cpus(void);
+void print_speed(unsigned long speed, int no_rounding);
 
 #endif /* __CPUPOWERUTILS_HELPERS__ */
diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
index e0d3145434d3..d693c96cd09c 100644
--- a/tools/power/cpupower/utils/helpers/misc.c
+++ b/tools/power/cpupower/utils/helpers/misc.c
@@ -164,3 +164,45 @@ void print_offline_cpus(void)
 		printf(_("cpupower set operation was not performed on them\n"));
 	}
 }
+
+/*
+ * print_speed
+ *
+ * Print the exact CPU frequency with appropriate unit
+ */
+void print_speed(unsigned long speed, int no_rounding)
+{
+	unsigned long tmp;
+
+	if (no_rounding) {
+		if (speed > 1000000)
+			printf("%u.%06u GHz", ((unsigned int) speed/1000000),
+				((unsigned int) speed%1000000));
+		else if (speed > 1000)
+			printf("%u.%03u MHz", ((unsigned int) speed/1000),
+				(unsigned int) (speed%1000));
+		else
+			printf("%lu kHz", speed);
+	} else {
+		if (speed > 1000000) {
+			tmp = speed%10000;
+			if (tmp >= 5000)
+				speed += 10000;
+			printf("%u.%02u GHz", ((unsigned int) speed/1000000),
+				((unsigned int) (speed%1000000)/10000));
+		} else if (speed > 100000) {
+			tmp = speed%1000;
+			if (tmp >= 500)
+				speed += 1000;
+			printf("%u MHz", ((unsigned int) speed/1000));
+		} else if (speed > 1000) {
+			tmp = speed%100;
+			if (tmp >= 50)
+				speed += 100;
+			printf("%u.%01u MHz", ((unsigned int) speed/1000),
+				((unsigned int) (speed%1000)/100));
+		}
+	}
+
+	return;
+}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 20/21] cpupower: print amd-pstate information on cpupower
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (18 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 19/21] cpupower: move print_speed function into misc helper Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-10-29 13:02 ` [PATCH v3 21/21] Documentation: amd-pstate: add amd-pstate driver introduction Huang Rui
  2021-11-04 16:40 ` [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Giovanni Gherdovich
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

The amd-pstate kernel module uses fine-grained frequency ranges instead
of ACPI hardware P-states, so the performance and frequency values should
be printed in frequency-info.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 tools/power/cpupower/utils/cpufreq-info.c    |  9 ++++--
 tools/power/cpupower/utils/helpers/amd.c     | 32 ++++++++++++++++++++
 tools/power/cpupower/utils/helpers/helpers.h |  5 +++
 3 files changed, 43 insertions(+), 3 deletions(-)
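
For readers following how the values printed by
amd_pstate_show_perf_and_freq() end up as "4.68 GHz" or "400 MHz", the
following is a minimal sketch of the rounded branch of print_speed() as
quoted at the end of patch 19. The name format_speed_rounded() and the
choice to write into a caller-supplied buffer instead of calling printf()
are assumptions made only for this illustration; they are not part of the
series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only: formats a frequency given in
 * kHz following the rounded (no_rounding == 0) branch of print_speed(),
 * but writes the result into buf instead of printing it.
 */
static void format_speed_rounded(unsigned long speed, char *buf, size_t len)
{
	unsigned long tmp;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	if (speed > 1000000) {
		/* Round to the nearest 10 MHz, then show two decimals of GHz. */
		tmp = speed % 10000;
		if (tmp >= 5000)
			speed += 10000;
		snprintf(buf, len, "%u.%02u GHz",
			 (unsigned int)(speed / 1000000),
			 (unsigned int)((speed % 1000000) / 10000));
	} else if (speed > 100000) {
		/* Round to the nearest MHz. */
		tmp = speed % 1000;
		if (tmp >= 500)
			speed += 1000;
		snprintf(buf, len, "%u MHz", (unsigned int)(speed / 1000));
	} else if (speed > 1000) {
		/* Round to the nearest 100 kHz, then show one decimal of MHz. */
		tmp = speed % 100;
		if (tmp >= 50)
			speed += 100;
		snprintf(buf, len, "%u.%01u MHz",
			 (unsigned int)(speed / 1000),
			 (unsigned int)((speed % 1000) / 100));
	}
	/* As in the quoted code, values of 1000 kHz or less yield no text. */
}
```

Calling format_speed_rounded(4682000, buf, sizeof(buf)) leaves "4.68 GHz"
in buf, and 400000 yields "400 MHz", matching the style of the sample
output in the documentation patch.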

diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c
index b429454bf3ae..f828f3c35a6f 100644
--- a/tools/power/cpupower/utils/cpufreq-info.c
+++ b/tools/power/cpupower/utils/cpufreq-info.c
@@ -146,9 +146,12 @@ static int get_boost_mode_x86(unsigned int cpu)
 	printf(_("    Supported: %s\n"), support ? _("yes") : _("no"));
 	printf(_("    Active: %s\n"), active ? _("yes") : _("no"));
 
-	if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
-	     cpupower_cpu_info.family >= 0x10) ||
-	     cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
+	if (cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
+	    cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
+		amd_pstate_show_perf_and_freq(cpu, no_rounding);
+	} else if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
+		    cpupower_cpu_info.family >= 0x10) ||
+		   cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
 		ret = decode_pstates(cpu, b_states, pstates, &pstate_no);
 		if (ret)
 			return ret;
diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c
index 92b9fb631768..fa38d3da42ce 100644
--- a/tools/power/cpupower/utils/helpers/amd.c
+++ b/tools/power/cpupower/utils/helpers/amd.c
@@ -200,5 +200,37 @@ void amd_pstate_boost_init(unsigned int cpu, int *support, int *active)
 	*active = cpuinfo_max == amd_pstate_max ? 1 : 0;
 }
 
+void amd_pstate_show_perf_and_freq(unsigned int cpu, int no_rounding)
+{
+	unsigned long cpuinfo_max, cpuinfo_min;
+
+	cpufreq_get_hardware_limits(cpu, &cpuinfo_min, &cpuinfo_max);
+
+	printf(_("    AMD PSTATE Highest Performance: %lu. Maximum Frequency: "),
+	       amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF));
+	/* If boost isn't active, the cpuinfo_max doesn't indicate real max
+	 * frequency. So we read it back from amd-pstate sysfs entry.
+	 */
+	print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ), no_rounding);
+	printf(".\n");
+
+	printf(_("    AMD PSTATE Nominal Performance: %lu. Nominal Frequency: "),
+	       amd_pstate_get_data(cpu, AMD_PSTATE_NOMINAL_PERF));
+	print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_NOMINAL_FREQ),
+		    no_rounding);
+	printf(".\n");
+
+	printf(_("    AMD PSTATE Lowest Non-linear Performance: %lu. Lowest Non-linear Frequency: "),
+	       amd_pstate_get_data(cpu, AMD_PSTATE_LOWEST_NONLINEAR_PERF));
+	print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_LOWEST_NONLINEAR_FREQ),
+		    no_rounding);
+	printf(".\n");
+
+	printf(_("    AMD PSTATE Lowest Performance: %lu. Lowest Frequency: "),
+	       amd_pstate_get_data(cpu, AMD_PSTATE_LOWEST_PERF));
+	print_speed(cpuinfo_min, no_rounding);
+	printf(".\n");
+}
+
 /* AMD P-States Helper Functions ***************/
 #endif /* defined(__i386__) || defined(__x86_64__) */
diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
index fbbfa6047c83..5f6862502dbf 100644
--- a/tools/power/cpupower/utils/helpers/helpers.h
+++ b/tools/power/cpupower/utils/helpers/helpers.h
@@ -142,6 +142,8 @@ extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
 extern bool cpupower_amd_pstate_enabled(void);
 extern void amd_pstate_boost_init(unsigned int cpu,
 				  int *support, int *active);
+extern void amd_pstate_show_perf_and_freq(unsigned int cpu,
+					  int no_rounding);
 
 /* AMD P-States stuff **************************/
 
@@ -182,6 +184,9 @@ static inline bool cpupower_amd_pstate_enabled(void)
 static void amd_pstate_boost_init(unsigned int cpu,
 				  int *support, int *active)
 { return; }
+static inline void amd_pstate_show_perf_and_freq(unsigned int cpu,
+						 int no_rounding)
+{ return; }
 
 /* cpuid and cpuinfo helpers  **************************/
 
-- 
2.25.1



* [PATCH v3 21/21] Documentation: amd-pstate: add amd-pstate driver introduction
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (19 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 20/21] cpupower: print amd-pstate information on cpupower Huang Rui
@ 2021-10-29 13:02 ` Huang Rui
  2021-11-04 16:40 ` [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Giovanni Gherdovich
  21 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-10-29 13:02 UTC (permalink / raw)
  To: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86,
	Huang Rui

Introduce the amd-pstate driver design and implementation.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 Documentation/admin-guide/pm/amd-pstate.rst   | 373 ++++++++++++++++++
 .../admin-guide/pm/working-state.rst          |   1 +
 2 files changed, 374 insertions(+)
 create mode 100644 Documentation/admin-guide/pm/amd-pstate.rst
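
A note on the sample numbers in the documentation below: the "cpupower
frequency-info" output it quotes (highest perf 166 at 4.68 GHz, nominal
perf 117 at 3.30 GHz, lowest non-linear perf 39 at 1.10 GHz) is
consistent with a linear mapping between the abstract CPPC performance
value and the frequency, anchored at the nominal point. The sketch below
only reproduces that arithmetic. The helper name perf_to_khz() is
hypothetical, the nominal frequency is assumed to be exactly 3300000 kHz,
and nothing here is a statement of how the driver itself derives the
values.

```c
#include <assert.h>

/*
 * Hypothetical helper for illustration only: estimate the frequency in kHz
 * for an abstract CPPC performance value by scaling linearly from the
 * nominal performance / nominal frequency pair.
 */
static unsigned long perf_to_khz(unsigned long perf,
				 unsigned long nominal_perf,
				 unsigned long nominal_khz)
{
	assert(nominal_perf != 0);

	/* Use a 64-bit intermediate so the product cannot overflow a
	 * 32-bit unsigned long.
	 */
	return (unsigned long)(((unsigned long long)nominal_khz * perf) /
			       nominal_perf);
}
```

With the sample values, perf_to_khz(166, 117, 3300000) gives 4682051 kHz
and perf_to_khz(39, 117, 3300000) gives 1100000 kHz. The same arithmetic
for the lowest perf of 15 gives about 423 MHz, whereas the tool prints
400 MHz; patch 20 takes that value from cpufreq_get_hardware_limits()
rather than from an amd-pstate attribute.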

diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
new file mode 100644
index 000000000000..375374e3eb80
--- /dev/null
+++ b/Documentation/admin-guide/pm/amd-pstate.rst
@@ -0,0 +1,373 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===============================================
+``amd-pstate`` CPU Performance Scaling Driver
+===============================================
+
+:Copyright: |copy| 2021 Advanced Micro Devices, Inc.
+
+:Author: Huang Rui <ray.huang@amd.com>
+
+
+Introduction
+===================
+
+``amd-pstate`` is the AMD CPU performance scaling driver that introduces a
+new CPU frequency control mechanism for modern AMD APU and CPU series in the
+Linux kernel. The new mechanism is based on Collaborative Processor
+Performance Control (CPPC), which provides finer-grained frequency management
+than legacy ACPI hardware P-States. Current AMD CPU/APU platforms use
+the ACPI P-states driver to manage CPU frequency and clocks, switching
+among only 3 P-states. CPPC replaces the ACPI P-states controls and provides
+a flexible, low-latency interface for the Linux kernel to communicate
+performance hints directly to the hardware.
+
+``amd-pstate`` leverages Linux kernel governors such as ``schedutil`` and
+``ondemand`` to manage the performance hints, which are provided by the
+CPPC hardware functionality that internally follows the hardware
+specification (for details, refer to the AMD64 Architecture Programmer's
+Manual Volume 2: System Programming [1]_). Currently, ``amd-pstate`` supports
+basic frequency control driven by the kernel governors on some
+Zen2 and Zen3 processors, and we will implement more AMD-specific functions
+in the future after we verify them on the hardware and SBIOS.
+
+
+AMD CPPC Overview
+=======================
+
+The Collaborative Processor Performance Control (CPPC) interface enumerates a
+continuous, abstract, and unit-less performance value on a scale that is
+not tied to a specific performance state / frequency. This is an ACPI
+standard [2]_ through which software can specify application performance
+goals and hints as a relative target to the infrastructure limits. AMD
+processors provide a low-latency register model (MSR) instead of the AML
+code interpreter for performance adjustments. ``amd-pstate`` initializes a
+``struct cpufreq_driver`` instance, ``amd_pstate_driver``, with the callbacks
+to manage each performance update behavior. ::
+
+ Highest Perf ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |          Max Perf  ---->|                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+ Nominal Perf ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |      Desired Perf  ---->|                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+  Lowest non-        |                       |                         |                       |
+  linear perf ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |       Lowest perf  ---->|                       |
+                     |                       |                         |                       |
+  Lowest perf ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+          0   ------>+-----------------------+                         +-----------------------+
+
+                                     AMD P-States Performance Scale
+
+
+.. _perf_cap:
+
+AMD CPPC Performance Capability
+--------------------------------
+
+Highest Performance (RO)
+.........................
+
+It is the absolute maximum performance an individual processor may reach,
+assuming ideal conditions. This performance level may not be sustainable
+for long durations and may only be achievable if other platform components
+are in a specific state; for example, it may require other processors be in
+an idle state. This would be equivalent to the highest frequencies
+supported by the processor.
+
+Nominal (Guaranteed) Performance (RO)
+......................................
+
+It is the maximum sustained performance level of the processor, assuming
+ideal operating conditions. In absence of an external constraint (power,
+thermal, etc.) this is the performance level the processor is expected to
+be able to maintain continuously. All cores/processors are expected to be
+able to sustain their nominal performance state simultaneously.
+
+Lowest non-linear Performance (RO)
+...................................
+
+It is the lowest performance level at which nonlinear power savings are
+achieved, for example, due to the combined effects of voltage and frequency
+scaling. Above this threshold, lower performance levels should be generally
+more energy efficient than higher performance levels. This register
+effectively conveys the most efficient performance level to ``amd-pstate``.
+
+Lowest Performance (RO)
+........................
+
+It is the absolute lowest performance level of the processor. Selecting a
+performance level lower than the lowest nonlinear performance level may
+cause an efficiency penalty but should reduce the instantaneous power
+consumption of the processor.
+
+AMD CPPC Performance Control
+------------------------------
+
+``amd-pstate`` passes performance goals through these registers. The
+registers drive the behavior of the desired performance target.
+
+Minimum requested performance (RW)
+...................................
+
+``amd-pstate`` specifies the minimum allowed performance level.
+
+Maximum requested performance (RW)
+...................................
+
+``amd-pstate`` specifies a limit on the maximum performance that is expected
+to be supplied by the hardware.
+
+Desired performance target (RW)
+...................................
+
+``amd-pstate`` specifies a desired target in the CPPC performance scale as
+a relative number. This can be expressed as a percentage of nominal
+performance (infrastructure max). Below the nominal sustained performance
+level, desired performance expresses the average performance level of the
+processor subject to hardware. Above the nominal performance level, the
+processor must provide at least the nominal performance requested and go
+higher if current operating conditions allow.
+
+Energy Performance Preference (EPP) (RW)
+.........................................
+
+Provides a hint to the hardware on whether software wants to bias toward
+performance (0x0) or energy efficiency (0xff).
+
+
+Key Governors Support
+=======================
+
+``amd-pstate`` can be used with all the (generic) scaling governors listed
+by the ``scaling_available_governors`` policy attribute in ``sysfs``. It
+is responsible for the configuration of policy objects corresponding to
+CPUs and provides the ``CPUFreq`` core (and the scaling governors attached
+to the policy objects) with accurate information on the maximum and minimum
+operating frequencies supported by the hardware. Users can check the
+``scaling_cur_freq`` information that comes from the ``CPUFreq`` core.
+
+``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic
+frequency control. It fine-tunes the processor configuration for
+``schedutil`` with the CFS CPU scheduler. ``amd-pstate`` registers the
+``adjust_perf`` callback to implement a CPPC-like performance update
+behavior. It is initialized by ``sugov_start``, which then populates the
+CPU's ``update_util_data`` pointer to assign ``sugov_update_single_perf`` as
+the utilization update callback function in the CPU scheduler. The CPU
+scheduler calls ``cpufreq_update_util`` and assigns the target performance
+according to the ``struct sugov_cpu`` that the utilization update belongs to.
+``amd-pstate`` then updates the desired performance according to the value
+assigned by the CPU scheduler.
+
+
+Processor Support
+=======================
+
+The ``amd-pstate`` initialization will fail if the _CPC entry in the ACPI
+SBIOS does not exist for the detected processor; it uses ``acpi_cpc_valid``
+to check for the _CPC entry. All Zen based processors support the legacy ACPI
+hardware P-States function, so when ``amd-pstate`` fails to
+initialize, the kernel falls back to initializing the ``acpi-cpufreq``
+driver.
+
+There are two types of hardware implementations for ``amd-pstate``: one is
+`Full MSR Support <perf_cap_>`_ and the other is `Shared Memory Support
+<perf_cap_>`_. The :c:macro:`X86_FEATURE_AMD_CPPC` feature flag (for
+details, refer to the Processor Programming Reference (PPR) for AMD Family
+19h Model 21h, Revision B0 Processors [3]_) distinguishes the two
+types. ``amd-pstate`` registers different ``amd_pstate_perf_funcs``
+instances for different hardware implementations.
+
+Currently, some Zen2 and Zen3 processors support ``amd-pstate``. In the
+future, it will be supported on more AMD processors.
+
+Full MSR Support
+-----------------
+
+Some new Zen3 processors such as Cezanne provide the MSR registers directly
+when the :c:macro:`X86_FEATURE_AMD_CPPC` CPU feature flag is set.
+``amd-pstate`` can use the MSR registers to implement the fast switch
+function in ``CPUFreq``, which reduces the latency of frequency control in
+interrupt context.
+
+Shared Memory Support
+----------------------
+
+If the :c:macro:`X86_FEATURE_AMD_CPPC` CPU feature flag is not set, the
+processor supports the shared memory solution. In this case, ``amd-pstate``
+uses the ``cppc_acpi`` helper methods to implement the callback functions
+of ``amd_pstate_perf_funcs``.
+
+
+AMD P-States and ACPI hardware P-States can always both be supported on one
+processor. However, AMD P-States has the higher priority: if it is enabled
+with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, the hardware
+responds to requests from AMD P-States.
+
+
+User Space Interface in ``sysfs``
+==================================
+
+``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
+control its functionality at the system level. They are located in the
+``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. ::
+
+ root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd*
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_perf
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_perf
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_min_freq
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_nominal_freq
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_nominal_perf
+
+
+``amd_pstate_highest_perf / amd_pstate_max_freq``
+
+Maximum CPPC performance and CPU frequency that the driver is allowed to
+set in percent of the maximum supported CPPC performance level (the highest
+performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
+This attribute is read-only.
+
+``amd_pstate_nominal_perf / amd_pstate_nominal_freq``
+
+Nominal CPPC performance and CPU frequency that the driver is allowed to
+set in percent of the maximum supported CPPC performance level (Please see
+nominal performance in `AMD CPPC Performance Capability <perf_cap_>`_).
+This attribute is read-only.
+
+``amd_pstate_lowest_nonlinear_perf / amd_pstate_lowest_nonlinear_freq``
+
+The lowest non-linear CPPC performance and CPU frequency that the driver is
+allowed to set in percent of the maximum supported CPPC performance level
+(Please see the lowest non-linear performance in `AMD CPPC Performance
+Capability <perf_cap_>`_).
+This attribute is read-only.
+
+``amd_pstate_lowest_perf``
+
+The lowest physical CPPC performance. The minimum CPU frequency can be read
+back from ``cpuinfo`` member of ``cpufreq_policy``, so we won't expose it
+here.
+This attribute is read-only.
+
+
+``amd-pstate`` vs ``acpi-cpufreq``
+======================================
+
+On the majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI
+tables provided by the platform firmware are used for CPU performance
+scaling, but they provide only 3 P-states on AMD processors.
+However, modern AMD APU and CPU series provide Collaborative
+Processor Performance Control according to the ACPI protocol, customized
+for AMD platforms. It offers a fine-grained and continuous frequency range
+instead of the legacy hardware P-states. ``amd-pstate`` is the kernel
+module which supports the new AMD P-States mechanism on most future AMD
+platforms. The AMD P-States mechanism will be the more performant and
+energy-efficient frequency management method on AMD processors.
+
+``cpupower`` tool support for ``amd-pstate``
+===============================================
+
+``amd-pstate`` is supported by the ``cpupower`` tool, which can be used to dump
+frequency information. Support for more operations on the new
+``amd-pstate`` module with this tool is in progress. ::
+
+ root@hr-test1:/home/ray# cpupower frequency-info
+ analyzing CPU 0:
+   driver: amd-pstate
+   CPUs which run at the same hardware frequency: 0
+   CPUs which need to have their frequency coordinated by software: 0
+   maximum transition latency: 131 us
+   hardware limits: 400 MHz - 4.68 GHz
+   available cpufreq governors: ondemand conservative powersave userspace performance schedutil
+   current policy: frequency should be within 400 MHz and 4.68 GHz.
+                   The governor "schedutil" may decide which speed to use
+                   within this range.
+   current CPU frequency: Unable to call hardware
+   current CPU frequency: 4.02 GHz (asserted by call to kernel)
+   boost state support:
+     Supported: yes
+     Active: yes
+     AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz.
+     AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz.
+     AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz.
+     AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz.
+
+
+Diagnostics and Tuning
+=======================
+
+Trace Events
+--------------
+
+There are two static trace events that can be used for ``amd-pstate``
+diagnostics.  One of them is the cpu_frequency trace event generally used
+by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event
+specific to ``amd-pstate``.  The following sequence of shell commands can
+be used to enable them and see their output (if the kernel is generally
+configured to support event tracing). ::
+
+ root@hr-test1:/home/ray# cd /sys/kernel/tracing/
+ root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable
+ root@hr-test1:/sys/kernel/tracing# cat trace
+ # tracer: nop
+ #
+ # entries-in-buffer/entries-written: 47827/42233061   #P:2
+ #
+ #                                _-----=> irqs-off
+ #                               / _----=> need-resched
+ #                              | / _---=> hardirq/softirq
+ #                              || / _--=> preempt-depth
+ #                              ||| /     delay
+ #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
+ #              | |         |   ||||      |         |
+          <idle>-0       [015] dN...  4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true
+          <idle>-0       [007] d.h..  4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
+             cat-2161    [000] d....  4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true
+            sshd-2125    [004] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true
+          <idle>-0       [007] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
+          <idle>-0       [003] d.s..  4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true
+          <idle>-0       [011] d.s..  4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true
+
+The cpu_frequency trace event will be triggered either by the ``schedutil`` scaling
+governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the
+policies with other scaling governors).
+
+
+Reference
+===========
+
+.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming,
+       https://www.amd.com/system/files/TechDocs/24593.pdf
+
+.. [2] Advanced Configuration and Power Interface Specification,
+       https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf
+
+.. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 21h, Revision B0 Processors
+       https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip
+
diff --git a/Documentation/admin-guide/pm/working-state.rst b/Documentation/admin-guide/pm/working-state.rst
index f40994c422dc..5d2757e2de65 100644
--- a/Documentation/admin-guide/pm/working-state.rst
+++ b/Documentation/admin-guide/pm/working-state.rst
@@ -11,6 +11,7 @@ Working-State Power Management
    intel_idle
    cpufreq
    intel_pstate
+   amd-pstate
    cpufreq_drivers
    intel_epb
    intel-speed-select
-- 
2.25.1



* Re: [PATCH v3 17/21] cpupower: add amd-pstate sysfs definition and access helper
  2021-10-29 13:02 ` [PATCH v3 17/21] cpupower: add amd-pstate sysfs definition and access helper Huang Rui
@ 2021-10-29 14:10   ` Limonciello, Mario
  2021-11-01  9:14     ` Huang Rui
  0 siblings, 1 reply; 50+ messages in thread
From: Limonciello, Mario @ 2021-10-29 14:10 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Steven Noonan, Nathan Fontenot,
	Jinzhou Su, Xiaojian Du, linux-kernel, x86

On 10/29/2021 08:02, Huang Rui wrote:
> Introduce the marco definitions and access helper function for

You've got a spelling error here.

> amd-pstate sysfs interfaces such as each performance goals and frequency
> levels in amd helper file. They will be used to read the sysfs attribute
> from amd-pstate cpufreq driver for cpupower utilities.
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>   tools/power/cpupower/utils/helpers/amd.c | 37 ++++++++++++++++++++++++
>   1 file changed, 37 insertions(+)
> 
> diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c
> index 97f2c857048e..f233a6ab75ac 100644
> --- a/tools/power/cpupower/utils/helpers/amd.c
> +++ b/tools/power/cpupower/utils/helpers/amd.c
> @@ -8,7 +8,9 @@
>   #include <pci/pci.h>
>   
>   #include "helpers/helpers.h"
> +#include "cpufreq.h"
>   
> +/* ACPI P-States Helper Functions for AMD Processors ***************/
>   #define MSR_AMD_PSTATE_STATUS	0xc0010063
>   #define MSR_AMD_PSTATE		0xc0010064
>   #define MSR_AMD_PSTATE_LIMIT	0xc0010061
> @@ -146,4 +148,39 @@ int amd_pci_get_num_boost_states(int *active, int *states)
>   	pci_cleanup(pci_acc);
>   	return 0;
>   }
> +
> +/* ACPI P-States Helper Functions for AMD Processors ***************/
> +
> +/* AMD P-States Helper Functions ***************/
> +enum amd_pstate_value {
> +	AMD_PSTATE_HIGHEST_PERF,
> +	AMD_PSTATE_NOMINAL_PERF,
> +	AMD_PSTATE_LOWEST_NONLINEAR_PERF,
> +	AMD_PSTATE_LOWEST_PERF,
> +	AMD_PSTATE_MAX_FREQ,
> +	AMD_PSTATE_NOMINAL_FREQ,
> +	AMD_PSTATE_LOWEST_NONLINEAR_FREQ,
> +	MAX_AMD_PSTATE_VALUE_READ_FILES
> +};
> +
> +static const char *amd_pstate_value_files[MAX_AMD_PSTATE_VALUE_READ_FILES] = {
> +	[AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
> +	[AMD_PSTATE_NOMINAL_PERF] = "amd_pstate_nominal_perf",
> +	[AMD_PSTATE_LOWEST_NONLINEAR_PERF] = "amd_pstate_lowest_nonlinear_perf",
> +	[AMD_PSTATE_LOWEST_PERF] = "amd_pstate_lowest_perf",
> +	[AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
> +	[AMD_PSTATE_NOMINAL_FREQ] = "amd_pstate_nominal_freq",
> +	[AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq",
> +};
> +
> +static unsigned long amd_pstate_get_data(unsigned int cpu,
> +					 enum amd_pstate_value value)
> +{
> +	return cpufreq_get_sysfs_value_from_table(cpu,
> +						  amd_pstate_value_files,
> +						  value,
> +						  MAX_AMD_PSTATE_VALUE_READ_FILES);
> +}
> +
> +/* AMD P-States Helper Functions ***************/
>   #endif /* defined(__i386__) || defined(__x86_64__) */
> 
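
As an illustration of the enum-indexed file table quoted above, here is a
minimal, self-contained sketch of how such a table can be used to read one
numeric sysfs attribute. It does not reproduce
cpufreq_get_sysfs_value_from_table(); the names demo_value, demo_value_files
and demo_read_value(), the explicit directory argument, and the convention
of returning 0 on any error are all assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, shortened version of the attribute table in the patch. */
enum demo_value {
	DEMO_HIGHEST_PERF,
	DEMO_MAX_FREQ,
	DEMO_MAX_VALUE_FILES
};

static const char *demo_value_files[DEMO_MAX_VALUE_FILES] = {
	[DEMO_HIGHEST_PERF] = "amd_pstate_highest_perf",
	[DEMO_MAX_FREQ] = "amd_pstate_max_freq",
};

/*
 * Read "<policy_dir>/<table[index]>" and parse it as an unsigned decimal
 * number. Returns 0 on any error (an assumption of this sketch).
 */
static unsigned long demo_read_value(const char *policy_dir,
				     const char **table,
				     unsigned int index,
				     unsigned int size)
{
	char path[512];
	char line[64];
	char *end;
	unsigned long value;
	FILE *fp;
	int n;

	/* Reject out-of-range indexes and holes in the table. */
	if (index >= size || !table[index])
		return 0;

	n = snprintf(path, sizeof(path), "%s/%s", policy_dir, table[index]);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;

	fp = fopen(path, "r");
	if (!fp)
		return 0;

	if (!fgets(line, sizeof(line), fp)) {
		fclose(fp);
		return 0;
	}
	fclose(fp);

	value = strtoul(line, &end, 10);
	if (end == line)	/* no digits were parsed */
		return 0;

	return value;
}
```

On a system running the driver, policy_dir would be a directory such as
/sys/devices/system/cpu/cpufreq/policy0, as listed in the documentation
patch. Keeping the file names in one table indexed by the enum means a new
attribute needs only a new enum entry and a new table row.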



* Re: [PATCH v3 05/21] ACPI: CPPC: add cppc enable register function
  2021-10-29 13:02 ` [PATCH v3 05/21] ACPI: CPPC: add cppc enable register function Huang Rui
@ 2021-10-29 14:15   ` Limonciello, Mario
  2021-11-01  9:20     ` Huang Rui
  0 siblings, 1 reply; 50+ messages in thread
From: Limonciello, Mario @ 2021-10-29 14:15 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Steven Noonan, Nathan Fontenot,
	Jinzhou Su, Xiaojian Du, linux-kernel, x86

On 10/29/2021 08:02, Huang Rui wrote:
> From: Jinzhou Su <Jinzhou.Su@amd.com>
> 
> Add a new function to enable CPPC feature. This function
> will write Continuous Performance Control package
> EnableRegister field on the processor.
> 
> CPPC EnableRegister register described in section 8.4.7.1 of ACPI 6.4:
> This element is optional. If supported, contains a resource descriptor
> with a single Register() descriptor that describes a register to which
> OSPM writes a One to enable CPPC on this processor. Before this register
> is set, the processor will be controlled by legacy mechanisms (ACPI
> Pstates, firmware, etc.).
> 
> This register will be used for AMD processors to enable amd-pstate
> function instead of legacy ACPI P-States.
> 
> Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>   drivers/acpi/cppc_acpi.c | 45 ++++++++++++++++++++++++++++++++++++++++
>   include/acpi/cppc_acpi.h |  5 +++++
>   2 files changed, 50 insertions(+)
> 
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index c9169c221209..2d2297ef5bf9 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -1275,6 +1275,51 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
>   }
>   EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
>   
> +/**
> + * cppc_set_enable - Set to enable CPPC on the processor by writing the
> + * Continuous Performance Control package EnableRegister feild.

s/feild/field/

> + * @cpu: CPU for which to enable CPPC register.
> + * @enable: 0 - disable, 1 - enable CPPC feature on the processor.
> + *
> + * Return: 0 for success, -ERRNO or -EIO otherwise.
> + */
> +int cppc_set_enable(int cpu, bool enable)
> +{
> +	int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
> +	struct cpc_register_resource *enable_reg;
> +	struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> +	struct cppc_pcc_data *pcc_ss_data = NULL;
> +	int ret = -EINVAL;
> +
> +	if (!cpc_desc) {
> +		pr_debug("No CPC descriptor for CPU:%d\n", cpu);
> +		return -EINVAL;
> +	}

Can this actually happen or is just an extra safety check?
Don't you block running based on acpi_cpc_valid?

> +
> +	enable_reg = &cpc_desc->cpc_regs[ENABLE];
> +
> +	if (CPC_IN_PCC(enable_reg)) {
> +
> +		if (pcc_ss_id < 0)
> +			return -EIO;
> +
> +		ret = cpc_write(cpu, enable_reg, enable);
> +		if (ret)
> +			return ret;
> +
> +		pcc_ss_data = pcc_data[pcc_ss_id];
> +
> +		down_write(&pcc_ss_data->pcc_lock);
> +		/* after writing CPC, transfer the ownership of PCC to platfrom */
> +		ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
> +		up_write(&pcc_ss_data->pcc_lock);
> +		return ret;
> +	}
> +
> +	return cpc_write(cpu, enable_reg, enable);
> +}
> +EXPORT_SYMBOL_GPL(cppc_set_enable);
> +
>   /**
>    * cppc_set_perf - Set a CPU's performance controls.
>    * @cpu: CPU for which to set performance controls.
> diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
> index bc159a9b4a73..92b7ea8d8f5e 100644
> --- a/include/acpi/cppc_acpi.h
> +++ b/include/acpi/cppc_acpi.h
> @@ -138,6 +138,7 @@ extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
>   extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf);
>   extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
>   extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
> +extern int cppc_set_enable(int cpu, bool enable);
>   extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
>   extern bool acpi_cpc_valid(void);
>   extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data);
> @@ -162,6 +163,10 @@ static inline int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
>   {
>   	return -ENOTSUPP;
>   }
> +static inline int cppc_set_enable(int cpu, bool enable)
> +{
> +	return -ENOTSUPP;
> +}
>   static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps)
>   {
>   	return -ENOTSUPP;
> 


* Re: [PATCH v3 07/21] cpufreq: amd: add fast switch function for amd-pstate
  2021-10-29 13:02 ` [PATCH v3 07/21] cpufreq: amd: add fast switch function for amd-pstate Huang Rui
@ 2021-10-29 14:16   ` Limonciello, Mario
  2021-11-02 19:56   ` Nathan Fontenot
  1 sibling, 0 replies; 50+ messages in thread
From: Limonciello, Mario @ 2021-10-29 14:16 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Steven Noonan, Nathan Fontenot,
	Jinzhou Su, Xiaojian Du, linux-kernel, x86

On 10/29/2021 08:02, Huang Rui wrote:
> Introduce the fast switch function for amd-pstate on the AMD processors
> which support the full MSR register control. It's able to decrease the
> lattency on interrupt context.

s/lattency/latency/

> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>   drivers/cpufreq/amd-pstate.c | 38 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index a400861c7fdc..55ff03f85608 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -191,6 +191,41 @@ static int amd_pstate_target(struct cpufreq_policy *policy,
>   	return 0;
>   }
>   
> +static void amd_pstate_adjust_perf(unsigned int cpu,
> +				   unsigned long min_perf,
> +				   unsigned long target_perf,
> +				   unsigned long capacity)
> +{
> +	unsigned long amd_max_perf, amd_min_perf, amd_des_perf,
> +		      amd_cap_perf, lowest_nonlinear_perf;
> +	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> +	struct amd_cpudata *cpudata = policy->driver_data;
> +
> +	amd_cap_perf = READ_ONCE(cpudata->highest_perf);
> +	lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
> +
> +	if (target_perf < capacity)
> +		amd_des_perf = DIV_ROUND_UP(amd_cap_perf * target_perf,
> +					    capacity);
> +
> +	amd_min_perf = READ_ONCE(cpudata->highest_perf);
> +	if (min_perf < capacity)
> +		amd_min_perf = DIV_ROUND_UP(amd_cap_perf * min_perf, capacity);
> +
> +	if (amd_min_perf < lowest_nonlinear_perf)
> +		amd_min_perf = lowest_nonlinear_perf;
> +
> +	amd_max_perf = amd_cap_perf;
> +	if (amd_max_perf < amd_min_perf)
> +		amd_max_perf = amd_min_perf;
> +
> +	amd_des_perf = clamp_t(unsigned long, amd_des_perf,
> +			       amd_min_perf, amd_max_perf);
> +
> +	amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
> +			  amd_max_perf, true);
> +}
> +
>   static int amd_get_min_freq(struct amd_cpudata *cpudata)
>   {
>   	struct cppc_perf_caps cppc_perf;
> @@ -311,6 +346,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>   	/* It will be updated by governor */
>   	policy->cur = policy->cpuinfo.min_freq;
>   
> +	policy->fast_switch_possible = true;
> +
>   	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
>   				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
>   	if (ret < 0) {
> @@ -360,6 +397,7 @@ static struct cpufreq_driver amd_pstate_driver = {
>   	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
>   	.verify		= amd_pstate_verify,
>   	.target		= amd_pstate_target,
> +	.adjust_perf    = amd_pstate_adjust_perf,
>   	.init		= amd_pstate_cpu_init,
>   	.exit		= amd_pstate_cpu_exit,
>   	.name		= "amd-pstate",
> 
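
For anyone following the arithmetic in amd_pstate_adjust_perf() above, here
is a small user-space sketch of the same scaling and clamping steps. The
helper name scale_perf(), the struct perf_request and clamp_ul() are made up
for illustration and are not part of the patch. One deliberate difference:
the sketch defaults the desired level to the highest level first, because
the quoted hunk only assigns amd_des_perf when target_perf < capacity. That
default is my assumption, not something the patch states.

```c
/* Round-up integer division, same idea as the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical container for the three values handed to the hardware. */
struct perf_request {
	unsigned long min_perf;
	unsigned long des_perf;
	unsigned long max_perf;
};

/* Hypothetical stand-in for clamp_t(unsigned long, ...). */
static unsigned long clamp_ul(unsigned long v, unsigned long lo,
			      unsigned long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Map scheduler-side values (expressed relative to "capacity") onto the
 * abstract performance scale [0, highest_perf].
 */
static struct perf_request scale_perf(unsigned long highest_perf,
				      unsigned long lowest_nonlinear_perf,
				      unsigned long min_perf,
				      unsigned long target_perf,
				      unsigned long capacity)
{
	struct perf_request req;

	/* Assumption: fall back to the highest level if target >= capacity. */
	req.des_perf = highest_perf;
	if (target_perf < capacity)
		req.des_perf = DIV_ROUND_UP(highest_perf * target_perf,
					    capacity);

	req.min_perf = highest_perf;
	if (min_perf < capacity)
		req.min_perf = DIV_ROUND_UP(highest_perf * min_perf, capacity);

	/* Never request less than the lowest nonlinear level. */
	if (req.min_perf < lowest_nonlinear_perf)
		req.min_perf = lowest_nonlinear_perf;

	req.max_perf = highest_perf;
	if (req.max_perf < req.min_perf)
		req.max_perf = req.min_perf;

	req.des_perf = clamp_ul(req.des_perf, req.min_perf, req.max_perf);
	return req;
}
```

With made-up inputs highest_perf = 166, lowest_nonlinear_perf = 50 and
capacity = 1024, a target of 512 maps to a desired level of 83, while a
target of 100 scales to 17 and is then raised to the floor of 50.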


* Re: [PATCH v3 08/21] cpufreq: amd: add acpi cppc function as the backend for legacy processors
  2021-10-29 13:02 ` [PATCH v3 08/21] cpufreq: amd: add acpi cppc function as the backend for legacy processors Huang Rui
@ 2021-10-29 14:20   ` Limonciello, Mario
  2021-11-01  9:02     ` Huang Rui
  2021-11-02 18:46   ` Nathan Fontenot
  1 sibling, 1 reply; 50+ messages in thread
From: Limonciello, Mario @ 2021-10-29 14:20 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Steven Noonan, Nathan Fontenot,
	Jinzhou Su, Xiaojian Du, linux-kernel, x86

On 10/29/2021 08:02, Huang Rui wrote:
> In some old Zen based processors, they are using the shared memory that
> exposed from ACPI SBIOS.

I don't think this applies only to "old" processors.  I think there are
also "new" processors that just don't happen to implement the MSR.

> 
> Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>   drivers/cpufreq/amd-pstate.c | 58 ++++++++++++++++++++++++++++++++----
>   1 file changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 55ff03f85608..d399938d6d85 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -73,6 +73,19 @@ static inline int pstate_enable(bool enable)
>   	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
>   }
>   
> +static int cppc_enable(bool enable)
> +{
> +	int cpu, ret = 0;
> +
> +	for_each_online_cpu(cpu) {

I wonder if this should also be changed to iterate over present CPUs
instead of online CPUs.  Otherwise, couldn't this turn into a situation
where the user starts with some CPUs offlined and brings them online
later, but the enable never gets applied to the CPUs that were offline at
the time of this call?

> +		ret = cppc_set_enable(cpu, enable ? 1 : 0);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return ret;
> +}
> +
>   DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
>   
>   static inline int amd_pstate_enable(bool enable)
> @@ -103,6 +116,24 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
>   	return 0;
>   }
>   
> +static int cppc_init_perf(struct amd_cpudata *cpudata)
> +{
> +	struct cppc_perf_caps cppc_perf;
> +
> +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> +	if (ret)
> +		return ret;
> +
> +	WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
> +
> +	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
> +	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
> +		   cppc_perf.lowest_nonlinear_perf);
> +	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
> +
> +	return 0;
> +}
> +
>   DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
>   
>   static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
> @@ -120,6 +151,19 @@ static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
>   			      READ_ONCE(cpudata->cppc_req_cached));
>   }
>   
> +static void cppc_update_perf(struct amd_cpudata *cpudata,
> +			     u32 min_perf, u32 des_perf,
> +			     u32 max_perf, bool fast_switch)
> +{
> +	struct cppc_perf_ctrls perf_ctrls;
> +
> +	perf_ctrls.max_perf = max_perf;
> +	perf_ctrls.min_perf = min_perf;
> +	perf_ctrls.desired_perf = des_perf;
> +
> +	cppc_set_perf(cpudata->cpu, &perf_ctrls);
> +}
> +
>   DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
>   
>   static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
> @@ -346,7 +390,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>   	/* It will be updated by governor */
>   	policy->cur = policy->cpuinfo.min_freq;
>   
> -	policy->fast_switch_possible = true;
> +	if (boot_cpu_has(X86_FEATURE_AMD_CPPC))
> +		policy->fast_switch_possible = true;
>   
>   	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
>   				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
> @@ -397,7 +442,6 @@ static struct cpufreq_driver amd_pstate_driver = {
>   	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
>   	.verify		= amd_pstate_verify,
>   	.target		= amd_pstate_target,
> -	.adjust_perf    = amd_pstate_adjust_perf,
>   	.init		= amd_pstate_cpu_init,
>   	.exit		= amd_pstate_cpu_exit,
>   	.name		= "amd-pstate",
> @@ -421,10 +465,14 @@ static int __init amd_pstate_init(void)
>   		return -EEXIST;
>   
>   	/* capability check */
> -	if (!boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
> -		pr_debug("%s, AMD CPPC MSR based functionality is not supported\n",
> +	if (boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
> +		pr_debug("%s, AMD CPPC MSR based functionality is supported\n",
>   			 __func__);
> -		return -ENODEV;
> +		amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
> +	} else {
> +		static_call_update(amd_pstate_enable, cppc_enable);
> +		static_call_update(amd_pstate_init_perf, cppc_init_perf);
> +		static_call_update(amd_pstate_update_perf, cppc_update_perf);
>   	}
>   
>   	/* enable amd pstate feature */
> 
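
To make the control flow of the capability check easier to follow, here is
a rough user-space analogue of the backend selection. It uses a plain table
of function pointers; the real patch uses static_call_update(), which
patches the call sites instead of going through a pointer. All names in
the sketch (backend_ops, msr_backend, shmem_backend, select_backend) are
hypothetical and the enable functions are stubs.

```c
#include <stdbool.h>

/* Hypothetical per-backend operations, standing in for the static calls. */
struct backend_ops {
	const char *name;
	int (*enable)(bool enable);
	bool fast_switch;	/* only the MSR path sets fast_switch_possible */
};

/* Stub: the real MSR path writes MSR_AMD_CPPC_ENABLE. */
static int msr_enable(bool enable)
{
	(void)enable;
	return 0;
}

/* Stub: the real shared memory path calls cppc_set_enable() per CPU. */
static int shmem_enable(bool enable)
{
	(void)enable;
	return 0;
}

static const struct backend_ops msr_backend = {
	.name = "msr",
	.enable = msr_enable,
	.fast_switch = true,
};

static const struct backend_ops shmem_backend = {
	.name = "shared-memory",
	.enable = shmem_enable,
	.fast_switch = false,
};

/* Pick the backend once at init time, based on the feature flag. */
static const struct backend_ops *select_backend(bool has_cppc_msr)
{
	return has_cppc_msr ? &msr_backend : &shmem_backend;
}
```

The point of the sketch is only the shape of the decision: one feature
test at init, after which every caller goes through the same entry points
regardless of which hardware interface sits behind them.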


* Re: [PATCH v3 01/21] x86/cpufreatures: add AMD Collaborative Processor Performance Control feature flag
  2021-10-29 13:02 ` [PATCH v3 01/21] x86/cpufreatures: add AMD Collaborative Processor Performance Control feature flag Huang Rui
@ 2021-10-29 14:39   ` Borislav Petkov
  2021-11-06 10:28   ` Borislav Petkov
  1 sibling, 0 replies; 50+ messages in thread
From: Borislav Petkov @ 2021-10-29 14:39 UTC (permalink / raw)
  To: Huang Rui
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Peter Zijlstra,
	Ingo Molnar, Giovanni Gherdovich, linux-pm, Deepak Sharma,
	Alex Deucher, Mario Limonciello, Steven Noonan, Nathan Fontenot,
	Jinzhou Su, Xiaojian Du, linux-kernel, x86

On Fri, Oct 29, 2021 at 09:02:21PM +0800, Huang Rui wrote:
> Add Collaborative Processor Performance Control feature flag for AMD
> processors.
> 
> This feature flag will be used on the following amd-pstate driver. The
> amd-pstate driver has two approaches to implement the frequency control
> behavior. That depends on the CPU hardware implementation. One is "Full
> MSR Support" and another is "Shared Memory Support". The feature flag
> indicates the current processors with "Full MSR Support".
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d0ce5cfd3ac1..f23dc1abd485 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -313,6 +313,7 @@
>  #define X86_FEATURE_AMD_SSBD		(13*32+24) /* "" Speculative Store Bypass Disable */
>  #define X86_FEATURE_VIRT_SSBD		(13*32+25) /* Virtualized Speculative Store Bypass Disable */
>  #define X86_FEATURE_AMD_SSB_NO		(13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
> +#define X86_FEATURE_AMD_CPPC		(13*32+27) /* Collaborative Processor Performance Control */
>  
>  /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
>  #define X86_FEATURE_DTHERM		(14*32+ 0) /* Digital Thermal Sensor */
> -- 

Acked-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Ivo Totev, HRB 36809, AG Nürnberg

* Re: [PATCH v3 08/21] cpufreq: amd: add acpi cppc function as the backend for legacy processors
  2021-10-29 14:20   ` Limonciello, Mario
@ 2021-11-01  9:02     ` Huang Rui
  0 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-11-01  9:02 UTC (permalink / raw)
  To: Limonciello, Mario
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm,
	Sharma, Deepak, Deucher, Alexander, Steven Noonan, Fontenot,
	Nathan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Fri, Oct 29, 2021 at 10:20:09PM +0800, Limonciello, Mario wrote:
> On 10/29/2021 08:02, Huang Rui wrote:
> > In some old Zen based processors, they are using the shared memory that
> > exposed from ACPI SBIOS.
> 
> I don't think this applies only to "old" processors.  I think there are
> also "new" processors that just don't happen to implement the MSR.
> 

Yes, I will correct the description.

> > 
> > Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >   drivers/cpufreq/amd-pstate.c | 58 ++++++++++++++++++++++++++++++++----
> >   1 file changed, 53 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > index 55ff03f85608..d399938d6d85 100644
> > --- a/drivers/cpufreq/amd-pstate.c
> > +++ b/drivers/cpufreq/amd-pstate.c
> > @@ -73,6 +73,19 @@ static inline int pstate_enable(bool enable)
> >   	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
> >   }
> >   
> > +static int cppc_enable(bool enable)
> > +{
> > +	int cpu, ret = 0;
> > +
> > +	for_each_online_cpu(cpu) {
> 
> I wonder if this should also be changed to iterate over present CPUs
> instead of online CPUs.  Otherwise, couldn't this turn into a situation
> where the user starts with some CPUs offlined and brings them online
> later, but the enable never gets applied to the CPUs that were offline at
> the time of this call?
> 

Yes, that makes sense. It is actually similar to the previous
acpi_cpc_valid fix patch. I will update it in V4.
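
To illustrate the concern, here is a small user-space model of the two
iteration choices. It is only a sketch: struct cpu_state, enable_all(),
bring_online() and count_missed() are hypothetical names, and it assumes
that the enable can be applied to a CPU that is present but not yet
online, which would have to be confirmed for the real register write.

```c
#include <stdbool.h>

/* Hypothetical per-CPU bookkeeping for the model. */
struct cpu_state {
	bool present;
	bool online;
	bool cppc_enabled;
};

/*
 * Apply the enable to every CPU in the chosen mask and return how many
 * CPUs were touched.
 */
static int enable_all(struct cpu_state *cpus, int n, bool use_present_mask)
{
	int count = 0;

	for (int i = 0; i < n; i++) {
		bool selected = use_present_mask ? cpus[i].present
						 : cpus[i].online;
		if (!selected)
			continue;
		cpus[i].cppc_enabled = true;
		count++;
	}
	return count;
}

/* Model a later hotplug event for a CPU that was offline at init. */
static void bring_online(struct cpu_state *cpus, int cpu)
{
	if (cpus[cpu].present)
		cpus[cpu].online = true;
}

/* Count CPUs that are running without the feature enabled. */
static int count_missed(const struct cpu_state *cpus, int n)
{
	int missed = 0;

	for (int i = 0; i < n; i++)
		if (cpus[i].online && !cpus[i].cppc_enabled)
			missed++;
	return missed;
}
```

With four present CPUs of which two start offline, iterating over the
online mask touches two CPUs and leaves any CPU onlined afterwards without
the enable. Iterating over the present mask touches all four.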

Thanks,
Ray

* Re: [PATCH v3 17/21] cpupower: add amd-pstate sysfs definition and access helper
  2021-10-29 14:10   ` Limonciello, Mario
@ 2021-11-01  9:14     ` Huang Rui
  0 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-11-01  9:14 UTC (permalink / raw)
  To: Limonciello, Mario
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm,
	Sharma, Deepak, Deucher, Alexander, Steven Noonan, Fontenot,
	Nathan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Fri, Oct 29, 2021 at 10:10:13PM +0800, Limonciello, Mario wrote:
> On 10/29/2021 08:02, Huang Rui wrote:
> > Introduce the marco definitions and access helper function for
> 
> You've got a spelling error here.

Thanks, it should be "macro".

Ray

> 
> > amd-pstate sysfs interfaces such as each performance goals and frequency
> > levels in amd helper file. They will be used to read the sysfs attribute
> > from amd-pstate cpufreq driver for cpupower utilities.
> > 
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >   tools/power/cpupower/utils/helpers/amd.c | 37 ++++++++++++++++++++++++
> >   1 file changed, 37 insertions(+)
> > 
> > diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c
> > index 97f2c857048e..f233a6ab75ac 100644
> > --- a/tools/power/cpupower/utils/helpers/amd.c
> > +++ b/tools/power/cpupower/utils/helpers/amd.c
> > @@ -8,7 +8,9 @@
> >   #include <pci/pci.h>
> >   
> >   #include "helpers/helpers.h"
> > +#include "cpufreq.h"
> >   
> > +/* ACPI P-States Helper Functions for AMD Processors ***************/
> >   #define MSR_AMD_PSTATE_STATUS	0xc0010063
> >   #define MSR_AMD_PSTATE		0xc0010064
> >   #define MSR_AMD_PSTATE_LIMIT	0xc0010061
> > @@ -146,4 +148,39 @@ int amd_pci_get_num_boost_states(int *active, int *states)
> >   	pci_cleanup(pci_acc);
> >   	return 0;
> >   }
> > +
> > +/* ACPI P-States Helper Functions for AMD Processors ***************/
> > +
> > +/* AMD P-States Helper Functions ***************/
> > +enum amd_pstate_value {
> > +	AMD_PSTATE_HIGHEST_PERF,
> > +	AMD_PSTATE_NOMINAL_PERF,
> > +	AMD_PSTATE_LOWEST_NONLINEAR_PERF,
> > +	AMD_PSTATE_LOWEST_PERF,
> > +	AMD_PSTATE_MAX_FREQ,
> > +	AMD_PSTATE_NOMINAL_FREQ,
> > +	AMD_PSTATE_LOWEST_NONLINEAR_FREQ,
> > +	MAX_AMD_PSTATE_VALUE_READ_FILES
> > +};
> > +
> > +static const char *amd_pstate_value_files[MAX_AMD_PSTATE_VALUE_READ_FILES] = {
> > +	[AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
> > +	[AMD_PSTATE_NOMINAL_PERF] = "amd_pstate_nominal_perf",
> > +	[AMD_PSTATE_LOWEST_NONLINEAR_PERF] = "amd_pstate_lowest_nonlinear_perf",
> > +	[AMD_PSTATE_LOWEST_PERF] = "amd_pstate_lowest_perf",
> > +	[AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
> > +	[AMD_PSTATE_NOMINAL_FREQ] = "amd_pstate_nominal_freq",
> > +	[AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq",
> > +};
> > +
> > +static unsigned long amd_pstate_get_data(unsigned int cpu,
> > +					 enum amd_pstate_value value)
> > +{
> > +	return cpufreq_get_sysfs_value_from_table(cpu,
> > +						  amd_pstate_value_files,
> > +						  value,
> > +						  MAX_AMD_PSTATE_VALUE_READ_FILES);
> > +}
> > +
> > +/* AMD P-States Helper Functions ***************/
> >   #endif /* defined(__i386__) || defined(__x86_64__) */
> > 
> 
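
For readers unfamiliar with the pattern in this helper, the sketch below
shows the same idea in isolation: an enum whose last member counts the
valid entries, a table indexed by that enum, and a bounds check before the
lookup. It is not the cpupower implementation. build_attr_path() and the
shortened enum are hypothetical, and the sketch only builds the sysfs path
rather than reading the value. The attribute names are the ones from the
quoted table.

```c
#include <stddef.h>
#include <stdio.h>

/* Shortened, hypothetical version of the enum in the patch. */
enum pstate_value {
	PSTATE_HIGHEST_PERF,
	PSTATE_NOMINAL_PERF,
	PSTATE_MAX_FREQ,
	MAX_PSTATE_VALUE_FILES	/* sentinel: number of valid entries */
};

/* Designated initializers keep each name tied to its enum index. */
static const char *const pstate_value_files[MAX_PSTATE_VALUE_FILES] = {
	[PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
	[PSTATE_NOMINAL_PERF] = "amd_pstate_nominal_perf",
	[PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
};

/*
 * Build the per-CPU cpufreq sysfs path for one attribute.
 * Returns 0 on success, -1 on a bad index or a too-small buffer.
 */
static int build_attr_path(char *buf, size_t len, unsigned int cpu,
			   unsigned int which)
{
	int n;

	if (!buf || which >= MAX_PSTATE_VALUE_FILES)
		return -1;

	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/cpufreq/%s",
		     cpu, pstate_value_files[which]);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

Passing the table size alongside the index, as the quoted helper does,
lets one generic reader serve several tables while still rejecting
out-of-range indices.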

* Re: [PATCH v3 05/21] ACPI: CPPC: add cppc enable register function
  2021-10-29 14:15   ` Limonciello, Mario
@ 2021-11-01  9:20     ` Huang Rui
  0 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-11-01  9:20 UTC (permalink / raw)
  To: Limonciello, Mario
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm,
	Sharma, Deepak, Deucher, Alexander, Steven Noonan, Fontenot,
	Nathan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Fri, Oct 29, 2021 at 10:15:35PM +0800, Limonciello, Mario wrote:
> On 10/29/2021 08:02, Huang Rui wrote:
> > From: Jinzhou Su <Jinzhou.Su@amd.com>
> > 
> > Add a new function to enable CPPC feature. This function
> > will write Continuous Performance Control package
> > EnableRegister field on the processor.
> > 
> > CPPC EnableRegister register described in section 8.4.7.1 of ACPI 6.4:
> > This element is optional. If supported, contains a resource descriptor
> > with a single Register() descriptor that describes a register to which
> > OSPM writes a One to enable CPPC on this processor. Before this register
> > is set, the processor will be controlled by legacy mechanisms (ACPI
> > Pstates, firmware, etc.).
> > 
> > This register will be used for AMD processors to enable amd-pstate
> > function instead of legacy ACPI P-States.
> > 
> > Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >   drivers/acpi/cppc_acpi.c | 45 ++++++++++++++++++++++++++++++++++++++++
> >   include/acpi/cppc_acpi.h |  5 +++++
> >   2 files changed, 50 insertions(+)
> > 
> > diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> > index c9169c221209..2d2297ef5bf9 100644
> > --- a/drivers/acpi/cppc_acpi.c
> > +++ b/drivers/acpi/cppc_acpi.c
> > @@ -1275,6 +1275,51 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
> >   }
> >   EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
> >   
> > +/**
> > + * cppc_set_enable - Set to enable CPPC on the processor by writing the
> > + * Continuous Performance Control package EnableRegister feild.
> 
> s/feild/field/
> 
> > + * @cpu: CPU for which to enable CPPC register.
> > + * @enable: 0 - disable, 1 - enable CPPC feature on the processor.
> > + *
> > + * Return: 0 for success, -ERRNO or -EIO otherwise.
> > + */
> > +int cppc_set_enable(int cpu, bool enable)
> > +{
> > +	int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
> > +	struct cpc_register_resource *enable_reg;
> > +	struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
> > +	struct cppc_pcc_data *pcc_ss_data = NULL;
> > +	int ret = -EINVAL;
> > +
> > +	if (!cpc_desc) {
> > +		pr_debug("No CPC descriptor for CPU:%d\n", cpu);
> > +		return -EINVAL;
> > +	}
> 
> Can this actually happen, or is it just an extra safety check?
> Doesn't the acpi_cpc_valid check already block the driver from running
> in that case?
> 

It's a helper function in the CPPC library. Isn't it possible that, in the
future, it gets called from another driver that is not protected by
acpi_cpc_valid?

Thanks,
Ray

* Re: [PATCH v3 08/21] cpufreq: amd: add acpi cppc function as the backend for legacy processors
  2021-10-29 13:02 ` [PATCH v3 08/21] cpufreq: amd: add acpi cppc function as the backend for legacy processors Huang Rui
  2021-10-29 14:20   ` Limonciello, Mario
@ 2021-11-02 18:46   ` Nathan Fontenot
  2021-11-03 12:00     ` Huang Rui
  1 sibling, 1 reply; 50+ messages in thread
From: Nathan Fontenot @ 2021-11-02 18:46 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86

On 10/29/21 8:02 AM, Huang Rui wrote:
> In some old Zen based processors, they are using the shared memory that
> exposed from ACPI SBIOS.

With this you present two different approaches for support in the driver,
MSRs and shared memory. For processors using shared memory you go through
the shared memory defined in the ACPI tables, but for the MSR-capable
processors you access the MSRs directly.

Is there any concern that the MSR registers (defined in patch 2/21) can
differ from what is defined in the ACPI tables?

Should you use the drivers/acpi interfaces for MSRs also?

-Nathan
 
> 
> Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 58 ++++++++++++++++++++++++++++++++----
>  1 file changed, 53 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 55ff03f85608..d399938d6d85 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -73,6 +73,19 @@ static inline int pstate_enable(bool enable)
>  	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
>  }
>  
> +static int cppc_enable(bool enable)
> +{
> +	int cpu, ret = 0;
> +
> +	for_each_online_cpu(cpu) {
> +		ret = cppc_set_enable(cpu, enable ? 1 : 0);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return ret;
> +}
> +
>  DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
>  
>  static inline int amd_pstate_enable(bool enable)
> @@ -103,6 +116,24 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
>  	return 0;
>  }
>  
> +static int cppc_init_perf(struct amd_cpudata *cpudata)
> +{
> +	struct cppc_perf_caps cppc_perf;
> +
> +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> +	if (ret)
> +		return ret;
> +
> +	WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
> +
> +	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
> +	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
> +		   cppc_perf.lowest_nonlinear_perf);
> +	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
> +
> +	return 0;
> +}
> +
>  DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
>  
>  static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
> @@ -120,6 +151,19 @@ static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
>  			      READ_ONCE(cpudata->cppc_req_cached));
>  }
>  
> +static void cppc_update_perf(struct amd_cpudata *cpudata,
> +			     u32 min_perf, u32 des_perf,
> +			     u32 max_perf, bool fast_switch)
> +{
> +	struct cppc_perf_ctrls perf_ctrls;
> +
> +	perf_ctrls.max_perf = max_perf;
> +	perf_ctrls.min_perf = min_perf;
> +	perf_ctrls.desired_perf = des_perf;
> +
> +	cppc_set_perf(cpudata->cpu, &perf_ctrls);
> +}
> +
>  DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
>  
>  static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
> @@ -346,7 +390,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>  	/* It will be updated by governor */
>  	policy->cur = policy->cpuinfo.min_freq;
>  
> -	policy->fast_switch_possible = true;
> +	if (boot_cpu_has(X86_FEATURE_AMD_CPPC))
> +		policy->fast_switch_possible = true;
>  
>  	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
>  				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
> @@ -397,7 +442,6 @@ static struct cpufreq_driver amd_pstate_driver = {
>  	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
>  	.verify		= amd_pstate_verify,
>  	.target		= amd_pstate_target,
> -	.adjust_perf    = amd_pstate_adjust_perf,
>  	.init		= amd_pstate_cpu_init,
>  	.exit		= amd_pstate_cpu_exit,
>  	.name		= "amd-pstate",
> @@ -421,10 +465,14 @@ static int __init amd_pstate_init(void)
>  		return -EEXIST;
>  
>  	/* capability check */
> -	if (!boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
> -		pr_debug("%s, AMD CPPC MSR based functionality is not supported\n",
> +	if (boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
> +		pr_debug("%s, AMD CPPC MSR based functionality is supported\n",
>  			 __func__);
> -		return -ENODEV;
> +		amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
> +	} else {
> +		static_call_update(amd_pstate_enable, cppc_enable);
> +		static_call_update(amd_pstate_init_perf, cppc_init_perf);
> +		static_call_update(amd_pstate_update_perf, cppc_update_perf);
>  	}
>  
>  	/* enable amd pstate feature */
> 

* Re: [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors
  2021-10-29 13:02 ` [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors Huang Rui
@ 2021-11-02 18:52   ` Limonciello, Mario
  2021-11-02 19:38   ` Nathan Fontenot
  1 sibling, 0 replies; 50+ messages in thread
From: Limonciello, Mario @ 2021-11-02 18:52 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Steven Noonan, Nathan Fontenot,
	Jinzhou Su, Xiaojian Du, linux-kernel, x86

On 10/29/2021 08:02, Huang Rui wrote:
> amd-pstate is the AMD CPU performance scaling driver that introduces a
> new CPU frequency control mechanism on AMD Zen based CPU series in Linux
> kernel. The new mechanism is based on Collaborative processor
> performance control (CPPC) which is finer grain frequency management
> than legacy ACPI hardware P-States. Current AMD CPU platforms are using
> the ACPI P-states driver to manage CPU frequency and clocks with
> switching only in 3 P-states. AMD P-States is to replace the ACPI
> P-states controls, allows a flexible, low-latency interface for the
> Linux kernel to directly communicate the performance hints to hardware.
> 
> "amd-pstate" leverages the Linux kernel governors such as *schedutil*,
> *ondemand*, etc. to manage the performance hints which are provided by CPPC
> hardware functionality. The first version for amd-pstate is to support one
> of the Zen3 processors, and we will support more in future after we verify
> the hardware and SBIOS functionalities.
> 
> There are two types of hardware implementations for amd-pstate: one is full
> MSR support and another is shared memory support. It can use
> X86_FEATURE_AMD_CPPC_EXT feature flag to distinguish the different types.
> 
> Using the new AMD P-States method + kernel governors (*schedutil*,
> *ondemand*, ...) to manage the frequency update is the most appropriate
> bridge between AMD Zen based hardware processor and Linux kernel, the
> processor is able to ajust to the most efficiency frequency according to
> the kernel scheduler loading.
> 
> Performance Per Watt (PPW) Caculation:
> 
> The PPW caculation is referred by below paper:
> https://software.intel.com/content/dam/develop/external/us/en/documents/performance-per-what-paper.pdf
> 
> Below formula is referred from below spec to measure the PPW:
> 
> (F / t) / P = F * t / (t * E) = F / E,
> 
> "F" is the number of frames per second.
> "P" is power measurd in watts.
> "E" is energy measured in joules.
> 
> We use the RAPL interface with "perf" tool to get the energy data of the
> package power.
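
As a sanity check on the formula above, the identity can be written out as
a few lines of C. This is only a sketch: perf_per_watt() is a hypothetical
helper, it treats "F" as the total amount of work completed over the run
(so that F / t is the rate), and the numbers in the note below are made up
rather than taken from the measurements in this patch.

```c
/*
 * Performance per watt from total work, elapsed time and energy.
 * Returns 0.0 for non-positive time or energy.
 */
static double perf_per_watt(double work, double seconds, double joules)
{
	double rate, watts;

	if (seconds <= 0.0 || joules <= 0.0)
		return 0.0;

	rate = work / seconds;		/* F / t */
	watts = joules / seconds;	/* P = E / t */

	return rate / watts;		/* (F / t) / P, i.e. F / E */
}
```

For example, 600 units of work in 10 s at 150 J of package energy gives a
rate of 60 per second at 15 W, so 4 units of work per joule. The elapsed
time cancels out, which is why the energy reading alone is enough.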
> 
> The data comparsions between amd-pstate and acpi-freq module are tested on
> AMD Cezanne processor:
> 
> 1) TBench CPU benchmark:
> 
> +---------------------------------------------------------------------+
> |                                                                     |
> |               TBench (Performance Per Watt)                         |
> |                                                    Higher is better |
> +-------------------+------------------------+------------------------+
> |                   |  Performance Per Watt  |  Performance Per Watt  |
> |   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
> |                   |  Unit: MB / (s * J)    |  Unit: MB / (s * J)    |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |    acpi-cpufreq   |         3.022          |        2.969           |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |     amd-pstate    |         3.131          |        3.284           |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> 
> 2) Gitsource CPU benchmark:
> 
> +---------------------------------------------------------------------+
> |                                                                     |
> |               Gitsource (Performance Per Watt)                      |
> |                                                    Higher is better |
> +-------------------+------------------------+------------------------+
> |                   |  Performance Per Watt  |  Performance Per Watt  |
> |   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
> |                   |  Unit: 1 / (s * J)     |  Unit: 1 / (s * J)     |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |    acpi-cpufreq   |     3.42172E-07        |     2.74508E-07        |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |     amd-pstate    |     4.09141E-07        |     3.47610E-07        |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> 
> 3) Speedometer 2.0 CPU benchmark:
> 
> +---------------------------------------------------------------------+
> |                                                                     |
> |               Speedometer 2.0 (Performance Per Watt)                |
> |                                                    Higher is better |
> +-------------------+------------------------+------------------------+
> |                   |  Performance Per Watt  |  Performance Per Watt  |
> |   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
> |                   |  Unit: 1 / (s * J)     |  Unit: 1 / (s * J)     |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |    acpi-cpufreq   |      0.116111767       |      0.110321664       |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |     amd-pstate    |      0.115825281       |      0.122024299       |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> 
> According to above average data, we can see this solution has shown better
> performance per watt scaling on mobile CPU benchmarks in most of cases.
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>   drivers/cpufreq/Kconfig.x86  |  17 ++
>   drivers/cpufreq/Makefile     |   1 +
>   drivers/cpufreq/amd-pstate.c | 413 +++++++++++++++++++++++++++++++++++
>   3 files changed, 431 insertions(+)
>   create mode 100644 drivers/cpufreq/amd-pstate.c
> 
> diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
> index 92701a18bdd9..2e798b2c0bdb 100644
> --- a/drivers/cpufreq/Kconfig.x86
> +++ b/drivers/cpufreq/Kconfig.x86
> @@ -34,6 +34,23 @@ config X86_PCC_CPUFREQ
>   
>   	  If in doubt, say N.
>   
> +config X86_AMD_PSTATE
> +	bool "AMD Processor P-State driver"
> +	depends on X86
> +	select ACPI_PROCESSOR if ACPI
> +	select ACPI_CPPC_LIB if X86_64 && ACPI && SCHED_MC_PRIO
> +	select CPU_FREQ_GOV_SCHEDUTIL if SMP
> +	help
> +	  This driver adds a CPUFreq driver which utilizes a fine grain
> +	  processor performance freqency control range instead of legacy
> +	  performance levels. This driver supports the AMD processors with
> +	  _CPC object in the SBIOS.
> +
> +	  For details, take a look at:
> +	  <file:Documentation/admin-guide/pm/amd-pstate.rst>.
> +
> +	  If in doubt, say N.
> +
>   config X86_ACPI_CPUFREQ
>   	tristate "ACPI Processor P-States driver"
>   	depends on ACPI_PROCESSOR
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index 48ee5859030c..c8d307010922 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -25,6 +25,7 @@ obj-$(CONFIG_CPUFREQ_DT_PLATDEV)	+= cpufreq-dt-platdev.o
>   # speedstep-* is preferred over p4-clockmod.
>   
>   obj-$(CONFIG_X86_ACPI_CPUFREQ)		+= acpi-cpufreq.o
> +obj-$(CONFIG_X86_AMD_PSTATE)		+= amd-pstate.o
>   obj-$(CONFIG_X86_POWERNOW_K8)		+= powernow-k8.o
>   obj-$(CONFIG_X86_PCC_CPUFREQ)		+= pcc-cpufreq.o
>   obj-$(CONFIG_X86_POWERNOW_K6)		+= powernow-k6.o
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> new file mode 100644
> index 000000000000..a400861c7fdc
> --- /dev/null
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -0,0 +1,413 @@
> +/*
> + * amd-pstate.c - AMD Processor P-state Frequency Driver
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> + *
> + * Author: Huang Rui <ray.huang@amd.com>
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/smp.h>
> +#include <linux/sched.h>
> +#include <linux/cpufreq.h>
> +#include <linux/compiler.h>
> +#include <linux/dmi.h>
> +#include <linux/slab.h>
> +#include <linux/acpi.h>
> +#include <linux/io.h>
> +#include <linux/delay.h>
> +#include <linux/uaccess.h>
> +#include <linux/static_call.h>
> +
> +#include <acpi/processor.h>
> +#include <acpi/cppc_acpi.h>
> +
> +#include <asm/msr.h>
> +#include <asm/processor.h>
> +#include <asm/cpufeature.h>
> +#include <asm/cpu_device_id.h>
> +
> +#define AMD_PSTATE_TRANSITION_LATENCY	0x20000
> +#define AMD_PSTATE_TRANSITION_DELAY	500
> +
> +static struct cpufreq_driver amd_pstate_driver;
> +
> +struct amd_cpudata {
> +	int	cpu;
> +
> +	struct freq_qos_request req[2];
> +
> +	u64	cppc_req_cached;
> +
> +	u32	highest_perf;
> +	u32	nominal_perf;
> +	u32	lowest_nonlinear_perf;
> +	u32	lowest_perf;
> +
> +	u32	max_freq;
> +	u32	min_freq;
> +	u32	nominal_freq;
> +	u32	lowest_nonlinear_freq;
> +};
> +
> +static inline int pstate_enable(bool enable)
> +{
> +	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
> +}
> +
> +DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
> +
> +static inline int amd_pstate_enable(bool enable)
> +{
> +	return static_call(amd_pstate_enable)(enable);
> +}
> +
> +static int pstate_init_perf(struct amd_cpudata *cpudata)
> +{
> +	u64 cap1;
> +
> +	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
> +				     &cap1);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * TODO: Introduce AMD specific power feature.
> +	 *
> +	 * CPPC entry doesn't indicate the highest performance in some ASICs.
> +	 */
> +	WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());

Wouldn't it be better to default to CAP1_HIGHEST_PERF(cap1) and quirk 
the ones that don't work instead?
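
To make the suggestion concrete, here is a minimal userspace sketch of that
ordering: trust the CAP1 field by default and fall back only for parts known
to misreport it. The bit layout of CAP1_HIGHEST_PERF, the
cap1_unreliable flag, and the FALLBACK_HIGHEST_PERF value are assumptions for
illustration only; they are not taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: highest perf sits in bits 31:24 of the CAP1 value. */
#define CAP1_HIGHEST_PERF(x)	(((x) >> 24) & 0xff)

/* Arbitrary illustrative value standing in for amd_get_highest_perf(). */
#define FALLBACK_HIGHEST_PERF	166

/*
 * Default to the value the hardware reports in CAP1; use the fallback
 * only when the caller has flagged this part as quirky. How a part
 * gets flagged (model match, DMI, ...) is left open here.
 */
static uint32_t pick_highest_perf(uint64_t cap1, bool cap1_unreliable)
{
	if (cap1_unreliable)
		return FALLBACK_HIGHEST_PERF;

	return (uint32_t)CAP1_HIGHEST_PERF(cap1);
}
```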

> +
> +	WRITE_ONCE(cpudata->nominal_perf, CAP1_NOMINAL_PERF(cap1));
> +	WRITE_ONCE(cpudata->lowest_nonlinear_perf, CAP1_LOWNONLIN_PERF(cap1));
> +	WRITE_ONCE(cpudata->lowest_perf, CAP1_LOWEST_PERF(cap1));
> +
> +	return 0;
> +}
> +
> +DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
> +
> +static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
> +{
> +	return static_call(amd_pstate_init_perf)(cpudata);
> +}
> +
> +static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
> +			       u32 des_perf, u32 max_perf, bool fast_switch)
> +{
> +	if (fast_switch)
> +		wrmsrl(MSR_AMD_CPPC_REQ, READ_ONCE(cpudata->cppc_req_cached));
> +	else
> +		wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
> +			      READ_ONCE(cpudata->cppc_req_cached));
> +}
> +
> +DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
> +
> +static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
> +					  u32 min_perf, u32 des_perf,
> +					  u32 max_perf, bool fast_switch)
> +{
> +	static_call(amd_pstate_update_perf)(cpudata, min_perf, des_perf,
> +					    max_perf, fast_switch);
> +}
> +
> +static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
> +			      u32 des_perf, u32 max_perf, bool fast_switch)
> +{
> +	u64 prev = READ_ONCE(cpudata->cppc_req_cached);
> +	u64 value = prev;
> +
> +	value &= ~REQ_MIN_PERF(~0L);
> +	value |= REQ_MIN_PERF(min_perf);
> +
> +	value &= ~REQ_DES_PERF(~0L);
> +	value |= REQ_DES_PERF(des_perf);
> +
> +	value &= ~REQ_MAX_PERF(~0L);
> +	value |= REQ_MAX_PERF(max_perf);
> +
> +	if (value == prev)
> +		return;
> +
> +	WRITE_ONCE(cpudata->cppc_req_cached, value);
> +
> +	amd_pstate_update_perf(cpudata, min_perf, des_perf,
> +			       max_perf, fast_switch);
> +}
> +
> +static int amd_pstate_verify(struct cpufreq_policy_data *policy)
> +{
> +	cpufreq_verify_within_cpu_limits(policy);
> +
> +	return 0;
> +}
> +
> +static int amd_pstate_target(struct cpufreq_policy *policy,
> +			     unsigned int target_freq,
> +			     unsigned int relation)
> +{
> +	struct cpufreq_freqs freqs;
> +	struct amd_cpudata *cpudata = policy->driver_data;
> +	unsigned long amd_max_perf, amd_min_perf, amd_des_perf,
> +		      amd_cap_perf;
> +
> +	if (!cpudata->max_freq)
> +		return -ENODEV;
> +
> +	amd_cap_perf = READ_ONCE(cpudata->highest_perf);
> +	amd_min_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
> +	amd_max_perf = amd_cap_perf;
> +
> +	freqs.old = policy->cur;
> +	freqs.new = target_freq;
> +
> +	amd_des_perf = DIV_ROUND_CLOSEST(target_freq * amd_cap_perf,
> +					 cpudata->max_freq);
> +
> +	cpufreq_freq_transition_begin(policy, &freqs);
> +	amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
> +			  amd_max_perf, false);
> +	cpufreq_freq_transition_end(policy, &freqs, false);
> +
> +	return 0;
> +}
> +
> +static int amd_get_min_freq(struct amd_cpudata *cpudata)
> +{
> +	struct cppc_perf_caps cppc_perf;
> +
> +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> +	if (ret)
> +		return ret;
> +
> +	/* Switch to khz */
> +	return cppc_perf.lowest_freq * 1000;
> +}
> +
> +static int amd_get_max_freq(struct amd_cpudata *cpudata)
> +{
> +	struct cppc_perf_caps cppc_perf;
> +	u32 max_perf, max_freq, nominal_freq, nominal_perf;
> +	u64 boost_ratio;
> +
> +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> +	if (ret)
> +		return ret;
> +
> +	nominal_freq = cppc_perf.nominal_freq;
> +	nominal_perf = READ_ONCE(cpudata->nominal_perf);
> +	max_perf = READ_ONCE(cpudata->highest_perf);
> +
> +	boost_ratio = div_u64(max_perf << SCHED_CAPACITY_SHIFT,
> +			      nominal_perf);
> +
> +	max_freq = nominal_freq * boost_ratio >> SCHED_CAPACITY_SHIFT;
> +
> +	/* Switch to khz */
> +	return max_freq * 1000;
> +}
> +
> +static int amd_get_nominal_freq(struct amd_cpudata *cpudata)
> +{
> +	struct cppc_perf_caps cppc_perf;
> +	u32 nominal_freq;
> +
> +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> +	if (ret)
> +		return ret;
> +
> +	nominal_freq = cppc_perf.nominal_freq;
> +
> +	/* Switch to khz */
> +	return nominal_freq * 1000;
> +}
> +
> +static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata)
> +{
> +	struct cppc_perf_caps cppc_perf;
> +	u32 lowest_nonlinear_freq, lowest_nonlinear_perf,
> +	    nominal_freq, nominal_perf;
> +	u64 lowest_nonlinear_ratio;
> +
> +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> +	if (ret)
> +		return ret;
> +
> +	nominal_freq = cppc_perf.nominal_freq;
> +	nominal_perf = READ_ONCE(cpudata->nominal_perf);
> +
> +	lowest_nonlinear_perf = cppc_perf.lowest_nonlinear_perf;
> +
> +	lowest_nonlinear_ratio = div_u64(lowest_nonlinear_perf <<
> +					 SCHED_CAPACITY_SHIFT, nominal_perf);
> +
> +	lowest_nonlinear_freq = nominal_freq * lowest_nonlinear_ratio >> SCHED_CAPACITY_SHIFT;
> +
> +	/* Switch to khz */
> +	return lowest_nonlinear_freq * 1000;
> +}
> +
> +static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
> +{
> +	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> +	unsigned int cpu = policy->cpu;
> +	struct device *dev;
> +	struct amd_cpudata *cpudata;
> +
> +	dev = get_cpu_device(policy->cpu);
> +	if (!dev)
> +		return -ENODEV;
> +
> +	cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
> +	if (!cpudata)
> +		return -ENOMEM;
> +
> +	cpudata->cpu = cpu;
> +
> +	ret = amd_pstate_init_perf(cpudata);
> +	if (ret)
> +		goto free_cpudata1;
> +
> +	min_freq = amd_get_min_freq(cpudata);
> +	max_freq = amd_get_max_freq(cpudata);
> +	nominal_freq = amd_get_nominal_freq(cpudata);
> +	lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
> +
> +	if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
> +		dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
> +			min_freq, max_freq);
> +		ret = -EINVAL;
> +		goto free_cpudata1;
> +	}
> +
> +	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
> +	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
> +
> +	policy->min = min_freq;
> +	policy->max = max_freq;
> +
> +	policy->cpuinfo.min_freq = min_freq;
> +	policy->cpuinfo.max_freq = max_freq;
> +
> +	/* It will be updated by governor */
> +	policy->cur = policy->cpuinfo.min_freq;
> +
> +	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
> +				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret);
> +		goto free_cpudata1;
> +	}
> +
> +	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[1],
> +				   FREQ_QOS_MAX, policy->cpuinfo.max_freq);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret);
> +		goto free_cpudata2;
> +	}
> +
> +	/* Initial processor data capability frequencies */
> +	cpudata->max_freq = max_freq;
> +	cpudata->min_freq = min_freq;
> +	cpudata->nominal_freq = nominal_freq;
> +	cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
> +
> +	policy->driver_data = cpudata;
> +
> +	return 0;
> +
> +	freq_qos_remove_request(&cpudata->req[1]);
> +free_cpudata2:
> +	freq_qos_remove_request(&cpudata->req[0]);
> +free_cpudata1:
> +	kfree(cpudata);
> +	return ret;
> +}
> +
> +static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
> +{
> +	struct amd_cpudata *cpudata;
> +
> +	cpudata = policy->driver_data;
> +
> +	freq_qos_remove_request(&cpudata->req[1]);
> +	freq_qos_remove_request(&cpudata->req[0]);
> +	kfree(cpudata);
> +
> +	return 0;
> +}
> +
> +static struct cpufreq_driver amd_pstate_driver = {
> +	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
> +	.verify		= amd_pstate_verify,
> +	.target		= amd_pstate_target,
> +	.init		= amd_pstate_cpu_init,
> +	.exit		= amd_pstate_cpu_exit,
> +	.name		= "amd-pstate",
> +};
> +
> +static int __init amd_pstate_init(void)
> +{
> +	int ret;
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> +		return -ENODEV;
> +
> +	if (!acpi_cpc_valid()) {
> +		pr_debug("%s, the _CPC object is not present in SBIOS\n",
> +			 __func__);
> +		return -ENODEV;
> +	}
> +
> +	/* don't keep reloading if cpufreq_driver exists */
> +	if (cpufreq_get_current_driver())
> +		return -EEXIST;
> +
> +	/* capability check */
> +	if (!boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
> +		pr_debug("%s, AMD CPPC MSR based functionality is not supported\n",
> +			 __func__);
> +		return -ENODEV;
> +	}
> +
> +	/* enable amd pstate feature */
> +	ret = amd_pstate_enable(true);
> +	if (ret) {
> +		pr_err("%s, failed to enable amd-pstate with return %d\n",
> +		       __func__, ret);
> +		return ret;
> +	}
> +
> +	ret = cpufreq_register_driver(&amd_pstate_driver);
> +	if (ret) {
> +		pr_err("%s, return %d\n", __func__, ret);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +device_initcall(amd_pstate_init);
> +
> +MODULE_AUTHOR("Huang Rui <ray.huang@amd.com>");
> +MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");
> +MODULE_LICENSE("GPL");
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors
  2021-10-29 13:02 ` [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors Huang Rui
  2021-11-02 18:52   ` Limonciello, Mario
@ 2021-11-02 19:38   ` Nathan Fontenot
  2021-11-03  7:01     ` Huang Rui
  1 sibling, 1 reply; 50+ messages in thread
From: Nathan Fontenot @ 2021-11-02 19:38 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86

On 10/29/21 8:02 AM, Huang Rui wrote:
> amd-pstate is the AMD CPU performance scaling driver that introduces a
> new CPU frequency control mechanism on AMD Zen based CPU series in Linux
> kernel. The new mechanism is based on Collaborative processor
> performance control (CPPC) which is finer grain frequency management
> than legacy ACPI hardware P-States. Current AMD CPU platforms are using
> the ACPI P-states driver to manage CPU frequency and clocks with
> switching only in 3 P-states. AMD P-States is to replace the ACPI
> P-states controls, allows a flexible, low-latency interface for the
> Linux kernel to directly communicate the performance hints to hardware.
> 
> "amd-pstate" leverages the Linux kernel governors such as *schedutil*,
> *ondemand*, etc. to manage the performance hints which are provided by CPPC
> hardware functionality. The first version for amd-pstate is to support one
> of the Zen3 processors, and we will support more in future after we verify
> the hardware and SBIOS functionalities.
> 
> There are two types of hardware implementations for amd-pstate: one is full
> MSR support and another is shared memory support. It can use
> X86_FEATURE_AMD_CPPC_EXT feature flag to distinguish the different types.
> 
> Using the new AMD P-States method + kernel governors (*schedutil*,
> *ondemand*, ...) to manage the frequency update is the most appropriate
> bridge between AMD Zen based hardware processor and Linux kernel, the
> processor is able to ajust to the most efficiency frequency according to

s/ajust/adjust/

> the kernel scheduler loading.
> 
> Performance Per Watt (PPW) Caculation:
> 
> The PPW caculation is referred by below paper:
> https://software.intel.com/content/dam/develop/external/us/en/documents/performance-per-what-paper.pdf
> 
> Below formula is referred from below spec to measure the PPW:
> 
> (F / t) / P = F * t / (t * E) = F / E,
> 
> "F" is the number of frames per second.
> "P" is power measurd in watts.

s/measurd/measured/

> "E" is energy measured in joules.
> 
> We use the RAPL interface with "perf" tool to get the energy data of the
> package power.
> 
> The data comparsions between amd-pstate and acpi-freq module are tested on

s/comparsions/comparisons/

> AMD Cezanne processor:
> 
> 1) TBench CPU benchmark:
> 
> +---------------------------------------------------------------------+
> |                                                                     |
> |               TBench (Performance Per Watt)                         |
> |                                                    Higher is better |
> +-------------------+------------------------+------------------------+
> |                   |  Performance Per Watt  |  Performance Per Watt  |
> |   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
> |                   |  Unit: MB / (s * J)    |  Unit: MB / (s * J)    |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |    acpi-cpufreq   |         3.022          |        2.969           |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |     amd-pstate    |         3.131          |        3.284           |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> 
> 2) Gitsource CPU benchmark:
> 
> +---------------------------------------------------------------------+
> |                                                                     |
> |               Gitsource (Performance Per Watt)                      |
> |                                                    Higher is better |
> +-------------------+------------------------+------------------------+
> |                   |  Performance Per Watt  |  Performance Per Watt  |
> |   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
> |                   |  Unit: 1 / (s * J)     |  Unit: 1 / (s * J)     |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |    acpi-cpufreq   |     3.42172E-07        |     2.74508E-07        |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |     amd-pstate    |     4.09141E-07        |     3.47610E-07        |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> 
> 3) Speedometer 2.0 CPU benchmark:
> 
> +---------------------------------------------------------------------+
> |                                                                     |
> |               Speedometer 2.0 (Performance Per Watt)                |
> |                                                    Higher is better |
> +-------------------+------------------------+------------------------+
> |                   |  Performance Per Watt  |  Performance Per Watt  |
> |   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
> |                   |  Unit: 1 / (s * J)     |  Unit: 1 / (s * J)     |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |    acpi-cpufreq   |      0.116111767       |      0.110321664       |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> |                   |                        |                        |
> |     amd-pstate    |      0.115825281       |      0.122024299       |
> |                   |                        |                        |
> +-------------------+------------------------+------------------------+
> 
> According to above average data, we can see this solution has shown better
> performance per watt scaling on mobile CPU benchmarks in most of cases.
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  drivers/cpufreq/Kconfig.x86  |  17 ++
>  drivers/cpufreq/Makefile     |   1 +
>  drivers/cpufreq/amd-pstate.c | 413 +++++++++++++++++++++++++++++++++++
>  3 files changed, 431 insertions(+)
>  create mode 100644 drivers/cpufreq/amd-pstate.c
> 
> diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
> index 92701a18bdd9..2e798b2c0bdb 100644
> --- a/drivers/cpufreq/Kconfig.x86
> +++ b/drivers/cpufreq/Kconfig.x86
> @@ -34,6 +34,23 @@ config X86_PCC_CPUFREQ
>  
>  	  If in doubt, say N.
>  
> +config X86_AMD_PSTATE
> +	bool "AMD Processor P-State driver"
> +	depends on X86
> +	select ACPI_PROCESSOR if ACPI
> +	select ACPI_CPPC_LIB if X86_64 && ACPI && SCHED_MC_PRIO
> +	select CPU_FREQ_GOV_SCHEDUTIL if SMP
> +	help
> +	  This driver adds a CPUFreq driver which utilizes a fine grain
> +	  processor performance freqency control range instead of legacy

s/freqency/frequency/

> +	  performance levels. This driver supports the AMD processors with
> +	  _CPC object in the SBIOS.
> +
> +	  For details, take a look at:
> +	  <file:Documentation/admin-guide/pm/amd-pstate.rst>.
> +
> +	  If in doubt, say N.
> +
>  config X86_ACPI_CPUFREQ
>  	tristate "ACPI Processor P-States driver"
>  	depends on ACPI_PROCESSOR
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index 48ee5859030c..c8d307010922 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -25,6 +25,7 @@ obj-$(CONFIG_CPUFREQ_DT_PLATDEV)	+= cpufreq-dt-platdev.o
>  # speedstep-* is preferred over p4-clockmod.
>  
>  obj-$(CONFIG_X86_ACPI_CPUFREQ)		+= acpi-cpufreq.o
> +obj-$(CONFIG_X86_AMD_PSTATE)		+= amd-pstate.o
>  obj-$(CONFIG_X86_POWERNOW_K8)		+= powernow-k8.o
>  obj-$(CONFIG_X86_PCC_CPUFREQ)		+= pcc-cpufreq.o
>  obj-$(CONFIG_X86_POWERNOW_K6)		+= powernow-k6.o
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> new file mode 100644
> index 000000000000..a400861c7fdc
> --- /dev/null
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -0,0 +1,413 @@
> +/*
> + * amd-pstate.c - AMD Processor P-state Frequency Driver
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

You should use an SPDX license identifier instead of copying the GPL text
into the file. See Documentation/process/license-rules.rst.
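
As a sketch of what that could look like for this file: the boilerplate above
says "either version 2 of the License, or (at your option) any later version",
which I am assuming maps to the GPL-2.0-or-later identifier. The rest of the
header is kept from the patch.

```c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * amd-pstate.c - AMD Processor P-state Frequency Driver
 *
 * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
 *
 * Author: Huang Rui <ray.huang@amd.com>
 */
```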
 
> + *
> + * Author: Huang Rui <ray.huang@amd.com>
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/smp.h>
> +#include <linux/sched.h>
> +#include <linux/cpufreq.h>
> +#include <linux/compiler.h>
> +#include <linux/dmi.h>
> +#include <linux/slab.h>
> +#include <linux/acpi.h>
> +#include <linux/io.h>
> +#include <linux/delay.h>
> +#include <linux/uaccess.h>
> +#include <linux/static_call.h>
> +
> +#include <acpi/processor.h>
> +#include <acpi/cppc_acpi.h>
> +
> +#include <asm/msr.h>
> +#include <asm/processor.h>
> +#include <asm/cpufeature.h>
> +#include <asm/cpu_device_id.h>
> +
> +#define AMD_PSTATE_TRANSITION_LATENCY	0x20000
> +#define AMD_PSTATE_TRANSITION_DELAY	500
> +
> +static struct cpufreq_driver amd_pstate_driver;
> +
> +struct amd_cpudata {
> +	int	cpu;
> +
> +	struct freq_qos_request req[2];
> +
> +	u64	cppc_req_cached;
> +
> +	u32	highest_perf;
> +	u32	nominal_perf;
> +	u32	lowest_nonlinear_perf;
> +	u32	lowest_perf;
> +
> +	u32	max_freq;
> +	u32	min_freq;
> +	u32	nominal_freq;
> +	u32	lowest_nonlinear_freq;
> +};
> +
> +static inline int pstate_enable(bool enable)
> +{
> +	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
> +}
> +
> +DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
> +
> +static inline int amd_pstate_enable(bool enable)
> +{
> +	return static_call(amd_pstate_enable)(enable);
> +}
> +
> +static int pstate_init_perf(struct amd_cpudata *cpudata)
> +{
> +	u64 cap1;
> +
> +	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
> +				     &cap1);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * TODO: Introduce AMD specific power feature.
> +	 *
> +	 * CPPC entry doesn't indicate the highest performance in some ASICs.
> +	 */
> +	WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
> +
> +	WRITE_ONCE(cpudata->nominal_perf, CAP1_NOMINAL_PERF(cap1));
> +	WRITE_ONCE(cpudata->lowest_nonlinear_perf, CAP1_LOWNONLIN_PERF(cap1));
> +	WRITE_ONCE(cpudata->lowest_perf, CAP1_LOWEST_PERF(cap1));
> +
> +	return 0;
> +}
> +
> +DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
> +
> +static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
> +{
> +	return static_call(amd_pstate_init_perf)(cpudata);
> +}
> +
> +static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
> +			       u32 des_perf, u32 max_perf, bool fast_switch)
> +{
> +	if (fast_switch)
> +		wrmsrl(MSR_AMD_CPPC_REQ, READ_ONCE(cpudata->cppc_req_cached));
> +	else
> +		wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
> +			      READ_ONCE(cpudata->cppc_req_cached));
> +}
> +
> +DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
> +
> +static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
> +					  u32 min_perf, u32 des_perf,
> +					  u32 max_perf, bool fast_switch)
> +{
> +	static_call(amd_pstate_update_perf)(cpudata, min_perf, des_perf,
> +					    max_perf, fast_switch);
> +}
> +
> +static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
> +			      u32 des_perf, u32 max_perf, bool fast_switch)
> +{
> +	u64 prev = READ_ONCE(cpudata->cppc_req_cached);
> +	u64 value = prev;
> +
> +	value &= ~REQ_MIN_PERF(~0L);
> +	value |= REQ_MIN_PERF(min_perf);
> +
> +	value &= ~REQ_DES_PERF(~0L);
> +	value |= REQ_DES_PERF(des_perf);
> +
> +	value &= ~REQ_MAX_PERF(~0L);
> +	value |= REQ_MAX_PERF(max_perf);
> +
> +	if (value == prev)
> +		return;
> +
> +	WRITE_ONCE(cpudata->cppc_req_cached, value);
> +
> +	amd_pstate_update_perf(cpudata, min_perf, des_perf,
> +			       max_perf, fast_switch);
> +}
> +
> +static int amd_pstate_verify(struct cpufreq_policy_data *policy)
> +{
> +	cpufreq_verify_within_cpu_limits(policy);
> +
> +	return 0;
> +}
> +
> +static int amd_pstate_target(struct cpufreq_policy *policy,
> +			     unsigned int target_freq,
> +			     unsigned int relation)
> +{
> +	struct cpufreq_freqs freqs;
> +	struct amd_cpudata *cpudata = policy->driver_data;
> +	unsigned long amd_max_perf, amd_min_perf, amd_des_perf,
> +		      amd_cap_perf;

Is there a need to prefix each of the variables with amd_? This is
not something you do in any of the other routines.

> +
> +	if (!cpudata->max_freq)
> +		return -ENODEV;
> +
> +	amd_cap_perf = READ_ONCE(cpudata->highest_perf);
> +	amd_min_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
> +	amd_max_perf = amd_cap_perf;
> +
> +	freqs.old = policy->cur;
> +	freqs.new = target_freq;
> +
> +	amd_des_perf = DIV_ROUND_CLOSEST(target_freq * amd_cap_perf,
> +					 cpudata->max_freq);
> +
> +	cpufreq_freq_transition_begin(policy, &freqs);
> +	amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
> +			  amd_max_perf, false);
> +	cpufreq_freq_transition_end(policy, &freqs, false);
> +
> +	return 0;
> +}
> +
> +static int amd_get_min_freq(struct amd_cpudata *cpudata)
> +{
> +	struct cppc_perf_caps cppc_perf;
> +
> +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> +	if (ret)
> +		return ret;
> +
> +	/* Switch to khz */
> +	return cppc_perf.lowest_freq * 1000;
> +}
> +
> +static int amd_get_max_freq(struct amd_cpudata *cpudata)
> +{
> +	struct cppc_perf_caps cppc_perf;
> +	u32 max_perf, max_freq, nominal_freq, nominal_perf;
> +	u64 boost_ratio;
> +
> +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> +	if (ret)
> +		return ret;
> +
> +	nominal_freq = cppc_perf.nominal_freq;
> +	nominal_perf = READ_ONCE(cpudata->nominal_perf);
> +	max_perf = READ_ONCE(cpudata->highest_perf);
> +
> +	boost_ratio = div_u64(max_perf << SCHED_CAPACITY_SHIFT,
> +			      nominal_perf);
> +
> +	max_freq = nominal_freq * boost_ratio >> SCHED_CAPACITY_SHIFT;
> +
> +	/* Switch to khz */
> +	return max_freq * 1000;
> +}
> +
> +static int amd_get_nominal_freq(struct amd_cpudata *cpudata)
> +{
> +	struct cppc_perf_caps cppc_perf;
> +	u32 nominal_freq;
> +
> +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> +	if (ret)
> +		return ret;
> +
> +	nominal_freq = cppc_perf.nominal_freq;
> +
> +	/* Switch to khz */
> +	return nominal_freq * 1000;

You could just do
	return cppc_perf.nominal_freq * 1000;

> +}
> +
> +static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata)
> +{
> +	struct cppc_perf_caps cppc_perf;
> +	u32 lowest_nonlinear_freq, lowest_nonlinear_perf,
> +	    nominal_freq, nominal_perf;
> +	u64 lowest_nonlinear_ratio;
> +
> +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> +	if (ret)
> +		return ret;
> +
> +	nominal_freq = cppc_perf.nominal_freq;
> +	nominal_perf = READ_ONCE(cpudata->nominal_perf);
> +
> +	lowest_nonlinear_perf = cppc_perf.lowest_nonlinear_perf;
> +
> +	lowest_nonlinear_ratio = div_u64(lowest_nonlinear_perf <<
> +					 SCHED_CAPACITY_SHIFT, nominal_perf);

Please put the two operands of the shift operation on the same line.
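
For reference, a standalone sketch of the same fixed-point scaling with the
shift kept on one line. perf_to_freq_mhz() is a hypothetical helper name, not
something in the patch, and SCHED_CAPACITY_SHIFT is redefined locally as 10 so
that the snippet builds outside the kernel.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10	/* local copy for this userspace sketch */

/*
 * Scale nominal_freq (MHz) by perf / nominal_perf using 10-bit fixed
 * point. The caller must make sure nominal_perf is not zero.
 */
static uint32_t perf_to_freq_mhz(uint32_t perf, uint32_t nominal_perf,
				 uint32_t nominal_freq)
{
	uint64_t ratio;

	/* Both operands of the shift stay on one line. */
	ratio = ((uint64_t)perf << SCHED_CAPACITY_SHIFT) / nominal_perf;

	return (uint32_t)((nominal_freq * ratio) >> SCHED_CAPACITY_SHIFT);
}
```

With perf = 60, nominal_perf = 120 and nominal_freq = 3000, the ratio is 512
and the result is 1500 MHz. Note that the intermediate division truncates, so
ratios that are not exact in 10 bits lose a little precision.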

> +
> +	lowest_nonlinear_freq = nominal_freq * lowest_nonlinear_ratio >> SCHED_CAPACITY_SHIFT;
> +
> +	/* Switch to khz */
> +	return lowest_nonlinear_freq * 1000;
> +}
> +
> +static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
> +{
> +	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> +	unsigned int cpu = policy->cpu;
> +	struct device *dev;
> +	struct amd_cpudata *cpudata;
> +
> +	dev = get_cpu_device(policy->cpu);
> +	if (!dev)
> +		return -ENODEV;
> +
> +	cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
> +	if (!cpudata)
> +		return -ENOMEM;
> +
> +	cpudata->cpu = cpu;

You could do the following and get rid of the cpu variable, since it's only used here:

	cpudata->cpu = policy->cpu;

> +
> +	ret = amd_pstate_init_perf(cpudata);
> +	if (ret)
> +		goto free_cpudata1;
> +
> +	min_freq = amd_get_min_freq(cpudata);
> +	max_freq = amd_get_max_freq(cpudata);
> +	nominal_freq = amd_get_nominal_freq(cpudata);
> +	lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
> +
> +	if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
> +		dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
> +			min_freq, max_freq);
> +		ret = -EINVAL;
> +		goto free_cpudata1;
> +	}
> +
> +	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
> +	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
> +
> +	policy->min = min_freq;
> +	policy->max = max_freq;
> +
> +	policy->cpuinfo.min_freq = min_freq;
> +	policy->cpuinfo.max_freq = max_freq;
> +
> +	/* It will be updated by governor */
> +	policy->cur = policy->cpuinfo.min_freq;
> +
> +	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
> +				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret);
> +		goto free_cpudata1;
> +	}
> +
> +	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[1],
> +				   FREQ_QOS_MAX, policy->cpuinfo.max_freq);
> +	if (ret < 0) {
> +		dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret);
> +		goto free_cpudata2;
> +	}
> +
> +	/* Initial processor data capability frequencies */
> +	cpudata->max_freq = max_freq;
> +	cpudata->min_freq = min_freq;
> +	cpudata->nominal_freq = nominal_freq;
> +	cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
> +
> +	policy->driver_data = cpudata;
> +
> +	return 0;
> +
> +	freq_qos_remove_request(&cpudata->req[1]);

Is this line of code reachable?

Perhaps this was meant as cleanup if a freq_qos_add_request() call failed but
it shouldn't be needed with the current code flow.
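
A minimal userspace sketch of the unwinding order I have in mind is below. The
fake_* names are hypothetical stand-ins, not kernel APIs; the point is only that
nothing sits between the successful return and the first cleanup label.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a QoS request; not a kernel type. */
struct fake_req { int active; };
struct fake_cpudata { struct fake_req req[2]; };

static int fake_add(struct fake_req *req, int fail)
{
	if (fail)
		return -1;
	req->active = 1;
	return 0;
}

static void fake_remove(struct fake_req *req)
{
	req->active = 0;
}

/*
 * fail_at selects which step fails (0 = none, 1 = first add, 2 = second add).
 * On success *out owns the allocation; on failure everything is undone.
 */
static int fake_cpu_init(int fail_at, struct fake_cpudata **out)
{
	struct fake_cpudata *d = calloc(1, sizeof(*d));
	int ret;

	if (!d)
		return -ENOMEM;

	ret = fake_add(&d->req[0], fail_at == 1);
	if (ret < 0)
		goto free_data;

	ret = fake_add(&d->req[1], fail_at == 2);
	if (ret < 0)
		goto remove_req0;

	*out = d;
	return 0;

remove_req0:
	/* Only reached when the second add failed, so only req[0] is undone. */
	fake_remove(&d->req[0]);
free_data:
	free(d);
	return ret;
}
```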

> +free_cpudata2:
> +	freq_qos_remove_request(&cpudata->req[0]);
> +free_cpudata1:
> +	kfree(cpudata);
> +	return ret;
> +}
> +
> +static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
> +{
> +	struct amd_cpudata *cpudata;
> +
> +	cpudata = policy->driver_data;
> +
> +	freq_qos_remove_request(&cpudata->req[1]);
> +	freq_qos_remove_request(&cpudata->req[0]);
> +	kfree(cpudata);
> +
> +	return 0;
> +}
> +
> +static struct cpufreq_driver amd_pstate_driver = {
> +	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
> +	.verify		= amd_pstate_verify,
> +	.target		= amd_pstate_target,
> +	.init		= amd_pstate_cpu_init,
> +	.exit		= amd_pstate_cpu_exit,
> +	.name		= "amd-pstate",
> +};
> +
> +static int __init amd_pstate_init(void)
> +{
> +	int ret;
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> +		return -ENODEV;
> +
> +	if (!acpi_cpc_valid()) {
> +		pr_debug("%s, the _CPC object is not present in SBIOS\n",
> +			 __func__);

Do we need to print the function name here (and below)?
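
The file already defines pr_fmt() with KBUILD_MODNAME, so every message is
prefixed with the module name. A small userspace sketch of the effect, where
MODNAME is a stand-in with an assumed value and format_cpc_missing() is a
made-up helper:

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for KBUILD_MODNAME, which the kernel build normally provides. */
#define MODNAME "amd_pstate"
#define pr_fmt(fmt) MODNAME ": " fmt

/* Formats the message into buf instead of sending it to the kernel log. */
static int format_cpc_missing(char *buf, size_t len)
{
	return snprintf(buf, len,
			pr_fmt("the _CPC object is not present in SBIOS\n"));
}
```

The prefix already identifies where the message comes from, which is why the
extra __func__ looks redundant to me.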

> +		return -ENODEV;
> +	}
> +
> +	/* don't keep reloading if cpufreq_driver exists */
> +	if (cpufreq_get_current_driver())
> +		return -EEXIST;
> +
> +	/* capability check */
> +	if (!boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
> +		pr_debug("%s, AMD CPPC MSR based functionality is not supported\n",
> +			 __func__);
> +		return -ENODEV;
> +	}
> +
> +	/* enable amd pstate feature */
> +	ret = amd_pstate_enable(true);
> +	if (ret) {
> +		pr_err("%s, failed to enable amd-pstate with return %d\n",
> +		       __func__, ret);
> +		return ret;
> +	}
> +
> +	ret = cpufreq_register_driver(&amd_pstate_driver);
> +	if (ret) {
> +		pr_err("%s, return %d\n", __func__, ret);
> +		return ret;

No need to do a return ret here...

> +	}
> +
> +	return 0;

...just do a return ret here.
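
Roughly like this, as a userspace sketch. fake_register() is a hypothetical
stand-in for cpufreq_register_driver() and the message text is made up:

```c
#include <stdio.h>

/* Hypothetical stand-in for the registration call; returns what it is given. */
static int fake_register(int result)
{
	return result;
}

static int fake_init(int register_result)
{
	int ret;

	ret = fake_register(register_result);
	if (ret)
		fprintf(stderr, "failed to register driver, return %d\n", ret);

	/* ret is 0 on success and the error code otherwise. */
	return ret;
}
```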

-Nathan

> +}
> +
> +device_initcall(amd_pstate_init);
> +
> +MODULE_AUTHOR("Huang Rui <ray.huang@amd.com>");
> +MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");
> +MODULE_LICENSE("GPL");
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 07/21] cpufreq: amd: add fast switch function for amd-pstate
  2021-10-29 13:02 ` [PATCH v3 07/21] cpufreq: amd: add fast switch function for amd-pstate Huang Rui
  2021-10-29 14:16   ` Limonciello, Mario
@ 2021-11-02 19:56   ` Nathan Fontenot
  1 sibling, 0 replies; 50+ messages in thread
From: Nathan Fontenot @ 2021-11-02 19:56 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86

On 10/29/21 8:02 AM, Huang Rui wrote:
> Introduce the fast switch function for amd-pstate on the AMD processors
> which support the full MSR register control. It's able to decrease the
> lattency on interrupt context.
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 38 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index a400861c7fdc..55ff03f85608 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -191,6 +191,41 @@ static int amd_pstate_target(struct cpufreq_policy *policy,
>  	return 0;
>  }
>  
> +static void amd_pstate_adjust_perf(unsigned int cpu,
> +				   unsigned long min_perf,
> +				   unsigned long target_perf,
> +				   unsigned long capacity)
> +{
> +	unsigned long amd_max_perf, amd_min_perf, amd_des_perf,
> +		      amd_cap_perf, lowest_nonlinear_perf;

You could drop the amd_ prefix to these local variables.
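
With the prefix dropped, the scaling logic might read like the userspace sketch
below. scale_perf(), struct perf_req and clamp_ul() are made-up names, and
DIV_ROUND_UP() is re-defined locally. In the quoted hunk the desired value is
only assigned when target_perf < capacity; the sketch defaults it to cap_perf,
which is an assumption on my side.

```c
/* Userspace stand-in for the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Userspace stand-in for clamp_t(unsigned long, ...). */
static unsigned long clamp_ul(unsigned long v, unsigned long lo,
			      unsigned long hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

struct perf_req {
	unsigned long min_perf, des_perf, max_perf;
};

static struct perf_req scale_perf(unsigned long cap_perf,
				  unsigned long lowest_nonlinear_perf,
				  unsigned long min_perf,
				  unsigned long target_perf,
				  unsigned long capacity)
{
	struct perf_req r;

	/* Assumption of this sketch: full performance unless scaled down. */
	r.des_perf = cap_perf;
	if (target_perf < capacity)
		r.des_perf = DIV_ROUND_UP(cap_perf * target_perf, capacity);

	r.min_perf = cap_perf;
	if (min_perf < capacity)
		r.min_perf = DIV_ROUND_UP(cap_perf * min_perf, capacity);

	/* Never request less than the lowest nonlinear performance. */
	if (r.min_perf < lowest_nonlinear_perf)
		r.min_perf = lowest_nonlinear_perf;

	r.max_perf = cap_perf;
	if (r.max_perf < r.min_perf)
		r.max_perf = r.min_perf;

	r.des_perf = clamp_ul(r.des_perf, r.min_perf, r.max_perf);
	return r;
}
```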

-Nathan

> +	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> +	struct amd_cpudata *cpudata = policy->driver_data;
> +
> +	amd_cap_perf = READ_ONCE(cpudata->highest_perf);
> +	lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
> +
> +	if (target_perf < capacity)
> +		amd_des_perf = DIV_ROUND_UP(amd_cap_perf * target_perf,
> +					    capacity);
> +
> +	amd_min_perf = READ_ONCE(cpudata->highest_perf);
> +	if (min_perf < capacity)
> +		amd_min_perf = DIV_ROUND_UP(amd_cap_perf * min_perf, capacity);
> +
> +	if (amd_min_perf < lowest_nonlinear_perf)
> +		amd_min_perf = lowest_nonlinear_perf;
> +
> +	amd_max_perf = amd_cap_perf;
> +	if (amd_max_perf < amd_min_perf)
> +		amd_max_perf = amd_min_perf;
> +
> +	amd_des_perf = clamp_t(unsigned long, amd_des_perf,
> +			       amd_min_perf, amd_max_perf);
> +
> +	amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
> +			  amd_max_perf, true);
> +}
> +
>  static int amd_get_min_freq(struct amd_cpudata *cpudata)
>  {
>  	struct cppc_perf_caps cppc_perf;
> @@ -311,6 +346,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>  	/* It will be updated by governor */
>  	policy->cur = policy->cpuinfo.min_freq;
>  
> +	policy->fast_switch_possible = true;
> +
>  	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
>  				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
>  	if (ret < 0) {
> @@ -360,6 +397,7 @@ static struct cpufreq_driver amd_pstate_driver = {
>  	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
>  	.verify		= amd_pstate_verify,
>  	.target		= amd_pstate_target,
> +	.adjust_perf    = amd_pstate_adjust_perf,
>  	.init		= amd_pstate_cpu_init,
>  	.exit		= amd_pstate_cpu_exit,
>  	.name		= "amd-pstate",
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 18/21] cpupower: enable boost state support for amd-pstate module
  2021-10-29 13:02 ` [PATCH v3 18/21] cpupower: enable boost state support for amd-pstate module Huang Rui
@ 2021-11-02 20:11   ` Nathan Fontenot
  2021-11-03  7:04     ` Huang Rui
  0 siblings, 1 reply; 50+ messages in thread
From: Nathan Fontenot @ 2021-11-02 20:11 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86



On 10/29/21 8:02 AM, Huang Rui wrote:
> The legacy ACPI hardware P-States function has 3 P-States on ACPI table,
> the CPU frequency only can be switched between the 3 P-States. While the
> processor supports the boost state, it will have another boost state
> that the frequency can be higher than P0 state, and the state can be
> decoded by the function of decode_pstates() and read by
> amd_pci_get_num_boost_states().
> 
> However, the new AMD P-States function is different than legacy ACPI
> hardware P-State on AMD processors. That has a finer grain frequency
> range between the highest and lowest frequency. And boost frequency is
> actually the frequency which is mapped on highest performance ratio. The
> similiar previous P0 frequency is mapped on nominal performance ratio.

s/similiar/similar/

> If the highest performance on the processor is higher than nominal
> performance, then we think the current processor supports the boost
> state. And it uses amd_pstate_boost_init() to initialize boost for AMD
> P-States function.
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  tools/power/cpupower/utils/helpers/amd.c     | 18 ++++++++++++++++++
>  tools/power/cpupower/utils/helpers/helpers.h |  5 +++++
>  tools/power/cpupower/utils/helpers/misc.c    |  2 ++
>  3 files changed, 25 insertions(+)
> 
> diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c
> index f233a6ab75ac..92b9fb631768 100644
> --- a/tools/power/cpupower/utils/helpers/amd.c
> +++ b/tools/power/cpupower/utils/helpers/amd.c
> @@ -182,5 +182,23 @@ static unsigned long amd_pstate_get_data(unsigned int cpu,
>  						  MAX_AMD_PSTATE_VALUE_READ_FILES);
>  }
>  
> +void amd_pstate_boost_init(unsigned int cpu, int *support, int *active)
> +{
> +	unsigned long highest_perf, nominal_perf, cpuinfo_min,
> +		      cpuinfo_max, amd_pstate_max;
> +
> +	highest_perf = amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF);
> +	nominal_perf = amd_pstate_get_data(cpu, AMD_PSTATE_NOMINAL_PERF);
> +
> +	*support = highest_perf > nominal_perf ? 1 : 0;
> +	if (!(*support))
> +		return;
> +
> +	cpufreq_get_hardware_limits(cpu, &cpuinfo_min, &cpuinfo_max);
> +	amd_pstate_max = amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ);
> +
> +	*active = cpuinfo_max == amd_pstate_max ? 1 : 0;
> +}
> +
>  /* AMD P-States Helper Functions ***************/
>  #endif /* defined(__i386__) || defined(__x86_64__) */
> diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
> index e03cc97297aa..c03925bea655 100644
> --- a/tools/power/cpupower/utils/helpers/helpers.h
> +++ b/tools/power/cpupower/utils/helpers/helpers.h
> @@ -140,6 +140,8 @@ extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
>  
>  /* AMD P-States stuff **************************/
>  extern bool cpupower_amd_pstate_enabled(void);
> +extern void amd_pstate_boost_init(unsigned int cpu,
> +				  int *support, int *active);
>  
>  /* AMD P-States stuff **************************/
>  
> @@ -177,6 +179,9 @@ static inline int cpufreq_has_boost_support(unsigned int cpu, int *support,
>  
>  static inline bool cpupower_amd_pstate_enabled(void)
>  { return false; }
> +static void amd_pstate_boost_init(unsigned int cpu,
> +				  int *support, int *active)
> +{ return; }

I don't believe the return statement is needed here, can just be {}

-Nathan

>  
>  /* cpuid and cpuinfo helpers  **************************/
>  
> diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c
> index 0c483cdefcc2..e0d3145434d3 100644
> --- a/tools/power/cpupower/utils/helpers/misc.c
> +++ b/tools/power/cpupower/utils/helpers/misc.c
> @@ -41,6 +41,8 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active,
>  			if (ret)
>  				return ret;
>  		}
> +	} else if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
> +		amd_pstate_boost_init(cpu, support, active);
>  	} else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA)
>  		*support = *active = 1;
>  	return 0;
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors
  2021-11-02 19:38   ` Nathan Fontenot
@ 2021-11-03  7:01     ` Huang Rui
  2021-11-04 15:10       ` Nathan Fontenot
  0 siblings, 1 reply; 50+ messages in thread
From: Huang Rui @ 2021-11-03  7:01 UTC (permalink / raw)
  To: Fontenot, Nathan
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm,
	Sharma, Deepak, Deucher, Alexander, Limonciello, Mario,
	Steven Noonan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Wed, Nov 03, 2021 at 03:38:43AM +0800, Fontenot, Nathan wrote:
> On 10/29/21 8:02 AM, Huang Rui wrote:
> > amd-pstate is the AMD CPU performance scaling driver that introduces a
> > new CPU frequency control mechanism on AMD Zen based CPU series in Linux
> > kernel. The new mechanism is based on Collaborative processor
> > performance control (CPPC) which is finer grain frequency management
> > than legacy ACPI hardware P-States. Current AMD CPU platforms are using
> > the ACPI P-states driver to manage CPU frequency and clocks with
> > switching only in 3 P-states. AMD P-States is to replace the ACPI
> > P-states controls, allows a flexible, low-latency interface for the
> > Linux kernel to directly communicate the performance hints to hardware.
> > 
> > "amd-pstate" leverages the Linux kernel governors such as *schedutil*,
> > *ondemand*, etc. to manage the performance hints which are provided by CPPC
> > hardware functionality. The first version for amd-pstate is to support one
> > of the Zen3 processors, and we will support more in future after we verify
> > the hardware and SBIOS functionalities.
> > 
> > There are two types of hardware implementations for amd-pstate: one is full
> > MSR support and another is shared memory support. It can use
> > X86_FEATURE_AMD_CPPC_EXT feature flag to distinguish the different types.
> > 
> > Using the new AMD P-States method + kernel governors (*schedutil*,
> > *ondemand*, ...) to manage the frequency update is the most appropriate
> > bridge between AMD Zen based hardware processor and Linux kernel, the
> > processor is able to ajust to the most efficiency frequency according to
> 
> s/ajust/adjust/
> 
> > the kernel scheduler loading.
> > 
> > Performance Per Watt (PPW) Caculation:
> > 
> > The PPW caculation is referred by below paper:
> > https://software.intel.com/content/dam/develop/external/us/en/documents/performance-per-what-paper.pdf
> > 
> > Below formula is referred from below spec to measure the PPW:
> > 
> > (F / t) / P = F * t / (t * E) = F / E,
> > 
> > "F" is the number of frames per second.
> > "P" is power measurd in watts.
> 
> s/measurd/measured/
> 
> > "E" is energy measured in joules.
> > 
> > We use the RAPL interface with "perf" tool to get the energy data of the
> > package power.
> > 
> > The data comparsions between amd-pstate and acpi-freq module are tested on
> 
> s/comparsions/comparisons/
> 
> > AMD Cezanne processor:
> > 
> > 1) TBench CPU benchmark:
> > 
> > +---------------------------------------------------------------------+
> > |                                                                     |
> > |               TBench (Performance Per Watt)                         |
> > |                                                    Higher is better |
> > +-------------------+------------------------+------------------------+
> > |                   |  Performance Per Watt  |  Performance Per Watt  |
> > |   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
> > |                   |  Unit: MB / (s * J)    |  Unit: MB / (s * J)    |
> > +-------------------+------------------------+------------------------+
> > |                   |                        |                        |
> > |    acpi-cpufreq   |         3.022          |        2.969           |
> > |                   |                        |                        |
> > +-------------------+------------------------+------------------------+
> > |                   |                        |                        |
> > |     amd-pstate    |         3.131          |        3.284           |
> > |                   |                        |                        |
> > +-------------------+------------------------+------------------------+
> > 
> > 2) Gitsource CPU benchmark:
> > 
> > +---------------------------------------------------------------------+
> > |                                                                     |
> > |               Gitsource (Performance Per Watt)                      |
> > |                                                    Higher is better |
> > +-------------------+------------------------+------------------------+
> > |                   |  Performance Per Watt  |  Performance Per Watt  |
> > |   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
> > |                   |  Unit: 1 / (s * J)     |  Unit: 1 / (s * J)     |
> > +-------------------+------------------------+------------------------+
> > |                   |                        |                        |
> > |    acpi-cpufreq   |     3.42172E-07        |     2.74508E-07        |
> > |                   |                        |                        |
> > +-------------------+------------------------+------------------------+
> > |                   |                        |                        |
> > |     amd-pstate    |     4.09141E-07        |     3.47610E-07        |
> > |                   |                        |                        |
> > +-------------------+------------------------+------------------------+
> > 
> > 3) Speedometer 2.0 CPU benchmark:
> > 
> > +---------------------------------------------------------------------+
> > |                                                                     |
> > |               Speedometer 2.0 (Performance Per Watt)                |
> > |                                                    Higher is better |
> > +-------------------+------------------------+------------------------+
> > |                   |  Performance Per Watt  |  Performance Per Watt  |
> > |   Kernel Module   |       (Schedutil)      |       (Ondemand)       |
> > |                   |  Unit: 1 / (s * J)     |  Unit: 1 / (s * J)     |
> > +-------------------+------------------------+------------------------+
> > |                   |                        |                        |
> > |    acpi-cpufreq   |      0.116111767       |      0.110321664       |
> > |                   |                        |                        |
> > +-------------------+------------------------+------------------------+
> > |                   |                        |                        |
> > |     amd-pstate    |      0.115825281       |      0.122024299       |
> > |                   |                        |                        |
> > +-------------------+------------------------+------------------------+
> > 
> > According to above average data, we can see this solution has shown better
> > performance per watt scaling on mobile CPU benchmarks in most of cases.
> > 
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >  drivers/cpufreq/Kconfig.x86  |  17 ++
> >  drivers/cpufreq/Makefile     |   1 +
> >  drivers/cpufreq/amd-pstate.c | 413 +++++++++++++++++++++++++++++++++++
> >  3 files changed, 431 insertions(+)
> >  create mode 100644 drivers/cpufreq/amd-pstate.c
> > 
> > diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
> > index 92701a18bdd9..2e798b2c0bdb 100644
> > --- a/drivers/cpufreq/Kconfig.x86
> > +++ b/drivers/cpufreq/Kconfig.x86
> > @@ -34,6 +34,23 @@ config X86_PCC_CPUFREQ
> >  
> >  	  If in doubt, say N.
> >  
> > +config X86_AMD_PSTATE
> > +	bool "AMD Processor P-State driver"
> > +	depends on X86
> > +	select ACPI_PROCESSOR if ACPI
> > +	select ACPI_CPPC_LIB if X86_64 && ACPI && SCHED_MC_PRIO
> > +	select CPU_FREQ_GOV_SCHEDUTIL if SMP
> > +	help
> > +	  This driver adds a CPUFreq driver which utilizes a fine grain
> > +	  processor performance freqency control range instead of legacy
> 
> s/freqency/frequency/
> 
> > +	  performance levels. This driver supports the AMD processors with
> > +	  _CPC object in the SBIOS.
> > +
> > +	  For details, take a look at:
> > +	  <file:Documentation/admin-guide/pm/amd-pstate.rst>.
> > +
> > +	  If in doubt, say N.
> > +
> >  config X86_ACPI_CPUFREQ
> >  	tristate "ACPI Processor P-States driver"
> >  	depends on ACPI_PROCESSOR
> > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> > index 48ee5859030c..c8d307010922 100644
> > --- a/drivers/cpufreq/Makefile
> > +++ b/drivers/cpufreq/Makefile
> > @@ -25,6 +25,7 @@ obj-$(CONFIG_CPUFREQ_DT_PLATDEV)	+= cpufreq-dt-platdev.o
> >  # speedstep-* is preferred over p4-clockmod.
> >  
> >  obj-$(CONFIG_X86_ACPI_CPUFREQ)		+= acpi-cpufreq.o
> > +obj-$(CONFIG_X86_AMD_PSTATE)		+= amd-pstate.o
> >  obj-$(CONFIG_X86_POWERNOW_K8)		+= powernow-k8.o
> >  obj-$(CONFIG_X86_PCC_CPUFREQ)		+= pcc-cpufreq.o
> >  obj-$(CONFIG_X86_POWERNOW_K6)		+= powernow-k6.o
> > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > new file mode 100644
> > index 000000000000..a400861c7fdc
> > --- /dev/null
> > +++ b/drivers/cpufreq/amd-pstate.c
> > @@ -0,0 +1,413 @@
> > +/*
> > + * amd-pstate.c - AMD Processor P-state Frequency Driver
> > + *
> > + * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License along with
> > + * this program; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> 
> You should use a SPDX license identifier instead of copying the GPL text
> in the file. See Documentation/process/license-rules.rst

The SPDX license identifier is an alternative to the common way of
expressing the license in the comment at the top of the file. Actually, it's
not mandatory, right?
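
For reference, Documentation/process/license-rules.rst describes the identifier
as a single comment on the first line of the file. A sketch of what the header
could look like in that form, assuming the "version 2 or any later version"
text above maps to GPL-2.0-or-later:

```c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * amd-pstate.c - AMD Processor P-state Frequency Driver
 *
 * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
 *
 * Author: Huang Rui <ray.huang@amd.com>
 */
```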

>  
> > + *
> > + * Author: Huang Rui <ray.huang@amd.com>
> > + */
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/module.h>
> > +#include <linux/init.h>
> > +#include <linux/smp.h>
> > +#include <linux/sched.h>
> > +#include <linux/cpufreq.h>
> > +#include <linux/compiler.h>
> > +#include <linux/dmi.h>
> > +#include <linux/slab.h>
> > +#include <linux/acpi.h>
> > +#include <linux/io.h>
> > +#include <linux/delay.h>
> > +#include <linux/uaccess.h>
> > +#include <linux/static_call.h>
> > +
> > +#include <acpi/processor.h>
> > +#include <acpi/cppc_acpi.h>
> > +
> > +#include <asm/msr.h>
> > +#include <asm/processor.h>
> > +#include <asm/cpufeature.h>
> > +#include <asm/cpu_device_id.h>
> > +
> > +#define AMD_PSTATE_TRANSITION_LATENCY	0x20000
> > +#define AMD_PSTATE_TRANSITION_DELAY	500
> > +
> > +static struct cpufreq_driver amd_pstate_driver;
> > +
> > +struct amd_cpudata {
> > +	int	cpu;
> > +
> > +	struct freq_qos_request req[2];
> > +
> > +	u64	cppc_req_cached;
> > +
> > +	u32	highest_perf;
> > +	u32	nominal_perf;
> > +	u32	lowest_nonlinear_perf;
> > +	u32	lowest_perf;
> > +
> > +	u32	max_freq;
> > +	u32	min_freq;
> > +	u32	nominal_freq;
> > +	u32	lowest_nonlinear_freq;
> > +};
> > +
> > +static inline int pstate_enable(bool enable)
> > +{
> > +	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
> > +}
> > +
> > +DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
> > +
> > +static inline int amd_pstate_enable(bool enable)
> > +{
> > +	return static_call(amd_pstate_enable)(enable);
> > +}
> > +
> > +static int pstate_init_perf(struct amd_cpudata *cpudata)
> > +{
> > +	u64 cap1;
> > +
> > +	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
> > +				     &cap1);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/*
> > +	 * TODO: Introduce AMD specific power feature.
> > +	 *
> > +	 * CPPC entry doesn't indicate the highest performance in some ASICs.
> > +	 */
> > +	WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
> > +
> > +	WRITE_ONCE(cpudata->nominal_perf, CAP1_NOMINAL_PERF(cap1));
> > +	WRITE_ONCE(cpudata->lowest_nonlinear_perf, CAP1_LOWNONLIN_PERF(cap1));
> > +	WRITE_ONCE(cpudata->lowest_perf, CAP1_LOWEST_PERF(cap1));
> > +
> > +	return 0;
> > +}
> > +
> > +DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
> > +
> > +static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
> > +{
> > +	return static_call(amd_pstate_init_perf)(cpudata);
> > +}
> > +
> > +static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
> > +			       u32 des_perf, u32 max_perf, bool fast_switch)
> > +{
> > +	if (fast_switch)
> > +		wrmsrl(MSR_AMD_CPPC_REQ, READ_ONCE(cpudata->cppc_req_cached));
> > +	else
> > +		wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
> > +			      READ_ONCE(cpudata->cppc_req_cached));
> > +}
> > +
> > +DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
> > +
> > +static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
> > +					  u32 min_perf, u32 des_perf,
> > +					  u32 max_perf, bool fast_switch)
> > +{
> > +	static_call(amd_pstate_update_perf)(cpudata, min_perf, des_perf,
> > +					    max_perf, fast_switch);
> > +}
> > +
> > +static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
> > +			      u32 des_perf, u32 max_perf, bool fast_switch)
> > +{
> > +	u64 prev = READ_ONCE(cpudata->cppc_req_cached);
> > +	u64 value = prev;
> > +
> > +	value &= ~REQ_MIN_PERF(~0L);
> > +	value |= REQ_MIN_PERF(min_perf);
> > +
> > +	value &= ~REQ_DES_PERF(~0L);
> > +	value |= REQ_DES_PERF(des_perf);
> > +
> > +	value &= ~REQ_MAX_PERF(~0L);
> > +	value |= REQ_MAX_PERF(max_perf);
> > +
> > +	if (value == prev)
> > +		return;
> > +
> > +	WRITE_ONCE(cpudata->cppc_req_cached, value);
> > +
> > +	amd_pstate_update_perf(cpudata, min_perf, des_perf,
> > +			       max_perf, fast_switch);
> > +}
> > +
> > +static int amd_pstate_verify(struct cpufreq_policy_data *policy)
> > +{
> > +	cpufreq_verify_within_cpu_limits(policy);
> > +
> > +	return 0;
> > +}
> > +
> > +static int amd_pstate_target(struct cpufreq_policy *policy,
> > +			     unsigned int target_freq,
> > +			     unsigned int relation)
> > +{
> > +	struct cpufreq_freqs freqs;
> > +	struct amd_cpudata *cpudata = policy->driver_data;
> > +	unsigned long amd_max_perf, amd_min_perf, amd_des_perf,
> > +		      amd_cap_perf;
> 
> Is there a need to preface each of the variables with amd_? This is
> not something you do in any of the other routines.

Those are just the names of the temporary variables. They are used in the
_target and _adjust_perf functions.

> 
> > +
> > +	if (!cpudata->max_freq)
> > +		return -ENODEV;
> > +
> > +	amd_cap_perf = READ_ONCE(cpudata->highest_perf);
> > +	amd_min_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
> > +	amd_max_perf = amd_cap_perf;
> > +
> > +	freqs.old = policy->cur;
> > +	freqs.new = target_freq;
> > +
> > +	amd_des_perf = DIV_ROUND_CLOSEST(target_freq * amd_cap_perf,
> > +					 cpudata->max_freq);
> > +
> > +	cpufreq_freq_transition_begin(policy, &freqs);
> > +	amd_pstate_update(cpudata, amd_min_perf, amd_des_perf,
> > +			  amd_max_perf, false);
> > +	cpufreq_freq_transition_end(policy, &freqs, false);
> > +
> > +	return 0;
> > +}
> > +
> > +static int amd_get_min_freq(struct amd_cpudata *cpudata)
> > +{
> > +	struct cppc_perf_caps cppc_perf;
> > +
> > +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* Switch to khz */
> > +	return cppc_perf.lowest_freq * 1000;
> > +}
> > +
> > +static int amd_get_max_freq(struct amd_cpudata *cpudata)
> > +{
> > +	struct cppc_perf_caps cppc_perf;
> > +	u32 max_perf, max_freq, nominal_freq, nominal_perf;
> > +	u64 boost_ratio;
> > +
> > +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> > +	if (ret)
> > +		return ret;
> > +
> > +	nominal_freq = cppc_perf.nominal_freq;
> > +	nominal_perf = READ_ONCE(cpudata->nominal_perf);
> > +	max_perf = READ_ONCE(cpudata->highest_perf);
> > +
> > +	boost_ratio = div_u64(max_perf << SCHED_CAPACITY_SHIFT,
> > +			      nominal_perf);
> > +
> > +	max_freq = nominal_freq * boost_ratio >> SCHED_CAPACITY_SHIFT;
> > +
> > +	/* Switch to khz */
> > +	return max_freq * 1000;
> > +}
> > +
> > +static int amd_get_nominal_freq(struct amd_cpudata *cpudata)
> > +{
> > +	struct cppc_perf_caps cppc_perf;
> > +	u32 nominal_freq;
> > +
> > +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> > +	if (ret)
> > +		return ret;
> > +
> > +	nominal_freq = cppc_perf.nominal_freq;
> > +
> > +	/* Switch to khz */
> > +	return nominal_freq * 1000;
> 
> You could just do
> 	return cppc_perf.nominal_freq * 1000;

Updated.

> 
> > +}
> > +
> > +static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata)
> > +{
> > +	struct cppc_perf_caps cppc_perf;
> > +	u32 lowest_nonlinear_freq, lowest_nonlinear_perf,
> > +	    nominal_freq, nominal_perf;
> > +	u64 lowest_nonlinear_ratio;
> > +
> > +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> > +	if (ret)
> > +		return ret;
> > +
> > +	nominal_freq = cppc_perf.nominal_freq;
> > +	nominal_perf = READ_ONCE(cpudata->nominal_perf);
> > +
> > +	lowest_nonlinear_perf = cppc_perf.lowest_nonlinear_perf;
> > +
> > +	lowest_nonlinear_ratio = div_u64(lowest_nonlinear_perf <<
> > +					 SCHED_CAPACITY_SHIFT, nominal_perf);
> 
> Please put the two args to the shift operation should be on the same line.
> 

No problem.

> > +
> > +	lowest_nonlinear_freq = nominal_freq * lowest_nonlinear_ratio >> SCHED_CAPACITY_SHIFT;
> > +
> > +	/* Switch to khz */
> > +	return lowest_nonlinear_freq * 1000;
> > +}
> > +
> > +static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
> > +{
> > +	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> > +	unsigned int cpu = policy->cpu;
> > +	struct device *dev;
> > +	struct amd_cpudata *cpudata;
> > +
> > +	dev = get_cpu_device(policy->cpu);
> > +	if (!dev)
> > +		return -ENODEV;
> > +
> > +	cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
> > +	if (!cpudata)
> > +		return -ENOMEM;
> > +
> > +	cpudata->cpu = cpu;
> 
> You could do the following and get rid of the cpu variable, it's only used here.
> 
> 	cpudata->cpu = policy->cpu;

Updated.

> 
> > +
> > +	ret = amd_pstate_init_perf(cpudata);
> > +	if (ret)
> > +		goto free_cpudata1;
> > +
> > +	min_freq = amd_get_min_freq(cpudata);
> > +	max_freq = amd_get_max_freq(cpudata);
> > +	nominal_freq = amd_get_nominal_freq(cpudata);
> > +	lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
> > +
> > +	if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
> > +		dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
> > +			min_freq, max_freq);
> > +		ret = -EINVAL;
> > +		goto free_cpudata1;
> > +	}
> > +
> > +	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
> > +	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
> > +
> > +	policy->min = min_freq;
> > +	policy->max = max_freq;
> > +
> > +	policy->cpuinfo.min_freq = min_freq;
> > +	policy->cpuinfo.max_freq = max_freq;
> > +
> > +	/* It will be updated by governor */
> > +	policy->cur = policy->cpuinfo.min_freq;
> > +
> > +	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
> > +				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
> > +	if (ret < 0) {
> > +		dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret);
> > +		goto free_cpudata1;
> > +	}
> > +
> > +	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[1],
> > +				   FREQ_QOS_MAX, policy->cpuinfo.max_freq);
> > +	if (ret < 0) {
> > +		dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret);
> > +		goto free_cpudata2;
> > +	}
> > +
> > +	/* Initial processor data capability frequencies */
> > +	cpudata->max_freq = max_freq;
> > +	cpudata->min_freq = min_freq;
> > +	cpudata->nominal_freq = nominal_freq;
> > +	cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
> > +
> > +	policy->driver_data = cpudata;
> > +
> > +	return 0;
> > +
> > +	freq_qos_remove_request(&cpudata->req[1]);
> 
> Is this line of code reachable?
> 
> Perhaps this was meant as cleanup if a freq_qos_add_request() call failed but
> it shouldn't be needed with the current code flow.

Nice catch! This line should be cleaned up as well.

> 
> > +free_cpudata2:
> > +	freq_qos_remove_request(&cpudata->req[0]);
> > +free_cpudata1:
> > +	kfree(cpudata);
> > +	return ret;
> > +}
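
The unwind order discussed above can be sketched in user space as follows.
This is only an illustration under stated assumptions: add_request() and
remove_request() are hypothetical stand-ins for freq_qos_add_request() and
freq_qos_remove_request(), and fail_at is an invented knob that forces a
failure at a given step.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bookkeeping so the sketch can be checked from a test. */
int live_requests;

/* Stand-in for freq_qos_add_request(); fails when 'fail' is non-zero. */
int add_request(int fail)
{
	if (fail)
		return -EINVAL;
	live_requests++;
	return 0;
}

/* Stand-in for freq_qos_remove_request(). */
void remove_request(void)
{
	live_requests--;
}

/*
 * fail_at: 0 = no failure, 1 = first add fails, 2 = second add fails.
 * Each label undoes only the steps that succeeded before the failure,
 * so nothing needs to sit between 'return 0' and the first label.
 */
int cpu_init_sketch(int fail_at, void **out)
{
	void *cpudata;
	int ret;

	assert(out != NULL);

	cpudata = calloc(1, 64);
	if (!cpudata)
		return -ENOMEM;

	ret = add_request(fail_at == 1);	/* like req[0], FREQ_QOS_MIN */
	if (ret < 0)
		goto free_cpudata1;

	ret = add_request(fail_at == 2);	/* like req[1], FREQ_QOS_MAX */
	if (ret < 0)
		goto free_cpudata2;

	*out = cpudata;
	return 0;

free_cpudata2:
	remove_request();	/* undo the first add only */
free_cpudata1:
	free(cpudata);
	return ret;
}
```

When the second add fails, only the first request is removed before the
allocation is freed; the second request was never added, so there is nothing
to remove for it.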
> > +
> > +static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
> > +{
> > +	struct amd_cpudata *cpudata;
> > +
> > +	cpudata = policy->driver_data;
> > +
> > +	freq_qos_remove_request(&cpudata->req[1]);
> > +	freq_qos_remove_request(&cpudata->req[0]);
> > +	kfree(cpudata);
> > +
> > +	return 0;
> > +}
> > +
> > +static struct cpufreq_driver amd_pstate_driver = {
> > +	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
> > +	.verify		= amd_pstate_verify,
> > +	.target		= amd_pstate_target,
> > +	.init		= amd_pstate_cpu_init,
> > +	.exit		= amd_pstate_cpu_exit,
> > +	.name		= "amd-pstate",
> > +};
> > +
> > +static int __init amd_pstate_init(void)
> > +{
> > +	int ret;
> > +
> > +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > +		return -ENODEV;
> > +
> > +	if (!acpi_cpc_valid()) {
> > +		pr_debug("%s, the _CPC object is not present in SBIOS\n",
> > +			 __func__);
> 
> Do we need to print the function name here (and below)?

It's a soft reminder to tell the user where the message comes from.

> 
> > +		return -ENODEV;
> > +	}
> > +
> > +	/* don't keep reloading if cpufreq_driver exists */
> > +	if (cpufreq_get_current_driver())
> > +		return -EEXIST;
> > +
> > +	/* capability check */
> > +	if (!boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
> > +		pr_debug("%s, AMD CPPC MSR based functionality is not supported\n",
> > +			 __func__);
> > +		return -ENODEV;
> > +	}
> > +
> > +	/* enable amd pstate feature */
> > +	ret = amd_pstate_enable(true);
> > +	if (ret) {
> > +		pr_err("%s, failed to enable amd-pstate with return %d\n",
> > +		       __func__, ret);
> > +		return ret;
> > +	}
> > +
> > +	ret = cpufreq_register_driver(&amd_pstate_driver);
> > +	if (ret) {
> > +		pr_err("%s, return %d\n", __func__, ret);
> > +		return ret;
> 
> No need to do a return ret here...
> 
> > +	}
> > +
> > +	return 0;
> 
> ...just do a return ret here.
> 

Updated.

Thanks,
Ray
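
As an aside on the frequency derivation quoted near the top of this mail,
here is a minimal user-space sketch of the same fixed-point arithmetic. It
assumes SCHED_CAPACITY_SHIFT is 10 (its usual kernel value), replaces
div_u64() with plain 64-bit division, and uses a hypothetical helper name
and made-up perf values; it is not the driver code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's SCHED_CAPACITY_SHIFT, which is 10. */
#define CAPACITY_SHIFT 10

/*
 * Hypothetical helper: scale nominal_freq (MHz) by the ratio
 * lowest_nonlinear_perf / nominal_perf using fixed-point math,
 * then convert the result from MHz to kHz.
 */
uint32_t lowest_nonlinear_freq_khz(uint32_t nominal_freq_mhz,
				   uint32_t lowest_nonlinear_perf,
				   uint32_t nominal_perf)
{
	uint64_t ratio, freq_mhz;

	assert(nominal_perf != 0);	/* avoid dividing by zero */

	/* Both operands of the shift sit on one line, as the review asked. */
	ratio = ((uint64_t)lowest_nonlinear_perf << CAPACITY_SHIFT) / nominal_perf;
	freq_mhz = ((uint64_t)nominal_freq_mhz * ratio) >> CAPACITY_SHIFT;

	/* Switch to kHz */
	return (uint32_t)(freq_mhz * 1000);
}
```

With a nominal frequency of 2000 MHz, a nominal perf of 100 and a lowest
nonlinear perf of 50, the ratio is 512 (0.5 in this fixed-point format) and
the helper returns 1000000 kHz.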


* Re: [PATCH v3 18/21] cpupower: enable boost state support for amd-pstate module
  2021-11-02 20:11   ` Nathan Fontenot
@ 2021-11-03  7:04     ` Huang Rui
  0 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-11-03  7:04 UTC (permalink / raw)
  To: Fontenot, Nathan
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm,
	Sharma, Deepak, Deucher, Alexander, Limonciello, Mario,
	Steven Noonan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Wed, Nov 03, 2021 at 04:11:09AM +0800, Fontenot, Nathan wrote:
> 
> 
> On 10/29/21 8:02 AM, Huang Rui wrote:
> > The legacy ACPI hardware P-States function has 3 P-States on ACPI table,
> > the CPU frequency only can be switched between the 3 P-States. While the
> > processor supports the boost state, it will have another boost state
> > that the frequency can be higher than P0 state, and the state can be
> > decoded by the function of decode_pstates() and read by
> > amd_pci_get_num_boost_states().
> > 
> > However, the new AMD P-States function is different than legacy ACPI
> > hardware P-State on AMD processors. That has a finer grain frequency
> > range between the highest and lowest frequency. And boost frequency is
> > actually the frequency which is mapped on highest performance ratio. The
> > similiar previous P0 frequency is mapped on nominal performance ratio.
> 
> s/similiar/similar/
> 
> > If the highest performance on the processor is higher than nominal
> > performance, then we think the current processor supports the boost
> > state. And it uses amd_pstate_boost_init() to initialize boost for AMD
> > P-States function.
> > 
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >  tools/power/cpupower/utils/helpers/amd.c     | 18 ++++++++++++++++++
> >  tools/power/cpupower/utils/helpers/helpers.h |  5 +++++
> >  tools/power/cpupower/utils/helpers/misc.c    |  2 ++
> >  3 files changed, 25 insertions(+)
> > 
> > diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c
> > index f233a6ab75ac..92b9fb631768 100644
> > --- a/tools/power/cpupower/utils/helpers/amd.c
> > +++ b/tools/power/cpupower/utils/helpers/amd.c
> > @@ -182,5 +182,23 @@ static unsigned long amd_pstate_get_data(unsigned int cpu,
> >  						  MAX_AMD_PSTATE_VALUE_READ_FILES);
> >  }
> >  
> > +void amd_pstate_boost_init(unsigned int cpu, int *support, int *active)
> > +{
> > +	unsigned long highest_perf, nominal_perf, cpuinfo_min,
> > +		      cpuinfo_max, amd_pstate_max;
> > +
> > +	highest_perf = amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF);
> > +	nominal_perf = amd_pstate_get_data(cpu, AMD_PSTATE_NOMINAL_PERF);
> > +
> > +	*support = highest_perf > nominal_perf ? 1 : 0;
> > +	if (!(*support))
> > +		return;
> > +
> > +	cpufreq_get_hardware_limits(cpu, &cpuinfo_min, &cpuinfo_max);
> > +	amd_pstate_max = amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ);
> > +
> > +	*active = cpuinfo_max == amd_pstate_max ? 1 : 0;
> > +}
> > +
> >  /* AMD P-States Helper Functions ***************/
> >  #endif /* defined(__i386__) || defined(__x86_64__) */
> > diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h
> > index e03cc97297aa..c03925bea655 100644
> > --- a/tools/power/cpupower/utils/helpers/helpers.h
> > +++ b/tools/power/cpupower/utils/helpers/helpers.h
> > @@ -140,6 +140,8 @@ extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
> >  
> >  /* AMD P-States stuff **************************/
> >  extern bool cpupower_amd_pstate_enabled(void);
> > +extern void amd_pstate_boost_init(unsigned int cpu,
> > +				  int *support, int *active);
> >  
> >  /* AMD P-States stuff **************************/
> >  
> > @@ -177,6 +179,9 @@ static inline int cpufreq_has_boost_support(unsigned int cpu, int *support,
> >  
> >  static inline bool cpupower_amd_pstate_enabled(void)
> >  { return false; }
> > +static void amd_pstate_boost_init(unsigned int cpu,
> > +				  int *support, int *active)
> > +{ return; }
> 
> I don't believe the return statement is needed here, can just be {}
> 

Updated.

Thanks,
Ray
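
For readers following along, the boost decision described in the commit
message reduces to two comparisons. The sketch below is a user-space
approximation: the struct and its sample values are hypothetical stand-ins
for what amd_pstate_get_data() and cpufreq_get_hardware_limits() would
return in cpupower.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the values cpupower would read for one CPU. */
struct pstate_sample {
	unsigned long highest_perf;	/* abstract CPPC perf, highest */
	unsigned long nominal_perf;	/* abstract CPPC perf, nominal */
	unsigned long cpuinfo_max;	/* current hardware limit, kHz */
	unsigned long amd_pstate_max;	/* driver-reported max freq, kHz */
};

void boost_init_sketch(const struct pstate_sample *s, int *support, int *active)
{
	assert(s != NULL && support != NULL && active != NULL);

	/* Boost is supported when highest perf exceeds nominal perf. */
	*support = s->highest_perf > s->nominal_perf ? 1 : 0;

	/* Sketch-only choice: always give *active a defined value. */
	*active = 0;
	if (!(*support))
		return;

	/* Boost is active when the hardware limit equals the driver max. */
	*active = s->cpuinfo_max == s->amd_pstate_max ? 1 : 0;
}
```

Unlike the quoted patch, the sketch writes *active = 0 before the early
return so a caller never reads an unset value; that is a choice made for
this example, not something the patch does.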


* Re: [PATCH v3 08/21] cpufreq: amd: add acpi cppc function as the backend for legacy processors
  2021-11-02 18:46   ` Nathan Fontenot
@ 2021-11-03 12:00     ` Huang Rui
  0 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-11-03 12:00 UTC (permalink / raw)
  To: Fontenot, Nathan
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm,
	Sharma, Deepak, Deucher, Alexander, Limonciello, Mario,
	Steven Noonan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Wed, Nov 03, 2021 at 02:46:37AM +0800, Fontenot, Nathan wrote:
> On 10/29/21 8:02 AM, Huang Rui wrote:
> > In some old Zen based processors, they are using the shared memory that
> > exposed from ACPI SBIOS.
> 
> With this you present two different approaches for support in the driver,
> MSRs and shared memory. For processors using shared memory you use the 
> shared memory defined in the ACPI tables but access the MSRs directly.
> 
> Is there any concern that the MSR registers (defined in patch 2/21) can
> differ from what is defined in the ACPI tables?
> 
> Should you use the drivers/acpi interfaces for MSRs also?

That's a very good question, thanks for raising it. I considered the
reasons below:

1. We would like to support the fast switch function, which requires direct
MSR register operations. It also gives better performance with the
schedutil governor.

2. There are some differences between the MSR and shared memory definitions.
E.g. the CPPCEnableRegister of the shared memory solution requires us to
enable the field on each thread. However, the one for full MSR is per
package, and we only program it once.

3. So far, I have received many issue reports from the community, and most
of them are caused by SBIOS issues. E.g. Steven's SBIOS has an additional
object which was modified by the motherboard OEM vendor. (Thanks to Steven
for working with us on addressing the issue.) Using the MSR definitions
directly is friendlier to more platforms.

4. I would like to keep cppc_acpi common for the ACPI spec, because it's
also used by ARM SoCs, and I won't add x86/AMD-specific things to cppc_acpi.
Using the MSRs directly is also more straightforward in the amd-pstate
driver, as in intel_pstate.

Thanks,
Ray

> 
> -Nathan
>  
> > 
> > Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >  drivers/cpufreq/amd-pstate.c | 58 ++++++++++++++++++++++++++++++++----
> >  1 file changed, 53 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > index 55ff03f85608..d399938d6d85 100644
> > --- a/drivers/cpufreq/amd-pstate.c
> > +++ b/drivers/cpufreq/amd-pstate.c
> > @@ -73,6 +73,19 @@ static inline int pstate_enable(bool enable)
> >  	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 1 : 0);
> >  }
> >  
> > +static int cppc_enable(bool enable)
> > +{
> > +	int cpu, ret = 0;
> > +
> > +	for_each_online_cpu(cpu) {
> > +		ret = cppc_set_enable(cpu, enable ? 1 : 0);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> >  DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
> >  
> >  static inline int amd_pstate_enable(bool enable)
> > @@ -103,6 +116,24 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
> >  	return 0;
> >  }
> >  
> > +static int cppc_init_perf(struct amd_cpudata *cpudata)
> > +{
> > +	struct cppc_perf_caps cppc_perf;
> > +
> > +	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
> > +	if (ret)
> > +		return ret;
> > +
> > +	WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
> > +
> > +	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
> > +	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
> > +		   cppc_perf.lowest_nonlinear_perf);
> > +	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
> > +
> > +	return 0;
> > +}
> > +
> >  DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
> >  
> >  static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
> > @@ -120,6 +151,19 @@ static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
> >  			      READ_ONCE(cpudata->cppc_req_cached));
> >  }
> >  
> > +static void cppc_update_perf(struct amd_cpudata *cpudata,
> > +			     u32 min_perf, u32 des_perf,
> > +			     u32 max_perf, bool fast_switch)
> > +{
> > +	struct cppc_perf_ctrls perf_ctrls;
> > +
> > +	perf_ctrls.max_perf = max_perf;
> > +	perf_ctrls.min_perf = min_perf;
> > +	perf_ctrls.desired_perf = des_perf;
> > +
> > +	cppc_set_perf(cpudata->cpu, &perf_ctrls);
> > +}
> > +
> >  DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
> >  
> >  static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
> > @@ -346,7 +390,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
> >  	/* It will be updated by governor */
> >  	policy->cur = policy->cpuinfo.min_freq;
> >  
> > -	policy->fast_switch_possible = true;
> > +	if (boot_cpu_has(X86_FEATURE_AMD_CPPC))
> > +		policy->fast_switch_possible = true;
> >  
> >  	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
> >  				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
> > @@ -397,7 +442,6 @@ static struct cpufreq_driver amd_pstate_driver = {
> >  	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
> >  	.verify		= amd_pstate_verify,
> >  	.target		= amd_pstate_target,
> > -	.adjust_perf    = amd_pstate_adjust_perf,
> >  	.init		= amd_pstate_cpu_init,
> >  	.exit		= amd_pstate_cpu_exit,
> >  	.name		= "amd-pstate",
> > @@ -421,10 +465,14 @@ static int __init amd_pstate_init(void)
> >  		return -EEXIST;
> >  
> >  	/* capability check */
> > -	if (!boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
> > -		pr_debug("%s, AMD CPPC MSR based functionality is not supported\n",
> > +	if (boot_cpu_has(X86_FEATURE_AMD_CPPC)) {
> > +		pr_debug("%s, AMD CPPC MSR based functionality is supported\n",
> >  			 __func__);
> > -		return -ENODEV;
> > +		amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf;
> > +	} else {
> > +		static_call_update(amd_pstate_enable, cppc_enable);
> > +		static_call_update(amd_pstate_init_perf, cppc_init_perf);
> > +		static_call_update(amd_pstate_update_perf, cppc_update_perf);
> >  	}
> >  
> >  	/* enable amd pstate feature */
> > 
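
The backend switch in the patch above relies on static calls. A rough
user-space analogue, assuming plain function pointers in place of
DEFINE_STATIC_CALL()/static_call_update() and a hypothetical has_cppc_msr
flag in place of boot_cpu_has(X86_FEATURE_AMD_CPPC), might look like this.
Real static calls patch the call site instead of going through a pointer,
so this only illustrates the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum backend { BACKEND_MSR, BACKEND_SHMEM };

/* Hypothetical record of which backend handled the last call. */
enum backend last_backend;

int msr_enable(bool enable)	/* stands in for pstate_enable() */
{
	(void)enable;
	last_backend = BACKEND_MSR;
	return 0;
}

int shmem_enable(bool enable)	/* stands in for cppc_enable() */
{
	(void)enable;
	last_backend = BACKEND_SHMEM;
	return 0;
}

/* Defaults to the MSR backend, like DEFINE_STATIC_CALL(..., pstate_enable). */
int (*enable_fn)(bool) = msr_enable;
bool fast_switch_possible;

/* Pick the backend once at init time, based on the feature flag. */
void select_backend(bool has_cppc_msr)
{
	if (has_cppc_msr) {
		enable_fn = msr_enable;
		fast_switch_possible = true;
	} else {
		/* Shared memory path: no fast switch. */
		enable_fn = shmem_enable;
		fast_switch_possible = false;
	}
}

/* Thin wrapper, similar in spirit to amd_pstate_enable(). */
int backend_enable(bool enable)
{
	assert(enable_fn != NULL);
	return enable_fn(enable);
}
```

Callers only ever use backend_enable(); which implementation runs is decided
once in select_backend(), which is the property the static call conversion
in the patch is after.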


* Re: [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors
  2021-11-03  7:01     ` Huang Rui
@ 2021-11-04 15:10       ` Nathan Fontenot
  2021-11-05  4:20         ` Huang Rui
  0 siblings, 1 reply; 50+ messages in thread
From: Nathan Fontenot @ 2021-11-04 15:10 UTC (permalink / raw)
  To: Huang Rui, Fontenot, Nathan
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm,
	Sharma, Deepak, Deucher, Alexander, Limonciello, Mario,
	Steven Noonan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On 11/3/21 2:01 AM, Huang Rui wrote:

>>> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
>>> new file mode 100644
>>> index 000000000000..a400861c7fdc
>>> --- /dev/null
>>> +++ b/drivers/cpufreq/amd-pstate.c
>>> @@ -0,0 +1,413 @@
>>> +/*
>>> + * amd-pstate.c - AMD Processor P-state Frequency Driver
>>> + *
>>> + * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
>>> + *
>>> + * This program is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU General Public License
>>> + * as published by the Free Software Foundation; either version 2
>>> + * of the License, or (at your option) any later version.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along with
>>> + * this program; if not, write to the Free Software
>>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
>>
>> You should use a SPDX license identifier instead of copying the GPL text
>> in the file. See Documentation/process/license-rules.rst
> 
> The SPDX license identifier is an alternative to the common way of
> expressing the license in the comment at the top of the file. Actually, it's
> not mandatory, right?
> 

It's not mandatory but I believe using SPDX identifiers is the preferred method.

...

>>> +static int __init amd_pstate_init(void)
>>> +{
>>> +	int ret;
>>> +
>>> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
>>> +		return -ENODEV;
>>> +
>>> +	if (!acpi_cpc_valid()) {
>>> +		pr_debug("%s, the _CPC object is not present in SBIOS\n",
>>> +			 __func__);
>>
>> Do we need to print the function name here (and below)?
> 
> It's a soft reminder to tell the user where the message comes from.
>

True, but you do define pr_fmt at the top of the file so users will know
this is coming from the amd_pstate driver.

-Nathan
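
A small user-space sketch of why the prefix already identifies the driver:
here pr_fmt() is redefined locally and log_to_buf() is a hypothetical
snprintf()-based stand-in for pr_debug(). The "amd_pstate: " prefix string
is an assumption for illustration; the kernel macro conventionally builds
it from KBUILD_MODNAME.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed prefix; in the kernel this is usually KBUILD_MODNAME ": " fmt. */
#define pr_fmt(fmt) "amd_pstate: " fmt

/*
 * Hypothetical stand-in for pr_debug() that formats into a buffer.
 * ##__VA_ARGS__ is a GNU extension (accepted by GCC and Clang), as used
 * by the kernel's own printk wrappers.
 */
#define log_to_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)

/* Format the "_CPC missing" message without __func__; returns its length. */
int format_cpc_missing(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return log_to_buf(buf, len, "the _CPC object is not present in SBIOS\n");
}
```

Every message formatted this way starts with the driver prefix, so the
function name adds little for the user.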


* Re: [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism
  2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
                   ` (20 preceding siblings ...)
  2021-10-29 13:02 ` [PATCH v3 21/21] Documentation: amd-pstate: add amd-pstate driver introduction Huang Rui
@ 2021-11-04 16:40 ` Giovanni Gherdovich
  2021-11-05 16:09   ` Huang Rui
  21 siblings, 1 reply; 50+ messages in thread
From: Giovanni Gherdovich @ 2021-11-04 16:40 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86

On Fri, 2021-10-29 at 21:02 +0800, Huang Rui wrote:
> Hi all,
> 
> We would like to introduce a new AMD CPU frequency control mechanism as the
> "amd-pstate" driver for modern AMD Zen based CPU series in Linux Kernel.
> 
> ..snip..

Hello,

I've tested this driver and it seems the results are a little underwhelming.
The test machine is a two sockets server with two AMD EPYC 7713,
family:model:stepping 25:1:1, 128 cores/256 threads, 256G of memory and SSD
storage. On this system, the amd-pstate driver works only in "shared memory
support", not in "full MSR support", meaning that frequency switches are
triggered from a workqueue instead of scheduler context (!fast_switch).

Dbench sees some ludicrous improvements in both performance and performance
per watt; likewise netperf sees some modest improvements, but that's about
the only good news. Schedutil/ondemand on tbench and hackbench do worse
with amd-pstate than acpi-cpufreq. I don't have data for
ondemand/amd-pstate on kernbench and gitsource, but schedutil regresses on
both.

Here are the tables, followed by some questions & discussion points.

Tilde (~) means the result is the same as the baseline (that is, the ratio is close to 1).
"Sugov" means "schedutil governor", "perfgov" means "performance governor".

             :        acpi-cpufreq          :        amd-pstate          :
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
             :  ondemand  sugov  perfgov    :  ondemand  sugov  perfgov  :  better if
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                       PERFORMANCE RATIOS
dbench       :  1.00      ~      0.33       :  0.37      0.35   0.36     :  lower
netperf      :  1.00      0.97   ~          :  1.03      1.04   ~        :  higher
tbench       :  1.00      1.04   1.06       :  0.83      0.40   1.05     :  higher
hackbench    :  1.00      ~      1.03       :  1.09      1.42   1.03     :  lower
kernbench    :  1.00      0.96   0.97       :  N/A       1.08   ~        :  lower
gitsource    :  1.00      0.67   0.69       :  N/A       0.79   0.67     :  lower
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                  PERFORMANCE-PER-WATT RATIOS
dbench4      :  1.00      ~      3.37       :  2.68      3.12   3.03     :  higher
netperf      :  1.00      0.96   ~          :  1.09      1.06   ~        :  higher
tbench4      :  1.00      1.03   1.06       :  0.76      0.34   1.04     :  higher
hackbench    :  1.00      ~      0.95       :  0.88      0.65   0.96     :  higher
kernbench    :  1.00      1.06   1.05       :  N/A       0.93   1.05     :  higher
gitsource    :  1.00      1.53   1.50       :  N/A       1.33   1.55     :  higher


How to read the table: all numbers are ratios of the results of some
governor/driver combination and ondemand/acpi-cpufreq, which is the
baseline (first column). When the "better if" column says "higher", a ratio
larger than 1 indicates an improvement; otherwise it's a regression.
Example: hackbench with sugov/amd-pstate is 42% slower than with
ondemand/acpi-cpufreq (top table). At the same time, it's also 35% less
efficient (bottom table).
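
The reading rule above can be written down as a small helper. This is an
illustrative sketch only: the function name and the sign convention
(positive means improvement over the ondemand/acpi-cpufreq baseline) are
assumptions made here, not part of the benchmark tooling.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Convert a ratio against the baseline into a signed percentage.
 * Positive = better than baseline, negative = regression.
 */
double improvement_percent(double ratio, bool higher_is_better)
{
	double delta;

	assert(ratio > 0.0);

	delta = (ratio - 1.0) * 100.0;

	/* For "lower is better" metrics a ratio above 1 is a regression. */
	return higher_is_better ? delta : -delta;
}
```

For the hackbench example, improvement_percent(1.42, false) is about -42
(42% slower) and improvement_percent(0.65, true) is about -35 (35% less
efficient).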

Now, some questions / possible troubleshooting directions:

- ACPI-CPUFREQ DRIVER: REQUESTS ARE HINTS OR MANDATES?
  When using acpi-cpufreq, and the OS requests some frequency (one of the
  three allowed P-States), does the hardware underneath stick to it? Or
  does it do some ulterior adjustment based on the load?
  This would tell if a machine using acpi-cpufreq is less dumb than it
  seems, and can in principle do fine-grain adjustments all the same.

- PROCESSING CPPC DOORBELL REQUESTS: HOW FAST IS THAT?
  How long does it take the hardware to process the CPPC doorbell
  request to change frequency? What happens to outstanding requests, if
  they're not processed in a timely manner? Is there any queue of requests,
  and if so, how long is it? Could it be that if requests come in too quickly
  the CPU ends up playing catch-up on freq switches that are obsolete or
  redundant?

- LIKE-FOR-LIKE: TRY BENCHMARKING WITH AMD-PSTATE LIMITED TO 3 P-STATES?
  Could it be that to study the performance of the "shared memory support"
  system against acpi-cpufreq a more like-for-like comparison would be to limit
  amd-pstate to only the 3 P-States available to acpi-cpufreq? That would be
  for experimental/benchmarking purposes only. Eg: on my machines acpi-cpufreq
  sees 1.5GHz, 1.7GHz and 2GHz. Given that max boost is 3.72GHz, and the CPPC
  range is the abstract interval 0..255, I could limit amd-pstate to only set
  performance level of 68, 102 and 137, and see what it gives against the old
  driver. What do you think?

- PROCESSING CPPC DOORBELL REQS IS SLOW. BUT /MAKING/ A REQUEST, SLOW TOO?
  Looks to me that with the "shared memory support" the frequency update
  process is doubly asynchronous: first we have the ->target() callback
  deferred to a workqueue, then when it's eventually executed, it calls
  cppc_update_perf() which again just asks the firmware to do work at a
  later time. Are we sure that cppc_update_perf() is actually so slow to
  warrant !fast_switch?

- HOW MANY P-STATES ARE TOO MANY?
  I've always believed the contrary, but what if having too many P-States is
  harmful for both performance and efficiency? Maybe the governor is
  requesting many updates in small increments where fewer (and larger) updates
  would be more appropriate?
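
One way to derive perf levels for the like-for-like experiment above is a
linear scaling of frequency onto the 0..255 range. This is a sketch under
that linearity assumption only; the helper name is hypothetical, and real
firmware need not map perf to frequency linearly.

```c
#include <assert.h>
#include <stdint.h>

#define CPPC_PERF_MAX 255u	/* top of the abstract 0..255 interval */

/* Hypothetical helper: scale freq_mhz linearly, truncating toward zero. */
uint32_t freq_to_perf(uint32_t freq_mhz, uint32_t max_boost_mhz)
{
	assert(max_boost_mhz != 0 && freq_mhz <= max_boost_mhz);

	return (uint32_t)(((uint64_t)freq_mhz * CPPC_PERF_MAX) / max_boost_mhz);
}
```

With a 3720 MHz max boost, 1500 MHz and 2000 MHz map to 102 and 137, which
matches two of the levels mentioned above. Under the same assumption
1700 MHz maps to 116, and a level of 68 corresponds to roughly 1 GHz.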


Thanks,
Giovanni



* Re: [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors
  2021-11-04 15:10       ` Nathan Fontenot
@ 2021-11-05  4:20         ` Huang Rui
  0 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-11-05  4:20 UTC (permalink / raw)
  To: Fontenot, Nathan
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm,
	Sharma, Deepak, Deucher, Alexander, Limonciello, Mario,
	Steven Noonan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Thu, Nov 04, 2021 at 11:10:59PM +0800, Fontenot, Nathan wrote:
> On 11/3/21 2:01 AM, Huang Rui wrote:
> 
> >>> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> >>> new file mode 100644
> >>> index 000000000000..a400861c7fdc
> >>> --- /dev/null
> >>> +++ b/drivers/cpufreq/amd-pstate.c
> >>> @@ -0,0 +1,413 @@
> >>> +/*
> >>> + * amd-pstate.c - AMD Processor P-state Frequency Driver
> >>> + *
> >>> + * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
> >>> + *
> >>> + * This program is free software; you can redistribute it and/or
> >>> + * modify it under the terms of the GNU General Public License
> >>> + * as published by the Free Software Foundation; either version 2
> >>> + * of the License, or (at your option) any later version.
> >>> + *
> >>> + * This program is distributed in the hope that it will be useful,
> >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> >>> + * GNU General Public License for more details.
> >>> + *
> >>> + * You should have received a copy of the GNU General Public License along with
> >>> + * this program; if not, write to the Free Software
> >>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> >>
> >> You should use a SPDX license identifier instead of copying the GPL text
> >> in the file. See Documentation/process/license-rules.rst
> > 
> > > The SPDX license identifier is an alternative to the common way of
> > > expressing the license in the comment at the top of the file. Actually, it's
> > > not mandatory, right?
> > 
> 
> It's not mandatory but I believe using SPDX identifiers is the preferred method.
> 

Yes, indeed, I will change this in V4.

> ...
> 
> >>> +static int __init amd_pstate_init(void)
> >>> +{
> >>> +	int ret;
> >>> +
> >>> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> >>> +		return -ENODEV;
> >>> +
> >>> +	if (!acpi_cpc_valid()) {
> >>> +		pr_debug("%s, the _CPC object is not present in SBIOS\n",
> >>> +			 __func__);
> >>
> >> Do we need to print the function name here (and below)?
> > 
> > It's a soft reminder to tell the user where the message comes from.
> >
> 
> True, but you do define pr_fmt at the top of the file so users will know
> this is coming from the amd_pstate driver.
> 

Hmm, yes, will clean this in V4.

Thanks,
Ray


* Re: [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism
  2021-11-04 16:40 ` [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Giovanni Gherdovich
@ 2021-11-05 16:09   ` Huang Rui
  2021-11-06  8:58     ` Matt McDonald
  0 siblings, 1 reply; 50+ messages in thread
From: Huang Rui @ 2021-11-05 16:09 UTC (permalink / raw)
  To: Giovanni Gherdovich
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, linux-pm, Sharma, Deepak, Deucher,
	Alexander, Limonciello, Mario, Steven Noonan, Fontenot, Nathan,
	Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Fri, Nov 05, 2021 at 12:40:18AM +0800, Giovanni Gherdovich wrote:
> On Fri, 2021-10-29 at 21:02 +0800, Huang Rui wrote:
> > Hi all,
> > 
> > We would like to introduce a new AMD CPU frequency control mechanism as the
> > "amd-pstate" driver for modern AMD Zen based CPU series in Linux Kernel.
> > 
> > ..snip..
> 
> Hello,
> 
> I've tested this driver and it seems the results are a little underwhelming.
> The test machine is a two sockets server with two AMD EPYC 7713,
> family:model:stepping 25:1:1, 128 cores/256 threads, 256G of memory and SSD
> storage. On this system, the amd-pstate driver works only in "shared memory
> support", not in "full MSR support", meaning that frequency switches are
> triggered from a workqueue instead of scheduler context (!fast_switch).
> 

Hi Giovanni,

I really appreciate the detailed tests and analysis! Thank you!

The initial driver was developed on a mobile CPU (Cezanne) with 8 cores/16
threads, which supports the "full MSR" solution. We spent a lot of time
debugging with the BIOS, SMU firmware, and hardware guys to bring up this
driver on this CPU. The test results we provided were based on that series
of processors.

For the processors with the "shared memory solution", we brought it up in a
short time recently, hoping that more AMD processors can also support the
new driver. :-) Although our CPUs comply with the ACPI standard in theory,
different processors have different SBIOS and SMU firmware (I assume you
know this from the previous mail). In practice, we need to verify them one
by one, because there are some differences in the SBIOS ACPI _CPC table and
the firmware implementation.

Of course, right now, we can start to optimize other processors and the
"shared memory solution" in parallel.

Would you mind if we add a module param, or filter on the known good
processors (mobile parts), to load amd-pstate? Others can then use the param
to switch between amd-pstate and acpi-cpufreq manually. After we address the
performance gap, we can switch it back.

> Dbench sees some ludicrous improvements in both performance and performance
> per watt; likewise netperf sees some modest improvements, but that's about
> the only good news. Schedutil/ondemand on tbench and hackbench do worse
> with amd-pstate than acpi-cpufreq. I don't have data for
> ondemand/amd-pstate on kernbench and gitsource, but schedutil regresses on
> both.
> 
> Here are the tables, followed by some questions & discussion points.
> 
> Tilde (~) means the result is the same as the baseline (that is, the ratio is close to 1).
> "Sugov" means "schedutil governor", "perfgov" means "performance governor".
> 
>              :        acpi-cpufreq          :        amd-pstate          :
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>              :  ondemand  sugov  perfgov    :  ondemand  sugov  perfgov  :  better if
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>                                        PERFORMANCE RATIOS
> dbench       :  1.00      ~      0.33       :  0.37      0.35   0.36     :  lower
> netperf      :  1.00      0.97   ~          :  1.03      1.04   ~        :  higher
> tbench       :  1.00      1.04   1.06       :  0.83      0.40   1.05     :  higher
> hackbench    :  1.00      ~      1.03       :  1.09      1.42   1.03     :  lower
> kernbench    :  1.00      0.96   0.97       :  N/A       1.08   ~        :  lower
> gitsource    :  1.00      0.67   0.69       :  N/A       0.79   0.67     :  lower
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>                                   PERFORMANCE-PER-WATT RATIOS
> dbench4      :  1.00      ~      3.37       :  2.68      3.12   3.03     :  higher
> netperf      :  1.00      0.96   ~          :  1.09      1.06   ~        :  higher
> tbench4      :  1.00      1.03   1.06       :  0.76      0.34   1.04     :  higher
> hackbench    :  1.00      ~      0.95       :  0.88      0.65   0.96     :  higher
> kernbench    :  1.00      1.06   1.05       :  N/A       0.93   1.05     :  higher
> gitsource    :  1.00      1.53   1.50       :  N/A       1.33   1.55     :  higher
> 
> 
> How to read the table: all numbers are ratios of the results of some
> governor/driver combination and ondemand/acpi-cpufreq, which is the
> baseline (first column). When the "better if" column says "higher", a ratio
> larger than 1 indicates an improvement; otherwise it's a regression.
> Example: hackbench with sugov/amd-pstate is 42% slower than with
> ondemand/acpi-cpufreq (top table). At the same time, it's also 35% less
> efficient (bottom table).

It seems the issue mainly comes from processors with a large number of cores
and threads. Let's find a similar Threadripper or EPYC family processor to
reproduce the test results. I will contact you for details. :-)

> 
> Now, some questions / possible troubleshooting directions:
> 
> - ACPI-CPUFREQ DRIVER: REQUESTS ARE HINTS OR MANDATES?
>   When using acpi-cpufreq, and the OS requests some frequency (one of the
>   three allowed P-States), does the hardware underneath stick to it? Or
>   does it do some ulterior adjustment based on the load?
>   This would tell if a machine using acpi-cpufreq is less dumb than it
>   seems, and can in principle do fine-grain adjustments all the same.
> 

The acpi-cpufreq driver requests the frequency level to go to; however,
the firmware also has a policy to adjust the clock according to hardware
conditions such as voltage, current, and temperature. Legacy ACPI
P-states don't have any transaction with the firmware side. But with
amd-pstate, the firmware can detect the performance goals from the hints
that the driver provides.

> - PROCESSING CPPC DOORBELL REQUESTS: HOW FAST IS THAT?
>   How long does it take the hardware to process the CPPC doorbell
>   request to change frequency? What happens to outstanding requests, if
>   they're not processed in a timely manner? Is there any queue of requests,
>   and if so, how long is it? Could it be that if requests come in too quickly
>   the CPU ends up playing catch-up on freq switches that are obsolete or
>   redundant?

That's a good question. We need to consult the firmware and hardware folks.
Alternatively, if there is a way, we can calculate it from the software side.

> 
> - LIKE-FOR-LIKE: TRY BENCHMARKING WITH AMD-PSTATE LIMITED TO 3 P-STATES?
>   Could it be that to study the performance of the "shared memory support"
>   system against acpi-cpufreq a more like-to-like comparison would be to limit
>   amd-pstate to only the 3 P-States available to acpi-cpufreq? That would be
>   for experimental/benchmarking purposes only. Eg: on my machines acpi-cpufreq
>   sees 1.5GHz, 1.7GHz and 2GHz. Given that max boost is 3.72GHz, and the CPPC
>   range is the abstract interval 0..255, I could limit amd-pstate to only set
>   performance level of 68, 102 and 137, and see what it gives against the old
>   driver. What do you think?

That's a good idea. We can run some experiments like this.
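
The mapping proposed above can be sketched as follows. This is only a minimal
illustration, assuming that performance levels scale linearly with frequency
and that level 255 corresponds to the 3.72GHz max boost. freq_to_perf() and
perf_to_freq() are hypothetical helper names, not functions of the amd-pstate
driver, which may derive the mapping differently.

```c
#include <assert.h>
#include <stdint.h>

#define CPPC_PERF_MAX 255u	/* top of the abstract range quoted above */

/* Map a frequency to a perf level (linear scale, truncating division). */
uint32_t freq_to_perf(uint32_t freq_mhz, uint32_t max_mhz)
{
	assert(max_mhz > 0 && freq_mhz <= max_mhz);
	return (uint32_t)((uint64_t)freq_mhz * CPPC_PERF_MAX / max_mhz);
}

/* Inverse mapping: perf level back to a frequency in MHz (truncating). */
uint32_t perf_to_freq(uint32_t perf, uint32_t max_mhz)
{
	assert(perf <= CPPC_PERF_MAX);
	return (uint32_t)((uint64_t)perf * max_mhz / CPPC_PERF_MAX);
}
```

Under these assumptions, 1.5GHz and 2GHz map to levels 102 and 137, while
1.7GHz maps to 116 and level 68 corresponds to roughly 0.99GHz, so the exact
levels may be worth double-checking before running the experiment.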

> 
> - PROCESSING CPPC DOORBELL REQS IS SLOW. BUT /MAKING/ A REQUEST, SLOW TOO?
>   Looks to me that with the "shared memory support" the frequency update
>   process is doubly asynchronous: first we have the ->target() callback
>   deferred to a workqueue, then when it's eventually executed, it calls
>   cppc_update_perf() which again just asks the firmware to do work at a
>   later time. Are we sure that cppc_update_perf() is actually so slow to
>   warrant !fast_switch?

That's a good question! I think your platform with "shared memory support"
actually reads/writes memory in the Platform Communication Channel (PCC) to
update the performance goals, whereas the acpi-cpufreq driver uses MSR
registers via cpu_freq_write_amd()/cpu_freq_read_amd().

Is it possible that MSR register access is faster than the memory doorbell
in PCC?

> 
> - HOW MANY P-STATES ARE TOO MANY?
>   I've always believed the contrary, but what if having too many P-States is
>   harmful for both performance and efficiency? Maybe the governor is
>   requesting many updates in small increments where fewer (and larger) updates
>   would be more appropriate?

I am thinking that maybe we can work out a better policy to control the
perf range.


Thanks again for the questions and possible troubleshooting directions. They
are very helpful. As a next step, let us find the root cause of the
performance gap between the acpi-cpufreq and amd-pstate drivers.

Thanks,
Ray

* Re: [PATCH v3 12/21] cpufreq: amd: add amd-pstate performance attributes
  2021-10-29 13:02 ` [PATCH v3 12/21] cpufreq: amd: add amd-pstate performance attributes Huang Rui
@ 2021-11-05 18:50   ` Nathan Fontenot
  0 siblings, 0 replies; 50+ messages in thread
From: Nathan Fontenot @ 2021-11-05 18:50 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86

On 10/29/21 8:02 AM, Huang Rui wrote:
> Introduce sysfs attributes to get the different level amd-pstate
> performances.

Can you explain why we need to provide duplicate sysfs entries for
these values?

Each one of these is already created by the drivers/acpi code and exists
in sysfs (see /sys/devices/system/cpu/cpuX/acpi_cppc).
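
For reference, reading one of those existing entries from userspace could look
like the sketch below. It is only an illustration: cppc_attr_path() and
cppc_read_attr() are hypothetical helper names, and the attribute names passed
in (for example "nominal_perf") are assumed to be present on the running
system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of one acpi_cppc attribute; 0 on success, -1 on error. */
int cppc_attr_path(char *buf, size_t len, unsigned int cpu, const char *attr)
{
	int n;

	assert(buf != NULL && attr != NULL);
	if (attr[0] == '\0' || strchr(attr, '/'))
		return -1;	/* attribute must be a plain file name */

	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/acpi_cppc/%s",
		     cpu, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read one numeric attribute; 0 on success, -1 if missing or unparsable. */
int cppc_read_attr(unsigned int cpu, const char *attr, unsigned long long *val)
{
	char path[128];
	FILE *f;
	int ok;

	if (cppc_attr_path(path, sizeof(path), cpu, attr) != 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;	/* no CPPC support, or no such attribute */

	ok = (fscanf(f, "%llu", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would check the return value, since the directory is absent on
systems without a valid _CPC object.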

-Nathan

> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 53 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 8cf1e80f44e0..58ee50bf492b 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -536,14 +536,67 @@ static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *poli
>  	return sprintf(&buf[0], "%u\n", freq);
>  }
>  
> +static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
> +					    char *buf)
> +{
> +	u32 perf;
> +	struct amd_cpudata *cpudata = policy->driver_data;
> +
> +	perf = READ_ONCE(cpudata->highest_perf);
> +
> +	return sprintf(&buf[0], "%u\n", perf);
> +}
> +
> +static ssize_t show_amd_pstate_nominal_perf(struct cpufreq_policy *policy,
> +					    char *buf)
> +{
> +	u32 perf;
> +	struct amd_cpudata *cpudata = policy->driver_data;
> +
> +	perf = READ_ONCE(cpudata->nominal_perf);
> +
> +	return sprintf(&buf[0], "%u\n", perf);
> +}
> +
> +static ssize_t show_amd_pstate_lowest_nonlinear_perf(struct cpufreq_policy *policy,
> +						     char *buf)
> +{
> +	u32 perf;
> +	struct amd_cpudata *cpudata = policy->driver_data;
> +
> +	perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
> +
> +	return sprintf(&buf[0], "%u\n", perf);
> +}
> +
> +static ssize_t show_amd_pstate_lowest_perf(struct cpufreq_policy *policy,
> +					   char *buf)
> +{
> +	u32 perf;
> +	struct amd_cpudata *cpudata = policy->driver_data;
> +
> +	perf = READ_ONCE(cpudata->lowest_perf);
> +
> +	return sprintf(&buf[0], "%u\n", perf);
> +}
> +
>  cpufreq_freq_attr_ro(amd_pstate_max_freq);
>  cpufreq_freq_attr_ro(amd_pstate_nominal_freq);
>  cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
>  
> +cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> +cpufreq_freq_attr_ro(amd_pstate_nominal_perf);
> +cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_perf);
> +cpufreq_freq_attr_ro(amd_pstate_lowest_perf);
> +
>  static struct freq_attr *amd_pstate_attr[] = {
>  	&amd_pstate_max_freq,
>  	&amd_pstate_nominal_freq,
>  	&amd_pstate_lowest_nonlinear_freq,
> +	&amd_pstate_highest_perf,
> +	&amd_pstate_nominal_perf,
> +	&amd_pstate_lowest_nonlinear_perf,
> +	&amd_pstate_lowest_perf,
>  	NULL,
>  };
>  
> 

* Re: [PATCH v3 11/21] cpufreq: amd: add amd-pstate frequencies attributes
  2021-10-29 13:02 ` [PATCH v3 11/21] cpufreq: amd: add amd-pstate frequencies attributes Huang Rui
@ 2021-11-05 18:59   ` Nathan Fontenot
  2021-11-10 12:28     ` Huang Rui
  0 siblings, 1 reply; 50+ messages in thread
From: Nathan Fontenot @ 2021-11-05 18:59 UTC (permalink / raw)
  To: Huang Rui, Rafael J . Wysocki, Viresh Kumar, Shuah Khan,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Giovanni Gherdovich, linux-pm
  Cc: Deepak Sharma, Alex Deucher, Mario Limonciello, Steven Noonan,
	Nathan Fontenot, Jinzhou Su, Xiaojian Du, linux-kernel, x86

On 10/29/21 8:02 AM, Huang Rui wrote:
> Introduce sysfs attributes to get the different level processor
> frequencies.
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 63 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 9af27ac1f818..8cf1e80f44e0 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -485,6 +485,68 @@ static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
>  	return 0;
>  }
>  
> +/* Sysfs attributes */
> +
> +/* This frequency is to indicate the maximum hardware frequency.
> + * If boost is not active but supported, the frequency will be larger than the
> + * one in cpuinfo.
> + */
> +static ssize_t show_amd_pstate_max_freq(struct cpufreq_policy *policy,
> +					char *buf)
> +{
> +	int max_freq;
> +	struct amd_cpudata *cpudata;
> +
> +	cpudata = policy->driver_data;
> +
> +	max_freq = amd_get_max_freq(cpudata);
> +	if (max_freq < 0)
> +		return max_freq;
> +
> +	return sprintf(&buf[0], "%u\n", max_freq);
> +}
> +
> +static ssize_t show_amd_pstate_nominal_freq(struct cpufreq_policy *policy,
> +					    char *buf)
> +{
> +	int nominal_freq;
> +	struct amd_cpudata *cpudata;
> +
> +	cpudata = policy->driver_data;
> +
> +	nominal_freq = amd_get_nominal_freq(cpudata);
> +	if (nominal_freq < 0)
> +		return nominal_freq;
> +
> +	return sprintf(&buf[0], "%u\n", nominal_freq);
> +}

The nominal_freq value is already reported in sysfs by drivers/acpi since this
value is part of the ACPI spec. Is there a reason to have multiple sysfs entries
for the same value?

-Nathan
 
> +
> +static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *policy,
> +						     char *buf)
> +{
> +	int freq;
> +	struct amd_cpudata *cpudata;
> +
> +	cpudata = policy->driver_data;
> +
> +	freq = amd_get_lowest_nonlinear_freq(cpudata);
> +	if (freq < 0)
> +		return freq;
> +
> +	return sprintf(&buf[0], "%u\n", freq);
> +}
> +
> +cpufreq_freq_attr_ro(amd_pstate_max_freq);
> +cpufreq_freq_attr_ro(amd_pstate_nominal_freq);
> +cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
> +
> +static struct freq_attr *amd_pstate_attr[] = {
> +	&amd_pstate_max_freq,
> +	&amd_pstate_nominal_freq,
> +	&amd_pstate_lowest_nonlinear_freq,
> +	NULL,
> +};
> +
>  static struct cpufreq_driver amd_pstate_driver = {
>  	.flags		= CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS,
>  	.verify		= amd_pstate_verify,
> @@ -493,6 +555,7 @@ static struct cpufreq_driver amd_pstate_driver = {
>  	.exit		= amd_pstate_cpu_exit,
>  	.set_boost	= amd_pstate_set_boost,
>  	.name		= "amd-pstate",
> +	.attr           = amd_pstate_attr,
>  };
>  
>  static int __init amd_pstate_init(void)
> 

* Re: [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism
  2021-11-05 16:09   ` Huang Rui
@ 2021-11-06  8:58     ` Matt McDonald
  2021-11-08  9:20       ` Huang Rui
  0 siblings, 1 reply; 50+ messages in thread
From: Matt McDonald @ 2021-11-06  8:58 UTC (permalink / raw)
  To: Huang Rui, Giovanni Gherdovich
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, linux-pm, Sharma, Deepak, Deucher,
	Alexander, Limonciello, Mario, Steven Noonan, Fontenot, Nathan,
	Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

> > I've tested this driver and it seems the results are a little
> > underwhelming.
> > The test machine is a two sockets server with two AMD EPYC 7713,
> > family:model:stepping 25:1:1, 128 cores/256 threads, 256G of memory
> > and SSD
> > storage. On this system, the amd-pstate driver works only in
> > "shared memory support", not in "full MSR support",
> > meaning that frequency switches are triggered from a workqueue
> > instead of scheduler context (!fast_switch).

Huang, I've also done some detailed testing, and while many synthetic
benchmarks seem to show minimal differences between this new frequency
control mechanism and acpi_cpufreq, the general user experience seems a
bit degraded, but most of all, gaming performance in many instances (if
not all) is cut in half. Fully half. 

I have an RTX 3090 and a Ryzen 9 5900X, with 32GB (4x8) DDR4 3600. In
Control with DLSS and RT enabled, on 5.15-rc5 with acpi_cpufreq, I get
120-130 fps at 1440p. The exact same kernel with v3 of AMD_CPPC gives
me 50 fps. GPU usage is still at 100%, but the CPU frequency is being
reported as something like 5100MHz*, along with other assorted weirdness,
but most importantly the fps is stuck at 50. This is regardless of the
cpufreq governor (schedutil, ondemand, userspace or performance).

*My CPU can indeed boost over 5GHz on a single core here and there, but
this was constant and on all cores, so clearly it wasn't accurate.

Also, from the documentation it looks like there's supposed to be a way
to fall back to acpi_cpufreq, but I found no such way to do that. If
AMD_CPPC was built into the kernel, I had to use amd-pstate, there was
no other option. Maybe I misinterpreted and acpi-cpufreq is only able
to be used as a fallback for CPUs that don't support amd-pstate.

I know that gaming on Linux hasn't historically been one of AMD's
priorities with their CPUs, but with the Steam Deck upcoming I would
imagine this is a pretty important use-case, and I've tested multiple
games and they all lose a full 50% performance. I'm happy to test any
revisions or even kernel parameters or whatever else to try and get
this sorted. 



> Would you mind that we add a module param or filter the known good
> processors (mobile parts) to load amd-pstate. And others can use the
> param
> to switch between amd-pstate and acpi-cpufreq manually? After we
> address the
> performance gap, then we can switch it back.


This would be something I would be interested to try.

> 
> The issue seems to come mainly from processors with a large number of
> cores and threads. Let's find a Threadripper or EPYC processor of a
> similar family to reproduce the test results. We will contact you for
> details. :-)

This may be an interesting route of investigation, I could potentially
try running a game with `taskset -c 0-7` or something similar. 

> 


* Re: [PATCH v3 01/21] x86/cpufreatures: add AMD Collaborative Processor Performance Control feature flag
  2021-10-29 13:02 ` [PATCH v3 01/21] x86/cpufreatures: add AMD Collaborative Processor Performance Control feature flag Huang Rui
  2021-10-29 14:39   ` Borislav Petkov
@ 2021-11-06 10:28   ` Borislav Petkov
  2021-11-09  3:08     ` Huang Rui
  1 sibling, 1 reply; 50+ messages in thread
From: Borislav Petkov @ 2021-11-06 10:28 UTC (permalink / raw)
  To: Huang Rui
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Peter Zijlstra,
	Ingo Molnar, Giovanni Gherdovich, linux-pm, Deepak Sharma,
	Alex Deucher, Mario Limonciello, Steven Noonan, Nathan Fontenot,
	Jinzhou Su, Xiaojian Du, linux-kernel, x86

On Fri, Oct 29, 2021 at 09:02:21PM +0800, Huang Rui wrote:
> Add Collaborative Processor Performance Control feature flag for AMD
> processors.
> 
> This feature flag will be used on the following amd-pstate driver. The
> amd-pstate driver has two approaches to implement the frequency control
> behavior. That depends on the CPU hardware implementation. One is "Full
> MSR Support" and another is "Shared Memory Support". The feature flag
> indicates the current processors with "Full MSR Support".
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d0ce5cfd3ac1..f23dc1abd485 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -313,6 +313,7 @@
>  #define X86_FEATURE_AMD_SSBD		(13*32+24) /* "" Speculative Store Bypass Disable */
>  #define X86_FEATURE_VIRT_SSBD		(13*32+25) /* Virtualized Speculative Store Bypass Disable */
>  #define X86_FEATURE_AMD_SSB_NO		(13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
> +#define X86_FEATURE_AMD_CPPC		(13*32+27) /* Collaborative Processor Performance Control */

I know I have acked this already but an Intel patchset made me look at
this again: there's no need to have the vendor name in the feature name:

X86_FEATURE_CPPC

is perfectly fine.
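
For readers unfamiliar with the encoding in the define quoted above: each flag
is a word index times 32 plus a bit index, and word 13 holds the AMD-defined
features from CPUID leaf 0x80000008 (EBX). A minimal sketch of the decoding,
with feature_word() and feature_bit() as hypothetical helper names that do not
exist in the kernel:

```c
#include <assert.h>

#define FEATURE_BITS_PER_WORD 32u

/* Index of the 32-bit capability word that holds the flag. */
unsigned int feature_word(unsigned int feature)
{
	return feature / FEATURE_BITS_PER_WORD;
}

/* Bit position of the flag inside its capability word. */
unsigned int feature_bit(unsigned int feature)
{
	return feature % FEATURE_BITS_PER_WORD;
}
```

For the new flag, (13*32+27) decodes to word 13, bit 27, whichever name the
constant ends up with.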

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Ivo Totev, HRB 36809, AG Nürnberg

* Re: [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism
  2021-11-06  8:58     ` Matt McDonald
@ 2021-11-08  9:20       ` Huang Rui
  2021-11-12 11:21         ` Du, Xiaojian
  0 siblings, 1 reply; 50+ messages in thread
From: Huang Rui @ 2021-11-08  9:20 UTC (permalink / raw)
  To: Matt McDonald
  Cc: Giovanni Gherdovich, Rafael J . Wysocki, Viresh Kumar,
	Shuah Khan, Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	linux-pm, Sharma, Deepak, Deucher, Alexander, Limonciello, Mario,
	Steven Noonan, Fontenot, Nathan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Sat, Nov 06, 2021 at 04:58:35PM +0800, Matt McDonald wrote:
> > > I've tested this driver and it seems the results are a little
> > > underwhelming.
> > > The test machine is a two sockets server with two AMD EPYC 7713,
> > > family:model:stepping 25:1:1, 128 cores/256 threads, 256G of memory
> > > and SSD
> > > storage. On this system, the amd-pstate driver works only in
> > > "shared memory support", not in "full MSR support",
> > > meaning that frequency switches are triggered from a workqueue
> > > instead of scheduler context (!fast_switch).
> 
> Huang, I've also done some detailed testing, and while many synthetic
> benchmarks seem to show minimal differences between this new frequency
> control mechanism and acpi_cpufreq, the general user experience seems a
> bit degraded, but most of all, gaming performance in many instances (if
> not all) is cut in half. Fully half. 
> 
> I have an RTX 3090 and a Ryzen 9 5900X, with 32GB (4x8) DDR4 3600. In

May we know the family/model id of your processors?

> Control with DLSS and RT enabled, on 5.15.rc5 with acpi_cpufreq, I get
> 120-130 fps at 1440p. The same exact kernel with v3 of AMD_CPPC gives
> me 50 fps. GPU usage is still at 100, but the CPU frequency is being
> reported as like 5100Mhz*, and other assorted weirdness, but most
> importantly the fps is stuck at 50. This is regardless of performance
> scheduler (schedutil, ondemand, userspace or performance). 

May we know your SMU version in your SBIOS?

Thanks,
Ray

> 
> *My CPU can indeed boost over 5GHz on a single core here and there, but
> this was constant and on all cores, so clearly it wasn't accurate.
> 
> Also, from the documentation it looks like there's supposed to be a way
> to fall back to acpi_cpufreq, but I found no such way to do that. If
> AMD_CPPC was built into the kernel, I had to use amd-pstate, there was
> no other option. Maybe I misinterpreted and acpi-cpufreq is only able
> to be used as a fallback for CPUs that don't support amd-pstate.
> 
> I know that gaming on Linux hasn't historically been one of AMD's
> priorities with their CPUs, but with the Steam Deck upcoming I would
> imagine this is a pretty important use-case, and I've tested multiple
> games and they all lose a full 50% performance. I'm happy to test any
> revisions or even kernel parameters or whatever else to try and get
> this sorted. 
> 
> 
> 
> > Would you mind that we add a module param or filter the known good
> > processors (mobile parts) to load amd-pstate. And others can use the
> > param
> > to switch between amd-pstate and acpi-cpufreq manually? After we
> > address the
> > performance gap, then we can switch it back.
> 
> 
> This would be something I would be interested to try.
> 
> > 
> > The issue seems to come mainly from processors with a large number of
> > cores and threads. Let's find a Threadripper or EPYC processor of a
> > similar family to reproduce the test results. We will contact you for
> > details. :-)
> 
> This may be an interesting route of investigation, I could potentially
> try running a game with `taskset -c 0-7` or something similar. 
> 
> > 
> 

* Re: [PATCH v3 01/21] x86/cpufreatures: add AMD Collaborative Processor Performance Control feature flag
  2021-11-06 10:28   ` Borislav Petkov
@ 2021-11-09  3:08     ` Huang Rui
  0 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-11-09  3:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Peter Zijlstra,
	Ingo Molnar, Giovanni Gherdovich, linux-pm, Sharma, Deepak,
	Deucher, Alexander, Limonciello, Mario, Steven Noonan, Fontenot,
	Nathan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Sat, Nov 06, 2021 at 06:28:54PM +0800, Borislav Petkov wrote:
> On Fri, Oct 29, 2021 at 09:02:21PM +0800, Huang Rui wrote:
> > Add Collaborative Processor Performance Control feature flag for AMD
> > processors.
> > 
> > This feature flag will be used on the following amd-pstate driver. The
> > amd-pstate driver has two approaches to implement the frequency control
> > behavior. That depends on the CPU hardware implementation. One is "Full
> > MSR Support" and another is "Shared Memory Support". The feature flag
> > indicates the current processors with "Full MSR Support".
> > 
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >  arch/x86/include/asm/cpufeatures.h | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index d0ce5cfd3ac1..f23dc1abd485 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -313,6 +313,7 @@
> >  #define X86_FEATURE_AMD_SSBD		(13*32+24) /* "" Speculative Store Bypass Disable */
> >  #define X86_FEATURE_VIRT_SSBD		(13*32+25) /* Virtualized Speculative Store Bypass Disable */
> >  #define X86_FEATURE_AMD_SSB_NO		(13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
> > +#define X86_FEATURE_AMD_CPPC		(13*32+27) /* Collaborative Processor Performance Control */
> 
> I know I have acked this already but an Intel patchset made me look at
> this again: there's no need to have the vendor name in the feature name:
> 
> X86_FEATURE_CPPC
> 
> is perfectly fine.
> 

Fine. Will update it in V4.

Thanks,
Ray

* Re: [PATCH v3 11/21] cpufreq: amd: add amd-pstate frequencies attributes
  2021-11-05 18:59   ` Nathan Fontenot
@ 2021-11-10 12:28     ` Huang Rui
  0 siblings, 0 replies; 50+ messages in thread
From: Huang Rui @ 2021-11-10 12:28 UTC (permalink / raw)
  To: Fontenot, Nathan
  Cc: Rafael J . Wysocki, Viresh Kumar, Shuah Khan, Borislav Petkov,
	Peter Zijlstra, Ingo Molnar, Giovanni Gherdovich, linux-pm,
	Sharma, Deepak, Deucher, Alexander, Limonciello, Mario,
	Steven Noonan, Su, Jinzhou (Joe),
	Du, Xiaojian, linux-kernel, x86

On Sat, Nov 06, 2021 at 02:59:55AM +0800, Fontenot, Nathan wrote:
> On 10/29/21 8:02 AM, Huang Rui wrote:
> > Introduce sysfs attributes to get the different level processor
> > frequencies.
> > 
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >  drivers/cpufreq/amd-pstate.c | 63 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 63 insertions(+)
> > 
> > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > index 9af27ac1f818..8cf1e80f44e0 100644
> > --- a/drivers/cpufreq/amd-pstate.c
> > +++ b/drivers/cpufreq/amd-pstate.c
> > @@ -485,6 +485,68 @@ static int amd_pstate_cpu_exit(struct cpufreq_policy *policy)
> >  	return 0;
> >  }
> >  
> > +/* Sysfs attributes */
> > +
> > +/* This frequency is to indicate the maximum hardware frequency.
> > + * If boost is not active but supported, the frequency will be larger than the
> > + * one in cpuinfo.
> > + */
> > +static ssize_t show_amd_pstate_max_freq(struct cpufreq_policy *policy,
> > +					char *buf)
> > +{
> > +	int max_freq;
> > +	struct amd_cpudata *cpudata;
> > +
> > +	cpudata = policy->driver_data;
> > +
> > +	max_freq = amd_get_max_freq(cpudata);
> > +	if (max_freq < 0)
> > +		return max_freq;
> > +
> > +	return sprintf(&buf[0], "%u\n", max_freq);
> > +}
> > +
> > +static ssize_t show_amd_pstate_nominal_freq(struct cpufreq_policy *policy,
> > +					    char *buf)
> > +{
> > +	int nominal_freq;
> > +	struct amd_cpudata *cpudata;
> > +
> > +	cpudata = policy->driver_data;
> > +
> > +	nominal_freq = amd_get_nominal_freq(cpudata);
> > +	if (nominal_freq < 0)
> > +		return nominal_freq;
> > +
> > +	return sprintf(&buf[0], "%u\n", nominal_freq);
> > +}
> 
> The nominal_freq value is already reported in sysfs by drivers/acpi since this
> value is part of the ACPI spec. Is there a reason to have multiple sysfs entries
> for the same value?
> 

I will clean them up in V4. Thanks!

Ray

* RE: [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism
  2021-11-08  9:20       ` Huang Rui
@ 2021-11-12 11:21         ` Du, Xiaojian
  0 siblings, 0 replies; 50+ messages in thread
From: Du, Xiaojian @ 2021-11-12 11:21 UTC (permalink / raw)
  To: Matt McDonald
  Cc: Giovanni Gherdovich, Rafael J . Wysocki, Viresh Kumar,
	Shuah Khan, Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	linux-pm, Sharma, Deepak, Deucher, Alexander, Limonciello, Mario,
	Steven Noonan, Fontenot, Nathan, Su, Jinzhou (Joe),
	linux-kernel, x86, Huang, Ray

Hi Matt,

Thanks for your testing. We are very happy to receive feedback from you and the community.
We tried to reproduce the issue you reported in our local environment.

Hardware configuration:
CPU: 5900X 12core
MEM: DDR4 8*2GB @2667MHz@2channel
GPU: VEGA20, Radeon VII
Mainboard: B550
Kernel: 5.15-rc, custom kernel, with acpi-cpufreq and amd_pstate driver.

We built two identical systems and installed a clean Ubuntu 20.04.3 OS and Steam on each.
Steam is at its default version.
We use a *USB synchronizer* to control the two systems at the same time.

For "Control" game:
Graphics option: default setting, 1080P, to avoid a GPU performance bottleneck.
GPU driver package is:
https://drivers.amd.com/drivers/linux/amdgpu-pro-21.20-1292797-rhel-8.4.tar.xz
(Installed with command: ./amdgpu-install  --no-dkms)

The only difference between the two systems is the cpufreq driver: one uses acpi-cpufreq, the other amd_pstate.

In our test results, we can't find an obvious performance gap between the two systems; both run "Control" at 100-120 fps.
You can see the captured results in the following picture and videos, which show the two screens at the same time:

One picture:
https://drive.google.com/file/d/1PvSduykJn9U5MMOhzFWycnbmGmznalM3/view?usp=sharing

Two videos:
https://drive.google.com/file/d/1nQQEteL-v_zQxnOJpyW8JqvRW2FFDN2Z/view?usp=sharing

https://drive.google.com/file/d/1heuPgFG71SQHvGb6wfedrQciBfE2rhnu/view?usp=sharing

We haven't tested with NVIDIA GPUs, because we have no RTX cards so far.
But we can test again with a Navi 21 GPU (RX 6900 XT/6800 XT/6800) if the issue is related to ray tracing.
Would you have a chance to re-test on your system with an AMD GPU?

Anyway, we very much appreciate your feedback; we need more feedback to improve our AMD CPU driver.

Thanks,
Xiaojian


-----Original Message-----
From: Huang, Ray <Ray.Huang@amd.com>
Sent: November 8, 2021 17:20
To: Matt McDonald <gardotd426@gmail.com>
Cc: Giovanni Gherdovich <ggherdovich@suse.cz>; Rafael J . Wysocki <rafael.j.wysocki@intel.com>; Viresh Kumar <viresh.kumar@linaro.org>; Shuah Khan <skhan@linuxfoundation.org>; Borislav Petkov <bp@suse.de>; Peter Zijlstra <peterz@infradead.org>; Ingo Molnar <mingo@kernel.org>; linux-pm@vger.kernel.org; Sharma, Deepak <Deepak.Sharma@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Limonciello, Mario <Mario.Limonciello@amd.com>; Steven Noonan <steven@valvesoftware.com>; Fontenot, Nathan <Nathan.Fontenot@amd.com>; Su, Jinzhou (Joe) <Jinzhou.Su@amd.com>; Du, Xiaojian <Xiaojian.Du@amd.com>; linux-kernel@vger.kernel.org; x86@kernel.org
Subject: Re: [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism

On Sat, Nov 06, 2021 at 04:58:35PM +0800, Matt McDonald wrote:
> > > I've tested this driver and it seems the results are a little
> > > underwhelming.
> > > The test machine is a two sockets server with two AMD EPYC 7713,
> > > family:model:stepping 25:1:1, 128 cores/256 threads, 256G of
> > > memory and SSD storage. On this system, the amd-pstate driver
> > > works only in "shared memory support", not in "full MSR support",
> > > meaning that frequency switches are triggered from a workqueue
> > > instead of scheduler context (!fast_switch).
>
> Huang, I've also done some detailed testing, and while many synthetic
> benchmarks seem to show minimal differences between this new frequency
> control mechanism and acpi_cpufreq, the general user experience seems
> a bit degraded, but most of all, gaming performance in many instances
> (if not all) is cut in half. Fully half.
>
> I have an RTX 3090 and a Ryzen 9 5900X, with 32GB (4x8) DDR4 3600. In

May we know the family/model id of your processors?

> Control with DLSS and RT enabled, on 5.15.rc5 with acpi_cpufreq, I get
> 120-130 fps at 1440p. The same exact kernel with v3 of AMD_CPPC gives
> me 50 fps. GPU usage is still at 100, but the CPU frequency is being
> reported as like 5100Mhz*, and other assorted weirdness, but most
> importantly the fps is stuck at 50. This is regardless of performance
> scheduler (schedutil, ondemand, userspace or performance).

May we know your SMU version in your SBIOS?

Thanks,
Ray

>
> *My CPU can indeed boost over 5GHz on a single core here and there,
> but this was constant and on all cores, so clearly it wasn't accurate.
>
> Also, from the documentation it looks like there's supposed to be a
> way to fall back to acpi_cpufreq, but I found no such way to do that.
> If AMD_CPPC was built into the kernel, I had to use amd-pstate, there
> was no other option. Maybe I misinterpreted and acpi-cpufreq is only
> able to be used as a fallback for CPUs that don't support amd-pstate.
>
> I know that gaming on Linux hasn't historically been one of AMD's
> priorities with their CPUs, but with the Steam Deck upcoming I would
> imagine this is a pretty important use-case, and I've tested multiple
> games and they all lose a full 50% performance. I'm happy to test any
> revisions or even kernel parameters or whatever else to try and get
> this sorted.
>
>
>
> > Would you mind that we add a module param or filter the known good
> > processors (mobile parts) to load amd-pstate. And others can use the
> > param to switch between amd-pstate and acpi-cpufreq manually? After
> > we address the performance gap, then we can switch it back.
>
>
> This would be something I would be interested to try.
>
> >
> > The issue seems to come mainly from processors with a large number of
> > cores and threads. Let's find a Threadripper or EPYC processor of a
> > similar family to reproduce the test results. We will contact you for
> > details. :-)
>
> This may be an interesting route of investigation, I could potentially
> try running a game with `taskset -c 0-7` or something similar.
>
> >
>

Thread overview: 50+ messages
2021-10-29 13:02 [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Huang Rui
2021-10-29 13:02 ` [PATCH v3 01/21] x86/cpufreatures: add AMD Collaborative Processor Performance Control feature flag Huang Rui
2021-10-29 14:39   ` Borislav Petkov
2021-11-06 10:28   ` Borislav Petkov
2021-11-09  3:08     ` Huang Rui
2021-10-29 13:02 ` [PATCH v3 02/21] x86/msr: add AMD CPPC MSR definitions Huang Rui
2021-10-29 13:02 ` [PATCH v3 03/21] ACPI: CPPC: implement support for SystemIO registers Huang Rui
2021-10-29 13:02 ` [PATCH v3 04/21] ACPI: CPPC: Check present CPUs for determining _CPC is valid Huang Rui
2021-10-29 13:02 ` [PATCH v3 05/21] ACPI: CPPC: add cppc enable register function Huang Rui
2021-10-29 14:15   ` Limonciello, Mario
2021-11-01  9:20     ` Huang Rui
2021-10-29 13:02 ` [PATCH v3 06/21] cpufreq: amd: introduce a new amd pstate driver to support future processors Huang Rui
2021-11-02 18:52   ` Limonciello, Mario
2021-11-02 19:38   ` Nathan Fontenot
2021-11-03  7:01     ` Huang Rui
2021-11-04 15:10       ` Nathan Fontenot
2021-11-05  4:20         ` Huang Rui
2021-10-29 13:02 ` [PATCH v3 07/21] cpufreq: amd: add fast switch function for amd-pstate Huang Rui
2021-10-29 14:16   ` Limonciello, Mario
2021-11-02 19:56   ` Nathan Fontenot
2021-10-29 13:02 ` [PATCH v3 08/21] cpufreq: amd: add acpi cppc function as the backend for legacy processors Huang Rui
2021-10-29 14:20   ` Limonciello, Mario
2021-11-01  9:02     ` Huang Rui
2021-11-02 18:46   ` Nathan Fontenot
2021-11-03 12:00     ` Huang Rui
2021-10-29 13:02 ` [PATCH v3 09/21] cpufreq: amd: add trace for amd-pstate module Huang Rui
2021-10-29 13:02 ` [PATCH v3 10/21] cpufreq: amd: add boost mode support for amd-pstate Huang Rui
2021-10-29 13:02 ` [PATCH v3 11/21] cpufreq: amd: add amd-pstate frequencies attributes Huang Rui
2021-11-05 18:59   ` Nathan Fontenot
2021-11-10 12:28     ` Huang Rui
2021-10-29 13:02 ` [PATCH v3 12/21] cpufreq: amd: add amd-pstate performance attributes Huang Rui
2021-11-05 18:50   ` Nathan Fontenot
2021-10-29 13:02 ` [PATCH v3 13/21] cpupower: add AMD P-state capability flag Huang Rui
2021-10-29 13:02 ` [PATCH v3 14/21] cpupower: add the function to check amd-pstate enabled Huang Rui
2021-10-29 13:02 ` [PATCH v3 15/21] cpupower: initial AMD P-state capability Huang Rui
2021-10-29 13:02 ` [PATCH v3 16/21] cpupower: add the function to get the sysfs value from specific table Huang Rui
2021-10-29 13:02 ` [PATCH v3 17/21] cpupower: add amd-pstate sysfs definition and access helper Huang Rui
2021-10-29 14:10   ` Limonciello, Mario
2021-11-01  9:14     ` Huang Rui
2021-10-29 13:02 ` [PATCH v3 18/21] cpupower: enable boost state support for amd-pstate module Huang Rui
2021-11-02 20:11   ` Nathan Fontenot
2021-11-03  7:04     ` Huang Rui
2021-10-29 13:02 ` [PATCH v3 19/21] cpupower: move print_speed function into misc helper Huang Rui
2021-10-29 13:02 ` [PATCH v3 20/21] cpupower: print amd-pstate information on cpupower Huang Rui
2021-10-29 13:02 ` [PATCH v3 21/21] Documentation: amd-pstate: add amd-pstate driver introduction Huang Rui
2021-11-04 16:40 ` [PATCH v3 00/21] cpufreq: introduce a new AMD CPU frequency control mechanism Giovanni Gherdovich
2021-11-05 16:09   ` Huang Rui
2021-11-06  8:58     ` Matt McDonald
2021-11-08  9:20       ` Huang Rui
2021-11-12 11:21         ` Du, Xiaojian
