* [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
@ 2024-04-28  9:11 Xiaojian Du
  2024-04-28  9:11 ` [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
  2024-04-28  9:54 ` [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Borislav Petkov
  0 siblings, 2 replies; 8+ messages in thread
From: Xiaojian Du @ 2024-04-28  9:11 UTC (permalink / raw)
  To: linux-pm, gautham.shenoy, Borislav.Petkov, mario.limonciello, ray.huang
  Cc: Perry.Yuan, Perry Yuan, Xiaojian Du

From: Perry Yuan <perry.yuan@amd.com>

Some AMD Zen 4 processors support a new feature, FAST CPPC, which
allows for a faster CPPC loop due to internal architectural
enhancements. The goal of this faster loop is higher performance
at the same power consumption.

Reference:
Page 99 of PPR for AMD Family 19h Model 61h rev.B1
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/56713-B1_3_05.zip

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/scattered.c    | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..6c128d463a14 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
 #define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* "" BHI_DIS_S HW control available */
 #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* "" BHI_DIS_S HW control enabled */
 #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* "" AMD Fast CPPC */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index af5aa2c754c2..9c273c231f56 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -51,6 +51,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
+	{ X86_FEATURE_FAST_CPPC,	CPUID_EDX,  15, 0x80000007, 0 },
 	{ 0, 0, 0, 0, 0 }
 };
 
-- 
2.34.1
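In `(21*32 + 5)`, the flag sits in capability word 21, bit 5. The new `cpuid_bits[]` row states which CPUID leaf, register and bit set it: leaf 0x80000007, EDX bit 15. The sketch below is a hypothetical userspace model of that mapping. The struct, enum and macro names are local stand-ins that follow the field order of the patch's initializer; they are not the kernel's definitions. It also queries the running CPU through the GCC/Clang `<cpuid.h>` helper `__get_cpuid()`.

```c
/*
 * Hypothetical userspace model of one scattered.c table entry; the names
 * below are local stand-ins, not the kernel's own definitions.
 */
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper providing __get_cpuid() */
#endif

/* Output-register indices, in the EAX..EDX order used by the patch. */
enum model_cpuid_reg { CPUID_EAX, CPUID_EBX, CPUID_ECX, CPUID_EDX };

/* Same field order as the initializers in cpuid_bits[]. */
struct model_cpuid_bit {
	uint16_t feature;	/* Linux feature number: word * 32 + bit */
	uint8_t reg;		/* CPUID output register holding the flag */
	uint8_t bit;		/* bit position inside that register */
	uint32_t level;		/* CPUID leaf (EAX input) */
	uint32_t sub_leaf;	/* CPUID sub-leaf (ECX input) */
};

#define MODEL_FEATURE_FAST_CPPC (21 * 32 + 5)

/* Mirrors the line the patch adds to cpuid_bits[]. */
static const struct model_cpuid_bit fast_cppc_entry = {
	MODEL_FEATURE_FAST_CPPC, CPUID_EDX, 15, 0x80000007, 0
};

/* Split a feature number into its capability word and bit. */
static unsigned int feature_word(unsigned int feature) { return feature / 32; }
static unsigned int feature_bit(unsigned int feature) { return feature % 32; }

/* Return 1 if the entry's bit is set in the given EAX..EDX values. */
static int entry_is_set(const struct model_cpuid_bit *cb, const unsigned int regs[4])
{
	assert(cb->reg <= CPUID_EDX && cb->bit < 32);
	return (int)((regs[cb->reg] >> cb->bit) & 1u);
}

/*
 * Query the running CPU. Returns 1 or 0 for the flag, or -1 when the leaf
 * is not implemented or the build is not x86. __get_cpuid() leaves the ECX
 * input unspecified, which is assumed harmless here because this entry's
 * sub_leaf is 0 and leaf 0x80000007 does not use sub-leaves.
 */
static int probe_fast_cppc(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int regs[4] = { 0, 0, 0, 0 };

	if (!__get_cpuid(fast_cppc_entry.level, &regs[CPUID_EAX], &regs[CPUID_EBX],
			 &regs[CPUID_ECX], &regs[CPUID_EDX]))
		return -1;
	return entry_is_set(&fast_cppc_entry, regs);
#else
	return -1;
#endif
}
```

The `""` at the start of the comment in cpufeatures.h keeps the flag out of /proc/cpuinfo. A direct CPUID query such as `probe_fast_cppc()` is therefore one way to check for it from userspace.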



* [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models
  2024-04-28  9:11 [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
@ 2024-04-28  9:11 ` Xiaojian Du
  2024-04-28  9:54 ` [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Borislav Petkov
  1 sibling, 0 replies; 8+ messages in thread
From: Xiaojian Du @ 2024-04-28  9:11 UTC (permalink / raw)
  To: linux-pm, gautham.shenoy, Borislav.Petkov, mario.limonciello, ray.huang
  Cc: Perry.Yuan, Xiaojian Du

Some AMD Zen 4 APUs/CPUs can adjust the CPU core clock more quickly
and precisely according to the CPU workload. This capability is
advertised by the Fast CPPC x86 feature. This change only takes
effect in the *passive mode* of the amd-pstate driver. Based on test
results for different transition delay values, 600us was chosen to
strike a balance between performance and power consumption.

Some test results on an AMD Ryzen 7840HS (Phoenix) APU:

1. Tbench
(Energy: lower is better; Throughput: higher is better;
PPW (performance per watt): higher is better)
============= =================== ============== =============== ============== =============== ============== =============== ===============
 Trans Delay   Tbench              governor:schedutil, 3-iterations average
============= =================== ============== =============== ============== =============== ============== =============== ===============
 1000us        Clients             1              2               4              8              12             16              32
               Energy/Joules       2010           2804            8768           17171          16170          15132           15027
               Throughput/(MB/s)   114            259             1041           3010           3135           4851            4605
               PPW                 0.0567         0.0923          0.1187         0.1752         0.1938         0.3205          0.3064
 600us         Clients             1              2               4              8              12             16              32
               Energy/Joules       2115  (5.22%)  2388  (-14.84%) 10700(22.03%)  16716 (-2.65%) 15939 (-1.43%) 15053 (-0.52%)  15083 (0.37% )
               Throughput/(MB/s)   122   (7.02%)  234   (-9.65% ) 1188 (14.12%)  3003  (-0.23%) 3143  (0.26% ) 4842  (-0.19%)  4603  (-0.04%)
               PPW                 0.0576(1.59%)  0.0979(6.07%  ) 0.111(-6.49%)  0.1796(2.51% ) 0.1971(1.70% ) 0.3216(0.34% )  0.3051(-0.42%)
============= =================== ============== =============== ============== =============== ============== =============== ===============
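The PPW rows appear to equal throughput divided by energy (for example 114 / 2010 ≈ 0.0567). Each bracketed percentage is the change of the 600us value relative to the 1000us value. The helpers below are a small sketch for re-checking those numbers. Treating PPW as throughput/energy is an inference from the figures, not something the patch states.

```c
#include <assert.h>

/* Assumed definition, inferred from the table: throughput / energy. */
static double ppw(double throughput_mbs, double energy_j)
{
	assert(energy_j > 0.0);
	return throughput_mbs / energy_j;
}

/* Change of the 600us value relative to the 1000us baseline, in percent. */
static double pct_change(double baseline, double candidate)
{
	assert(baseline != 0.0);
	return (candidate - baseline) / baseline * 100.0;
}
```

For instance, `pct_change(8768, 10700)` is about 22.03%, which is the Energy change at 4 clients.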

2. Dbench
(Energy: lower is better; Throughput: higher is better;
PPW (performance per watt): higher is better)
============= =================== ============== =============== ============== =============== ============== =============== ===============
 Trans Delay   Dbench              governor:schedutil, 3-iterations average
============= =================== ============== =============== ============== =============== ============== =============== ===============
 1000us        Clients             1             2               4              8               12             16              32
               Energy/Joules       4890          3779            3567           5157            5611           6500            8163
               Throughput/(MB/s)   327           167             220            577             775            938             1397
               PPW                 0.0668        0.0441          0.0616         0.1118          0.1381         0.1443          0.1711
 600us         Clients             1             2               4              8               12             16              32
               Energy/Joules       4915  (0.51%) 4912  (29.98%)  3506  (-1.71%) 4907  (-4.85% ) 5011 (-10.69%) 5672  (-12.74%) 8141  (-0.27%)
               Throughput/(MB/s)   348   (6.42%) 284   (70.06%)  220   (0.00% ) 518   (-10.23%) 712  (-8.13% ) 854   (-8.96% ) 1475  (5.58% )
               PPW                 0.0708(5.99%) 0.0578(31.07%)  0.0627(1.79% ) 0.1055(-5.64% ) 0.142(2.82%  ) 0.1505(4.30%  ) 0.1811(5.84% )
============= =================== ============== =============== ============== =============== ============== =============== ===============
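One way to see how broad the Dbench PPW change is: count the client counts where PPW rose, then average the per-row changes. The sketch below does that with the PPW values copied from the table. The unweighted mean is an illustrative choice of summary, not a metric used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Dbench PPW values copied from the table above (1, 2, 4, 8, 12, 16, 32 clients). */
static const double dbench_ppw_1000us[] = { 0.0668, 0.0441, 0.0616, 0.1118, 0.1381, 0.1443, 0.1711 };
static const double dbench_ppw_600us[]  = { 0.0708, 0.0578, 0.0627, 0.1055, 0.142,  0.1505, 0.1811 };
#define N_ROWS (sizeof(dbench_ppw_1000us) / sizeof(dbench_ppw_1000us[0]))

struct delta_summary {
	size_t improved;	/* rows where PPW went up */
	double mean_pct;	/* unweighted mean of the per-row % changes */
};

static struct delta_summary summarize(const double *base, const double *cand, size_t n)
{
	struct delta_summary s = { 0, 0.0 };

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		assert(base[i] != 0.0);
		double pct = (cand[i] - base[i]) / base[i] * 100.0;

		if (pct > 0.0)
			s.improved++;
		s.mean_pct += pct;
	}
	s.mean_pct /= (double)n;
	return s;
}
```

With the table's values, `summarize(dbench_ppw_1000us, dbench_ppw_600us, N_ROWS)` reports 6 improved rows out of 7, with only the 8-client row dropping. The mean change is roughly +6.6%, pulled up by the +31% at 2 clients.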

3. Hackbench (lower time is better)
============= =========================== ==========================
  hackbench     governor:schedutil
============= =========================== ==========================
  Trans Delay   Process Mode Ave time(s)    Thread Mode Ave time(s)
  1000us        14.484                      14.484
  600us         14.418 (-0.46%)             15.41 (+6.39%)
============= =========================== ==========================

4. Perf_sched_bench (lower time is better)
============= =================== ============== ============== ============== =============== =============== =============
 Trans Delay  perf_sched_bench    governor:schedutil
============= =================== ============== ============== ============== =============== =============== =============
  1000us        Groups             1             2              4              8               12              24
                AveTime(s)        1.64          2.851          5.878          11.636          16.093          26.395
  600us         Groups             1             2              4              8               12              24
                AveTime(s)        1.69(3.05%)   2.845(-0.21%)  5.843(-0.60%)  11.576(-0.52%)  16.092(-0.01%)  26.32(-0.28%)
============= =================== ============== ============== ============== =============== =============== =============

5. Sysbench (higher is better)
============= ================== ============== ================= ============== ================ =============== =================
  Sysbench    governor:schedutil
============= ================== ============== ================= ============== ================ =============== =================
  1000us      Thread             1               2                4              8                12               24
              Ave events         6020.98         12273.39         24119.82       46171.57         47074.37         47831.72
  600us       Thread             1               2                4              8                12               24
              Ave events         6154.82(2.22%)  12271.63(-0.01%) 24392.5(1.13%) 46117.64(-0.12%) 46852.19(-0.47%) 47678.92(-0.32%)
============= ================== ============== ================= ============== ================ =============== =================

In conclusion, a shorter CPU clock transition delay improves PPW on
Dbench at most client counts, while performance stays largely stable
on Tbench, Perf_sched_bench, Sysbench and Hackbench process mode;
Hackbench thread mode is about 6% slower.

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 2015c9fcc3c9..8c8594f67af6 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -50,6 +50,7 @@
 
 #define AMD_PSTATE_TRANSITION_LATENCY	20000
 #define AMD_PSTATE_TRANSITION_DELAY	1000
+#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY	600
 #define AMD_PSTATE_PREFCORE_THRESHOLD	166
 
 /*
@@ -868,7 +869,11 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 	}
 
 	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
-	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
+
+	if (cpu_feature_enabled(X86_FEATURE_FAST_CPPC))
+		policy->transition_delay_us = AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY;
+	else
+		policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
 
 	policy->min = min_freq;
 	policy->max = max_freq;
-- 
2.34.1
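The diff picks the policy's transition delay from the CPU feature flag. With schedutil, the governor used in the tests above, `policy->transition_delay_us` normally serves as the default for the `rate_limit_us` tunable, so the value bounds how often a new frequency can be requested. The sketch below models only that. `has_fast_cppc` stands in for `cpu_feature_enabled(X86_FEATURE_FAST_CPPC)`. `may_update_freq()` is a simplified, hypothetical rate-limit check, not the governor's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the amd-pstate.c hunk above (microseconds). */
#define AMD_PSTATE_TRANSITION_DELAY		1000
#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY	600

#define MODEL_NSEC_PER_USEC	1000ULL

/* Same choice as the patched amd_pstate_cpu_init(); has_fast_cppc is a
 * stand-in for cpu_feature_enabled(X86_FEATURE_FAST_CPPC). */
static unsigned int pick_transition_delay_us(bool has_fast_cppc)
{
	return has_fast_cppc ? AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY
			     : AMD_PSTATE_TRANSITION_DELAY;
}

/* Hypothetical rate limit: allow a new frequency request only once
 * delay_us has passed since the previous one. */
static bool may_update_freq(uint64_t now_ns, uint64_t last_update_ns,
			    unsigned int delay_us)
{
	assert(now_ns >= last_update_ns);
	return now_ns - last_update_ns >= (uint64_t)delay_us * MODEL_NSEC_PER_USEC;
}

/* Upper bound on frequency updates per second under that rate limit. */
static unsigned int max_updates_per_sec(unsigned int delay_us)
{
	assert(delay_us > 0);
	return 1000000u / delay_us;
}
```

In this model, a request 700us after the previous one passes with the 600us delay but is held back with the 1000us delay. The ceiling rises from 1000 to 1666 updates per second.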



* Re: [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
  2024-04-28  9:11 [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
  2024-04-28  9:11 ` [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
@ 2024-04-28  9:54 ` Borislav Petkov
  2024-04-28 10:59   ` Du, Xiaojian
  1 sibling, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2024-04-28  9:54 UTC (permalink / raw)
  To: Xiaojian Du
  Cc: linux-pm, gautham.shenoy, mario.limonciello, ray.huang, Perry.Yuan, lkml

+ lkml

On Sun, Apr 28, 2024 at 05:11:32PM +0800, Xiaojian Du wrote:
> From: Perry Yuan <perry.yuan@amd.com>
> 
> Some AMD Zen 4 processors support a new feature, FAST CPPC, which
> allows for a faster CPPC loop due to internal architectural
> enhancements. The goal of this faster loop is higher performance
> at the same power consumption.
> 
> Reference:
> Page 99 of PPR for AMD Family 19h Model 61h rev.B1
> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/56713-B1_3_05.zip

This should say "See the PPR for AMD Family 19h Model 61h rev.B1, docID
56713" so that people can actually find it.

The URLs are flaky and change regularly so can't use them.

> Signed-off-by: Perry Yuan <perry.yuan@amd.com>
> Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  arch/x86/kernel/cpu/scattered.c    | 1 +
>  2 files changed, 2 insertions(+)

Always use ./scripts/get_maintainer.pl when sending a patch to know who
to Cc.

Also, have a look at those to get an idea how the process works:

https://kernel.org/doc/html/latest/process/development-process.html
https://kernel.org/doc/html/latest/process/submitting-patches.html

> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 3c7434329661..6c128d463a14 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -470,6 +470,7 @@
>  #define X86_FEATURE_BHI_CTRL		(21*32+ 2) /* "" BHI_DIS_S HW control available */
>  #define X86_FEATURE_CLEAR_BHB_HW	(21*32+ 3) /* "" BHI_DIS_S HW control enabled */
>  #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
> +#define X86_FEATURE_FAST_CPPC		(21*32 + 5) /* "" AMD Fast CPPC */
>  
>  /*
>   * BUG word(s)
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index af5aa2c754c2..9c273c231f56 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -51,6 +51,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_PMC_FREEZE,	CPUID_EAX,  2, 0x80000022, 0 },
> +	{ X86_FEATURE_FAST_CPPC,	CPUID_EDX,  15, 0x80000007, 0 },
>  	{ 0, 0, 0, 0, 0 }
>  };
>  

With the above addressed:

Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* RE: [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
  2024-04-28  9:54 ` [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Borislav Petkov
@ 2024-04-28 10:59   ` Du, Xiaojian
  2024-04-28 11:05     ` Borislav Petkov
  0 siblings, 1 reply; 8+ messages in thread
From: Du, Xiaojian @ 2024-04-28 10:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-pm, Shenoy, Gautham Ranjal, Limonciello, Mario, Huang, Ray,
	Yuan, Perry, lkml


Thanks a lot for the review; I will make these changes before resubmitting.

Xiaojian

-----Original Message-----
From: Borislav Petkov <bp@alien8.de>
Sent: Sunday, April 28, 2024 5:54 PM
To: Du, Xiaojian <Xiaojian.Du@amd.com>
Cc: linux-pm@vger.kernel.org; Shenoy, Gautham Ranjal <gautham.shenoy@amd.com>; Limonciello, Mario <Mario.Limonciello@amd.com>; Huang, Ray <Ray.Huang@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>; lkml <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag



* Re: [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag
  2024-04-28 10:59   ` Du, Xiaojian
@ 2024-04-28 11:05     ` Borislav Petkov
  0 siblings, 0 replies; 8+ messages in thread
From: Borislav Petkov @ 2024-04-28 11:05 UTC (permalink / raw)
  To: Du, Xiaojian
  Cc: linux-pm, Shenoy, Gautham Ranjal, Limonciello, Mario, Huang, Ray,
	Yuan, Perry, lkml

On Sun, Apr 28, 2024 at 10:59:50AM +0000, Du, Xiaojian wrote:
> Thanks a lot for the review; I will make these changes before resubmitting.

Thanks.

Also, please do not top-post on a public ML but put your reply
underneath, like I just did.

That's also explained in those docs I pointed you to.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models
  2024-04-29  7:03 ` [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
  2024-05-11  6:54   ` Yuan, Perry
@ 2024-05-22 16:46   ` Mario Limonciello
  1 sibling, 0 replies; 8+ messages in thread
From: Mario Limonciello @ 2024-05-22 16:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: tglx, mingo, bp, dave.hansen, hpa, daniel.sneddon, jpoimboe,
	pawan.kumar.gupta, sandipan.das, kai.huang, perry.yuan, x86,
	ray.huang, Xiaojian Du, linux-kernel, linux-pm

On 4/29/2024 02:03, Xiaojian Du wrote:
> Some AMD Zen 4 APUs/CPUs can adjust the CPU core clock more quickly
> and precisely according to the CPU workload. This capability is
> advertised by the Fast CPPC x86 feature. This change only takes
> effect in the *passive mode* of the amd-pstate driver. Based on test
> results for different transition delay values, 600us was chosen to
> strike a balance between performance and power consumption.
> 
> Some test results on an AMD Ryzen 7840HS (Phoenix) APU:
> 
> 1. Tbench
> (Energy: lower is better; Throughput: higher is better;
> PPW (performance per watt): higher is better)
> ============= =================== ============== =============== ============== =============== ============== =============== ===============
>   Trans Delay   Tbench              governor:schedutil, 3-iterations average
> ============= =================== ============== =============== ============== =============== ============== =============== ===============
>   1000us        Clients             1              2               4              8              12             16              32
>                 Energy/Joules       2010           2804            8768           17171          16170          15132           15027
>                 Throughput/(MB/s)   114            259             1041           3010           3135           4851            4605
>                 PPW                 0.0567         0.0923          0.1187         0.1752         0.1938         0.3205          0.3064
>   600us         Clients             1              2               4              8              12             16              32
>                 Energy/Joules       2115  (5.22%)  2388  (-14.84%) 10700(22.03%)  16716 (-2.65%) 15939 (-1.43%) 15053 (-0.52%)  15083 (0.37% )
>                 Throughput/(MB/s)   122   (7.02%)  234   (-9.65% ) 1188 (14.12%)  3003  (-0.23%) 3143  (0.26% ) 4842  (-0.19%)  4603  (-0.04%)
>                 PPW                 0.0576(1.59%)  0.0979(6.07%  ) 0.111(-6.49%)  0.1796(2.51% ) 0.1971(1.70% ) 0.3216(0.34% )  0.3051(-0.42%)
> ============= =================== ============== =============== ============== =============== ============== =============== ===============
> 
> 2. Dbench
> (Energy: lower is better; Throughput: higher is better;
> PPW (performance per watt): higher is better)
> ============= =================== ============== =============== ============== =============== ============== =============== ===============
>   Trans Delay   Dbench              governor:schedutil, 3-iterations average
> ============= =================== ============== =============== ============== =============== ============== =============== ===============
>   1000us        Clients             1             2               4              8               12             16              32
>                 Energy/Joules       4890          3779            3567           5157            5611           6500            8163
>                 Throughput/(MB/s)   327           167             220            577             775            938             1397
>                 PPW                 0.0668        0.0441          0.0616         0.1118          0.1381         0.1443          0.1711
>   600us         Clients             1             2               4              8               12             16              32
>                 Energy/Joules       4915  (0.51%) 4912  (29.98%)  3506  (-1.71%) 4907  (-4.85% ) 5011 (-10.69%) 5672  (-12.74%) 8141  (-0.27%)
>                 Throughput/(MB/s)   348   (6.42%) 284   (70.06%)  220   (0.00% ) 518   (-10.23%) 712  (-8.13% ) 854   (-8.96% ) 1475  (5.58% )
>                 PPW                 0.0708(5.99%) 0.0578(31.07%)  0.0627(1.79% ) 0.1055(-5.64% ) 0.142(2.82%  ) 0.1505(4.30%  ) 0.1811(5.84% )
> ============= =================== ============== =============== ============== =============== ============== =============== ===============
> 
> 3. Hackbench (lower time is better)
> ============= =========================== ==========================
>    hackbench     governor:schedutil
> ============= =========================== ==========================
>    Trans Delay   Process Mode Ave time(s)    Thread Mode Ave time(s)
>    1000us        14.484                      14.484
>    600us         14.418 (-0.46%)             15.41 (+6.39%)
> ============= =========================== ==========================
> 
> 4. Perf_sched_bench (lower time is better)
> ============= =================== ============== ============== ============== =============== =============== =============
>   Trans Delay  perf_sched_bench    governor:schedutil
> ============= =================== ============== ============== ============== =============== =============== =============
>    1000us        Groups             1             2              4              8               12              24
>                  AveTime(s)        1.64          2.851          5.878          11.636          16.093          26.395
>    600us         Groups             1             2              4              8               12              24
>                  AveTime(s)        1.69(3.05%)   2.845(-0.21%)  5.843(-0.60%)  11.576(-0.52%)  16.092(-0.01%)  26.32(-0.28%)
> ============= =================== ============== ============== ============== =============== =============== =============
> 
> 5. Sysbench (higher is better)
> ============= ================== ============== ================= ============== ================ =============== =================
>    Sysbench    governor:schedutil
> ============= ================== ============== ================= ============== ================ =============== =================
>    1000us      Thread             1               2                4              8                12               24
>                Ave events         6020.98         12273.39         24119.82       46171.57         47074.37         47831.72
>    600us       Thread             1               2                4              8                12               24
>                Ave events         6154.82(2.22%)  12271.63(-0.01%) 24392.5(1.13%) 46117.64(-0.12%) 46852.19(-0.47%) 47678.92(-0.32%)
> ============= ================== ============== ================= ============== ================ =============== =================
> 
> In conclusion, a shorter CPU clock transition delay improves PPW on
> Dbench at most client counts, while performance stays largely stable
> on Tbench, Perf_sched_bench, Sysbench and Hackbench process mode;
> Hackbench thread mode is about 6% slower.
> 
> Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>

Rafael,

You can swap my R-b for an A-b when you pick this up after the merge window.

Thx!

Acked-by: Mario Limonciello <mario.limonciello@amd.com>

> ---
>   drivers/cpufreq/amd-pstate.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 2015c9fcc3c9..8c8594f67af6 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -50,6 +50,7 @@
>   
>   #define AMD_PSTATE_TRANSITION_LATENCY	20000
>   #define AMD_PSTATE_TRANSITION_DELAY	1000
> +#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY	600
>   #define AMD_PSTATE_PREFCORE_THRESHOLD	166
>   
>   /*
> @@ -868,7 +869,11 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>   	}
>   
>   	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
> -	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_FAST_CPPC))
> +		policy->transition_delay_us = AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY;
> +	else
> +		policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
>   
>   	policy->min = min_freq;
>   	policy->max = max_freq;



* RE: [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models
  2024-04-29  7:03 ` [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
@ 2024-05-11  6:54   ` Yuan, Perry
  2024-05-22 16:46   ` Mario Limonciello
  1 sibling, 0 replies; 8+ messages in thread
From: Yuan, Perry @ 2024-05-11  6:54 UTC (permalink / raw)
  To: Du, Xiaojian, linux-kernel, linux-pm
  Cc: tglx, mingo, bp, dave.hansen, hpa, daniel.sneddon, jpoimboe,
	pawan.kumar.gupta, Das1, Sandipan, kai.huang, x86, Huang, Ray,
	rafael, Limonciello, Mario


> -----Original Message-----
> From: Du, Xiaojian <Xiaojian.Du@amd.com>
> Sent: Monday, April 29, 2024 3:03 PM
> To: linux-kernel@vger.kernel.org; linux-pm@vger.kernel.org
> Cc: tglx@linutronix.de; mingo@redhat.com; bp@alien8.de;
> dave.hansen@linux.intel.com; hpa@zytor.com;
> daniel.sneddon@linux.intel.com; jpoimboe@kernel.org;
> pawan.kumar.gupta@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; kai.huang@intel.com; Yuan, Perry
> <Perry.Yuan@amd.com>; x86@kernel.org; Huang, Ray
> <Ray.Huang@amd.com>; rafael@kernel.org; Du, Xiaojian
> <Xiaojian.Du@amd.com>; Limonciello, Mario <Mario.Limonciello@amd.com>
> Subject: [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay
> for some models
>
> Some of AMD ZEN4 APU/CPU have support for adjusting the CPU core clock
> more quickly and presicely according to CPU work loading.
> This is advertised by the Fast CPPC x86 feature.
> This change will only be effective in the *passive mode* of AMD pstate
> driver. From the test results of different transition delay values, 600us is
> chosen to make a balance between performance and power consumption.
>
> Some test results on AMD Ryzen 7840HS(Phoenix) APU:
>
> 1. Tbench
> (Energy less is better, Throughput more is better, PPW--Performance per
> Watt more is better) ============= ===================
> ============== =============== ==============
> =============== ============== ===============
> ===============
>  Trans Delay   Tbench              governor:schedutil, 3-iterations average
> ============= =================== ==============
> =============== ============== ===============
> ============== =============== ===============
>  1000us        Clients             1              2               4              8              12             16
> 32
>                Energy/Joules       2010           2804            8768           17171          16170
> 15132           15027
>                Throughput/(MB/s)   114            259             1041           3010           3135
> 4851            4605
>                PPW                 0.0567         0.0923          0.1187         0.1752         0.1938
> 0.3205          0.3064
>  600us         Clients             1              2               4              8              12             16              32
>                Energy/Joules       2115  (5.22%)  2388  (-14.84%) 10700(22.03%)  16716
> (-2.65%) 15939 (-1.43%) 15053 (-0.52%)  15083 (0.37% )
>                Throughput/(MB/s)   122   (7.02%)  234   (-9.65% ) 1188 (14.12%)  3003
> (-0.23%) 3143  (0.26% ) 4842  (-0.19%)  4603  (-0.04%)
>                PPW                 0.0576(1.59%)  0.0979(6.07%  ) 0.111(-6.49%)
> 0.1796(2.51% ) 0.1971(1.70% ) 0.3216(0.34% )  0.3051(-0.42%)
> ============= =================== ==============
> ================ ============= ===============
> ============== =============== ===============
>
> 2.Dbench
> (Energy less is better, Throughput more is better, PPW--Performance per
> Watt more is better) ============= ===================
> ============== =============== ==============
> =============== ============== ===============
> ===============
>  Trans Delay   Dbench              governor:schedutil, 3-iterations average
> ============= =================== ==============
> =============== ============== ===============
> ============== =============== ===============
>  1000us        Clients             1             2               4              8               12             16
> 32
>                Energy/Joules       4890          3779            3567           5157            5611
> 6500            8163
>                Throughput/(MB/s)   327           167             220            577             775
> 938             1397
>                PPW                 0.0668        0.0441          0.0616         0.1118          0.1381
> 0.1443          0.1711
>  600us         Clients             1             2               4              8               12             16              32
>                Energy/Joules       4915  (0.51%) 4912  (29.98%)  3506  (-1.71%) 4907  (-
> 4.85% ) 5011 (-10.69%) 5672  (-12.74%) 8141  (-0.27%)
>                Throughput/(MB/s)   348   (6.42%) 284   (70.06%)  220   (0.00% ) 518   (-
> 10.23%) 712  (-8.13% ) 854   (-8.96% ) 1475  (5.58% )
>                PPW                 0.0708(5.99%) 0.0578(31.07%)  0.0627(1.79% ) 0.1055(-
> 5.64% ) 0.142(2.82%  ) 0.1505(4.30%  ) 0.1811(5.84% )
> ============= =================== ==============
> =============== ============== ===============
> ============== =============== ===============
>
> 3.Hackbench(less time is better)
> ============= =========================== ==========================
>   hackbench     governor:schedutil
> ============= =========================== ==========================
>   Trans Delay   Process Mode Ave time(s)  Thread Mode Ave time(s)
>   1000us        14.484                      14.484
>   600us         14.418(-0.46%)              15.41(+6.39%)
> ============= =========================== ==========================
>
> 4.Perf_sched_bench(less time is better)
> ============= =================== ============== ============== ============== =============== =============== =============
>  Trans Delay  perf_sched_bench    governor:schedutil
> ============= =================== ============== ============== ============== =============== =============== =============
>   1000us        Groups             1             2              4              8               12              24
>                 AveTime(s)        1.64          2.851          5.878          11.636          16.093          26.395
>   600us         Groups             1             2              4              8               12              24
>                 AveTime(s)        1.69(3.05%)   2.845(-0.21%)  5.843(-0.60%)  11.576(-0.52%)  16.092(-0.01%)  26.32(-0.28%)
> ============= =================== ============== ============== ============== =============== =============== =============
>
> 5.Sysbench(higher is better)
> ============= ================== ============== ================= ============== ================ =============== =================
>   Sysbench    governor:schedutil
> ============= ================== ============== ================= ============== ================ =============== =================
>   1000us      Thread             1               2                4              8                12               24
>               Ave events         6020.98         12273.39         24119.82       46171.57         47074.37         47831.72
>   600us       Thread             1               2                4              8                12               24
>               Ave events         6154.82(2.22%)  12271.63(-0.01%) 24392.5(1.13%) 46117.64(-0.12%) 46852.19(-0.47%) 47678.92(-0.32%)
> ============= ================== ============== ================= ============== ================ =============== =================
>
> In conclusion, a shorter CPU clock transition delay noticeably improves PPW
> in the Dbench test at most client counts, while performance in Tbench,
> Hackbench, Perf_sched_bench and Sysbench stays largely stable.
>
> Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 2015c9fcc3c9..8c8594f67af6 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -50,6 +50,7 @@
>
>  #define AMD_PSTATE_TRANSITION_LATENCY        20000
>  #define AMD_PSTATE_TRANSITION_DELAY  1000
> +#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY        600
>  #define AMD_PSTATE_PREFCORE_THRESHOLD        166
>
>  /*
> @@ -868,7 +869,11 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>       }
>
>       policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
> -     policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
> +
> +     if (cpu_feature_enabled(X86_FEATURE_FAST_CPPC))
> +             policy->transition_delay_us = AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY;
> +     else
> +             policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
>
>       policy->min = min_freq;
>       policy->max = max_freq;
> --
> 2.34.1

LGTM

Reviewed-by: Perry Yuan <perry.yuan@amd.com>




* [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models
  2024-04-29  7:03 Xiaojian Du
@ 2024-04-29  7:03 ` Xiaojian Du
  2024-05-11  6:54   ` Yuan, Perry
  2024-05-22 16:46   ` Mario Limonciello
  0 siblings, 2 replies; 8+ messages in thread
From: Xiaojian Du @ 2024-04-29  7:03 UTC
  To: linux-kernel, linux-pm
  Cc: tglx, mingo, bp, dave.hansen, hpa, daniel.sneddon, jpoimboe,
	pawan.kumar.gupta, sandipan.das, kai.huang, perry.yuan, x86,
	ray.huang, rafael, Xiaojian Du, Mario Limonciello

Some AMD Zen 4 APUs/CPUs support adjusting the CPU core clock
more quickly and precisely according to the CPU workload.
This capability is advertised by the Fast CPPC x86 feature.
This change only takes effect in the *passive mode* of the
amd-pstate driver. Based on test results with different
transition delay values, 600us was chosen as a balance
between performance and power consumption.
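To make the effect of the smaller delay concrete, here is a simplified userspace model (not kernel code). It rests on our understanding that, with the schedutil governor used in the tests below, policy->transition_delay_us becomes the default rate_limit_us, i.e. the minimum interval between frequency updates. The model accepts a request only if at least that many microseconds have passed since the last accepted one. The helper names pick_transition_delay_us() and count_accepted_updates() and the request pattern are hypothetical; only the two delay values come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Delay values copied from the patch (microseconds). */
#define AMD_PSTATE_TRANSITION_DELAY		1000
#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY	600

/*
 * Hypothetical helper mirroring the if/else added to
 * amd_pstate_cpu_init(): fast_cppc stands in for
 * cpu_feature_enabled(X86_FEATURE_FAST_CPPC).
 */
static unsigned int pick_transition_delay_us(bool fast_cppc)
{
	return fast_cppc ? AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY
			 : AMD_PSTATE_TRANSITION_DELAY;
}

/*
 * Simplified rate-limit model (an assumption, not kernel code): a
 * frequency update request at req_us[i] is acted on only if at least
 * delay_us have passed since the last accepted request. Timestamps
 * must be non-decreasing, otherwise the unsigned subtraction below
 * would wrap around.
 */
static unsigned int count_accepted_updates(const uint64_t *req_us, size_t n,
					   unsigned int delay_us)
{
	unsigned int accepted = 0;
	bool have_last = false;
	uint64_t last = 0;

	for (size_t i = 0; i < n; i++) {
		assert(!have_last || req_us[i] >= last);
		if (!have_last || req_us[i] - last >= delay_us) {
			last = req_us[i];
			have_last = true;
			accepted++;
		}
	}
	return accepted;
}
```

With one request every 200us over 10ms (req_us = 0, 200, ..., 9800), this model accepts 10 updates with the 1000us delay and 17 with the 600us delay, so the governor gets more chances to re-evaluate the frequency on Fast CPPC parts.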

Test results on an AMD Ryzen 7 7840HS (Phoenix) APU:

1. Tbench
(Energy less is better, Throughput more is better,
PPW--Performance per Watt more is better)
============= =================== ============== =============== ============== =============== ============== =============== ===============
 Trans Delay   Tbench              governor:schedutil, 3-iterations average
============= =================== ============== =============== ============== =============== ============== =============== ===============
 1000us        Clients             1              2               4              8              12             16              32
               Energy/Joules       2010           2804            8768           17171          16170          15132           15027
               Throughput/(MB/s)   114            259             1041           3010           3135           4851            4605
               PPW                 0.0567         0.0923          0.1187         0.1752         0.1938         0.3205          0.3064
 600us         Clients             1              2               4              8              12             16              32
               Energy/Joules       2115  (5.22%)  2388  (-14.84%) 10700(22.03%)  16716 (-2.65%) 15939 (-1.43%) 15053 (-0.52%)  15083 (0.37% )
               Throughput/(MB/s)   122   (7.02%)  234   (-9.65% ) 1188 (14.12%)  3003  (-0.23%) 3143  (0.26% ) 4842  (-0.19%)  4603  (-0.04%)
               PPW                 0.0576(1.59%)  0.0979(6.07%  ) 0.111(-6.49%)  0.1796(2.51% ) 0.1971(1.70% ) 0.3216(0.34% )  0.3051(-0.42%)
============= =================== ============== =============== ============== =============== ============== =============== ===============
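The derived columns in these tables can be reproduced with simple arithmetic. The sketch below is inferred from the numbers rather than stated in the patch: PPW appears to be throughput (MB/s) divided by energy (J), e.g. 114 / 2010 ~= 0.0567, and each parenthesized percentage appears to be the 600us result relative to the 1000us baseline. The helper names ppw() and rel_change_pct() are hypothetical.

```c
#include <assert.h>

/*
 * Assumed definition (inferred from the tables, not stated in the
 * patch): PPW = throughput in MB/s divided by energy in Joules.
 */
static double ppw(double throughput_mb_s, double energy_j)
{
	assert(energy_j > 0.0);
	return throughput_mb_s / energy_j;
}

/*
 * Percentage shown in parentheses: change of the 600us result relative
 * to the 1000us baseline. For time-based benchmarks (Hackbench,
 * Perf_sched_bench) a positive value means the 600us run was slower.
 */
static double rel_change_pct(double baseline_1000us, double value_600us)
{
	assert(baseline_1000us != 0.0);
	return (value_600us - baseline_1000us) / baseline_1000us * 100.0;
}
```

For example, rel_change_pct(2010, 2115) gives about 5.22, matching the single-client Tbench energy entry.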

2. Dbench
(Energy less is better, Throughput more is better,
PPW--Performance per Watt more is better)
============= =================== ============== =============== ============== =============== ============== =============== ===============
 Trans Delay   Dbench              governor:schedutil, 3-iterations average
============= =================== ============== =============== ============== =============== ============== =============== ===============
 1000us        Clients             1             2               4              8               12             16              32
               Energy/Joules       4890          3779            3567           5157            5611           6500            8163
               Throughput/(MB/s)   327           167             220            577             775            938             1397
               PPW                 0.0668        0.0441          0.0616         0.1118          0.1381         0.1443          0.1711
 600us         Clients             1             2               4              8               12             16              32
               Energy/Joules       4915  (0.51%) 4912  (29.98%)  3506  (-1.71%) 4907  (-4.85% ) 5011 (-10.69%) 5672  (-12.74%) 8141  (-0.27%)
               Throughput/(MB/s)   348   (6.42%) 284   (70.06%)  220   (0.00% ) 518   (-10.23%) 712  (-8.13% ) 854   (-8.96% ) 1475  (5.58% )
               PPW                 0.0708(5.99%) 0.0578(31.07%)  0.0627(1.79% ) 0.1055(-5.64% ) 0.142(2.82%  ) 0.1505(4.30%  ) 0.1811(5.84% )
============= =================== ============== =============== ============== =============== ============== =============== ===============

3. Hackbench (less time is better)
============= =========================== ==========================
  hackbench     governor:schedutil
============= =========================== ==========================
  Trans Delay   Process Mode Ave time(s)  Thread Mode Ave time(s)
  1000us        14.484                      14.484
  600us         14.418(-0.46%)              15.41(+6.39%)
============= =========================== ==========================

4. Perf_sched_bench (less time is better)
============= =================== ============== ============== ============== =============== =============== =============
 Trans Delay  perf_sched_bench    governor:schedutil
============= =================== ============== ============== ============== =============== =============== =============
  1000us        Groups             1             2              4              8               12              24
                AveTime(s)        1.64          2.851          5.878          11.636          16.093          26.395
  600us         Groups             1             2              4              8               12              24
                AveTime(s)        1.69(3.05%)   2.845(-0.21%)  5.843(-0.60%)  11.576(-0.52%)  16.092(-0.01%)  26.32(-0.28%)
============= =================== ============== ============== ============== =============== =============== =============

5. Sysbench (higher is better)
============= ================== ============== ================= ============== ================ =============== =================
  Sysbench    governor:schedutil
============= ================== ============== ================= ============== ================ =============== =================
  1000us      Thread             1               2                4              8                12               24
              Ave events         6020.98         12273.39         24119.82       46171.57         47074.37         47831.72
  600us       Thread             1               2                4              8                12               24
              Ave events         6154.82(2.22%)  12271.63(-0.01%) 24392.5(1.13%) 46117.64(-0.12%) 46852.19(-0.47%) 47678.92(-0.32%)
============= ================== ============== ================= ============== ================ =============== =================

In conclusion, a shorter CPU clock transition delay noticeably
improves PPW in the Dbench test at most client counts (up to
+31.07% with 2 clients), while performance in Tbench, Hackbench,
Perf_sched_bench and Sysbench stays largely stable.

Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 2015c9fcc3c9..8c8594f67af6 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -50,6 +50,7 @@
 
 #define AMD_PSTATE_TRANSITION_LATENCY	20000
 #define AMD_PSTATE_TRANSITION_DELAY	1000
+#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY	600
 #define AMD_PSTATE_PREFCORE_THRESHOLD	166
 
 /*
@@ -868,7 +869,11 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 	}
 
 	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
-	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
+
+	if (cpu_feature_enabled(X86_FEATURE_FAST_CPPC))
+		policy->transition_delay_us = AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY;
+	else
+		policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
 
 	policy->min = min_freq;
 	policy->max = max_freq;
-- 
2.34.1



end of thread

Thread overview: 8+ messages
2024-04-28  9:11 [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Xiaojian Du
2024-04-28  9:11 ` [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
2024-04-28  9:54 ` [PATCH 1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag Borislav Petkov
2024-04-28 10:59   ` Du, Xiaojian
2024-04-28 11:05     ` Borislav Petkov
2024-04-29  7:03 Xiaojian Du
2024-04-29  7:03 ` [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay for some models Xiaojian Du
2024-05-11  6:54   ` Yuan, Perry
2024-05-22 16:46   ` Mario Limonciello
