* [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
@ 2022-09-21  6:36 K Prateek Nayak
  2022-09-21  8:47 ` Borislav Petkov
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: K Prateek Nayak @ 2022-09-21  6:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: rafael, lenb, linux-acpi, linux-pm, dave.hansen, bp, tglx, andi,
	puwen, mario.limonciello, peterz, rui.zhang, gpiccoli,
	daniel.lezcano, ananth.narayan, gautham.shenoy, K Prateek Nayak,
	Calvin Ong, stable, regressions

Processors based on the Zen microarchitecture support IOPORT based deeper
C-states. The idle driver reads the port at
acpi_gbl_FADT.xpm_timer_block.address in the IOPORT based C-state exit
path. This read is described as a "Dummy wait op" and has been around
since ACPI's introduction to Linux, dating back to Andy Grover's
Mar 14, 2002 posting [1].
The comment above the dummy operation was elaborated by Andreas Mohr back
in 2006 in commit b488f02156d3d ("ACPI: restore comment justifying 'extra'
P_LVLx access") [2] where the commit log claims:
"this dummy read was about: STPCLK# doesn't get asserted in time on
(some) chipsets, which is why we need to have a dummy I/O read to delay
further instruction processing until the CPU is fully stopped."

However, sampling certain workloads with IBS on an AMD Zen3 system shows
that a significant amount of time is spent in the dummy op, which
incorrectly gets accounted as C-State residency. A large C-State
residency value can prime the cpuidle governor to recommend a deeper
C-State during the subsequent idle instances, starting a vicious cycle,
leading to performance degradation on workloads that rapidly switch
between busy and idle phases.
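
The feedback loop described above can be illustrated with a toy model.
This is only a sketch: it is not the kernel's menu or teo governor, and
the threshold, the averaging rule and every name in it (toy_governor,
toy_simulate, and so on) are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical toy governor: NOT the kernel's menu or teo governor. */
struct toy_governor {
	double predicted_idle_us;	/* running estimate of idle duration */
};

/* Assumed break-even point for the deep (IOPORT based) state. */
#define TOY_C2_THRESHOLD_US 400.0

/* Pick the deep state when the predicted idle time is long enough. */
static bool toy_select_deep(const struct toy_governor *g)
{
	return g->predicted_idle_us >= TOY_C2_THRESHOLD_US;
}

/* Fold the residency measured for the last idle period into the estimate. */
static void toy_reflect(struct toy_governor *g, double measured_us)
{
	g->predicted_idle_us = 0.5 * g->predicted_idle_us + 0.5 * measured_us;
}

/*
 * Simulate `rounds` idle periods whose true length is `true_idle_us`.
 * Whenever the deep state is chosen, `exit_overhead_us` (standing in for
 * the dummy wait) is wrongly counted as residency. Returns the number of
 * rounds that chose the deep state.
 */
static int toy_simulate(struct toy_governor *g, double true_idle_us,
			double exit_overhead_us, int rounds)
{
	int deep = 0;

	for (int i = 0; i < rounds; i++) {
		double measured = true_idle_us;

		if (toy_select_deep(g)) {
			deep++;
			measured += exit_overhead_us;
		}
		toy_reflect(g, measured);
	}
	return deep;
}
```

With a governor primed by one long idle period (predicted_idle_us = 500)
and short 50us idle periods afterwards, the model leaves the deep state
after one round when the exit overhead is zero, but stays in it for every
round when a large overhead is accounted as residency.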

One such workload is tbench where a massive performance degradation can
be observed during certain runs. Following are some statistics gathered
by running tbench with 128 clients, on a dual socket (2 x 64C/128T) Zen3
system with the baseline kernel, baseline kernel keeping C2 disabled,
and baseline kernel with this patch applied keeping C2 enabled:

baseline kernel was tip:sched/core at
commit f3dd3f674555 ("sched: Remove the limitation of WF_ON_CPU on
wakelist if wakee cpu is idle")

Kernel        : baseline      baseline + C2 disabled   baseline + patch

Min (MB/s)    : 2215.06       33072.10 (+1393.05%)     33016.10 (+1390.52%)
Max (MB/s)    : 32938.80      34399.10                 34774.50
Median (MB/s) : 32191.80      33476.60                 33805.70
AMean (MB/s)  : 22448.55      33649.27 (+49.89%)       33865.43 (+50.85%)
AMean Stddev  : 17526.70      680.14                   880.72
AMean CoefVar : 78.07%        2.02%                    2.60%
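
For reference, the derived columns in the table above can be recomputed
from the raw numbers. The helpers below are a minimal sketch with
hypothetical names; they are not part of tbench or of any kernel tooling.

```c
/* Relative change of `value` against `base`, in percent. */
static double pct_change(double base, double value)
{
	return (value - base) / base * 100.0;
}

/* Coefficient of variation: standard deviation as a percentage of the mean. */
static double coef_var_pct(double stddev, double mean)
{
	return stddev / mean * 100.0;
}
```

For example, pct_change(2215.06, 33072.10) and
coef_var_pct(17526.70, 22448.55) agree with the Min and CoefVar entries
for the baseline kernel to within about 0.01.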

The data shows there are edge cases that can cause massive regressions
in tbench. Profiling the bad runs with IBS shows a significant amount of
time being spent in the acpi_idle_do_entry method:

Overhead  Command          Shared Object             Symbol
  74.76%  swapper          [kernel.kallsyms]         [k] acpi_idle_do_entry
   0.71%  tbench           [kernel.kallsyms]         [k] update_sd_lb_stats.constprop.0
   0.69%  tbench_srv       [kernel.kallsyms]         [k] update_sd_lb_stats.constprop.0
   0.49%  swapper          [kernel.kallsyms]         [k] psi_group_change
   ...

Annotating the acpi_idle_do_entry method reveals that almost all of the
time in it is spent on the port I/O in wait_for_freeze():

  0.14 │      in     (%dx),%al       # <------ First "in" corresponding to inb(cx->address)
  0.51 │      mov    0x144d64d(%rip),%rax
  0.00 │      test   $0x80000000,%eax
       │    ↓ jne    62 	     # <------ Skip if running in guest
  0.00 │      mov    0x19800c3(%rip),%rdx
 99.33 │      in     (%dx),%eax      # <------ Second "in" corresponding to inl(acpi_gbl_FADT.xpm_timer_block.address)
  0.00 │62:   mov    -0x8(%rbp),%r12
  0.00 │      leave
  0.00 │    ← ret

This overhead is reflected in the C2 residency on the test system where
C2 is an IOPORT based C-State. The total C-state residency reported by
"cpupower idle-info" on CPU0 for the good and bad cases over the 80s
tbench run is as follows (all numbers are in microseconds):

                            Good Run         Bad Run
                           (Baseline)

POLL:                          43338            6231  (-85.62%)
C1 (MWAIT Based):           23576156          363861  (-98.45%)
C2 (IOPORT Based):          10781218        77027280  (+614.45%)

The larger residency value in the bad case leads to the system
recommending the C2 state again for subsequent idle instances. The
pattern lasts till the end of the tbench run. Following is the breakdown
of "entry_method" passed to acpi_idle_do_entry during the good and bad
runs:

                                                                Good Run    Bad Run
                                                               (Baseline)

Number of times acpi_idle_do_entry was called:                   6149573    6149050  (-0.01%)
 |-> Number of times entry_method was "ACPI_CSTATE_FFH":         6141494      88144  (-98.56%)
 |-> Number of times entry_method was "ACPI_CSTATE_HALT":              0          0  (+0.00%)
 |-> Number of times entry_method was "ACPI_CSTATE_SYSTEMIO":       8079    6060906  (+74920.49%)
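
Another way to read the breakdown above is as the share of all
acpi_idle_do_entry calls that used the SYSTEMIO entry method. The helper
below is a small sketch with a hypothetical name, shown only to make that
arithmetic explicit.

```c
/* Share of `part` in `total`, in percent; returns 0 when `total` is 0. */
static double share_pct(unsigned long long part, unsigned long long total)
{
	if (total == 0)
		return 0.0;
	return (double)part / (double)total * 100.0;
}
```

Applied to the counts above, share_pct(8079, 6149573) is a little over
0.13 for the good run, while share_pct(6060906, 6149050) is a little over
98.5 for the bad run.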

For processors based on the Zen microarchitecture, this dummy wait op is
unnecessary and can be skipped when choosing IOPORT based C-States to
avoid polluting the C-state residency information.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=972c16130d9dc182cedcdd408408d9eacc7d6a2d [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b488f02156d3deb08f5ad7816d565c370a8cc6f1 [2]

Suggested-by: Calvin Ong <calvin.ong@amd.com>
Cc: stable@vger.kernel.org
Cc: regressions@lists.linux.dev
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 drivers/acpi/processor_idle.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 16a1663d02d4..18850aa2b79b 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -528,8 +528,11 @@ static int acpi_idle_bm_check(void)
 static void wait_for_freeze(void)
 {
 #ifdef	CONFIG_X86
-	/* No delay is needed if we are in guest */
-	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+	/*
+	 * No delay is needed if we are in guest or on a processor
+	 * based on the Zen microarchitecture.
+	 */
+	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || boot_cpu_has(X86_FEATURE_ZEN))
 		return;
 #endif
 	/* Dummy wait op - must do something useless after P_LVL2 read
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21  6:36 [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture K Prateek Nayak
@ 2022-09-21  8:47 ` Borislav Petkov
  2022-09-21 10:39   ` K Prateek Nayak
  2022-09-21 14:15 ` Dave Hansen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-09-21  8:47 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: linux-kernel, rafael, lenb, linux-acpi, linux-pm, dave.hansen,
	tglx, andi, puwen, mario.limonciello, peterz, rui.zhang,
	gpiccoli, daniel.lezcano, ananth.narayan, gautham.shenoy,
	Calvin Ong, stable, regressions

On Wed, Sep 21, 2022 at 12:06:38PM +0530, K Prateek Nayak wrote:
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 16a1663d02d4..18850aa2b79b 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -528,8 +528,11 @@ static int acpi_idle_bm_check(void)
>  static void wait_for_freeze(void)
>  {
>  #ifdef	CONFIG_X86
> -	/* No delay is needed if we are in guest */
> -	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> +	/*
> +	 * No delay is needed if we are in guest or on a processor
> +	 * based on the Zen microarchitecture.
> +	 */
> +	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || boot_cpu_has(X86_FEATURE_ZEN))

s/boot_cpu_has/cpu_feature_enabled/g

while at it.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21  8:47 ` Borislav Petkov
@ 2022-09-21 10:39   ` K Prateek Nayak
  2022-09-21 13:10     ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: K Prateek Nayak @ 2022-09-21 10:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, rafael, lenb, linux-acpi, linux-pm, dave.hansen,
	tglx, andi, puwen, mario.limonciello, peterz, rui.zhang,
	gpiccoli, daniel.lezcano, ananth.narayan, gautham.shenoy,
	Calvin Ong, stable, regressions

Hello Boris,

On 9/21/2022 2:17 PM, Borislav Petkov wrote:
> On Wed, Sep 21, 2022 at 12:06:38PM +0530, K Prateek Nayak wrote:
>> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
>> index 16a1663d02d4..18850aa2b79b 100644
>> --- a/drivers/acpi/processor_idle.c
>> +++ b/drivers/acpi/processor_idle.c
>> @@ -528,8 +528,11 @@ static int acpi_idle_bm_check(void)
>>  static void wait_for_freeze(void)
>>  {
>>  #ifdef	CONFIG_X86
>> -	/* No delay is needed if we are in guest */
>> -	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
>> +	/*
>> +	 * No delay is needed if we are in guest or on a processor
>> +	 * based on the Zen microarchitecture.
>> +	 */
>> +	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || boot_cpu_has(X86_FEATURE_ZEN))
> 
> s/boot_cpu_has/cpu_feature_enabled/g

I was not aware of cpu_feature_enabled() and it makes perfect sense to
use it here. Thank you for bringing it to my notice.
I'll make this change in v2.

> 
> while at it.
> 

--
Thanks and Regards,
Prateek


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21 10:39   ` K Prateek Nayak
@ 2022-09-21 13:10     ` Borislav Petkov
  0 siblings, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2022-09-21 13:10 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: linux-kernel, rafael, lenb, linux-acpi, linux-pm, dave.hansen,
	tglx, andi, puwen, mario.limonciello, peterz, rui.zhang,
	gpiccoli, daniel.lezcano, ananth.narayan, gautham.shenoy,
	Calvin Ong, stable, regressions

On Wed, Sep 21, 2022 at 04:09:16PM +0530, K Prateek Nayak wrote:
> I was not aware of cpu_feature_enabled() and it makes perfect sense to
> use it here.

It makes no difference what the callers use - we simply want to unify the
interfaces and not have boot_cpu* and cpu_feature_* and so on. One is
enough and we want to use cpu_feature_enabled() everywhere possible.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21  6:36 [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture K Prateek Nayak
  2022-09-21  8:47 ` Borislav Petkov
@ 2022-09-21 14:15 ` Dave Hansen
  2022-09-21 19:51   ` Borislav Petkov
                     ` (2 more replies)
  2022-09-21 15:00 ` Peter Zijlstra
  2022-09-22 16:44 ` Dave Hansen
  3 siblings, 3 replies; 30+ messages in thread
From: Dave Hansen @ 2022-09-21 14:15 UTC (permalink / raw)
  To: K Prateek Nayak, linux-kernel
  Cc: rafael, lenb, linux-acpi, linux-pm, dave.hansen, bp, tglx, andi,
	puwen, mario.limonciello, peterz, rui.zhang, gpiccoli,
	daniel.lezcano, ananth.narayan, gautham.shenoy, Calvin Ong,
	stable, regressions

On 9/20/22 23:36, K Prateek Nayak wrote:
> +	/*
> +	 * No delay is needed if we are in guest or on a processor
> +	 * based on the Zen microarchitecture.
> +	 */
> +	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || boot_cpu_has(X86_FEATURE_ZEN))
>  		return;

In the end, the delay is because of buggy, circa 2006 chipsets?  So, we
use a CPU vendor specific check to approximate that the chipset is
recent and not affected by the bug?  If so, is there no better way to
check for a newer chipset than this?

Do X86_FEATURE_ZEN CPUs just have unusually painful
inl(acpi_fadt.xpm_tmr_blk.address) implementations?  Is that why we
noticed all of a sudden?



* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21  6:36 [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture K Prateek Nayak
  2022-09-21  8:47 ` Borislav Petkov
  2022-09-21 14:15 ` Dave Hansen
@ 2022-09-21 15:00 ` Peter Zijlstra
  2022-09-21 19:48   ` Borislav Petkov
  2022-09-22 16:44 ` Dave Hansen
  3 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2022-09-21 15:00 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: linux-kernel, rafael, lenb, linux-acpi, linux-pm, dave.hansen,
	bp, tglx, andi, puwen, mario.limonciello, rui.zhang, gpiccoli,
	daniel.lezcano, ananth.narayan, gautham.shenoy, Calvin Ong,
	stable, regressions

On Wed, Sep 21, 2022 at 12:06:38PM +0530, K Prateek Nayak wrote:
> Processors based on the Zen microarchitecture support IOPORT based deeper
> C-states. 

I've just gotta ask; why the heck are you using IO port based idle
states in 2022 ?!?! You have MWAIT, right?


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21 15:00 ` Peter Zijlstra
@ 2022-09-21 19:48   ` Borislav Petkov
  2022-09-22  8:17     ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-09-21 19:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: K Prateek Nayak, linux-kernel, rafael, lenb, linux-acpi,
	linux-pm, dave.hansen, tglx, andi, puwen, mario.limonciello,
	rui.zhang, gpiccoli, daniel.lezcano, ananth.narayan,
	gautham.shenoy, Calvin Ong, stable, regressions

On Wed, Sep 21, 2022 at 05:00:35PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 21, 2022 at 12:06:38PM +0530, K Prateek Nayak wrote:
> > Processors based on the Zen microarchitecture support IOPORT based deeper
> > C-states. 
> 
> I've just gotta ask; why the heck are you using IO port based idle
> states in 2022 ?!?! You have MWAIT, right?

They have both. And both are Intel technology. And as I'm sure you
know AMD can't do their own thing - they kinda have to follow Intel.
Unfortunately.

Are you saying modern Intel chipsets don't do IO-based C-states anymore?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21 14:15 ` Dave Hansen
@ 2022-09-21 19:51   ` Borislav Petkov
  2022-09-21 19:55     ` Limonciello, Mario
  2022-09-22  3:58     ` Ananth Narayan
  2022-09-22  5:44   ` K Prateek Nayak
       [not found]   ` <20220923160106.9297-1-ermorton@ericmoronsm1mbp.amd.com>
  2 siblings, 2 replies; 30+ messages in thread
From: Borislav Petkov @ 2022-09-21 19:51 UTC (permalink / raw)
  To: Dave Hansen
  Cc: K Prateek Nayak, linux-kernel, rafael, lenb, linux-acpi,
	linux-pm, dave.hansen, tglx, andi, puwen, mario.limonciello,
	peterz, rui.zhang, gpiccoli, daniel.lezcano, ananth.narayan,
	gautham.shenoy, Calvin Ong, stable, regressions

On Wed, Sep 21, 2022 at 07:15:07AM -0700, Dave Hansen wrote:
> In the end, the delay is because of buggy, circa 2006 chipsets?  So, we
> use a CPU vendor specific check to approximate that the chipset is
> recent and not affected by the bug?  If so, is there no better way to
> check for a newer chipset than this?

So I did some git archeology but that particular addition is in some
conglomerate, glued-together patch from 2007 which added the cpuidle
tree:

commit 4f86d3a8e297205780cca027e974fd5f81064780
Author: Len Brown <len.brown@intel.com>
Date:   Wed Oct 3 18:58:00 2007 -0400

    cpuidle: consolidate 2.6.22 cpuidle branch into one patch


so the most precise check here should be to limit that dummy read to
that Intel chipset which needed it. Damned if I knew how to figure out
which...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21 19:51   ` Borislav Petkov
@ 2022-09-21 19:55     ` Limonciello, Mario
  2022-09-22  3:58     ` Ananth Narayan
  1 sibling, 0 replies; 30+ messages in thread
From: Limonciello, Mario @ 2022-09-21 19:55 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen
  Cc: K Prateek Nayak, linux-kernel, rafael, lenb, linux-acpi,
	linux-pm, dave.hansen, tglx, andi, puwen, peterz, rui.zhang,
	gpiccoli, daniel.lezcano, ananth.narayan, gautham.shenoy,
	Calvin Ong, stable, regressions

On 9/21/2022 14:51, Borislav Petkov wrote:
> On Wed, Sep 21, 2022 at 07:15:07AM -0700, Dave Hansen wrote:
>> In the end, the delay is because of buggy, circa 2006 chipsets?  So, we
>> use a CPU vendor specific check to approximate that the chipset is
>> recent and not affected by the bug?  If so, is there no better way to
>> check for a newer chipset than this?
> 
> So I did some git archeology but that particular addition is in some
> conglomerate, glued-together patch from 2007 which added the cpuidle
> tree:
> 
> commit 4f86d3a8e297205780cca027e974fd5f81064780
> Author: Len Brown <len.brown@intel.com>
> Date:   Wed Oct 3 18:58:00 2007 -0400
> 
>      cpuidle: consolidate 2.6.22 cpuidle branch into one patch
> 
> 
> so the most precise check here should be to limit that dummy read to
> that Intel chipset which needed it. Damned if I knew how to figure out
> which...
> 

Functionally, most Intel platforms use intel_idle these days though,
right?


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21 19:51   ` Borislav Petkov
  2022-09-21 19:55     ` Limonciello, Mario
@ 2022-09-22  3:58     ` Ananth Narayan
  1 sibling, 0 replies; 30+ messages in thread
From: Ananth Narayan @ 2022-09-22  3:58 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen
  Cc: K Prateek Nayak, linux-kernel, rafael, lenb, linux-acpi,
	linux-pm, dave.hansen, tglx, andi, puwen, mario.limonciello,
	peterz, rui.zhang, gpiccoli, daniel.lezcano, gautham.shenoy,
	Calvin Ong, stable, regressions



On 22-09-2022 01:21 am, Borislav Petkov wrote:
> On Wed, Sep 21, 2022 at 07:15:07AM -0700, Dave Hansen wrote:
>> In the end, the delay is because of buggy, circa 2006 chipsets?  So, we
>> use a CPU vendor specific check to approximate that the chipset is
>> recent and not affected by the bug?  If so, is there no better way to
>> check for a newer chipset than this?
> 
> So I did some git archeology but that particular addition is in some
> conglomerate, glued-together patch from 2007 which added the cpuidle
> tree:
> 
> commit 4f86d3a8e297205780cca027e974fd5f81064780
> Author: Len Brown <len.brown@intel.com>
> Date:   Wed Oct 3 18:58:00 2007 -0400
> 
>     cpuidle: consolidate 2.6.22 cpuidle branch into one patch

In fact, the code has moved around a fair bit and the check in its initial form
goes as far back as ACPI's posting for inclusion in the kernel in March 2002
[1]. We could not find any way of digging further back, yet.

Prior to that, I think the ACPI enablement code was being released independent
of the kernel per https://kernel.org/doc/ols/2004/ols2004v1-pages-121-132.pdf
and was included in Andrew's mm tree for a while.

From https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git
the first tag that contains code with the dummy read is v2.5.7 AFAICS.

Ananth

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=972c16130d9dc182cedcdd408408d9eacc7d6a2d


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21 14:15 ` Dave Hansen
  2022-09-21 19:51   ` Borislav Petkov
@ 2022-09-22  5:44   ` K Prateek Nayak
       [not found]   ` <20220923160106.9297-1-ermorton@ericmoronsm1mbp.amd.com>
  2 siblings, 0 replies; 30+ messages in thread
From: K Prateek Nayak @ 2022-09-22  5:44 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: rafael, lenb, linux-acpi, linux-pm, dave.hansen, bp, tglx, andi,
	puwen, mario.limonciello, peterz, rui.zhang, gpiccoli,
	daniel.lezcano, ananth.narayan, gautham.shenoy, Calvin Ong,
	stable, regressions

Hello Dave,

On 9/21/2022 7:45 PM, Dave Hansen wrote:
> On 9/20/22 23:36, K Prateek Nayak wrote:
>> +	/*
>> +	 * No delay is needed if we are in guest or on a processor
>> +	 * based on the Zen microarchitecture.
>> +	 */
>> +	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || boot_cpu_has(X86_FEATURE_ZEN))
>>  		return;
> 
> In the end, the delay is because of buggy, circa 2006 chipsets?  So, we
> use a CPU vendor specific check to approximate that the chipset is
> recent and not affected by the bug?  If so, is there no better way to
> check for a newer chipset than this?

Elsewhere in the thread, people have noted that the faulty chipsets seem to
go all the way back to pre-2002. Andreas's comment was added in 2006 but we
have no way of knowing if it is limited only to chipsets prior to 2006. If
anyone can confirm a clean cut-off point when this was no longer required,
perhaps we can limit this dummy wait to the older chipsets by annotating
them with an X86_BUG_STPCLK quirk.
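
The difference between the posted patch and such a quirk can be sketched
in user space as follows. This is only an illustration under stated
assumptions: struct cpu_caps and both helpers are hypothetical, the
booleans merely model the kernel's feature bits, and X86_BUG_STPCLK does
not exist today.

```c
#include <stdbool.h>

/* User-space stand-ins for kernel CPU capability bits (all hypothetical). */
struct cpu_caps {
	bool hypervisor;	/* models X86_FEATURE_HYPERVISOR */
	bool zen;		/* models X86_FEATURE_ZEN */
	bool bug_stpclk;	/* models the proposed X86_BUG_STPCLK quirk */
};

/* Opt-out policy, as in the posted patch: wait unless known unnecessary. */
static bool dummy_wait_opt_out(const struct cpu_caps *c)
{
	return !(c->hypervisor || c->zen);
}

/* Opt-in policy, as a quirk would work: wait only on flagged hardware. */
static bool dummy_wait_opt_in(const struct cpu_caps *c)
{
	return !c->hypervisor && c->bug_stpclk;
}
```

The two policies agree on Zen and on flagged legacy hardware. They differ
on everything else: the opt-out check keeps the dummy wait on any
unflagged non-Zen system, while the opt-in quirk drops it there.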

> 
> Do X86_FEATURE_ZEN CPUs just have unusually painful
> inl(acpi_fadt.xpm_tmr_blk.address) implementations?

Yes. The issue becomes more pronounced with increased core counts when many
cores exit from C2 simultaneously. The core density is especially high on
X86_FEATURE_ZEN chipsets, none of which require a dummy wait op to ensure
correct behavior. Hence, we used the feature check to skip it.

> Is that why we
> noticed all of a sudden?
> 

We saw run-to-run variance in tbench with 128 clients as a part of our
scheduler regression runs. Originally we attributed it to the tbench
threads not being spread around uniformly, but the problem persisted even
when we ensured that the initial task placement spreads them out. Further
analysis showed that significant time was spent in exit from C2 in the
bad runs.

--
Thanks and Regards,
Prateek


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21 19:48   ` Borislav Petkov
@ 2022-09-22  8:17     ` Peter Zijlstra
  2022-09-22 15:21       ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2022-09-22  8:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: K Prateek Nayak, linux-kernel, rafael, lenb, linux-acpi,
	linux-pm, dave.hansen, tglx, andi, puwen, mario.limonciello,
	rui.zhang, gpiccoli, daniel.lezcano, ananth.narayan,
	gautham.shenoy, Calvin Ong, stable, regressions

On Wed, Sep 21, 2022 at 09:48:13PM +0200, Borislav Petkov wrote:
> On Wed, Sep 21, 2022 at 05:00:35PM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 21, 2022 at 12:06:38PM +0530, K Prateek Nayak wrote:
> > > Processors based on the Zen microarchitecture support IOPORT based deeper
> > > C-states. 
> > 
> > I've just gotta ask; why the heck are you using IO port based idle
> > states in 2022 ?!?! You have MWAIT, right?
> 
> They have both. And both are Intel technology. And as I'm sure you
> know AMD can't do their own thing - they kinda have to follow Intel.
> Unfortunately.
> 
> Are you saying modern Intel chipsets don't do IO-based C-states anymore?

I've no idea what they do, but Linux exclusively uses MWAIT on Intel as
per intel_idle.c.

MWAIT also cuts down on IPIs because it wakes from the TIF write.



* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22  8:17     ` Peter Zijlstra
@ 2022-09-22 15:21       ` Rafael J. Wysocki
  2022-09-22 15:36         ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2022-09-22 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, K Prateek Nayak, Linux Kernel Mailing List,
	Rafael J. Wysocki, Len Brown, ACPI Devel Maling List, Linux PM,
	Dave Hansen, Thomas Gleixner, andi, Pu Wen, Mario Limonciello,
	Zhang, Rui, Guilherme G. Piccoli, Daniel Lezcano, ananth.narayan,
	gautham.shenoy, Calvin Ong, Stable, regressions

On Thu, Sep 22, 2022 at 10:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Sep 21, 2022 at 09:48:13PM +0200, Borislav Petkov wrote:
> > On Wed, Sep 21, 2022 at 05:00:35PM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 21, 2022 at 12:06:38PM +0530, K Prateek Nayak wrote:
> > > > Processors based on the Zen microarchitecture support IOPORT based deeper
> > > > C-states.
> > >
> > > I've just gotta ask; why the heck are you using IO port based idle
> > > states in 2022 ?!?! You have MWAIT, right?
> >
> > They have both. And both are Intel technology. And as I'm sure you
> > know AMD can't do their own thing - they kinda have to follow Intel.
> > Unfortunately.
> >
> > Are you saying modern Intel chipsets don't do IO-based C-states anymore?
>
> I've no idea what they do, but Linux exclusively uses MWAIT on Intel as
> per intel_idle.c.

Well, it can be forced to use ACPI idle instead.

> MWAIT also cuts down on IPIs because it wakes from the TIF write.

Right.


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 15:21       ` Rafael J. Wysocki
@ 2022-09-22 15:36         ` Borislav Petkov
  2022-09-22 15:53           ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-09-22 15:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, K Prateek Nayak, Linux Kernel Mailing List,
	Len Brown, ACPI Devel Maling List, Linux PM, Dave Hansen,
	Thomas Gleixner, andi, Pu Wen, Mario Limonciello, Zhang, Rui,
	Guilherme G. Piccoli, Daniel Lezcano, ananth.narayan,
	gautham.shenoy, Calvin Ong, Stable, regressions

On Thu, Sep 22, 2022 at 05:21:21PM +0200, Rafael J. Wysocki wrote:
> Well, it can be forced to use ACPI idle instead.

Yeah, I did that earlier. The dummy IO read in question costs ~3K on
average on my Coffeelake box here.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 15:36         ` Borislav Petkov
@ 2022-09-22 15:53           ` Rafael J. Wysocki
  2022-09-22 16:36             ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2022-09-22 15:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rafael J. Wysocki, Peter Zijlstra, K Prateek Nayak,
	Linux Kernel Mailing List, Len Brown, ACPI Devel Maling List,
	Linux PM, Dave Hansen, Thomas Gleixner, andi, Pu Wen,
	Mario Limonciello, Zhang, Rui, Guilherme G. Piccoli,
	Daniel Lezcano, ananth.narayan, gautham.shenoy, Calvin Ong,
	Stable, regressions

On Thu, Sep 22, 2022 at 5:36 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Sep 22, 2022 at 05:21:21PM +0200, Rafael J. Wysocki wrote:
> > Well, it can be forced to use ACPI idle instead.
>
> Yeah, I did that earlier. The dummy IO read in question costs ~3K on
> average on my Coffeelake box here.

Well, that's the cost of forcing something non-default.


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 15:53           ` Rafael J. Wysocki
@ 2022-09-22 16:36             ` Dave Hansen
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2022-09-22 16:36 UTC (permalink / raw)
  To: Rafael J. Wysocki, Borislav Petkov
  Cc: Peter Zijlstra, K Prateek Nayak, Linux Kernel Mailing List,
	Len Brown, ACPI Devel Maling List, Linux PM, Dave Hansen,
	Thomas Gleixner, andi, Pu Wen, Mario Limonciello, Zhang, Rui,
	Guilherme G. Piccoli, Daniel Lezcano, ananth.narayan,
	gautham.shenoy, Calvin Ong, Stable, regressions

On 9/22/22 08:53, Rafael J. Wysocki wrote:
> On Thu, Sep 22, 2022 at 5:36 PM Borislav Petkov <bp@alien8.de> wrote:
>> On Thu, Sep 22, 2022 at 05:21:21PM +0200, Rafael J. Wysocki wrote:
>>> Well, it can be forced to use ACPI idle instead.
>> Yeah, I did that earlier. The dummy IO read in question costs ~3K on
>> average on my Coffeelake box here.
> Well, that's the cost of forcing something non-default.

Just to be clear: The _original_ AMD Zen problem that this thread is
addressing is from a *default* configuration, right?  It isn't that
someone overrode the idle defaults?


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-21  6:36 [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture K Prateek Nayak
                   ` (2 preceding siblings ...)
  2022-09-21 15:00 ` Peter Zijlstra
@ 2022-09-22 16:44 ` Dave Hansen
  2022-09-22 16:54   ` K Prateek Nayak
  3 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2022-09-22 16:44 UTC (permalink / raw)
  To: K Prateek Nayak, linux-kernel
  Cc: rafael, lenb, linux-acpi, linux-pm, dave.hansen, bp, tglx, andi,
	puwen, mario.limonciello, peterz, rui.zhang, gpiccoli,
	daniel.lezcano, ananth.narayan, gautham.shenoy, Calvin Ong,
	stable, regressions

On 9/20/22 23:36, K Prateek Nayak wrote:
> Cc: stable@vger.kernel.org
> Cc: regressions@lists.linux.dev

*Is* this a regression?


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 16:44 ` Dave Hansen
@ 2022-09-22 16:54   ` K Prateek Nayak
  2022-09-22 17:01     ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: K Prateek Nayak @ 2022-09-22 16:54 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: rafael, lenb, linux-acpi, linux-pm, dave.hansen, bp, tglx, andi,
	puwen, mario.limonciello, peterz, rui.zhang, gpiccoli,
	daniel.lezcano, ananth.narayan, gautham.shenoy, Calvin Ong,
	stable, regressions

Hello Dave,

On 9/22/2022 10:14 PM, Dave Hansen wrote:
> On 9/20/22 23:36, K Prateek Nayak wrote:
>> Cc: stable@vger.kernel.org
>> Cc: regressions@lists.linux.dev
> 
> *Is* this a regression?

On second thought, it is not a regression.
Will remove the tag on v2.
--
Thanks and Regards,
Prateek


* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 16:54   ` K Prateek Nayak
@ 2022-09-22 17:01     ` Dave Hansen
  2022-09-22 17:48       ` Limonciello, Mario
  2022-09-22 19:42       ` Andreas Mohr
  0 siblings, 2 replies; 30+ messages in thread
From: Dave Hansen @ 2022-09-22 17:01 UTC (permalink / raw)
  To: K Prateek Nayak, linux-kernel
  Cc: rafael, lenb, linux-acpi, linux-pm, dave.hansen, bp, tglx, andi,
	puwen, mario.limonciello, peterz, rui.zhang, gpiccoli,
	daniel.lezcano, ananth.narayan, gautham.shenoy, Calvin Ong,
	stable, regressions

[-- Attachment #1: Type: text/plain, Size: 519 bytes --]

On 9/22/22 09:54, K Prateek Nayak wrote:
> 
> On 9/22/2022 10:14 PM, Dave Hansen wrote:
>> On 9/20/22 23:36, K Prateek Nayak wrote:
>>> Cc: stable@vger.kernel.org
>>> Cc: regressions@lists.linux.dev
>> *Is* this a regression?
> On second thought, it is not a regression.
> Will remove the tag on v2.

What were you planning for v2?

Rafael suggested something like the attached patch.  It's not nearly as
fragile as the Zen check you proposed earlier.

Any testing or corrections on the commentary would be appreciated.

[-- Attachment #2: 0001-ACPI-processor-idle-Limit-Dummy-wait-workaround-to-o.patch --]
[-- Type: text/x-patch, Size: 2918 bytes --]

From 54e4668122d447ee80ec465f244b19d968c4a7c6 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen@intel.com>
Date: Thu, 22 Sep 2022 09:22:26 -0700
Subject: [PATCH] ACPI: processor idle: Limit "Dummy wait" workaround to old
 Intel systems

Old, circa 2002 chipsets have a bug: they don't go idle when they are
supposed to.  So, a workaround was added to slow the CPU down and
ensure that the CPU waits a bit for the chipset to actually go idle.
This workaround is ancient and has been in place in some form since
the original kernel ACPI implementation.

But, this workaround is very painful on modern systems.  The "inb()"
can take thousands of cycles (see Link: for some more detailed
archaeology).

First and foremost, modern systems should not be using this code.
Typical Intel systems have not used it in over a decade because it
is horribly inferior to MWAIT-based idle.

Despite this, people do seem to be tripping over this workaround on
AMD CPUs today.

Limit the "dummy wait" workaround to Intel systems.  Since this
code is only used on ancient Intel systems, this fix should render it
harmless everywhere else, including the modern AMD ones.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/all/20220921063638.2489-1-kprateek.nayak@amd.com/
---
 drivers/acpi/processor_idle.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 16a1663d02d4..9f40917c49ef 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -531,10 +531,27 @@ static void wait_for_freeze(void)
 	/* No delay is needed if we are in guest */
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
 		return;
+	/*
+	 * Modern (>=Nehalem) Intel systems use ACPI via intel_idle,
+	 * not this code.  Assume that any Intel systems using this
+	 * are ancient and may need the dummy wait.  This also assumes
+	 * that the motivating chipset issue was Intel-only.
+	 */
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return;
 #endif
-	/* Dummy wait op - must do something useless after P_LVL2 read
-	   because chipsets cannot guarantee that STPCLK# signal
-	   gets asserted in time to freeze execution properly. */
+	/*
+	 * Dummy wait op - must do something useless after P_LVL2 read
+	 * because chipsets cannot guarantee that STPCLK# signal gets
+	 * asserted in time to freeze execution properly
+	 *
+	 * This workaround has been in place since the original ACPI
+	 * implementation was merged, circa 2002.
+	 *
+	 * If a profile is pointing to this instruction, please first
+	 * consider moving your system to a more modern idle
+	 * mechanism.
+	 */
 	inl(acpi_gbl_FADT.xpm_timer_block.address);
 }
 
-- 
2.25.1
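
The decision the patch above adds to wait_for_freeze() can be modelled
in user space as a small predicate.  This is only a sketch: the enum
and the function name are hypothetical stand-ins for the kernel's
X86_VENDOR_* constants and the early returns in the diff, and the
is_guest flag stands in for boot_cpu_has(X86_FEATURE_HYPERVISOR).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's X86_VENDOR_* constants. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_HYGON, VENDOR_OTHER };

/*
 * Returns true when the dummy inl() would still be issued under the
 * patch above: never in a guest, and only on Intel CPUs.
 */
static bool dummy_wait_needed(bool is_guest, enum cpu_vendor vendor)
{
	/* Mirrors the existing X86_FEATURE_HYPERVISOR early return. */
	if (is_guest)
		return false;

	/* Mirrors the new "vendor != Intel" early return. */
	return vendor == VENDOR_INTEL;
}
```

With this model, a bare-metal AMD system no longer pays for the read,
while a bare-metal Intel system that still ends up in this path does.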


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* RE: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 17:01     ` Dave Hansen
@ 2022-09-22 17:48       ` Limonciello, Mario
  2022-09-22 18:17         ` Dave Hansen
  2022-09-22 19:42       ` Andreas Mohr
  1 sibling, 1 reply; 30+ messages in thread
From: Limonciello, Mario @ 2022-09-22 17:48 UTC (permalink / raw)
  To: Dave Hansen, Nayak, K Prateek, linux-kernel
  Cc: rafael, lenb, linux-acpi, linux-pm, dave.hansen, bp, tglx, andi,
	puwen, peterz, rui.zhang, gpiccoli, daniel.lezcano, Narayan,
	Ananth, Shenoy, Gautham Ranjal, Ong, Calvin, stable, regressions

[Public]



> -----Original Message-----
> From: Dave Hansen <dave.hansen@intel.com>
> Sent: Thursday, September 22, 2022 12:02
> To: Nayak, K Prateek <KPrateek.Nayak@amd.com>; linux-
> kernel@vger.kernel.org
> Cc: rafael@kernel.org; lenb@kernel.org; linux-acpi@vger.kernel.org; linux-
> pm@vger.kernel.org; dave.hansen@linux.intel.com; bp@alien8.de;
> tglx@linutronix.de; andi@lisas.de; puwen@hygon.cn; Limonciello, Mario
> <Mario.Limonciello@amd.com>; peterz@infradead.org;
> rui.zhang@intel.com; gpiccoli@igalia.com; daniel.lezcano@linaro.org;
> Narayan, Ananth <Ananth.Narayan@amd.com>; Shenoy, Gautham Ranjal
> <gautham.shenoy@amd.com>; Ong, Calvin <Calvin.Ong@amd.com>;
> stable@vger.kernel.org; regressions@lists.linux.dev
> Subject: Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors
> based on the Zen microarchitecture
> 
> On 9/22/22 09:54, K Prateek Nayak wrote:
> >
> > On 9/22/2022 10:14 PM, Dave Hansen wrote:
> >> On 9/20/22 23:36, K Prateek Nayak wrote:
> >>> Cc: stable@vger.kernel.org
> >>> Cc: regressions@lists.linux.dev
> >> *Is* this a regression?
> > On second thought, it is not a regression.
> > Will remove the tag on v2.
> 
> What were you planning for v2?
> 
> Rafael suggested something like the attached patch.  It's not nearly as
> fragile as the Zen check you proposed earlier.
> 
> Any testing or corrections on the commentary would be appreciated.

This seems reasonable to me.  Appreciate the suggestion.

Some small nits:
1) You reference inb() specifically in the commit message, but the code that is skipped is
actually inl().

2) The title says to limit it to old Intel systems, but nothing about this actually enforces that.
It actually is limited to all Intel systems, but effectively won't be used on anything but old
ones because of intel_idle.

As an idea for #2 you could check for CONFIG_INTEL_IDLE in the Intel case and
if it's not defined show a pr_notice_once() type of message trying to tell people to use
Intel Idle instead for better performance.
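
A rough user-space model of that idea is sketched below.  Everything
in it is an assumption made for illustration: HAVE_INTEL_IDLE stands in
for a CONFIG_INTEL_IDLE check, the function name is hypothetical, and
the static flag only imitates the "print once" behaviour that
pr_notice_once() provides in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a CONFIG_INTEL_IDLE check. */
#ifndef HAVE_INTEL_IDLE
#define HAVE_INTEL_IDLE 0
#endif

/*
 * Prints the hint at most once and returns true only on the call that
 * printed it.  Not thread-safe; this is a user-space model only.
 */
static bool hint_intel_idle_once(bool vendor_is_intel)
{
	static bool printed;

	if (!vendor_is_intel || HAVE_INTEL_IDLE || printed)
		return false;

	printed = true;
	fprintf(stderr,
		"ACPI idle: consider enabling intel_idle for better performance\n");
	return true;
}
```

Calling it repeatedly on an Intel system built without intel_idle
prints the message on the first call only.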

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 17:48       ` Limonciello, Mario
@ 2022-09-22 18:17         ` Dave Hansen
  2022-09-22 18:28           ` Limonciello, Mario
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2022-09-22 18:17 UTC (permalink / raw)
  To: Limonciello, Mario, Nayak, K Prateek, linux-kernel
  Cc: rafael, lenb, linux-acpi, linux-pm, dave.hansen, bp, tglx, andi,
	puwen, peterz, rui.zhang, gpiccoli, daniel.lezcano, Narayan,
	Ananth, Shenoy, Gautham Ranjal, Ong, Calvin, stable, regressions

On 9/22/22 10:48, Limonciello, Mario wrote:
> 
> 2) The title says to limit it to old Intel systems, but nothing about this actually enforces that.
> It actually is limited to all Intel systems, but effectively won't be used on anything but old
> ones because of intel_idle.
> 
> As an idea for #2 you could check for CONFIG_INTEL_IDLE in the Intel case and
> if it's not defined show a pr_notice_once() type of message trying to tell people to use
> Intel Idle instead for better performance.

What does that have to do with *this* patch, though?

If you've got CONFIG_INTEL_IDLE disabled, you'll be slow before this
patch.  You'll also be slow after this patch.  It's entirely orthogonal.

I can add a "Practically" to the subject so folks don't confuse it with
some hard limit that is being enforced:

	ACPI: processor idle: Practically limit "Dummy wait" workaround to old Intel systems

BTW, is there seriously a strong technical reason that AMD systems are
still using this code?  Or is it pure inertia?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 18:17         ` Dave Hansen
@ 2022-09-22 18:28           ` Limonciello, Mario
  2022-09-23 11:47             ` Ananth Narayan
  0 siblings, 1 reply; 30+ messages in thread
From: Limonciello, Mario @ 2022-09-22 18:28 UTC (permalink / raw)
  To: Dave Hansen, Nayak, K Prateek, linux-kernel
  Cc: rafael, lenb, linux-acpi, linux-pm, dave.hansen, bp, tglx, andi,
	puwen, peterz, rui.zhang, gpiccoli, daniel.lezcano, Narayan,
	Ananth, Shenoy, Gautham Ranjal, Ong, Calvin, stable, regressions

[Public]

> -----Original Message-----
> From: Dave Hansen <dave.hansen@intel.com>
> Sent: Thursday, September 22, 2022 13:18
> To: Limonciello, Mario <Mario.Limonciello@amd.com>; Nayak, K Prateek
> <KPrateek.Nayak@amd.com>; linux-kernel@vger.kernel.org
> Cc: rafael@kernel.org; lenb@kernel.org; linux-acpi@vger.kernel.org; linux-
> pm@vger.kernel.org; dave.hansen@linux.intel.com; bp@alien8.de;
> tglx@linutronix.de; andi@lisas.de; puwen@hygon.cn; peterz@infradead.org;
> rui.zhang@intel.com; gpiccoli@igalia.com; daniel.lezcano@linaro.org;
> Narayan, Ananth <Ananth.Narayan@amd.com>; Shenoy, Gautham Ranjal
> <gautham.shenoy@amd.com>; Ong, Calvin <Calvin.Ong@amd.com>;
> stable@vger.kernel.org; regressions@lists.linux.dev
> Subject: Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors
> based on the Zen microarchitecture
> 
> On 9/22/22 10:48, Limonciello, Mario wrote:
> >
> > 2) The title says to limit it to old Intel systems, but nothing about this
> > actually enforces that.  It actually is limited to all Intel systems, but
> > effectively won't be used on anything but old ones because of intel_idle.
> >
> > As an idea for #2 you could check for CONFIG_INTEL_IDLE in the Intel case
> > and if it's not defined show a pr_notice_once() type of message trying to
> > tell people to use Intel Idle instead for better performance.
> 
> What does that have to do with *this* patch, though?

It was just a thought triggered by your commit message title.

> 
> If you've got CONFIG_INTEL_IDLE disabled, you'll be slow before this
> patch.  You'll also be slow after this patch.  It's entirely orthogonal.
> 

Yeah it's orthogonal, but with this discussion happening and the code is
changing /anyway/ then a pr_notice_once() seemed like a nice way to
guide people towards intel_idle at the same time so they didn't trip into
the same problem AMD systems do today.

> I can add a "Practically" to the subject so folks don't confuse it with
> some hard limit that is being enforced:
> 
> 	ACPI: processor idle: Practically limit "Dummy wait" workaround to old Intel systems

That works.

> 
> BTW, is there seriously a strong technical reason that AMD systems are
> still using this code?  Or is it pure inertia?

Maybe a better question for Ananth and Prateek to comment on.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 17:01     ` Dave Hansen
  2022-09-22 17:48       ` Limonciello, Mario
@ 2022-09-22 19:42       ` Andreas Mohr
  2022-09-22 20:10         ` Andreas Mohr
  1 sibling, 1 reply; 30+ messages in thread
From: Andreas Mohr @ 2022-09-22 19:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: K Prateek Nayak, linux-kernel, rafael, lenb, linux-acpi,
	linux-pm, dave.hansen, bp, tglx, andi, puwen, mario.limonciello,
	peterz, rui.zhang, gpiccoli, daniel.lezcano, ananth.narayan,
	gautham.shenoy, Calvin Ong, stable, regressions

Hi,

On Thu, Sep 22, 2022 at 10:01:46AM -0700, Dave Hansen wrote:
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 16a1663d02d4..9f40917c49ef 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -531,10 +531,27 @@ static void wait_for_freeze(void)
>  	/* No delay is needed if we are in guest */
>  	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
>  		return;
> +	/*
> +	 * Modern (>=Nehalem) Intel systems use ACPI via intel_idle,
> +	 * not this code.  Assume that any Intel systems using this
> +	 * are ancient and may need the dummy wait.  This also assumes
> +	 * that the motivating chipset issue was Intel-only.
> +	 */
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> +		return;
>  #endif
> -	/* Dummy wait op - must do something useless after P_LVL2 read
> -	   because chipsets cannot guarantee that STPCLK# signal
> -	   gets asserted in time to freeze execution properly. */

16 years ago,
I did my testing on a VIA 8233/8235 chipset (AMD Athlon/Duron) system......
(plus reading VIA spec PDFs which mentioned "STPCLK#" etc.).




AFAIR I was doing kernel profiling (via oprofile, IIRC)
for painful performance hotspots (read: I/O accesses etc.), and
this was one resulting place which I stumbled over.
And if I'm not completely mistaken,
that dummy wait I/O op *was* needed (else "nice" effects)
on my system (put loud and clear: *non*-Intel).



So one can see where my profiling effort went
(*optimizing* things, not degrading them)
--> hints that current Zen3-originating effort is not
about a regression in the "regression bug" sense -
merely a (albeit rather appreciable/sizeable... congrats!)
performance deterioration vs.
an optimal (currently non-achieved) software implementation state
(also: of PORT-based handling [vs. MWAIT], mind you!).


I still have that VIA hardware, but inactive
(had the oh-so-usual capacitors issue :( ).


Sorry for sabotaging your current fix efforts ;-) -
but thank you very much for your work/discussion
in this very central/hotpath area! (this extends to all of you...)

Greetings

Andreas Mohr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 19:42       ` Andreas Mohr
@ 2022-09-22 20:10         ` Andreas Mohr
  2022-09-22 21:21           ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: Andreas Mohr @ 2022-09-22 20:10 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Dave Hansen, K Prateek Nayak, linux-kernel, rafael, lenb,
	linux-acpi, linux-pm, dave.hansen, bp, tglx, puwen,
	mario.limonciello, peterz, rui.zhang, gpiccoli, daniel.lezcano,
	ananth.narayan, gautham.shenoy, Calvin Ong, stable, regressions

On Thu, Sep 22, 2022 at 09:42:15PM +0200, Andreas Mohr wrote:
> So one can see where my profiling effort went
> (*optimizing* things, not degrading them)
> --> hints that current Zen3-originating effort is not
> about a regression in the "regression bug" sense -
> merely a (albeit rather appreciable/sizeable... congrats!)
> performance deterioration vs.
> an optimal (currently non-achieved) software implementation state
> (also: of PORT-based handling [vs. MWAIT], mind you!).

I'd like to add a word of caution here:

AFAIK power management (here: ACPI Cx) handling generally is
about a painful *tradeoff* between
achieving best-possible performance (that's
the respectable Zen3 32MB/s vs. 33MB/s argument) and
achieving maximum power savings.
We all know that one can configure the system for
non-idle mode (idle=poll cmdline?) and
achieve record numbers in performance (...*and* power consumption - ouch!).

Current decision/implementation aspects AFAICS:
- why is the Zen3 config used here choosing
  less-favourable(?) PORT-based operation mode?
- Zen3 is said to not have the STPCLK# issue
  (- but then what about other more modern chipsets?)

--> we need to achieve (hopefully sufficiently precisely) a solution which
takes into account Zen3 STPCLK# improvements while
preserving "accepted" behaviour/requirements on *all* STPCLK#-hampered chipsets
("STPCLK# I/O wait is default/traditional handling"?).

Greetings

Andreas Mohr

-- 
GNU/Linux. It's not the software that's free, it's you.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 20:10         ` Andreas Mohr
@ 2022-09-22 21:21           ` Dave Hansen
  2022-09-22 21:38             ` Limonciello, Mario
                               ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Dave Hansen @ 2022-09-22 21:21 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: K Prateek Nayak, linux-kernel, rafael, lenb, linux-acpi,
	linux-pm, dave.hansen, bp, tglx, puwen, mario.limonciello,
	peterz, rui.zhang, gpiccoli, daniel.lezcano, ananth.narayan,
	gautham.shenoy, Calvin Ong, stable, regressions

On 9/22/22 13:10, Andreas Mohr wrote:
>   (- but then what about other more modern chipsets?)
> 
> --> we need to achieve (hopefully sufficiently precisely) a solution which
> takes into account Zen3 STPCLK# improvements while
> preserving "accepted" behaviour/requirements on *all* STPCLK#-hampered chipsets
> ("STPCLK# I/O wait is default/traditional handling"?).

Ideally, sure.  But, we're talking about theoretically regressing the
idle behavior of some indeterminate set of old systems, the majority of
which are sitting in a puddle of capacitor goo at the bottom of a
landfill right now.  This is far from an ideal situation.

FWIW, I'd much rather do something like

	if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
	    (boot_cpu_data.x86_model >= 0xF))
		return;

	inl(slow_whatever);

than a Zen check.  AMD has, as far as I know, been a lot more sequential
and sane about model numbers than Intel, and there are some AMD model
number range checks in the codebase today.

A check like this would also be _relatively_ future-proof in the case
that X86_FEATURE_ZEN stops getting set on future AMD CPUs.  That's a lot
more likely than AMD going and reusing a <0xF model.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 21:21           ` Dave Hansen
@ 2022-09-22 21:38             ` Limonciello, Mario
  2022-09-23  7:42             ` Peter Zijlstra
  2022-09-23  7:57             ` Peter Zijlstra
  2 siblings, 0 replies; 30+ messages in thread
From: Limonciello, Mario @ 2022-09-22 21:38 UTC (permalink / raw)
  To: Dave Hansen, Andreas Mohr
  Cc: Nayak, K Prateek, linux-kernel, rafael, lenb, linux-acpi,
	linux-pm, dave.hansen, bp, tglx, puwen, peterz, rui.zhang,
	gpiccoli, daniel.lezcano, Narayan, Ananth, Shenoy,
	Gautham Ranjal, Ong, Calvin, stable, regressions

[Public]



> -----Original Message-----
> From: Dave Hansen <dave.hansen@intel.com>
> Sent: Thursday, September 22, 2022 16:22
> To: Andreas Mohr <andi@lisas.de>
> Cc: Nayak, K Prateek <KPrateek.Nayak@amd.com>; linux-
> kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org; linux-
> acpi@vger.kernel.org; linux-pm@vger.kernel.org;
> dave.hansen@linux.intel.com; bp@alien8.de; tglx@linutronix.de;
> puwen@hygon.cn; Limonciello, Mario <Mario.Limonciello@amd.com>;
> peterz@infradead.org; rui.zhang@intel.com; gpiccoli@igalia.com;
> daniel.lezcano@linaro.org; Narayan, Ananth <Ananth.Narayan@amd.com>;
> Shenoy, Gautham Ranjal <gautham.shenoy@amd.com>; Ong, Calvin
> <Calvin.Ong@amd.com>; stable@vger.kernel.org;
> regressions@lists.linux.dev
> Subject: Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors
> based on the Zen microarchitecture
> 
> On 9/22/22 13:10, Andreas Mohr wrote:
> >   (- but then what about other more modern chipsets?)
> >
> > --> we need to achieve (hopefully sufficiently precisely) a solution which
> > takes into account Zen3 STPCLK# improvements while
> > preserving "accepted" behaviour/requirements on *all* STPCLK#-hampered
> > chipsets
> > ("STPCLK# I/O wait is default/traditional handling"?).
> 
> Ideally, sure.  But, we're talking about theoretically regressing the
> idle behavior of some indeterminate set of old systems, the majority of
> which are sitting in a puddle of capacitor goo at the bottom of a
> landfill right now.  This is far from an ideal situation.
> 
> FWIW, I'd much rather do something like
> 
> 	if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
> 	    (boot_cpu_data.x86_model >= 0xF))
> 		return;
> 
> 	inl(slow_whatever);
> 
> than a Zen check.  AMD has, as far as I know, been a lot more sequential
> and sane about model numbers than Intel, and there are some AMD model
> number range checks in the codebase today.
> 
> A check like this would also be _relatively_ future-proof in the case
> that X86_FEATURE_ZEN stops getting set on future AMD CPUs.  That's a lot
> more likely than AMD going and reusing a <0xF model.

If you're going to use a family check instead it should be 0x17 or newer.
(c->x86 >= 0x17)

That does match what's used to set X86_FEATURE_ZEN at least then right now too.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 21:21           ` Dave Hansen
  2022-09-22 21:38             ` Limonciello, Mario
@ 2022-09-23  7:42             ` Peter Zijlstra
  2022-09-23  7:57             ` Peter Zijlstra
  2 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2022-09-23  7:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andreas Mohr, K Prateek Nayak, linux-kernel, rafael, lenb,
	linux-acpi, linux-pm, dave.hansen, bp, tglx, puwen,
	mario.limonciello, rui.zhang, gpiccoli, daniel.lezcano,
	ananth.narayan, gautham.shenoy, Calvin Ong, stable, regressions

On Thu, Sep 22, 2022 at 02:21:31PM -0700, Dave Hansen wrote:
> On 9/22/22 13:10, Andreas Mohr wrote:
> >   (- but then what about other more modern chipsets?)
> > 
> > --> we need to achieve (hopefully sufficiently precisely) a solution which
> > takes into account Zen3 STPCLK# improvements while
> > preserving "accepted" behaviour/requirements on *all* STPCLK#-hampered chipsets
> > ("STPCLK# I/O wait is default/traditional handling"?).
> 
> Ideally, sure.  But, we're talking about theoretically regressing the
> idle behavior of some indeterminate set of old systems, the majority of
> which are sitting in a puddle of capacitor goo at the bottom of a
> landfill right now.  This is far from an ideal situation.
> 
> FWIW, I'd much rather do something like
> 
> 	if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
> 	    (boot_cpu_data.x86_model >= 0xF))
> 		return;
> 
> 	inl(slow_whatever);
> 
> than a Zen check.  AMD has, as far as I know, been a lot more sequential
> and sane about model numbers than Intel, and there are some AMD model
> number range checks in the codebase today.
> 
> A check like this would also be _relatively_ future-proof in the case
> that X86_FEATURE_ZEN stops getting set on future AMD CPUs.  That's a lot
> more likely than AMD going and reusing a <0xF model.

Except you need to add VENDOR_HYGON at the very least. All of this turns
into a trainwreck real quick.
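
For reference, the check being debated here, with the family threshold
from Mario's reply and the Hygon vendor added, might look roughly like
the user-space sketch below.  The enum, the macro, and the function
name are hypothetical; the thread does not settle whether Hygon parts
would want the same threshold, so that is an assumption too.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's X86_VENDOR_* constants. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_HYGON, VENDOR_OTHER };

/* Family threshold suggested in the thread (c->x86 >= 0x17). */
#define ZEN_FIRST_FAMILY 0x17

/*
 * Returns true when the dummy inl() would be skipped: only for AMD or
 * Hygon CPUs at or above the family threshold.  Assumes the same
 * threshold applies to both vendors.
 */
static bool skip_dummy_wait(enum cpu_vendor vendor, unsigned int family)
{
	if (vendor != VENDOR_AMD && vendor != VENDOR_HYGON)
		return false;

	return family >= ZEN_FIRST_FAMILY;
}
```

Every additional vendor or out-of-sequence part would need another
clause here, which is the maintenance cost being pointed out.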

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 21:21           ` Dave Hansen
  2022-09-22 21:38             ` Limonciello, Mario
  2022-09-23  7:42             ` Peter Zijlstra
@ 2022-09-23  7:57             ` Peter Zijlstra
  2 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2022-09-23  7:57 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andreas Mohr, K Prateek Nayak, linux-kernel, rafael, lenb,
	linux-acpi, linux-pm, dave.hansen, bp, tglx, puwen,
	mario.limonciello, rui.zhang, gpiccoli, daniel.lezcano,
	ananth.narayan, gautham.shenoy, Calvin Ong, stable, regressions

On Thu, Sep 22, 2022 at 02:21:31PM -0700, Dave Hansen wrote:
> FWIW, I'd much rather do something like
> 
> 	if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
> 	    (boot_cpu_data.x86_model >= 0xF))
> 		return;
> 
> 	inl(slow_whatever);
> 
> than a Zen check.  AMD has, as far as I know, been a lot more sequential
> and sane about model numbers than Intel, and there are some AMD model
> number range checks in the codebase today.

Some might be broken; apparently their SoC/Entertainment divisions have a
few out-of-order SKUs that were not listed in their regular documents.
(yay interweb)

I ran into this when I tried doing a Zen2 range check for retbleed. In
the end we ended up using the availability of STIBP as a heuristic to
identify Zen2+ or something.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
  2022-09-22 18:28           ` Limonciello, Mario
@ 2022-09-23 11:47             ` Ananth Narayan
  0 siblings, 0 replies; 30+ messages in thread
From: Ananth Narayan @ 2022-09-23 11:47 UTC (permalink / raw)
  To: Limonciello, Mario, Dave Hansen, Nayak, K Prateek, linux-kernel
  Cc: rafael, lenb, linux-acpi, linux-pm, dave.hansen, bp, tglx, andi,
	puwen, peterz, rui.zhang, gpiccoli, daniel.lezcano, Shenoy,
	Gautham Ranjal, Ong, Calvin, stable, regressions

On 22-09-2022 11:58 pm, Limonciello, Mario wrote:
>> BTW, is there seriously a strong technical reason that AMD systems are
>> still using this code?  Or is it pure inertia?
> 
> Maybe a better question for Ananth and Prateek to comment on.

We have evaluated using MWAIT for C2 entry and feel that there are good micro
architectural reasons to stick to IOPORT based transitions for now.

Regards,
Ananth


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture
       [not found]   ` <20220923160106.9297-1-ermorton@ericmoronsm1mbp.amd.com>
@ 2022-09-23 16:15     ` Ananth Narayan
  0 siblings, 0 replies; 30+ messages in thread
From: Ananth Narayan @ 2022-09-23 16:15 UTC (permalink / raw)
  To: dave.hansen, eric.morton
  Cc: andi, bp, calvin.ong, daniel.lezcano, dave.hansen,
	gautham.shenoy, gpiccoli, kprateek.nayak, lenb, linux-acpi,
	linux-kernel, linux-pm, mario.limonciello, peterz, puwen, rafael,
	regressions, rui.zhang, stable, tglx

Eric,
The MTA mangled your address. So the note is not showing up on the list.
Responding so this hopefully shows up on lkml for the records.

Apologies to everyone else for duplicates.

Regards,
Ananth

On 23-09-2022 09:31 pm, AMD\ermorton wrote:
> On 2022-09-21 14:15, David Hansen wrote:
> 
>>> Do X86_FEATURE_ZEN CPUs just have unusually painful
>>> inl(acpi_fadt.xpm_tmr_blk.address) implementations?
> 
> Hi David,
> 
> I'm glad you asked this.
> 
> Obviously the words "painful" and "slow" are arbitrary. But... since there are many aspects such as the platform, core clock frequency, system clock frequency, etc, that play into this, I will refrain from any precise numbers.
> 
> I would say that x86 platforms (that today have in excess of a hundred processors) generally design the legacy PM_TMR and other serial resources in the Southbridge/FCH with the underlying assumption that (a) the kernel accesses them "rarely" in non-performance-sensitive code and, more importantly, (b) that it is unlikely to have multiple processors access them "simultaneously". These resources are a fair distance from the processor, and unlike memory controllers, these resources were not designed to have multiple simultaneous accesses running in parallel.
> 
> So let's assert that, to start off with, the accesses are already "slow" from the processor standpoint because of this distance. It is likely that most x86 implementations could easily take around 500ns-1us round trip. The exact number will vary, but a quick sanity test on current x86 production platforms matches that for a "singleton" access.
> 
> That alone is well over 1000 core clocks and seems to be reason enough to avoid doing this INL when it is not necessary.
> 
> But as the PM_TMR is not designed to handle simultaneous accesses, if multiple processors do simultaneously access this resource (or even "close to simultaneous"), the first access might be "slow", the second access might be "slower", and well, the 100th access might be "painful". And there are interrupt cases where this can indeed happen - due to this ancient workaround...
> 
> Note that a quick sanity test that we created when we understood this tbench data suggested to me that Intel platforms are not immune from the impact of this worst-case access pattern either. This is not surprising to me. But we did not do an exhaustive check.
> 
> Sincerely,
> Eric Morton
> AMD Infinity Fabric and SOC Architecture
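
The "well over 1000 core clocks" figure follows from simple arithmetic:
cycles = latency in ns * core clock in MHz / 1000.  The helper below is
a hypothetical illustration; the note above deliberately gives no
precise clock frequency, so any frequency plugged in (for instance
2500 MHz) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Core cycles that elapse while one access of latency_ns is in
 * flight, for a core running at core_mhz.
 */
static uint64_t stall_cycles(uint64_t latency_ns, uint64_t core_mhz)
{
	assert(core_mhz > 0);
	return latency_ns * core_mhz / 1000;
}
```

At an assumed 2500 MHz, the quoted 500ns-1us round trip corresponds to
1250-2500 core cycles for a single uncontended read.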

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-09-23 16:16 UTC | newest]

Thread overview: 30+ messages
2022-09-21  6:36 [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture K Prateek Nayak
2022-09-21  8:47 ` Borislav Petkov
2022-09-21 10:39   ` K Prateek Nayak
2022-09-21 13:10     ` Borislav Petkov
2022-09-21 14:15 ` Dave Hansen
2022-09-21 19:51   ` Borislav Petkov
2022-09-21 19:55     ` Limonciello, Mario
2022-09-22  3:58     ` Ananth Narayan
2022-09-22  5:44   ` K Prateek Nayak
     [not found]   ` <20220923160106.9297-1-ermorton@ericmoronsm1mbp.amd.com>
2022-09-23 16:15     ` Ananth Narayan
2022-09-21 15:00 ` Peter Zijlstra
2022-09-21 19:48   ` Borislav Petkov
2022-09-22  8:17     ` Peter Zijlstra
2022-09-22 15:21       ` Rafael J. Wysocki
2022-09-22 15:36         ` Borislav Petkov
2022-09-22 15:53           ` Rafael J. Wysocki
2022-09-22 16:36             ` Dave Hansen
2022-09-22 16:44 ` Dave Hansen
2022-09-22 16:54   ` K Prateek Nayak
2022-09-22 17:01     ` Dave Hansen
2022-09-22 17:48       ` Limonciello, Mario
2022-09-22 18:17         ` Dave Hansen
2022-09-22 18:28           ` Limonciello, Mario
2022-09-23 11:47             ` Ananth Narayan
2022-09-22 19:42       ` Andreas Mohr
2022-09-22 20:10         ` Andreas Mohr
2022-09-22 21:21           ` Dave Hansen
2022-09-22 21:38             ` Limonciello, Mario
2022-09-23  7:42             ` Peter Zijlstra
2022-09-23  7:57             ` Peter Zijlstra
