All of lore.kernel.org
 help / color / mirror / Atom feed
From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Dave Hansen <dave.hansen@intel.com>, linux-kernel@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Len Brown <lenb@kernel.org>,
	Mario Limonciello <Mario.Limonciello@amd.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Subject: Re: [PATCH] ACPI: processor idle: Practically limit "Dummy wait" workaround to old Intel systems
Date: Sat, 24 Sep 2022 02:05:55 +0530	[thread overview]
Message-ID: <775d8042-639a-1ae6-c099-9411e6027f40@amd.com> (raw)
In-Reply-To: <20220922184745.3252932-1-dave.hansen@intel.com>

Hello Dave,

On 9/23/2022 12:17 AM, Dave Hansen wrote:
> Old, circa 2002 chipsets have a bug: they don't go idle when they are
> supposed to.  So, a workaround was added to slow the CPU down and
> ensure that the CPU waits a bit for the chipset to actually go idle.
> This workaround is ancient and has been in place in some form since
> the original kernel ACPI implementation.
> 
> But, this workaround is very painful on modern systems.  The "inl()"
> can take thousands of cycles (see Link: for some more detailed
> numbers and some fun kernel archaeology).
> 
> First and foremost, modern systems should not be using this code.
> Typical Intel systems have not used it in over a decade because it is
> horribly inferior to MWAIT-based idle.
> 
> Despite this, people do seem to be tripping over this workaround on
> AMD system today.
> 
> Limit the "dummy wait" workaround to Intel systems.  Keep Modern AMD
> systems from tripping over the workaround.  Remotely modern Intel
> systems use intel_idle instead of this code and will, in practice,
> remain unaffected by the dummy wait.

I've run 30 runs of tbench with 128 clients on a dual socket Zen3 system
(2 x 64C/128T) and do not see any massive regression like I used to when
we were hitting the dummy wait issue:

Kernel        : baseline      baseline + C2 disabled   baseline + this patch

Min (MB/s)    : 2215.06       33072.10 (+1393.05%)     30519.60 (+1277.82%)
Max (MB/s)    : 32938.80      34399.10                 32699.30
Median (MB/s) : 32191.80      33476.60                 31418.90
AMean (MB/s)  : 22448.55      33649.27 (+49.89%)       31545.93 (+40.52%)
AMean Stddev  : 17526.70      680.14                   1095.39
AMean CoefVar : 78.07%        2.02%                    3.47%

The range is well within the variation we've normally seen with tbench
on the test platform.

> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Mario Limonciello <Mario.Limonciello@amd.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Borislav Petkov <bp@alien8.de>

Can you please add a cc to stable?

Cc: stable@vger.kernel.org

> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Reported-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Link: https://lore.kernel.org/all/20220921063638.2489-1-kprateek.nayak@amd.com/
> ---
>  drivers/acpi/processor_idle.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 16a1663d02d4..9f40917c49ef 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -531,10 +531,27 @@ static void wait_for_freeze(void)
>  	/* No delay is needed if we are in guest */
>  	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
>  		return;
> +	/*
> +	 * Modern (>=Nehalem) Intel systems use ACPI via intel_idle,
> +	 * not this code.  Assume that any Intel systems using this
> +	 * are ancient and may need the dummy wait.  This also assumes
> +	 * that the motivating chipset issue was Intel-only.
> +	 */
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)

Based on Andreas's comment, this problem is not limited to Intel chipsets
and affects at least the AMD Athlon on VIA chipset (circa 2006)
(https://lore.kernel.org/lkml/Yyy6l94G0O2B7Yh1@rhlx01.hs-esslingen.de/)
To be on safer side, the exception could be made for AMD Fam 17h+ and also
Hygon as pointed out by Peter, where we know the dummy wait is unnecessary.
Extending the condition you proposed, we can have:

	if (boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
	    ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
	     (boot_cpu_data.x86_model >= 0x17)))
		return;

It is not pretty by any means which is why we can use a x86_BUG_STPCLK to
limit the dummy op to only affected processors. This way, the x86 vendor
check and family check can be avoided in the acpi code. A v2 has been sent
out tackling the problem this way:
https://lore.kernel.org/lkml/20220923153801.9167-1-kprateek.nayak@amd.com/

> +		return;
>  #endif
> -	/* Dummy wait op - must do something useless after P_LVL2 read
> -	   because chipsets cannot guarantee that STPCLK# signal
> -	   gets asserted in time to freeze execution properly. */
> +	/*
> +	 * Dummy wait op - must do something useless after P_LVL2 read
> +	 * because chipsets cannot guarantee that STPCLK# signal gets
> +	 * asserted in time to freeze execution properly
> +	 *
> +	 * This workaround has been in place since the original ACPI
> +	 * implementation was merged, circa 2002.
> +	 *
> +	 * If a profile is pointing to this instruction, please first
> +	 * consider moving your system to a more modern idle
> +	 * mechanism.
> +	 */
>  	inl(acpi_gbl_FADT.xpm_timer_block.address);
>  }
>  

The patch, as it is, solves the problem we've seen on the newer AMD
platforms with large core density that use IOPORT based C-states.

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
--
Thanks and Regards,
Prateek

  parent reply	other threads:[~2022-09-23 20:42 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20220922184745.3252932-1-dave.hansen@intel.com>
2022-09-22 18:53 ` [PATCH] ACPI: processor idle: Practically limit "Dummy wait" workaround to old Intel systems Rafael J. Wysocki
2022-09-22 18:57   ` Limonciello, Mario
2022-09-22 19:01   ` Dave Hansen
2022-09-23 18:36     ` Kim Phillips
2022-09-26 21:49       ` Dave Hansen
2022-09-23 20:35 ` K Prateek Nayak [this message]
2022-09-23 22:29 ` [tip: x86/urgent] " tip-bot2 for Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=775d8042-639a-1ae6-c099-9411e6027f40@amd.com \
    --to=kprateek.nayak@amd.com \
    --cc=Mario.Limonciello@amd.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.