All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] x86/intel: Disable HPET on another Intel Coffee Lake platform
@ 2021-09-17  2:46 Jakub Kicinski
  2021-09-27 22:36 ` Krzysztof Wilczyński
  2021-09-29 13:11 ` Jakub Kicinski
  0 siblings, 2 replies; 11+ messages in thread
From: Jakub Kicinski @ 2021-09-17  2:46 UTC (permalink / raw)
  To: x86
  Cc: jose.souza, hpa, bp, mingo, tglx, kai.heng.feng, bhelgaas,
	linux-pci, rudolph, xapienz, bmilton, paulmck, Jakub Kicinski,
	stable

My Lenovo T490s with i7-8665U had been marking TSC as unstable
since v5.13, resulting in very sluggish desktop experience...

Kernel logs show:

  clocksource: timekeeping watchdog on CPU3: hpet read-back delay of 316000ns, attempt 4, marking unstable
  tsc: Marking TSC unstable due to clocksource watchdog
  TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
  sched_clock: Marking unstable (14539801827657, -530891666)<-(14539319241737, -48307500)
  clocksource: Checking clocksource tsc synchronization from CPU 3 to CPUs 0-2,6-7.
  clocksource: Switched to clocksource hpet

I have a 8086:3e34 bridge, also known as "Host bridge: Intel
Corporation Coffee Lake HOST and DRAM Controller (rev 0c)".
Add it to the list.

We should perhaps consider applying this quirk more widely.
The Intel documentation does not list my device [1], but
linuxhw [2] does, and it seems to list a few more bridges
we do not currently cover (3e31, 3ecc, 3e35, 3e0f).

[1] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/8th-gen-core-family-datasheet-vol-2.pdf
[2] https://github.com/linuxhw/DevicePopulation/blob/master/README.md

Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2: - add the dmesg output
---
 arch/x86/kernel/early-quirks.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 38837dad46e6..7d2de04f8750 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -716,6 +716,8 @@ static struct chipset early_qrk[] __initdata = {
 		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
 	{ PCI_VENDOR_ID_INTEL, 0x3e20,
 		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
+	{ PCI_VENDOR_ID_INTEL, 0x3e34,
+		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
 	{ PCI_VENDOR_ID_INTEL, 0x3ec4,
 		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
 	{ PCI_VENDOR_ID_INTEL, 0x8a12,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] x86/intel: Disable HPET on another Intel Coffee Lake platform
  2021-09-17  2:46 [PATCH v2] x86/intel: Disable HPET on another Intel Coffee Lake platform Jakub Kicinski
@ 2021-09-27 22:36 ` Krzysztof Wilczyński
  2021-09-29 13:11 ` Jakub Kicinski
  1 sibling, 0 replies; 11+ messages in thread
From: Krzysztof Wilczyński @ 2021-09-27 22:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: x86, jose.souza, hpa, bp, mingo, tglx, kai.heng.feng, bhelgaas,
	linux-pci, rudolph, xapienz, bmilton, paulmck, stable

Hi Jakub,

> My Lenovo T490s with i7-8665U had been marking TSC as unstable
> since v5.13, resulting in very sluggish desktop experience...
> 
> Kernel logs show:
> 
>   clocksource: timekeeping watchdog on CPU3: hpet read-back delay of 316000ns, attempt 4, marking unstable
>   tsc: Marking TSC unstable due to clocksource watchdog
>   TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
>   sched_clock: Marking unstable (14539801827657, -530891666)<-(14539319241737, -48307500)
>   clocksource: Checking clocksource tsc synchronization from CPU 3 to CPUs 0-2,6-7.
>   clocksource: Switched to clocksource hpet
> 
> I have a 8086:3e34 bridge, also known as "Host bridge: Intel
> Corporation Coffee Lake HOST and DRAM Controller (rev 0c)".
> Add it to the list.
> 
> We should perhaps consider applying this quirk more widely.
> The Intel documentation does not list my device [1], but
> linuxhw [2] does, and it seems to list a few more bridges
> we do not currently cover (3e31, 3ecc, 3e35, 3e0f).

I wish someone from Intel would be a little more forthcoming and chimed in
about the other devices.  I guess, we will cross that bridge when we get to
it, so to speak.

> [1] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/8th-gen-core-family-datasheet-vol-2.pdf
> [2] https://github.com/linuxhw/DevicePopulation/blob/master/README.md
> 
> Cc: stable@vger.kernel.org # v5.13+
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2: - add the dmesg output
> ---
>  arch/x86/kernel/early-quirks.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> index 38837dad46e6..7d2de04f8750 100644
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -716,6 +716,8 @@ static struct chipset early_qrk[] __initdata = {
>  		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
>  	{ PCI_VENDOR_ID_INTEL, 0x3e20,
>  		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> +	{ PCI_VENDOR_ID_INTEL, 0x3e34,
> +		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
>  	{ PCI_VENDOR_ID_INTEL, 0x3ec4,
>  		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
>  	{ PCI_VENDOR_ID_INTEL, 0x8a12,

Thank you!

Acked-by: Krzysztof Wilczyński <kw@linux.com>

	Krzysztof

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] x86/intel: Disable HPET on another Intel Coffee Lake platform
  2021-09-17  2:46 [PATCH v2] x86/intel: Disable HPET on another Intel Coffee Lake platform Jakub Kicinski
  2021-09-27 22:36 ` Krzysztof Wilczyński
@ 2021-09-29 13:11 ` Jakub Kicinski
  2021-09-29 16:05   ` Bjorn Helgaas
  1 sibling, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2021-09-29 13:11 UTC (permalink / raw)
  To: x86
  Cc: jose.souza, hpa, bp, mingo, tglx, kai.heng.feng, bhelgaas,
	linux-pci, rudolph, xapienz, bmilton, paulmck, stable

On Thu, 16 Sep 2021 19:46:48 -0700 Jakub Kicinski wrote:
> My Lenovo T490s with i7-8665U had been marking TSC as unstable
> since v5.13, resulting in very sluggish desktop experience...

Where do we stand? Waiting for tglx to refactor PC10 detection and use
that, or just review delay?

> +++ b/arch/x86/kernel/early-quirks.c
> @@ -716,6 +716,8 @@ static struct chipset early_qrk[] __initdata = {
>  		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
>  	{ PCI_VENDOR_ID_INTEL, 0x3e20,
>  		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> +	{ PCI_VENDOR_ID_INTEL, 0x3e34,
> +		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
>  	{ PCI_VENDOR_ID_INTEL, 0x3ec4,
>  		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
>  	{ PCI_VENDOR_ID_INTEL, 0x8a12,


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] x86/intel: Disable HPET on another Intel Coffee Lake platform
  2021-09-29 13:11 ` Jakub Kicinski
@ 2021-09-29 16:05   ` Bjorn Helgaas
  2021-09-30  9:08     ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2021-09-29 16:05 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: x86, jose.souza, hpa, bp, mingo, tglx, kai.heng.feng, bhelgaas,
	linux-pci, rudolph, xapienz, bmilton, paulmck, stable

On Wed, Sep 29, 2021 at 06:11:07AM -0700, Jakub Kicinski wrote:
> On Thu, 16 Sep 2021 19:46:48 -0700 Jakub Kicinski wrote:
> > My Lenovo T490s with i7-8665U had been marking TSC as unstable
> > since v5.13, resulting in very sluggish desktop experience...
> 
> Where do we stand? Waiting for tglx to refactor PC10 detection and use
> that, or just review delay?

From my point of view, this is an x86 issue, not a PCI one, so I'll
defer to the x86 folks.

> > +++ b/arch/x86/kernel/early-quirks.c
> > @@ -716,6 +716,8 @@ static struct chipset early_qrk[] __initdata = {
> >  		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> >  	{ PCI_VENDOR_ID_INTEL, 0x3e20,
> >  		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> > +	{ PCI_VENDOR_ID_INTEL, 0x3e34,
> > +		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> >  	{ PCI_VENDOR_ID_INTEL, 0x3ec4,
> >  		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> >  	{ PCI_VENDOR_ID_INTEL, 0x8a12,
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] x86/intel: Disable HPET on another Intel Coffee Lake platform
  2021-09-29 16:05   ` Bjorn Helgaas
@ 2021-09-30  9:08     ` Thomas Gleixner
  2021-09-30 11:15       ` [PATCH RFT] x86/hpet: Use another crystalball to evaluate HPET usability Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2021-09-30  9:08 UTC (permalink / raw)
  To: Bjorn Helgaas, Jakub Kicinski
  Cc: x86, jose.souza, hpa, bp, mingo, kai.heng.feng, bhelgaas,
	linux-pci, rudolph, xapienz, bmilton, paulmck, stable,
	Rafael J. Wysocki, Dave Hansen

On Wed, Sep 29 2021 at 11:05, Bjorn Helgaas wrote:
> On Wed, Sep 29, 2021 at 06:11:07AM -0700, Jakub Kicinski wrote:
>> On Thu, 16 Sep 2021 19:46:48 -0700 Jakub Kicinski wrote:
>> > My Lenovo T490s with i7-8665U had been marking TSC as unstable
>> > since v5.13, resulting in very sluggish desktop experience...
>> 
>> Where do we stand? Waiting for tglx to refactor PC10 detection and use
>> that, or just review delay?
>
> From my point of view, this is an x86 issue, not a PCI one, so I'll
> defer to the x86 folks.

Yes, it is. I'm still trying to make sense of this enumeration
muck. Adding these silly PCI ids to the quirks section is a whack a mole
game which does not make sense.

Lemme find a few spare cycles to whip up a patch.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH RFT] x86/hpet: Use another crystalball to evaluate HPET usability
  2021-09-30  9:08     ` Thomas Gleixner
@ 2021-09-30 11:15       ` Thomas Gleixner
  2021-09-30 11:38         ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2021-09-30 11:15 UTC (permalink / raw)
  To: Bjorn Helgaas, Jakub Kicinski
  Cc: x86, jose.souza, hpa, Borislav Petkov, Ingo Molnar,
	kai.heng.feng, bhelgaas, linux-pci, rudolph, xapienz, bmilton,
	paulmck, stable, Rafael J. Wysocki, Dave Hansen, Feng Tang,
	Harry Pan

On recent Intel systems the HPET stops working when the system reaches PC10
idle state.

The approach of adding PCI ids to the early quirks to disable HPET on
these systems is a whack a mole game which makes no sense.

Check for PC10 instead and force disable HPET if supported. The check is
overbroad as it does not take ACPI, intel_idle enablement and command
line parameters into account. That's fine as long as there is at least
PMTIMER available to calibrate the TSC frequency. The decision can be
overruled by adding "hpet=force" on the kernel command line.

Remove the related early PCI quirks for affected Ice and Coffin Lake
systems as they are not longer required.

Fixes: Yet another hardware trainwreck
Reported-by: Jakub Kicinski <kuba@kernel.org>
Not-yet-signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
Notes: Completely untested. Use at your own peril.
---
 arch/x86/kernel/early-quirks.c |    6 --
 arch/x86/kernel/hpet.c         |   88 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -714,12 +714,6 @@ static struct chipset early_qrk[] __init
 	 */
 	{ PCI_VENDOR_ID_INTEL, 0x0f00,
 		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
-	{ PCI_VENDOR_ID_INTEL, 0x3e20,
-		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
-	{ PCI_VENDOR_ID_INTEL, 0x3ec4,
-		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
-	{ PCI_VENDOR_ID_INTEL, 0x8a12,
-		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
 	{ PCI_VENDOR_ID_BROADCOM, 0x4331,
 	  PCI_CLASS_NETWORK_OTHER, PCI_ANY_ID, 0, apple_airport_reset},
 	{}
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -10,6 +10,7 @@
 #include <asm/irq_remapping.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
+#include <asm/mwait.h>
 
 #undef  pr_fmt
 #define pr_fmt(fmt) "hpet: " fmt
@@ -916,6 +917,90 @@ static bool __init hpet_counting(void)
 	return false;
 }
 
+static bool __init get_mwait_substates(unsigned int *mwait_substates)
+{
+	unsigned int eax, ebx, ecx;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return false;
+
+	if (!boot_cpu_has(X86_FEATURE_MWAIT))
+		return false;
+
+	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
+		return false;
+
+	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, mwait_substates);
+
+	if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
+	    !(ecx & CPUID5_ECX_INTERRUPT_BREAK) ||
+	    !*mwait_substates)
+		return false;
+
+	return true;
+}
+
+/*
+ * Check whether the system supports PC10. If so force disable HPET as that
+ * stops counting in PC10. This check is overbroad as it does not take any
+ * of the following into account:
+ *
+ *	- ACPI tables
+ *	- Enablement of intel_idle
+ *	- Command line arguments which limit intel_idle C-state support
+ *
+ * That's perfectly fine. HPET is a piece of hardware designed by committee
+ * and the only reasons why it is still in use on modern systems is the
+ * fact that it is impossible to reliably query the TSC frequency via
+ * CPUID or firmware.
+ *
+ * If HPET is functional it is useful for calibrating TSC, but this can be
+ * done via PMTIMER as well which seems to be the last remaining timer on
+ * X86/INTEL platforms that has not been completely wreckaged by feature
+ * creep.
+ *
+ * In theory HPET support should be removed altogether, but there are older
+ * systems out there which depend on it because TSC and APIC timer are
+ * dysfunctional in deeper C-states.
+ *
+ * It's only 20 years now that hardware people have been asked to provide
+ * reliable and discoverable facilities which can be used for timekeeping
+ * and per CPU timer interrupts.
+ *
+ * The probability that this problem is going to be solved in the
+ * forseeable future is close to zero, so the kernel has to be cluttered
+ * with heuristics to keep up with the ever growing amount of hardware and
+ * firmware trainwrecks. Hopefully some day hardware people will understand
+ * that the approach of "This can be fixed in software" is not sustainable.
+ * Hope dies last...
+ */
+static bool __init hpet_is_pc10_damaged(void)
+{
+	unsigned int mwait_substates;
+	unsigned long long pcfg;
+
+	if (!get_mwait_substates(&mwait_substates))
+		return false;
+
+	/* Check whether PC10 substates are supported */
+	if (!(mwait_substates & (0xF << 28)))
+		return false;
+
+	/* Check whether PC10 is enabled in PKG C-state limit */
+	rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
+	if ((pcfg & 0xF) < 8)
+		return false;
+
+	if (hpet_force_user) {
+		pr_warn("HPET force enabled via command line, but dysfunctional in PC10.\n");
+		return false;
+	}
+
+	pr_info("HPET dysfunctional in PC10. Force disabled.\n");
+	boot_hpet_disable = true;
+	return true;
+}
+
 /**
  * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
  */
@@ -929,6 +1014,9 @@ int __init hpet_enable(void)
 	if (!is_hpet_capable())
 		return 0;
 
+	if (hpet_is_pc10_damaged())
+		return 0;
+
 	hpet_set_mapping();
 	if (!hpet_virt_address)
 		return 0;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFT] x86/hpet: Use another crystalball to evaluate HPET usability
  2021-09-30 11:15       ` [PATCH RFT] x86/hpet: Use another crystalball to evaluate HPET usability Thomas Gleixner
@ 2021-09-30 11:38         ` Rafael J. Wysocki
  2021-09-30 17:07           ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2021-09-30 11:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Bjorn Helgaas, Jakub Kicinski, the arch/x86 maintainers,
	jose.souza, H. Peter Anvin, Borislav Petkov, Ingo Molnar,
	Kai-Heng Feng, Bjorn Helgaas, Linux PCI, rudolph, xapienz,
	bmilton, Paul E. McKenney, Stable, Rafael J. Wysocki,
	Dave Hansen, Feng Tang, Harry Pan

On Thu, Sep 30, 2021 at 1:15 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On recent Intel systems the HPET stops working when the system reaches PC10
> idle state.
>
> The approach of adding PCI ids to the early quirks to disable HPET on
> these systems is a whack a mole game which makes no sense.
>
> Check for PC10 instead and force disable HPET if supported. The check is
> overbroad as it does not take ACPI, intel_idle enablement and command
> line parameters into account. That's fine as long as there is at least
> PMTIMER available to calibrate the TSC frequency. The decision can be
> overruled by adding "hpet=force" on the kernel command line.
>
> Remove the related early PCI quirks for affected Ice and Coffin Lake
> systems as they are not longer required.
>
> Fixes: Yet another hardware trainwreck
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Not-yet-signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> Notes: Completely untested. Use at your own peril.
> ---
>  arch/x86/kernel/early-quirks.c |    6 --
>  arch/x86/kernel/hpet.c         |   88 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 88 insertions(+), 6 deletions(-)
>
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -714,12 +714,6 @@ static struct chipset early_qrk[] __init
>          */
>         { PCI_VENDOR_ID_INTEL, 0x0f00,
>                 PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> -       { PCI_VENDOR_ID_INTEL, 0x3e20,
> -               PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> -       { PCI_VENDOR_ID_INTEL, 0x3ec4,
> -               PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> -       { PCI_VENDOR_ID_INTEL, 0x8a12,
> -               PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
>         { PCI_VENDOR_ID_BROADCOM, 0x4331,
>           PCI_CLASS_NETWORK_OTHER, PCI_ANY_ID, 0, apple_airport_reset},
>         {}
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -10,6 +10,7 @@
>  #include <asm/irq_remapping.h>
>  #include <asm/hpet.h>
>  #include <asm/time.h>
> +#include <asm/mwait.h>
>
>  #undef  pr_fmt
>  #define pr_fmt(fmt) "hpet: " fmt
> @@ -916,6 +917,90 @@ static bool __init hpet_counting(void)
>         return false;
>  }
>
> +static bool __init get_mwait_substates(unsigned int *mwait_substates)
> +{
> +       unsigned int eax, ebx, ecx;
> +
> +       if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> +               return false;
> +
> +       if (!boot_cpu_has(X86_FEATURE_MWAIT))
> +               return false;
> +
> +       if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> +               return false;
> +
> +       cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, mwait_substates);
> +
> +       if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
> +           !(ecx & CPUID5_ECX_INTERRUPT_BREAK) ||
> +           !*mwait_substates)
> +               return false;

I would do

return (ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) && (ecx &
CPUID5_ECX_INTERRUPT_BREAK) && *mwait_substates;

And this function could just return the mwait_substates value proper,
because returning 0 then would be equivalent to returning 'false' from
it as is.

LGTM otherwise.

> +
> +       return true;
> +}
> +
> +/*
> + * Check whether the system supports PC10. If so force disable HPET as that
> + * stops counting in PC10. This check is overbroad as it does not take any
> + * of the following into account:
> + *
> + *     - ACPI tables
> + *     - Enablement of intel_idle
> + *     - Command line arguments which limit intel_idle C-state support
> + *
> + * That's perfectly fine. HPET is a piece of hardware designed by committee
> + * and the only reasons why it is still in use on modern systems is the
> + * fact that it is impossible to reliably query the TSC frequency via
> + * CPUID or firmware.
> + *
> + * If HPET is functional it is useful for calibrating TSC, but this can be
> + * done via PMTIMER as well which seems to be the last remaining timer on
> + * X86/INTEL platforms that has not been completely wreckaged by feature
> + * creep.
> + *
> + * In theory HPET support should be removed altogether, but there are older
> + * systems out there which depend on it because TSC and APIC timer are
> + * dysfunctional in deeper C-states.
> + *
> + * It's only 20 years now that hardware people have been asked to provide
> + * reliable and discoverable facilities which can be used for timekeeping
> + * and per CPU timer interrupts.
> + *
> + * The probability that this problem is going to be solved in the
> + * forseeable future is close to zero, so the kernel has to be cluttered
> + * with heuristics to keep up with the ever growing amount of hardware and
> + * firmware trainwrecks. Hopefully some day hardware people will understand
> + * that the approach of "This can be fixed in software" is not sustainable.
> + * Hope dies last...
> + */
> +static bool __init hpet_is_pc10_damaged(void)
> +{
> +       unsigned int mwait_substates;
> +       unsigned long long pcfg;
> +
> +       if (!get_mwait_substates(&mwait_substates))
> +               return false;
> +
> +       /* Check whether PC10 substates are supported */
> +       if (!(mwait_substates & (0xF << 28)))
> +               return false;
> +
> +       /* Check whether PC10 is enabled in PKG C-state limit */
> +       rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
> +       if ((pcfg & 0xF) < 8)
> +               return false;
> +
> +       if (hpet_force_user) {
> +               pr_warn("HPET force enabled via command line, but dysfunctional in PC10.\n");
> +               return false;
> +       }
> +
> +       pr_info("HPET dysfunctional in PC10. Force disabled.\n");
> +       boot_hpet_disable = true;
> +       return true;
> +}
> +
>  /**
>   * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
>   */
> @@ -929,6 +1014,9 @@ int __init hpet_enable(void)
>         if (!is_hpet_capable())
>                 return 0;
>
> +       if (hpet_is_pc10_damaged())
> +               return 0;
> +
>         hpet_set_mapping();
>         if (!hpet_virt_address)
>                 return 0;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFT] x86/hpet: Use another crystalball to evaluate HPET usability
  2021-09-30 11:38         ` Rafael J. Wysocki
@ 2021-09-30 17:07           ` Thomas Gleixner
  2021-09-30 17:21             ` [PATCH RFT v2] " Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2021-09-30 17:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bjorn Helgaas, Jakub Kicinski, the arch/x86 maintainers,
	jose.souza, H. Peter Anvin, Borislav Petkov, Ingo Molnar,
	Kai-Heng Feng, Bjorn Helgaas, Linux PCI, rudolph, xapienz,
	bmilton, Paul E. McKenney, Stable, Rafael J. Wysocki,
	Dave Hansen, Feng Tang, Harry Pan

On Thu, Sep 30 2021 at 13:38, Rafael J. Wysocki wrote:
>> +static bool __init get_mwait_substates(unsigned int *mwait_substates)
>> +{
>> +       unsigned int eax, ebx, ecx;
>> +
>> +       if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
>> +               return false;
>> +
>> +       if (!boot_cpu_has(X86_FEATURE_MWAIT))
>> +               return false;
>> +
>> +       if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
>> +               return false;
>> +
>> +       cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, mwait_substates);
>> +
>> +       if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
>> +           !(ecx & CPUID5_ECX_INTERRUPT_BREAK) ||
>> +           !*mwait_substates)
>> +               return false;
>
> I would do
>
> return (ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) && (ecx &
> CPUID5_ECX_INTERRUPT_BREAK) && *mwait_substates;
>
> And this function could just return the mwait_substates value proper,
> because returning 0 then would be equivalent to returning 'false' from
> it as is.

Let me move that substates check into the function which makes it even simpler.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH RFT v2] x86/hpet: Use another crystalball to evaluate HPET usability
  2021-09-30 17:07           ` Thomas Gleixner
@ 2021-09-30 17:21             ` Thomas Gleixner
  2021-09-30 17:50               ` Jakub Kicinski
  2021-10-01 11:08               ` Rafael J. Wysocki
  0 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2021-09-30 17:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bjorn Helgaas, Jakub Kicinski, the arch/x86 maintainers,
	jose.souza, H. Peter Anvin, Borislav Petkov, Ingo Molnar,
	Kai-Heng Feng, Bjorn Helgaas, Linux PCI, rudolph, xapienz,
	bmilton, Paul E. McKenney, Stable, Rafael J. Wysocki,
	Dave Hansen, Feng Tang, Harry Pan

On recent Intel systems the HPET stops working when the system reaches PC10
idle state.

The approach of adding PCI ids to the early quirks to disable HPET on
these systems is a whack a mole game which makes no sense.

Check for PC10 instead and force disable HPET if supported. The check is
overbroad as it does not take ACPI, intel_idle enablement and command
line parameters into account. That's fine as long as there is at least
PMTIMER available to calibrate the TSC frequency. The decision can be
overruled by adding "hpet=force" on the kernel command line.

Remove the related early PCI quirks for affected Ice Cake and Coffin Lake
systems as they are not longer required. That should also cover all
other systems, i.e. Tiger Rag and newer generations, which are most
likely affected by this as well.

Fixes: Yet another hardware trainwreck
Reported-by: Jakub Kicinski <kuba@kernel.org>
Not-yet-signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
Notes: Completely untested. Use at your own peril.

V2: Move the substate check into the helper function. Adjust function
    name accordingly.
---
 arch/x86/kernel/early-quirks.c |    6 ---
 arch/x86/kernel/hpet.c         |   81 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -714,12 +714,6 @@ static struct chipset early_qrk[] __init
 	 */
 	{ PCI_VENDOR_ID_INTEL, 0x0f00,
 		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
-	{ PCI_VENDOR_ID_INTEL, 0x3e20,
-		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
-	{ PCI_VENDOR_ID_INTEL, 0x3ec4,
-		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
-	{ PCI_VENDOR_ID_INTEL, 0x8a12,
-		PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
 	{ PCI_VENDOR_ID_BROADCOM, 0x4331,
 	  PCI_CLASS_NETWORK_OTHER, PCI_ANY_ID, 0, apple_airport_reset},
 	{}
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -10,6 +10,7 @@
 #include <asm/irq_remapping.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
+#include <asm/mwait.h>
 
 #undef  pr_fmt
 #define pr_fmt(fmt) "hpet: " fmt
@@ -916,6 +917,83 @@ static bool __init hpet_counting(void)
 	return false;
 }
 
+static bool __init mwait_pc10_supported(void)
+{
+	unsigned int eax, ebx, ecx, mwait_substates;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return false;
+
+	if (!cpu_feature_enabled(X86_FEATURE_MWAIT))
+		return false;
+
+	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
+		return false;
+
+	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &mwait_substates);
+
+	return (ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) &&
+	       (ecx & CPUID5_ECX_INTERRUPT_BREAK) &&
+	       (mwait_substates & (0xF << 28));
+}
+
+/*
+ * Check whether the system supports PC10. If so force disable HPET as that
+ * stops counting in PC10. This check is overbroad as it does not take any
+ * of the following into account:
+ *
+ *	- ACPI tables
+ *	- Enablement of intel_idle
+ *	- Command line arguments which limit intel_idle C-state support
+ *
+ * That's perfectly fine. HPET is a piece of hardware designed by committee
+ * and the only reasons why it is still in use on modern systems is the
+ * fact that it is impossible to reliably query TSC and CPU frequency via
+ * CPUID or firmware.
+ *
+ * If HPET is functional it is useful for calibrating TSC, but this can be
+ * done via PMTIMER as well which seems to be the last remaining timer on
+ * X86/INTEL platforms that has not been completely wreckaged by feature
+ * creep.
+ *
+ * In theory HPET support should be removed altogether, but there are older
+ * systems out there which depend on it because TSC and APIC timer are
+ * dysfunctional in deeper C-states.
+ *
+ * It's only 20 years now that hardware people have been asked to provide
+ * reliable and discoverable facilities which can be used for timekeeping
+ * and per CPU timer interrupts.
+ *
+ * The probability that this problem is going to be solved in the
+ * forseeable future is close to zero, so the kernel has to be cluttered
+ * with heuristics to keep up with the ever growing amount of hardware and
+ * firmware trainwrecks. Hopefully some day hardware people will understand
+ * that the approach of "This can be fixed in software" is not sustainable.
+ * Hope dies last...
+ */
+static bool __init hpet_is_pc10_damaged(void)
+{
+	unsigned long long pcfg;
+
+	/* Check whether PC10 substates are supported */
+	if (!mwait_pc10_supported())
+		return false;
+
+	/* Check whether PC10 is enabled in PKG C-state limit */
+	rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
+	if ((pcfg & 0xF) < 8)
+		return false;
+
+	if (hpet_force_user) {
+		pr_warn("HPET force enabled via command line, but dysfunctional in PC10.\n");
+		return false;
+	}
+
+	pr_info("HPET dysfunctional in PC10. Force disabled.\n");
+	boot_hpet_disable = true;
+	return true;
+}
+
 /**
  * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
  */
@@ -929,6 +1007,9 @@ int __init hpet_enable(void)
 	if (!is_hpet_capable())
 		return 0;
 
+	if (hpet_is_pc10_damaged())
+		return 0;
+
 	hpet_set_mapping();
 	if (!hpet_virt_address)
 		return 0;


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFT v2] x86/hpet: Use another crystalball to evaluate HPET usability
  2021-09-30 17:21             ` [PATCH RFT v2] " Thomas Gleixner
@ 2021-09-30 17:50               ` Jakub Kicinski
  2021-10-01 11:08               ` Rafael J. Wysocki
  1 sibling, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2021-09-30 17:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, Bjorn Helgaas, the arch/x86 maintainers,
	jose.souza, H. Peter Anvin, Borislav Petkov, Ingo Molnar,
	Kai-Heng Feng, Bjorn Helgaas, Linux PCI, rudolph, xapienz,
	bmilton, Paul E. McKenney, Stable, Dave Hansen, Feng Tang,
	Harry Pan

On Thu, 30 Sep 2021 19:21:39 +0200 Thomas Gleixner wrote:
> On recent Intel systems the HPET stops working when the system reaches PC10
> idle state.
> 
> The approach of adding PCI ids to the early quirks to disable HPET on
> these systems is a whack a mole game which makes no sense.
> 
> Check for PC10 instead and force disable HPET if supported. The check is
> overbroad as it does not take ACPI, intel_idle enablement and command
> line parameters into account. That's fine as long as there is at least
> PMTIMER available to calibrate the TSC frequency. The decision can be
> overruled by adding "hpet=force" on the kernel command line.
> 
> Remove the related early PCI quirks for affected Ice Cake and Coffin Lake
> systems as they are not longer required. That should also cover all
> other systems, i.e. Tiger Rag and newer generations, which are most
> likely affected by this as well.
> 
> Fixes: Yet another hardware trainwreck
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Not-yet-signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Tested-by: Jakub Kicinski <kuba@kernel.org>

$ dmesg | grep -i hpet
[    0.014755] ACPI: HPET 0x000000005DC0F000 000038
[    0.014854] ACPI: Reserving HPET table memory at [mem 0x5dc0f000-0x5dc0f037]
[    0.144457] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.296550] hpet: HPET dysfunctional in PC10. Force disabled.
[    0.912010] hpet_acpi_add: no address or irqs in _CRS
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc acpi_pm 
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
$ dmesg | grep RIP
$

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFT v2] x86/hpet: Use another crystalball to evaluate HPET usability
  2021-09-30 17:21             ` [PATCH RFT v2] " Thomas Gleixner
  2021-09-30 17:50               ` Jakub Kicinski
@ 2021-10-01 11:08               ` Rafael J. Wysocki
  1 sibling, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2021-10-01 11:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Jakub Kicinski,
	the arch/x86 maintainers, jose.souza, H. Peter Anvin,
	Borislav Petkov, Ingo Molnar, Kai-Heng Feng, Bjorn Helgaas,
	Linux PCI, rudolph, xapienz, bmilton, Paul E. McKenney, Stable,
	Dave Hansen, Feng Tang, Harry Pan

On Thu, Sep 30, 2021 at 7:21 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On recent Intel systems the HPET stops working when the system reaches PC10
> idle state.
>
> The approach of adding PCI ids to the early quirks to disable HPET on
> these systems is a whack a mole game which makes no sense.
>
> Check for PC10 instead and force disable HPET if supported. The check is
> overbroad as it does not take ACPI, intel_idle enablement and command
> line parameters into account. That's fine as long as there is at least
> PMTIMER available to calibrate the TSC frequency. The decision can be
> overruled by adding "hpet=force" on the kernel command line.
>
> Remove the related early PCI quirks for affected Ice Cake and Coffin Lake
> systems as they are not longer required. That should also cover all
> other systems, i.e. Tiger Rag and newer generations, which are most
> likely affected by this as well.
>
> Fixes: Yet another hardware trainwreck
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Not-yet-signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>

> ---
> Notes: Completely untested. Use at your own peril.
>
> V2: Move the substate check into the helper function. Adjust function
>     name accordingly.
> ---
>  arch/x86/kernel/early-quirks.c |    6 ---
>  arch/x86/kernel/hpet.c         |   81 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 81 insertions(+), 6 deletions(-)
>
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -714,12 +714,6 @@ static struct chipset early_qrk[] __init
>          */
>         { PCI_VENDOR_ID_INTEL, 0x0f00,
>                 PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> -       { PCI_VENDOR_ID_INTEL, 0x3e20,
> -               PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> -       { PCI_VENDOR_ID_INTEL, 0x3ec4,
> -               PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> -       { PCI_VENDOR_ID_INTEL, 0x8a12,
> -               PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
>         { PCI_VENDOR_ID_BROADCOM, 0x4331,
>           PCI_CLASS_NETWORK_OTHER, PCI_ANY_ID, 0, apple_airport_reset},
>         {}
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -10,6 +10,7 @@
>  #include <asm/irq_remapping.h>
>  #include <asm/hpet.h>
>  #include <asm/time.h>
> +#include <asm/mwait.h>
>
>  #undef  pr_fmt
>  #define pr_fmt(fmt) "hpet: " fmt
> @@ -916,6 +917,83 @@ static bool __init hpet_counting(void)
>         return false;
>  }
>
> +static bool __init mwait_pc10_supported(void)
> +{
> +       unsigned int eax, ebx, ecx, mwait_substates;
> +
> +       if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> +               return false;
> +
> +       if (!cpu_feature_enabled(X86_FEATURE_MWAIT))
> +               return false;
> +
> +       if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> +               return false;
> +
> +       cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &mwait_substates);
> +
> +       return (ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) &&
> +              (ecx & CPUID5_ECX_INTERRUPT_BREAK) &&
> +              (mwait_substates & (0xF << 28));
> +}
> +
> +/*
> + * Check whether the system supports PC10. If so force disable HPET as that
> + * stops counting in PC10. This check is overbroad as it does not take any
> + * of the following into account:
> + *
> + *     - ACPI tables
> + *     - Enablement of intel_idle
> + *     - Command line arguments which limit intel_idle C-state support
> + *
> + * That's perfectly fine. HPET is a piece of hardware designed by committee
> + * and the only reasons why it is still in use on modern systems is the
> + * fact that it is impossible to reliably query TSC and CPU frequency via
> + * CPUID or firmware.
> + *
> + * If HPET is functional it is useful for calibrating TSC, but this can be
> + * done via PMTIMER as well which seems to be the last remaining timer on
> + * X86/INTEL platforms that has not been completely wreckaged by feature
> + * creep.
> + *
> + * In theory HPET support should be removed altogether, but there are older
> + * systems out there which depend on it because TSC and APIC timer are
> + * dysfunctional in deeper C-states.
> + *
> + * It's only 20 years now that hardware people have been asked to provide
> + * reliable and discoverable facilities which can be used for timekeeping
> + * and per CPU timer interrupts.
> + *
> + * The probability that this problem is going to be solved in the
> + * forseeable future is close to zero, so the kernel has to be cluttered
> + * with heuristics to keep up with the ever growing amount of hardware and
> + * firmware trainwrecks. Hopefully some day hardware people will understand
> + * that the approach of "This can be fixed in software" is not sustainable.
> + * Hope dies last...
> + */
> +static bool __init hpet_is_pc10_damaged(void)
> +{
> +       unsigned long long pcfg;
> +
> +       /* Check whether PC10 substates are supported */
> +       if (!mwait_pc10_supported())
> +               return false;
> +
> +       /* Check whether PC10 is enabled in PKG C-state limit */
> +       rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
> +       if ((pcfg & 0xF) < 8)
> +               return false;
> +
> +       if (hpet_force_user) {
> +               pr_warn("HPET force enabled via command line, but dysfunctional in PC10.\n");
> +               return false;
> +       }
> +
> +       pr_info("HPET dysfunctional in PC10. Force disabled.\n");
> +       boot_hpet_disable = true;
> +       return true;
> +}
> +
>  /**
>   * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
>   */
> @@ -929,6 +1007,9 @@ int __init hpet_enable(void)
>         if (!is_hpet_capable())
>                 return 0;
>
> +       if (hpet_is_pc10_damaged())
> +               return 0;
> +
>         hpet_set_mapping();
>         if (!hpet_virt_address)
>                 return 0;
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-10-01 11:08 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-17  2:46 [PATCH v2] x86/intel: Disable HPET on another Intel Coffee Lake platform Jakub Kicinski
2021-09-27 22:36 ` Krzysztof Wilczyński
2021-09-29 13:11 ` Jakub Kicinski
2021-09-29 16:05   ` Bjorn Helgaas
2021-09-30  9:08     ` Thomas Gleixner
2021-09-30 11:15       ` [PATCH RFT] x86/hpet: Use another crystalball to evaluate HPET usability Thomas Gleixner
2021-09-30 11:38         ` Rafael J. Wysocki
2021-09-30 17:07           ` Thomas Gleixner
2021-09-30 17:21             ` [PATCH RFT v2] " Thomas Gleixner
2021-09-30 17:50               ` Jakub Kicinski
2021-10-01 11:08               ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.