From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
Jakub Kicinski <kuba@kernel.org>,
"the arch/x86 maintainers" <x86@kernel.org>,
jose.souza@intel.com, "H. Peter Anvin" <hpa@zytor.com>,
Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@kernel.org>,
Kai-Heng Feng <kai.heng.feng@canonical.com>,
Bjorn Helgaas <bhelgaas@google.com>,
Linux PCI <linux-pci@vger.kernel.org>,
rudolph@fb.com, xapienz@fb.com, bmilton@fb.com,
"Paul E. McKenney" <paulmck@kernel.org>,
Stable <stable@vger.kernel.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Feng Tang <feng.tang@intel.com>, Harry Pan <harry.pan@intel.com>
Subject: Re: [PATCH RFT] x86/hpet: Use another crystalball to evaluate HPET usability
Date: Thu, 30 Sep 2021 13:38:53 +0200
Message-ID: <CAJZ5v0hH_h9V0dACEMomqZbwpQUf6GB_8UK9+S1AGEdFQqvPLQ@mail.gmail.com>
In-Reply-To: <87k0iy71rw.ffs@tglx>
On Thu, Sep 30, 2021 at 1:15 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On recent Intel systems the HPET stops working when the system reaches PC10
> idle state.
>
> The approach of adding PCI IDs to the early quirks to disable HPET on
> these systems is a whack-a-mole game which makes no sense.
>
> Check for PC10 support instead and force-disable HPET if PC10 is
> supported. The check is overbroad as it does not take ACPI, intel_idle
> enablement and command line parameters into account. That's fine as
> long as there is at least PMTIMER available to calibrate the TSC
> frequency. The decision can be overruled by adding "hpet=force" on the
> kernel command line.
>
> Remove the related early PCI quirks for affected Ice and Coffin Lake
> systems as they are no longer required.
>
> Fixes: Yet another hardware trainwreck
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Not-yet-signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> Notes: Completely untested. Use at your own peril.
> ---
>  arch/x86/kernel/early-quirks.c |  6 --
>  arch/x86/kernel/hpet.c         | 88 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 88 insertions(+), 6 deletions(-)
>
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -714,12 +714,6 @@ static struct chipset early_qrk[] __init
> */
> { PCI_VENDOR_ID_INTEL, 0x0f00,
> PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> - { PCI_VENDOR_ID_INTEL, 0x3e20,
> - PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> - { PCI_VENDOR_ID_INTEL, 0x3ec4,
> - PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> - { PCI_VENDOR_ID_INTEL, 0x8a12,
> - PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},
> { PCI_VENDOR_ID_BROADCOM, 0x4331,
> PCI_CLASS_NETWORK_OTHER, PCI_ANY_ID, 0, apple_airport_reset},
> {}
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -10,6 +10,7 @@
> #include <asm/irq_remapping.h>
> #include <asm/hpet.h>
> #include <asm/time.h>
> +#include <asm/mwait.h>
>
> #undef pr_fmt
> #define pr_fmt(fmt) "hpet: " fmt
> @@ -916,6 +917,90 @@ static bool __init hpet_counting(void)
>  	return false;
> }
>
> +static bool __init get_mwait_substates(unsigned int *mwait_substates)
> +{
> +	unsigned int eax, ebx, ecx;
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> +		return false;
> +
> +	if (!boot_cpu_has(X86_FEATURE_MWAIT))
> +		return false;
> +
> +	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> +		return false;
> +
> +	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, mwait_substates);
> +
> +	if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
> +	    !(ecx & CPUID5_ECX_INTERRUPT_BREAK) ||
> +	    !*mwait_substates)
> +		return false;

I would do

	return (ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) &&
	       (ecx & CPUID5_ECX_INTERRUPT_BREAK) && *mwait_substates;

And this function could just return the mwait_substates value itself,
because returning 0 would then be equivalent to returning 'false' from
it as is.

LGTM otherwise.

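A minimal user-space sketch of the suggestion above, in which get_mwait_substates() returns the CPUID leaf 5 EDX word itself and 0 doubles as "not usable". struct fake_cpu is a hypothetical stand-in for boot_cpu_data and the cpuid() output so the logic can run outside the kernel; the constant values are assumed to match asm/mwait.h.

```c
#include <stdbool.h>

/* Assumed to match the definitions in arch/x86/include/asm/mwait.h. */
#define CPUID_MWAIT_LEAF                5
#define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1
#define CPUID5_ECX_INTERRUPT_BREAK      0x2

/*
 * Hypothetical stand-in for boot_cpu_data plus the CPUID leaf 5 output,
 * so the sketch runs in user space without executing CPUID.
 */
struct fake_cpu {
	bool is_intel;
	bool has_mwait;
	unsigned int cpuid_level;
	unsigned int leaf5_ecx;
	unsigned int leaf5_edx;	/* MWAIT sub-state counts */
};

/*
 * Return the MWAIT sub-state word directly; 0 means "not usable",
 * which replaces the separate bool return of the original helper.
 */
static unsigned int get_mwait_substates(const struct fake_cpu *c)
{
	if (!c->is_intel || !c->has_mwait ||
	    c->cpuid_level < CPUID_MWAIT_LEAF)
		return 0;

	/* Both MWAIT extensions must be advertised in ECX. */
	if ((c->leaf5_ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) &&
	    (c->leaf5_ecx & CPUID5_ECX_INTERRUPT_BREAK))
		return c->leaf5_edx;	/* may itself be 0: nothing to report */

	return 0;
}
```

With this shape, the separate "if (!get_mwait_substates(...))" test in hpet_is_pc10_damaged() would become redundant: a 0 return already fails the (0xF << 28) PC10 mask check, so the caller could simply assign the result and test the mask.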
> +
> +	return true;
> +}
> +
> +/*
> + * Check whether the system supports PC10. If so, force-disable HPET, as it
> + * stops counting in PC10. This check is overbroad as it does not take any
> + * of the following into account:
> + *
> + * - ACPI tables
> + * - Enablement of intel_idle
> + * - Command line arguments which limit intel_idle C-state support
> + *
> + * That's perfectly fine. HPET is a piece of hardware designed by committee
> + * and the only reason why it is still in use on modern systems is the
> + * fact that it is impossible to reliably query the TSC frequency via
> + * CPUID or firmware.
> + *
> + * If HPET is functional it is useful for calibrating TSC, but this can be
> + * done via PMTIMER as well, which seems to be the last remaining timer on
> + * X86/INTEL platforms that has not been completely wrecked by feature
> + * creep.
> + *
> + * In theory HPET support should be removed altogether, but there are older
> + * systems out there which depend on it because TSC and APIC timer are
> + * dysfunctional in deeper C-states.
> + *
> + * It's only 20 years now that hardware people have been asked to provide
> + * reliable and discoverable facilities which can be used for timekeeping
> + * and per CPU timer interrupts.
> + *
> + * The probability that this problem is going to be solved in the
> + * foreseeable future is close to zero, so the kernel has to be cluttered
> + * with heuristics to keep up with the ever growing amount of hardware and
> + * firmware trainwrecks. Hopefully some day hardware people will understand
> + * that the approach of "This can be fixed in software" is not sustainable.
> + * Hope dies last...
> + */
> +static bool __init hpet_is_pc10_damaged(void)
> +{
> +	unsigned int mwait_substates;
> +	unsigned long long pcfg;
> +
> +	if (!get_mwait_substates(&mwait_substates))
> +		return false;
> +
> +	/* Check whether PC10 substates are supported */
> +	if (!(mwait_substates & (0xF << 28)))
> +		return false;
> +
> +	/* Check whether PC10 is enabled in PKG C-state limit */
> +	rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, pcfg);
> +	if ((pcfg & 0xF) < 8)
> +		return false;
> +
> +	if (hpet_force_user) {
> +		pr_warn("HPET force enabled via command line, but dysfunctional in PC10.\n");
> +		return false;
> +	}
> +
> +	pr_info("HPET dysfunctional in PC10. Force disabled.\n");
> +	boot_hpet_disable = true;
> +	return true;
> +}
> +
> /**
> * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
> */
> @@ -929,6 +1014,9 @@ int __init hpet_enable(void)
>  	if (!is_hpet_capable())
>  		return 0;
> 
> +	if (hpet_is_pc10_damaged())
> +		return 0;
> +
>  	hpet_set_mapping();
>  	if (!hpet_virt_address)
>  		return 0;
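To make the two bit tests in hpet_is_pc10_damaged() concrete, here is a hedged user-space sketch of the same decision as a pure function. It assumes, as the patch does, that bits 31:28 of the CPUID leaf 5 EDX value report PC10 sub-states and that a package C-state limit of 8 or higher in the low nibble of MSR_PKG_CST_CONFIG_CONTROL permits PC10. hpet_pc10_verdict() and enum hpet_verdict are hypothetical names; in the kernel the inputs would come from get_mwait_substates(), rdmsrl() and hpet_force_user, which the commit message ties to "hpet=force".

```c
#include <stdbool.h>

/* Bits 31:28 of CPUID leaf 5 EDX, checked by the patch as PC10 sub-states. */
#define PC10_SUBSTATE_MASK  (0xFu << 28)
/* Low nibble of MSR_PKG_CST_CONFIG_CONTROL: package C-state limit. */
#define PKG_CST_LIMIT_MASK  0xFULL
/* Limit value from which the patch treats PC10 as enabled. */
#define PKG_CST_LIMIT_PC10  8

/* Hypothetical result type mirroring the three exits of the patch. */
enum hpet_verdict {
	HPET_KEEP,		/* no PC10: leave HPET alone */
	HPET_KEEP_FORCED,	/* PC10 possible, but forced by user: warn only */
	HPET_DISABLE,		/* PC10 possible: force-disable HPET */
};

static enum hpet_verdict hpet_pc10_verdict(unsigned int mwait_substates,
					   unsigned long long pcfg,
					   bool hpet_force_user)
{
	/* No PC10 sub-states advertised (also covers mwait_substates == 0). */
	if (!(mwait_substates & PC10_SUBSTATE_MASK))
		return HPET_KEEP;

	/* Package C-state limit is below PC10. */
	if ((pcfg & PKG_CST_LIMIT_MASK) < PKG_CST_LIMIT_PC10)
		return HPET_KEEP;

	if (hpet_force_user)
		return HPET_KEEP_FORCED;

	return HPET_DISABLE;
}
```

The sketch writes the mask as 0xFu << 28 purely as a portable-C habit: shifting 0xF left by 28 in a signed int goes past INT_MAX, while the unsigned form stays within unsigned arithmetic.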
Thread overview: 11+ messages
2021-09-17 2:46 [PATCH v2] x86/intel: Disable HPET on another Intel Coffee Lake platform Jakub Kicinski
2021-09-27 22:36 ` Krzysztof Wilczyński
2021-09-29 13:11 ` Jakub Kicinski
2021-09-29 16:05 ` Bjorn Helgaas
2021-09-30 9:08 ` Thomas Gleixner
2021-09-30 11:15 ` [PATCH RFT] x86/hpet: Use another crystalball to evaluate HPET usability Thomas Gleixner
2021-09-30 11:38 ` Rafael J. Wysocki [this message]
2021-09-30 17:07 ` Thomas Gleixner
2021-09-30 17:21 ` [PATCH RFT v2] " Thomas Gleixner
2021-09-30 17:50 ` Jakub Kicinski
2021-10-01 11:08 ` Rafael J. Wysocki