linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake
From: Kai-Heng Feng @ 2019-08-29  9:12 UTC (permalink / raw)
  To: tglx, mingo, bp; +Cc: hpa, harry.pan, x86, linux-kernel, Kai-Heng Feng

Some Coffee Lake platforms have a skewed HPET timer once the SoC has
entered PC10, which causes the TSC to be marked as an unstable clocksource.

Harry Pan identified it as a firmware bug [1].

To prevent creating a circular dependency between HPET and TSC, let's
disable HPET on affected platforms.

[1]: https://lore.kernel.org/lkml/20190516090651.1396-1-harry.pan@intel.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 arch/x86/kernel/hpet.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index c6f791bc481e..07e9ec6f85b6 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -7,7 +7,9 @@
 #include <linux/cpu.h>
 #include <linux/irq.h>
 
+#include <asm/cpu_device_id.h>
 #include <asm/hpet.h>
+#include <asm/intel-family.h>
 #include <asm/time.h>
 
 #undef  pr_fmt
@@ -806,6 +808,12 @@ static bool __init hpet_counting(void)
 	return false;
 }
 
+static const struct x86_cpu_id hpet_blacklist[] __initconst = {
+	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
+	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },
+	{ }
+};
+
 /**
  * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
  */
@@ -819,6 +827,9 @@ int __init hpet_enable(void)
 	if (!is_hpet_capable())
 		return 0;
 
+	if (!hpet_force_user && x86_match_cpu(hpet_blacklist))
+		return 0;
+
 	hpet_set_mapping();
 	if (!hpet_virt_address)
 		return 0;
-- 
2.17.1
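
For readers unfamiliar with x86_match_cpu(), the check this patch adds to
hpet_enable() can be modelled in user space roughly as follows. This is a
simplified sketch, not the kernel implementation: struct cpu_id, struct
cpu_info, cpu_matches() and hpet_allowed() are hypothetical stand-ins, the
vendor value is arbitrary, and the kernel's wildcard and feature matching is
left out. The model numbers 0x8E and 0x9E are the values behind
INTEL_FAM6_KABYLAKE_MOBILE and INTEL_FAM6_KABYLAKE_DESKTOP.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's vendor and model constants. */
enum { VENDOR_INTEL = 1 };
#define MODEL_KABYLAKE_MOBILE	0x8E
#define MODEL_KABYLAKE_DESKTOP	0x9E

/* Simplified table entry; the kernel's struct x86_cpu_id has more fields. */
struct cpu_id {
	int vendor;
	int family;
	int model;
};

/* Identity of the running CPU, as the kernel would read it at boot. */
struct cpu_info {
	int vendor;
	int family;
	int model;
};

/* In this sketch an entry with family == 0 terminates the table. */
static const struct cpu_id hpet_blacklist_sketch[] = {
	{ VENDOR_INTEL, 6, MODEL_KABYLAKE_MOBILE },
	{ VENDOR_INTEL, 6, MODEL_KABYLAKE_DESKTOP },
	{ 0 }
};

/* Walk the table and report whether any entry matches the CPU exactly. */
static bool cpu_matches(const struct cpu_id *table, const struct cpu_info *cpu)
{
	for (const struct cpu_id *m = table; m->family != 0; m++) {
		if (m->vendor == cpu->vendor &&
		    m->family == cpu->family &&
		    m->model == cpu->model)
			return true;
	}
	return false;
}

/* Same shape as the added check: a user override wins over the table. */
static bool hpet_allowed(const struct cpu_info *cpu, bool force_user)
{
	return force_user || !cpu_matches(hpet_blacklist_sketch, cpu);
}
```

Note that the table keys on vendor, family and model only, so every CPU
sharing one of the two model numbers is affected, whatever its stepping.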



* Re: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake
From: Thomas Gleixner @ 2019-08-29 12:13 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Ingo Molnar, Borislav Petkov, H. Peter Anvin, harry.pan, x86,
	LKML, Dave Hansen

On Thu, 29 Aug 2019, Kai-Heng Feng wrote:

> Some Coffee Lake platforms have skewed HPET timer once the SoCs entered
> PC10, and marked TSC as unstable clocksource as result.

So here you talk about Coffee Lake and in the patch you use KABYLAKE. 

> Harry Pan identified it's a firmware bug [1].
> 
> To prevent creating a circular dependency between HPET and TSC, let's
> disable HPET on affected platforms.
> 
> [1]: https://lore.kernel.org/lkml/20190516090651.1396-1-harry.pan@intel.com/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183

Please use Link: tags for references, not [1] and not Bugzilla:

> +static const struct x86_cpu_id hpet_blacklist[] __initconst = {
> +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
> +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },

So this disables HPET on all Kaby Lake variants not just on the affected
Coffee Lakes. I know that I rejected the initial patch with the random
stepping cutoff...

  https://lore.kernel.org/lkml/alpine.DEB.2.21.1904081403220.1748@nanos.tec.linutronix.de

In the other attempt to 'fix' this I asked for clarification, but silence
from Intel after this:

  https://lore.kernel.org/lkml/alpine.DEB.2.21.1905182015320.3019@nanos.tec.linutronix.de

Can Intel please provide some useful information about this finally?

Thanks,

	tglx





* Re: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake
From: Kai-Heng Feng @ 2019-08-29 14:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, H. Peter Anvin, harry.pan, x86,
	LKML, Dave Hansen

at 20:13, Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
>
>> Some Coffee Lake platforms have skewed HPET timer once the SoCs entered
>> PC10, and marked TSC as unstable clocksource as result.
>
> So here you talk about Coffee Lake and in the patch you use KABYLAKE.

Coffee Lake has the same model number as Kaby Lake.

>
>> Harry Pan identified it's a firmware bug [1].
>>
>> To prevent creating a circular dependency between HPET and TSC, let's
>> disable HPET on affected platforms.
>>
>> [1]: https://lore.kernel.org/lkml/20190516090651.1396-1-harry.pan@intel.com/
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183
>
> Please use Link:// for reference not [1] and not Bugzilla:

Ok.

>
>> +static const struct x86_cpu_id hpet_blacklist[] __initconst = {
>> +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
>> +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },
>
> So this disables HPET on all Kaby Lake variants not just on the affected
> Coffee Lakes. I know that I rejected the initial patch with the random
> stepping cutoff...
>
>   https://lore.kernel.org/lkml/alpine.DEB.2.21.1904081403220.1748@nanos.tec.linutronix.de
>
> In the other attempt to 'fix' this I asked for clarification, but silence
> from Intel after this:
>
>   https://lore.kernel.org/lkml/alpine.DEB.2.21.1905182015320.3019@nanos.tec.linutronix.de
>
> Can Intel please provide some useful information about this finally?

Hopefully Intel can provide more info.

I know we should find the root cause rather than stopping at "it's a
firmware bug", but users are already affected by this issue [1].
Is there any better short-term workaround?

[1] https://bugzilla.kernel.org/show_bug.cgi?id=204537

Kai-Heng

>
> Thanks,
>
> 	tglx
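
To make the "same model number" point concrete, here is a minimal sketch of
how family, model and stepping are derived from the CPUID leaf 1 EAX
signature. The struct and function names are hypothetical; the bit layout
follows the documented CPUID encoding. The two signatures mentioned in the
usage note are illustrative values chosen only because they differ in the
stepping field.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature fields. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Decode the EAX value returned by CPUID leaf 1. */
static struct cpu_sig decode_cpuid_signature(uint32_t eax)
{
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;
	unsigned int ext_family = (eax >> 20) & 0xff;
	unsigned int ext_model = (eax >> 16) & 0xf;
	struct cpu_sig s;

	s.stepping = eax & 0xf;

	/* The extended family is only added for base family 0xf. */
	s.family = base_family;
	if (base_family == 0xf)
		s.family += ext_family;

	/* The extended model supplies the high nibble for families 6 and 0xf. */
	s.model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		s.model |= ext_model << 4;

	return s;
}
```

Decoding 0x906E9 and 0x906EA with this function yields family 6 and model
0x9E for both, with steppings 9 and 10 respectively, so a table keyed on
family and model alone cannot tell such parts apart.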




* Re: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake
From: Thomas Gleixner @ 2019-08-29 19:45 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Ingo Molnar, Borislav Petkov, H. Peter Anvin, harry.pan, x86,
	LKML, Dave Hansen


On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
> at 20:13, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
> > 
> > > Some Coffee Lake platforms have skewed HPET timer once the SoCs entered
> > > PC10, and marked TSC as unstable clocksource as result.
> > 
> > So here you talk about Coffee Lake and in the patch you use KABYLAKE.
> 
> Coffeelake has the same model number as Kabylake.

Yeah, just a bit more text explaining that would be helpful.
 
> > > +static const struct x86_cpu_id hpet_blacklist[] __initconst = {
> > > +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
> > > +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },
> > 
> > So this disables HPET on all Kaby Lake variants not just on the affected
> > Coffee Lakes. I know that I rejected the initial patch with the random
> > stepping cutoff...
> > 
> >  https://lore.kernel.org/lkml/alpine.DEB.2.21.1904081403220.1748@nanos.tec.linutronix.de
> > 
> > In the other attempt to 'fix' this I asked for clarification, but silence
> > from Intel after this:
> > 
> >  https://lore.kernel.org/lkml/alpine.DEB.2.21.1905182015320.3019@nanos.tec.linutronix.de
> > 
> > Can Intel please provide some useful information about this finally?
> 
> Hopefully Intel can provide more info.
> 
> I know we should find the root cause rather than stopping at "it’s a firmware
> bug”, but users are already affected by this issue [1].
> Is there any better short-term workaround?

Not really. And if Intel stays silent, I'm just going to apply it as is
along with a stable tag.

Thanks,

	tglx


* [RFD] x86/tsc: Loosen the requirements for watchdog - (was x86/hpet: Disable HPET on Intel Coffee Lake)
From: Thomas Gleixner @ 2019-08-29 21:38 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Ingo Molnar, Borislav Petkov, H. Peter Anvin, harry.pan, x86,
	LKML, Dave Hansen, Peter Zijlstra, Daniel Drake, Dan Williams,
	Rafael J. Wysocki, Len Brown, Tom Lendacky, Pu Wen


On Thu, 29 Aug 2019, Thomas Gleixner wrote:
> On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
> > I know we should find the root cause rather than stopping at "it’s a firmware
> > bug”, but users are already affected by this issue [1].
> > Is there any better short-term workaround?
> 
> Not really. And if Intel stays silent, I'm just going to apply it as is
> along with a stable tag.

Summary for those who are new on CC:

   Coffee Lake machines have a C10 state wrecked HPET which causes the TSC
   clocksource watchdog to misbehave which is not surprising as that's like
   trying to monitor an atomic clock with a sun-dial.

   So the intention is to disable HPET on those machines which affects also
   Kaby Lake CPUs as they share the model number and just differ in the
   stepping. Unless we get precise information from Intel which steppings
   are affected and that these are the only ones, we won't go down the
   stepping road as that is going to be an endless whack a mole game. Tried
   that before and got burned...

While disabling HPET sounds trivial, this can have side effects.

If the HPET is not available for whatever reason the kernel will use
ACPI_PMTIMER as fallback clocksource for monitoring the TSC if the affected
systems actually advertise it. If not that will effectively disable NOHZ
and high resolution timers. Disabling NOHZ is a pain for power consumption
and those machines are mostly laptops I assume.

Now there is something we can consider to do:

These CPUs have finally a working and usable TSC - knock on wood!

Just for the record: That's 20+ years after we started to ask for it!

The TSC has constant frequency and does not stop in deeper C-states. Aside
from that, these CPUs have the TSC_ADJUST MSR, which allows us to figure out
when the BIOS/SMM manages to wreck the TSC on a CPU by writing to it for
completely wrong reasons.

So we could finally start to trust TSC at least on single socket systems.

Multi-socket is a different story as the sockets might drift apart for
reasons which I really don't want to discuss in this context for CoC's
sake. So we definitely want a watchdog there as TSC_ADJUST is not able to
catch those issues.

So if we have to disable the HPET on Kaby Lake altogether unless Intel
comes up with the clever fix, i.e. poking at the right registers, then I
think we should also lift the TSC watchdog restrictions on these machines
if they are single socket, which they are as the affected CPUs so far are
mobile and client types.

Also given the fact that we get more and more 'reduced' hardware exposed
via ACPI and we already dealt with quite some fallout with various related
issues due to that, I fear we need to bite this bullet anyway anytime soon.

But TBH, 20+ years exposure to subtly wrecked timer hardware has left quite
a few scars.

I put AMD/HYGON folks on CC as well as they will run into similar problems
sooner rather than later and their CPUs still do not have the TSC_ADJUST MSR
which is paramount to loosen the watchdog restrictions. Hint, hint, hint...

Thoughts?

Thanks,

	tglx
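
The reasoning in this mail can be summarised as a small decision model. The
sketch below is only an illustration of the argument, not kernel code: every
name in it (struct platform, pick_watchdog(), tsc_trusted_without_watchdog(),
nohz_highres_usable()) is hypothetical, and the "relaxed" flag stands for the
loosened watchdog requirement that is being proposed for discussion.

```c
#include <stdbool.h>

/* Which clocksource, if any, is available to monitor the TSC. */
enum watchdog { WD_HPET, WD_ACPI_PM, WD_NONE };

/* Hypothetical description of the properties discussed in the mail. */
struct platform {
	bool hpet_usable;		/* false once HPET is disabled */
	bool acpi_pm_advertised;	/* firmware exposes the ACPI PM timer */
	bool tsc_constant_nonstop;	/* constant rate, runs in deep C-states */
	bool has_tsc_adjust;		/* TSC_ADJUST MSR is available */
	int nr_sockets;
};

/* HPET is preferred; the ACPI PM timer is the fallback if advertised. */
static enum watchdog pick_watchdog(const struct platform *p)
{
	if (p->hpet_usable)
		return WD_HPET;
	if (p->acpi_pm_advertised)
		return WD_ACPI_PM;
	return WD_NONE;
}

/* Proposed rule: trust the TSC only on single-socket, TSC_ADJUST systems. */
static bool tsc_trusted_without_watchdog(const struct platform *p)
{
	return p->tsc_constant_nonstop && p->has_tsc_adjust &&
	       p->nr_sockets == 1;
}

/*
 * Without any watchdog, NOHZ and high resolution timers are effectively
 * lost unless the relaxed rule applies and the TSC qualifies.
 */
static bool nohz_highres_usable(const struct platform *p, bool relaxed)
{
	if (pick_watchdog(p) != WD_NONE)
		return true;
	return relaxed && tsc_trusted_without_watchdog(p);
}
```

In this model a single-socket laptop with HPET disabled and no ACPI PM timer
loses NOHZ and high resolution timers under the current rule and keeps them
under the relaxed one, while a multi-socket system still needs a watchdog.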


* Re: [RFD] x86/tsc: Loosen the requirements for watchdog - (was x86/hpet: Disable HPET on Intel Coffee Lake)
From: Daniel Drake @ 2019-08-30  3:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kai-Heng Feng, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	harry.pan, x86, LKML, Dave Hansen, Peter Zijlstra, Dan Williams,
	Rafael J. Wysocki, Len Brown, Tom Lendacky, Pu Wen

Hi Thomas,

On Fri, Aug 30, 2019 at 5:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> So if we have to disable the HPET on Kaby Lake alltogether unless Intel
> comes up with the clever fix, i.e. poking at the right registers, then I
> think we should also lift the TSC watchdog restrictions on these machines
> if they are single socket, which they are as the affected CPUs so far are
> mobile and client types.
>
> Also given the fact that we get more and more 'reduced' hardware exposed
> via ACPI and we already dealt with quite some fallout with various related
> issues due to that, I fear we need to bite this bullet anyway anytime soon.

Thanks for the explanation here!

My experience in this area is basically limited to the clock-related
issues that I've sent your way recently, so I don't have deep wisdom
to draw upon, but what you wrote here makes sense to me.

If you can outline a testing procedure, we can test upcoming patches
on Coffee Lake and Kaby Lake consumer laptops.

Thanks,
Daniel


* Re: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake
From: Kai-Heng Feng @ 2019-10-01 15:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, H. Peter Anvin, harry.pan, x86,
	LKML, Dave Hansen

Hi Thomas,

> On Aug 30, 2019, at 03:45, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
>> at 20:13, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> On Thu, 29 Aug 2019, Kai-Heng Feng wrote:
>>> 
>>>> Some Coffee Lake platforms have skewed HPET timer once the SoCs entered
>>>> PC10, and marked TSC as unstable clocksource as result.
>>> 
>>> So here you talk about Coffee Lake and in the patch you use KABYLAKE.
>> 
>> Coffeelake has the same model number as Kabylake.
> 
> Yeah, just a bit more text explaining that would be helpful.
> 
>>>> +static const struct x86_cpu_id hpet_blacklist[] __initconst = {
>>>> +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
>>>> +	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },
>>> 
>>> So this disables HPET on all Kaby Lake variants not just on the affected
>>> Coffee Lakes. I know that I rejected the initial patch with the random
>>> stepping cutoff...
>>> 
>>> https://lore.kernel.org/lkml/alpine.DEB.2.21.1904081403220.1748@nanos.tec.linutronix.de
>>> 
>>> In the other attempt to 'fix' this I asked for clarification, but silence
>>> from Intel after this:
>>> 
>>> https://lore.kernel.org/lkml/alpine.DEB.2.21.1905182015320.3019@nanos.tec.linutronix.de
>>> 
>>> Can Intel please provide some useful information about this finally?
>> 
>> Hopefully Intel can provide more info.
>> 
>> I know we should find the root cause rather than stopping at "it’s a firmware
>> bug”, but users are already affected by this issue [1].
>> Is there any better short-term workaround?
> 
> Not really. And if Intel stays silent, I'm just going to apply it as is
> along with a stable tag.

It seems there are still no updates from Intel. Can we have this patch in v5.4?

Kai-Heng

> 
> Thanks,
> 
> 	tglx



* Re: [PATCH] x86/hpet: Disable HPET on Intel Coffee Lake
From: Feng Tang @ 2019-10-09  5:58 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Thomas Gleixner, mingo, bp, hpa, harry.pan, x86,
	Linux Kernel Mailing List, feng.tang

Hi Kai-Heng,

On Thu, Aug 29, 2019 at 5:14 PM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> Some Coffee Lake platforms have skewed HPET timer once the SoCs entered
> PC10, and marked TSC as unstable clocksource as result.
>
> Harry Pan identified it's a firmware bug [1].
>
> To prevent creating a circular dependency between HPET and TSC, let's
> disable HPET on affected platforms.

Sorry for chiming in late.

We have disabled the HPET for Baytrail platforms in commit 62187910b0fc
("x86/intel: Add quirk to disable HPET for the Baytrail platform"),
which added a quirk to the early_qrk[] table:

@@ -567,6 +577,12 @@ static struct chipset early_qrk[] __initdata = {
+       /*
+        * HPET on current version of Baytrail platform has accuracy
+        * problems, disable it for now:
+        */
+       { PCI_VENDOR_ID_INTEL, 0x0f00,
+               PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, 0, force_disable_hpet},

So maybe we can unify the method used to disable HPET. (By the way, I have
no information about the health of the HPET on Kaby Lake; I just want to
comment on the disabling method.)

Thanks,
Feng

>
> [1]: https://lore.kernel.org/lkml/20190516090651.1396-1-harry.pan@intel.com/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183
>
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
>  arch/x86/kernel/hpet.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index c6f791bc481e..07e9ec6f85b6 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -7,7 +7,9 @@
>  #include <linux/cpu.h>
>  #include <linux/irq.h>
>
> +#include <asm/cpu_device_id.h>
>  #include <asm/hpet.h>
> +#include <asm/intel-family.h>
>  #include <asm/time.h>
>
>  #undef  pr_fmt
> @@ -806,6 +808,12 @@ static bool __init hpet_counting(void)
>         return false;
>  }
>
> +static const struct x86_cpu_id hpet_blacklist[] __initconst = {
> +       { X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_MOBILE },
> +       { X86_VENDOR_INTEL, 6, INTEL_FAM6_KABYLAKE_DESKTOP },
> +       { }
> +};
> +
>  /**
>   * hpet_enable - Try to setup the HPET timer. Returns 1 on success.
>   */
> @@ -819,6 +827,9 @@ int __init hpet_enable(void)
>         if (!is_hpet_capable())
>                 return 0;
>
> +       if (!hpet_force_user && x86_match_cpu(hpet_blacklist))
> +               return 0;
> +
>         hpet_set_mapping();
>         if (!hpet_virt_address)
>                 return 0;
> --
> 2.17.1
>

