From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Bjorn Helgaas <helgaas@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>, <x86@kernel.org>,
	<jose.souza@intel.com>, <hpa@zytor.com>, <bp@alien8.de>,
	<mingo@redhat.com>, <kai.heng.feng@canonical.com>,
	<bhelgaas@google.com>, <linux-pci@vger.kernel.org>,
	<rudolph@fb.com>, <xapienz@fb.com>, <bmilton@fb.com>,
	<stable@vger.kernel.org>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	"rafael@kernel.org" <rafael@kernel.org>
Subject: Re: [PATCH] x86/intel: Disable HPET on another Intel Coffee Lake platform
Date: Tue, 21 Sep 2021 20:05:19 +0200
Message-ID: <82c1b753-586d-dadf-54de-6509e70a00ea@intel.com>
In-Reply-To: <87v92x775x.ffs@tglx>

On 9/19/2021 2:14 AM, Thomas Gleixner wrote:
> On Fri, Sep 17 2021 at 11:34, Peter Zijlstra wrote:
>> On Fri, Sep 17, 2021 at 11:11:49AM +0200, Peter Zijlstra wrote:
>>> On Thu, Sep 16, 2021 at 10:07:07AM -0500, Bjorn Helgaas wrote:
>>>> This seems to be an ongoing issue, not just a point defect in a single
>>>> product, and I really hate the onesy-twosy nature of this.  Is there
>>>> really no way to detect this issue automatically or fix whatever Linux
>>>> bug makes us trip over this?  I am no clock expert, so I have
>>>> absolutely no idea whether this is possible.
> Right, we need to have all these quirks because we can't define a
> generation cutoff based on family/model because X86 model is simply a
> random number. There might be some scheme behind it, but it's neither
> obvious nor documented.
>
> But the HPET on the affected machines goes south when the system enters
> PC10. So the right thing to do is to check whether PC10 is supported and
> force disable HPET if that's the case. That disablement is required
> independent of the clocksource watchdog problem because HPET is exposed
> in other ways as well.
>
> Questions for Rafael:
>
> What's the proper way to figure out whether PC10 is supported?

I can't say without research.  I think it'd be sufficient to check if 
C10 is supported, because asking for it is the only way to get PC10.
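
As a rough illustration of what such a check could look like: CPUID
leaf 5 (the MWAIT leaf, CPUID_MWAIT_LEAF in the kernel) reports the
number of MWAIT sub-states per C-state in 4-bit fields of EDX. The
sketch below assumes the convention intel_idle follows, where the C10
MWAIT hint is 0x60 and its sub-state count therefore sits in
EDX[31:28]. The function names and the sample register values are
hypothetical, and Python is used only to keep the bit arithmetic easy
to follow; it is not the kernel's implementation.

```python
# Sketch: decode CPUID leaf 5 (MWAIT) output to see whether C10 is enumerated.
# The register values are passed in, so no CPUID access is needed here.

MWAIT_SUBSTATE_SIZE = 4          # each C-state gets a 4-bit field in EDX
MWAIT_SUBSTATE_MASK = 0xF
CPUID5_ECX_EXTENSIONS_SUPPORTED = 0x1
CPUID5_ECX_INTERRUPT_BREAK = 0x2
C10_MWAIT_HINT = 0x60            # assumption: hint used for C10 by intel_idle


def mwait_substates(edx, hint):
    """Number of MWAIT sub-states enumerated for the C-state behind `hint`."""
    cstate = (hint >> 4) & 0xF
    # Field 0 of EDX describes C0, so the field for `cstate` is one further up.
    shift = (cstate + 1) * MWAIT_SUBSTATE_SIZE
    return (edx >> shift) & MWAIT_SUBSTATE_MASK


def c10_supported(ecx, edx):
    """True if MWAIT extensions are usable and C10 has at least one sub-state."""
    has_ext = bool(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED)
    has_brk = bool(ecx & CPUID5_ECX_INTERRUPT_BREAK)
    return has_ext and has_brk and mwait_substates(edx, C10_MWAIT_HINT) > 0


if __name__ == "__main__":
    # Made-up sample values for ECX and EDX of CPUID leaf 5.
    print(c10_supported(0x3, 0x11142120))
```

This only says that C10 is enumerated. Whether the kernel ever asks for
it is a separate question.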

However, even if it is supported, the problem is not there until the 
kernel asks for C10.  So instead, I'd disable the TSC watchdog on the 
first attempt to ask the processor for C10 from the cpuidle code and I'd 
do that from the relevant drivers (intel_idle and ACPI idle).

There would be no TSC watchdog for the C10 users, but wouldn't that be a 
fair game?


> I got lost in the maze of intel_idle and ACPI muck and several other places
> which check that. Just grep for CPUID_MWAIT_LEAF and see how consistent
> all of that is.
>
> Why the heck can't we have _ONE_ authoritative implementation of that?
> Just because, right?
>
> Of course all of this is well documented as usual....
>
>>> X86 is gifted with the grand total of _0_ reliable clocks. Given no
>>> accurate time, it is impossible to tell which one of them is broken
>>> worst. Although I suppose we could attempt to synchronize against the
>>> PMU or MPERF..
>>>
>>> We could possibly disable the tsc watchdog for
>>> X86_FEATURE_TSC_KNOWN_FREQ && X86_FEATURE_TSC_ADJUST I suppose.
>>>
>>> And then have people with 'creative' BIOS get to keep the pieces.
>> Alternatively, we can change what the TSC watchdog does for
>> X86_FEATURE_TSC_ADJUST machines. Instead of checking time against HPET
>> it can check if TSC_ADJUST changes. That should make it more resilient
>> vs HPET time itself being off.
> I tried that and I hated the mess it created. Abusing the clocksource
> watchdog machinery for that is a nightmare. Don't even think about it.
>
> When TSC_ADJUST is available then we check the MSR when a CPU goes idle,
> but not more often than once per second. My concern is that we can't
> check TSC_ADJUST for modifications on fully loaded CPUs, but why do I
> still care?
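
To make the "on idle, but at most once per second" behaviour concrete,
here is a minimal sketch of such a rate-limited check. It is not the
kernel's code: the class name, the read_adjust callback (standing in
for an MSR read) and the explicit `now` argument are all hypothetical
and exist only so the logic can be followed and tested in isolation.

```python
# Sketch: verify a TSC_ADJUST-like value on idle entry, rate-limited.

class TscAdjustChecker:
    def __init__(self, read_adjust, min_interval=1.0):
        self.read_adjust = read_adjust      # hypothetical stand-in for the MSR read
        self.min_interval = min_interval    # seconds between two real checks
        self.expected = read_adjust()       # value recorded at start-up
        self.next_check = 0.0               # earliest time of the next real check

    def on_idle(self, now):
        """Call when the CPU goes idle.

        Returns None if the check was skipped (too soon), True if the value
        is unchanged, False if it was modified behind our back.
        """
        if now < self.next_check:
            return None
        self.next_check = now + self.min_interval
        return self.read_adjust() == self.expected
```

A CPU that never goes idle never calls on_idle(), which is exactly the
fully loaded case the check cannot cover.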
>
> The requirements for ditching the watchdog should be:
>
>      X86_FEATURE_TSC_KNOWN_FREQ &&
>      X86_FEATURE_TSC_ADJUST &&
>      X86_FEATURE_NONSTOP_TSC &&
>      X86_FEATURE_ARAT
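
For anyone who wants to see where a given machine stands on these four
requirements, a quick user-space check is sketched below. It assumes
the features show up in the "flags" line of /proc/cpuinfo under their
usual lowercase names (tsc_known_freq, tsc_adjust, nonstop_tsc, arat).
The helper names are made up for the example.

```python
# Sketch: report which of the four TSC-related feature flags are missing.

REQUIRED_FLAGS = ("tsc_known_freq", "tsc_adjust", "nonstop_tsc", "arat")


def missing_tsc_flags(cpuinfo_text):
    """Return the required flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [f for f in REQUIRED_FLAGS if f not in present]
    # No flags line at all: treat everything as missing.
    return list(REQUIRED_FLAGS)


def can_ditch_watchdog(cpuinfo_text):
    """True only if all four required flags are present."""
    return not missing_tsc_flags(cpuinfo_text)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        text = f.read()
    print("missing:", missing_tsc_flags(text))
```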
>
> But expecting X86_FEATURE_TSC_KNOWN_FREQ to be set on these HPET
> trainwreck equipped machines is wishful thinking:
>
> # cpuid -1 -l 0x15
> CPU:
>     Time Stamp Counter/Core Crystal Clock Information (0x15):
>        TSC/clock ratio = 176/2
>        nominal core crystal clock = 0 Hz
>
> We calculate that missing frequency then from leaf 0x16:
>
> # cpuid -1 -l 0x16
> CPU:
>     Processor Frequency Information (0x16):
>        Core Base Frequency (MHz) = 0x834 (2100)
>        Core Maximum Frequency (MHz) = 0x1068 (4200)
>        Bus (Reference) Frequency (MHz) = 0x64 (100)
>
> But we don't set the TSC_KNOWN_FREQ feature bit in the case that crystal
> clock is 0 and we need to use leaf 16. Which is entirely correct because
> the Core Base Frequency CPUID info is a joke:
>
> [    3.045828] tsc: Refined TSC clocksource calibration: 2111.993 MHz
>
> The refined calibration is pretty accurate according to NTP and if you
> take the CPUID 15/16 numbers into account even obvious with pure math:
>
>    TSC/clock ratio = 176/2
>    Core Base Frequency (MHz) = 0x834 (2100)
>
>    2100 / (176 / 2) = 23.8636 (MHz)
>
> which would be a very unusual crystal frequency. The refined calibration
> makes a lot more sense:
>
>    2112 / (176 / 2) = 24 (MHz)
>
> which is one of the very well known crystal frequencies of these
> machines.
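
The arithmetic above can be replayed in a few lines. This is only a
worked example using the numbers quoted in this mail (ratio 176/2,
2100 MHz from leaf 0x16, 2111.993 MHz from the refined calibration).
The function names are invented for the illustration.

```python
# Sketch: relate TSC frequency, CPUID 0x15 ratio and crystal frequency.

def implied_crystal_mhz(tsc_mhz, ratio_num, ratio_den):
    """Crystal frequency implied by a TSC frequency and the TSC/clock ratio."""
    return tsc_mhz * ratio_den / ratio_num


def tsc_mhz_from_crystal(crystal_mhz, ratio_num, ratio_den):
    """TSC frequency implied by a crystal frequency and the TSC/clock ratio."""
    return crystal_mhz * ratio_num / ratio_den


if __name__ == "__main__":
    for label, mhz in (("leaf 0x16 base", 2100),
                       ("rounded calibration", 2112),
                       ("refined calibration", 2111.993)):
        print(f"{label}: {implied_crystal_mhz(mhz, 176, 2):.4f} MHz crystal")
    print(f"24 MHz crystal -> {tsc_mhz_from_crystal(24, 176, 2):.1f} MHz TSC")
```

With a 24 MHz crystal the ratio gives a 2112 MHz TSC, about 0.6% above
the 2100 MHz that leaf 0x16 advertises.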
>
> It's 2021 now and we are still not able to get reasonable information
> from hardware/firmware about this?
>
> Can the hardware and firmware people finally get their act together?
>
> Here is the simple list of things we are asking for:
>
>   - Reliable TSC which cannot be tinkered with even by "value add" BIOSes
>
>   - Reliable information from hardware/firmware about the TSC frequency
>
>   - Hardware enforced (or firmware assisted) guarantees of TSC being
>     synchronized across sockets
>
> We have been asking for that for more than _twenty_ years now. All we get
> are useless new features and as demonstrated in the case at hand new
> types of wreckage instead of a proper solution to the underlying
> problems.
>
> I'm personally dealing with the x86 timer hardware trainwrecks for more
> than 20 years now. TBH, I'm tired of this idiocy and very close to the
> point where I stop caring.
>
> Thanks,
>
>          tglx



Thread overview: 16+ messages
2021-09-16 13:17 [PATCH] x86/intel: Disable HPET on another Intel Coffee Lake platform Jakub Kicinski
2021-09-16 15:07 ` Bjorn Helgaas
2021-09-16 15:30   ` Jakub Kicinski
2021-09-16 16:35     ` Paul E. McKenney
2021-09-17  2:57       ` Jakub Kicinski
2021-09-17  3:33         ` Paul E. McKenney
2021-09-17  9:11   ` Peter Zijlstra
2021-09-17  9:34     ` Peter Zijlstra
2021-09-19  0:14       ` Thomas Gleixner
2021-09-21 18:05         ` Rafael J. Wysocki [this message]
2021-09-21 20:18           ` Thomas Gleixner
2021-09-22 20:27             ` Rafael J. Wysocki
2021-09-22 22:21               ` Thomas Gleixner
2021-09-23 10:46                 ` Rafael J. Wysocki
2021-09-17 14:00 ` Krzysztof Wilczyński
2021-09-17 14:58   ` Jakub Kicinski
