linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency
@ 2019-04-22 10:15 Daniel Drake
  2019-04-22 10:15 ` [PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency Daniel Drake
  2019-04-23  9:08 ` [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency Peter Zijlstra
  0 siblings, 2 replies; 7+ messages in thread
From: Daniel Drake @ 2019-04-22 10:15 UTC (permalink / raw)
  To: tglx, mingo, bp
  Cc: hpa, x86, linux-kernel, len.brown, rafael.j.wysocki, linux

native_calibrate_tsc() had a hardcoded table of Intel CPU families
and crystal clock, but we have found that it is possible to
calculate the crystal clock speed, and this is preferred to a hardcoded
table.

Where crystal clock frequency was not reported by CPUID.0x15,
use CPUID.0x16 data to calculate the crystal clock.

Using CPUID dump data from http://instlatx64.atw.hu/, the calculation
results can be seen to be sufficiently close to the previously hardcoded
values:
SKYLAKE_MOBILE: 24074074 Hz
SKYLAKE_DESKTOP: 23913043 Hz
KABYLAKE_MOBILE: 23893805 Hz
KABYLAKE_DESKTOP: 24050632 Hz
GOLDMONT: 19.2MHz crystal clock correctly reported by CPUID.0x15

Additionally, crystal clock frequency for platforms that were missing
from the list (e.g. SKYLAKE_X) will now be provided.

GOLDMONT_X was left as a hardcoded value, as the CPUID data on that site
indicates that the hardware does not report crystal clock nor CPUID.0x16
data.

Going forward, Intel have hopefully now started providing crystal
frequency in CPUID.0x15. At least ApolloLake, GeminiLake and CannonLake
CPUs all provide the relevant data directly.

Link: https://lkml.kernel.org/r/20190419083533.32388-1-drake@endlessm.com
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Drake <drake@endlessm.com>
---
 arch/x86/kernel/tsc.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3fae23834069..3971c837584a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -629,23 +629,26 @@ unsigned long native_calibrate_tsc(void)
 
 	crystal_khz = ecx_hz / 1000;
 
-	if (crystal_khz == 0) {
-		switch (boot_cpu_data.x86_model) {
-		case INTEL_FAM6_SKYLAKE_MOBILE:
-		case INTEL_FAM6_SKYLAKE_DESKTOP:
-		case INTEL_FAM6_KABYLAKE_MOBILE:
-		case INTEL_FAM6_KABYLAKE_DESKTOP:
-			crystal_khz = 24000;	/* 24.0 MHz */
-			break;
-		case INTEL_FAM6_ATOM_GOLDMONT_X:
-			crystal_khz = 25000;	/* 25.0 MHz */
-			break;
-		case INTEL_FAM6_ATOM_GOLDMONT:
-			crystal_khz = 19200;	/* 19.2 MHz */
-			break;
-		}
+	/*
+	 * Some Intel SoCs like Skylake and Kabylake don't report the crystal
+	 * clock, but we can easily calculate it by considering the crystal
+	 * ratio and the CPU speed.
+	 */
+	if (crystal_khz == 0 && boot_cpu_data.cpuid_level >= 0x16) {
+		unsigned int eax_base_mhz, ebx, ecx, edx;
+		cpuid(0x16, &eax_base_mhz, &ebx, &ecx, &edx);
+		crystal_khz = eax_base_mhz * 1000 * \
+			eax_denominator / ebx_numerator;
 	}
 
+	/*
+	 * Denverton SoCs don't report crystal clock, and also don't support
+	 * CPUID.0x16, so hardcode the 25MHz crystal clock.
+	 */
+	if (crystal_khz == 0 &&
+			boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT_X)
+		crystal_khz = 25000;
+
 	if (crystal_khz == 0)
 		return 0;
 	/*
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency
  2019-04-22 10:15 [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency Daniel Drake
@ 2019-04-22 10:15 ` Daniel Drake
  2019-04-22 12:04   ` Ingo Molnar
  2019-04-23  9:08 ` [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency Peter Zijlstra
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel Drake @ 2019-04-22 10:15 UTC (permalink / raw)
  To: tglx, mingo, bp
  Cc: hpa, x86, linux-kernel, len.brown, rafael.j.wysocki, linux

The APIC timer calibration (calibrate_APIC_timer()) can be skipped
in cases where we know the APIC timer frequency. On Intel SoCs,
we believe that the APIC is fed by the crystal clock; this would make
sense, and the crystal clock frequency has been verified against the
APIC timer calibration result on ApolloLake, GeminiLake, Kabylake,
CoffeeLake, WhiskeyLake and AmberLake.

Set lapic_timer_frequency based on the crystal clock frequency
accordingly.

APIC timer calibration would normally be skipped on modern CPUs
by nature of the TSC deadline timer being used instead,
however this change is still potentially useful, e.g. if the
TSC deadline timer has been disabled with a kernel parameter.
calibrate_APIC_timer() uses the legacy timer, but we are seeing
new platforms that omit such legacy functionality, so avoiding
such codepaths is becoming more important.

Link: https://lkml.kernel.org/r/20190419083533.32388-1-drake@endlessm.com
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1904031206440.1967@nanos.tec.linutronix.de
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Drake <drake@endlessm.com>
---
 arch/x86/kernel/tsc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3971c837584a..8750543287fc 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -665,6 +665,16 @@ unsigned long native_calibrate_tsc(void)
 	if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
 		setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
 
+#ifdef CONFIG_X86_LOCAL_APIC
+	/*
+	 * The local APIC appears to be fed by the core crystal clock
+	 * (which sounds entirely sensible). We can set the global
+	 * lapic_timer_frequency here to avoid having to calibrate the APIC
+	 * timer later.
+	 */
+	lapic_timer_frequency = (crystal_khz * 1000) / HZ;
+#endif
+
 	return crystal_khz * ebx_numerator / eax_denominator;
 }
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency
  2019-04-22 10:15 ` [PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency Daniel Drake
@ 2019-04-22 12:04   ` Ingo Molnar
  2019-04-23  3:06     ` Daniel Drake
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2019-04-22 12:04 UTC (permalink / raw)
  To: Daniel Drake
  Cc: tglx, mingo, bp, hpa, x86, linux-kernel, len.brown,
	rafael.j.wysocki, linux


* Daniel Drake <drake@endlessm.com> wrote:

> +#ifdef CONFIG_X86_LOCAL_APIC
> +	/*
> +	 * The local APIC appears to be fed by the core crystal clock
> +	 * (which sounds entirely sensible). We can set the global
> +	 * lapic_timer_frequency here to avoid having to calibrate the APIC
> +	 * timer later.
> +	 */
> +	lapic_timer_frequency = (crystal_khz * 1000) / HZ;
> +#endif

Minor style nit: the parentheses are unnecessary, integer expressions 
like this are evaluated left to right and multiplication and division has 
the same precedence.

But it might also make sense to actually store crystal_mhz instead of 
crystal_khz, because both CPUID 15H and 16H provides MHz values.

That way the above expression would simplify to:

	lapic_timer_frequency = crystal_mhz / HZ;

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency
  2019-04-22 12:04   ` Ingo Molnar
@ 2019-04-23  3:06     ` Daniel Drake
  2019-04-23 10:30       ` Ingo Molnar
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Drake @ 2019-04-23  3:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, x86,
	Linux Kernel, Len Brown, Wysocki, Rafael J,
	Linux Upstreaming Team

On Mon, Apr 22, 2019 at 8:04 PM Ingo Molnar <mingo@kernel.org> wrote:
> Minor style nit: the parentheses are unnecessary, integer expressions
> like this are evaluated left to right and multiplication and division has
> the same precedence.

Fair point, although the same could be said for cpu_khz_from_msr().

> But it might also make sense to actually store crystal_mhz instead of
> crystal_khz, because both CPUID 15H and 16H provides MHz values.
>
> That way the above expression would simplify to:
>
>         lapic_timer_frequency = crystal_mhz / HZ;

Wouldn't it be
        lapic_timer_frequency = crystal_mhz * 1000000 / HZ;
?

Thanks
Daniel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency
  2019-04-22 10:15 [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency Daniel Drake
  2019-04-22 10:15 ` [PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency Daniel Drake
@ 2019-04-23  9:08 ` Peter Zijlstra
  2019-04-24  5:53   ` Daniel Drake
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2019-04-23  9:08 UTC (permalink / raw)
  To: Daniel Drake
  Cc: tglx, mingo, bp, hpa, x86, linux-kernel, len.brown,
	rafael.j.wysocki, linux

On Mon, Apr 22, 2019 at 06:15:25PM +0800, Daniel Drake wrote:
> Additionally, crystal clock frequency for platforms that were missing
> from the list (e.g. SKYLAKE_X) will now be provided.

Well, SKX isn't exactly 'missing'; it would be very good if we can
confirm the new code is still correct under the below mentioned
conditions.

---

commit b511203093489eb1829cb4de86e8214752205ac6
Author: Len Brown <len.brown@intel.com>
Date:   Fri Dec 22 00:27:55 2017 -0500

    x86/tsc: Fix erroneous TSC rate on Skylake Xeon
    
    The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is
    problematic:
    
     - SKX workstations (with same model # as server variants) use a 24 MHz
       crystal.  This results in a -4.0% time drift rate on SKX workstations.
    
     - SKX servers subject the crystal to an EMI reduction circuit that reduces its
       actual frequency by (approximately) -0.25%.  This results in -1 second per
       10 minute time drift as compared to network time.
    
    This issue can also trigger a timer and power problem, on configurations
    that use the LAPIC timer (versus the TSC deadline timer).  Clock ticks
    scheduled with the LAPIC timer arrive a few usec before the time they are
    expected (according to the slow TSC).  This causes Linux to poll-idle, when
    it should be in an idle power saving state.  The idle and clock code do not
    graciously recover from this error, sometimes resulting in significant
    polling and measurable power impact.
    
    Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
    native_calibrate_tsc() will return 0, boot will run with tsc_khz = cpu_khz,
    and the TSC refined calibration will update tsc_khz to correct for the
    difference.
    
    [ tglx: Sanitized change log ]
    
    Fixes: 6baf3d61821f ("x86/tsc: Add additional Intel CPU models to the crystal quirk list")
    Signed-off-by: Len Brown <len.brown@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: peterz@infradead.org
    Cc: Prarit Bhargava <prarit@redhat.com>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/ff6dcea166e8ff8f2f6a03c17beab2cb436aa779.1513920414.git.len.brown@intel.com

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ce4b71119c36..3bf4df7f52d7 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -602,7 +602,6 @@ unsigned long native_calibrate_tsc(void)
 		case INTEL_FAM6_KABYLAKE_DESKTOP:
 			crystal_khz = 24000;	/* 24.0 MHz */
 			break;
-		case INTEL_FAM6_SKYLAKE_X:
 		case INTEL_FAM6_ATOM_DENVERTON:
 			crystal_khz = 25000;	/* 25.0 MHz */
 			break;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency
  2019-04-23  3:06     ` Daniel Drake
@ 2019-04-23 10:30       ` Ingo Molnar
  0 siblings, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2019-04-23 10:30 UTC (permalink / raw)
  To: Daniel Drake
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, x86,
	Linux Kernel, Len Brown, Wysocki, Rafael J,
	Linux Upstreaming Team


* Daniel Drake <drake@endlessm.com> wrote:

> On Mon, Apr 22, 2019 at 8:04 PM Ingo Molnar <mingo@kernel.org> wrote:
> > Minor style nit: the parentheses are unnecessary, integer expressions
> > like this are evaluated left to right and multiplication and division has
> > the same precedence.
> 
> Fair point, although the same could be said for cpu_khz_from_msr().

Yes, this standardization on mhz, if it makes sense, might have to be 
propagated a bit to function names and any other variables.

Another naming quirk: what unit is 'lapic_timer_frequency' in? AFAICS 
it's a "period" unit (number of clock cycles per jiffy), not a frequency 
(which is number of cycles per second) - so the better name might be 
lapic_timer_period?

> > But it might also make sense to actually store crystal_mhz instead of
> > crystal_khz, because both CPUID 15H and 16H provides MHz values.
> >
> > That way the above expression would simplify to:
> >
> >         lapic_timer_frequency = crystal_mhz / HZ;
> 
> Wouldn't it be
>         lapic_timer_frequency = crystal_mhz * 1000000 / HZ;
> ?

Sorry, what I wanted to suggest is crystal_hz, not crystal_mhz.

I.e. store the raw unit that comes out of the CPUID, which appears to be 
HZ, right?

That would change the calculation to:

         lapic_timer_period = crystal_hz / HZ;

Note how clearly it's visible in that calculation that what we calculate 
there is a 'period', not a frequency.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency
  2019-04-23  9:08 ` [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency Peter Zijlstra
@ 2019-04-24  5:53   ` Daniel Drake
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel Drake @ 2019-04-24  5:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa, x86,
	Linux Kernel, Len Brown, Wysocki, Rafael J,
	Linux Upstreaming Team

On Tue, Apr 23, 2019 at 5:08 PM Peter Zijlstra <peterz@infradead.org> wrote:
> Well, SKX isn't exactly 'missing'; it would be very good if we can
> confirm the new code is still correct under the below mentioned
> conditions.
>
> commit b511203093489eb1829cb4de86e8214752205ac6
> Author: Len Brown <len.brown@intel.com>
> Date:   Fri Dec 22 00:27:55 2017 -0500
>
>     x86/tsc: Fix erroneous TSC rate on Skylake Xeon
>
>     The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is
>     problematic:
>
>      - SKX workstations (with same model # as server variants) use a 24 MHz
>        crystal.  This results in a -4.0% time drift rate on SKX workstations.

Checking http://instlatx64.atw.hu/ there are 11 platforms listed as
00050654 (skylake X), doing the calculations manually:
18-Core Intel Xeon Gold 6154, 3000 MHz: 25000000
2x 16-Core Intel Xeon Gold 6130, 2100 MHz: 25000000
2x 18-Core Intel Xeon Gold 6154, 3000 MHz: 25000000
2x 28-Core Intel Xeon Platinum 8180, 2500 MHz: 25000000
2x HexaCore Intel Xeon Bronze 3106: 25000000
2x OctalCore Intel Xeon Silver 4108, 3000 MHz: 25000000
10-Core Xeon W-2155: 23913043
HexaCore Intel Core i7-7800X: 23972602
10-Core Intel Core i9-7900X, 4000 MHz: 23913043
18-Core Intel Core i9-7980XE: 24074074
28-Core Intel Xeon W-3175X: 25000000

So given that the results include 24MHz and 25MHz crystal clocks
calculated on different products then it looks like this variance is
correctly accounted for.

>      - SKX servers subject the crystal to an EMI reduction circuit that reduces its
>        actual frequency by (approximately) -0.25%.  This results in -1 second per
>        10 minute time drift as compared to network time.
>
>     This issue can also trigger a timer and power problem, on configurations
>     that use the LAPIC timer (versus the TSC deadline timer).  Clock ticks
>     scheduled with the LAPIC timer arrive a few usec before the time they are
>     expected (according to the slow TSC).  This causes Linux to poll-idle, when
>     it should be in an idle power saving state.  The idle and clock code do not
>     graciously recover from this error, sometimes resulting in significant
>     polling and measurable power impact.
>
>     Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
>     native_calibrate_tsc() will return 0, boot will run with tsc_khz = cpu_khz,
>     and the TSC refined calibration will update tsc_khz to correct for the
>     difference.

It sounds like I should add a condition:
    if (boot_cpu_data.x86_model != INTEL_FAM6_SKYLAKE_X)
        setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
so that refined TSC calibration will run on SKYLAKE_X.

That does raise another question though. Taking other platforms such
as KABYLAKE_MOBILE, the previous code hardcoded a precise 24MHz value,
but my new code calculates 23893805Hz. Is that small discrepancy small
enough to trigger the timer and power problem as described for
SKYLAKE_X above?

If so, maybe the logic should instead be:
    if (CPUID.0x15 provided Crystal Hz, i.e. we didn't have to calculate it)
        setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
This would enable TSC refinement on all variants of skylake and kabylake.

Daniel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-04-24  5:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-22 10:15 [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency Daniel Drake
2019-04-22 10:15 ` [PATCH 2/2] x86/tsc: set LAPIC timer frequency to crystal clock frequency Daniel Drake
2019-04-22 12:04   ` Ingo Molnar
2019-04-23  3:06     ` Daniel Drake
2019-04-23 10:30       ` Ingo Molnar
2019-04-23  9:08 ` [PATCH 1/2] x86/tsc: use CPUID.0x16 to calculate missing crystal frequency Peter Zijlstra
2019-04-24  5:53   ` Daniel Drake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).