x86/tsc: Don't use cpuid 0x16 leaf to determine cpu speed.

Message ID SoluZg51N39Rx0tDCSJFbEvvgMrDnJ_g0RdRdN5mtCfag4GahIOPfok7UbkyeO5Qpl3wUHp8H8y73JtClcZvr1ARSIOBIFQAne0Z712el8M=@protonmail.com
State New

Commit Message

Krzysztof Piecuch Dec. 5, 2019, 7:15 p.m. UTC
This patch corrects tsc drift on systems with changed base clock frequency
(e.g. overclocking).

We can't use CPUID leaf 0x16 as it's documented as "not reflecting actual
values" and is supposed to be used only as a means to determine "processor
brand string and for determining the appropriate range to use when
displaying processor information e.g. frequency history graphs".

Signed-off-by: Krzysztof Piecuch <piecuch@protonmail.com>
---
 arch/x86/kernel/tsc.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

--
2.17.1

Comments

Krzysztof Piecuch Dec. 5, 2019, 9:38 p.m. UTC | #1
Sorry, it doesn't work. I will follow up with something better
in a couple of days.

I've bisected the bug and it was introduced in commit
aa297292d708e89773b3b2cdcaf33f01bfa095d8 - I will start from there.

I would appreciate any feedback you have on this topic.

Kind regards,
Krzysztof Piecuch

Peter Zijlstra Dec. 6, 2019, 10:39 a.m. UTC | #2
On Thu, Dec 05, 2019 at 07:15:03PM +0000, Krzysztof Piecuch wrote:
> This patch corrects tsc drift on systems with changed base clock frequency
> (e.g. overclocking).
> 
> We can't use CPUID leaf 0x16 as it's documented as "not reflecting actual
> values" and is supposed to be used only as a means to determine "processor
> brand string and for determining the appropriate range to use when
> displaying processor information e.g. frequency history graphs".

What is the actual problem you're seeing? Because if CPUID.16h is used,
we don't set KNOWN_FREQ and will thus run tsc_refine_calibration_work()
(against HPET/PIT) later.

The CPUID.16h value is only used as an initial guess.

> Signed-off-by: Krzysztof Piecuch <piecuch@protonmail.com>
> ---
>  arch/x86/kernel/tsc.c | 21 ++++-----------------
>  1 file changed, 4 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 7e322e2daaf5..fc9a000a814c 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -641,29 +641,16 @@ unsigned long native_calibrate_tsc(void)
>  			boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT_D)
>  		crystal_khz = 25000;
> 
> +	if (crystal_khz == 0)
> +		return 0;
> +
>  	/*
>  	 * TSC frequency reported directly by CPUID is a "hardware reported"
>  	 * frequency and is the most accurate one so far we have. This
>  	 * is considered a known frequency.
>  	 */
> -	if (crystal_khz != 0)
> -		setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> -
> -	/*
> -	 * Some Intel SoCs like Skylake and Kabylake don't report the crystal
> -	 * clock, but we can easily calculate it to a high degree of accuracy
> -	 * by considering the crystal ratio and the CPU speed.
> -	 */
> -	if (crystal_khz == 0 && boot_cpu_data.cpuid_level >= 0x16) {
> -		unsigned int eax_base_mhz, ebx, ecx, edx;
> -
> -		cpuid(0x16, &eax_base_mhz, &ebx, &ecx, &edx);
> -		crystal_khz = eax_base_mhz * 1000 *
> -			eax_denominator / ebx_numerator;
> -	}
> 
> -	if (crystal_khz == 0)
> -		return 0;
> +	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> 
>  	/*
>  	 * For Atom SoCs TSC is the only reliable clocksource.

This completely screws over everything that doesn't have HPET/PIT and
doesn't have a useful CPUID.15h.

If you're on a system that has no HPET/PIT and also doesn't have a
useful CPUID.15h and CPUID.16h is wrong, then you're up a creek without
paddles.

So please, be more specific in your problem description.

Krzysztof Piecuch Dec. 6, 2019, 4:13 p.m. UTC | #3
Thank you for your reply.

I see about 2% TSC clock drift (671 s ahead of my local NTP server after
9.5 h) on my machine with Supermicro's Hyperspeed turned on. There is no
clock drift when I turn Hyperspeed off.
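
To put the reported offset into a rate, here is a minimal sketch of the
arithmetic. It assumes the 9.5 h interval is 34200 s; the function name
drift_ppm is made up for illustration and is not kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: express an accumulated clock offset as a drift
 * rate in parts per million (integer division, truncates).
 */
static uint64_t drift_ppm(uint64_t offset_s, uint64_t elapsed_s)
{
	if (elapsed_s == 0)
		return 0;	/* no interval, no meaningful rate */

	return offset_s * 1000000ULL / elapsed_s;
}
```

drift_ppm(671, 34200) evaluates to 19619 ppm, i.e. roughly 1.96%, which
matches the "2%" figure quoted above.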

As far as I know Hyperspeed increases base clock frequency[1].

That's what CPUID says about my overclocked Intel Xeon Gold 6146:

   Time Stamp Counter/Core Crystal Clock Information (0x15):
      TSC/clock ratio = 256/2
      nominal core crystal clock = 0 Hz
   Processor Frequency Information (0x16):
      Core Base Frequency (MHz) = 0xc80 (3200)
      Core Maximum Frequency (MHz) = 0x1068 (4200)
      Bus (Reference) Frequency (MHz) = 0x64 (100)
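
For reference, a rough user-space sketch of how these CPUID values turn
into the early TSC estimate. The first function mirrors the expression
removed by the patch; the second assumes the TSC frequency is then taken
as crystal * numerator / denominator. The function names are invented for
this example.

```c
#include <stdint.h>

/*
 * Back-compute the crystal frequency (kHz) from the CPUID.16h base
 * frequency (MHz) and the CPUID.15h TSC/crystal ratio, as in
 *   crystal_khz = eax_base_mhz * 1000 * eax_denominator / ebx_numerator
 */
static uint32_t crystal_khz_from_base_mhz(uint32_t base_mhz,
					  uint32_t denominator,
					  uint32_t numerator)
{
	if (numerator == 0)
		return 0;	/* ratio not enumerated */

	return (uint32_t)((uint64_t)base_mhz * 1000 * denominator / numerator);
}

/* Assumed follow-up step: TSC kHz = crystal kHz * numerator / denominator. */
static uint32_t tsc_khz_from_crystal(uint32_t crystal_khz,
				     uint32_t denominator,
				     uint32_t numerator)
{
	if (denominator == 0)
		return 0;

	return (uint32_t)((uint64_t)crystal_khz * numerator / denominator);
}
```

With the values above (base 3200 MHz, ratio 256/2) this gives a crystal
of 25000 kHz and an early TSC estimate of 3200000 kHz, regardless of
whether the base clock has actually been raised.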

tsc_refine_calibration_work never corrects the early calibration
because the frequency it calculates differs from the early value by more
than its 1% tolerance, so the refined value is discarded.

I've bumped the tsc_refine_calibration_work's tolerance to 3% and made it work:

Hyperspeed:
[    8.571471] tsc: Refined TSC clocksource calibration: 3264.012 MHz
No hyperspeed:
[    8.506009] tsc: Refined TSC clocksource calibration: 3200.013 MHz
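
The effect of the tolerance can be sketched as below. This is a
simplified stand-in for the check in tsc_refine_calibration_work, not a
copy of it; the function name and the percentage parameter are
assumptions, and the early estimate is assumed to be 3200000 kHz (the
CPUID.16h base frequency).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: accept the refined frequency only if it is
 * within tolerance_pct percent of the early estimate.
 */
static bool refined_freq_accepted(uint32_t early_khz, uint32_t refined_khz,
				  uint32_t tolerance_pct)
{
	uint32_t diff = early_khz > refined_khz ? early_khz - refined_khz
						: refined_khz - early_khz;

	/* Compare diff/early against pct/100 without losing precision. */
	return (uint64_t)diff * 100 <= (uint64_t)early_khz * tolerance_pct;
}
```

With Hyperspeed on, refined_freq_accepted(3200000, 3264012, 1) is false
and the same call with a tolerance of 3 is true. The measured difference
is 64012 kHz, just over 2% of the early estimate, so even a 2% bound
would reject it.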

Increasing the tolerance to 3% would work in my case, but apparently some
servers can increase the base clock frequency by up to 6%. [2]
To eliminate this bug completely we would need to increase the tolerance
significantly, which might introduce other bugs.

[1]: https://www.supermicro.com/support/faqs/faq.cfm?faq=21337
[2]: https://www.servethehome.com/supermicro-hyper-speed-server-overclocking-bios/

Kind regards,
Krzysztof Piecuch

Patch

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 7e322e2daaf5..fc9a000a814c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -641,29 +641,16 @@  unsigned long native_calibrate_tsc(void)
 			boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT_D)
 		crystal_khz = 25000;

+	if (crystal_khz == 0)
+		return 0;
+
 	/*
 	 * TSC frequency reported directly by CPUID is a "hardware reported"
 	 * frequency and is the most accurate one so far we have. This
 	 * is considered a known frequency.
 	 */
-	if (crystal_khz != 0)
-		setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
-
-	/*
-	 * Some Intel SoCs like Skylake and Kabylake don't report the crystal
-	 * clock, but we can easily calculate it to a high degree of accuracy
-	 * by considering the crystal ratio and the CPU speed.
-	 */
-	if (crystal_khz == 0 && boot_cpu_data.cpuid_level >= 0x16) {
-		unsigned int eax_base_mhz, ebx, ecx, edx;
-
-		cpuid(0x16, &eax_base_mhz, &ebx, &ecx, &edx);
-		crystal_khz = eax_base_mhz * 1000 *
-			eax_denominator / ebx_numerator;
-	}

-	if (crystal_khz == 0)
-		return 0;
+	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);

 	/*
 	 * For Atom SoCs TSC is the only reliable clocksource.