linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/tsc: Add option to force HW timer based recalibration
@ 2022-05-08 14:47 Feng Tang
  2022-05-09  4:58 ` Feng Tang
  0 siblings, 1 reply; 9+ messages in thread
From: Feng Tang @ 2022-05-08 14:47 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, Jonathan Corbet, x86,
	linux-kernel
  Cc: paulmck, rui.zhang, len.brown, tim.c.chen, Feng Tang

Currently when HW provides the tsc freq info through MSR or CPUID(0x15),
the info will be taken as the 'best guess', and kernel will set the
X86_FEATURE_TSC_KNOWN_FREQ flag and skip the HW timer based recalibration,
which works pretty well.

And there is still very few corner case that the freq info is not
accurate enough will small deviation from the actual value, like on
a product with early version (fix needed) of firmware or some
pre-production hardware.

Add an option 'recalibrate' for 'tsc' kernel parameter to force the
tsc freq recalibration with HPET/PM_TIMER, and warn if the deviation
from previous value is more than about 500 PPM.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 +++
 arch/x86/kernel/tsc.c                         | 35 ++++++++++++++++---
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3f1cc5e317ed..a5880f25e1bb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5981,6 +5981,11 @@
 			in situations with strict latency requirements (where
 			interruptions from clocksource watchdog are not
 			acceptable).
+			[x86] recalibrate: force to do freq recalibration with
+			a HW timer (HPET or PM_TIMER). When HW provides tsc freq
+			info through MSR or CPUID(0x15), kernel will take it as
+			the 'best guess', but there is corner case that the info
+			could be wrong, and need a double chck through HW timer.
 
 	tsc_early_khz=  [X86] Skip early TSC calibration and use the given
 			value instead. Useful when the early TSC frequency discovery
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index cafacb2e58cc..186a793fdf3a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -48,6 +48,8 @@ static DEFINE_STATIC_KEY_FALSE(__use_tsc);
 
 int tsc_clocksource_reliable;
 
+static int __read_mostly tsc_force_recalibrate;
+
 static u32 art_to_tsc_numerator;
 static u32 art_to_tsc_denominator;
 static u64 art_to_tsc_offset;
@@ -303,6 +305,8 @@ static int __init tsc_setup(char *str)
 		mark_tsc_unstable("boot parameter");
 	if (!strcmp(str, "nowatchdog"))
 		no_tsc_watchdog = 1;
+	if (!strcmp(str, "recalibrate"))
+		tsc_force_recalibrate = 1;
 	return 1;
 }
 
@@ -1374,6 +1378,25 @@ static void tsc_refine_calibration_work(struct work_struct *work)
 	else
 		freq = calc_pmtimer_ref(delta, ref_start, ref_stop);
 
+	/* Will hit this only if tsc_force_recalibrate has been set */
+	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
+
+		/* Warn if the deviation exceeds 500 ppm */
+		if (abs(tsc_khz - freq) > (tsc_khz >> 11)) {
+			pr_warn("Warning: TSC freq calibrated by CPUID/MSR differs from what is calibrated by HW timer, please check with vendor!!\n");
+			pr_info("Previous calibrated TSC freq:\t %lu.%03lu MHz\n",
+				(unsigned long)tsc_khz / 1000,
+				(unsigned long)tsc_khz % 1000);
+		}
+
+		pr_info("TSC freq recalibrated by [%s]:\t %lu.%03lu MHz\n",
+			hpet ? "HPET" : "PM_TIMER",
+			(unsigned long)freq / 1000,
+			(unsigned long)freq % 1000);
+
+		return;
+	}
+
 	/* Make sure we're within 1% */
 	if (abs(tsc_khz - freq) > tsc_khz/100)
 		goto out;
@@ -1383,6 +1406,7 @@ static void tsc_refine_calibration_work(struct work_struct *work)
 		(unsigned long)tsc_khz / 1000,
 		(unsigned long)tsc_khz % 1000);
 
+
 	/* Inform the TSC deadline clockevent devices about the recalibration */
 	lapic_update_tsc_freq();
 
@@ -1407,8 +1431,10 @@ static int __init init_tsc_clocksource(void)
 	if (!boot_cpu_has(X86_FEATURE_TSC) || !tsc_khz)
 		return 0;
 
-	if (tsc_unstable)
-		goto unreg;
+	if (tsc_unstable) {
+		clocksource_unregister(&clocksource_tsc_early);
+		return 0;
+	}
 
 	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
 		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
@@ -1421,9 +1447,10 @@ static int __init init_tsc_clocksource(void)
 		if (boot_cpu_has(X86_FEATURE_ART))
 			art_related_clocksource = &clocksource_tsc;
 		clocksource_register_khz(&clocksource_tsc, tsc_khz);
-unreg:
 		clocksource_unregister(&clocksource_tsc_early);
-		return 0;
+
+		if (!tsc_force_recalibrate)
+			return 0;
 	}
 
 	schedule_delayed_work(&tsc_irqwork, 0);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/tsc: Add option to force HW timer based recalibration
  2022-05-08 14:47 [PATCH] x86/tsc: Add option to force HW timer based recalibration Feng Tang
@ 2022-05-09  4:58 ` Feng Tang
  2022-05-09  7:16   ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Feng Tang @ 2022-05-09  4:58 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, Jonathan Corbet, x86,
	linux-kernel
  Cc: paulmck, rui.zhang, len.brown, tim.c.chen

Sorry, just spotted some typos, here is the updated version


From ee8e3d772c623d27d79c43da5a76fb6252175aba Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Sun, 8 May 2022 20:22:12 +0800
Subject: [PATCH] x86/tsc: Add option to force HW timer based recalibration

Currently when HW provides the tsc freq info through MSR or CPUID(0x15),
the info will be taken as the 'best guess', and kernel will set the
X86_FEATURE_TSC_KNOWN_FREQ flag and skip the HW timer based recalibration,
which works pretty well.

And there is still very few corner case that the freq info is not
accurate enough with small deviation from the actual value, like on
a product with early buggy version of firmware or on some
pre-production hardware.

Add an option 'recalibrate' for 'tsc' kernel parameter to force the
tsc freq recalibration with HPET/PM_TIMER, and warn if the deviation
from previous value is more than about 500 PPM.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 +++
 arch/x86/kernel/tsc.c                         | 34 ++++++++++++++++---
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3f1cc5e317ed..1e06196a591e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5981,6 +5981,11 @@
 			in situations with strict latency requirements (where
 			interruptions from clocksource watchdog are not
 			acceptable).
+			[x86] recalibrate: force to do freq recalibration with
+			a HW timer (HPET or PM_TIMER). When HW provides tsc freq
+			info through MSR or CPUID(0x15), kernel will take it as
+			the 'best guess', but there is corner case that the info
+			could be wrong, and need a double check through HW timer.
 
 	tsc_early_khz=  [X86] Skip early TSC calibration and use the given
 			value instead. Useful when the early TSC frequency discovery
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index cafacb2e58cc..5cf62a58754a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -48,6 +48,8 @@ static DEFINE_STATIC_KEY_FALSE(__use_tsc);
 
 int tsc_clocksource_reliable;
 
+static int __read_mostly tsc_force_recalibrate;
+
 static u32 art_to_tsc_numerator;
 static u32 art_to_tsc_denominator;
 static u64 art_to_tsc_offset;
@@ -303,6 +305,8 @@ static int __init tsc_setup(char *str)
 		mark_tsc_unstable("boot parameter");
 	if (!strcmp(str, "nowatchdog"))
 		no_tsc_watchdog = 1;
+	if (!strcmp(str, "recalibrate"))
+		tsc_force_recalibrate = 1;
 	return 1;
 }
 
@@ -1374,6 +1378,25 @@ static void tsc_refine_calibration_work(struct work_struct *work)
 	else
 		freq = calc_pmtimer_ref(delta, ref_start, ref_stop);
 
+	/* Will hit this only if tsc_force_recalibrate has been set */
+	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
+
+		/* Warn if the deviation exceeds 500 ppm */
+		if (abs(tsc_khz - freq) > (tsc_khz >> 11)) {
+			pr_warn("Warning: TSC freq calibrated by CPUID/MSR differs from what is calibrated by HW timer, please check with vendor!!\n");
+			pr_info("Previous calibrated TSC freq:\t %lu.%03lu MHz\n",
+				(unsigned long)tsc_khz / 1000,
+				(unsigned long)tsc_khz % 1000);
+		}
+
+		pr_info("TSC freq recalibrated by [%s]:\t %lu.%03lu MHz\n",
+			hpet ? "HPET" : "PM_TIMER",
+			(unsigned long)freq / 1000,
+			(unsigned long)freq % 1000);
+
+		return;
+	}
+
 	/* Make sure we're within 1% */
 	if (abs(tsc_khz - freq) > tsc_khz/100)
 		goto out;
@@ -1407,8 +1430,10 @@ static int __init init_tsc_clocksource(void)
 	if (!boot_cpu_has(X86_FEATURE_TSC) || !tsc_khz)
 		return 0;
 
-	if (tsc_unstable)
-		goto unreg;
+	if (tsc_unstable) {
+		clocksource_unregister(&clocksource_tsc_early);
+		return 0;
+	}
 
 	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
 		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
@@ -1421,9 +1446,10 @@ static int __init init_tsc_clocksource(void)
 		if (boot_cpu_has(X86_FEATURE_ART))
 			art_related_clocksource = &clocksource_tsc;
 		clocksource_register_khz(&clocksource_tsc, tsc_khz);
-unreg:
 		clocksource_unregister(&clocksource_tsc_early);
-		return 0;
+
+		if (!tsc_force_recalibrate)
+			return 0;
 	}
 
 	schedule_delayed_work(&tsc_irqwork, 0);
-- 
2.27.0


Thanks,
Feng

On Sun, May 08, 2022 at 10:47:33PM +0800, Feng Tang wrote:
> Currently when HW provides the tsc freq info through MSR or CPUID(0x15),
> the info will be taken as the 'best guess', and kernel will set the
> X86_FEATURE_TSC_KNOWN_FREQ flag and skip the HW timer based recalibration,
> which works pretty well.
> 
> And there is still very few corner case that the freq info is not
> accurate enough will small deviation from the actual value, like on
> a product with early version (fix needed) of firmware or some
> pre-production hardware.
> 
> Add an option 'recalibrate' for 'tsc' kernel parameter to force the
> tsc freq recalibration with HPET/PM_TIMER, and warn if the deviation
> from previous value is more than about 500 PPM.
> 
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  5 +++
>  arch/x86/kernel/tsc.c                         | 35 ++++++++++++++++---
>  2 files changed, 36 insertions(+), 4 deletions(-)

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/tsc: Add option to force HW timer based recalibration
  2022-05-09  4:58 ` Feng Tang
@ 2022-05-09  7:16   ` Peter Zijlstra
  2022-05-09  7:30     ` Feng Tang
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2022-05-09  7:16 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Jonathan Corbet, x86, linux-kernel, paulmck,
	rui.zhang, len.brown, tim.c.chen

On Mon, May 09, 2022 at 12:58:39PM +0800, Feng Tang wrote:
> Sorry, just spotted some typos, here is the updated version
> 
> 
> From ee8e3d772c623d27d79c43da5a76fb6252175aba Mon Sep 17 00:00:00 2001
> From: Feng Tang <feng.tang@intel.com>
> Date: Sun, 8 May 2022 20:22:12 +0800
> Subject: [PATCH] x86/tsc: Add option to force HW timer based recalibration
> 
> Currently when HW provides the tsc freq info through MSR or CPUID(0x15),
> the info will be taken as the 'best guess', and kernel will set the
> X86_FEATURE_TSC_KNOWN_FREQ flag and skip the HW timer based recalibration,
> which works pretty well.
> 
> And there is still very few corner case that the freq info is not
> accurate enough with small deviation from the actual value, like on
> a product with early buggy version of firmware or on some
> pre-production hardware.
> 
> Add an option 'recalibrate' for 'tsc' kernel parameter to force the
> tsc freq recalibration with HPET/PM_TIMER, and warn if the deviation
> from previous value is more than about 500 PPM.
> 
> Signed-off-by: Feng Tang <feng.tang@intel.com>

Why isn't 'tsc_early_khz=' not working for you? Afaict that will
override calibrate_tsc() when provided and as such can be used on these
early platforms for provide the right value until such time that the
firmware is fixed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/tsc: Add option to force HW timer based recalibration
  2022-05-09  7:16   ` Peter Zijlstra
@ 2022-05-09  7:30     ` Feng Tang
  2022-05-09 10:01       ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Feng Tang @ 2022-05-09  7:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Jonathan Corbet, x86, linux-kernel, paulmck,
	rui.zhang, len.brown, tim.c.chen

Hi Peter,

Thanks for the review!

On Mon, May 09, 2022 at 09:16:52AM +0200, Peter Zijlstra wrote:
> On Mon, May 09, 2022 at 12:58:39PM +0800, Feng Tang wrote:
> > Sorry, just spotted some typos, here is the updated version
> > 
> > 
> > From ee8e3d772c623d27d79c43da5a76fb6252175aba Mon Sep 17 00:00:00 2001
> > From: Feng Tang <feng.tang@intel.com>
> > Date: Sun, 8 May 2022 20:22:12 +0800
> > Subject: [PATCH] x86/tsc: Add option to force HW timer based recalibration
> > 
> > Currently when HW provides the tsc freq info through MSR or CPUID(0x15),
> > the info will be taken as the 'best guess', and kernel will set the
> > X86_FEATURE_TSC_KNOWN_FREQ flag and skip the HW timer based recalibration,
> > which works pretty well.
> > 
> > And there is still very few corner case that the freq info is not
> > accurate enough with small deviation from the actual value, like on
> > a product with early buggy version of firmware or on some
> > pre-production hardware.
> > 
> > Add an option 'recalibrate' for 'tsc' kernel parameter to force the
> > tsc freq recalibration with HPET/PM_TIMER, and warn if the deviation
> > from previous value is more than about 500 PPM.
> > 
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> 
> Why isn't 'tsc_early_khz=' not working for you? Afaict that will
> override calibrate_tsc() when provided and as such can be used on these
> early platforms for provide the right value until such time that the
> firmware is fixed.

For the early platforms, the problem we met is we don't know what
is the 'correct' tsc-freq, and the value from MSR/CUPID could be wrong. 

And there was some generation, that after enabling some feature, each
instance of HW will have slightly different frequency, so there is
no central "one for all" value to set for 'tsc_early_khz'.

This option is more like a way to double-check the correctness of
tsc-freq got from MSR/CPUID(0x15).

Thanks,
Feng






^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/tsc: Add option to force HW timer based recalibration
  2022-05-09  7:30     ` Feng Tang
@ 2022-05-09 10:01       ` Thomas Gleixner
  2022-05-09 11:22         ` Feng Tang
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2022-05-09 10:01 UTC (permalink / raw)
  To: Feng Tang, Peter Zijlstra
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, H . Peter Anvin,
	Jonathan Corbet, x86, linux-kernel, paulmck, rui.zhang,
	len.brown, tim.c.chen

Feng,

On Mon, May 09 2022 at 15:30, Feng Tang wrote:
> On Mon, May 09, 2022 at 09:16:52AM +0200, Peter Zijlstra wrote:
>> On Mon, May 09, 2022 at 12:58:39PM +0800, Feng Tang wrote:
>> > And there is still very few corner case that the freq info is not
>> > accurate enough with small deviation from the actual value, like on
>> > a product with early buggy version of firmware or on some
>> > pre-production hardware.
>> >
>> > Add an option 'recalibrate' for 'tsc' kernel parameter to force the
>> > tsc freq recalibration with HPET/PM_TIMER, and warn if the deviation
>> > from previous value is more than about 500 PPM.
>> > 
>> > Signed-off-by: Feng Tang <feng.tang@intel.com>
>> 
>> Why isn't 'tsc_early_khz=' not working for you? Afaict that will
>> override calibrate_tsc() when provided and as such can be used on these
>> early platforms for provide the right value until such time that the
>> firmware is fixed.
>
> For the early platforms, the problem we met is we don't know what
> is the 'correct' tsc-freq, and the value from MSR/CUPID could be wrong. 
>
> And there was some generation, that after enabling some feature, each
> instance of HW will have slightly different frequency, so there is
> no central "one for all" value to set for 'tsc_early_khz'.
>
> This option is more like a way to double-check the correctness of
> tsc-freq got from MSR/CPUID(0x15).

If at all it's a workaround for the inability and ignorance of firmware
people. The crystal frequency and the TSC/crystal ratio _are_ known to
the system designer and firmware people. It's really not asked too much
to put the correct values into CPUID(0x15) and have proper quality
control to ensure the correctness.

The whole argument about early firmware and pre-production hardware is
bogus. It's 2022 and we are debating this problem for more than a decade
now and still hardware and firmware people think they can do what they
want and it all can be "fixed" in software. It's not rocket science to
get this straight.

Aside of that HPET has become unrealiable and PM timer is not guaranteed
to be there either. So we really do not need a mechanism to enforce
recalibration against something which is not guaranteed to provide
sensible information.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/tsc: Add option to force HW timer based recalibration
  2022-05-09 10:01       ` Thomas Gleixner
@ 2022-05-09 11:22         ` Feng Tang
  2022-05-09 13:03           ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Feng Tang @ 2022-05-09 11:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Jonathan Corbet, x86, linux-kernel, paulmck,
	rui.zhang, len.brown, tim.c.chen

Hi Thomas,

Thanks for the comments!

On Mon, May 09, 2022 at 12:01:42PM +0200, Thomas Gleixner wrote:
> Feng,
> 
> On Mon, May 09 2022 at 15:30, Feng Tang wrote:
> > On Mon, May 09, 2022 at 09:16:52AM +0200, Peter Zijlstra wrote:
> >> On Mon, May 09, 2022 at 12:58:39PM +0800, Feng Tang wrote:
> >> > And there is still very few corner case that the freq info is not
> >> > accurate enough with small deviation from the actual value, like on
> >> > a product with early buggy version of firmware or on some
> >> > pre-production hardware.
> >> >
> >> > Add an option 'recalibrate' for 'tsc' kernel parameter to force the
> >> > tsc freq recalibration with HPET/PM_TIMER, and warn if the deviation
> >> > from previous value is more than about 500 PPM.
> >> > 
> >> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> >> 
> >> Why isn't 'tsc_early_khz=' not working for you? Afaict that will
> >> override calibrate_tsc() when provided and as such can be used on these
> >> early platforms for provide the right value until such time that the
> >> firmware is fixed.
> >
> > For the early platforms, the problem we met is we don't know what
> > is the 'correct' tsc-freq, and the value from MSR/CUPID could be wrong. 
> >
> > And there was some generation, that after enabling some feature, each
> > instance of HW will have slightly different frequency, so there is
> > no central "one for all" value to set for 'tsc_early_khz'.
> >
> > This option is more like a way to double-check the correctness of
> > tsc-freq got from MSR/CPUID(0x15).
> 
> If at all it's a workaround for the inability and ignorance of firmware
> people. The crystal frequency and the TSC/crystal ratio _are_ known to
> the system designer and firmware people. It's really not asked too much
> to put the correct values into CPUID(0x15) and have proper quality
> control to ensure the correctness.
> 
> The whole argument about early firmware and pre-production hardware is
> bogus. It's 2022 and we are debating this problem for more than a decade
> now and still hardware and firmware people think they can do what they
> want and it all can be "fixed" in software. It's not rocket science to
> get this straight.
 
I completely understand it, as we've also suffered a lot from such
problems. This patch doesn't change any current work flow, and it simply
calibrates and prints out the freq info (warn if there is big deviation).
It acctually provides SW guys a quick way to argue with HW/FW people:
"See! You've given us a wrong number, please fix it", otherwise I heard
there was customer long ago  who used atomic clock to prove the deviation. 

> Aside of that HPET has become unrealiable and PM timer is not guaranteed
> to be there either. So we really do not need a mechanism to enforce
> recalibration against something which is not guaranteed to provide
> sensible information.

Correct. The HPET on new client platforms turn to be disabled for the
PC10 issue, though it's fine on server platforms where tsc accuracy is
more important. Also even for the disabled HPET case, I remembered that
you've once suggested to leverage its capability for calibration, and
only disable it before cpu idle framework really starts :)

Thanks,
Feng

> Thanks,
> 
>         tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/tsc: Add option to force HW timer based recalibration
  2022-05-09 11:22         ` Feng Tang
@ 2022-05-09 13:03           ` Thomas Gleixner
  2022-05-09 13:36             ` Feng Tang
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2022-05-09 13:03 UTC (permalink / raw)
  To: Feng Tang
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Jonathan Corbet, x86, linux-kernel, paulmck,
	rui.zhang, len.brown, tim.c.chen

Feng,

On Mon, May 09 2022 at 19:22, Feng Tang wrote:
> On Mon, May 09, 2022 at 12:01:42PM +0200, Thomas Gleixner wrote:
>> > This option is more like a way to double-check the correctness of
>> > tsc-freq got from MSR/CPUID(0x15).
>> 
>> If at all it's a workaround for the inability and ignorance of firmware
>> people. The crystal frequency and the TSC/crystal ratio _are_ known to
>> the system designer and firmware people. It's really not asked too much
>> to put the correct values into CPUID(0x15) and have proper quality
>> control to ensure the correctness.
>> 
>> The whole argument about early firmware and pre-production hardware is
>> bogus. It's 2022 and we are debating this problem for more than a decade
>> now and still hardware and firmware people think they can do what they
>> want and it all can be "fixed" in software. It's not rocket science to
>> get this straight.
>  
> I completely understand it, as we've also suffered a lot from such
> problems. This patch doesn't change any current work flow, and it simply
> calibrates and prints out the freq info (warn if there is big deviation).
> It acctually provides SW guys a quick way to argue with HW/FW people:
> "See! You've given us a wrong number, please fix it", otherwise I heard
> there was customer long ago  who used atomic clock to prove the
> deviation.

Then please say clearly in the changelog what this is about.

 "Currently when HW provides the tsc freq info through MSR or
  CPUID(0x15), the info will be taken as the 'best guess', and kernel
  will set the X86_FEATURE_TSC_KNOWN_FREQ flag and skip the HW timer
  based recalibration, which works pretty well.

  And there is still very few corner case that the freq info is not
  accurate enough will small deviation from the actual value, like on
  a product with early version (fix needed) of firmware or some
  pre-production hardware.

  Add..."

versus:

 "The kernel assumes that the TSC frequency which is provided by the
  hardware / firmware via MSRs or CPUID(0x15) is correct after applying
  a few basic consistency checks. This disables the TSC recalibration
  against HPET or PM timer.

  As a result there is no mechanism to validate that frequency in cases
  where a firmware or hardware defect is suspected.

  Add..."

Can you spot the difference in intention? The first one sounds like:

    We need to tolerate the hardware/firmware induced crap.

The second one says:

    Provide a mechanism to validate because we cannot trust hardware /
    firmware.

Hmm?

>> Aside of that HPET has become unrealiable and PM timer is not guaranteed
>> to be there either. So we really do not need a mechanism to enforce
>> recalibration against something which is not guaranteed to provide
>> sensible information.
>
> Correct. The HPET on new client platforms turn to be disabled for the
> PC10 issue, though it's fine on server platforms where tsc accuracy is
> more important.

TSC accuracy is important in any case. Why would it be more important on
server platforms? Just because?

> Also even for the disabled HPET case, I remembered that you've once
> suggested to leverage its capability for calibration, and only disable
> it before cpu idle framework really starts :)

Correct, but that would only be valid for early boot calibration and not
accross the recalibration.

That's why we ended up disabling HPET early in case of PC10. See
hpet_is_pc10_damaged().

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/tsc: Add option to force HW timer based recalibration
  2022-05-09 13:03           ` Thomas Gleixner
@ 2022-05-09 13:36             ` Feng Tang
  2022-05-09 13:43               ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Feng Tang @ 2022-05-09 13:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Jonathan Corbet, x86, linux-kernel, paulmck,
	rui.zhang, len.brown, tim.c.chen

On Mon, May 09, 2022 at 03:03:25PM +0200, Thomas Gleixner wrote:
> Feng,
> 
> On Mon, May 09 2022 at 19:22, Feng Tang wrote:
> > On Mon, May 09, 2022 at 12:01:42PM +0200, Thomas Gleixner wrote:
> >> > This option is more like a way to double-check the correctness of
> >> > tsc-freq got from MSR/CPUID(0x15).
> >> 
> >> If at all it's a workaround for the inability and ignorance of firmware
> >> people. The crystal frequency and the TSC/crystal ratio _are_ known to
> >> the system designer and firmware people. It's really not asked too much
> >> to put the correct values into CPUID(0x15) and have proper quality
> >> control to ensure the correctness.
> >> 
> >> The whole argument about early firmware and pre-production hardware is
> >> bogus. It's 2022 and we are debating this problem for more than a decade
> >> now and still hardware and firmware people think they can do what they
> >> want and it all can be "fixed" in software. It's not rocket science to
> >> get this straight.
> >  
> > I completely understand it, as we've also suffered a lot from such
> > problems. This patch doesn't change any current work flow, and it simply
> > calibrates and prints out the freq info (warn if there is big deviation).
> > It acctually provides SW guys a quick way to argue with HW/FW people:
> > "See! You've given us a wrong number, please fix it", otherwise I heard
> > there was customer long ago  who used atomic clock to prove the
> > deviation.
> 
> Then please say clearly in the changelog what this is about.
> 
>  "Currently when HW provides the tsc freq info through MSR or
>   CPUID(0x15), the info will be taken as the 'best guess', and kernel
>   will set the X86_FEATURE_TSC_KNOWN_FREQ flag and skip the HW timer
>   based recalibration, which works pretty well.
> 
>   And there is still very few corner case that the freq info is not
>   accurate enough will small deviation from the actual value, like on
>   a product with early version (fix needed) of firmware or some
>   pre-production hardware.
> 
>   Add..."
> 
> versus:
> 
>  "The kernel assumes that the TSC frequency which is provided by the
>   hardware / firmware via MSRs or CPUID(0x15) is correct after applying
>   a few basic consistency checks. This disables the TSC recalibration
>   against HPET or PM timer.
> 
>   As a result there is no mechanism to validate that frequency in cases
>   where a firmware or hardware defect is suspected.
> 
>   Add..."
> 
> Can you spot the difference in intention? The first one sounds like:
> 
>     We need to tolerate the hardware/firmware induced crap.
> 
> The second one says:
> 
>     Provide a mechanism to validate because we cannot trust hardware /
>     firmware.
> 
> Hmm?

I see now. My commit log was ambiguous and didn't state clearly what
was the real problem and what we want to achieve. 

Many thanks for rewording! will use it as commit log in the next spin.

> >> Aside of that HPET has become unrealiable and PM timer is not guaranteed
> >> to be there either. So we really do not need a mechanism to enforce
> >> recalibration against something which is not guaranteed to provide
> >> sensible information.
> >
> > Correct. The HPET on new client platforms turn to be disabled for the
> > PC10 issue, though it's fine on server platforms where tsc accuracy is
> > more important.
> 
> TSC accuracy is important in any case. Why would it be more important on
> server platforms? Just because?

It was my wild guess, as I thought servers used by enterprise or financial
system may care more about the accuracy.

> > Also even for the disabled HPET case, I remembered that you've once
> > suggested to leverage its capability for calibration, and only disable
> > it before cpu idle framework really starts :)
> 
> Correct, but that would only be valid for early boot calibration and not
> accross the recalibration.
> 
> That's why we ended up disabling HPET early in case of PC10. See
> hpet_is_pc10_damaged().

Got it. And I just checked some new client generations at hand like
Icelake and Alderlake, they both have acpi_pm timer and CPUID (0x15) 
supported.

Thanks,
Feng

> Thanks,
> 
>         tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/tsc: Add option to force HW timer based recalibration
  2022-05-09 13:36             ` Feng Tang
@ 2022-05-09 13:43               ` Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2022-05-09 13:43 UTC (permalink / raw)
  To: Feng Tang
  Cc: Peter Zijlstra, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Jonathan Corbet, x86, linux-kernel, paulmck,
	rui.zhang, len.brown, tim.c.chen

On Mon, May 09 2022 at 21:36, Feng Tang wrote:
> On Mon, May 09, 2022 at 03:03:25PM +0200, Thomas Gleixner wrote:
>> TSC accuracy is important in any case. Why would it be more important on
>> server platforms? Just because?
>
> It was my wild guess, as I thought servers used by enterprise or financial
> system may care more about the accuracy.

Guess-ineering is not really leading anywhere. Facts matter.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-05-09 13:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-08 14:47 [PATCH] x86/tsc: Add option to force HW timer based recalibration Feng Tang
2022-05-09  4:58 ` Feng Tang
2022-05-09  7:16   ` Peter Zijlstra
2022-05-09  7:30     ` Feng Tang
2022-05-09 10:01       ` Thomas Gleixner
2022-05-09 11:22         ` Feng Tang
2022-05-09 13:03           ` Thomas Gleixner
2022-05-09 13:36             ` Feng Tang
2022-05-09 13:43               ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).