linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked
@ 2021-11-17  2:37 Feng Tang
  2021-11-17  2:37 ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Feng Tang
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Feng Tang @ 2021-11-17  2:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel
  Cc: paulmck, rui.zhang, andi.kleen, len.brown, tim.c.chen, Feng Tang

Normally the tsc_sync will get checked every time system enters idle state,
but Thomas Gleixner mentioned there is still a caveat that a system won't
enter idle [1], either because it's too busy or configured purposely to not
enter idle. Setup a periodic timer (every 10 minitues) to make sure the
check is always on.

[1]. https://lore.kernel.org/lkml/875z286xtk.fsf@nanos.tec.linutronix.de/
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
Change log:

  v3:
     * rebase against 5.16-rc1
     * small change to code comments and commit log
  
  v2:
     * skip timer setup when tsc_clocksource_reliabe==1 (Thomas)
     * refine comment and code format (Thomas) 

 arch/x86/kernel/tsc_sync.c | 41 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 50a4515fe0ad..911df742145f 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -30,6 +30,7 @@ struct tsc_adjust {
 };
 
 static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+static struct timer_list tsc_sync_check_timer;
 
 /*
  * TSC's on different sockets may be reset asynchronously.
@@ -77,6 +78,46 @@ void tsc_verify_tsc_adjust(bool resume)
 	}
 }
 
+/*
+ * Normally the tsc_sync will be checked every time system enters idle
+ * state, but there is still caveat that a system won't enter idle,
+ * either because it's too busy or configured purposely to not enter
+ * idle.
+ *
+ * So setup a periodic timer (every 10 minutes) to make sure the check
+ * is always on.
+ */
+
+#define SYNC_CHECK_INTERVAL		(HZ * 600)
+
+static void tsc_sync_check_timer_fn(struct timer_list *unused)
+{
+	int next_cpu;
+
+	tsc_verify_tsc_adjust(false);
+
+	/* Run the check for all onlined CPUs in turn */
+	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
+	if (next_cpu >= nr_cpu_ids)
+		next_cpu = cpumask_first(cpu_online_mask);
+
+	tsc_sync_check_timer.expires += SYNC_CHECK_INTERVAL;
+	add_timer_on(&tsc_sync_check_timer, next_cpu);
+}
+
+static int __init start_sync_check_timer(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST) || tsc_clocksource_reliable)
+		return 0;
+
+	timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
+	tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
+	add_timer(&tsc_sync_check_timer);
+
+	return 0;
+}
+late_initcall(start_sync_check_timer);
+
 static void tsc_sanitize_first_cpu(struct tsc_adjust *cur, s64 bootval,
 				   unsigned int cpu, bool bootcpu)
 {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-17  2:37 [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked Feng Tang
@ 2021-11-17  2:37 ` Feng Tang
  2021-11-30  6:46   ` Feng Tang
                     ` (3 more replies)
  2021-12-01 23:47 ` [tip: x86/urgent] x86/tsc: Add a timer to make sure TSC_adjust is always checked tip-bot2 for Feng Tang
  2022-03-14 17:52 ` [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust " Nicolas Saenz Julienne
  2 siblings, 4 replies; 23+ messages in thread
From: Feng Tang @ 2021-11-17  2:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel
  Cc: paulmck, rui.zhang, andi.kleen, len.brown, tim.c.chen, Feng Tang

There are cases that tsc clocksources are wrongly judged as unstable by
clocksource watchdogs like hpet, acpi_pm or 'refined-jiffies'. While
there is hardly a general reliable way to check the validity of a
watchdog, and to protect the innocent tsc, Thomas Gleixner proposed [1]:

"I'm inclined to lift that requirement when the CPU has:

    1) X86_FEATURE_CONSTANT_TSC
    2) X86_FEATURE_NONSTOP_TSC
    3) X86_FEATURE_NONSTOP_TSC_S3
    4) X86_FEATURE_TSC_ADJUST
    5) At max. 4 sockets

 After two decades of horrors we're finally at a point where TSC seems
 to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
 was really key as we can now detect even small modifications reliably
 and the important point is that we can cure them as well (not pretty
 but better than all other options)."

As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
of Atom processor, and is always coupled with X86_FEATURE_CONSTANT_TSC
and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
to use maxim of 2 sockets.

The check is done inside tsc_init() before registering 'tsc-early' and
'tsc' clocksources, as there were cases that both of them had been
wrongly judged as unreliable.

For more background of tsc/watchdog, there is a good summary in [2]

[1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
[2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
Change log:

  v3:
    * rebased against 5.16-rc1
    * refine commit log

  v2:
    * Directly skip watchdog check without messing flag
      'tsc_clocksource_reliable' (Thomas)

 arch/x86/kernel/tsc.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2e076a459a0c..389511f59101 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1180,6 +1180,12 @@ void mark_tsc_unstable(char *reason)
 
 EXPORT_SYMBOL_GPL(mark_tsc_unstable);
 
+static void __init tsc_skip_watchdog_verify(void)
+{
+	clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+	clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+}
+
 static void __init check_system_tsc_reliable(void)
 {
 #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
@@ -1196,6 +1202,17 @@ static void __init check_system_tsc_reliable(void)
 #endif
 	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
 		tsc_clocksource_reliable = 1;
+
+	/*
+	 * Ideally the socket number should be checked, but this is called
+	 * by tsc_init() which is in early boot phase and the socket numbers
+	 * may not be available. Use 'nr_online_nodes' as a fallback solution
+	 */
+	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
+	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
+	    nr_online_nodes <= 2)
+		tsc_skip_watchdog_verify();
 }
 
 /*
@@ -1387,9 +1404,6 @@ static int __init init_tsc_clocksource(void)
 	if (tsc_unstable)
 		goto unreg;
 
-	if (tsc_clocksource_reliable || no_tsc_watchdog)
-		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-
 	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
 		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
 
@@ -1527,7 +1541,7 @@ void __init tsc_init(void)
 	}
 
 	if (tsc_clocksource_reliable || no_tsc_watchdog)
-		clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+		tsc_skip_watchdog_verify();
 
 	clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
 	detect_art();
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-17  2:37 ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Feng Tang
@ 2021-11-30  6:46   ` Feng Tang
  2021-11-30 14:40     ` Paul E. McKenney
  2021-12-01  4:45   ` Luming Yu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Feng Tang @ 2021-11-30  6:46 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel
  Cc: paulmck, rui.zhang, andi.kleen, len.brown, tim.c.chen

On Wed, Nov 17, 2021 at 10:37:51AM +0800, Feng Tang wrote:
> There are cases that tsc clocksources are wrongly judged as unstable by
> clocksource watchdogs like hpet, acpi_pm or 'refined-jiffies'. While
> there is hardly a general reliable way to check the validity of a
> watchdog, and to protect the innocent tsc, Thomas Gleixner proposed [1]:

Hi All,

Some more update, last week we got report from validation team that
the "tsc judged as unstable" happened on latest desktop platform,
which has serial earlyprintk enabled, and the watchdog here is
'refined-jiffies' while hpet is disabled during the PC10 check. I
tried severy other client platforms I can find: Kabylake, Icelake
and Alderlake, and the mis-judging can be easily reproduced on
Icelake and Alderlake (not on Kabylake). Which could be cued by
this 2/2 patch.

Also, today we got same report on a 2-sockets Icelake Server with
5.5 kernel, while the watchdog is 'hpet', and the system is running
stressful big-data workload.

Thanks,
Feng


> "I'm inclined to lift that requirement when the CPU has:
> 
>     1) X86_FEATURE_CONSTANT_TSC
>     2) X86_FEATURE_NONSTOP_TSC
>     3) X86_FEATURE_NONSTOP_TSC_S3
>     4) X86_FEATURE_TSC_ADJUST
>     5) At max. 4 sockets
> 
>  After two decades of horrors we're finally at a point where TSC seems
>  to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
>  was really key as we can now detect even small modifications reliably
>  and the important point is that we can cure them as well (not pretty
>  but better than all other options)."
> 
> As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
> of Atom processor, and is always coupled with X86_FEATURE_CONSTANT_TSC
> and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
> to use maxim of 2 sockets.
> 
> The check is done inside tsc_init() before registering 'tsc-early' and
> 'tsc' clocksources, as there were cases that both of them had been
> wrongly judged as unreliable.
> 
> For more background of tsc/watchdog, there is a good summary in [2]
> 
> [1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
> [2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> ---
> Change log:
> 
>   v3:
>     * rebased against 5.16-rc1
>     * refine commit log
> 
>   v2:
>     * Directly skip watchdog check without messing flag
>       'tsc_clocksource_reliable' (Thomas)
> 
>  arch/x86/kernel/tsc.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 2e076a459a0c..389511f59101 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1180,6 +1180,12 @@ void mark_tsc_unstable(char *reason)
>  
>  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
>  
> +static void __init tsc_skip_watchdog_verify(void)
> +{
> +	clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> +	clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> +}
> +
>  static void __init check_system_tsc_reliable(void)
>  {
>  #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
> @@ -1196,6 +1202,17 @@ static void __init check_system_tsc_reliable(void)
>  #endif
>  	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
>  		tsc_clocksource_reliable = 1;
> +
> +	/*
> +	 * Ideally the socket number should be checked, but this is called
> +	 * by tsc_init() which is in early boot phase and the socket numbers
> +	 * may not be available. Use 'nr_online_nodes' as a fallback solution
> +	 */
> +	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> +	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> +	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> +	    nr_online_nodes <= 2)
> +		tsc_skip_watchdog_verify();
>  }
>  
>  /*
> @@ -1387,9 +1404,6 @@ static int __init init_tsc_clocksource(void)
>  	if (tsc_unstable)
>  		goto unreg;
>  
> -	if (tsc_clocksource_reliable || no_tsc_watchdog)
> -		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> -
>  	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
>  		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
>  
> @@ -1527,7 +1541,7 @@ void __init tsc_init(void)
>  	}
>  
>  	if (tsc_clocksource_reliable || no_tsc_watchdog)
> -		clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> +		tsc_skip_watchdog_verify();
>  
>  	clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
>  	detect_art();
> -- 
> 2.27.0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30  6:46   ` Feng Tang
@ 2021-11-30 14:40     ` Paul E. McKenney
  2021-11-30 15:02       ` Feng Tang
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2021-11-30 14:40 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

On Tue, Nov 30, 2021 at 02:46:23PM +0800, Feng Tang wrote:
> On Wed, Nov 17, 2021 at 10:37:51AM +0800, Feng Tang wrote:
> > There are cases that tsc clocksources are wrongly judged as unstable by
> > clocksource watchdogs like hpet, acpi_pm or 'refined-jiffies'. While
> > there is hardly a general reliable way to check the validity of a
> > watchdog, and to protect the innocent tsc, Thomas Gleixner proposed [1]:
> 
> Hi All,
> 
> Some more update, last week we got report from validation team that
> the "tsc judged as unstable" happened on latest desktop platform,
> which has serial earlyprintk enabled, and the watchdog here is
> 'refined-jiffies' while hpet is disabled during the PC10 check. I
> tried severy other client platforms I can find: Kabylake, Icelake
> and Alderlake, and the mis-judging can be easily reproduced on
> Icelake and Alderlake (not on Kabylake). Which could be cued by
> this 2/2 patch.
> 
> Also, today we got same report on a 2-sockets Icelake Server with
> 5.5 kernel, while the watchdog is 'hpet', and the system is running
> stressful big-data workload.

Were these tests run with Waiman's latest patch series?  The first
two of them are on RCU's "dev" branch.

							Thanx, Paul

> Thanks,
> Feng
> 
> 
> > "I'm inclined to lift that requirement when the CPU has:
> > 
> >     1) X86_FEATURE_CONSTANT_TSC
> >     2) X86_FEATURE_NONSTOP_TSC
> >     3) X86_FEATURE_NONSTOP_TSC_S3
> >     4) X86_FEATURE_TSC_ADJUST
> >     5) At max. 4 sockets
> > 
> >  After two decades of horrors we're finally at a point where TSC seems
> >  to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
> >  was really key as we can now detect even small modifications reliably
> >  and the important point is that we can cure them as well (not pretty
> >  but better than all other options)."
> > 
> > As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
> > of Atom processor, and is always coupled with X86_FEATURE_CONSTANT_TSC
> > and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
> > to use maxim of 2 sockets.
> > 
> > The check is done inside tsc_init() before registering 'tsc-early' and
> > 'tsc' clocksources, as there were cases that both of them had been
> > wrongly judged as unreliable.
> > 
> > For more background of tsc/watchdog, there is a good summary in [2]
> > 
> > [1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
> > [2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > ---
> > Change log:
> > 
> >   v3:
> >     * rebased against 5.16-rc1
> >     * refine commit log
> > 
> >   v2:
> >     * Directly skip watchdog check without messing flag
> >       'tsc_clocksource_reliable' (Thomas)
> > 
> >  arch/x86/kernel/tsc.c | 22 ++++++++++++++++++----
> >  1 file changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 2e076a459a0c..389511f59101 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1180,6 +1180,12 @@ void mark_tsc_unstable(char *reason)
> >  
> >  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
> >  
> > +static void __init tsc_skip_watchdog_verify(void)
> > +{
> > +	clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > +	clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > +}
> > +
> >  static void __init check_system_tsc_reliable(void)
> >  {
> >  #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
> > @@ -1196,6 +1202,17 @@ static void __init check_system_tsc_reliable(void)
> >  #endif
> >  	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
> >  		tsc_clocksource_reliable = 1;
> > +
> > +	/*
> > +	 * Ideally the socket number should be checked, but this is called
> > +	 * by tsc_init() which is in early boot phase and the socket numbers
> > +	 * may not be available. Use 'nr_online_nodes' as a fallback solution
> > +	 */
> > +	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > +	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > +	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> > +	    nr_online_nodes <= 2)
> > +		tsc_skip_watchdog_verify();
> >  }
> >  
> >  /*
> > @@ -1387,9 +1404,6 @@ static int __init init_tsc_clocksource(void)
> >  	if (tsc_unstable)
> >  		goto unreg;
> >  
> > -	if (tsc_clocksource_reliable || no_tsc_watchdog)
> > -		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > -
> >  	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
> >  		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
> >  
> > @@ -1527,7 +1541,7 @@ void __init tsc_init(void)
> >  	}
> >  
> >  	if (tsc_clocksource_reliable || no_tsc_watchdog)
> > -		clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > +		tsc_skip_watchdog_verify();
> >  
> >  	clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
> >  	detect_art();
> > -- 
> > 2.27.0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30 14:40     ` Paul E. McKenney
@ 2021-11-30 15:02       ` Feng Tang
  2021-11-30 16:28         ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Feng Tang @ 2021-11-30 15:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

Hi Paul,

Thanks for the review!

On Tue, Nov 30, 2021 at 06:40:48AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 30, 2021 at 02:46:23PM +0800, Feng Tang wrote:
> > On Wed, Nov 17, 2021 at 10:37:51AM +0800, Feng Tang wrote:
> > > There are cases that tsc clocksources are wrongly judged as unstable by
> > > clocksource watchdogs like hpet, acpi_pm or 'refined-jiffies'. While
> > > there is hardly a general reliable way to check the validity of a
> > > watchdog, and to protect the innocent tsc, Thomas Gleixner proposed [1]:
> > 
> > Hi All,
> > 
> > Some more update, last week we got report from validation team that
> > the "tsc judged as unstable" happened on latest desktop platform,
> > which has serial earlyprintk enabled, and the watchdog here is
> > 'refined-jiffies' while hpet is disabled during the PC10 check. I
> > tried severy other client platforms I can find: Kabylake, Icelake
> > and Alderlake, and the mis-judging can be easily reproduced on
> > Icelake and Alderlake (not on Kabylake). Which could be cued by
> > this 2/2 patch.
> > 
> > Also, today we got same report on a 2-sockets Icelake Server with
> > 5.5 kernel, while the watchdog is 'hpet', and the system is running
> > stressful big-data workload.
> 
> Were these tests run with Waiman's latest patch series?  The first
> two of them are on RCU's "dev" branch.
 
No, I haven't tried Waiman's patches, which are more about refining
cs_watchdog_read() check, while these 2 cases are about the really
big gap between watchog and cur_clocksource

The error log of first client platform (5.15 kernel) is: 

[    2.994266] clocksource:                       'refined-jiffies' wd_nsec: 516032250 wd_now: fffedc09 wd_last: fffedb88 mask: ffffffff
[    2.998352] initcall irq_sysfs_init+0x0/0x97 returned 0 after 0 usecs
[    3.002266] clocksource:                       'tsc-early' cs_nsec: 767553349 cs_now: 71a87fd2f cs_last: 6db4968ff mask: ffffffffffffffff
[    3.006266] calling  dma_atomic_pool_init+0x0/0x152 @ 1
[    3.010266] clocksource:                       No current clocksource.
[    3.010267] tsc: Marking TSC unstable due to clocksource watchdog

We can see the gap is 516 ms vs 767 ms, and the delta is 267 ms. 
And the root cause is with earlyprintk serial console enabled,
the periodic timer interrupt is severely affected to be not
accurate.

And similar big gap between 'tsc' and 'hpet' is seen for the server
case (5.5 kernel which doesn't have the cs_watchdog_read() patchset). 

[1196945.314929] clocksource: timekeeping watchdog on CPU67: Marking clocksource 'tsc' as unstable because the skew is too large:
[1196945.314935] clocksource:                       'hpet' wd_now: 25272026 wd_last: 2e9ce418 mask: ffffffff
[1196945.314938] clocksource:                       'tsc' cs_now: 95b400003fdf1 cs_last: 95ae7ed7c33f7 mask: ffffffffffffffff
[1196945.314948] tsc: Marking TSC unstable due to clocksource watchdog
[1196945.314977] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[1196945.314981] sched_clock: Marking unstable (1196945264804527, 50153181)<-(1196945399926576, -84962703)
[1196945.316255] clocksource: Switched to clocksource hpet

For this case, I don't have access to the HW and only have the
dmesg log, from which it seems the watchdog timer has been postponed
a very long time from running.


Thanks,
Feng


> 							Thanx, Paul
> 
> > Thanks,
> > Feng
> > 
> > 
> > > "I'm inclined to lift that requirement when the CPU has:
> > > 
> > >     1) X86_FEATURE_CONSTANT_TSC
> > >     2) X86_FEATURE_NONSTOP_TSC
> > >     3) X86_FEATURE_NONSTOP_TSC_S3
> > >     4) X86_FEATURE_TSC_ADJUST
> > >     5) At max. 4 sockets
> > > 
> > >  After two decades of horrors we're finally at a point where TSC seems
> > >  to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
> > >  was really key as we can now detect even small modifications reliably
> > >  and the important point is that we can cure them as well (not pretty
> > >  but better than all other options)."
> > > 
> > > As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
> > > of Atom processor, and is always coupled with X86_FEATURE_CONSTANT_TSC
> > > and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
> > > to use maxim of 2 sockets.
> > > 
> > > The check is done inside tsc_init() before registering 'tsc-early' and
> > > 'tsc' clocksources, as there were cases that both of them had been
> > > wrongly judged as unreliable.
> > > 
> > > For more background of tsc/watchdog, there is a good summary in [2]
> > > 
> > > [1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
> > > [2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/
> > > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > > ---
> > > Change log:
> > > 
> > >   v3:
> > >     * rebased against 5.16-rc1
> > >     * refine commit log
> > > 
> > >   v2:
> > >     * Directly skip watchdog check without messing flag
> > >       'tsc_clocksource_reliable' (Thomas)
> > > 
> > >  arch/x86/kernel/tsc.c | 22 ++++++++++++++++++----
> > >  1 file changed, 18 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > > index 2e076a459a0c..389511f59101 100644
> > > --- a/arch/x86/kernel/tsc.c
> > > +++ b/arch/x86/kernel/tsc.c
> > > @@ -1180,6 +1180,12 @@ void mark_tsc_unstable(char *reason)
> > >  
> > >  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
> > >  
> > > +static void __init tsc_skip_watchdog_verify(void)
> > > +{
> > > +	clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > > +	clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > > +}
> > > +
> > >  static void __init check_system_tsc_reliable(void)
> > >  {
> > >  #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
> > > @@ -1196,6 +1202,17 @@ static void __init check_system_tsc_reliable(void)
> > >  #endif
> > >  	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
> > >  		tsc_clocksource_reliable = 1;
> > > +
> > > +	/*
> > > +	 * Ideally the socket number should be checked, but this is called
> > > +	 * by tsc_init() which is in early boot phase and the socket numbers
> > > +	 * may not be available. Use 'nr_online_nodes' as a fallback solution
> > > +	 */
> > > +	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > > +	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > > +	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> > > +	    nr_online_nodes <= 2)
> > > +		tsc_skip_watchdog_verify();
> > >  }
> > >  
> > >  /*
> > > @@ -1387,9 +1404,6 @@ static int __init init_tsc_clocksource(void)
> > >  	if (tsc_unstable)
> > >  		goto unreg;
> > >  
> > > -	if (tsc_clocksource_reliable || no_tsc_watchdog)
> > > -		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > > -
> > >  	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
> > >  		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
> > >  
> > > @@ -1527,7 +1541,7 @@ void __init tsc_init(void)
> > >  	}
> > >  
> > >  	if (tsc_clocksource_reliable || no_tsc_watchdog)
> > > -		clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > > +		tsc_skip_watchdog_verify();
> > >  
> > >  	clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
> > >  	detect_art();
> > > -- 
> > > 2.27.0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30 15:02       ` Feng Tang
@ 2021-11-30 16:28         ` Paul E. McKenney
  2021-11-30 20:39           ` Thomas Gleixner
  2021-12-07  1:41           ` Feng Tang
  0 siblings, 2 replies; 23+ messages in thread
From: Paul E. McKenney @ 2021-11-30 16:28 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

On Tue, Nov 30, 2021 at 11:02:56PM +0800, Feng Tang wrote:
> Hi Paul,
> 
> Thanks for the review!
> 
> On Tue, Nov 30, 2021 at 06:40:48AM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 30, 2021 at 02:46:23PM +0800, Feng Tang wrote:
> > > On Wed, Nov 17, 2021 at 10:37:51AM +0800, Feng Tang wrote:
> > > > There are cases that tsc clocksources are wrongly judged as unstable by
> > > > clocksource watchdogs like hpet, acpi_pm or 'refined-jiffies'. While
> > > > there is hardly a general reliable way to check the validity of a
> > > > watchdog, and to protect the innocent tsc, Thomas Gleixner proposed [1]:
> > > 
> > > Hi All,
> > > 
> > > Some more update, last week we got report from validation team that
> > > the "tsc judged as unstable" happened on latest desktop platform,
> > > which has serial earlyprintk enabled, and the watchdog here is
> > > 'refined-jiffies' while hpet is disabled during the PC10 check. I
> > > tried severy other client platforms I can find: Kabylake, Icelake
> > > and Alderlake, and the mis-judging can be easily reproduced on
> > > Icelake and Alderlake (not on Kabylake). Which could be cued by
> > > this 2/2 patch.
> > > 
> > > Also, today we got same report on a 2-sockets Icelake Server with
> > > 5.5 kernel, while the watchdog is 'hpet', and the system is running
> > > stressful big-data workload.
> > 
> > Were these tests run with Waiman's latest patch series?  The first
> > two of them are on RCU's "dev" branch.
>  
> No, I haven't tried Waiman's patches, which are more about refining
> cs_watchdog_read() check, while these 2 cases are about the really
> big gap between watchog and cur_clocksource
> 
> The error log of first client platform (5.15 kernel) is: 
> 
> [    2.994266] clocksource:                       'refined-jiffies' wd_nsec: 516032250 wd_now: fffedc09 wd_last: fffedb88 mask: ffffffff
> [    2.998352] initcall irq_sysfs_init+0x0/0x97 returned 0 after 0 usecs
> [    3.002266] clocksource:                       'tsc-early' cs_nsec: 767553349 cs_now: 71a87fd2f cs_last: 6db4968ff mask: ffffffffffffffff
> [    3.006266] calling  dma_atomic_pool_init+0x0/0x152 @ 1
> [    3.010266] clocksource:                       No current clocksource.
> [    3.010267] tsc: Marking TSC unstable due to clocksource watchdog
> 
> We can see the gap is 516 ms vs 767 ms, and the delta is 267 ms. 
> And the root cause is with earlyprintk serial console enabled,
> the periodic timer interrupt is severely affected to be not
> accurate.
> 
> And similar big gap between 'tsc' and 'hpet' is seen for the server
> case (5.5 kernel which doesn't have the cs_watchdog_read() patchset). 
> 
> [1196945.314929] clocksource: timekeeping watchdog on CPU67: Marking clocksource 'tsc' as unstable because the skew is too large:
> [1196945.314935] clocksource:                       'hpet' wd_now: 25272026 wd_last: 2e9ce418 mask: ffffffff
> [1196945.314938] clocksource:                       'tsc' cs_now: 95b400003fdf1 cs_last: 95ae7ed7c33f7 mask: ffffffffffffffff
> [1196945.314948] tsc: Marking TSC unstable due to clocksource watchdog
> [1196945.314977] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> [1196945.314981] sched_clock: Marking unstable (1196945264804527, 50153181)<-(1196945399926576, -84962703)
> [1196945.316255] clocksource: Switched to clocksource hpet
> 
> For this case, I don't have access to the HW and only have the
> dmesg log, from which it seems the watchdog timer has been postponed
> a very long time from running.

Thank you for the analysis!

One approach to handle this situation would be to avoid checking for
clock skew if the time since the last watchdog read was more than (say)
twice the desired watchdog spacing.  This does leave open the question of
exactly which clocksource to use to measure the time between successive
clocksource reads.  My thought is to check this only once upon entry to
the handler and to use the designated-good clocksource.

Does that make sense, or would something else work better?

							Thanx, Paul

> Thanks,
> Feng
> 
> 
> > 							Thanx, Paul
> > 
> > > Thanks,
> > > Feng
> > > 
> > > 
> > > > "I'm inclined to lift that requirement when the CPU has:
> > > > 
> > > >     1) X86_FEATURE_CONSTANT_TSC
> > > >     2) X86_FEATURE_NONSTOP_TSC
> > > >     3) X86_FEATURE_NONSTOP_TSC_S3
> > > >     4) X86_FEATURE_TSC_ADJUST
> > > >     5) At max. 4 sockets
> > > > 
> > > >  After two decades of horrors we're finally at a point where TSC seems
> > > >  to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
> > > >  was really key as we can now detect even small modifications reliably
> > > >  and the important point is that we can cure them as well (not pretty
> > > >  but better than all other options)."
> > > > 
> > > > As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
> > > > of Atom processor, and is always coupled with X86_FEATURE_CONSTANT_TSC
> > > > and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
> > > > to use maxim of 2 sockets.
> > > > 
> > > > The check is done inside tsc_init() before registering 'tsc-early' and
> > > > 'tsc' clocksources, as there were cases that both of them had been
> > > > wrongly judged as unreliable.
> > > > 
> > > > For more background of tsc/watchdog, there is a good summary in [2]
> > > > 
> > > > [1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
> > > > [2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/
> > > > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > > > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > > > ---
> > > > Change log:
> > > > 
> > > >   v3:
> > > >     * rebased against 5.16-rc1
> > > >     * refine commit log
> > > > 
> > > >   v2:
> > > >     * Directly skip watchdog check without messing flag
> > > >       'tsc_clocksource_reliable' (Thomas)
> > > > 
> > > >  arch/x86/kernel/tsc.c | 22 ++++++++++++++++++----
> > > >  1 file changed, 18 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > > > index 2e076a459a0c..389511f59101 100644
> > > > --- a/arch/x86/kernel/tsc.c
> > > > +++ b/arch/x86/kernel/tsc.c
> > > > @@ -1180,6 +1180,12 @@ void mark_tsc_unstable(char *reason)
> > > >  
> > > >  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
> > > >  
> > > > +static void __init tsc_skip_watchdog_verify(void)
> > > > +{
> > > > +	clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > > > +	clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > > > +}
> > > > +
> > > >  static void __init check_system_tsc_reliable(void)
> > > >  {
> > > >  #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
> > > > @@ -1196,6 +1202,17 @@ static void __init check_system_tsc_reliable(void)
> > > >  #endif
> > > >  	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
> > > >  		tsc_clocksource_reliable = 1;
> > > > +
> > > > +	/*
> > > > +	 * Ideally the socket number should be checked, but this is called
> > > > +	 * by tsc_init() which is in early boot phase and the socket numbers
> > > > +	 * may not be available. Use 'nr_online_nodes' as a fallback solution
> > > > +	 */
> > > > +	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > > > +	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > > > +	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> > > > +	    nr_online_nodes <= 2)
> > > > +		tsc_skip_watchdog_verify();
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -1387,9 +1404,6 @@ static int __init init_tsc_clocksource(void)
> > > >  	if (tsc_unstable)
> > > >  		goto unreg;
> > > >  
> > > > -	if (tsc_clocksource_reliable || no_tsc_watchdog)
> > > > -		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > > > -
> > > >  	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
> > > >  		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
> > > >  
> > > > @@ -1527,7 +1541,7 @@ void __init tsc_init(void)
> > > >  	}
> > > >  
> > > >  	if (tsc_clocksource_reliable || no_tsc_watchdog)
> > > > -		clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> > > > +		tsc_skip_watchdog_verify();
> > > >  
> > > >  	clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
> > > >  	detect_art();
> > > > -- 
> > > > 2.27.0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30 16:28         ` Paul E. McKenney
@ 2021-11-30 20:39           ` Thomas Gleixner
  2021-11-30 20:47             ` Paul E. McKenney
  2021-12-07  1:41           ` Feng Tang
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2021-11-30 20:39 UTC (permalink / raw)
  To: paulmck, Feng Tang
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, H . Peter Anvin,
	Peter Zijlstra, x86, linux-kernel, rui.zhang, andi.kleen,
	len.brown, tim.c.chen


Can you folks please trim your replies? Finding content in the middle of
quoted nonsense becomes harder with every mail in this thread.

On Tue, Nov 30 2021 at 08:28, Paul E. McKenney wrote:
> On Tue, Nov 30, 2021 at 11:02:56PM +0800, Feng Tang wrote:
>> For this case, I don't have access to the HW and only have the
>> dmesg log, from which it seems the watchdog timer has been postponed
>> a very long time from running.
>
> Thank you for the analysis!
>
> One approach to handle this situation would be to avoid checking for
> clock skew if the time since the last watchdog read was more than (say)
> twice the desired watchdog spacing.  This does leave open the question of
> exactly which clocksource to use to measure the time between successive
> clocksource reads.  My thought is to check this only once upon entry to
> the handler and to use the designated-good clocksource.
>
> Does that make sense, or would something else work better?

Seriously. Jiffies is not usable as watchdog simply because lost ticks
cannot be compensated and you cannot use TSC to bridge them because you
are not trusting TSC. This is simply a circulus vitiosus.

We really need to remove the watchdog requirement for modern hardware.
Let me stare at those patches and get them merged.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30 20:39           ` Thomas Gleixner
@ 2021-11-30 20:47             ` Paul E. McKenney
  2021-11-30 21:55               ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2021-11-30 20:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Feng Tang, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

On Tue, Nov 30, 2021 at 09:39:32PM +0100, Thomas Gleixner wrote:
> 
> Can you folks please trim your replies? Finding content in the middle of
> quoted nonsense becomes harder with every mail in this thread.
> 
> On Tue, Nov 30 2021 at 08:28, Paul E. McKenney wrote:
> > On Tue, Nov 30, 2021 at 11:02:56PM +0800, Feng Tang wrote:
> >> For this case, I don't have access to the HW and only have the
> >> dmesg log, from which it seems the watchdog timer has been postponed
> >> a very long time from running.
> >
> > Thank you for the analysis!
> >
> > One approach to handle this situation would be to avoid checking for
> > clock skew if the time since the last watchdog read was more than (say)
> > twice the desired watchdog spacing.  This does leave open the question of
> > exactly which clocksource to use to measure the time between successive
> > clocksource reads.  My thought is to check this only once upon entry to
> > the handler and to use the designated-good clocksource.
> >
> > Does that make sense, or would something else work better?
> 
> Seriously. Jiffies is not usable as watchdog simply because lost ticks
> cannot be compensated and you cannot use TSC to bridge them because you
> are not trusting TSC. This is simply a circulus vitiosus.

OK, HPET or nothing, then.

> We really need to remove the watchdog requirement for modern hardware.
> Let me stare at those patches and get them merged.

You are more trusting of modern hardware than I am, but for all I know,
maybe rightfully so.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30 20:47             ` Paul E. McKenney
@ 2021-11-30 21:55               ` Thomas Gleixner
  2021-11-30 22:48                 ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2021-11-30 21:55 UTC (permalink / raw)
  To: paulmck
  Cc: Feng Tang, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

On Tue, Nov 30 2021 at 12:47, Paul E. McKenney wrote:
> On Tue, Nov 30, 2021 at 09:39:32PM +0100, Thomas Gleixner wrote:
>> Seriously. Jiffies is not usable as watchdog simply because lost ticks
>> cannot be compensated and you cannot use TSC to bridge them because you
>> are not trusting TSC. This is simply a circulus vitiosus.
>
> OK, HPET or nothing, then.

Older machines also have pm_timer. But those beasts seem to have lost
that too.

>> We really need to remove the watchdog requirement for modern hardware.
>> Let me stare at those patches and get them merged.
>
> You are more trusting of modern hardware than I am, but for all I know,
> maybe rightfully so.  ;-)

Well, I rather put a bet on the hardware, which has become reasonable
over the last decade, than on trying to solve a circular dependency
problem with tons of heuristics which won't ever work correctly.

TSC_ADJUST is a reasonable safety net and since its invention the amount
of BIOS wreckage has been massively reduced. Seems the nastigram in
dmesg when detecting a change in TSC_ADJUST had an effect or maybe
Microsoft enforces a tinkerfree TSC by now and we get the benefit. :)

I still wish to have a knob to lock down TSC to read only, but that's
probably for christmas 2030 or later. :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30 21:55               ` Thomas Gleixner
@ 2021-11-30 22:48                 ` Paul E. McKenney
  2021-11-30 23:19                   ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2021-11-30 22:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Feng Tang, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

On Tue, Nov 30, 2021 at 10:55:45PM +0100, Thomas Gleixner wrote:
> On Tue, Nov 30 2021 at 12:47, Paul E. McKenney wrote:
> > On Tue, Nov 30, 2021 at 09:39:32PM +0100, Thomas Gleixner wrote:
> >> Seriously. Jiffies is not usable as watchdog simply because lost ticks
> >> cannot be compensated and you cannot use TSC to bridge them because you
> >> are not trusting TSC. This is simply a circulus vitiosus.
> >
> > OK, HPET or nothing, then.
> 
> Older machines also have pm_timer. But those beasts seem to have lost
> that too.

I suppose that one way of avoiding clock-skew messages is to have only
one clock.

> >> We really need to remove the watchdog requirement for modern hardware.
> >> Let me stare at those patches and get them merged.
> >
> > You are more trusting of modern hardware than I am, but for all I know,
> > maybe rightfully so.  ;-)
> 
> Well, I rather put a bet on the hardware, which has become reasonable
> over the last decade, than on trying to solve a circular dependency
> problem with tons of heuristics which won't ever work correctly.

Use of HPET to check the interval length would not be circular, right?

> TSC_ADJUST is a reasonable safety net and since its invention the amount
> of BIOS wreckage has been massively reduced. Seems the nastigram in
> dmesg when detecting a change in TSC_ADJUST had an effect or maybe
> Microsoft enforces a tinkerfree TSC by now and we get the benefit. :)
> 
> I still wish to have a knob to lock down TSC to read only, but that's
> probably for christmas 2030 or later. :)

Indeed.  How would BIOS writers hide their SMI handlers?  :-/

							Thanx, Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30 22:48                 ` Paul E. McKenney
@ 2021-11-30 23:19                   ` Thomas Gleixner
  2021-11-30 23:37                     ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2021-11-30 23:19 UTC (permalink / raw)
  To: paulmck
  Cc: Feng Tang, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

On Tue, Nov 30 2021 at 14:48, Paul E. McKenney wrote:
> On Tue, Nov 30, 2021 at 10:55:45PM +0100, Thomas Gleixner wrote:
>> > OK, HPET or nothing, then.
>> 
>> Older machines also have pm_timer. But those beasts seem to have lost
>> that too.
>
> I suppose that one way of avoiding clock-skew messages is to have only
> one clock.

Indeed. It's a complete mystery why it takes ages to implement reliable
clocks in hardware.

>> >> We really need to remove the watchdog requirement for modern hardware.
>> >> Let me stare at those patches and get them merged.
>> >
>> > You are more trusting of modern hardware than I am, but for all I know,
>> > maybe rightfully so.  ;-)
>> 
>> Well, I rather put a bet on the hardware, which has become reasonable
>> over the last decade, than on trying to solve a circular dependency
>> problem with tons of heuristics which won't ever work correctly.
>
> Use of HPET to check the interval length would not be circular, right?

As long as the HPET works reliably :)

>> TSC_ADJUST is a reasonable safety net and since its invention the amount
>> of BIOS wreckage has been massively reduced. Seems the nastigram in
>> dmesg when detecting a change in TSC_ADJUST had an effect or maybe
>> Microsoft enforces a tinkerfree TSC by now and we get the benefit. :)
>> 
>> I still wish to have a knob to lock down TSC to read only, but that's
>> probably for christmas 2030 or later. :)
>
> Indeed.  How would BIOS writers hide their SMI handlers?  :-/

TSC_ADJUST already ruined that party.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30 23:19                   ` Thomas Gleixner
@ 2021-11-30 23:37                     ` Paul E. McKenney
  2021-12-01  1:26                       ` Feng Tang
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2021-11-30 23:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Feng Tang, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

On Wed, Dec 01, 2021 at 12:19:43AM +0100, Thomas Gleixner wrote:
> On Tue, Nov 30 2021 at 14:48, Paul E. McKenney wrote:
> > On Tue, Nov 30, 2021 at 10:55:45PM +0100, Thomas Gleixner wrote:
> >> > OK, HPET or nothing, then.
> >> 
> >> Older machines also have pm_timer. But those beasts seem to have lost
> >> that too.
> >
> > I suppose that one way of avoiding clock-skew messages is to have only
> > one clock.
> 
> Indeed. It's a complete mystery why it takes ages to implement reliable
> clocks in hardware.

That one is easy.  It is because the previous clocksource watchdog was
too lenient.  ;-)

(Sorry, couldn't resist...)

> >> >> We really need to remove the watchdog requirement for modern hardware.
> >> >> Let me stare at those patches and get them merged.
> >> >
> >> > You are more trusting of modern hardware than I am, but for all I know,
> >> > maybe rightfully so.  ;-)
> >> 
> >> Well, I rather put a bet on the hardware, which has become reasonable
> >> over the last decade, than on trying to solve a circular dependency
> >> problem with tons of heuristics which won't ever work correctly.
> >
> > Use of HPET to check the interval length would not be circular, right?
> 
> As long as the HPET works reliably :)

Is it also a complete mystery why clocksources previously deemed
reliable no longer work reliably?  ;-)

> >> TSC_ADJUST is a reasonable safety net and since its invention the amount
> >> of BIOS wreckage has been massively reduced. Seems the nastigram in
> >> dmesg when detecting a change in TSC_ADJUST had an effect or maybe
> >> Microsoft enforces a tinkerfree TSC by now and we get the benefit. :)
> >> 
> >> I still wish to have a knob to lock down TSC to read only, but that's
> >> probably for christmas 2030 or later. :)
> >
> > Indeed.  How would BIOS writers hide their SMI handlers?  :-/
> 
> TSC_ADJUST already ruined that party.

Give the BIOS writers time, they will figure something else out.  :-(

							Thanx, Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30 23:37                     ` Paul E. McKenney
@ 2021-12-01  1:26                       ` Feng Tang
  2021-12-01 17:52                         ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Feng Tang @ 2021-12-01  1:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

On Tue, Nov 30, 2021 at 03:37:26PM -0800, Paul E. McKenney wrote:
> On Wed, Dec 01, 2021 at 12:19:43AM +0100, Thomas Gleixner wrote:
> > On Tue, Nov 30 2021 at 14:48, Paul E. McKenney wrote:
> > > On Tue, Nov 30, 2021 at 10:55:45PM +0100, Thomas Gleixner wrote:
> > >> > OK, HPET or nothing, then.
> > >> 
> > >> Older machines also have pm_timer. But those beasts seem to have lost
> > >> that too.
> > >
> > > I suppose that one way of avoiding clock-skew messages is to have only
> > > one clock.
> > 
> > Indeed. It's a complete mystery why it takes ages to implement reliable
> > clocks in hardware.
> 
> That one is easy.  It is because the previous clocksource watchdog was
> too lenient.  ;-)
> 
> (Sorry, couldn't resist...)
> 
> > >> >> We really need to remove the watchdog requirement for modern hardware.
> > >> >> Let me stare at those patches and get them merged.
> > >> >
> > >> > You are more trusting of modern hardware than I am, but for all I know,
> > >> > maybe rightfully so.  ;-)
> > >> 
> > >> Well, I rather put a bet on the hardware, which has become reasonable
> > >> over the last decade, than on trying to solve a circular dependency
> > >> problem with tons of heuristics which won't ever work correctly.
> > >
> > > Use of HPET to check the interval length would not be circular, right?
> > 
> > As long as the HPET works reliably :)
> 
> Is it also a complete mystery why clocksources previously deemed
> reliable no longer work reliably?  ;-)

For HPET, it's a long story :) Back in 2012 or so, the HPET on Baytrail
platform has a new feature that it will stop counting in PC10 (a cpuidle
state), which prevent it to be a clocksource, and we have to disable
HPET explicitly for that platform. Since then, some new platforms also
have the same feature, and their HPET got disabled too. 

Thanks,
Feng

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-17  2:37 ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Feng Tang
  2021-11-30  6:46   ` Feng Tang
@ 2021-12-01  4:45   ` Luming Yu
  2021-12-01  5:19     ` Feng Tang
  2021-12-01 10:41     ` Thomas Gleixner
  2021-12-01 23:47   ` [tip: x86/urgent] x86/tsc: Disable clocksource watchdog for TSC on qualified platorms tip-bot2 for Feng Tang
  2021-12-02  4:47   ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Luming Yu
  3 siblings, 2 replies; 23+ messages in thread
From: Luming Yu @ 2021-12-01  4:45 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, the arch/x86 maintainers, LKML,
	paulmck, rui.zhang, andi.kleen, Len Brown, tim.c.chen

On Tue, Nov 16, 2021 at 11:18 PM Feng Tang <feng.tang@intel.com> wrote:
>
> There are cases that tsc clocksources are wrongly judged as unstable by
> clocksource watchdogs like hpet, acpi_pm or 'refined-jiffies'. While
> there is hardly a general reliable way to check the validity of a
> watchdog, and to protect the innocent tsc, Thomas Gleixner proposed [1]:
>
> "I'm inclined to lift that requirement when the CPU has:
>
>     1) X86_FEATURE_CONSTANT_TSC
>     2) X86_FEATURE_NONSTOP_TSC
>     3) X86_FEATURE_NONSTOP_TSC_S3
>     4) X86_FEATURE_TSC_ADJUST
>     5) At max. 4 sockets
>

Hi Feng,

We do  need to decouple  tsc from HPET as the current HPET as a
clocksource watchdog for tsc
is only useful to find HPET read skews in some circumstances and the
variations of HPET read come from many different sources. But
none of which really came from the tsc quality, AFAICT.

so this patch is in line with my understanding of the problem.
So , please use  reviewed-by :  luming.yu@intel.com , if it can help
the merge of the patch. : -)

BR
Luming


>  After two decades of horrors we're finally at a point where TSC seems
>  to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
>  was really key as we can now detect even small modifications reliably
>  and the important point is that we can cure them as well (not pretty
>  but better than all other options)."
>
> As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
> of Atom processor, and is always coupled with X86_FEATURE_CONSTANT_TSC
> and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
> to use maxim of 2 sockets.
>
> The check is done inside tsc_init() before registering 'tsc-early' and
> 'tsc' clocksources, as there were cases that both of them had been
> wrongly judged as unreliable.
>
> For more background of tsc/watchdog, there is a good summary in [2]
>
> [1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
> [2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> ---
> Change log:
>
>   v3:
>     * rebased against 5.16-rc1
>     * refine commit log
>
>   v2:
>     * Directly skip watchdog check without messing flag
>       'tsc_clocksource_reliable' (Thomas)
>
>  arch/x86/kernel/tsc.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 2e076a459a0c..389511f59101 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1180,6 +1180,12 @@ void mark_tsc_unstable(char *reason)
>
>  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
>
> +static void __init tsc_skip_watchdog_verify(void)
> +{
> +       clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> +       clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> +}
> +
>  static void __init check_system_tsc_reliable(void)
>  {
>  #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
> @@ -1196,6 +1202,17 @@ static void __init check_system_tsc_reliable(void)
>  #endif
>         if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
>                 tsc_clocksource_reliable = 1;
> +
> +       /*
> +        * Ideally the socket number should be checked, but this is called
> +        * by tsc_init() which is in early boot phase and the socket numbers
> +        * may not be available. Use 'nr_online_nodes' as a fallback solution
> +        */
> +       if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> +           boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> +           boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> +           nr_online_nodes <= 2)
> +               tsc_skip_watchdog_verify();
>  }
>
>  /*
> @@ -1387,9 +1404,6 @@ static int __init init_tsc_clocksource(void)
>         if (tsc_unstable)
>                 goto unreg;
>
> -       if (tsc_clocksource_reliable || no_tsc_watchdog)
> -               clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> -
>         if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
>                 clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
>
> @@ -1527,7 +1541,7 @@ void __init tsc_init(void)
>         }
>
>         if (tsc_clocksource_reliable || no_tsc_watchdog)
> -               clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> +               tsc_skip_watchdog_verify();
>
>         clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
>         detect_art();
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-12-01  4:45   ` Luming Yu
@ 2021-12-01  5:19     ` Feng Tang
  2021-12-01 10:41     ` Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Feng Tang @ 2021-12-01  5:19 UTC (permalink / raw)
  To: Luming Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, the arch/x86 maintainers, LKML,
	paulmck, rui.zhang, andi.kleen, Len Brown, tim.c.chen

On Tue, Nov 30, 2021 at 11:45:32PM -0500, Luming Yu wrote:
> On Tue, Nov 16, 2021 at 11:18 PM Feng Tang <feng.tang@intel.com> wrote:
> >
> > There are cases that tsc clocksources are wrongly judged as unstable by
> > clocksource watchdogs like hpet, acpi_pm or 'refined-jiffies'. While
> > there is hardly a general reliable way to check the validity of a
> > watchdog, and to protect the innocent tsc, Thomas Gleixner proposed [1]:
> >
> > "I'm inclined to lift that requirement when the CPU has:
> >
> >     1) X86_FEATURE_CONSTANT_TSC
> >     2) X86_FEATURE_NONSTOP_TSC
> >     3) X86_FEATURE_NONSTOP_TSC_S3
> >     4) X86_FEATURE_TSC_ADJUST
> >     5) At max. 4 sockets
> >
> 
> Hi Feng,
> 
> We do  need to decouple  tsc from HPET as the current HPET as a
> clocksource watchdog for tsc
> is only useful to find HPET read skews in some circumstances and the
> variations of HPET read come from many different sources. But
> none of which really came from the tsc quality, AFAICT.
> 
> so this patch is in line with my understanding of the problem.
> So , please use  reviewed-by :  luming.yu@intel.com , if it can help
> the merge of the patch. : -)

Thanks for the review and sharing the real world cases you've met!

- Feng

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-12-01  4:45   ` Luming Yu
  2021-12-01  5:19     ` Feng Tang
@ 2021-12-01 10:41     ` Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2021-12-01 10:41 UTC (permalink / raw)
  To: Luming Yu, Feng Tang
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, H . Peter Anvin,
	Peter Zijlstra, the arch/x86 maintainers, LKML, paulmck,
	rui.zhang, andi.kleen, Len Brown, tim.c.chen

Luming,

please trim your replies.

On Tue, Nov 30 2021 at 23:45, Luming Yu wrote:

> We do need to decouple tsc from HPET as the current HPET as a
> clocksource watchdog for tsc is only useful to find HPET read skews in
> some circumstances and the variations of HPET read come from many
> different sources. But none of which really came from the tsc quality,
> AFAICT.

Sorry, but I fail to decode this sentence.

> so this patch is in line with my understanding of the problem.
> So , please use  reviewed-by :  luming.yu@intel.com , if it can help
> the merge of the patch. : -)

Please send Reviewed-by as a properly formatted single line tag:

   Reviewed-by: Luming Yu <luming.yu@intel.com>

so it can be picked up by tools automatically. Mangling it into a
sentence just creates extra work for maintainers.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-12-01  1:26                       ` Feng Tang
@ 2021-12-01 17:52                         ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2021-12-01 17:52 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

On Wed, Dec 01, 2021 at 09:26:55AM +0800, Feng Tang wrote:
> On Tue, Nov 30, 2021 at 03:37:26PM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 01, 2021 at 12:19:43AM +0100, Thomas Gleixner wrote:
> > > On Tue, Nov 30 2021 at 14:48, Paul E. McKenney wrote:
> > > > On Tue, Nov 30, 2021 at 10:55:45PM +0100, Thomas Gleixner wrote:
> > > >> > OK, HPET or nothing, then.
> > > >> 
> > > >> Older machines also have pm_timer. But those beasts seem to have lost
> > > >> that too.
> > > >
> > > > I suppose that one way of avoiding clock-skew messages is to have only
> > > > one clock.
> > > 
> > > Indeed. It's a complete mystery why it takes ages to implement reliable
> > > clocks in hardware.
> > 
> > That one is easy.  It is because the previous clocksource watchdog was
> > too lenient.  ;-)
> > 
> > (Sorry, couldn't resist...)
> > 
> > > >> >> We really need to remove the watchdog requirement for modern hardware.
> > > >> >> Let me stare at those patches and get them merged.
> > > >> >
> > > >> > You are more trusting of modern hardware than I am, but for all I know,
> > > >> > maybe rightfully so.  ;-)
> > > >> 
> > > >> Well, I rather put a bet on the hardware, which has become reasonable
> > > >> over the last decade, than on trying to solve a circular dependency
> > > >> problem with tons of heuristics which won't ever work correctly.
> > > >
> > > > Use of HPET to check the interval length would not be circular, right?
> > > 
> > > As long as the HPET works reliably :)
> > 
> > Is it also a complete mystery why clocksources previously deemed
> > reliable no longer work reliably?  ;-)
> 
> For HPET, it's a long story :) Back in 2012 or so, the HPET on Baytrail
> platform has a new feature that it will stop counting in PC10 (a cpuidle
> state), which prevent it to be a clocksource, and we have to disable
> HPET explicitly for that platform. Since then, some new platforms also
> have the same feature, and their HPET got disabled too. 

I must confess that I have been involved in similar things more times
than I care to admit.  ;-)

So the upshot is that if HPET does not work, it should be disabled,
in which case the clocksource watchdog will be ignoring it.  In cases
where HPET can stop counting and is not disabled in the Linux kernel,
that is a bug that needs to be fixed by disabing HPET for those cases.

If TSC is the only clocksource, then the clocksource watchdog won't be
checking it.

All of this points to using the presumed-good clocksource to measure the
time between clocksource-watchdog checks, but excluding any silly cases.
For example, as Thomas Gleixner suggested, if the jiffies counter is
trying to be the presumed-good clocksource.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [tip: x86/urgent] x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
  2021-11-17  2:37 ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Feng Tang
  2021-11-30  6:46   ` Feng Tang
  2021-12-01  4:45   ` Luming Yu
@ 2021-12-01 23:47   ` tip-bot2 for Feng Tang
  2021-12-02  4:47   ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Luming Yu
  3 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Feng Tang @ 2021-12-01 23:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Feng Tang, Paul E. McKenney, stable, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     b50db7095fe002fa3e16605546cba66bf1b68a3e
Gitweb:        https://git.kernel.org/tip/b50db7095fe002fa3e16605546cba66bf1b68a3e
Author:        Feng Tang <feng.tang@intel.com>
AuthorDate:    Wed, 17 Nov 2021 10:37:51 +08:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 02 Dec 2021 00:40:36 +01:00

x86/tsc: Disable clocksource watchdog for TSC on qualified platorms

There are cases that the TSC clocksource is wrongly judged as unstable by
the clocksource watchdog mechanism which tries to validate the TSC against
HPET, PM_TIMER or jiffies. While there is hardly a general reliable way to
check the validity of a watchdog, Thomas Gleixner proposed [1]:

"I'm inclined to lift that requirement when the CPU has:

    1) X86_FEATURE_CONSTANT_TSC
    2) X86_FEATURE_NONSTOP_TSC
    3) X86_FEATURE_NONSTOP_TSC_S3
    4) X86_FEATURE_TSC_ADJUST
    5) At max. 4 sockets

 After two decades of horrors we're finally at a point where TSC seems
 to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
 was really key as we can now detect even small modifications reliably
 and the important point is that we can cure them as well (not pretty
 but better than all other options)."

As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
of Atom processorz, and is always coupled with X86_FEATURE_CONSTANT_TSC
and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
to use maximal 2 sockets.

The check is done inside tsc_init() before registering 'tsc-early' and
'tsc' clocksources, as there were cases that both of them had been
wrongly judged as unreliable.

For more background of tsc/watchdog, there is a good summary in [2]

[tglx} Update vs. jiffies:

  On systems where the only remaining clocksource aside of TSC is jiffies
  there is no way to make this work because that creates a circular
  dependency. Jiffies accuracy depends on not missing a periodic timer
  interrupt, which is not guaranteed. That could be detected by TSC, but as
  TSC is not trusted this cannot be compensated. The consequence is a
  circulus vitiosus which results in shutting down TSC and falling back to
  the jiffies clocksource which is even more unreliable.

[1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
[2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/

[ tglx: Refine comment and amend changelog ]

Fixes: 6e3cd95234dc ("x86/hpet: Use another crystalball to evaluate HPET usability")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211117023751.24190-2-feng.tang@intel.com

---
 arch/x86/kernel/tsc.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2e076a4..a698196 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1180,6 +1180,12 @@ void mark_tsc_unstable(char *reason)
 
 EXPORT_SYMBOL_GPL(mark_tsc_unstable);
 
+static void __init tsc_disable_clocksource_watchdog(void)
+{
+	clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+	clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+}
+
 static void __init check_system_tsc_reliable(void)
 {
 #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
@@ -1196,6 +1202,23 @@ static void __init check_system_tsc_reliable(void)
 #endif
 	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
 		tsc_clocksource_reliable = 1;
+
+	/*
+	 * Disable the clocksource watchdog when the system has:
+	 *  - TSC running at constant frequency
+	 *  - TSC which does not stop in C-States
+	 *  - the TSC_ADJUST register which allows to detect even minimal
+	 *    modifications
+	 *  - not more than two sockets. As the number of sockets cannot be
+	 *    evaluated at the early boot stage where this has to be
+	 *    invoked, check the number of online memory nodes as a
+	 *    fallback solution which is an reasonable estimate.
+	 */
+	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
+	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
+	    nr_online_nodes <= 2)
+		tsc_disable_clocksource_watchdog();
 }
 
 /*
@@ -1387,9 +1410,6 @@ static int __init init_tsc_clocksource(void)
 	if (tsc_unstable)
 		goto unreg;
 
-	if (tsc_clocksource_reliable || no_tsc_watchdog)
-		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-
 	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
 		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
 
@@ -1527,7 +1547,7 @@ void __init tsc_init(void)
 	}
 
 	if (tsc_clocksource_reliable || no_tsc_watchdog)
-		clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+		tsc_disable_clocksource_watchdog();
 
 	clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
 	detect_art();

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: x86/urgent] x86/tsc: Add a timer to make sure TSC_adjust is always checked
  2021-11-17  2:37 [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked Feng Tang
  2021-11-17  2:37 ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Feng Tang
@ 2021-12-01 23:47 ` tip-bot2 for Feng Tang
  2022-03-14 17:52 ` [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust " Nicolas Saenz Julienne
  2 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Feng Tang @ 2021-12-01 23:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Feng Tang, Thomas Gleixner, Paul E. McKenney, stable, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     c7719e79347803b8e3b6b50da8c6db410a3012b5
Gitweb:        https://git.kernel.org/tip/c7719e79347803b8e3b6b50da8c6db410a3012b5
Author:        Feng Tang <feng.tang@intel.com>
AuthorDate:    Wed, 17 Nov 2021 10:37:50 +08:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 02 Dec 2021 00:40:35 +01:00

x86/tsc: Add a timer to make sure TSC_adjust is always checked

The TSC_ADJUST register is checked every time a CPU enters idle state, but
Thomas Gleixner mentioned there is still a caveat that a system won't enter
idle [1], either because it's too busy or configured purposely to not enter
idle.

Setup a periodic timer (every 10 minutes) to make sure the check is
happening on a regular base.

[1] https://lore.kernel.org/lkml/875z286xtk.fsf@nanos.tec.linutronix.de/

Fixes: 6e3cd95234dc ("x86/hpet: Use another crystalball to evaluate HPET usability")
Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211117023751.24190-1-feng.tang@intel.com

---
 arch/x86/kernel/tsc_sync.c | 41 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 50a4515..9452dc9 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -30,6 +30,7 @@ struct tsc_adjust {
 };
 
 static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+static struct timer_list tsc_sync_check_timer;
 
 /*
  * TSC's on different sockets may be reset asynchronously.
@@ -77,6 +78,46 @@ void tsc_verify_tsc_adjust(bool resume)
 	}
 }
 
+/*
+ * Normally the tsc_sync will be checked every time system enters idle
+ * state, but there is still caveat that a system won't enter idle,
+ * either because it's too busy or configured purposely to not enter
+ * idle.
+ *
+ * So setup a periodic timer (every 10 minutes) to make sure the check
+ * is always on.
+ */
+
+#define SYNC_CHECK_INTERVAL		(HZ * 600)
+
+static void tsc_sync_check_timer_fn(struct timer_list *unused)
+{
+	int next_cpu;
+
+	tsc_verify_tsc_adjust(false);
+
+	/* Run the check for all onlined CPUs in turn */
+	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
+	if (next_cpu >= nr_cpu_ids)
+		next_cpu = cpumask_first(cpu_online_mask);
+
+	tsc_sync_check_timer.expires += SYNC_CHECK_INTERVAL;
+	add_timer_on(&tsc_sync_check_timer, next_cpu);
+}
+
+static int __init start_sync_check_timer(void)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_TSC_ADJUST) || tsc_clocksource_reliable)
+		return 0;
+
+	timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
+	tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
+	add_timer(&tsc_sync_check_timer);
+
+	return 0;
+}
+late_initcall(start_sync_check_timer);
+
 static void tsc_sanitize_first_cpu(struct tsc_adjust *cur, s64 bootval,
 				   unsigned int cpu, bool bootcpu)
 {

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-17  2:37 ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Feng Tang
                     ` (2 preceding siblings ...)
  2021-12-01 23:47   ` [tip: x86/urgent] x86/tsc: Disable clocksource watchdog for TSC on qualified platorms tip-bot2 for Feng Tang
@ 2021-12-02  4:47   ` Luming Yu
  3 siblings, 0 replies; 23+ messages in thread
From: Luming Yu @ 2021-12-02  4:47 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, the arch/x86 maintainers, LKML,
	paulmck, rui.zhang, andi.kleen, Len Brown, tim.c.chen

On Tue, Nov 16, 2021 at 11:18 PM Feng Tang <feng.tang@intel.com> wrote:
>
> There are cases that tsc clocksources are wrongly judged as unstable by
> clocksource watchdogs like hpet, acpi_pm or 'refined-jiffies'. While
> there is hardly a general reliable way to check the validity of a
> watchdog, and to protect the innocent tsc, Thomas Gleixner proposed [1]:
>
> "I'm inclined to lift that requirement when the CPU has:
>
>     1) X86_FEATURE_CONSTANT_TSC
>     2) X86_FEATURE_NONSTOP_TSC
>     3) X86_FEATURE_NONSTOP_TSC_S3
>     4) X86_FEATURE_TSC_ADJUST
>     5) At max. 4 sockets
>
>  After two decades of horrors we're finally at a point where TSC seems
>  to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
>  was really key as we can now detect even small modifications reliably
>  and the important point is that we can cure them as well (not pretty
>  but better than all other options)."
>
> As feature #3 X86_FEATURE_NONSTOP_TSC_S3 only exists on several generations
> of Atom processor, and is always coupled with X86_FEATURE_CONSTANT_TSC
> and X86_FEATURE_NONSTOP_TSC, skip checking it, and also be more defensive
> to use maxim of 2 sockets.
>
> The check is done inside tsc_init() before registering 'tsc-early' and
> 'tsc' clocksources, as there were cases that both of them had been
> wrongly judged as unreliable.
>
> For more background of tsc/watchdog, there is a good summary in [2]
>
> [1]. https://lore.kernel.org/lkml/87eekfk8bd.fsf@nanos.tec.linutronix.de/
> [2]. https://lore.kernel.org/lkml/87a6pimt1f.ffs@nanos.tec.linutronix.de/
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> ---
> Change log:
>
>   v3:
>     * rebased against 5.16-rc1
>     * refine commit log
>
>   v2:
>     * Directly skip watchdog check without messing flag
>       'tsc_clocksource_reliable' (Thomas)
>
>  arch/x86/kernel/tsc.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)

retry:

Reviewed-by: Luming Yu <luming.yu@intel.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
  2021-11-30 16:28         ` Paul E. McKenney
  2021-11-30 20:39           ` Thomas Gleixner
@ 2021-12-07  1:41           ` Feng Tang
  1 sibling, 0 replies; 23+ messages in thread
From: Feng Tang @ 2021-12-07  1:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	andi.kleen, len.brown, tim.c.chen

Hi Paul,

On Tue, Nov 30, 2021 at 08:28:15AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 30, 2021 at 11:02:56PM +0800, Feng Tang wrote:
> > And similar big gap between 'tsc' and 'hpet' is seen for the server
> > case (5.5 kernel which doesn't have the cs_watchdog_read() patchset). 
> > 
> > [1196945.314929] clocksource: timekeeping watchdog on CPU67: Marking clocksource 'tsc' as unstable because the skew is too large:
> > [1196945.314935] clocksource:                       'hpet' wd_now: 25272026 wd_last: 2e9ce418 mask: ffffffff
> > [1196945.314938] clocksource:                       'tsc' cs_now: 95b400003fdf1 cs_last: 95ae7ed7c33f7 mask: ffffffffffffffff
> > [1196945.314948] tsc: Marking TSC unstable due to clocksource watchdog
> > [1196945.314977] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> > [1196945.314981] sched_clock: Marking unstable (1196945264804527, 50153181)<-(1196945399926576, -84962703)
> > [1196945.316255] clocksource: Switched to clocksource hpet
> > 
> > For this case, I don't have access to the HW and only have the
> > dmesg log, from which it seems the watchdog timer has been postponed
> > a very long time from running.
> 
> Thank you for the analysis!
> 
> One approach to handle this situation would be to avoid checking for
> clock skew if the time since the last watchdog read was more than (say)
> twice the desired watchdog spacing.  This does leave open the question of
> exactly which clocksource to use to measure the time between successive
> clocksource reads.  My thought is to check this only once upon entry to
> the handler and to use the designated-good clocksource.
> 
> Does that make sense, or would something else work better?

For this case that the watchdog timer has been delayed for too long
time (170 seconds here), it may be a general problem. IIRC, there
was a similar report in LKML for a non-x86 platform. 

As for fix, I thought about scalable comparing, say if the timer
is delayed 10 seconds, and our checking interval is 500 ms, then
maybe we can lift the checking margin to 20X. But this has a problem
that the watchdog's counter could wrap, in above case, the HPET
already wrapped once (about 170+ seconds), and the wrap time 
could be much shorter for other timers (4 seconds for acpi_pm timer?).
So your idea of limiting the max delay is reasonable.

Thanks,
Feng

> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked
  2021-11-17  2:37 [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked Feng Tang
  2021-11-17  2:37 ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Feng Tang
  2021-12-01 23:47 ` [tip: x86/urgent] x86/tsc: Add a timer to make sure TSC_adjust is always checked tip-bot2 for Feng Tang
@ 2022-03-14 17:52 ` Nicolas Saenz Julienne
  2022-03-15  1:33   ` Feng Tang
  2 siblings, 1 reply; 23+ messages in thread
From: Nicolas Saenz Julienne @ 2022-03-14 17:52 UTC (permalink / raw)
  To: feng.tang, tglx
  Cc: andi.kleen, bp, dave.hansen, hpa, len.brown, linux-kernel, mingo,
	paulmck, peterz, rui.zhang, tim.c.chen, x86, mtosatti, nsaenzju,
	frederic

Hi Feng, Thomas,

> On Wed, Nov 17, 2021 at 10:37:51AM +0800, Feng Tang wrote:
> Normally the tsc_sync will get checked every time system enters idle state,
> but Thomas Gleixner mentioned there is still a caveat that a system won't
> enter idle [1], either because it's too busy or configured purposely to not
> enter idle. Setup a periodic timer (every 10 minitues) to make sure the
> check is always on.
> 
> [1]. https://lore.kernel.org/lkml/875z286xtk.fsf@nanos.tec.linutronix.de/
> Signed-off-by: Feng Tang <feng.tang@intel.com>

I can see this timer interrupting my system's nohz_full CPUs. It'd be nice to
be able to avoid the noise. A solution is using 'tsc=reliable', but IIUC this
is not what the flag was created for. Any ideas/suggestions?

Regards,
Nicolas


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked
  2022-03-14 17:52 ` [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust " Nicolas Saenz Julienne
@ 2022-03-15  1:33   ` Feng Tang
  0 siblings, 0 replies; 23+ messages in thread
From: Feng Tang @ 2022-03-15  1:33 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: tglx, andi.kleen, bp, dave.hansen, hpa, len.brown, linux-kernel,
	mingo, paulmck, peterz, rui.zhang, tim.c.chen, x86, mtosatti,
	frederic

Hi Nicolas,

Thanks for raising it.

On Mon, Mar 14, 2022 at 06:52:07PM +0100, Nicolas Saenz Julienne wrote:
> Hi Feng, Thomas,
> 
> > On Wed, Nov 17, 2021 at 10:37:51AM +0800, Feng Tang wrote:
> > Normally the tsc_sync will get checked every time system enters idle state,
> > but Thomas Gleixner mentioned there is still a caveat that a system won't
> > enter idle [1], either because it's too busy or configured purposely to not
> > enter idle. Setup a periodic timer (every 10 minitues) to make sure the
> > check is always on.
> > 
> > [1]. https://lore.kernel.org/lkml/875z286xtk.fsf@nanos.tec.linutronix.de/
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> 
> I can see this timer interrupting my system's nohz_full CPUs. It'd be nice to
> be able to avoid the noise. A solution is using 'tsc=reliable', but IIUC this
> is not what the flag was created for. Any ideas/suggestions?
 
This patch is about correctness. And yes, as you said, the 'tsc=reliable'
works. Another thought is to leverage the 'housekeeping_mask' to exclude
the isolated nohz_full CPUs, but the 'caveat' what the patch try to solve
remains.

Also before this patchset, the clocksource watchdog timer is fired every
500ms, which should interrupt nohz_full CUPS more? 

Thanks,
Feng

> Regards,
> Nicolas

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-03-15  1:33 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-17  2:37 [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked Feng Tang
2021-11-17  2:37 ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Feng Tang
2021-11-30  6:46   ` Feng Tang
2021-11-30 14:40     ` Paul E. McKenney
2021-11-30 15:02       ` Feng Tang
2021-11-30 16:28         ` Paul E. McKenney
2021-11-30 20:39           ` Thomas Gleixner
2021-11-30 20:47             ` Paul E. McKenney
2021-11-30 21:55               ` Thomas Gleixner
2021-11-30 22:48                 ` Paul E. McKenney
2021-11-30 23:19                   ` Thomas Gleixner
2021-11-30 23:37                     ` Paul E. McKenney
2021-12-01  1:26                       ` Feng Tang
2021-12-01 17:52                         ` Paul E. McKenney
2021-12-07  1:41           ` Feng Tang
2021-12-01  4:45   ` Luming Yu
2021-12-01  5:19     ` Feng Tang
2021-12-01 10:41     ` Thomas Gleixner
2021-12-01 23:47   ` [tip: x86/urgent] x86/tsc: Disable clocksource watchdog for TSC on qualified platorms tip-bot2 for Feng Tang
2021-12-02  4:47   ` [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms Luming Yu
2021-12-01 23:47 ` [tip: x86/urgent] x86/tsc: Add a timer to make sure TSC_adjust is always checked tip-bot2 for Feng Tang
2022-03-14 17:52 ` [PATCH v3 1/2] x86/tsc: add a timer to make sure tsc_adjust " Nicolas Saenz Julienne
2022-03-15  1:33   ` Feng Tang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).